Health data anonymization, synthetic data and pseudonymization service technology

Life science and healthcare stakeholders need to use sensitive data in several ways. Sensitive data needs to be protected against unauthorized access. Protection of data may be required for legal or ethical reasons, for issues pertaining to personal privacy, or for proprietary considerations. This is especially important, and especially complicated, when sensitive data comes from several sources and countries.

CSC (ELIXIR Finland) has recently launched sensitive data (SD) services in open beta to support secure data management through web user interfaces accessible from the user's own computer. The services include Sensitive Data Connect (SD Connect) and Sensitive Data Desktop (SD Desktop).

VEIL.AI, a University of Helsinki spinout, specializes in health data privacy protection powered by AI-driven technologies. These new technologies enable high-quality row-level anonymized data production with secure and easier data interoperability and federation for better research, education and innovation opportunities in life sciences. The data privacy protection technologies include:

  1. Pseudonymization and consent management: VEIL.AI ensures compliance with GDPR through data pseudonymization and consent management. This enables safe distributed identifier management in research collaborations and biobanking.
  2. Privacy protection through anonymization: The VEIL.AI Anonymization Engine provides automated tools for producing high-quality row-level anonymized data for research collaborations. Other characteristics of the VEIL.AI Anonymization Engine include support for continuous data collection and for multi-party data collaboration, with automated quality optimization and high performance.
  3. Data synthetization: In addition to anonymization, VEIL.AI de-identification technologies also support data synthetization, for example for stress testing and for assessing risk measures in healthcare technology development and innovation.
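
The first two concepts in the list above can be illustrated with a small, generic sketch. It is not VEIL.AI's implementation or API: it assumes keyed hashing (HMAC-SHA256) as one common way to derive stable pseudonyms, and age banding plus a k-anonymity count as one simple way to reason about row-level anonymization. The record fields, the key, and the helper names (`pseudonymize`, `generalize_age`, `k_anonymity`) are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the mapping cannot be recomputed without the key.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifiers."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Toy records, invented for this example only.
records = [
    {"patient_id": "FI-001", "age": 34, "region": "Uusimaa", "diagnosis": "E11"},
    {"patient_id": "FI-002", "age": 37, "region": "Uusimaa", "diagnosis": "I10"},
    {"patient_id": "FI-003", "age": 52, "region": "Pirkanmaa", "diagnosis": "E11"},
    {"patient_id": "FI-004", "age": 58, "region": "Pirkanmaa", "diagnosis": "J45"},
]

# Assumption: in practice the key would be held by a trusted party, not in code.
key = b"demo-key-not-for-production"

deidentified = [
    {
        "pseudonym": pseudonymize(r["patient_id"], key),
        "age_band": generalize_age(r["age"]),
        "region": r["region"],
        "diagnosis": r["diagnosis"],
    }
    for r in records
]

k_before = k_anonymity(records, ["age", "region"])
k_after = k_anonymity(deidentified, ["age_band", "region"])
print(k_before, k_after)
```

In this toy data every exact age is unique, so the smallest group has one row; after banding, each (age band, region) combination covers two rows. Pseudonymized data of this kind is still personal data under GDPR, because whoever holds the key can re-link it, which is why pseudonymization and anonymization are listed as separate technologies.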

The purpose of this project is to validate the use of VEIL.AI technologies as part of the CSC SD solutions delivered for researchers, organizations and educational purposes, and to negotiate a licensing model that would allow long-term sustainability of the technology collaboration and exchange.

Individual technology interests for knowledge exchange include:

  • Deployment of the VEIL.AI service stack and APIs in the ELIXIR-FI CSC SD environment
  • Production of de-identified data (pseudonymized, anonymized, or synthetic data) using VEIL.AI APIs
  • Pooling of de-identified data (data lake, possibly Delta Lake)
  • Sharing of de-identified data (Delta Sharing)
  • Access control to the de-identification resources (API) and results (data)

Licensing technology from an SME to support the service infrastructure of an ELIXIR node is a broad target. Within it, CSC and VEIL.AI will produce a focused example of a collaboration scheme. We will first identify the technology components and then discuss what kind of agreement would be necessary, with the aim of a sustainable, long-term collaboration. The outcome will be know-how on how to build and formalise a partnership between a public infrastructure and an ICT technology start-up. Ideally, the lessons learned will be applicable to other similar arrangements.
