Empowering users: Orchestrating Sensitive Data access for Interactive Federated Analysis in Virtual Research Environments

Theme: Federated Data Analysis

Through the 1+Million Genomes (1+MG) initiative, Europe is scaling up efforts to build a shared framework and infrastructure to safely access and integrate clinical human data across borders, following regulatory efforts like the General Data Protection Regulation (GDPR) and the European Health Data Space (EHDS). These are pivotal in safeguarding sensitive information, while enabling authorised access for researchers, healthcare professionals and other actors. 

Integral to biomedical data security considerations are the European Genome-Phenome Archive (EGA), in both Central and Federated forms, recognised as the predominant European repository for the secure storage of pheno-clinical and genomics data. Mobilising data for secure analysis in Virtual Research Environments (VREs) remains challenging. Indeed, it is an active focus in ongoing projects like the European Genomic Data Infrastructure (GDI), EOSC-ENTRUST and EOSC4Cancer. 

Galaxy is a popular open-source, community-driven VRE for bioinformatics analysis that represents a unique platform for developing and testing novel strategies for data analysis. A prototyping strategy for the access and processing of sensitive data was demonstrated in a previous ELIXIR implementation study (2021–2023). By adopting GA4GH Crypt4GH encryption standard features, we enabled Galaxy users within Trusted Research Environments (TREs) to decrypt sensitive data for workflow execution without sharing private encryption keys. 

We propose expanding this prototype into a comprehensive solution for secure data analysis in Galaxy, facilitating encrypted data access and transfer from FEGA/EGA repositories to designated TREs, all interactively orchestrated by the users on a public Galaxy server. The proposed solution offers flexibility with different levels of enforced restrictions ranging from scenarios with no limitations on encrypted data transfer and storage, to fully federated analysis scenarios, where analysis occurs near the data. Most of the required infrastructure can also be deployed independent of Galaxy, simplifying the potential implementation of these concepts in other VREs.

Platform/Community
People involved
María Chavero Díez