Data and knowledge management in an automated enzyme characterisation workflow: an industry perspective

Advancements in biological system engineering have introduced new data challenges in biotechnology. These challenges are even more pronounced when high-throughput robotic systems are used in screening and manufacturing. While academic projects and use cases have enabled progress in data management, industrial systems face distinct challenges related to data sharing, security, and process verification. This project aims to investigate industrial data sharing challenges.

As part of the Microbial Biotechnology Community, a recent joint implementation study between the Newcastle and Manchester ELIXIR Nodes explored the data management requirements of biotechnology. This effort resulted in guidelines published in collaboration with the RDMkit team. The study was then extended to consider an automated hybrid in-vitro/in-silico workflow designed to characterise recombinant enzymes and their protein families, with a particular focus on understanding their data characteristics in an academic setting (see figure below).

Workflow to characterise recombinant enzymes and their protein families

Prozomix Ltd., an SME specialising in enzyme-based biocatalysts, uses high-throughput systems to identify novel members of enzyme protein families, both within the company and in collaboration with academic partners. This project proposes a technology transfer process from academia to industry, involving the adaptation and deployment of the automated pipeline at Prozomix.

A data requirements analysis will address the specific data challenges faced by the industry, identifying the points at which data could be shared without undermining the company’s intellectual property. The ultimate goal is to develop generic guidelines for industrial partners to make their data FAIR (Findable, Accessible, Interoperable, and Reusable) without compromising intellectual property and data security.

The project's anticipated outcomes include fostering collaboration between the ELIXIR communities and industry, encouraging academic members to explore more commercially focused use cases, and prompting industrial members to consider FAIR data. Additionally, the project aims to highlight the differences in FAIR data needs between industry and academia and to create a generic framework that facilitates data sharing in industrial biotechnology.


1 March 2023 to 1 March 2024

Nodes involved: