Scalable Curation

In the future, the research literature will be increasingly open access, and new communication mechanisms such as preprints will require version management and new peer review mechanisms. Managing full text article corpora for text mining will be much more challenging than managing abstracts alone, and it is unlikely that every text mining group will want to invest the necessary time and effort when public resources are already available. Bringing the compute to the data is commonplace in most informatics workflows, and there is no reason why text mining should be different in the long term.

Curation, performed by expert biologists, is the lifeblood of knowledgebases. Curators need to identify key papers, read the full text of the articles to weigh up the evidence, and then extract the most pertinent information. A growing corpus of open access full text articles provides new opportunities to enhance article triage and browsing systems. At the same time, many text mining workflows are now mature enough to support curation activities.

This group of tasks aims to build community and infrastructure based on the open full text research literature. By providing a platform for running text mining and sharing its outputs, developing standards, and combining the resulting semantic enrichment with rich article metadata and software tools, we expect to offer scalable support for curation across multiple knowledgebases.

Aim: Maximise support for human curation.

This group will develop the infrastructure around full text article resources to support curator workflows. This will be done by semantically enriching research articles and by exploring the development of article triage systems as infrastructure. For example, biological concepts could be text mined daily from full text research articles, and the resulting annotations shared for use in search, triage, and crosslinking. The opportunities for, and role of, community curation will also be explored.
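
As an illustration only, the enrichment and triage steps described above might be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any existing pipeline: it uses simple dictionary matching in place of a real text mining workflow, and the concept dictionary, the placeholder identifiers, the `Annotation` record, and the `annotate` and `triage` functions are all hypothetical names introduced for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical concept dictionary: surface term -> (concept type, identifier).
# The identifiers are placeholders, not real database accessions.
CONCEPTS = {
    "brca1": ("gene", "GENE:0001"),
    "tp53": ("gene", "GENE:0002"),
    "breast cancer": ("disease", "DISEASE:0001"),
}


@dataclass(frozen=True)
class Annotation:
    """A shareable annotation: where a concept was found in which article."""
    article_id: str
    start: int
    end: int
    text: str
    concept_type: str
    concept_id: str


def annotate(article_id, full_text):
    """Tag dictionary terms in the full text, case-insensitively, on word boundaries."""
    annotations = []
    for term, (concept_type, concept_id) in CONCEPTS.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(full_text):
            annotations.append(
                Annotation(article_id, match.start(), match.end(),
                           match.group(0), concept_type, concept_id)
            )
    # Return annotations in reading order.
    return sorted(annotations, key=lambda a: a.start)


def triage(articles, wanted_type):
    """Rank article ids by the number of distinct concepts of the wanted type."""
    scores = {}
    for article_id, text in articles.items():
        concept_ids = {a.concept_id for a in annotate(article_id, text)
                       if a.concept_type == wanted_type}
        scores[article_id] = len(concept_ids)
    # Highest score first; ties broken by article id for a stable order.
    return sorted(scores, key=lambda article_id: (-scores[article_id], article_id))


if __name__ == "__main__":
    articles = {
        "A1": "BRCA1 and TP53 mutations in breast cancer.",
        "A2": "A study of soil bacteria.",
        "A3": "TP53 is discussed. TP53 again.",
    }
    print(triage(articles, "gene"))
```

In this sketch the annotations carry character offsets and a concept identifier so that they could be shared and reused for search or crosslinking, while the triage ranking counts distinct concepts rather than raw mentions so that an article repeating one term does not outrank an article covering several. A production system would need proper entity recognition, disambiguation, and an agreed annotation standard.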