From structured dataset to data article

Building on our experience and our links with research communities, we are designing an open-source, web-based tool that will be part of an ecosystem of existing annotation and authoring systems. It will help researchers use community standards to describe their (meta)data at the source, and then reuse that effort to speed up the writing of a data article.


Our motivation

Governments, funders and publishers increasingly expect the (meta)data that support research findings to be Findable, Accessible, Interoperable and Reusable, as set out in the widely accepted FAIR Principles (doi:10.1038/sdata.2016.18), which we helped author. Describing and identifying (meta)data with community-developed standards, and depositing them in trusted repositories, underpin both FAIRness and reproducible research.

Our foundational work

Since 2007, our group has helped many communities meet these requirements through our open-source ISA Tools (isa-tools.org; isacommons.org), which enable the standards-compliant description, deposition and publication of a variety of experiment types. Since 2011, our group has also run FAIRsharing (fairsharing.org), which guides researchers, journals, publishers and other communities to discover, select and use repositories and community-developed standards with confidence.

Create or import

Users will be guided to provide (semi)structured descriptions of the experimental design and of the post-processed data. From these, Datascriptor will generate, respectively, the Methods section and a set of statements to populate the Results section of a manuscript. Datascriptor will work in two ways: (i) as a stand-alone tool, open to anyone, implementing generic metadata models such as the W3C Data Catalog Vocabulary (DCAT); and (ii) as a component of the ISA Tools and of the InterMine data warehouse, serving their user communities and implementing the ISA metadata model.
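To illustrate what a description based on a generic model like DCAT might contain, the sketch below builds a minimal JSON-LD record using a handful of real DCAT and Dublin Core terms (dcat:Dataset, dct:title, dct:description, dcat:keyword, dcat:distribution, dcat:downloadURL, dcat:mediaType). It is not Datascriptor's data model: the function make_dcat_dataset, the example values and the example.org URL are hypothetical, and the rule that title and description are mandatory is an assumption of this sketch rather than a DCAT requirement.

```python
import json

# Namespace prefixes for the W3C DCAT vocabulary and Dublin Core terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def make_dcat_dataset(title: str, description: str, keywords: list,
                      download_url: str, media_type: str) -> dict:
    """Build a minimal JSON-LD dcat:Dataset with one dcat:Distribution.

    Hypothetical helper: this sketch (not DCAT itself) treats title and
    description as mandatory fields.
    """
    if not title or not description:
        raise ValueError("title and description are required")
    return {
        "@context": CONTEXT,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            # URLs are expressed as JSON-LD node references via "@id".
            "dcat:downloadURL": {"@id": download_url},
            "dcat:mediaType": {"@id": media_type},
        },
    }


# Hypothetical example values, for illustration only.
record = make_dcat_dataset(
    title="Gene expression under cold stress",
    description="Processed RNA-seq counts for a cold-stress time course.",
    keywords=["transcriptomics", "cold stress"],
    download_url="https://example.org/data/counts.csv",
    media_type="https://www.iana.org/assignments/media-types/text/csv",
)
print(json.dumps(record, indent=2))
```

A record like this is the kind of (semi)structured input from which dataset-level statements could later be generated; richer experimental-design detail is what the ISA model adds on top.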

Write and publish

To generate short sentences from the (semi)structured input, we will evaluate a mixed data-to-text approach that combines template-based and neural (i.e. machine learning) methods. To further enrich the manuscript, Datascriptor will connect to existing authoring systems, including Substance, Texture, Stenci.la and Manuscripts, and export the result in JATS format. We also plan to support export as a DAR file and in LaTeX format.

User advisory board and collaborators

We collaborate formally with researchers, journal publishers, repositories and other service providers through our ISA Tools and FAIRsharing resources. In particular, the Datascriptor User Advisory Board includes a core group of existing collaborators: Thomas Lemberger (EMBO Press), Scott Edmunds (GigaScience), Holly Murray (F1000) and Varsha Khodiyar (Springer Nature). If you would like to collaborate, please contact us.
