Description
FAIR scientific data benefits both data consumers and data producers. Consumers gain access to valuable assets that they could not produce themselves, and can integrate external data into their workflows with minimal effort. Producers increase the visibility and impact of their research, with data becoming recognized as a scientific contribution on par with written publications.
FAIR Digital Objects (FDOs), which are built upon the FAIR principles, promise additional advantages including enhanced machine-actionability, standardized interfaces for automated processing, and improved provenance tracking across distributed data networks.
However, significant challenges remain. Data producers struggle to comply with the FAIR principles and FDO specifications, in part because standardized workflows and best-practice examples are lacking, and in part because effective implementation requires substantial expertise.
This talk presents the transformation of a monolithic data corpus into a network of FDOs. The transformation addresses two main objectives. First, the corpus is divided into data elements that are addressable at a granular scale. This allows precise linking of propositions to specific observations, accurate error reporting and versioning, and flexible recombination of dataset components. Second, controlled semantics are established across all levels of the data model. The goal is to eliminate ambiguity for data consumers in order to increase reuse efficiency and minimize the risk of misinterpretation.
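To make the first objective concrete, the following is a minimal sketch of granular addressing, not the method used in the talk. It assumes that each observation is a flat record and derives an identifier from the record's content; the namespace `https://example.org/robival/`, the record fields, and the function names are hypothetical.

```python
import hashlib
import json

# Hypothetical namespace; the identifier scheme actually used is not stated in the abstract.
BASE_IRI = "https://example.org/robival/"


def element_iri(record: dict) -> str:
    """Derive a stable identifier from the canonical JSON form of a record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{BASE_IRI}observation/{digest}"


def split_corpus(records: list) -> dict:
    """Map each record of a monolithic list to its own identifier."""
    return {element_iri(record): record for record in records}


# Illustrative records only; these are not values from the RoBivaL corpus.
monolith = [
    {"run": 1, "sensor": "imu", "t": 0.0, "value": 0.12},
    {"run": 1, "sensor": "imu", "t": 0.1, "value": 0.15},
]
elements = split_corpus(monolith)
```

Because the identifier depends only on the record's content, a corrected record receives a new identifier while the old one continues to denote the old value, which is one way to support the error reporting and versioning mentioned above.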
The data was originally acquired within the RoBivaL project, which investigated different mobile robot designs in an agricultural setting. Data collection included high-resolution sensor measurements from several modalities, field logbooks containing structured experiment documentation, and specifications providing metadata and context about the data structures and the equipment used.
The corpus was first made available on Zenodo in an effort to comply with the FAIR principles. While this initial version featured a clear layout and open formats in order to facilitate reuse, it was provided as a monolith which limited its interoperability. Further, though rich semantics were explicitly documented in the initial version, they were not codified in a standardized fashion.
The transformed version implements the corpus as a network of FDOs, using semantic web technologies for data modeling, and Nanopublications for distribution. The data model has three layers:
- The Experimental Research Ontology (ERO) forms the foundational layer. It models fundamental aspects of experimental data creation in general and is aligned with several upper ontologies.
- The RoBivaL Specification layer uses ERO to specify RoBivaL's project methodology, define the structure and semantics of the payload data, and provide information about the equipment used.
- The RoBivaL Payload Data layer uses the Specification layer to capture the values of experiment parameters and sensor measurements.
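The layering above can be illustrated with a small sketch in which each layer is a list of subject-predicate-object triples and each layer refers only to terms defined in the layer beneath it. This is an assumption-laden illustration, not the project's actual model: every identifier under the `ero:`, `spec:`, and `data:` prefixes is a hypothetical placeholder, and plain Python tuples stand in for an RDF library.

```python
# All ero:, spec:, and data: identifiers are hypothetical placeholders,
# not actual ERO or RoBivaL terms.

# Foundational layer: general concepts of experimental data creation.
ero_layer = [
    ("ero:Measurement", "rdf:type", "owl:Class"),
    ("ero:Parameter", "rdf:type", "owl:Class"),
]

# Specification layer: project-specific classes built on the foundational layer.
spec_layer = [
    ("spec:WheelSlip", "rdfs:subClassOf", "ero:Measurement"),
    ("spec:GroundType", "rdfs:subClassOf", "ero:Parameter"),
]

# Payload layer: concrete values typed by the specification layer.
payload_layer = [
    ("data:obs-001", "rdf:type", "spec:WheelSlip"),
    ("data:obs-001", "spec:hasValue", "0.17"),
    ("data:run-01-ground", "rdf:type", "spec:GroundType"),
    ("data:run-01-ground", "spec:hasValue", "sand"),
]


def subjects(triples):
    """Return every term that appears as a subject, i.e. is described in a layer."""
    return {s for s, _, _ in triples}


def objects(triples, predicate):
    """Return every object reached through the given predicate."""
    return {o for _, p, o in triples if p == predicate}


def undefined_references(upper, lower, predicate):
    """Return terms that `lower` points to via `predicate` but `upper` never describes."""
    return objects(lower, predicate) - subjects(upper)


missing_in_ero = undefined_references(ero_layer, spec_layer, "rdfs:subClassOf")
missing_in_spec = undefined_references(spec_layer, payload_layer, "rdf:type")
```

Both result sets are empty when every layer depends only on the one below it, which is the property that keeps the payload values interpretable through the specification and, in turn, through the ontology.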
The transformation was conducted as a use case of the FDO Connect project, which develops tools and methodologies that bridge traditional data management practices and emerging FDO ecosystem requirements, with the aim of facilitating broader adoption of the FAIR principles in research communities.
The dataset transformation showcases modular multi-layered data modeling, a practical implementation of the FDO specifications, and a best practice for FAIR-compliant usage of semantic web technologies for distributed scientific data networks.