Training topics explained

The topic codes (G1-G7 and R1-R17) refer to Table 1 of ENVRI-FAIR deliverable report D6.1 (see end of page for more info). To find Learning Resources that have been tagged with any of these, select the appropriate TrainingTopic filter on the search form.

Note: these topic codes were defined during the design of the training catalogue after consultation with the ENVRI-FAIR project partners. In addition, regular maintenance and upkeep of the catalogue contents may result in the removal of outdated individual Learning Resources. This means that there is no guarantee that all of the topic codes listed below are applicable at a given point in time - so if your search turns up no hits, try another combination of topics!

General FAIR-related training topics

Introduction to FAIR principles (G1)

Covers basic explanations of the FAIR (Findable, Accessible, Interoperable, Reusable) guiding principles, and provides best practices that help data producers, data repositories and data end users ensure that their data & metadata achieve a high degree of FAIRness.

Metrics for FAIRness evaluation (G2)

There exist many approaches to estimate the degree of compliance of a given (data) object with the FAIR principles, some based on subjective evaluations by human experts and others relying on machine-actionable “automatic” tests.

Performing a FAIRness self-assessment (G3)

Once one or several metrics for FAIR assessment have been chosen, a research infrastructure (RI) needs to decide how and when to collect information, with whom to share the results, etc.

GDPR (General Data Protection Regulation) issues related to data sharing (G4)

ENVRIs produce and manage many types of data and metadata, some of which contain personal information. The European GDPR covers what personal data are, what may be collected and for what purposes, where and for how long to store the collected information.

Basic Research Data Management (RDM) (G5)

Basic research data management covers introductions to the different “pillars” and “cross-cutting beams” defined by the ENVRI-Plus project.

Writing (technical) documentation for services (G6)

Clear and comprehensive documentation is essential for a service to be discoverable and successfully used. Writing effective documentation starts with understanding the target audience(s) and addressing the aspects that are important to them in a relevant format - for example description fields in catalogues, promotional materials, usage instructions and tutorials, technical specifications and richly commented code.

Other general FAIR-related topics (G7)

A catch-all classification, covering any other FAIR topic that isn’t directly connected to concrete implementations of, or technologies for, data management components.

Research Data Management (RDM) training topics

Access control (Authorization-Authentication-Identification, or AAI) methods (R1)

Providers of services must be able to control who has access to them in general, and what functionalities or resources they can use. To this end, mechanisms need to be in place that allow users to state who they are (identification), if needed backed by some proof of their identity (authentication), and then grant access to relevant resources based on this authenticated identity (authorisation).
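
The three steps can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption made for illustration: the in-memory user table, the role-to-permission mapping and the helper names `authenticate`, `authorise` and `request`. A production service would normally delegate these steps to an established identity provider or AAI federation rather than implement them by hand.

```python
import hashlib
import hmac


def _hash(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2 (standard library)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


_SALT = b"demo-salt"  # fixed salt only to keep the sketch short

# Hypothetical user registry: username -> (salt, password hash, role).
USERS = {"alice": (_SALT, _hash("s3cret", _SALT), "curator")}

# Hypothetical mapping of roles to permitted actions.
PERMISSIONS = {"curator": {"read", "upload"}, "guest": {"read"}}


def authenticate(username: str, password: str) -> bool:
    """Check the proof of identity offered for a claimed identity."""
    record = USERS.get(username)
    if record is None:
        return False
    salt, stored_hash, _role = record
    return hmac.compare_digest(stored_hash, _hash(password, salt))


def authorise(username: str, action: str) -> bool:
    """Decide whether an already authenticated user may perform an action."""
    _salt, _stored_hash, role = USERS[username]
    return action in PERMISSIONS.get(role, set())


def request(username: str, password: str, action: str) -> str:
    """Identification (username), then authentication, then authorisation."""
    if not authenticate(username, password):
        return "denied: authentication failed"
    if not authorise(username, action):
        return "denied: not authorised"
    return "granted"
```

In this sketch, `request("alice", "s3cret", "upload")` passes both checks, while a wrong password fails at the authentication step and an action outside the user's role fails at the authorisation step.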

API (Application Program Interface) design for data & metadata access (R2)

Application programming interfaces (APIs) are software interfaces designed to facilitate computer-to-computer communications, for example between a repository server hosting data and related metadata, and a script running on the computer of an end user. In the ENVRI context, APIs offer an important means to enhance interoperability between services provided by different sub-domains.

Cataloguing - design & implementation (R3)

Many of the ENVRIs operate repositories of their data holdings - combining storage with a catalogue that contains all relevant metadata describing the digital objects. It is important to build the catalogues on relevant standardised metadata schemas and models, in order to achieve maximum findability and interoperability.

Certification schemes for repositories (CoreTrustSeal) (R4)

Repositories rely on their end users having trust in the management, storage and overall quality of their data and metadata holdings. There exist several expert bodies and initiatives that can evaluate research data repositories, including CoreTrustSeal.

Cloud computing (Virtual Machines & containers) for data processing (R5)

Cloud computing utilises networked resources (storage, computation power etc.) typically provided on-demand by e-infrastructures or other large service providers. Users do not have to install software or work on their local machines, but instead connect to cloud services that often are operated in virtual machine environments that can be scaled to meet demands.

Data Management Plans (R6)

Data management plans (DMPs) are formal documents describing how data and other digital assets are to be handled during and after research projects. The contents and level of detail of a DMP may vary between funders, research disciplines and regions, but there are templates and instructions available to guide researchers through the preparation and maintenance phases.

Landing page design (R7)

In Research Data Management contexts, a “landing page” is a web page containing relevant metadata describing a digital object such as a dataset. If the digital object has been assigned a persistent identifier (PID), resolving the PID will redirect the user to the landing page, rather than to the bitstream representing the object itself. Designing landing pages requires knowledge of both the metadata itself and how to encode it in a form that is machine-interpretable and -actionable.
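
One common way to make landing page metadata machine-interpretable is to embed it in the page as JSON-LD using the schema.org vocabulary. The Python sketch below is only an illustration under stated assumptions: the helper name `landing_page_jsonld`, the dataset title and the PID URL are made up, and the choice of properties is not prescribed by the catalogue.

```python
import json


def landing_page_jsonld(title: str, pid_url: str, licence_url: str) -> str:
    """Build a JSON-LD <script> element describing a dataset (hypothetical helper)."""
    metadata = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "identifier": pid_url,   # the PID that resolves to this landing page
        "license": licence_url,
    }
    body = json.dumps(metadata, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values, not a real dataset or identifier.
snippet = landing_page_jsonld(
    "Example CO2 time series",
    "https://example.org/pid/12345",
    "https://creativecommons.org/licenses/by/4.0/",
)
```

The resulting element would be placed in the HTML of the landing page, next to the human-readable presentation of the same metadata. A real implementation would also need to escape values that could contain markup.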

Licenses & policies for data use (R8)

Any (digital) asset that is intended for reuse or sharing should be assigned an appropriate usage licence, for example a Creative Commons one. In addition, data producing organisations should define clear policies that describe how data are handled throughout all parts of the research data lifecycle, both by the organisation itself and by end users.

Linked Data and ontologies (R9)

Linked data (LD) is a method of expressing information in a form that is directly interpretable and actionable by machine-based processes, for example via subject-predicate-object triples that can be serialised using Resource Description Framework (RDF) statements. Linked Data is one of the core pillars of the Semantic Web, which makes use of ontologies, vocabularies and thesauri.
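
As a rough illustration of subject-predicate-object triples, the following Python sketch writes two statements in the N-Triples serialisation of RDF using only the standard library. The `Triple` type, the `to_ntriples` helper and the example.org IRIs are assumptions made for this example; the Dublin Core predicates are real, but a production system would more likely use a dedicated RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # IRI of the thing being described
    predicate: str                # IRI of the property
    obj: str                      # IRI or literal text
    obj_is_literal: bool = False


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as an N-Triples line.

    Only backslashes and double quotes are escaped here; other
    special characters (such as newlines) are not handled.
    """
    if triple.obj_is_literal:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


DATASET = "https://example.org/dataset/42"  # made-up identifier
triples = [
    Triple(DATASET, "http://purl.org/dc/terms/title", "Example dataset", True),
    Triple(DATASET, "http://purl.org/dc/terms/creator", "https://example.org/person/7"),
]
lines = [to_ntriples(t) for t in triples]
```

Each entry in `lines` is one statement: the first attaches a literal title to the dataset, the second links the dataset to another resource by its IRI.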

Metadata standards & schemas (including geospatial, instruments, variables) (R10)

Being able to identify and implement relevant metadata standards and schemas both on sub-domain and higher levels is a crucial first step towards achieving interoperability of data products and services from the ENVRIs. Once these are in place, qualified references and cross-walks can be defined to link together terminologies from the different domains.

PID allocation & use (including citation support, bibliometry, provenance) (R11)

Globally unique and resolvable persistent identifiers (PIDs) play central roles throughout the entire research data lifecycle, and are essential for FAIR research data management. They provide identity, indirection & machine actionability of digital objects, supporting search, referencing and accurate citation. PIDs can be assigned to any digital object, not just data.

Portal design & operation (R12)

Many of the ENVRIs operate their own portals where their data and related services are made available to human and machine-based users. To be optimally useful to the intended end user communities, the design of a portal’s functionalities and user interfaces should be based on elicitation of user requirements.

Provenance tracing (R13)

To enable trustability, reusability and reproducibility of research objects, it is necessary to collect and store detailed information on how they were produced, by whom and when. This lineage is referred to as provenance.

Repository design, operation & sustainability (R14)

A repository for digital objects is typically based on a storage system, a catalogue and some user interface. When designing and implementing a repository at data centre level, the most sustainable and easy-to-operate options are those that build on existing commercial or open source solutions.

Virtual Research Environments for data analysis (design & implementation) (R15)

A Virtual Research Environment is a digital platform that offers end users tools and storage for the analysis of data. Importantly, VREs often provide functionality that supports collaboration between researchers, as well as procedures for proper management of results and outcomes.

Workflow engines for automated data processing (R16)

A workflow is defined as an orchestrated and repeatable pattern of activity. In the research data context, workflows are typically built up from software components that are controlled by scripts that can run analysis code, capture provenance information and produce new outputs.
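
The idea of scripted components that run analysis code and capture provenance can be sketched in a few lines of Python. This is a toy example under stated assumptions, not how any particular workflow engine works: the `run_workflow` function, the two processing steps and the layout of the provenance records are all invented for illustration.

```python
from datetime import datetime, timezone


def run_workflow(steps, data):
    """Run the steps in order, feeding each output to the next step.

    Returns the final output and a list of provenance records,
    one per executed step.
    """
    provenance = []
    for step in steps:
        started = datetime.now(timezone.utc).isoformat()
        result = step(data)
        provenance.append(
            {"step": step.__name__, "started": started, "input": data, "output": result}
        )
        data = result
    return data, provenance


def remove_missing(values):
    """Drop missing observations, represented here as None."""
    return [v for v in values if v is not None]


def mean(values):
    """Average of the remaining observations."""
    return sum(values) / len(values)


result, provenance = run_workflow([remove_missing, mean], [1.0, None, 3.0])
```

Because the same list of steps can be rerun on new input, the pattern is repeatable, and the provenance list records which step produced which intermediate output and when. Real workflow engines add features such as scheduling, error handling and persistent storage of these records.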

Other RDM training topics (R17)

A catch-all classification, covering any other RDM topic that relates to concepts or technologies used to process or manage digital assets.

More information

Hellström, M., Johnsson, M., Konijn, J., Fiore, N., Quimbert, E., Boulanger, D. & Baker, G.R. (2019). ENVRI-FAIR D6.1 Inventory & gap analysis of FAIR training materials (Version 1). Zenodo.