1. Definition
Metadata allow a more accurate description of data: they are data about data.
Source: Doranum - Parcours interactif sur la gestion des données de la recherche
If we imagine a dataset as a can, then the metadata are the label that describes the contents of the can (date of production, creator, etc.).
Without metadata
Source: picture by Araceli Jáuregui from Pixabay
With metadata
Source: picture by heberhard from Pixabay
-
2. Metadata in the data life cycle
It is recommended to complete the metadata as the project progresses, with particular attention to:
- the step of data sharing,
- the step of persistent archiving (specific metadata will have to be added).
-
3. Embedded metadata vs enriched metadata
There are two types of metadata: embedded and enriched metadata.
Embedded Metadata
They are automatically produced by devices (cameras, sound recorders, measurement instruments, etc.). This is typically the case for smartphone photos or videos. Examples of generated metadata: GPS data, device type, date, technical calibration, etc.
Enriched Metadata
They are added by the author. Examples: keywords, subject, author, laboratory or organization, project name, license, etc.
Don't forget to complete the embedded metadata with enriched metadata. Ideally, these metadata should be filled in as you go along. It is recommended to use disciplinary controlled vocabularies (ontologies, lexicons, thesauri, etc.). This makes the data easier to combine with other data.
For example:
- Drugs Codex
- Taxonomic classifications
- IUPAC nomenclature of chemistry
To organize the metadata it is recommended to use a metadata standard specific to your discipline or adapted to your needs. If none exists, a metadata schema will need to be created.
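As a rough illustration of the advice in this section, the following Python sketch completes device-generated (embedded) metadata with author-supplied (enriched) metadata and checks the keywords against a controlled vocabulary. All field names, values and the vocabulary itself are hypothetical and chosen only for the example.

```python
# Hypothetical example values; none of them come from a real device or project.
embedded = {
    "device_type": "smartphone camera",
    "date": "2021-06-14T09:30:00",
    "gps": (48.69, 6.18),
}

enriched = {
    "author": "A. Researcher",
    "project_name": "Example project",
    "license": "CC-BY-4.0",
    "keywords": ["benzene", "toluene"],
}

# Stand-in for a disciplinary controlled vocabulary (assumed for this sketch).
CONTROLLED_KEYWORDS = {"benzene", "toluene", "ethanol"}


def complete_metadata(embedded, enriched, vocabulary):
    """Combine embedded and enriched metadata into one record.

    Raises ValueError if a keyword is not part of the controlled vocabulary.
    """
    unknown = [k for k in enriched.get("keywords", []) if k not in vocabulary]
    if unknown:
        raise ValueError(f"Keywords not in controlled vocabulary: {unknown}")
    # Embedded fields are kept as produced; enriched fields are stored alongside.
    return {"embedded": dict(embedded), "enriched": dict(enriched)}


record = complete_metadata(embedded, enriched, CONTROLLED_KEYWORDS)
```

Keeping the two groups separate in the record is only one possible design; it makes it easy to see which fields came from the device and which were added by the author.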
-
4. Difference between metadata schema and metadata standard
- Metadata schema: it is the organization of metadata according to a model designed and created specifically for the needs of a project. This structuring is therefore unique and personalized!
- Metadata standard: a standard is a schema that has been adopted as a model by a set of users: it is recognized, standardized and widely used.
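To make the idea of a project-specific metadata schema more concrete, here is a minimal Python sketch. The class name, the fields and the choice of which fields are required are all assumptions made for illustration; a real project would define its own.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class SampleMetadata:
    """Hypothetical project-specific schema; field names are illustrative only."""
    title: str
    creator: str
    collection_date: str  # ISO 8601 date, e.g. "2021-06-14"
    instrument: str
    keywords: list = field(default_factory=list)
    license: str = ""

    def missing_fields(self):
        """Return the names of required fields that are still empty."""
        required = ("title", "creator", "collection_date", "instrument")
        return [name for name in required if not getattr(self, name)]


record = SampleMetadata(
    title="Soil samples, plot A",
    creator="A. Researcher",
    collection_date="2021-06-14",
    instrument="",  # left empty on purpose to show the check
)
as_dict = asdict(record)           # plain dictionary, e.g. for export to JSON
missing = record.missing_fields()  # names of required fields still to be filled
```

Such a structure only becomes a standard once a community of users adopts it; until then it remains specific to the project.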
To find the standards used in your discipline, you can ask fellow researchers, computer scientists or data librarians what the practices are in your field.
Several directories and sites can be consulted:
- Digital Curation Centre (DCC) – Disciplinary Metadata: https://www.dcc.ac.uk/guidance/standards/metadata
- Research Data Alliance (RDA) – Metadata Standards Directory: https://rd-alliance.github.io/metadata-directory/standards/
Don't forget to consult the information on metadata standards provided by data repositories.
-
5. Examples of metadata standards
DataCite Metadata Schema: standard linked to the attribution of persistent DOI identifiers. https://schema.datacite.org/
Disciplinary standard for the domain of social, behavioral, and economic sciences.
Core Scientific Metadata Model (CSMD): metadata standard for structural sciences domains (chemistry, materials science, earth sciences, biochemistry). http://icatproject-contrib.github.io/CSMD/
Darwin Core: disciplinary standard in the biodiversity domain. http://rs.tdwg.org/dwc/
Ecological Metadata Language (EML): disciplinary standard in the ecology domain. It was largely designed to describe digital resources, but it can also be used to describe non-digital resources such as paper maps or other media. https://eml.ecoinformatics.org/
MIDAS Heritage: disciplinary standard in the architecture domain. https://historicengland.org.uk/images-books/publications/midas-heritage/
ISO 19115-1: international standard for describing geographic information and services. https://www.iso.org/standard/53798.html
-
6. Focus on the international Dublin Core standard
The Dublin Core is a widely used international and multidisciplinary standard. Moreover, it is often the base of disciplinary or specific data standards. It contains 15 elements that constitute the minimum required:
- elements related to content
- elements related to intellectual property.
Dublin Core Metadata Element Set
"The original DCMES Version 1.1 consists of 15 metadata elements, defined this way in the original specification:
- Contributor – An entity responsible for making contributions to the resource
- Coverage – The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant
- Creator – An entity primarily responsible for making the resource
- Date – A point or period of time associated with an event in the lifecycle of the resource
- Description – An account of the resource
- Format – The file format, physical medium, or dimensions of the resource
- Identifier – An unambiguous reference to the resource within a given context
- Language – A language of the resource
- Publisher – An entity responsible for making the resource available
- Relation – A related resource
- Rights – Information about rights held in and over the resource
- Source – A related resource from which the described resource is derived
- Subject – The topic of the resource
- Title – A name given to the resource
- Type – The nature or genre of the resource."
Source: https://en.wikipedia.org/wiki/Dublin_Core
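The fifteen elements listed above can be serialized in many ways. As a minimal sketch, the Python code below writes a few of them as XML using the standard library and the namespace of the element set (http://purl.org/dc/elements/1.1/). The `<metadata>` wrapper element, the function name and the sample values are assumptions made for this example; repositories generally define their own container format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # namespace of the 15-element set
ET.register_namespace("dc", DC_NS)

DC_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def build_dc_record(values):
    """Build an XML element holding Dublin Core fields.

    `values` maps an element name to a string or a list of strings.
    The <metadata> wrapper is an arbitrary choice for this sketch.
    """
    root = ET.Element("metadata")
    for name, content in values.items():
        if name not in DC_ELEMENTS:
            raise KeyError(f"Not a Dublin Core element: {name}")
        items = content if isinstance(content, list) else [content]
        for item in items:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = item
    return root


# Hypothetical sample values.
record = build_dc_record({
    "title": "Soil samples, plot A",
    "creator": ["A. Researcher", "B. Colleague"],
    "date": "2021-06-14",
    "type": "Dataset",
})
xml_text = ET.tostring(record, encoding="unicode")
```

Passing a list, as done for `creator`, produces one XML element per value, which is how repeated elements are represented in this sketch.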
"The fifteen basic elements are considered as a common denominator and in most cases are not sufficiently precise. The basic elements have been extended (or specified) by a set of other terms called "qualifiers".
Two classes of qualifiers are recognized:
- element refinements that explain the meaning of an element;
- encoding schemes or controlled vocabularies."
Source: Extract translated from "Présentation des standards: le Dublin Core" by Elizabeth CHERHAL (Cellule MathDoc, UMS5638, CNRS/Université Joseph Fourier, Grenoble) - https://www.enssib.fr/bibliotheque-numerique/documents/1236-presentation-des-standards-le-dublin-core-dc.pdf
-
7. Focus on DataCite metadata schema
Content from DataCite Metadata Working Group. (2019). DataCite Metadata Schema Documentation for the Publication and Citation of Research Data. Version 4.3. DataCite e.V. https://doi.org/10.14454/7xq3-zf69
The Metadata Schema
DataCite’s Metadata Schema has been expanded with each new version. It is, nevertheless, intended to be generic to the broadest range of research datasets, rather than customized to the needs of any particular discipline.
DataCite Metadata Properties
There are three different levels of obligation for the metadata properties:
- Mandatory (M) properties must be provided,
- Recommended (R) properties are optional, but strongly recommended for interoperability and
- Optional (O) properties are optional and provide richer description.
Researchers who wish to enhance the prospects that their metadata will be found, cited and linked to original research are strongly encouraged to submit the Recommended as well as Mandatory set of properties. The properties listed in Table 1 have the obligation level Mandatory, and must be supplied when submitting DataCite metadata.
Table 1: DataCite Mandatory Properties
| ID | Property | Obligation |
|----|----------|------------|
| 1 | Identifier (with mandatory type sub-property) | M |
| 2 | Creator (with optional given name, family name, name identifier and affiliation sub-properties) | M |
| 3 | Title (with optional type sub-properties) | M |
| 4 | Publisher | M |
| 5 | Publication year | M |
| 10 | Resource type (with mandatory general type description sub-property) | M |
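A simple way to use the list of mandatory properties is to check a record against it before submission. The Python sketch below assumes a record stored as a dictionary; the key names are hypothetical shorthand for the six mandatory properties and are not the element names of the official schema, and the identifier shown is a made-up placeholder.

```python
# Hypothetical dictionary keys standing for the six mandatory properties
# (IDs 1-5 and 10); they are not the official schema element names.
MANDATORY_PROPERTIES = (
    "identifier",
    "creator",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_mandatory(record):
    """Return the mandatory properties that are absent or empty in `record`."""
    return [name for name in MANDATORY_PROPERTIES if not record.get(name)]


# Sample record with placeholder values; "resource_type" is left out on purpose.
record = {
    "identifier": {"value": "10.1234/example", "type": "DOI"},
    "creator": [{"name": "Researcher, A."}],
    "title": "Soil samples, plot A",
    "publisher": "Example Repository",
    "publication_year": 2021,
}

missing = missing_mandatory(record)
print(missing)
```

Here the check reports `resource_type` as missing, so the record would need to be completed before it could be submitted.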
The properties listed in Table 2 have one of the obligation levels Recommended or Optional, and may be supplied when submitting DataCite metadata.
Table 2: DataCite Recommended and Optional Properties
| ID | Property | Obligation |
|----|----------|------------|
| 6 | Subject (with scheme sub-property) | R |
| 7 | Contributor (with optional given name, family name, name identifier and affiliation sub-properties) | R |
| 8 | Date (with type sub-property) | R |
| 9 | Language | O |
| 11 | AlternateIdentifier (with type sub-property) | O |
| 12 | RelatedIdentifier (with type and relation type sub-properties) | R |
| 13 | Size | O |
| 14 | Format | O |
| 15 | Version | O |
| 16 | Rights | O |
| 17 | Description (with type sub-property) | R |
| 18 | GeoLocation (with point, box and polygon sub-properties) | R |
| 19 | FundingReference (with name, identifier, and award related sub-properties) | O |
DataCite Properties
Table 3 provides a detailed description of the mandatory properties, which must be supplied with any initial metadata submission to DataCite, together with their sub-properties. [...] The third column, Occurrence (Occ), indicates cardinality/quantity constraints for the properties as follows:
- 0-n = optional and repeatable
- 0-1 = optional, but not repeatable
- 1-n = required and repeatable
- 1 = required, but not repeatable
Table 3
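The four occurrence codes described above translate directly into a count check. The following Python sketch is one possible reading of them; the function name is hypothetical, and the occurrence code that applies to each property has to be taken from the DataCite documentation itself.

```python
def check_occurrence(occ, count):
    """Return True if `count` values satisfy the occurrence code `occ`.

    Supported codes: "0-n", "0-1", "1-n" and "1".
    """
    rules = {
        "0-n": lambda n: n >= 0,       # optional and repeatable
        "0-1": lambda n: 0 <= n <= 1,  # optional, but not repeatable
        "1-n": lambda n: n >= 1,       # required and repeatable
        "1": lambda n: n == 1,         # required, but not repeatable
    }
    if occ not in rules:
        raise ValueError(f"Unknown occurrence code: {occ}")
    return rules[occ](count)


# Example calls with arbitrary counts.
two_values_repeatable = check_occurrence("1-n", 2)   # two values, repeatable
two_values_single = check_occurrence("0-1", 2)       # two values, not repeatable
no_value_required = check_occurrence("1", 0)         # required value is missing
```

In these example calls only the first one satisfies its constraint; the other two fail because a non-repeatable property has two values and a required property has none.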
-
8. Focus on an example of a metadata model in the environmental domain
"Here is a model you can use by choosing the metadata fields suitable to your context, the repository where you upload your data, and which will convey minimum and sufficient information to help others understand and reproduce your data."
* The required fields about protocols can be repeated if several protocols have been implemented consecutively (e.g. sampling, sample preparation, measurements, data processing, etc.)
Source: Extract of "Guide of Good practices - Research data management and promotion" by ARNOULD Pierre-Yves (OTELo), JACQUEMOT-PERBAL Marie-Christine (Inist-CNRS)
-
9. In brief
Metadata are useful for:
- Understanding the origin of the data
- Understanding the context in which the data were created or collected
- Improving harvesting by machines (search engines)
- Ensuring interoperability
- Knowing the conditions for reusing and sharing the data
- Accessing useful information when the data are not shared or have been destroyed.
To conclude, see the summary sheet on metadata proposed by DataOne:
Source: https://dataoneorg.github.io/Education//lessons/07_metadata/L07_DefiningMetadata_Handout.pdf
-