    • 1. Definition


      There are many definitions of research data. The OECD (Organisation for Economic Co-operation and Development) definition is the most commonly used:

      "Research data" are defined as factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings.
      Source: OECD, OECD Principles and Guidelines for Access to Research Data from Public Funding, Paris, 2007.

      For example, the University of Leeds describes research data as:

      "Any information that has been collected, observed, generated or created to validate original research findings. Although usually digital, research data also includes non-digital formats such as laboratory notebooks and diaries."
      Source: University of Leeds Library.

      In the context of open science, these definitions can be broadened: the research data produced by researchers serve not only to validate their findings but can also enable other researchers to carry out new research projects.

    • 2. Diversity of research data

      Depending on the project, research data may be:
      • produced or collected: data created or generated during research activities (observations, measurements, etc.)
      • pre-existing: data that already exist (corpora, archives, etc.) and are reused in the project. They may originally have been collected for purposes other than research, but they serve as research data within the project.
      These data can be qualitative (interviews, observations, open-ended questionnaires, etc.) or quantitative (measurement tables, scored evaluation questionnaires, thermometer readings, etc.). Depending on how they were created (captured or produced), exploited, analysed and processed, research data can be of many kinds and stored on a wide variety of media.

      Several descriptive classifications exist. One of them classifies data according to:
      • the source of the data
      • the form of the data

    • What is the source of the data?


      Observation data

      Observation data are captured in real time by observing a behaviour or activity. They are therefore usually unique and impossible to reproduce. Examples include sensor data, neuroimaging, astronomical photographs and survey data.


      Experimental data

      Experimental data are obtained from laboratory equipment. They are often reproducible, although reproducing them can be costly. Data from chromatographs and DNA chips fall into this category.


      Computational or simulation data

      Computational or simulation data are generated by computer or simulation models. They generally come with a larger volume of metadata. They are often reproducible, provided that the model is properly documented. For simulation data, the model used is often as important as the data generated by the simulation, and sometimes even more so. Examples include meteorological models, seismic simulation models and economic models.


      Derived or compiled data

      Derived or compiled data result from processing or combining raw data. They are often reproducible, but at a high cost. Examples include data obtained by text mining, 3D models and compiled databases.


      Reference data

      Reference data are collections or accumulations of small datasets that have been peer-reviewed, annotated and made available.

    • What form do the data take?

      Textual data: field or laboratory notes, survey responses, etc.

      Numerical data: tables, measurements, etc.

      Audiovisual data: images, sounds, videos, etc.

      Computer code

      Discipline-specific data: for example, FITS in astronomy and space science, or CIF in crystallography

      Specific data produced by certain instruments


    • 3. Why manage and share your data

      • Quantity: with ever larger volumes of data (big data), good management is necessary, above all to avoid data loss.
      • Quality: sharing data requires good data management practices, which in turn improve the quality of research work.
      • Validation of research results: sharing data helps to validate research results. Publishers increasingly ask authors to make available all the data underlying the submitted article.
      • Integrity: making data available provides better protection against scientific fraud.
      • Valorisation: data sharing allows researchers to enhance the value of their data and increase its visibility (through citations).
      • Funding: data sharing (based on the principle of "as open as possible, as closed as necessary") may be a condition for project funding.
      • Reproducibility and reuse: the cost of creating, collecting and processing data can be very high. Reusing existing data rather than recreating them reduces the time and cost of research.
      • Interdisciplinarity: databases allow better searching, extraction, cross-referencing and visualization of data, particularly across disciplines.
      • Recovering "fossilized" data: publications give access to only about 10% of the data. The remaining 90% stays on computer hard drives and goes unused; these are called "fossilized data". Proper management and sharing of these data would prevent the loss of unique data.
      • Heritage value: some research data have scientific heritage value, so it is particularly important to manage and share them well.

    • 4. Research Data Life Cycle

      The data life cycle is the set of steps involved in managing, preserving and disseminating research data in connection with research activities. This cycle guides researchers through the research data management process so that they and their stakeholders can make the most of the research data generated.
      Source: DoRANum

      It can be divided into six phases: Planning, Collecting, Analysing, Publishing, Preserving and Reusing.

      Figure: Research data management cycle: planning, collecting, processing, preserving, reusing, publishing.

      Source: adapted from the Research data lifecycle, UK Data Service