Section outline

• The scientific world has embraced digital technology in its research, publication and communication practices. It is now technically possible to open up science to the greatest number of people, by providing open access to publications and - as far as possible - to research data.

    This course introduces you to the challenges of Research Data Management and sharing (RDM) in the context of Open Science (OS).

    It was created within the framework of the Erasmus+ Oberred project in 2019. Other courses from the Oberred project are available on this platform.


    OBERRED project

    This course was carried out in the context of the Oberred project, co-funded by the Erasmus+ Programme of the European Union.

    Oberred is an acronym for Open Badge Ecosystem for the Recognition of Skills in Research Data Management and Sharing. The aim of the Oberred project is to create a practical guide that includes the technical specifics and issues of Open Badges, roles and skills related to RDM, and principles for the application of Open Badges to RDM.

    Find out more about the Oberred project here: http://oberred.eu/


    This course is open access!

No account creation or registration is required; however, you will only be able to browse the course in read-only mode.
    To participate in the activities (exercises, forum...) and get the badge(s), you must register for the course.
    Register for the course

    For optimal use of this course, we recommend using the Google Chrome browser.

    • Course structure

      This first lesson is an introduction to Research Data Management. It will enable you to grasp the context in which research data management takes place and give you an overall vision of the stakes involved in opening up and sharing such data.

      • Data and society
      • Data and science
• Science and society: COVID-19 example
      • Open science and RDM
• Evaluation 1

      The second lesson will enable you to better understand the different steps of research data management, and to know the practices to be implemented and the tools to be used.

      • Understanding the data life cycle
      • FAIR principles
      • Data Management Plan (DMP)
      • Legal and ethical aspects
      • Metadata
• Persistent identifiers
• The 3 distinct steps of data storage
      • Reuse and valorisation of data
      • Evaluation 2 


      Learning objectives

      This course should provide you with a good understanding of the context in which research data management and sharing takes place:

      • What are the issues and benefits of controlled data management? 
      • What concepts are related? 
      • How is data management organized and which actors are involved?
• Author(s) / Trainer(s): Viêt Jeannaud, Nicolas Hochet, Yvette Lafosse, Pierrette Paillassard, Claire Sowinski, Coralie Wysoczynski, Marta Blaszczynska, Mateusz Franczak, Michel Roland, Tomasz Umerle, Beata Koper, Barbara Wachek, Lucas Ricroch
  Target audience: everyone
  Estimated duration: 1 week
  Prerequisites: none
  License: CC BY-NC-SA
  Open badge: Yes
  Number of enrolled participants: 3


• The world contains an unimaginably vast amount of digital information, which is getting ever vaster ever more rapidly. This in principle makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. Advances in digital technology have changed how we live, work and socialise: how we communicate (Facebook, Twitter, Instagram, WhatsApp, Skype, etc.), work (videoconferencing, email, Google Drive, Microsoft Teams, etc.), eat (Uber Eats, etc.), travel (Uber, Couchsurfing, Booking.com, Airbnb, etc.) and entertain ourselves at home (Netflix, streaming books/podcasts, etc.).


    • The impact of these technologies on our lives is therefore vast and seems rather well accepted according to the 2017 Eurobarometer. “75% of respondents think the most recent digital technologies have a positive impact on the economy, 67% - on their quality of life, 64% - on the society. 76% who use Internet every day say the impact of these technologies on their quality of life has been positive, compared to 38% who never use the Internet.”

      Source: European Commission, Attitudes towards the impact of digitisation and automation on daily life
• The European Commission is taking this digital context into account in its plans, such as the European data strategy.

      The European data strategy aims to make the EU a leader in a data-driven society. Creating a single market for data will allow it to flow freely within the EU and across sectors for the benefit of businesses, researchers and public administrations.

      So data is everywhere, so much so that we talk about Big Data. But what do we really mean by Big Data?


    • Big data

It is very difficult to agree on a common definition of Big Data. The term refers to a loosely defined body of digital data stored for commercial, administrative and scientific purposes.
      Big Data consists of 3 main characteristics, called the 3Vs: Volume - Velocity - Variety
      Big Data means different things to different people. Regardless of the sources of the digital data, such as books, social media, databases, audio, and video, big data exhibits the characteristics of high-volume, high-velocity (speed of data in and out), and high variety (range of data types and sources).


    • This new type of data enriches research prospects and has potential to advance research in the humanities and social sciences in the following ways:

      • Advanced big data collection tools, such as web scraping, and innovative analytic techniques, such as machine learning, may help establish new research methodologies; 
      • New types of data may reveal new patterns and insights into human society, politics, and economics; 
      • New types of data may lead to new kinds of research questions that are beyond the perspectives of established theories.

    • A few terms related to big data

Set of concepts and technologies (see below) that use intelligent behaviour based on algorithms (sets of rules to be followed to solve a specific problem).

Automated analytical systems that learn over time, as they acquire more data.

Algorithms that use neural networks to learn from unstructured data (images, audio, videos, posts on social networks...).

Self-learning systems that use sets of complex algorithms to mimic processes occurring in the human brain.


• Open data: opening up administrative and political data


      Open data are freely accessible micro-data that can be used and reused freely by everyone. The term open data first appeared in 1995 in a document from an American scientific agency; it referred to the dissemination of geophysical and environmental data, but the idea that the empirical basis on which knowledge is built is a public good that should be available to all is much older.


Don’t lock it away, do something useful with it. Image: notbrucelee, CC BY-SA

    • “the availability of open data creates opportunities for all kinds of organisations, government agencies and not-for-profits to come up with new ways of addressing society’s problems. These include predictive healthcare, and planning and improving London’s public transport system”

      Source : The Conversation, The future will be built on open data – here’s why
• Example: A big boost to open data came in the late 2000s, when first the OECD invited member country governments to open up their data in 2008, and then the United States government launched the data.gov website in 2009, designed to provide full access to databases and time series held by the states of the Union and federal agencies.

The Open Government Partnership was launched in 2011 as an initiative for openness, transparency and civic participation, involving 65 governments committed to implementing an action plan on five thematic areas - participation, transparency, integrity, accountability and technological innovation.

The main difference from the past is that, while previously some public bodies made macro-data - that is, aggregated data - available through publications, online documents, DVDs, etc., open data are micro-data that can be downloaded from the internet free of charge, already in matrix format (generally .csv or .xml) and immediately usable for secondary analysis.



      These are generally data that have great relevance for the planning, monitoring and evaluation of public policies, and which are made open to all with a dual cognitive and regulatory objective. They provide technicians and experts with knowledge bases to redirect and improve policies and also allow citizens to find out whether the policies implemented have had the announced effects or not.

      Open data are also a consequence of the importance that transparency and accountability (the obligation for a subject to account for their decisions and to be responsible for the results achieved) are gaining nowadays.

Although open data have created new opportunities for secondary research, it should be emphasized that there are limits to their use. Firstly, there is a problem of topics covered: although in principle open data can deal with any subject, to date fully public data are almost exclusively economic, geographic and transport-related. A second limitation concerns the way in which data are opened: the matrices should be complemented with additional information on the methodological choices made to produce the data and with indications on the various aspects of their quality. Often this information is missing, which makes it difficult to analyse the data effectively.
• Why should data be open?

      1. Transparency: In a well-functioning democratic society citizens need to know what their government is doing. To do that, they must be able to freely access government data and information and share that information with other citizens. Transparency isn’t just about access, it is also about sharing and reuse — often, to understand material it needs to be analyzed and visualized and this requires the material to be open so that it can be freely used and reused.
      2. Releasing social and commercial value: In the digital age, data is a key resource for social and commercial activities. Everything from finding your local post office to building a search engine requires access to data, much of which is created or held by government. By opening up data, government can help drive the creation of innovative business and services that deliver social and commercial value.

3. Participation and engagement: Participatory governance or, for business and organizations, engaging with your users and audience. Much of the time citizens are only able to engage with their own governance sporadically — maybe just at an election every 4 or 5 years. By opening up data, citizens are enabled to be much more directly informed and involved in decision-making. This is more than transparency: it’s about making a full “read/write” society — not just about knowing what is happening in the process of governance, but being able to contribute to it.
    • Evolution of science

      The way science looks today differs greatly from the scientific practices of the past. The colossal amount of data and the tools for handling them have a dramatic effect on the way science is done.

      Big Data is changing science in two ways:
      • Science can gather increasing amounts of data from the society that may be used for analysis. 
• Scientific activities themselves also produce larger amounts of data than ever before.

Big data and science

Link between big data and science: data collection, analysis, and production of scientific information.

We live in a data-driven world. At any time we have access to a huge amount of digital information, which is growing daily. The increase in the amount of available data has opened the door to a new area of research based on big data - huge data sets that contribute to the creation of better operational tools in all sectors as well as to the development of scientific research.
    • Data driven science: a new paradigm?

      Science is the pursuit and application of knowledge and understanding of the natural and social world following a systematic methodology based on evidence: observation, experiment, induction, repetition, critical analysis, verification and testing.

Since the beginnings of science, different scientific methodologies have emerged. Some have profoundly changed the way research is conducted, leading to paradigm shifts. The impact of data on science is also causing profound changes: we speak of data-driven science, an empirical research method which aims at making inferences from huge amounts of data.


Data-driven science is sometimes described as a fourth scientific paradigm, following the experimental, theoretical and computational ones, but the debate on the advent of this fourth paradigm remains open. For some, it is not so much a new paradigm as a method which is complementary to traditional approaches and is needed because of the presence of large volumes of data.

      In any case, science is increasingly focused on data which, because of their openness and exponential growth, must now be taken into account in the scientific research process.

      Let's focus now on the consequences of the consideration of data according to disciplines.
    • Consequences according to disciplines

The term 'data' intuitively seems more prevalent in the natural and social sciences (e.g. survey data, experimental data). Today's humanities researchers seem more inclined to consider their sources and results as research data, due to the widespread use of digital means in academic workflows.


      Disciplinary specificities: the digital humanities

      Digital Humanities is an emerging field of science where scholars from across the humanities (historians, linguists, artists, media scholars, etc.) work in tandem with librarians, computer and data scientists.


• At the beginning, the digital humanities mainly curated and analyzed data that were born analogue (texts, objects and images) but subsequently archived in digital forms that could be searched to guide automated analysis and visualization. Today, the digital humanities use sophisticated tools for curating and sharing data, increasing the scale of research across a far wider range and volume of sources. Rather than concentrating on a small basket of sources to analyze, it becomes possible to work with thousands of cultural products (paintings, books, photos, articles, etc.). Counting, classifying, graphing and mapping these data may offer new insights and raise interest in the humanities as a field of science.

      Some common practices in Digital Humanities are Text and Data Mining and Data visualization.
    • Text and Data Mining

      Text mining, or Text and Data Mining (TDM), is a field which, with the use of appropriate tools, deals with text analysis, exploration, preparation of summaries, clustering and categorisation of documents, finding groups of words with similar meaning or automatic recognition of complex expressions.

By using text-mining methods it is possible to obtain data from the text that are suitable for quantitative statistical analysis. Text mining represents a completely different approach to textual data: texts are no longer treated as purely qualitative data, but as a specific source of quantitative data - above all, on the frequency of occurrence of individual words in the analysed text. Text mining allows relatively automated searches of very large portions of text for keywords, their density and so on. This makes it possible to apply new methods of data analysis and to obtain new types of information concerning, among other things, the nature of the analysed texts or the variation in the frequency of keywords over time.
Source: Gabriel Gallezot, Emmanuel Marty, "Le temps des SIC", in Bernard Miège, Nicolas Pélissier, Domenget (eds.), Temps et temporalités en information-communication : des concepts aux méthodes, L'Harmattan, 2017, pp. 27-44. DOI: 10.5281/zenodo.1000778 (sic_01599944)
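
      To make this concrete, here is a minimal sketch of the kind of keyword-frequency counting described above, written in Python with only the standard library (the tiny corpus and all values are invented for illustration):

          import re
          from collections import Counter

          # A tiny invented corpus; in practice this would be thousands of documents.
          documents = [
              "Open data accelerates research and enables data reuse.",
              "Research data management makes data findable and reusable.",
          ]

          def word_frequencies(text):
              """Lowercase the text, split it into word tokens and count occurrences."""
              tokens = re.findall(r"[a-z]+", text.lower())
              return Counter(tokens)

          # Aggregate frequencies over the whole corpus.
          corpus_counts = Counter()
          for doc in documents:
              corpus_counts.update(word_frequencies(doc))

          # The most frequent words hint at the dominant themes of the corpus.
          print(corpus_counts.most_common(5))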


    • Data visualization

      This modernised technology (and at the same time methodology) is increasingly present in every sphere of human activity: from research and development to business, social activities and art. It offers practical knowledge of how to graphically "master" huge sets of data that describe a given aspect of reality.




Example of a data visualization from research on the ICOS Carbon Portal
      The purpose of data visualization is to show information in a way that allows its accurate and effective understanding and analysis. This is because people easily recognize and remember the images presented to them (shape, length, construction etc.). Thanks to visualization we can combine large data sets and show all the information at the same time, which greatly facilitates analysis. We can also use visual comparisons, thanks to which it is much easier to find many facts. Another advantage is the ability to analyse data at several levels of detail.

      Here is an example of data visualization from the "Republic of Letters". Researchers map thousands of letters exchanged in the 18th century and can learn very rapidly what it once took a lifetime of study to comprehend.



      We deal with visualization at every step of our lives. Graphic representation is used on television, in the press and in any other source of information (excluding radio stations) whenever there is numerical data. Visualization is necessary when we want to show a certain currency rate at a certain time (linear chart), election results (histograms) or the weather forecast. However, these are not the only examples of graphic representation of data. While it can serve to make it easier to see certain properties, it also makes it easier to discover them. This above all applies to large data sets compiled over many years which can be used for subsequent research.
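
      As a small illustration of the chart types mentioned above, here is a sketch that draws a line chart and a histogram in Python, assuming the matplotlib library is installed (the data are invented):

          import random
          import matplotlib.pyplot as plt

          # Invented example data: a daily exchange rate and a set of survey scores.
          days = list(range(30))
          exchange_rate = [1.10 + 0.005 * d + random.uniform(-0.01, 0.01) for d in days]
          survey_scores = [random.gauss(50, 10) for _ in range(500)]

          fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

          # A line chart suits values evolving over time (e.g. a currency rate).
          ax1.plot(days, exchange_rate)
          ax1.set_title("Exchange rate over time")
          ax1.set_xlabel("Day")
          ax1.set_ylabel("Rate")

          # A histogram shows how values are distributed (e.g. election or survey results).
          ax2.hist(survey_scores, bins=20)
          ax2.set_title("Distribution of survey scores")
          ax2.set_xlabel("Score")
          ax2.set_ylabel("Count")

          plt.tight_layout()
          plt.show()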
• Social media, science and politics: how COVID-19 made science processes mainstream

      The recent COVID-19 pandemic has highlighted the long-standing need for more openness in science. This includes collaboration, knowledge-sharing, and exchange of ideas. A major aspect of this phenomenon has been the need for open research data in order to “accelerate the pace of research critical to combating the disease” (source: Why open science is critical to combatting COVID-19).

The emergence of a new virus that we knew very little about put enormous pressure on governments to make quick decisions based on scarce data.

Moreover, public opinion was swayed by fake news, discussions by non-experts - including celebrities - on social networks, as well as a number of harmful conspiracy theories.



      As this global crisis shows, the world of science cannot be perceived as an ivory tower. Instead, we need to see scientists as important actors who interact both with societies and with politicians in an active fashion.

      They need to be actively involved in the public debate, present their research results in an accessible way and offer recommendations for public policy.
    • The Lancet article affair: a question of data

The medical journal The Lancet is one of the oldest and most reputable journals in its field. It recently fell from its pedestal following the withdrawal of a study on hydroxychloroquine based on unreliable data. A look back at this case highlights the importance of access to source data.


      In relation to this case, researchers from around the world wrote an open letter to report numerous irregularities related to the number of COVID-19 cases in different countries.

      Quoting from the open letter:
      The authors have not adhered to standard practices in the machine learning and statistics community. They have not released their code or data. There is no data/code sharing and availability statement in the paper. The Lancet was among the many signatories on the Wellcome statement on data sharing for COVID-19 studies.
      Thus, the whole issue revolves around the question of data, or rather the lack of access to data and the impossibility of verifying it, which is synonymous with the inability to conduct reliable scientific research.
    • Consequences

This appears undoubtedly as a major scandal in the field of medical research. It is all the more acute because it concerns an emergency situation - a pandemic caused by a previously unknown virus - which, of course, causes fear, but is also connected with the rise of conspiracy theories and the appearance of fake news. Currently, we are facing not only the health and economic consequences of a pandemic caused by a new coronavirus, but also - as many experts emphasize - an equally dangerous phenomenon accompanying it: the so-called infodemic, a flood of false or misleading information about the virus, the disease, etc.

      This phenomenon is intensified especially in a situation where scientific publications and experts have been discredited and part of the society has stopped trusting them.

      Also, hasty political decisions were made as a result of blind faith in scientific publications. The consequences turned out to be catastrophic, and eventually the WHO decided to "restart" research programs on hydroxychloroquine.

      The issue concerns the basic principles that should govern science: transparency and rigorous evaluation of results before publication. In this case, both elements were missing.


    • At the same time, however, the positive aspects of this situation should be noted. As a consequence, the general public was acquainted with the issues and problems of scientific publications. Moreover, for the first time the issue of open access to data, as well as sharing and managing it, has reached such a large group of people not involved professionally in scientific research. As a result of this case, a large part of the society began to take an interest in the issue of accessibility to data and could understand its importance not only in modern science but also in everyday social life.
    • European policy

      Without any doubt, the COVID-19 crisis has challenged the way we view data and its relationship with contemporary society. This needs to be mirrored in European policy. Some actions were taken immediately.

      For instance, a dedicated section of the European Data Portal (EDP) was created in order to offer verified information and data on COVID-19. It was updated between April and July 2020 and aimed “to ensure that everyone – even people without extensive data skills – can understand and gain insights from the available data”.

      This also allowed the citizens themselves to become powerful actors who could collect, organise and analyse data and create useful datasets.

      62 datasets and 60 data initiatives were collected as a result.

      The EDP editorial team concluded by stating a need for “a cultural change, in which all of us become more data literate and embrace it as a valuable means of information for evidence-based decision-making”.
    • What is Open Science?

The opening up of scientific knowledge began in the 17th century, when the first academic journals were created. These journals provided access to knowledge for society, and enabled different scientific groups to share their resources and conduct their work collaboratively.

      Open Science can be seen as a movement in this continuity of access to scientific knowledge. It seeks to make scientific research and the data it produces accessible to all and at all levels of society.

      Open Science represents a novel approach to scientific development, based on cooperative work and information distribution through networks using advanced technologies and collaborative tools. Open Science seeks to facilitate knowledge acquisition through collaborative networks and encourage the generation of solutions based on openness and sharing.
Source: European Commission, "Study on Open Science: Impact, Implications and Policy Options" by Jamil Salmi, August 2015
• Open access to publications may immediately come to mind when you think about Open Science. However, Open Science is about more than Open Access: it also includes MOOCs, open source software, citizen science, open peer review and Open Data.


    • Open Science initiatives in Europe

      Since 2016, the European Commission has organised its Open Science policy according to eight ambitions: 

      • Open data,
      • European Open Science Cloud (EOSC),
      • Next Generation Metrics,
      • Future of scholarly communication,
      • Rewards, 
      • Research integrity, 
      • Education and skills, 
      • Citizen science.
    • Examples of OS initiatives

      Many initiatives are underway in Europe. Here are some of them.

• Why is Research Data Management strongly related to Open Science?

      One of the major challenges for Open Science therefore concerns the opening up of data. But to be effective, data opening must go hand in hand with good data management. In order to be reusable, research data must indeed be rigorously processed (e.g. it must be well documented, described by metadata and recorded in open formats).

      There is no simple definition for Research Data Management because it depends on many factors such as the specificity of the project, type of data and others. However, the definition below makes it quite clear what Research Data Management is.

      Research data management (or RDM) is a term that describes the organization, storage, preservation, and sharing of data collected and used in a research project. It involves the everyday management of research data during the lifetime of a research project (for example, using consistent file naming conventions). It also involves decisions about how data will be preserved and shared after the project is completed (for example, depositing the data in a repository for long-term archiving and access).
      Source: https://pitt.libguides.com/managedata

      And as stated in the Guidelines on FAIR Data Management in Horizon 2020, "Good data management is not a goal in itself, but rather is the key conduit leading to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse by the community after the data publication process".
    • Benefits of RDM and sharing

      The benefits of good data management and openness are numerous! In a few points:
      • New requirements and opportunities for researchers 
        • Researchers can better promote their research and be cited, as the data enter the scientific publishing process (data repository, publication of data papers). 
        • Data sharing may be a condition for obtaining funding for scientific projects or for the publication of an article. 
      • New perspectives for science 
        • Making data available offers a better guarantee against scientific fraud. 
        • Sharing data requires the adoption of good data management practices (describing data, documenting them, making them sustainable, etc.), which improves the quality of research work. 
        • The cost of creating, collecting and processing data can be very high. Reusing existing data rather than recreating makes research profitable, accelerates innovation and the return on investment in Research and Development. 
• The creation of databases allows data mining (Text and Data Mining), extraction, cross-checking and the construction of visualizations. These new processes make it easier to launch new research initiatives and foster their interdisciplinary nature. 
        • The deluge of digital data (Big Data) is having an impact on the way scientific research is carried out. We talk about Data Driven Science, an approach that automates discoveries by harnessing the power of computers to find correlations among large amounts of data. 
      • A better use of public money and a return for society 
        • Publicly funded research must be open to all. Opening up data makes research more transparent, builds citizens' trust and enables them to get involved (e.g. in citizen science). 
        • The data generated by Open Data and Big Data provide a field for scientific research, which in turn can inform society about its most recent developments.

    • Badge OBERRED Research Data Management Context

      Test your knowledge on this first part of the introduction to data management and sharing.

Success in this test is rewarded by an Open Badge! To pass this test, you must be enrolled in the course.

    • Please answer these 5 questions related to the lesson to test your knowledge.

    • You can find your badge in your profile, in the badge section.


      👏 Congratulations👏

You have successfully completed assessment 1 and obtained this open badge: Open Badge Context (RDM)

      See the badge

    • Congratulations, you have completed lesson 1!

      Thank you for taking this first lesson. You can get your Open Badge at the bottom of this page.

      But first, we invite you to respond to this survey about Lesson 1 "Context and stakes of Research Data Management (RDM)".

      It will only take you a few minutes to answer these 5 questions and will help us to improve the MOOC. This survey is anonymous.
    • 1. Definition


      There are many definitions of research data. The OECD (Organisation for Economic Co-operation and Development) definition is the most commonly used:

      "Research data" are defined as factual records (numerical scores, textual records, images and sound) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings
      Source : OECD, OECD Principles and Guidelines for Access to Research Data from Public Funding, Paris, 2007.

As an example, the University of Leeds describes research data as:

"Any information that has been collected, observed, generated or created to validate original research findings. Although usually digital, research data also includes non-digital formats such as laboratory notebooks and diaries."
      Source : University of Leeds Library.

In the context of open science, these definitions can be broadened to encompass all the data produced by researchers, which can also allow other researchers to conduct new research projects.

    • 2. Diversity of research data

      Depending on the project, the research data may be:
      • produced or collected: these are the data created, elaborated, generated during research activities (observations, measurements, etc.)
      • pre-existing: these are already existing data (corpus, archives...) which are used for the project. The data used may initially have been collected in a context other than the research, but they are used as research data within the framework of the project.
These data can be qualitative (interview data, observation data, open-ended questionnaires, etc.) or quantitative (measurement tables, scored evaluation questionnaires, thermometer readings, etc.). Depending on the context in which they were created (capture or production), their exploitation, analysis and processing, research data may be of different kinds, contained in various media and of all types.

      There are several descriptive classifications. One of these is:
      • the source of the data 
      • the form of the data 

    • What is the source of the data?


      Observation data

      Observation data are captured in real time. They are captured by observing a behaviour or activity and are therefore most often unique and impossible to reproduce. This is the case with sensor data, neuroimaging, astronomical photography or survey data.


      Experimental data

      Experimental data are obtained from laboratory equipment. They are often reproducible but this can be costly. Chromatographs and DNA chips fall into this category.


      Computational or simulation data

Computational or simulation data are generated by computer or simulation models. They come with more extensive metadata. They are often reproducible provided that the model is properly documented. For simulation data, the model used is often as important as the data generated from the simulation, and sometimes even more so. Examples include meteorological models, seismic simulation models and economic models.


      Derived or compiled data

      Derived or compiled data are derived from the processing or combination of raw data. They are often reproducible but expensive. This is the case for data obtained by text mining, 3D models or compiled databases.


      Reference data

      Collection or accumulation of small datasets that have been peer reviewed, annotated and made available.

    • What form does this data take?

Textual data: field or laboratory notes, survey responses... 

Numerical data: tables, measurements...

Audiovisual data: images, sounds, videos…

Computer codes

Discipline-specific data: for example FITS for space science data or CIF in crystallography...

Specific data produced by some instruments


    • 3. Why manage and share your data

• Quantity: good management is necessary because of big data and especially to avoid data loss.
• Quality: sharing data requires good data management practices, which improves the quality of research work.
• Validation of research results: sharing data contributes to validating research results. More and more publishers ask researchers to make available all underlying data mentioned in the submitted article.
• Integrity: making data available ensures better security against scientific fraud.
• Valorisation: data sharing allows researchers to enhance the value of their data and increase their visibility (citations).
• Funding: data sharing (based on the principle of "as open as possible, as closed as necessary") may be a condition for project funding.
• Reproducibility and reuse: the cost of creating, collecting and processing data can be very high. Reusing existing data rather than recreating them reduces the time and cost of research.
• Interdisciplinarity: databases allow better search, extraction, cross-referencing and visualization of data, particularly across different disciplines.
• Exhumation of "fossilized" data: publications provide access to about 10% of the data. The remaining 90% stays on computer hard drives and is not used; these are called "fossilized data". Proper management and sharing of this data would prevent the loss of unique data.
• Patrimonial value: some research data can have scientific heritage value. It is particularly important to organize good management and sharing of these data.
    • 4. Research Data Life Cycle

The data life cycle is the set of steps involved in the management, preservation and dissemination of research data associated with research activities. This cycle guides researchers through the research data management process to enable them and their stakeholders to make the most of the research data generated.
      Source : DoRANum

      It can be divided into six different phases: Planning, Collecting, Analysing, Publishing, Preserving, and Reusing.

Data management cycle: planning, collection, processing, preservation, reuse, publication. Promotes access.


      Source : Adaptation of Research data lifecycle – UK Data Service
    • 1. Definition

      It’s recommended to respect the 4 FAIR principles in order to ensure an optimal use of research data and associated metadata, both by people and by machines.

      Findable, Accessible, Interoperable, Reusable
By SangyaPundir, own work, CC BY-SA 4.0

      FAIR data are those that are Findable, Accessible, Interoperable and Reusable.
What does each of these terms mean in a practical sense, and how can you tell if your own research data are FAIR?
    • 2. Explaining the FAIR Principles

FAIR data are data that satisfy each of the four principles below:

      Principle F is implemented through the use of persistent identifiers (for example: DOI), rich metadata, by listing in catalogs, in repositories...

      Principle A means implementing long-term storage of data and metadata, with facilitated access and/or download (standardized and open communication protocols), and specification of access and use conditions.

      Principle I means that the data is downloadable, usable, intelligible and combinable with other data, by humans and machines, through the use of standard formats, vocabularies and ontologies.

      Principle R relies on characteristics that make data reusable for future research or other purposes (teaching, innovation, reproduction/scientific transparency). This is made possible by a rich description that specifies the data provenance, the use of community standards, and the addition of licenses.


      The gradual adoption of these FAIR principles will make data easier to share and reusable by both humans and computer systems.


      Examples of implementation of FAIR principles

      Many recommended actions for the management and sharing of research data are fully or partially compliant with FAIR. Some examples are:

      As a researcher in Social and Human Sciences: I securely save and store my data throughout the project using SHAREDOCS and Huma-Num Box.
      I work in the field of ecology: I document the metadata associated with my data according to the EML (Ecological Metadata Language) standard.
      I organize and name my files in the same way as all project partners.
      As an archaeologist: I use a disciplinary controlled vocabulary, the PACTOLS thesaurus.

      I apply an Etalab license to my datasets.
      I deposit my genomics data in the GenBank data repository.
      My datasets are uniquely and persistently identified by a DOI.
      I communicate my source codes.
      I make my files available in .csv rather than .xls, that is, in an open and non-proprietary format.
      As an ethnologist: during my research project, I conducted interviews that have significant heritage value. I deposit my datasets in the CINES permanent archiving platform.


      These various actions contribute to making my data FAIR!
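
      One of the actions above - making files available in .csv rather than a proprietary format - can be done with nothing more than the standard library of a language such as Python. A minimal sketch (the file name and measurement values are invented):

          import csv

          # Invented example measurements; in practice these would come from your instruments.
          rows = [
              {"sample_id": "S1", "temperature_c": 21.4, "ph": 7.2},
              {"sample_id": "S2", "temperature_c": 22.1, "ph": 6.9},
          ]

          # Writing to .csv keeps the data in an open, non-proprietary, machine-readable format.
          with open("measurements.csv", "w", newline="", encoding="utf-8") as f:
              writer = csv.DictWriter(f, fieldnames=["sample_id", "temperature_c", "ph"])
              writer.writeheader()
              writer.writerows(rows)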

    • 3. Test your data

      Test your data with this checklist created by Sarah Jones & Marjan Grootveld, EUDAT (2017):

    • 4. Play with FAIR principles

How do you think the FAIR principles benefit the researcher and the scientific community?

      Instructions: A researcher has produced data within a research project in accordance with FAIR principles. This offers immediate benefits within the framework of their project and career, but can also benefit the scientific community later. Place each card on one of the two zones identified "For the Researcher" and "For the Scientific Community".

• Here is a video that explains what a DMP contains and when it should be written:

    • 1. Definition

DMP stands for Data Management Plan. It is a management tool, structured in sections, which summarizes the description and evolution of the datasets produced during your research project. It helps to prepare the sharing, reuse and long-term preservation of data.
Source: DoRANum

Circular diagram describing the FAIR stages of data management: planning, collection, analysis, sharing.

      Adaptation of Research data lifecycle – UK Data Service

    • 2. DMP: an essential document

      The DMP has become a widespread management tool. It is increasingly recommended or required worldwide.

      In Europe, projects funded by the European Commission are required to deliver a data management plan: the initial version of the DMP is included among the deliverables six months after the start of the project (Horizon 2020 models, ERC-European Research Council).

      To promote the management and sharing of research data, a lot of European initiatives have been deployed, including tools and infrastructures (e.g. the Zenodo repository, the OpenAIRE infrastructure, etc.). In the same way, many institutes, organizations and institutions propose institutional DMP models available to their communities.
    • 3. Actors and contributors

The researcher does not write the DMP alone. The DMP is an opportunity for collaboration with the various stakeholders of the project: scientists, IT specialists, data librarians, project managers, lawyers... Data management requires a collective effort!


Images designed by Freepik, macrovector / Freepik and makyzz / Freepik

      Moreover, universities, infrastructures and research organizations often issue recommendations to their research communities.

Funders (such as the European Commission or Science Europe) and certain publishers can give precise recommendations (for example, the obligation to write a DMP within 6 months of the start of the project for projects funded by Science Europe) or offer advice (for example, the European Commission points to the Zenodo repository in its Horizon 2020 guide).


    • 4. DMP: a project management tool

It is an evolving, dynamic and continuously updated document (introduction of a new dataset, patent deposit, changes in the consortium...). It is also a project management tool that facilitates the organization and description of data. It makes it possible to define responsibilities and resources and to produce FAIR data.


      Data Organization
      The DMP helps to organize data well throughout the project.

      Evolving Document

      Start drafting the DMP from the beginning of the project, with already known or planned elements. Then complete the DMP progressively. Plan for at least 2 versions: at the beginning and end of the project. For projects longer than 30 months, an intermediate version is required.

      Data Description
      In the DMP, describe how data will be obtained, processed, organized, stored, secured, preserved, shared... (data lifecycle).

      Responsibilities
      In the DMP, designate the person(s) responsible for data management for all project stages and within the partnership if applicable: data entry; metadata production; data quality control; data storage, sharing and archiving; DMP updating. Individuals can be named specifically or a function can be indicated if the person occupying it might change during the project.

      Resources
      Evaluate the necessary resources (budget, allocated time, personnel) to implement the actions described in the DMP: Time needed to prepare data for storage, sharing and archiving; equipment costs and personnel remuneration; storage costs (dedicated servers, processing, maintenance, security, access...), sharing costs (website, publication...) and data archiving expenses.

      Reliable Data
      The DMP allows data producers to ask themselves the right questions and thus improve the reliability of their data.

The DMP helps to initiate collective work on good practices very early on and to anticipate questions related to data management (such as the choice of a repository or how to document the data).


    • 5. DMP structuration

Many DMP models have been created by organizations, institutes and funders for their users, in order to respond to specific features or local contexts. Nevertheless, these models contain the same elements, in line with the data life cycle:

• Administrative information
      • Description of data
      • Documentation, metadata, standards
      • Data storage during the project
      • Sharing data in a repository
      • Persistent archiving
      • Data security
      • Legal and ethical aspects
      • Responsibilities
      • Costs
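
      As an illustrative, non-normative sketch, these common sections can also be captured in a simple machine-readable template before the content is transferred to a dedicated tool such as DMPonline. The Python dictionary below simply mirrors the list above; the keys and placeholder values are our own, not an official DMP format:

          # A minimal, illustrative DMP skeleton mirroring the common sections listed above.
          dmp_template = {
              "administrative_information": {"project_name": "", "funder": "", "contact": ""},
              "data_description": [],  # one entry per dataset
              "documentation_metadata_standards": "",
              "storage_during_project": "",
              "sharing_in_a_repository": "",
              "persistent_archiving": "",
              "data_security": "",
              "legal_and_ethical_aspects": "",
              "responsibilities": {},
              "costs": "",
          }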

      Consult the “data management checklist” created by ETH-Bibliothek; Bibliothèque EPFL :

      Source : https://zenodo.org/record/3332363#.Xz44rDaP5aQ
    • 6. Data management planning tools – DMPonline

    •  7. In brief

      To conclude, see the summary sheet on the DMP proposed by DataOne:




    • 1. Definition


Metadata allow a more accurate description of the data: they are data about data.
If we imagine a dataset as a can, then the metadata are the label that describes the contents of the can (date of production, creator, etc.).
      Source : Doranum - Parcours interactif sur la gestion des données de la recherche

Without metadata
Cans without labels

Source: picture by Araceli Jáuregui from Pixabay

With metadata
Cans with different labels, sorted by type of can

Source: picture by heberhard from Pixabay

    • 2. Metadata in the data life cycle

      It is recommended to complete the metadata as the project progresses, with particular attention to:

      • the step of data sharing,
      • the step of persistent archiving (specific metadata will have to be added).

Circular diagram of the FAIR data cycle: Planning Research, Collecting Data, Processing & Analysing, Publishing & Sharing

      Adaptation of Research data lifecycle – UK Data Service

    • 3. Embedded metadata vs enriched metadata

      There are two types of metadata: embedded and enriched metadata.


      Embedded Metadata
       They are automatically produced by devices (cameras, sound recorders, measurement instruments...). This is typically the case for smartphone photos or videos. Examples of generated metadata: GPS data, device type, date, technical calibration, etc.
      Enriched Metadata
      They are added by the author. Examples: keywords, subject, author, laboratory or organization, project name, license, etc.



      Don't forget to complete the embedded metadata with enriched metadata. Ideally, this metadata should be filled in as you go along. It is recommended to use disciplinary controlled vocabularies (ontologies, lexicons, thesaurus...). This will increase the ability of the data to be combined with other data.

      For example:

      • Drugs Codex
      • Taxonomic classifications
      • IUPAC nomenclature of chemistry

      To organize the metadata it is recommended to use a metadata standard specific to your discipline or adapted to your needs. If none exists, a metadata schema will need to be created.
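
      As an illustration of the two types of metadata, here is a sketch in Python that reads the metadata embedded in a photo and completes it with enriched metadata added by the author. It assumes the Pillow imaging library is installed; the file name and all enriched values are invented:

          from PIL import Image, ExifTags

          # Read the metadata embedded automatically by the device (hypothetical file name).
          image = Image.open("photo.jpg")
          embedded = {
              ExifTags.TAGS.get(tag_id, tag_id): value
              for tag_id, value in image.getexif().items()
          }

          # Complete it with enriched metadata added by the author (invented example values).
          enriched = {
              "title": "Field survey photo, site A",
              "creator": "Jane Doe",
              "keywords": ["ecology", "field survey"],
              "license": "CC BY 4.0",
          }

          record = {"embedded": embedded, "enriched": enriched}
          print(record)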


    • 4. Difference between metadata schema and metadata standard

      • Metadata schema: it is the organization of metadata according to a model designed and created specifically for the needs of a project. This structuring is therefore unique and personalized!
      • Metadata standard: a standard is a schema that has been adopted as a model by a set of users: it is recognized, standardized and widely used.

To find the standards used in your discipline, you can ask your research collaborators, computer scientists or data librarians and see what the practices are in your field.

      Several directories and sites can be consulted:

      Don't forget to consult the information on metadata standards provided by data repositories.


    • 5. Examples of metadata standards


Dublin Core: an interdisciplinary standard for describing digital resources.

DataCite: standard linked to the attribution of persistent DOI identifiers. https://schema.datacite.org/

DDI (Data Documentation Initiative): disciplinary standard for the domain of social, behavioral, and economic sciences.

CSMD (Core Scientific Metadata Model): metadata standard for structural sciences domains (chemistry, materials science, earth sciences, biochemistry). http://icatproject-contrib.github.io/CSMD/

Darwin Core: disciplinary standard in the biodiversity domain. http://rs.tdwg.org/dwc/

EML (Ecological Metadata Language): disciplinary standard in the ecology domain; it was largely designed to describe digital resources but can also be used to describe non-digital resources such as paper maps or other media. https://eml.ecoinformatics.org/

MIDAS Heritage: disciplinary standard in the architecture and heritage domain. https://historicengland.org.uk/images-books/publications/midas-heritage/

ISO 19115: international standard for describing geographic information and services. https://www.iso.org/standard/53798.html

    • 6. Focus on the international Dublin Core standard

The Dublin Core is a widely used international and multidisciplinary standard. Moreover, it is often the basis of disciplinary or specific metadata standards. It contains 15 elements that constitute the minimum required:

      • elements related to content
      • elements related to intellectual property. 


      Dublin Core Metadata Element Set

      "The original DCMES Version 1.1 consists of 15 metadata elements, defined this way in the original specification:

        1. Contributor – An entity responsible for making contributions to the resource
        2. Coverage – The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant
        3. Creator – An entity primarily responsible for making the resource
        4. Date – A point or period of time associated with an event in the lifecycle of the resource
        5. Description – An account of the resource
        6. Format – The file format, physical medium, or dimensions of the resource
        7. Identifier – An unambiguous reference to the resource within a given context
        8. Language – A language of the resource
        9. Publisher – An entity responsible for making the resource available
        10. Relation – A related resource
        11. Rights – Information about rights held in and over the resource
        12. Source – A related resource from which the described resource is derived
        13. Subject – The topic of the resource
        14. Title – A name given to the resource
        15. Type – The nature or genre of the resource.”

      Source: https://en.wikipedia.org/wiki/Dublin_Core

      "The fifteen basic elements are considered as a common denominator and in most cases are not sufficiently precise. The basic elements have been extended (or specified) by a set of other terms called "qualifiers".

      Two classes of qualifiers are recognized:

      • element refinements that explain the meaning of an element;
      • encoding schemes or controlled vocabularies."

      Source: Extract translated from " Présentation des standards: le Dublin Core" by Elizabeth CHERHAL (Cellule MathDoc, UMS5638, CNRS/Université Joseph Fourier, Grenoble) - https://www.enssib.fr/bibliotheque-numerique/documents/1236-presentation-des-standards-le-dublin-core-dc.pdf  


Dublin Core mind map: the 15 standard elements (ISO 15836-2003) and their qualifiers
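
      To make the 15 elements more tangible, here is a sketch of a Dublin Core record for a hypothetical dataset, built with Python's standard library. The namespace is the official DCMES one; all the descriptive values (title, creator, identifier, etc.) are invented placeholders:

          import xml.etree.ElementTree as ET

          DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core Metadata Element Set namespace
          ET.register_namespace("dc", DC_NS)

          # Invented example values for a hypothetical dataset.
          fields = {
              "title": "Soil moisture measurements, site A, 2020",
              "creator": "Jane Doe",
              "subject": "soil moisture; ecology",
              "description": "Hourly soil moisture readings collected with field sensors.",
              "publisher": "Example University",
              "date": "2020-12-31",
              "type": "Dataset",
              "format": "text/csv",
              "identifier": "https://doi.org/10.xxxx/example",  # placeholder identifier
              "language": "en",
              "rights": "CC BY 4.0",
          }

          record = ET.Element("record")
          for name, value in fields.items():
              element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
              element.text = value

          print(ET.tostring(record, encoding="unicode"))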

    • 7. Focus on DataCite metadata schema


      DataCite : Find, access, and reuse data

      Content from DataCite Metadata Working Group. (2019). DataCite Metadata Schema Documentation for the Publication and Citation of Research Data. Version 4.3. DataCite e.V. https://doi.org/10.14454/7xq3-zf69


      The Metadata Schema 
      DataCite’s Metadata Schema has been expanded with each new version. It is, nevertheless, intended to be generic to the broadest range of research datasets, rather than customized to the needs of any particular discipline.

       
      DataCite Metadata Properties
      There are three different levels of obligation for the metadata properties: 
      • Mandatory (M) properties must be provided, 
• Recommended (R) properties are optional, but strongly recommended for interoperability, and 
• Optional (O) properties are optional and provide richer description. 
      Researchers who wish to enhance the prospects that their metadata will be found, cited and linked to original research are strongly encouraged to submit the Recommended as well as Mandatory set of properties. The properties listed in Table 1 have the obligation level Mandatory, and must be supplied when submitting DataCite metadata.

      Table 1: DataCite Mandatory Properties
      ID Property Obligation
      1 Identifier (with mandatory type sub-property) M
      2 Creator (with optional given name, family name, name identifier and affiliation sub-properties) M
      3 Title (with optional type sub-properties) M
      4 Publisher M
      5 Publication year M
10 Resource type (with mandatory general type description sub-property) M

      The properties listed in Table 2 have one of the obligation levels Recommended or Optional, and may be supplied when submitting DataCite metadata.


      Table 2: DataCite Recommended and Optional Properties 
      ID Property Obligation
      6 Subject (with scheme sub-property) R
      7 Contributor (with optional given name, family name, name identifier and affiliation sub-properties) R
      8 Date (with type sub-property) R
      9 Language O
      11 AlternateIdentifier (with type sub-property) O
      12 RelatedIdentifier (with type and relation type sub-properties) R
      13 Size O
      14 Format O
      15 Version O
      16 Rights O
      17 Description (with type sub-property) R
      18 GeoLocation (with point, box and polygon sub-properties) R
      19 FundingReference (with name, identifier, and award related sub-properties) O

      DataCite Properties
      Table 3 provides a detailed description of the mandatory properties, which must be supplied with any initial metadata submission to DataCite, together with their sub-properties. [...] The third column, Occurrence (Occ), indicates cardinality/quantity constraints for the properties as follows:
      • 0-n = optional and repeatable 
      • 0-1 = optional, but not repeatable 
      • 1-n = required and repeatable 
      • 1 = required, but not repeatable
      Table 3 
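
      As an informal illustration of Table 1, here is a sketch of the mandatory properties for a hypothetical dataset, expressed as a simple Python dictionary. The values, including the DOI, are invented placeholders; a real submission would follow the official DataCite XML or JSON serialization:

          # Illustrative record covering only the mandatory DataCite properties of Table 1.
          datacite_mandatory = {
              "identifier": {"identifier": "10.xxxx/example-dataset", "identifierType": "DOI"},
              "creators": [
                  {"name": "Doe, Jane", "givenName": "Jane", "familyName": "Doe"}
              ],
              "titles": [{"title": "Soil moisture measurements, site A, 2020"}],
              "publisher": "Example University",
              "publicationYear": "2020",
              "resourceType": {"resourceTypeGeneral": "Dataset", "resourceType": "Dataset"},
          }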
    • 8. Focus on an example of a metadata model in the environmental domain

      "Here is a model you can use by choosing the metadata fields suitable to your context, the repository where you upload your data, and which will convey minimum and sufficient information to help others understand and reproduce your data." * The required fields about protocols can be repeated if several protocols have been implemented consecutively (e.g. sampling, sample preparation, measurements, data processing, etc.)

       Source: Extract of "Guide of Good practices - Research data management and promotion" by ARNOULD Pierre-Yves (OTELo), JACQUEMOT-PERBAL Marie-Christine (Inist-CNRS)
    • 9. In brief

      Metadata are useful for:

• Understanding the origin of the data
• Understanding the context in which the data were created or collected
• Improving harvesting by machines (search engines)
• Ensuring interoperability
• Knowing the conditions for reusing and sharing data
• Accessing useful information when data are not shared or have been destroyed.

      To conclude, see the summary sheet on metadata proposed by DataOne:


       Source: https://dataoneorg.github.io/Education//lessons/07_metadata/L07_DefiningMetadata_Handout.pdf
    • Persistent identifiers are assigned to the data at the sharing step.

Diagram of the stages of the FAIR data cycle: collection, processing, publication, preservation and reuse of data

      Source: Adaptation of Research data lifecycle – UK Data Service


    • 1. Definition


An identifier is a unique association between an alphanumeric code and an entity or a resource. On the web, resources are located by URLs. However, these URLs are not stable: if the resource is moved and/or renamed, it is no longer accessible and the browser displays a 404 error code. Persistent identifiers guarantee a stable link to the online resource. Persistence is obtained through active management of URLs.
This management is ensured by recognized organizations, supported by human and technical infrastructures. The identity of the resource is matched to its location on the web, so the hypertext link is guaranteed and will never be broken.
The role of persistent identifiers is to make it easier to track, locate, access and cite the outputs of research:
• Persistent identifiers allow reliable identification (of a resource, an author...).
• Persistent identifiers for publications and data allow access to them over the long term.
• They link published articles to the underlying datasets.
• They also help to discover, share, reuse and cite the results of research and scientific production.

      Diagram of the main elements of the data management cycle: publication, sharing, verification, citation
      Source: DoRANum - Persistent identifiers

      The ideal identification is a combination of several identifiers:

      • PID for publications
      • PID for data
      • PID for authors
      • PID for research organizations
      Diagram of the publication and access cycle for scientific data: research organization, authors, publication and data
      Source: DoRANum - Persistent identifiers: summary sheet

      For publications, assigning a persistent identifier is a well-established and systematic procedure. Most publishers and open archives automatically assign a persistent identifier to each article, most often a Handle or a DOI. The latter is assigned through the Crossref agency.

      Persistent identifier system for scientific publications: publisher, open archive, repository and assignment tools.


      Source: DoRANum - Persistent identifiers: an overview

      Identifiers are often assigned to your data when they are deposited in a repository: this can be a local identifier or a globally unique identifier.

      In this course, we will not talk about PID for publications.

    • 2. PID for data

      It is recommended to assign a persistent identifier to each dataset.

      Persistent identifiers for data are assigned to resources resulting from scientific production, for example datasets, images, sounds, physical objects... 

      With the deployment of the Internet and the online availability of research data, identifiers better adapted to the digital world have been put in place such as:

      • DOI (Digital Object Identifier)
      • Handle
      • PURL (persistent URL)
      • ARK (Archive Resource Key)
      • ePIC (European Persistent Identifier Consortium)…


      Focus on DOI: https://datacite.org/index.html
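As a small illustration (the DOI below is a placeholder, and the requests package is assumed to be installed): a DOI is resolved through the https://doi.org/ proxy, which redirects to the current landing page of the resource.

```python
import requests

# Placeholder DOI for illustration only; replace it with a real DOI.
doi = "10.1234/example-doi"

# The doi.org resolver redirects a DOI to the current URL of the resource,
# which is what keeps the identifier usable even if the URL changes.
response = requests.get(f"https://doi.org/{doi}", allow_redirects=True, timeout=10)

print("Resolved URL:", response.url)          # final landing page after redirects
print("HTTP status:", response.status_code)   # 404 here, since the DOI is a placeholder
```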

    • 3. PID for authors

      Having an author identifier allows a researcher:

      • to be linked to their scientific output
      • to be clearly identified and cited.

      The most widely used is ORCID, an international, neutral and independent identifier.

      There are also several types of identifiers dedicated to authors and contributors involved in research:

      • Commercial publishers assign local identifiers for their databases: for example Clarivate with the ResearcherID identifier or Elsevier with the Scopus Author ID,
      • social networks such as ResearchGate and Academia.edu assign each registrant their own identifier,
      • open archives can offer the creation of a local identifier, for example the arXiv author ID for the arXiv open archive,
      • in the world of libraries, the ISNI (International Standard Name Identifier) is an international identifier attributed to persons and institutions involved in literary, artistic and intellectual production in the broadest sense. It is defined by an ISO standard.

      Focus on ORCID: https://orcid.org/
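As a hedged sketch of how an ORCID iD can be used programmatically (the iD below is a placeholder; the endpoint and field names follow the ORCID v3.0 public API and should be checked against the current documentation):

```python
import requests

# Placeholder ORCID iD; replace it with a real identifier.
orcid_id = "0000-0000-0000-0000"

# Public ORCID API (v3.0): request the public record as JSON.
url = f"https://pub.orcid.org/v3.0/{orcid_id}/record"
response = requests.get(url, headers={"Accept": "application/json"}, timeout=10)
response.raise_for_status()
record = response.json()

# The "person" section of the record contains the researcher's public name.
name = record["person"]["name"]
print(name["given-names"]["value"], name["family-name"]["value"])
```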

    • 4. PID for research organizations

      There are persistent identifiers not only for authors but also for research organizations:

      • they make it possible to link an author to their organization,
      • they help research organizations identify all the scientific output of their researchers.

      Focus on ROR: https://ror.org/
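For example, a research organization can be looked up in the ROR registry through its public API; the sketch below assumes the v1 endpoint https://api.ror.org/organizations and is meant as an illustration only.

```python
import requests

# Query the public ROR API (assumed v1 endpoint) for organizations matching a name.
response = requests.get(
    "https://api.ror.org/organizations",
    params={"query": "Centre National de la Recherche Scientifique"},
    timeout=10,
)
response.raise_for_status()
results = response.json()

# Each matching organization carries a persistent ROR ID (a URL such as https://ror.org/XXXXXXXXX).
for org in results.get("items", [])[:3]:
    print(org["id"], "-", org["name"])
```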

    • 5. Play with PID

      Instructions: Place each card on one of the four zones identified "Author", "Data", "Research organization" and "Publication". Some cards appear twice.




    • Storage and secure backup, sharing, and long-term archiving occur at different steps of the data lifecycle and have distinct functions.
      Here is a diagram to help understand the difference between these three steps:

      Data flow diagram: secure storage, sharing in a repository and archiving, with interactions between research teams.

      Source: DoRANum - Storage, sharing and archiving: what are the differences?

    • 1. Storage and secure data backup during the project

      The first step is the storage and secure backup of the data throughout the project:

      Secure backup: secure server (organization), collaborative workspace > the research team, the researcher.

      The objectives are to:

      • ensure data security
      • facilitate access for all project collaborators

      Storage and secure data backup in the data lifecycle

      This concerns the first part of the data life cycle.

      First step, secure backup: collecting data, processing & analysing data
      Adaptation of Research data lifecycle – UK Data Service

    • Secure data backup measures to be implemented

      Efficient backup means duplicating and storing data in different locations on different media in a time frame relevant to the project.

      The best approach is to apply the 3-2-1 rule, which means:

      1. keep 3 copies of the data,
      2. use 2 distinct storage media or technologies,
      3. keep 1 copy off-site.

      In any case, it is necessary to organize and plan these backups, taking care to manage versions. At each step of the project, select the data to be backed up or deleted. The successive states of the data are kept in line with the different processing steps, making it possible to return to a previous version if necessary.

      This also requires choosing a hosting and backup policy adapted to the needs of the project and to the specific characteristics of the data (for example sensitive data or large volumes). This can rely on local servers (virtual machines), an institutional cloud with secure access, etc.
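As a purely illustrative sketch of the 3-2-1 rule described above (the paths are placeholders, and a real project would normally rely on the institution's backup services rather than an ad hoc script):

```python
import shutil
from datetime import datetime
from pathlib import Path

# Placeholder locations: the working copy plus two backup destinations,
# ideally on two different media, one of them off-site (e.g. a mounted remote share).
data_dir = Path("project_data")                       # copy 1: working data
backup_targets = [
    Path("/mnt/local_backup/project_data"),           # copy 2: second medium
    Path("/mnt/offsite_backup/project_data"),         # copy 3: off-site location
]

# Timestamped snapshots keep earlier states of the data, so a previous
# version can be restored if a processing step needs to be rolled back.
stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
for target in backup_targets:
    snapshot = target / f"snapshot_{stamp}"
    shutil.copytree(data_dir, snapshot)
    print(f"Backed up {data_dir} to {snapshot}")
```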

      To improve your knowledge about "durability of storage media", see this infographic:


      Source: von Rekowski, Thomas. (2018, October). Durability of Storage Media. Zenodo

      Folder structure and file naming

      Reliable access requires rules for organizing the folder structure and for unique, accurate naming of data files:



      File Formats

      The choice of a format can be guided by:

      • the recommendations of an institution,
      • the uses of the scientific community of the discipline,
      • the software or equipment used. 

      The ideal is to opt for file formats that are as open as possible (non-proprietary), standardized and durable, for example:

      • prefer .csv over .xls
      • prefer .odt over .doc
      • for images intended for preservation, prefer a lossless format such as .tif over lossy .jpg

      In any case, it is necessary to mention in the DMP which formats will be used.
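For tabular data, such a migration can be as simple as exporting a spreadsheet to CSV. The sketch below assumes the pandas and openpyxl packages are installed and uses placeholder file names.

```python
import pandas as pd

# Read a spreadsheet (placeholder file name); .xlsx files require the openpyxl package.
df = pd.read_excel("measurements.xlsx")

# Write the same table as CSV: an open, plain-text, widely supported format.
# UTF-8 encoding avoids character-set problems when the file is reused elsewhere.
df.to_csv("measurements.csv", index=False, encoding="utf-8")

print(df.head())  # quick check that the data were read correctly
```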

    • 2. Depositing data in a repository for sharing

      This step most often occurs after the project (although you can share your data earlier): the datasets need to be deposited in a repository.
      A repository allows data to be stored, accessed and reused.
      Sharing data in a repository provides wide access to the scientific community over the short and medium term (5 to 10 years).

      Storage for sharing: general or disciplinary repository > the research team, other research teams, another research team

      Sharing data in the data lifecycle

      Data sharing often complements scientific publication, both during and after the research project.

      Second step, storage for sharing: publishing & sharing data, preserving data
      Adaptation of Research data lifecycle – UK Data Service

    • How to prepare data according to FAIR principles

      The goal is to share the research data of the project in optimal conditions.

      All the data must be prepared according to FAIR principles, even if they are shared partially or with restricted access.
      The data must be deposited in the chosen repository with their metadata and, where applicable, the source code needed to read and understand them.

      Here is a checklist to prepare the data efficiently:

      Not all data necessarily need to be shared. The research team must select the datasets they wish to share and, for each of them, define the access modalities.

      Check the compatibility and interoperability of data formats; migrate if necessary to an appropriate format that is as open as possible.

      Prepare source code (e.g., scripts) if it is necessary to read and process the data.

      Complete and enrich the metadata according to the chosen repository: if not already done, choose a metadata standard; if no suitable standard exists, create a metadata schema; then complete the fields for each dataset, following the adopted standard.


      To improve your knowledge about "preparing your data collection for deposit", watch this video by the UK Data Service:



      How to choose a data repository

      There are different categories of repositories:

      • publisher-specific
      • discipline-specific
      • institution-specific
      • and multidisciplinary repositories.

      Most often, the repository is recommended by institutions (e.g. the French repository Data INRAE), by funders (e.g. Zenodo, recommended by the European Commission) or by the scientific community (e.g. GenBank, Pangaea, Dryad, etc.). It is sometimes imposed by a publisher (e.g. Gene Expression Omnibus).
      If there is no recommendation, choose one from a directory (e.g. re3data, OAD, OpenDOAR, FAIRsharing, etc.).
      In any case, a data librarian can help the research team to choose a relevant repository.

    • Example of a search in the re3data directory

      https://www.re3data.org/

      Filters can be used to search in this directory.





      For each repository, a short descriptive sheet presents:

      • the subject,
      • the type of content,
      • the country,
      • a short summary,
      • icons representing the criteria it meets.

      Example for the 4TU repository: https://www.re3data.org/repository/r3d100010216



      Tip: The search engine Google Dataset Search is also a simple tool for finding datasets and the repositories that host them.
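The re3data directory also offers a public API, so a list of repositories can be retrieved programmatically. The endpoint and XML element names below are assumptions based on the re3data API documentation and should be verified before use; this is a sketch, not a tested client.

```python
import requests
import xml.etree.ElementTree as ET

# Assumed re3data API endpoint returning the list of registered repositories as XML.
response = requests.get("https://www.re3data.org/api/v1/repositories", timeout=30)
response.raise_for_status()
root = ET.fromstring(response.content)

# Print the identifier and name of the first few repositories
# (element names may differ between API versions; adjust if necessary).
for repo in root.findall(".//repository")[:5]:
    print(repo.findtext("id"), "-", repo.findtext("name"))
```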

    • 3. Long-term archiving

      Long-term archiving is the final step in saving and storing research data.

      Long-term archiving: long-term archiving platform > the research team, other research teams, another research team

      Long-term archiving in the data lifecycle

      Long-term archiving generally concerns only part of the data produced by a project. For some projects, it is not necessary to archive data.

      Third step, long-term archiving: preserving data
      Adaptation of Research data lifecycle – UK Data Service


      Definition

      The question of long-term archiving only concerns data:

      • with scientific value for the whole community,
      • requiring preservation for at least 30 years.

      It is an expensive operation that requires a dedicated budget. It is the responsibility of the laboratory, not of the individual researcher.
      Concretely, long-term digital archiving consists of preserving the document and its content:

      • in its physical and intellectual aspects,
      • over the very long term,
      • so that it remains accessible and understandable.

      Long-term archiving services in Europe

      At the European level, there are several infrastructures that specifically propose long-term archiving services.
      The European Open Science Cloud (EOSC) Portal is an integrated platform that provides easy access to a wide range of services and resources for various research domains, along with integrated data analytics tools. It includes services for long-term archiving, for example:

      EGI Archive Storage: this service allows you to store large amounts of data in a secure environment, freeing up your usual online storage resources. The data on Archive Storage can be replicated across several storage sites, thanks to the adoption of interoperable open standards. The service is optimised for infrequent access. Main characteristics: stores data for long-term retention; stores large amounts of data; frees up your online storage.

      B2SAFE: this is a robust, safe and highly available EUDAT service which allows community and departmental repositories to implement data management policies on their research data across multiple administrative domains in a reliable manner. A solution to: provide an abstraction layer which virtualizes large-scale data resources, guard against data loss in long-term archiving and preservation, optimize access for users from different regions, bring data closer to powerful computers for compute-intensive analysis.


      Selection of data to be archived

      To select the data that will be archived for the long term, it is important to consider the value of the data:

      • Are the data unique, non-reproducible (or at too high a cost)? 
      • Do the data have historical value, i.e., do they represent a landmark in scientific discoveries? 
      • Do the data include changes in processing methods, new standards, or create precedents? 
      • Do the data support ongoing projects or scientific trends? 
      • Are the data likely to meet future needs/directions of the scientific community (reuse potential)? 
      • Are the data likely to be cited or referenced in a publication? 
      • ...

      • The quality and compliance of data collection must be controlled and documented. This may include processes such as calibration, sample or measurement repetition, standardized data capture, data entry validation, peer review... 
      • Quality, physical integrity of data (undamaged, readable...)

      • What is the policy of the funder, the institution? 
      • Are the data compliant with the institution's strategy?

      • Is there a legal or legislative reason to preserve the data? 
      • Is there an obvious reason why the data might be used in litigation, public inquiries, police investigations, or any report or document that could be challenged in court? 
      • Are there financial or contractual obligations that require data preservation?

      When considering data preservation, the cost of conservation (identified not only as storage, but also management, sharing, access, backup, and long-term data maintenance) must be weighed against evidence of potential data reuse.

      Consult the research archives management reference guide, Association of French Archivists, Aurore Section.



      Preparation of the data to be archived

      Here is a checklist to prepare your data for long-term archiving:

      1. Selection of datasets: The datasets (and associated metadata) selected may be different from the shared datasets.
      2. Volume: Evaluate the volume of data and the necessary budget.
      3. Data treatment: Treatment of some data may be necessary. For example, personal data requires anonymization.
      4. File formats: Check the validity of data file formats according to the recommendations of the archive selected.
      5. Software: Document and perhaps also provide the software used to access the data.
      6. Metadata: Complete and enrich the metadata if necessary, according to the recommendations of the archive selected.
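A complementary practice, not listed in the checklist above but common when preparing deposits, is to record fixity information (for example SHA-256 checksums) so that the physical integrity of the archived files can be verified later. A minimal sketch with a placeholder folder name:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder folder containing the datasets selected for long-term archiving.
dataset_dir = Path("datasets_to_archive")

# Write a manifest that can be deposited together with the data and checked later.
with open("checksums_sha256.txt", "w", encoding="utf-8") as manifest:
    for file in sorted(dataset_dir.rglob("*")):
        if file.is_file():
            manifest.write(f"{sha256_of(file)}  {file.relative_to(dataset_dir)}\n")
```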
    • 4. Play with the 3 steps of data storage

    • 1. Reuse and enhancement of data in the data lifecycle

      This is the final step in the data life cycle but also the starting point of a new cycle if the data are reused for a new research project.

      Re-using data
      Adaptation of Research data lifecycle – UK Data Service

      It is important to prepare the data for sharing in order to make them FAIR. In this way, other researchers can use them for new research projects.

    • 2. Reuse and citation of data

      On the researcher's side

      To ensure that the research data they have generated can be reused under good conditions, researchers must adopt several good practices:

      Guide: 6 steps to share your research data - FAIR principles, deposit, licence, metadata, software and DOI.


      On the user’s side

      There are several ways for a researcher to find reusable datasets:

      User guide: 4 ways to find data - searching a repository directly, directories, Google Dataset Search and data papers.

      Links:


      In all cases, re-users must respect certain rules:

      • Respect the intellectual property of the authors, as stated in the licence
      • Cite the data if the licence requires it (it is recommended to always cite your sources)
      • Link the data to the related publications.


      Tip: there are tools to help you cite a dataset correctly, such as:

      • the "Cite all versions" feature offered by the Zenodo repository,
      • the DOI Citation Formatter service, which automatically generates a complete citation from a DOI (see the sketch below).
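The DOI Citation Formatter relies on DOI content negotiation, which can also be used directly: asking the https://doi.org/ resolver for a bibliography format returns a ready-made citation. A minimal sketch with a placeholder DOI and the APA style:

```python
import requests

# Placeholder DOI; replace it with the DOI of the dataset you want to cite.
doi = "10.1234/example-doi"

# Content negotiation on doi.org returns a formatted citation instead of the landing page.
response = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "text/x-bibliography; style=apa"},
    timeout=10,
)

print(response.text)  # a ready-to-paste citation in APA style (for a real DOI)
```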


      To conclude, see the summary sheet on data citation proposed by DataONE:

      Source : https://dataoneorg.github.io/Education//lessons/08_citation/L08_DataCitation_Handout.pdf
    • 3. Data papers and data journals

      Writing and publishing a data paper is a good way for researchers to add value to their research data.

      A data paper is a publication that describes research datasets and associated metadata. It follows the same editorial process as traditional scientific articles:

      • Elements common to classic articles (title, abstract, keywords…)
      • Specific data elements (data types, formats, production processes and methods, metadata, reuse…)

      A data paper can be published in a data journal (journal dedicated to this type of publication) or in a classic journal that accepts data papers.

      Access to data from the data paper can be done in two ways:

      • the data are integrated in the article and published as supplementary data
      • the data are deposited in a repository, and a persistent identifier (for example a DOI) links the data paper to the data.

      An example of data papers in the domain of environment

      Two data papers were written on photographic data to study the evolution of vegetation phenology in different ecosystems across North America. The data are derived from automated digital images (taken every 30 minutes), collected via the PhenoCam network. The data are time series characterizing the color of the vegetation, including the degree of greening. The PhenoCam Explorer interface has been developed to facilitate data exploration and visualization, from which the user can also download data on a site-by-site basis. The images are also available in real time through the PhenoCam project web page.

      PhenoCam: grid of natural sites with RGB and IR images, collection status and data type for 520 locations.
    • 4. Data exposure and visualization

      In addition to depositing data in a repository, and perhaps publishing a data paper, exposing the data is another good way to add value.

      Indeed, exposing data in visual form (maps, graphs, etc.) via a platform is particularly relevant for large and complex datasets.

      Example 1

      These data (available on the ICOS Carbon Portal) come from time series of values for hundreds of parameters. With visualization tools, we can see the evolution of CO2 concentrations over a year, coupled with the origin of the air mass. This would be very difficult to grasp without data visualization.

      STILT interface: map and 2018 CO2 graph for Puy de Dôme, with visualization of fluxes and measurement stations across Europe.

      Example 2

      CoReA is a digital library created with the Omeka tool for the archaeological documentation of the Centre Camille Jullian (CNRS). It makes it easy to navigate archaeological corpora and resources.

      Search interface: 770 archaeological results with a filter menu and records of ancient inscriptions by J. Gascou

      Omeka is an open-source web publishing platform for sharing digital collections and creating media-rich online exhibits.

      From raw research data, the tool makes it possible to create curated collections that are structured, accessible and visible on the web. It is highly modular thanks to numerous plugins and handles various multimedia objects (texts, images, sounds, videos).

      The tool offers several technical advantages:

      • the interface is simple and intuitive;
      • the metadata can be harvested, which in particular enables referencing in other databases;
      • an Omeka collection can be connected to other services thanks to a REST API.
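As a hedged sketch of that last point (assuming an Omeka S installation whose REST API is exposed under /api; the base URL is a placeholder and property names may vary with the installed modules), items can be harvested as JSON:

```python
import requests

# Placeholder base URL of an Omeka S site exposing its REST API.
base_url = "https://example.org/omeka"

# Retrieve the first page of items as JSON.
response = requests.get(f"{base_url}/api/items", params={"per_page": 5}, timeout=10)
response.raise_for_status()
items = response.json()

# Print the internal identifier and title of each item
# (titles are usually stored under the Dublin Core property dcterms:title).
for item in items:
    titles = item.get("dcterms:title", [])
    title = titles[0].get("@value", "(untitled)") if titles else "(untitled)"
    print(item.get("o:id"), "-", title)
```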

      Example 3

      See the example of data visualization from the "Republic of Letters" project, where researchers map thousands of letters exchanged in the 18th century and can learn very rapidly what once took a lifetime of study to comprehend (seen in Lesson 1, Unit 2: Data and Science): https://www.youtube.com/watch?v=nw0oS-AOIPE.

    • 5. Play with data reuse and enhancement

      Instructions: Place each card on one of the two zones identified "On the researcher's side" and "On the user's side".



    • Badge OBERRED Research Data Management Context

      Test your knowledge of this part of the introduction to data management and sharing.

      Success in this test is rewarded with an Open Badge! To take this test, you must be enrolled in the course.

      Check
    • Please answer these 5 questions related to the lesson to test your knowledge.
    • You can find your badge in your profile, in the badge section.


      👏 Congratulations 👏

      You have successfully completed assessment 2 and obtained this open badge: Open Badge Processes (RDM)

      See the badge

      Not available unless: The activity Evaluation 2 is complete and passed
    • Congratulations, you have completed lesson 2!

      Thank you for taking this second lesson. You can get your Open Badge at the bottom of this page.

      But first, we invite you to respond to this survey about Lesson 2 "Concepts and processes of Research Data Management (RDM)".

      It will only take you a few minutes to answer these 5 questions and will help us to improve the MOOC.

      This survey is anonymous.


      Not available unless: The activity Evaluation 2 is complete and passed