These are the specific things I highlighted in my written teaching notes when I taught this section for the Arctic Data Center training in Oct 2020. These notes are meant to complement, not replace, the written material.
Introduction
Who you are and what you do for NCEAS/Arctic Data Center
Going to go through best practices for data and metadata, then go through an example of creating metadata and submitting to a repository.
What is metadata?
Mentimeter questions - for word cloud:
- Before we jump into the lesson, what do you think some best practices are for your data and your metadata?
- How often would you say you and/or your lab follow those best practices?
- What gets in your way or prevents you from following "best practices" for data / metadata?
Quick discussion about the answers to all three questions.
Transition - hopefully this lesson will give you some tools to circumvent the things that get in your way.
Overview
Good data management is important for all types of data - small or large.
Don't need a fancy database system to have well formatted data.
First - why both? Why is this important?
Start good data management early and revisit it often, but it's never too late to go back to your data.
Organizing Data
High points of the linked papers:
- Use a scripted program
- Open file formats - computers change but open formats will live on
- Keep your raw data
- Descriptive names
- Plain text
With these guidelines, others can start with your raw data and take the same steps as you did.
Design your data to be tidy.
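Optional demo if someone asks what "tidy" looks like in practice. This is only a sketch: the table, the column names (site, temp_2019, temp_2020, temp_c), and the to_tidy helper are made up for illustration and are not from the lesson data. It reshapes a wide table (one column per year) into one row per observation, using a script and plain-text CSV so the raw data is never edited by hand.

```python
import csv
import io

# Hypothetical wide-format table: one column per year (not from the lesson data).
WIDE_CSV = """site,temp_2019,temp_2020
A,1.5,2.0
B,-3.2,-2.8
"""


def to_tidy(wide_text):
    """Return one row per (site, year) observation."""
    rows = []
    for record in csv.DictReader(io.StringIO(wide_text)):
        site = record["site"]
        for column, value in record.items():
            if column == "site":
                continue
            # Column names like "temp_2019" encode the year; split it out.
            _, year = column.split("_")
            rows.append({"site": site, "year": int(year), "temp_c": float(value)})
    return rows


tidy_rows = to_tidy(WIDE_CSV)
for row in tidy_rows:
    print(row)
```

Point to make: in the tidy version each variable has its own column and each observation has its own row, so adding another year adds rows rather than new columns.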
Metadata
We defined metadata earlier in the lesson as data about data.
Good metadata contains lots of details so it's good to compile this info as you go.
Go through bibliographic, discovery, interpretation, data structure, and rights details, emphasizing why each piece is important:
- Biblio - you want credit for this data
- Discovery - you want others to discover your data so it can be used in more studies
- Interpretation - you want your data to be interpreted correctly so it isn't used out of context
- Structure - define variables in your metadata so that your data can be found by others who want to use it
- Rights - you want others to use your data appropriately
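If it helps to make the five groups concrete, here is a rough sketch of a record organized the same way. This is not EML and not the Arctic Data Center's format; the field names, the placeholder values, and the missing_sections helper are all assumptions made up for illustration. The idea is just that you can compile these details as you go and check that none of the groups is empty before you submit.

```python
import json

# The five groups of detail covered above; every value is a made-up placeholder.
metadata = {
    "bibliographic": {
        "title": "Example soil temperature dataset",
        "creators": ["A. Researcher"],
    },
    "discovery": {
        "keywords": ["soil temperature", "Arctic"],
        "geographic_coverage": "example site",
    },
    "interpretation": {
        "methods": "Describe collection and processing steps here.",
    },
    "data_structure": {
        "variables": [
            {"name": "temp_c", "definition": "Soil temperature", "unit": "degrees Celsius"},
        ]
    },
    "rights": {"license": "CC-BY-4.0"},
}

REQUIRED = ["bibliographic", "discovery", "interpretation", "data_structure", "rights"]


def missing_sections(record):
    """List the groups that are absent or empty in a metadata record."""
    return [name for name in REQUIRED if not record.get(name)]


print(json.dumps(metadata, indent=2))
print("Missing sections:", missing_sections(metadata))
```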
EML is what we'll be working with today.
Data Identifiers
DOIs refer to the exact version you use even if later on you need to update it - this helps us track uses of the dataset, like views, citations, and downloads.
Data Citation
Talk about data citation at the Arctic Data Center as an example of why this is important.
Provenance
Many repos want to preserve more than just data and metadata - we're one of those, and we're able to preserve software and provenance as well.
Does anyone know what provenance is in the context of data and metadata?
Preserving provenance and code is a cool way to help researchers build on the work you did - standing on the shoulders of giants.
This is why one of the best practices is to clean your data programmatically in a script rather than just deleting cells in Excel.
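A small example to show if there is time. It is a sketch under assumptions: the data, the -999 missing-value code, and the clean function are hypothetical. The script leaves the raw text untouched and returns both the rows it kept and the rows it dropped, so the cleaning step itself becomes part of the record of how the final data was produced.

```python
import csv
import io

# Hypothetical raw data; -999 stands in for a missing-value code.
RAW_CSV = """site,temp_c
A,1.5
B,-999
C,2.1
"""


def clean(raw_text, missing_code="-999"):
    """Split rows into kept and dropped without modifying the raw text."""
    kept, dropped = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        if row["temp_c"] == missing_code:
            dropped.append(row)  # recorded, not silently deleted
        else:
            kept.append(row)
    return kept, dropped


kept, dropped = clean(RAW_CSV)
print(f"kept {len(kept)} rows, dropped {len(dropped)} rows")
```

Contrast with deleting cells in a spreadsheet: there, nobody can tell later which rows were removed or why.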
Data Documentation and Publishing
Reusing data is the goal but we can't get there without sharing data, and we can't get there without a good data management plan.
Data repositories
Highlight that GitHub isn't an archival location - researchers should want a repo that gives them a DOI for their data.
Highlighted that we're working on a game to help researchers learn more about what repo to choose for their data, as well as building a centralized hub of resources. Not ready yet, so feel free to skip.
Metadata
Fundamentally important for future understanding of your data.
It takes time to preserve data well but it's worth the effort - and it's easier if you do it as you go. Don't think about it as doing the minimum required steps - you want others / future you to really understand the data.
Structure of a data package
Identifiers are important because they help the researcher cite the exact version of the dataset used.
Transition - we are a member of the DataONE federation, so let's zoom out from thinking about the Arctic Data Center and think about the larger repository landscape.
DataONE
Transition - Now, onto the hands on piece. We're going to randomly assign you to breakouts and an NCEAS staff member will walk you all through uploading some sample data into the Arctic Data Center.
Hands on exercise
Check for completeness when everyone's logged in with their ORCID and at other points throughout.
Ask for questions throughout as well.