
nceas-training's Introduction

NCEAS Training

This repository contains lessons used in NCEAS training events. The lessons are all written in RMarkdown and set up so that they build as a bookdown.

To contribute, see our contributing document

Customizing Materials

To create a custom book for a specific training, create a new branch for the training event (e.g., 2019-11-RRCourse). In that branch, you can edit _bookdown.yml to specify which content to include, and you can modify chapters. The built book should be hosted in a separate repository specific to that training event, not in this repository. Please do not commit built versions of the book. Additionally, when adding material, please carefully consider file size. PDF presentations should be compressed, and data files, if absolutely necessary, should be small (< 1MB).

Updating Materials

Changes to chapters that would be beneficial to other training events should be merged back into the master branch.

nceas-training's People

Contributors

aebudden, amoeba, andypbarrett, angiegarciaa, benmarwick, brunj7, camilavargasp, carmengg, dlebauer, dvirlar2, erinlynmclean, hdolinh, jeanetteclark, jessicaguo, justinkadi, kameyer, maggieklope, mbjones, nhchavez, rcurty, samanthacsik, tracykteal

nceas-training's Issues

Remaining NEON edits / changes

  • General: All external links to open in a new window
  • Section 1.1: Video frames not rendering in online. Fine in local copy. Video in Section 4.4 is fine both online and in local build
  • Section 3.1.3.2: Have paragraphs start under the bold headers
  • Section 3.1.4: Create visual separation between the two exercises
  • Section 3.1.4: Horizontal division bar is sitting within the exercise box
  • Section 3.3.2.7: Add some buffer between the RHS of the image and the text
  • Section 3.3.2.9: Missing image / code error
  • Section 3.3.2.9: Image credit separated from image location
  • Section 3.4.1: Capitalize first title word
  • Section 3.4.3: Typo 'resources' under DLC image

Session 8 fixes

  • 8.1.4 Joins in dplyr: the notes refer to read.csv, but the code chunk uses read_csv
  • stringsAsFactors is no longer needed in R >= 4.0.0 (also noted in #89)
sites_df <- data.frame(site = c("HAW-101",
                                "HAW-103",
                                "OAH-320",
                                "OAH-219",
                                "MAI-039"),
                       stringsAsFactors = FALSE)
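
A minimal sketch of the same chunk without the argument, assuming the lesson is run on R >= 4.0.0 (where `data.frame()` defaults to `stringsAsFactors = FALSE`). On older R versions the column would come back as a factor, so the argument would still be needed there.

```r
# Assumes R >= 4.0.0: strings stay character without stringsAsFactors = FALSE.
sites_df <- data.frame(site = c("HAW-101",
                                "HAW-103",
                                "OAH-320",
                                "OAH-219",
                                "MAI-039"))

# Confirm the column type instead of relying on the default silently.
is.character(sites_df$site)
```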

Other not as important items, and more of a stylistic choice:

  • show that you can actually separate mutates with a comma instead of adding an additional call to mutate:
catch_clean <- catch_data %>% 
  mutate(Chinook = ifelse(Chinook == "I", 1, Chinook)) %>%
  mutate(Chinook = as.integer(Chinook))

catch_clean <- catch_data %>% 
  mutate(Chinook = ifelse(Chinook == "I", 1, Chinook),
         Chinook = as.integer(Chinook))
  • as a side note: dplyr::if_else vs. base::ifelse
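
To make the side note concrete, here is a small sketch of how the two functions differ. The `catch_data` values below are made up for illustration and are not the lesson's dataset. `base::ifelse` silently coerces mixed types, while `dplyr::if_else` requires the true and false branches to share a type, so the replacement value is written as the string `"1"`.

```r
library(dplyr)

# Hypothetical stand-in for the lesson's catch_data (assumes R >= 4.0.0).
catch_data <- data.frame(Chinook = c("12", "I", "7"))

# base::ifelse accepts a numeric `yes` and a character `no`,
# and quietly returns a character vector.
base_result <- ifelse(catch_data$Chinook == "I", 1, catch_data$Chinook)
class(base_result)

# dplyr::if_else is type-strict, so both branches are character here.
# The two steps are combined in one mutate(), separated by a comma.
catch_clean <- catch_data %>%
  mutate(Chinook = if_else(Chinook == "I", "1", Chinook),
         Chinook = as.integer(Chinook))
```

The strictness is arguably a teaching point in itself: passing the numeric `1` to `if_else` here raises an error rather than changing the column type unnoticed.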

reorder git setup into the git lessons

In the past, we introduced the concepts behind GitHub before we set up the tools. During today's Arctic lesson, we went through the complex git setup before explaining any of it. We should move the git setup into the git intro lesson itself, so that it follows some of the intro material. This will also let us shorten the intro setup section and get to the more interesting RMarkdown material sooner.

remote lesson plans: Data modeling

Delivery Format

proposed

  • Synchronous presentation (30 min)
  • Small group exercise in breakout rooms with facilitators (30 min)
  • Main room takeaways from breakout sessions (15 min)

Resources Needed

  • Zoom with breakout rooms
  • HackMD
  • Excalidraw?

tidy data for social science surveys

Need to write a lesson on how to create tidy data structures out of survey data

key points to hit:

  • entities, observations, variables (survey population, individual, question response)
  • consistent coding of variables
  • open formats

our example dataset might feature:

  • excel format with tabs?
  • inconsistent coding variables
  • other?
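
As a starting point for discussion, a rough sketch of what the lesson's core transformation might look like. The data frame, column names, and response codes are all invented for illustration, and tidyr/dplyr are assumed to be the tools used. It reshapes one-column-per-question survey data so that each row is one respondent's answer to one question, then harmonizes inconsistently coded responses.

```r
library(dplyr)
library(tidyr)

# Hypothetical survey export: one row per respondent, one column per question,
# with inconsistent coding of yes/no answers.
survey_wide <- data.frame(
  respondent_id = 1:3,
  q1 = c("Yes", "yes", "N"),
  q2 = c("no", "Y", "No")
)

survey_tidy <- survey_wide %>%
  # One observation (respondent x question) per row.
  pivot_longer(cols = c(q1, q2),
               names_to = "question",
               values_to = "response") %>%
  # Map every variant onto a single consistent code.
  mutate(response = case_when(
    tolower(response) %in% c("y", "yes") ~ "yes",
    tolower(response) %in% c("n", "no")  ~ "no",
    TRUE                                 ~ NA_character_
  ))
```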

@mbjones would like your input here

remote lesson plans: Collaboration, authorship, and data policies

Delivery Format

Proposed

  • Synchronous presentation (30 min)
  • Small breakout discussion/questions with facilitators (15 min)
  • Regroup to deliver anything that came up in breakout discussions to entire group (15 min)

OR

  • Synchronous presentation (30 min)
  • Addtl. discussion of questions that came up in HackMD (30 min)

Resources Needed

  • Zoom with breakout rooms
  • HackMD

Session 9 Fixes

  • missing a return in this section? - ggplot vs base vs lattice vs XYZ…
* ggplot2 All of them work! I use base graphics for simple, quick and dirty plots. I use ggplot2 for most everything else. ggplot2 excels at making complicated plots easy and easy plots simple enough.
  • consider adding some help/ hints for the 9.2.2 challenge question? For example, adding functions they could use and maybe to try starting with getting the year?
  • might be worth differentiating the color and fill variables
  • add section on saving with ggsave ?
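
A possible seed for the last two items, using made-up data rather than the lesson's dataset. It shows `color` (the outline) and `fill` (the interior) set to different values on the same bars, and then saves the plot with `ggsave`. The file name and dimensions are arbitrary choices.

```r
library(ggplot2)

# Hypothetical data for illustration only.
df <- data.frame(species = c("Chinook", "Sockeye", "Coho"),
                 catch   = c(12, 30, 7))

p <- ggplot(df, aes(x = species, y = catch)) +
  # color controls the bar outline, fill controls the bar interior.
  geom_col(color = "black", fill = "steelblue")

# Write to a temporary directory so the example leaves no files in the repo.
out_file <- file.path(tempdir(), "catch_by_species.png")
ggsave(out_file, plot = p, width = 6, height = 4, units = "in", dpi = 300)
```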

Replace lead-in image in Session 5 (git collab+conflict)

The first image in session 5 (git-collaboration-conflicts.Rmd, really) is a picture of an RStudio git commit dialog:

[Image: Screen Shot 2021-07-08 at 6 18 44 PM]

but I think we meant to have a picture of the hub-and-spoke workflow described in the preceding paragraphs. The image I'm thinking of is

It looks like it got yoinked out by accident maybe? in b680927#diff-bf084fad119baf4391b358d39b465153ccba987417166c74e58d605d91b09e46. @jeanetteclark what do you think?

I'm going to make the change once I hit submit on this so the book builds for tomorrow but I'll leave this open for comments.

Remote lesson plans: Git conflicts

Delivery Format

proposed

  • Asynchronous recording (1 hour)?

  • Breakout room practice session (1 hour)
    OR

  • Office hours (2 person hours)

Resources Needed

  • Screencasting software

Alexandra Etheridge (USGS) training inquiry

On Dec 30, Alexandra inquired about hosting a training for USGS in Sacramento.

She is currently on furlough (federal shutdown). We need to follow up with estimated costs when the government reopens.

Session 7: Data Modeling Fixes

  • 7.1.4 - spelling wold - Note that, should one encounter a new species in the survey, we wold have to add new columns to the table. This is difficult to analyze, understand, and maintain.
  • 7.1.6 Challenge - spelling diatram -Draw a new ER diatram showing this re-designed data structure
  • 7.1.6 Challenge - still references excalidraw, should be invision? - Using the Excalidraw live session your instructor start
  • 7.1.6 Challenge Maybe mention that the first identifier column is created automatically by the table and not part of the original data?

Fix copying and pasting

Copying and pasting works inconsistently for some people. Not sure which combinations of OS/browser don't work but I'll test. Make sure we can copy the text in the block and click the copy button too.

Integrate my git intro slides into course book somehow

The intro slides I've given the last two times for the git module help me out a lot in teaching the module and setting up the why and what of git but they aren't integrated into the lesson.

I'd like to look at integrating them in one or more of the following ways, which are not mutually exclusive:

  • Modifying the introductory text (if it even needs it) to get across some of the same messaging
  • Providing annotated slides in the appendix
  • Linking a PDF of the slides from the lesson and storing the PDF in the book repo

remote lesson plans: creating functions

Delivery Format

proposed

  • Synchronous presentation (30 minutes)
  • Breakout room exercise (45 minutes)
  • Main room wrap up (15 minutes)

Resources Needed

  • Zoom with breakout rooms
  • HackMD

remote lesson plans: Intro to ADC policies

Delivery Format

Proposed

  • Synchronous presentation (30 min)
  • Small breakout discussion/questions with ADC facilitators (15 min)
  • Regroup to deliver anything that came up in breakout discussions to entire group (15 min)

OR

  • Synchronous presentation (30 min)
  • Addtl. discussion of questions that came up in HackMD (30 min)

Resources Needed

  • Zoom with breakout rooms
  • HackMD to track questions as they come up

Jeffrey Blanchard (UMASS) training inquiry

Hi,

I am interested in learning more about hosting a "Trainings in
Environmental Data Science" workshop here at UMass for the broader
community in the area of New England. It is similar to the modular
workshops that our graduate students are requesting.

Regards, Jeff

remote lesson plans: Data visualization (ggplot/leaflet)

Delivery Format

proposed

  • Asynchronous presentation (30 minutes)
  • Office hours

OR

  • Synchronous presentation (30 minutes)
  • Breakout group exercise (30 minutes)

Resources Needed

  • Zoom with breakout rooms
  • HackMD
  • Screencasting software

remote lesson plans: Submitting metadata

Delivery Format

proposed

  • Asynchronous presentation (30 minutes)

Resources Needed

  • Screencasting software

NB: This can also be a resource for the Arctic Data Center in general

Teaching Notes: Best Practices Data and Metadata

These are the specific things I highlighted in my written teaching notes when I taught this section for the Arctic Data Center training in Oct 2020. These notes are meant to complement, not replace, the written material.

Introduction

Who you are and what you do for NCEAS/Arctic Data Center
Going to go through best practices for data and metadata, then go through an example of creating metadata and submitting to a repository.
What is metadata?
Mentimeter questions - for word cloud:

  • Before we jump into the lesson, what do you think some best practices are for your data and your metadata?
  • How often would you say you and/or your lab follow those best practices?
  • What gets in your way or prevents you from following "best practices" for data / metadata?

Quick discussion about the answers to all three questions.
Transition - hopefully this lesson will give you some tools to circumvent the things that get in your way.

Overview

Good data management is important for all types of data - small or large.
Don't need a fancy database system to have well formatted data.
First - why both? Why is this important?
Start early and often for good data management but it's never too late to go back to your data.

Organizing Data

High points of the linked papers:

  • Use a scripted program
  • Open file formats - computers change but open formats will live on
  • Keep your raw data
  • Descriptive names
  • Plain text

With these guidelines, others can start with your raw data and take the same steps as you did.
Design your data to be tidy.

Metadata

We defined metadata earlier in the lesson as data about data.
Good metadata contains lots of details so it's good to compile this info as you go.
Go through bibliographic, discovery, interpretation, data structure, and rights details, emphasizing why each piece is important:

  • Biblio - you want credit for this data
  • Discovery - you want others to discover your data so it can be used in more studies
  • Interpretation - you want your data to be interpreted correctly so it isn't used out of context
  • Structure - define variables in your metadata so that your data can be found by others who want to use it
  • Rights - you want others to use your data appropriately

EML is what we'll be working with today.

Data Identifiers

DOIs refer to the exact version you use even if later on you need to update it - this helps us track uses of the dataset, like views, citations, and downloads.

Data Citation

Talk about data citation at the Arctic Data Center as why this is important.

Provenance

Many repos want to preserve more than just data and metadata - we're one of those, and we're able to preserve software and provenance as well.
Does anyone know what provenance is in the context of data and metadata?
Preserving provenance and code is a cool way to help researchers build on the work you did - standing on the shoulders of giants.
This is why one of the best practices is to clean your data programmatically in a script rather than just deleting cells in Excel.
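
If a concrete illustration helps at this point, something like the sketch below could be shown. The file names, the columns, and the `-999` missing-value code are all assumptions made up for the example. The raw file is never edited; the cleaning step lives in code, and the cleaned copy is written to a separate file.

```r
# Hypothetical raw file, created here only so the example is self-contained.
raw_path <- file.path(tempdir(), "catch_raw.csv")
writeLines(c("site,count",
             "HAW-101,12",
             "HAW-103,-999",
             "OAH-320,7"),
           raw_path)

# Read the raw data and clean a copy; the raw file stays untouched.
raw <- read.csv(raw_path)
clean <- raw
clean$count[clean$count == -999] <- NA  # recode the assumed missing-value code

# Save the cleaned data under a different name, in an open format.
clean_path <- file.path(tempdir(), "catch_clean.csv")
write.csv(clean, clean_path, row.names = FALSE)
```

Because every change is recorded in the script, anyone can start from the raw file and reproduce the cleaned one.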

Data Documentation and Publishing

Reusing data is the goal but we can't get there without sharing data, and we can't get there without a good data management plan.

Data repositories

Highlight that Github isn't an archival location - researchers should want a repo that gives them a DOI for their data.
Highlighted that we're working on a game to help researchers learn more about what repo to choose for their data, as well as building a centralized hub of resources. Not ready yet, so feel free to skip.

Metadata

Fundamentally important for future understanding of your data.
It takes time to preserve data well but it's worth the effort - and it's easier if you do it as you go. Don't think about it as doing the minimum required steps - you want others / future you to really understand the data.

Structure of a data package

Identifiers are important because they help the researcher cite the exact version of the dataset used.
Transition - we are a member of the DataONE federation, so let's zoom out from thinking about the Arctic Data Center and think about the larger repository landscape.

DataONE

Transition - Now, onto the hands on piece. We're going to randomly assign you to breakouts and an NCEAS staff member will walk you all through uploading some sample data into the Arctic Data Center.

Hands on exercise

Check for completeness when everyone's logged in with their ORCID and at other points throughout.
Ask for questions throughout as well.

mermaid syntax for ER diagrams

Consider using Mermaid for ER diagrams and other technical drawings in the lessons.

Here's an example ER diagram from the data modeling lesson:

  erDiagram
    Site ||--o{ SpeciesObservation : contains
    Site {
        int site
        string name
        float temp
    }
    SpeciesObservation {
        int id
        string date
        int site
        string spcode
        string height
    }

The syntax should also allow specification of primary and foreign keys, but when used, I see GitHub rendering issues, so this needs to be explored further. For example:

  erDiagram
    Site ||--o{ SpeciesObservation : contains
    Site {
        int site PK
        string name
        float temp
    }
    SpeciesObservation {
        int id PK
        string date
        int site FK
        string spcode
        string height
    }

I think the cause for this issue has been identified upstream in mermaid issue mermaid-js/mermaid#2548.
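
Until the key annotations render reliably, the lesson could show the same relationship in R instead. This is only a sketch: the table and column names follow the diagram, but the row values are invented, and dplyr is assumed. `site` acts as the primary key of `site`, and as the foreign key in `species_observation`.

```r
library(dplyr)

# Site table: `site` is the primary key (values are made up).
site <- data.frame(
  site = c(1L, 2L),
  name = c("Site A", "Site B"),
  temp = c(8.5, 10.1)
)

# SpeciesObservation table: `id` is the primary key, `site` the foreign key.
species_observation <- data.frame(
  id     = 1:3,
  date   = c("2021-07-01", "2021-07-01", "2021-07-02"),
  site   = c(1L, 1L, 2L),
  spcode = c("CHIN", "SOCK", "CHIN"),
  height = c("1.2", "0.9", "1.1")
)

# Primary keys should be unique; every foreign key should match a site.
pk_ok <- !anyDuplicated(site$site) && !anyDuplicated(species_observation$id)
fk_ok <- all(species_observation$site %in% site$site)

# One site contains zero or more observations, so join from the "many" side.
obs_with_site <- left_join(species_observation, site, by = "site")
```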

Sara Miller (ADFG) training inquiry

Sara originally wanted us to come to the AFS meeting to teach our course but the timeline was too short

Need to follow up with her with a proposed budget/location to see if ADFG would be interested in hosting a training

Remote lesson plans: Introduction to git

Delivery Format

proposed

  • Synchronous presentation (45 minutes)
  • Breakout room practice (30 minutes)
  • Main room discussion and wrap up (15 minutes)

Resources Needed

  • Zoom with breakout rooms
  • HackMD
