lterdatasampler's Introduction

lterdatasampler

The mission of the Long Term Ecological Research program (LTER) Network is to “provide the scientific community, policy makers, and society with the knowledge and predictive understanding necessary to conserve, protect, and manage the nation’s ecosystems, their biodiversity, and the services they provide.” A specific goal of the LTER is education and training - “to promote training, teaching, and learning about long-term ecological research and the Earth’s ecosystems, and to educate a new generation of scientists.”

The goal of this package is to provide a sampler, and to gather community feedback on it, ahead of what will be a larger package containing 28 datasets - one from each of the existing US LTER sites. These datasets are subsets of the original data and have been modified - sometimes substantially - from the raw data. They are intended for teaching and training in environmental data science. This content is therefore not suitable for research and should only be used for teaching purposes.

We encourage you to explore existing LTER teaching and training initiatives, and the many other available LTER datasets which can be accessed via the Environmental Data Initiative. Please contact cited researchers directly to discuss using data for research purposes or in publication.

Installation

You can install the CRAN version of lterdatasampler with:

install.packages("lterdatasampler")

You can install the development version of lterdatasampler from GitHub with:

# install.packages("remotes")
remotes::install_github("lter/lterdatasampler")

The dataset samples

Dataset samples currently included in the package are summarized below; see individual Articles for data and source details. Note: the three-letter prefix of each dataset name indicates the LTER site (see full list of site abbreviations).

  • and_vertebrates: Records for aquatic vertebrates (cutthroat trout and salamanders) in Mack Creek, Andrews Experimental Forest, Oregon (1987 - present)
  • arc_weather: Daily meteorological (e.g. air temperature, precipitation) records from Toolik Field Station, Alaska (1988 - present)
  • hbr_maples: Sugar maple seedlings at Hubbard Brook Experimental Forest (New Hampshire) in calcium-treated and reference watersheds in August 2003 and June 2004
  • knz_bison: Bison masses recorded for the herd at Konza Prairie Biological Station LTER
  • luq_streamchem: Stream chemistry data for the Quebrada Sonadora (QS) location, part of the Luquillo tropical forest LTER site
  • ntl_icecover: Ice freeze and thaw dates for Madison, Wisconsin Area lakes (1853 - 2019), North Temperate Lakes LTER
  • ntl_airtemp: Daily average air temperature data for Madison, Wisconsin (1869 - 2019), North Temperate Lakes LTER
  • nwt_pikas: Pika observations for habitat and stress analysis at Niwot Ridge LTER, Colorado
  • pie_crab: Fiddler crab body size recorded summer 2016 in salt marshes from Florida to Massachusetts including Plum Island Ecosystem LTER, Virginia Coast LTER, and NOAA’s National Estuarine Research Reserve System
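
As a quick orientation, the sketch below lists the data samples bundled with an installed copy of the package and pulls the site prefix out of each dataset name. `site_prefix()` is a hypothetical helper written for this example, not a function exported by lterdatasampler, and the package-dependent part is skipped when the package is not installed.

```r
# Hypothetical helper (not part of lterdatasampler): extract the LTER site
# prefix from a dataset name, e.g. "and_vertebrates" -> "AND".
site_prefix <- function(dataset_name) {
  toupper(sub("_.*$", "", dataset_name))
}

# List the bundled data samples, but only if the package is installed.
if (requireNamespace("lterdatasampler", quietly = TRUE)) {
  samples <- data(package = "lterdatasampler")$results[, c("Item", "Title"), drop = FALSE]
  print(data.frame(site = site_prefix(samples[, "Item"]), samples))
}
```

From there, `?lterdatasampler::pie_crab` (or the help page of any other listed sample) shows the variables and source details.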

Which data sample should I use?

These data samples are selected because they have features we feel are commonly useful in introductory environmental data science and statistics courses.

In the table below, we list some introductory methods / skills, then share which data samples in this package we think are well-suited to use when teaching or learning them! It is not comprehensive - there are many different analyses & skills that these data samples would facilitate. Here we highlight a few that we think would be commonly useful.

Recommended data samples for introducing selected topics

| Topic | Data sample | For example you could: |
|---|---|---|
| Linear relationships | `pie_crab` | Model the relationship between fiddler crab size and latitude using `pie_crab`, while learning about Bergmann's Rule! |
| Linear relationships | `ntl_icecover` | Investigate the relationship between winter temperatures and ice cover duration for Wisconsin lakes using `ntl_icecover` |
| Linear relationships | `hbr_maples` | Explore seedling height-mass relationships for sugar maples using `hbr_maples` |
| Non-linear relationships | `knz_bison` | Model the relationship between bison age and mass for male and female bison using `knz_bison`, for example estimating parameters in the Gompertz model |
| Non-linear relationships | `and_vertebrates` | Model the length-mass relationships for cutthroat trout and salamanders in Mack Creek, Oregon |
| Time series analysis | `arc_weather` | Explore seasonality, wrangling dates, or practice forecasting using daily meteorological records from Toolik Station, Alaska |
| Time series analysis | `luq_streamchem` | Investigate the impact of a hurricane on stream water chemistry |
| Spatial data introduction | `nwt_pikas` | Introduce basics of spatial data (e.g. CRS, projections) and tools for working with spatial data by visualizing pika locations at Niwot Ridge in the Colorado Rockies |
| Comparing groups | `hbr_maples` | Compare sugar maple seedling heights in previously calcium-treated versus untreated watersheds using `hbr_maples`, using the exercise as an opportunity to think about acid rain and soil acidification |
| Comparing groups | `and_vertebrates` | Explore differences in size and abundance of cutthroat trout and salamanders in old growth versus previously clear cut forest sections (2 groups) or in different conditions (> 2 groups, e.g. pool, cascade, riffle) of Mack Creek, Oregon |
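
For instance, the crab size versus latitude suggestion could start from a simple linear regression. This is a minimal sketch, assuming `pie_crab` has numeric columns named `size` and `latitude`; check `?pie_crab` to confirm before relying on it. `fit_size_latitude()` is a name made up for this example.

```r
# Hypothetical helper: fit size as a linear function of latitude.
# Assumes the data frame has numeric columns `size` and `latitude`.
fit_size_latitude <- function(crabs) {
  lm(size ~ latitude, data = crabs)
}

# Fit the model on the packaged sample, if the package is installed.
if (requireNamespace("lterdatasampler", quietly = TRUE)) {
  crab_fit <- fit_size_latitude(lterdatasampler::pie_crab)
  print(coef(crab_fit))  # intercept and slope (size change per degree of latitude)
}
```

A positive slope would be consistent with Bergmann's Rule (larger body size at higher latitudes); `summary(crab_fit)` gives the usual diagnostics for a classroom discussion.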

How to provide feedback

The best way to provide feedback on this package is to open an issue and assign the feedback label. Thank you!

Acknowledgements

Thank you to the amazing students who contributed to this project as part of their UCSB Data Science capstone project: Sam Guo, Adhitya Logan, Lia Ran, Sophia Sternberg, and Karen Zhao. Thanks also go to their course advisor, Prof. Sang-yun Oh.

People / organizations who supported this project:

  • LTER Network Office
  • LTER Information Managers
  • LTER Education Committee
  • All the LTER Researchers and Site PIs
  • Cyber-infrastructures: EDI and DataONE

We gratefully acknowledge all authors and contributors of the roxygen2, usethis, pkgdown, devtools, tidyverse and metajam packages. This website relies heavily on themes created by Dr. Desirée DeLeon and Dr. Alison Hill.


lterdatasampler's People

Contributors

actions-user, adhil0, allisonhorst, brunj7

lterdatasampler's Issues

Typo in `and_vertebrates` Documentation

Hi! I am using the and_vertebrates data (and loving it!) but I realized there is a small typo in the documentation for that dataset. The column clip lists the different abbreviations for which cutthroat trout fin was clipped.

The abbreviations that include a "V" say that this refers to the "ventrical fin" but I believe that fish have a "ventral fin". I could definitely be wrong but I wasn't able to find results on Google for 'trout + "ventrical fin"' so I think it may be a typo.

See ?lterdatasampler::and_vertebrates and scroll down to clip (third from last column) to see the bit I'm talking about.

Thanks to y'all for putting together a great package though and I'm excited for it to be submitted to CRAN!

We are on CRAN!!

Yep, lterdatasampler is on CRAN 🎉

  • Add cran badge
  • Update instructions on how to install the package
  • Add a git tag for the release
  • Clean up branches
  • Add a developer branch as a buffer

Need to clean the README from the gt code

The README.md cannot render the code generated by the gt table package.

We therefore decided to create a dedicated page for the matrix on the website and switch to a simpler table for the README.

NTL - Ice Cover improvements

From Hillary --

A few minor things:

  • In the scatterplot filter line, ice_duration is not a variable, it's avg_ice_duration. Also maybe the x-axis should read 'average air temperature' so it's not confused with water temp (I realize you have a title)
  • When you look at the objects, can they be in their own code box? I ask this because when you copy the code it's annoying to get the output copied as well.
  • Be consistent on avg vs ave
  • Not sure it's a great idea to include Wingra in the average given it has a longer duration and a bunch of missing data

Suggestion on the comparison between air temp and ice duration. Taking an annual air temp is not the best because ice freezes over the winter, so it's more useful to look at water year. Furthermore, it's mostly impacted by fall and spring temperatures, so taking mean Dec:April temps gives a lot better sense of the relationship. Plot/code below.

library(dplyr)
library(ggplot2)
library(lubridate)
library(lterdatasampler)

# Assign each day to the winter it belongs to: Jan-Sep count toward the previous year
ntl_airtemp_avg <- ntl_airtemp %>%
  mutate(hydroyear = if_else(month(sampledate) < 10, year - 1, year)) %>%
  filter(month(sampledate) %in% c(12, 1:4)) %>%  # keep December through April only
  group_by(hydroyear) %>%
  summarise(ave_air_temp_adjusted = mean(ave_air_temp_adjusted))

# ntl_icecover_avg is assumed to be the per-year average ice duration table from the vignette
ntl_joined_avg <- ntl_icecover_avg %>%
  left_join(ntl_airtemp_avg, by = c("year" = "hydroyear"))

ggplot(data = ntl_joined_avg,
       aes(y = avg_ice_duration, x = ave_air_temp_adjusted)) +
  geom_point(alpha = 0.8) +
  theme_minimal() +
  labs(
    title = "Air Temperature and Ice Duration of Lakes in Madison, WI",
    y = "Ice Duration (Days)",
    x = "Mean Air Temperature Dec-April (Celsius)",
    subtitle = "North Temperate Lakes LTER"
  ) +
  geom_smooth(
    method = "lm",
    color = "black",
    se = FALSE,
    size = 0.3
  )
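
The water-year grouping suggested above can also be sketched in base R, which makes the date logic easy to check on a handful of values. `hydro_year()` and `winter_mean_temp()` are hypothetical helpers written for this illustration; they mirror the `month < 10` rule and the December to April window from the code above, with `na.rm = TRUE` added as an assumption about how missing days should be handled.

```r
# Hypothetical helper: label each date with the year its winter started in.
# January-September dates are counted toward the previous calendar year.
hydro_year <- function(dates) {
  dates <- as.Date(dates)
  yr <- as.integer(format(dates, "%Y"))
  mon <- as.integer(format(dates, "%m"))
  ifelse(mon < 10, yr - 1L, yr)
}

# Hypothetical helper: mean temperature over the chosen months,
# grouped by hydro year (defaults to December through April).
winter_mean_temp <- function(dates, temps, months = c(12, 1:4)) {
  dates <- as.Date(dates)
  keep <- as.integer(format(dates, "%m")) %in% months
  tapply(temps[keep], hydro_year(dates[keep]), mean, na.rm = TRUE)
}

# Toy values for illustration only (not real observations).
toy_dates <- c("2000-12-01", "2001-01-01", "2001-07-01", "2001-12-01")
toy_temps <- c(-2, -4, 20, -6)
print(winter_mean_temp(toy_dates, toy_temps))
```

In the toy input, December 2000 and January 2001 fall in the same winter (hydro year 2000), while the July value is dropped by the month filter.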

Image size

Currently, the images used for illustration purposes vary widely in size. We need to reduce the size of the largest ones.

dataset from SBC, MCR

Hi Julien -
Are you collecting dataset suggestions? you might be interested in this dataset that was explicitly created for education: https://portal.edirepository.org/nis/metadataviewer?packageid=knb-lter-sbc.88.1

Its organization as an excel spreadsheet probably won't work for any automated ingestion you plan, but the original SBC data would be part of this dataset:
https://portal.edirepository.org/nis/metadataviewer?packageid=knb-lter-sbc.6003.5
There are similar datasets from MCR; they would be able to recommend which one.

Reach out to IM Managers

  • KNZ - Konza Prairie Bison (Allison)
  • AND - Andrews Vertebrates (Allison)
  • HBR - Hubbard Brook Sugar Maples (Allison)
  • PIE - Plum Island Ecosystem (Julien)
  • NWT - Niwot Pikas (Julien)
  • NTL - North Temperate Lakes Icecover (Julien)

A cool logo

We had talked about a box with animals packed in it and some starting to get out (like a crab for example)

CAP - Birds

The bird dataset could be a good example of how to use KML and csv files to create geospatial information and showcasing extrapolation (points to grid)

Here is an example we could build on: https://static.sustainability.asu.edu/docs/explorers/lesson-plans-new/15-min-Bird-Distribution.pdf

We could select one species that lives in the city and one outside the city with the impact of urban environment on birds as a background story.

Dataset: https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-cap&identifier=46

Allison to-dos (late January)

  • Reach out to HBR Sugar Maples folks
  • Reach out & finalize bison data
  • Add acknowledgements sections to AND, HBR, KNZ
  • Draft hex sticker
  • Start with methods matrix (maybe sketch this?)
  • Add Toolik Weather data

Link Reference (man) and articles (vignettes)?

README.Rmd: Links to reference pages and vignette pages are formatted the same (pie_crab) and do not link to each other. I spent a long time trying to find the metadata from the vignette page for a dataset without realizing that each dataset had both pages (foolish in retrospect!)

Add a footer with logos

Add a footer with LTER, Bren, and NCEAS logos at least to the readme and potentially to the pkgdown site

update pie crab vignette

  • Make variable names more intuitive
  • Update wrangling formatting
  • Edit text
  • Finalize visualizations

Allison to-dos 2022-02-17

  • Suppress messages in and_vertebrates
  • Work on logo
  • Add acf() and maybe decomposition to arc_weather
  • Seriously make the homepage table!
  • Reach out to Plum Island LTER IM re: vignette feedback
  • Incorporate feedback from ARC LTER when received
  • Reach out to Andrews LTER IM re: vignette feedback
  • Move topics table to separate tab on pkgdown site
  • Review blog post for launch

add message for vignettes (point to pkgdown site)

Assuming we add the vignettes to .Rbuildignore (so that we can include images, dependencies, etc. in vignettes), should a message appear when vignettes are called, pointing users to the pkgdown site?

Shorten vignettes name

The vignette names are too long to look good on the website.

2 options:

  • shorten the title of the vignette to theme + site, e.g. Lake Ice Cover -- NTL, and add a subtitle
  • modify the pkgdown yaml file to overwrite the vignette's name

Let me know if you have a preference @allisonhorst !

data from BLE?

Hullo Allison and Julien,

BLE is wondering if you've already gotten data from us or if not, you'd like to work together on finding a dataset. If you've gotten data then disregard this, we probably forgot! Thanks!
