ioos / bio_data_guide

Standardizing Marine Biological Data Working Group - An open community to facilitate the mobilization of biological data to OBIS.

Home Page: https://ioos.github.io/bio_data_guide/

License: MIT License

TeX 0.37% R 0.15% HTML 0.01% CSS 0.01% Jupyter Notebook 99.13% MATLAB 0.33% Python 0.01%
data data-management marine-biology marine-data tutorials darwin-core obis

bio_data_guide's People

Contributors

7yl4r, albenson-usgs, bbest, br-johnson, daltonkell, dylan-pugh, emiliom, gbaillie-onc, mathewbiddle, mlonneman, mstoessel, timvdstap, zmonteith

bio_data_guide's Issues

Preferred technology for building and contributing to the guide

I've been trying to build the book using RStudio and it hasn't been the easiest experience.

My thought was that if I could get this GHA (on my fork) to work, we wouldn't have to build the site locally and then push. It would all be automated, making it easier to contribute material.

Well, I keep running into issues when compiling 03-application.Rmd and I have no clue how to debug bookdown.

So, I'd like to ask the users and contributors to the guide: What is a good solution to create a guide and facilitate contributions?

Is everyone comfortable with bookdown? Should we look at github pages? jupyterbook? gitbook? something else?

Open for discussion.

hakai_salmon_data: event numbers different

There are 695 eventIDs in the event file but only 435 in the occurrence file. Is it that not all the events have occurrences associated with them? If yes, why is that? Is it delineated in the data somewhere? Should we consider limiting the event file to only events with occurrences?
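One way to answer this kind of question is to compare the sets of eventIDs in the two tables. The sketch below is a minimal illustration using only the Python standard library. The inline CSV text and the helper name read_ids are hypothetical stand-ins, not the actual hakai_salmon_data files or code.

```python
import csv
import io


def read_ids(csv_text, column="eventID"):
    """Return the set of non-empty values found in one column of a CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader if row[column]}


# Hypothetical miniature tables standing in for the real event and occurrence files.
event_csv = "eventID,eventDate\nE1,2019-05-01\nE2,2019-05-02\nE3,2019-05-03\n"
occurrence_csv = "occurrenceID,eventID\nO1,E1\nO2,E1\nO3,E3\n"

event_ids = read_ids(event_csv)
occurrence_event_ids = read_ids(occurrence_csv)

# Events that no occurrence points to.
events_without_occurrences = sorted(event_ids - occurrence_event_ids)
# Occurrences that reference an eventID absent from the event table.
orphan_occurrence_events = sorted(occurrence_event_ids - event_ids)

print("events without occurrences:", events_without_occurrences)
print("occurrence eventIDs missing from the event table:", orphan_occurrence_events)
```

The first list shows which events have no occurrences. Note that in a hierarchical event table, parent events often have no direct occurrences, which may or may not explain the difference here.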

Expand ERDDAP documentation to provide MBON best practices in configuration

We should be documenting some of the nuances when serving DwC data via ERDDAP.

For example, if the date is only captured to the year, the ERDDAP configuration for that time variable should only treat it as a string, not a time in seconds since 1970.
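To illustrate why a year-only date is better kept as a string, here is a small Python sketch (this is not ERDDAP configuration, and the value "2015" is hypothetical). Converting the year to seconds since 1970 forces a month, day, and time of day that the source never recorded.

```python
from datetime import datetime, timezone

year_only = "2015"  # hypothetical value recorded to the year only

# Forcing the year into epoch seconds requires inventing a month, day, and time.
as_datetime = datetime(int(year_only), 1, 1, tzinfo=timezone.utc)
epoch_seconds = int(as_datetime.timestamp())

print(epoch_seconds)            # seconds since 1970-01-01T00:00:00Z
print(as_datetime.isoformat())  # now claims midnight UTC on 1 January
print(year_only)                # the string keeps the original precision
```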

I know there are other items, similar to above, which should be documented. I'll look at some of my notes and compile them here.

Here is the current section that should be updated: https://ioos.github.io/bio_data_guide/intro.html#erddap

add relational diagrams for example datasets

I have previously used quickDBD to create diagrams of how the occurrence, MoF, event, etc. files/tables connect with each other. I find these views tremendously useful and would like to encourage including them for the example datasets.

Then again... does anyone else find database diagrams useful or is it just me?

bare-minimum example (tier 0):

[image: relational diagram, bare-minimum (tier 0) example]

more detailed example:

[image: relational diagram, more detailed example]

hakai_salmon_data: lat/lons missing

There are 138 events missing latitude and longitude in the hakai_salmon_data event file. These events and their associated occurrences can't be included in OBIS. Is it possible to find out what these geolocations are?
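A quick way to list the affected events is to scan the event table for blank coordinates. This is a minimal standard-library sketch. The inline CSV and the is_blank helper are hypothetical, and the set of "missing" spellings it checks is an assumption about how blanks are written in the file.

```python
import csv
import io

# Hypothetical miniature event table; the real file has many more columns.
event_csv = """eventID,decimalLatitude,decimalLongitude
E1,50.10,-125.20
E2,,
E3,50.30,
"""


def is_blank(value):
    """Treat empty strings and a few common NA spellings as missing."""
    return value is None or value.strip() in {"", "NA", "NaN"}


rows = list(csv.DictReader(io.StringIO(event_csv)))

# Collect every eventID where either coordinate is missing.
missing_coords = [
    row["eventID"]
    for row in rows
    if is_blank(row["decimalLatitude"]) or is_blank(row["decimalLongitude"])
]

print(len(missing_coords), "events lack coordinates:", missing_coords)
```

The resulting eventIDs can then be used to find the associated occurrences that would be excluded as well.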

hakai_seagrass: # events = # occurrences?

@zmonteith I may have done something wrong. I didn't see the output files in the hakai_seagrass folder, so I ran the R script to create the seagrassEvent and seagrassOccurrence data tables, but there are the same number of observations in the event file and the occurrence file. Is this accurate? Do you only have one species at each collection event?

hakai_seagrass: emof table

@zmonteith I couldn't get the emof section of code to work on my machine. Can you load the hakaiSeagrassDwcEmof.csv into the folder so I can review it?

enable github pages?

@mwengren Hi Micah! Would you mind enabling GitHub Pages for this repository? I'd like to host our Standardizing Marine Bio Data Guide there. Thanks!

Annotated bibliography for publications using data from OBIS/GBIF

During our December call, we discussed the fact that what's most beneficial for sharing the data (and creating the metadata) was thinking about what analyses would be using these data and what they would need to know. In order to get a better understanding of this, Brett suggested we create an annotated bibliography showcasing how the data are being used.

Add a section on sharing data with OBIS

Once you have a DwC-A compliant package, how do you submit to OBIS/GBIF?

Right now, one process is to send an email with attachments to the appropriate person.

Can we identify a more fluid, open, less intensive process?

Land of the lost (vocabulary terms)

Sometimes despite our best efforts, vocabulary terms that match our data are nowhere to be found. List them here so we can work with BODC or another vocabulary entity to get them added. Or maybe someone will know where to find the term you need and can match you up! Either way list your lost terms here.

Add instructions for building the Standardizing-Marine-Biological-Data book

@Br-Johnson In CONTRIBUTING.md there is this statement:

I build all the chapters into the book using bookdown and the technologies wraped by that.

Could we include the instructions for building the book? Maybe stick those in the wiki page?

As an afterthought, I'm wondering if there is an easier technology to use, like JupyterBook, which could facilitate more contributions?

Or, maybe we set up GitHub Actions to build the chapters?

hakai_salmon_data: occurrence file

The occurrence file seems to include extraneous columns from the WoRMS lookup that don't align to Darwin Core terms (e.g., isMarine, isBrackish, Qualitystatus, TSN, Citation). I would recommend removing these from the occurrence file unless you have a reason to keep them.

Links to resources

hakai_seagrass: occurrenceID

@zmonteith there are 3097 observations in the occurrence file but only 3013 unique occurrenceIDs. There needs to be a unique occurrenceID for each observation.
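The duplicated IDs can be found by counting how often each occurrenceID appears. Below is a minimal sketch with made-up IDs; the real column would be read from the occurrence file.

```python
from collections import Counter

# Hypothetical occurrenceID column; the real values come from the occurrence file.
occurrence_ids = ["occ-1", "occ-2", "occ-2", "occ-3", "occ-3", "occ-3"]

counts = Counter(occurrence_ids)

# Keep only the IDs that appear more than once.
duplicates = {occ_id: n for occ_id, n in counts.items() if n > 1}

print(len(occurrence_ids), "rows,", len(counts), "unique occurrenceIDs")
print("duplicated IDs:", duplicates)
```

Inspecting the rows behind each duplicated ID usually shows which field is missing from the ID construction.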

List of hakai seagrass measurements and controlled vocabularies

On the December monthly call, Zach offered to create a csv of the measurements for the hakai data and the crosswalk to the BODC NERC vocab for each of them as well as any that do not have a match in NERC. Once we have a list of missing terms we will share them with BODC so they can add them to the vocabulary.

hakai_seagrass: WoRMS output info

@zmonteith I think we should drop most of the columns from the WoRMS output. All we really need from there is the LSID which is the scientificNameID. We can keep the taxonomic hierarchy (Kingdom, Phylum, Class, etc) but everything else should be dropped out (isMarine, ScientificName_accepted, Authority_accepted, etc). Again this might just be an artifact of me running the code since the output csvs aren't in the folder.
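As a rough illustration of dropping the extra WoRMS columns before writing the csv, the sketch below filters a table against a keep-list. The keep-list, the inline data, and the placeholder LSID are all hypothetical, and the example is in Python rather than the R used in the repository script.

```python
import csv
import io

# Illustrative keep-list of Darwin Core terms for the occurrence output (an assumption).
KEEP = ["occurrenceID", "eventID", "scientificName", "scientificNameID",
        "kingdom", "phylum", "class"]

# Hypothetical merged table with WoRMS lookup columns still attached.
merged_csv = (
    "occurrenceID,eventID,scientificName,scientificNameID,kingdom,isMarine,Authority_accepted\n"
    "O1,E1,Zostera marina,urn:lsid:marinespecies.org:taxname:PLACEHOLDER,Plantae,1,placeholder\n"
)

reader = csv.DictReader(io.StringIO(merged_csv))
kept = [col for col in reader.fieldnames if col in KEEP]
dropped = [col for col in reader.fieldnames if col not in KEEP]

# Write only the kept columns; extrasaction="ignore" skips the rest silently.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=kept, extrasaction="ignore",
                        lineterminator="\n")
writer.writeheader()
for row in reader:
    writer.writerow(row)

print("dropped:", dropped)
print(output.getvalue())
```

A keep-list is used instead of a drop-list so that any new lookup column is excluded by default.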

Create a how-to document for navigating the NERC Vocabulary

During our December call we discussed the difficulties and vagaries in navigating the NERC Vocabulary. It's still unclear to me at least what the best way to find the term you are looking for is. It's not as simple as doing a search for that term. Or at least I haven't found it to work that way. It would be great to create a how-to document on navigating the NERC Vocabulary so others don't need to flail around as we have searching for what we're looking for (and making the final determination that it can't be found if the term is not in there).

OBIS most-used vocabs for habitat

In our discussion today I mentioned a query I did to visualize use of the "habitat" term. I graphed the most common "habitat" values for global Anthozoa and for all species within the Florida Keys National Marine Sanctuary.

[image: bar graph of the most common "habitat" values]

The code I used to make this bar graph is in marinebon/obis2index/obis2index/top_column_values.py
Taking this to the next step I think it would be worth seeing which vocabularies for "habitat" are most common.

The OBIS API contains a way to query for records using a specific vocabulary (thanks @MathewBiddle); example:

https://api.obis.org/dataset?measurementtypeid=http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL05/
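A query URL like the one above can be built programmatically for any vocabulary term. The sketch below only constructs the URL and does not call the API. The function name dataset_query_url is hypothetical, and the endpoint and parameter name are taken from the example URL rather than from the API documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

OBIS_DATASET_ENDPOINT = "https://api.obis.org/dataset"  # from the example URL above


def dataset_query_url(measurement_type_id):
    """Build a dataset query URL for one measurementTypeID (hypothetical helper)."""
    query = urlencode({"measurementtypeid": measurement_type_id})
    return f"{OBIS_DATASET_ENDPOINT}?{query}"


term = "http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL05/"
url = dataset_query_url(term)
print(url)

# Round-trip check: decoding the query string recovers the original term.
print(parse_qs(urlsplit(url).query)["measurementtypeid"][0] == term)
```

urlencode percent-encodes the vocabulary URI, so the request differs textually from the hand-written example but carries the same parameter value.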

It may be useful to see a graph of which vocabs are most used for a particular subquery - like for a particular species, region, or term subset.

The OBIS MOF viewer may also be useful for exploring this topic more quickly:

https://mof.obis.org/

I think this data will help us see which "habitat" vocabulary this group should recommend. Any code used could be generalized to do the same assessment for other MoF terms.

hakai_salmon_data: occurrenceID missing from eMoF file

occurrenceID is blank for all the rows in the measurement or fact file. Is this accurate? Are none of the measurements for particular occurrences, only for the events? This doesn't seem right, since measurements with measurementType "length" must apply to particular occurrences.

hakai_seagrass: site_id and survey

@zmonteith I noticed site_id and survey are in the event output and site_id is in the occurrence output. Are those there just to create the eventID? Maybe we should remove them before writing the csv? Or should we try to find a Darwin Core term to align them to?

Change repository name?

Is bio_data_guide still appropriate? Per our conversation last month, the scope of this repo is strictly standardizing to DwC?

If that's true, should we change the repo name to dwc_bio_data_guide to reduce confusion?

Add fake data that goes with IOOS_DMAC_DataToDwC_Notebook_event.R

Abby needs to add the fake data that goes with the R script IOOS_DMAC_DataToDwC_Notebook_event.R. That R script was created for the Ann Arbor training event as an example of how to crosswalk a dataset to Darwin Core. The R script is there but the data are not so it's not possible to run the script locally.

Repository name change/restructure?

Hi All,

This is a great resource! Thank you all for spending the time to build this. As I'm reviewing the content in this repository, I have some questions. From what I can tell, this repository is primarily focused on providing examples and resources for translating biological data into DwC. Is that the intention? Or is the plan to expand this repository to cover all types of biological data (e.g., acoustics, video, eDNA)? Some of the language indicates it's the latter, but the content doesn't reflect that.

My personal opinion would be to keep this as is but change the name and some of the content to explicitly state this repository is a "DwC data guide". Then, as other pieces of the biological landscape build out, we can create more data guides.

Any thoughts?

hakai_salmon_data: occurrenceID

occurrenceID column name is missing an "r" in the occurrence file. It's a silly thing I know but might as well fix it :-)

Document different tiers of data/metadata needed/wanted

During our December call, we discussed metadata. OBIS (and GBIF) use EML metadata for their datasets. While extensive metadata is always best, sometimes this can prevent data from being shared because the task of documenting the metadata becomes too daunting. We need documentation that lays out what is required, what is really good to have, and what is nice to have if there is time. Similarly we need this for the data and the associated Darwin Core fields.

mention call / gdoc in readme?

Should we add something about the monthly call or the google doc notes in the readme?
Just in case someone stumbles onto this repo without first knowing about the call.

hakai_salmon_data: measurementUnit and measurementUnitID blank

There are no data in the measurementUnit or measurementUnitID columns in the measurementOrFact file. These are key pieces of information that need to be filled in. It would also be good to add measurementTypeID so that people can be clear what is meant by "length" or "weight".
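A simple per-column check makes gaps like this easy to spot before publishing. The sketch below counts blank cells in each column of a small, hypothetical measurementOrFact table and reports the columns that are entirely empty.

```python
import csv
import io

# Hypothetical miniature measurementOrFact table with the same kind of gaps.
emof_csv = """eventID,occurrenceID,measurementType,measurementValue,measurementUnit,measurementUnitID
E1,,length,120,,
E1,,weight,15.2,,
E2,,length,98,,
"""

rows = list(csv.DictReader(io.StringIO(emof_csv)))

# Count blank cells per column.
blank_counts = {
    column: sum(1 for row in rows if not (row[column] or "").strip())
    for column in rows[0]
}

# Columns where every row is blank.
entirely_blank = [column for column, n in blank_counts.items() if n == len(rows)]

print(blank_counts)
print("entirely blank columns:", entirely_blank)
```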

Resolution of coordinates for transects and quadrats

For most protocols, MarineGEO collects two sets of coordinates: the beginning and end of a transect. I'm curious what, if any, processing will be necessary for providing the decimalLatitude and decimalLongitude values for transect and quadrat event observations in our event core tables.

  1. For a line, such as a transect, should decimalLatitude and decimalLongitude represent the centerpoint? Because "coordinateUncertaintyInMeters" is the radius of the smallest circle encompassing the entire feature, I assume this is the case. Either way, I'm planning on including a WKT footprint based on the beginning and end coordinates along with the lat/long values.

  2. Do we need to calculate an estimated decimalLatitude and decimalLongitude for quadrats or is providing the coordinates at the transect level enough?
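For the transect case in item 1, the following is a minimal sketch of one possible approach, not an agreed recommendation. It takes the arithmetic midpoint of the two endpoints as decimalLatitude and decimalLongitude, uses half the great-circle transect length as coordinateUncertaintyInMeters, and writes the endpoints as a WKT LINESTRING. The coordinates and the function names are hypothetical. The simple midpoint assumes a short transect that does not cross the antimeridian, and GPS error is not added to the uncertainty.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def transect_summary(start, end):
    """Summarize a transect from (lat, lon) endpoints (hypothetical helper)."""
    (lat1, lon1), (lat2, lon2) = start, end
    length_m = haversine_m(lat1, lon1, lat2, lon2)
    return {
        "decimalLatitude": (lat1 + lat2) / 2,
        "decimalLongitude": (lon1 + lon2) / 2,
        # Radius of the circle centered on the midpoint that reaches both ends.
        "coordinateUncertaintyInMeters": length_m / 2,
        # WKT uses x y order, i.e. longitude then latitude.
        "footprintWKT": f"LINESTRING ({lon1} {lat1}, {lon2} {lat2})",
    }


# Hypothetical transect endpoints as (latitude, longitude).
summary = transect_summary((50.0, -125.0), (50.0, -125.001))
print(summary)
```

The same function could be applied per quadrat if quadrat positions along the transect are known; otherwise the transect-level values would be inherited.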
