issues's People

Contributors

mslarae13, ssarrafan

issues's Issues

List and prioritize data sources appropriate for semi-automated biosample ingest pipelines

As part of the biosample ingest squad, we'd like to enumerate sources that we could build pipelines for in the semi-automated ingestion system. For each source it would be good to have a sense of:

  • what's the priority of ingesting the data
  • what are the technical details around fetching the data
  • what information would need to be provided by a data wrangler to start an ingest

Design validation for IDs

Is your feature request related to a problem? Please describe.
Trying to make sure identifiers are accurate and in the right slots

Describe the solution you'd like
Design a way to validate and verify that correct IDs are used in their designated alternate identifiers slots

Describe alternatives you've considered
Will be part of design

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the user's perspective, that answers the following:

Who will use this feature/enhancement? Team will use to make sure identifiers are accurate
When will they use it? As needed
How will they use it? Mark, can you add something here?
How will they test it to make sure it's working? Mark, can you add something here?
Is the request achievable? During one sprint? Design only for one sprint
What is your definition of done for this request? That there is a design for how to do the verification and validation of identifiers in designated slots

Ambassador Playbook updates

The Ambassador playbook and the Ambassador handbook both need to be updated before Ambassadors are onboarded.

Investigation and learning JAWS for NMDC use case

Describe the solution you'd like
Learn the JAWS code and how the NMDC workflows can use JAWS to run more efficiently.

Describe alternatives you've considered
Continue as is.

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the user's perspective, that answers the following:

Who will use this feature/enhancement? To start, Mark Flynn will be training and learning about JAWS.
When will they use it? When testing whether the NMDC workflows will work effectively using JAWS.
How will they use it? To use the infrastructure created by JAWS so we don't duplicate work for NMDC.
How will they test it to make sure it's working? Running NMDC workflows
Is the request achievable? During one sprint? Learning will start this sprint and will likely continue.
What is your definition of done for this request? For Mark to document his learning and determine the best way to leverage JAWS for NMDC workflows. Future issues can cover implementation, Edge use case, etc.

JGI Plate layout format requirements

Is your feature request related to a problem? Please describe.
  • Currently, the required plate layout format in the submission portal is letter-number. Is it possible to add restrictions that reject specific letter-number combinations? JGI requires that the corners (A1, A12, H1, H12) be blank.

"Plate location (well #): If you have indicated that the sample will be shipped in a plate, list the well location (ie A4, B5). The corner wells must be blank. For partial plates, fill the plate by columns rather than rows. Leave blank if the sample will be shipped in a tube. For more information on submitting samples in plates, please review the “Plate-based sample requirements” document at http://jgi.doe.gov/user-program-info/pmo-overview/project-materials-submission-overview/."

Describe the solution you'd like
Add additional plate layout formatting requirements

Describe alternatives you've considered

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the user's perspective, that answers the following:

  • When filling out the plate layout location in the template, a corner well such as A1 will be flagged as a formatting error
  • Add the above cells to the "Guidance" field for the column
  • Make plate layout / well # optional & not required (or add NA) (could submit via tube)

Who will use this feature/enhancement? -JGI submitting users
When will they use it? - When submitting samples to JGI and using NMDC template
How will they use it?
How will they test it to make sure it's working? - Test by doing validation
Is the request achievable? During one sprint? - Yes
What is your definition of done for this request? - Plate layout now has more rigorous formatting requirements that better reflect JGI requirements.
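
The requested rule could be sketched roughly as follows. This is a minimal illustration, not the submission portal's actual validation code: it assumes a standard 96-well plate (rows A-H, columns 1-12), and the function name `validate_well` is hypothetical.

```python
import re
from typing import Optional

# Assumption: a standard 96-well plate with rows A-H and columns 1-12.
WELL_PATTERN = re.compile(r"([A-H])([1-9]|1[0-2])")

# Corner wells that JGI requires to be left blank (from the issue text).
CORNER_WELLS = {"A1", "A12", "H1", "H12"}


def validate_well(value: Optional[str]) -> Optional[str]:
    """Return an error message for an invalid well location, or None if it is acceptable."""
    if value is None or value.strip() == "":
        # Blank is allowed: the sample may be shipped in a tube.
        return None
    well = value.strip().upper()
    if not WELL_PATTERN.fullmatch(well):
        return f"{value!r} is not a letter-number well location such as A4 or B5"
    if well in CORNER_WELLS:
        return f"{well} is a corner well, which must be blank"
    return None
```

Returning a message rather than raising keeps the check easy to surface as a per-cell formatting error; treating blank values as valid reflects the request to make the well location optional.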

Update repo readme

Add info to the repo readme to clarify nmdc_schema from src/schema

@mslarae13 can you fill in which repo and other details?

Partnership Criteria

  • Brainstorm Criteria
  • Test criteria against partners in tracking sheet
  • Apply criteria to partners

GROW - Ingest GOLD study Gs0149396

We can fetch basic information about the study from the GOLD API:

curl -X 'GET' \
  'https://gold-ws.jgi.doe.gov/api/v1/studies?studyGoldId=Gs0149396' \
  -H 'accept: */*'

Slots to be populated via ingest scripts or change sheets post-ingest:

ess_dive_datasets
https://data.ess-dive.lbl.gov/view/doi:10.15485/1603775
https://data.ess-dive.lbl.gov/view/doi:10.15485/1729719

@cmungall @SamuelPurvine @emileyfadrosh are there other ESS-DIVE identifiers besides the ones I've listed?

doi
https://doi.org/10.46936/10.25585/60001289

websites
https://www.pnnl.gov/projects/WHONDRS
https://narrative.kbase.us/#org/grow
https://orcid.org/0000-0003-0434-4217
https://microbialecosystemslab.com/

funding_sources
"This study used data from the Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) under the Perturbation Response Traits project at the Pacific Northwest National Laboratory (PNNL). This research was supported by the U.S. Department of Energy (DOE) Early Career Research Program. A portion of this work was performed at the U.S. Department of Energy Environmental Molecular Science Laboratory User Facility. PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under Contract No. DE-AC05-76RL01830."
"This study used data from the Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) under the River Corridor Science Focus Area (SFA) at the Pacific Northwest National Laboratory (PNNL). This research was supported by the U.S. Department of Energy (DOE), Office of Biological and Environmental Research (BER), Environmental System Science (ESS) Program. A portion of this work was performed at the U.S. Department of Energy Environmental Molecular Science Laboratory User Facility. PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under Contract No. DE-AC05-76RL01830."
"The work (proposal: https://doi.org/10.46936/10.25585/60001289) conducted by the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy operated under Contract No. DE-AC02-05CH11231."

Ensure that soil biosamples have two depth values (minimum and maximum)
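
One possible shape for such a check is sketched below. It assumes each biosample is a dict whose `depth` value carries `has_minimum_numeric_value` and `has_maximum_numeric_value` keys, and that the caller has already selected the soil biosamples; the function names are hypothetical and the record layout should be confirmed against the schema before use.

```python
from typing import Iterable, List


def has_depth_range(biosample: dict) -> bool:
    """True when the biosample has both a minimum and a maximum depth value."""
    depth = biosample.get("depth") or {}
    low = depth.get("has_minimum_numeric_value")   # assumed key name
    high = depth.get("has_maximum_numeric_value")  # assumed key name
    if low is None or high is None:
        return False
    # Extra sanity check (an assumption, not stated in the issue):
    # the minimum should not exceed the maximum.
    return low <= high


def missing_depth_range(soil_biosamples: Iterable[dict]) -> List[str]:
    """Return the ids of soil biosamples that lack a complete depth range."""
    return [b.get("id", "<no id>") for b in soil_biosamples if not has_depth_range(b)]
```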

Compare Bioscales metaG to biosample IDs

Is your feature request related to a problem? Please describe.
This is to address consistency in the bioscales data

Describe the solution you'd like
Ability to compare Bioscales metaG metadata to the biosample IDs that Stan provided in Data Harmonizer

Describe alternatives you've considered
N/A

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the user's perspective, that answers the following:

Who will use this feature/enhancement?
When will they use it? When getting data prepared for upload to portal
How will they use it? Use it for checking data consistency and accuracy
How will they test it to make sure it's working? ?
Is the request achievable? During one sprint? ?
What is your definition of done for this request? comparison completed and inconsistencies addressed in bioscales metadata
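
A comparison like this can be done with plain set operations. The sketch below assumes both sides have already been exported as lists of ID strings (for example, one column from each spreadsheet); the function name and the result keys are hypothetical.

```python
from typing import Dict, Iterable, List


def compare_ids(metag_ids: Iterable[str], harmonizer_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Report which IDs appear on only one side and which appear on both."""
    # Strip whitespace and drop empty cells so stray spaces are not reported as mismatches.
    metag = {i.strip() for i in metag_ids if i and i.strip()}
    harmonizer = {i.strip() for i in harmonizer_ids if i and i.strip()}
    return {
        "only_in_metag": sorted(metag - harmonizer),
        "only_in_harmonizer": sorted(harmonizer - metag),
        "in_both": sorted(metag & harmonizer),
    }
```

The two "only in" lists are the inconsistencies that would need to be addressed in the Bioscales metadata.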

Determine lifecycle of nmdc minted ids and determine if they can be assigned within submission portal

Is your feature request related to a problem? Please describe.
Need a process to see relationships between shoulders and classes

Describe the solution you'd like
Generate shoulders or document relationships between shoulders and classes

Describe alternatives you've considered
N/A

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the user's perspective, that answers the following:

@mslarae13 can you fill out the following when it's time to work on this:

Who will use this feature/enhancement?
When will they use it?
How will they use it?
How will they test it to make sure it's working?
Is the request achievable? During one sprint?
What is your definition of done for this request?

Develop roadmap for NMDC website harmonization

We are currently in the very early stages of thinking about how to make the various NMDC product (data portal, EDGE) and documentation websites feel more cohesive and unified. The overarching goal is improved user experience in navigating, using, and understanding the various NMDC products. This issue is to organize the initial planning of those efforts.

Acceptance Criteria

  1. Assess current situation (what websites exist, how they are linked, how they are built and managed, etc) and come up with ideas for improvement with relevant stakeholders
  2. Based on initial assessment, create new GitHub issues for implementation tasks. It would be good to distinguish between short term tasks ("low hanging fruit") and longer term plans.

Is the request achievable? During one sprint? Yes
What is your definition of done for this request? Complete acceptance criteria items

mint NMDC `id`s of class FieldResearchSite for the trees

Mint FieldResearchSite IDs for the trees

NMDC slots
name - should be the tree name (e.g., BESC-905-CL1_32_20)
description - "Tree name. Tree name is a composite of genotype, site, and grid coordinates. Using BESC-905-CL1_32_20 as an example, the components are: 'BESC-905' is the genotype designation in the GWAS collection. 'CL1' is the replicate number in one of two plantation locations, where 'CL' is the Clatskanie, Oregon plantation and 'Co' is the Corvallis, Oregon plantation. '32_20' is the grid position in that plantation where the tree is planted, in this case row 32, column 20."
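
To illustrate the composite structure described above, here is a small parsing sketch. It assumes every tree name follows the underscore-separated form of the example (BESC-905-CL1_32_20) and that the only site codes are 'CL' and 'Co'; the function name and the returned keys are hypothetical.

```python
import re
from typing import Optional

# genotype, then site code + replicate number, then row and column of the grid position.
TREE_NAME = re.compile(
    r"(?P<genotype>.+)-(?P<site>CL|Co)(?P<replicate>\d+)_(?P<row>\d+)_(?P<column>\d+)"
)


def parse_tree_name(name: str) -> Optional[dict]:
    """Split a tree name into its components, or return None if it does not fit the pattern."""
    match = TREE_NAME.fullmatch(name)
    if match is None:
        return None
    return {
        "genotype": match.group("genotype"),
        "site": match.group("site"),
        "replicate": int(match.group("replicate")),
        "row": int(match.group("row")),
        "column": int(match.group("column")),
    }
```

Names that do not parse could be listed for manual review before any FieldResearchSite IDs are minted.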

JGI data for bioscales

Ingest JGI sequencing and analysis data

@aclum if any of the template below is relevant please help to fill out.

Is your feature request related to a problem? Please describe.
Import JGI metagenomes for Gs0154044.

Describe the solution you'd like
The sequencing project tab of https://docs.google.com/spreadsheets/d/1ijV2j7Z79qvftRT3QerLXtLdGYidBc3a/edit#gid=1063921502 lists which sequencing projects have a status of Complete and their sequencing project ids. These IDs can be used with existing scripts to query jamo directly. Raw data, filtered data, assembly, annotation, and bins should be ingested.

Describe alternatives you've considered
It is possible the JDP API is ready as an alternative to directly querying jamo. We are awaiting confirmation from Steve Wilson.

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the user's perspective, that answers the following:

Raw reads, filtered data, assembly, annotation, and binning files are available on the NMDC data portal. Sequencing where useable=false or jat records where publish=false should not be ingested. Files should include readme or info files for filtering, assembly, and annotation.
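
The exclusion rule above might be expressed as a simple filter. This sketch assumes each candidate is a pair of dicts, one for the sequencing record (with a `useable` flag) and one for the jat record (with a `publish` flag), and that only an explicit false excludes a record; the real jamo record layout may differ, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Candidate = Tuple[dict, dict]  # (sequencing record, jat record), an assumed pairing


def should_ingest(sequencing: dict, jat_record: dict) -> bool:
    """Skip anything explicitly marked useable=false or publish=false."""
    # Assumption: a missing flag does not block ingest; only an explicit False does.
    return sequencing.get("useable") is not False and jat_record.get("publish") is not False


def select_for_ingest(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep only the candidates that pass the exclusion rule."""
    return [(seq, jat) for seq, jat in candidates if should_ingest(seq, jat)]
```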

Depends on

ingest bioscales study

Proposed plan is to use the information in nmdc currently for this study to make a new study submission record, modulo using an NMDC study id. The existing study record is https://data.microbiomedata.org/details/study/gold:Gs0154044

This is more complete than the GOLD study record available via the GOLD API and will reduce manual modifications via change sheets. The current study information can be fetched with:

(curl -X 'GET' \
  'https://api.microbiomedata.org/studies/gold%3AGs0154044' \
  -H 'accept: application/json')
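
For use inside an ingest script, the same request could be made from Python with the standard library. This is a sketch that mirrors the curl call above; the helper names `study_url` and `fetch_study` are hypothetical, and the shape of the returned JSON is not assumed here.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

API_BASE = "https://api.microbiomedata.org"


def study_url(study_id: str) -> str:
    """Build the study endpoint URL, percent-encoding the colon in the CURIE."""
    return f"{API_BASE}/studies/{quote(study_id, safe='')}"


def fetch_study(study_id: str) -> dict:
    """Fetch the study record as parsed JSON (performs a network request)."""
    request = Request(study_url(study_id), headers={"accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)
```

Calling `fetch_study("gold:Gs0154044")` would request the same URL as the curl command.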

Also include MASSIVE study link

MASSIVE identifiers were added to the schema in microbiomedata/nmdc-schema#642.
The slot is massive_study_identifiers and the value should be MASSIVE:MSV000090886,
which resolves to https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=83574f41458a4b259621d5c32a4d82f9

Develop Partnership Tracking Mechanism

Current tracking of partners is distributed and inefficient. By creating a centralized place to hold this information, we will have a better sense of how to move forward with partners.

This spreadsheet will serve as a go-to resource for understanding who we have interacted with since the beginning of the NMDC.

  • Add all partners from the Partnership Drive into the spreadsheet
  • Work with leadership team to ensure all informal partners are added to the drive

Website revamp - Landing Page

This issue is focused on updating the Landing Page on the NMDC website. Steps include:

  • mock up new landing page
  • get feedback on mock ups (from NMDC team, champions, ambassadors?)
  • update landing page in test
  • update based on feedback
  • deploy to production

Sujay follow up with Donny on SPRUCE query for Bioscales and MetaG

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
Would like for Sujay to understand Donny's SPRUCE query so he can replicate it for other data such as Bioscales and metaG

Describe alternatives you've considered
N/A

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the user's perspective, that answers the following:

Who will use this feature/enhancement? Sujay
When will they use it? As needed
How will they use it? To query other data sets
How will they test it to make sure it's working? ?
Is the request achievable? During one sprint? ?
What is your definition of done for this request? For Sujay to understand how to do this query and to document this and be able to do the query for bioscales and metaG data

Identifiers mural

Is your feature request related to a problem? Please describe.
The problem is making sure everyone understands how identifiers are used throughout NMDC metadata flow

Describe the solution you'd like
A mural created to show identifiers flow and where they are needed

Describe alternatives you've considered
N/A

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the user's perspective, that answers the following:

Who will use this feature/enhancement? Entire team
When will they use it? In every meeting when identifiers come up
How will they use it? Used in meetings and other ways to make sure team is clear and understands how identifiers work for NMDC overall
How will they test it to make sure it's working? check accuracy with other team members
Is the request achievable? During one sprint? yes
What is your definition of done for this request? A complete and accurate mural

repos suggested for removal

SPRUCE missing data products

For EMSL Summer School, we discovered 3 SPRUCE samples that were on the data portal, but we couldn't find the data products from the NMDC workflows. Samples do exist on NERSC.

June2016WEW_10_10 (June2016WEW_Plot10_D2) : https://data.microbiomedata.org/details/sample/gold:Gb0153621

June2016WEW_13_40 (June2016WEW_Plot13_D5): https://data.microbiomedata.org/details/sample/gold:Gb0153630

June2016WEW_17_40 (June2016WEW_Plot17_D5): https://data.microbiomedata.org/details/sample/gold:Gb0153638

Completion Criteria

  • SPRUCE missing data is located and ingested
  • Mechanism for checking orphan data sets in the portal where we have sample metadata but no data products
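
A first pass at the orphan check could look like the sketch below. It simplifies the real linkage: it assumes each data product record is a dict with a `sample_id` key pointing directly at the biosample, whereas in practice the link probably passes through intermediate records. The function name and record layout are hypothetical.

```python
from typing import Iterable, List


def find_orphan_samples(sample_ids: Iterable[str], data_products: Iterable[dict]) -> List[str]:
    """Return sample ids that have metadata but no associated data products."""
    # Collect every sample id that at least one data product refers to.
    samples_with_products = {p["sample_id"] for p in data_products if p.get("sample_id")}
    # Preserve the input order so the report lines up with the source listing.
    return [s for s in sample_ids if s not in samples_with_products]
```

Run periodically against the portal's sample list, a check of this kind would flag cases like the three SPRUCE samples listed above.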

Schema and server code should distinguish between photos of PIs and logos of studies

The NMDC schema has several slots in which to store URLs pointing to images

  • Study.principal_investigator.PersonValue.profile_image_url
  • Study.study_image

I think that all images, whether they are study logos or PI headshots, are being stored in one single field in MongoDB, and that conflation is being propagated to the server ingest and server code.

Pictures of people should not be stored in a study logo field or vice versa. Either we should

  • eliminate one of the fields and make the other more general
  • route the URLs to the appropriate field and update the server code to use a conditional or default/fallback approach to picking a picture for study pages.

I'll provide examples from MongoDB tomorrow.
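
The second option, a fallback when picking a picture for a study page, might be sketched as follows. The record layout is an assumption: `study_image` is taken to be a list of dicts with a `url` key, and `principal_investigator` a dict with `profile_image_url`. Preferring the study logo over the PI photo is also an assumption, and the function name is hypothetical.

```python
from typing import Optional


def pick_study_page_image(study: dict) -> Optional[str]:
    """Prefer a study logo; fall back to the PI's profile photo; else None."""
    # Assumed layout: study_image is a list of {"url": ...} entries.
    for image in study.get("study_image") or []:
        url = image.get("url")
        if url:
            return url
    # Fallback: the principal investigator's headshot, kept in its own field.
    pi = study.get("principal_investigator") or {}
    return pi.get("profile_image_url") or None
```

Keeping the two sources in separate fields and deciding at display time avoids storing a headshot where a logo is expected.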
