microbiomedata / issues Goto Github PK

public repo for issues related to NMDC work

issues's Issues

Prioritize data sets for ingest

In order to prioritize work for the other squads leading up to GSP we need a list of prioritized data sets for ingest.

Partnership criteria - Apply criteria to partners

Last step from previous partnership criteria work - #13

See more details on criteria at https://docs.google.com/presentation/d/1YohF6Lm_p1nbBUVQyfgSsrrzL1-Noej_nvpxO09dUDY/edit#slide=id.g1885faa4fd2_0_10

@jkelliher-github will work with @simroux on this.

GROW confirm environmental context terms

GROW- ingest biosamples in GOLD for GOLD study Gs0149396

List and prioritize data sources appropriate for semi-automated biosample ingest pipelines

As part of the biosample ingest squad, we'd like to enumerate sources that we could build pipelines for in the semi-automated ingestion system. For each source it would be good to have a sense of:

what's the priority of ingesting the data
what are the technical details around fetching the data
what information would need to be provided by a data wrangler to start an ingest

Design validation for IDs

Is your feature request related to a problem? Please describe.
Trying to make sure identifiers are accurate and in the right slots

Describe the solution you'd like
Design a way to validate and verify that correct IDs are used in their designated alternate identifiers slots

Describe alternatives you've considered
Will be part of design

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the users perspective, that answers the following:

Who will use this feature/enhancement? Team will use to make sure identifiers are accurate
When will they use it? As needed
How will they use it? Mark can you add something here?
How will they test it to make sure it's working? Mark can you add something here?
Is the request achievable? During one sprint? Design only for one sprint
What is your definition of done for this request? That there is a design for how to do the verification and validation of identifiers in designated slots
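One possible starting point for the design is a prefix check per identifier slot. This is only a sketch: the slot-to-prefix mapping below is an illustrative assumption (the real mapping is what the design would decide), and find_misplaced_ids is a hypothetical helper name.

```python
# Hypothetical mapping from identifier slot to the prefix its values must carry.
# Both the slot names and the prefixes are assumptions for illustration.
EXPECTED_PREFIXES = {
    "gold_study_identifiers": "gold:",
    "massive_study_identifiers": "MASSIVE:",
}


def find_misplaced_ids(record):
    """Return (slot, value) pairs whose value lacks the prefix expected for that slot."""
    problems = []
    for slot, prefix in EXPECTED_PREFIXES.items():
        for value in record.get(slot, []):
            if not value.startswith(prefix):
                problems.append((slot, value))
    return problems


# Example record with a GOLD ID placed in the MASSIVE slot on purpose.
study = {
    "gold_study_identifiers": ["gold:Gs0154044"],
    "massive_study_identifiers": ["gold:Gs0154044"],
}
misplaced = find_misplaced_ids(study)
```

A prefix check only catches IDs in the wrong slot; verifying that an ID actually resolves would need a lookup against the issuing system.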

Identify ESS-DIVE data for GROW samples to incorporate into NMDC

Ambassador Playbook updates

Ambassador playbook needs to be updated, and Ambassador handbook needs updating before Ambassadors are onboarded

Summarize GROW EMSL data -NOM

Investigation and learning JAWS for NMDC use case

Describe the solution you'd like
Learning the JAWS code and how the NMDC workflows can use JAWS to run more efficiently.

Describe alternatives you've considered
Continue as is.

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the users perspective, that answers the following:

Who will use this feature/enhancement? To start Mark Flynn will be training and learning about JAWS.
When will they use it? Testing and seeing if the NMDC workflows will work effectively using JAWS.
How will they use it? To use the infrastructure created by JAWS so we don't duplicate work for NMDC.
How will they test it to make sure it's working? Running NMDC workflows
Is the request achievable? During one sprint? Learning will start this sprint, likely will continue.
What is your definition of done for this request? For Mark to document his learning and determine the best way to leverage JAWS for NMDC workflows. Future issues can cover implementation, Edge use case, etc.

JGI Plate layout format requirements

Is your feature request related to a problem? Please describe.
Currently, the plate layout required format in the submission portal is letter-number. Is it possible to add "can't be this letter-# combo" restrictions? JGI requires that the corners (A1, A12, H1, H12) be blank.

"Plate location (well #): If you have indicated that the sample will be shipped in a plate, list the well location (ie A4, B5). The corner wells must be blank. For partial plates, fill the plate by columns rather than rows. Leave blank if the sample will be shipped in a tube. For more information on submitting samples in plates, please review the “Plate-based sample requirements” document at http://jgi.doe.gov/user-program-info/pmo-overview/project-materials-submission-overview/."

Describe the solution you'd like
- Add additional plate layout formatting requirements

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the users perspective, that answers the following:

When filling out the plate layout location in the template, a corner well such as A1 is flagged as a formatting error
List the corner wells (A1, A12, H1, H12) in the "Guidance" field for the column
Make plate layout / well # optional rather than required (or allow NA), since samples can be submitted in tubes

Who will use this feature/enhancement? -JGI submitting users
When will they use it? - When submitting samples to JGI and using NMDC template
How will they use it?
How will they test it to make sure it's working? - Test by doing validation
Is the request achievable? During one sprint? - Yes
What is your definition of done for this request? - Plate layout now has more rigorous formatting requirements that better reflect JGI requirements.
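The requested rule could be sketched as below. This is a rough illustration, not the submission portal's actual validation: it assumes an 8 x 12 plate (rows A-H, columns 1-12, inferred from the corner wells named above), and validate_well is a hypothetical function name.

```python
import re
from typing import Optional

# Letter-number well locations on an assumed 8 x 12 plate (rows A-H, columns 1-12).
WELL_PATTERN = re.compile(r"^[A-H]([1-9]|1[0-2])$")
# Corner wells that JGI requires to be left blank.
CORNER_WELLS = {"A1", "A12", "H1", "H12"}


def validate_well(value: Optional[str]) -> Optional[str]:
    """Return an error message for an unacceptable well location, or None if it is fine."""
    if value is None or value.strip() == "":
        return None  # optional field: the sample may be shipped in a tube
    well = value.strip().upper()
    if not WELL_PATTERN.match(well):
        return f"{value!r} is not a letter-number well location such as A4 or B5"
    if well in CORNER_WELLS:
        return f"{well} is a corner well and must be left blank"
    return None
```

Returning a message rather than raising makes it easy to surface the text in a per-cell validation report.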

Update repo readme

Add info to the repo readme to clarify nmdc_schema from src/schema

@mslarae13 can you fill in which repo and other details?

Partnership Criteria

Brainstorm Criteria
Test criteria against partners in tracking sheet
Apply criteria to partners

GROW - Ingest GOLD study Gs0149396

We can fetch basic study information from the GOLD API:

curl -X 'GET' \
  'https://gold-ws.jgi.doe.gov/api/v1/studies?studyGoldId=Gs0149396' \
  -H 'accept: */*'
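For ingest scripts, the same request URL can be built in Python. This sketch only constructs the URL from the endpoint and query parameter shown in the curl command; it does not send the request, and any authentication the GOLD API may require is not covered here. gold_study_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl command above.
GOLD_STUDIES_ENDPOINT = "https://gold-ws.jgi.doe.gov/api/v1/studies"


def gold_study_url(study_gold_id: str) -> str:
    """Build the studies query URL for one GOLD study ID."""
    query = urlencode({"studyGoldId": study_gold_id})
    return f"{GOLD_STUDIES_ENDPOINT}?{query}"


url = gold_study_url("Gs0149396")
```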

slots to be populated via ingest scripts or change sheets post ingest

ess_dive_datasets
https://data.ess-dive.lbl.gov/view/doi:10.15485/1603775
https://data.ess-dive.lbl.gov/view/doi:10.15485/1729719

@cmungall @SamuelPurvine @emileyfadrosh are there other ESS-DIVE identifiers besides the ones I've listed?

doi
https://doi.org/10.46936/10.25585/60001289

websites
https://www.pnnl.gov/projects/WHONDRS
https://narrative.kbase.us/#org/grow
https://orcid.org/0000-0003-0434-4217
https://microbialecosystemslab.com/

funding_sources
"This study used data from the Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) under the Perturbation Response Traits project at the Pacific Northwest National Laboratory (PNNL). This research was supported by the U.S. Department of Energy (DOE) Early Career Research Program. A portion of this work was performed at the U.S. Department of Energy Environmental Molecular Science Laboratory User Facility. PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under Contract No. DE-AC05-76RL01830."
"This study used data from the Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) under the River Corridor Science Focus Area (SFA) at the Pacific Northwest National Laboratory (PNNL). This research was supported by the U.S. Department of Energy (DOE), Office of Biological and Environmental Research (BER), Environmental System Science (ESS) Program. A portion of this work was performed at the U.S. Department of Energy Environmental Molecular Science Laboratory User Facility. PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under Contract No. DE-AC05-76RL01830."
"The work (proposal: https://doi.org/10.46936/10.25585/60001289) conducted by the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy operated under Contract No. DE-AC02-05CH11231."

Outline Potential Metabolomics & Metaproteomics Groups

Gather information from NMDC Specialists (Yuri & Paul)
Enter Potential Partners into Spreadsheet https://docs.google.com/spreadsheets/d/1c5wCVEA_Efh3RdLmPQ97omn_HqTSbaW8MA2UNPdL3hw/edit#gid=0

Ensure that soil biosamples have two depth values (minimum and maximum)

depth2 is being deprecated in https://github.com/microbiomedata/nmdc-schema/blob/issue-486-data-to-7-0 (which will actually result in a new version of the schema, greater than 7.0.0)
the released depth slot already takes objects, as its values, with has_minimum_numeric_value and has_maximum_numeric_value sub-slots.
when merging the depth2 content into depth, should we just migrate the depth's has_minimum_numeric_value value into the has_minimum_numeric_value slot if necessary?
how do we determine that a biosample is a soil biosample?
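A check for the two depth values could look roughly like this. It assumes a biosample is available as a plain dict whose depth value carries the has_minimum_numeric_value and has_maximum_numeric_value sub-slots mentioned above; depth_problems is a hypothetical helper, and deciding which biosamples count as soil is left to the caller because that question is still open.

```python
def depth_problems(biosample):
    """List what is wrong with a biosample's depth value; an empty list means it passes."""
    depth = biosample.get("depth") or {}
    low = depth.get("has_minimum_numeric_value")
    high = depth.get("has_maximum_numeric_value")

    problems = []
    if low is None:
        problems.append("missing has_minimum_numeric_value")
    if high is None:
        problems.append("missing has_maximum_numeric_value")
    # Only compare the two values when both are present.
    if low is not None and high is not None and low > high:
        problems.append("minimum depth exceeds maximum depth")
    return problems
```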

Compare Bioscales metaG to biosample IDs

Is your feature request related to a problem? Please describe.
This is to address consistency in the bioscales data

Describe the solution you'd like
Ability to compare Bioscales metaG metadata to biosample IDs Stan provided in Data Harmonizer

Describe alternatives you've considered
N/A

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the users perspective, that answers the following:

Who will use this feature/enhancement?
When will they use it? When getting data prepared for upload to portal
How will they use it? Use it for checking data consistency and accuracy
How will they test it to make sure it's working? ?
Is the request achievable? During one sprint? ?
What is your definition of done for this request? comparison completed and inconsistencies addressed in bioscales metadata

GROW - Ingest JGI sequencing and analysis data records for metagenomes

GOLD study Gs0149396
There are 291 metagenomes.

Bioscales -metabolomics only samples metadata ingest

Metadata ingest for metabolomics only samples for bioscales. Use the study identifier created from #38

@ssarrafan This is planned for the sprint starting 2/13/23

Determine lifecycle of nmdc minted ids and determine if they can be assigned within submission portal

Is your feature request related to a problem? Please describe.
Need a process to see relationships between shoulders and classes

Describe the solution you'd like
Generate shoulders or document relationships between shoulders and classes

Describe alternatives you've considered
N/A

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the users perspective, that answers the following:

@mslarae13 can you fill out the following when it's time to work on this:

Who will use this feature/enhancement?
When will they use it?
How will they use it?
How will they test it to make sure it's working?
Is the request achievable? During one sprint?
What is your definition of done for this request?

Metadata ingest for biosamples in GOLD - study Gs0154044

Create NMDC Conference Pamphlet

This issue is to track the creation of the NMDC conference pamphlet. Originally captured in the issue "Create new NMDC conference pamphlet".

Create template for MOUs and other governance materials

Governance policy for workflow updates and db partnerships with User Facilities
Develop governance policy for interactions with KBase
Develop governance policy for interactions with ESS-DIVE

Add a proxy that includes the portal API and runtime API under a single URL

This is a first small step toward API unification based on retreat discussions. @shreddd I believe you said you'd take this one.

@cmungall @shreddd @dwinston @elais @mcovalt

identifier slot for img ids

Make an alternative identifiers slot for IMG IDs

@turbomam can you add more details here when the time comes to work on this please

Update JGI metadata terms on the submission portal

Assigning first to Montana to provide Mark with what needs to be updated. She will re-assign to Mark.

This is from the task list for the subport squad.

Develop roadmap for NMDC website harmonization

We are currently in the very early stages of thinking about how to make the various NMDC product (data portal, EDGE) and documentation websites feel more cohesive and unified. The overarching goal is improved user experience in navigating, using, and understanding the various NMDC products. This issue is to organize the initial planning of those efforts.

Acceptance Criteria

Assess current situation (what websites exist, how they are linked, how they are built and managed, etc) and come up with ideas for improvement with relevant stakeholders
Based on initial assessment, create new GitHub issues for implementation tasks. It would be good to distinguish between short term tasks ("low hanging fruit") and longer term plans.

Is the request achievable? During one sprint? Yes
What is your definition of done for this request? Complete acceptance criteria items

Dev/prod mongo databases

@shreddd were you going to take this one as well?

mint NMDC `id`s of class FieldResearchSite for the trees

Mint FieldResearchSite IDs for the trees

NMDC slots
name - should be the tree name (e.g., BESC-905-CL1_32_20)
description - "Tree name. Tree name is a composite of genotype, site, and grid coordinates. Using BESC-905-CL1_32_20 as an example, the components are: 'BESC-905' is the genotype designation in the GWAS collection; 'CL1' is the replicate number at one of two plantation locations, where 'CL' is the Clatskanie, Oregon plantation and 'Co' is the Corvallis, Oregon plantation; '32_20' is the grid position in that plantation where the tree is planted, in this case row 32, column 20."

JGI data for bioscales

Ingest JGI sequencing and analysis data

@aclum if any of the template below is relevant please help to fill out.

Is your feature request related to a problem? Please describe.
Import JGI metagenomes for Gs0154044.

Describe the solution you'd like
The sequencing project tab of https://docs.google.com/spreadsheets/d/1ijV2j7Z79qvftRT3QerLXtLdGYidBc3a/edit#gid=1063921502 lists which sequencing projects have a status of Complete and their sequencing project ids. These IDs can be used with existing scripts to query jamo directly. Raw data, filtered data, assembly, annotation, and bins should be ingested.

Describe alternatives you've considered
It is possible the JDP API is ready as an alternative to directly querying jamo. We are awaiting confirmation from Steve Wilson.

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the users perspective, that answers the following:

Raw reads, filtered data, assembly, annotation, and binning files are available on the NMDC data portal. Sequencing where useable=false or jat records where publish=false should not be ingested. Files should include readme or info files for filtering, assembly, and annotation.
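The exclusion rule above could be sketched as a simple filter. The useable and publish field names come from the criteria; the shape of the records (plain dicts holding boolean flags) and the should_ingest name are assumptions, since the actual jamo record layout is not described here.

```python
def should_ingest(sequencing, jat_record):
    """Return False when either record is explicitly flagged as not ingestible."""
    # A missing flag is treated as acceptable; that default is an assumption.
    if sequencing.get("useable", True) is False:
        return False
    if jat_record.get("publish", True) is False:
        return False
    return True


# Keep only the (sequencing, jat) pairs that pass both checks.
pairs = [
    ({"useable": True}, {"publish": True}),
    ({"useable": False}, {"publish": True}),
    ({"useable": True}, {"publish": False}),
]
ingestible = [pair for pair in pairs if should_ingest(*pair)]
```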

Depends on

ingest bioscales study

Proposed plan is to use the information in nmdc currently for this study to make a new study submission record, modulo using an NMDC study id. The existing study record is https://data.microbiomedata.org/details/study/gold:Gs0154044

This is more complete than the GOLD study record available via the GOLD API and will reduce manual modifications via change sheets. The current study information can be fetched with:

curl -X 'GET' \
  'https://api.microbiomedata.org/studies/gold%3AGs0154044' \
  -H 'accept: application/json'
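Note that the colon in the study ID is percent-encoded in the request path (gold:Gs0154044 becomes gold%3AGs0154044). A small sketch of building that URL in Python, using the endpoint from the curl command; nmdc_study_url is a hypothetical helper name and the request itself is not sent here.

```python
from urllib.parse import quote

# Endpoint taken from the curl command above.
NMDC_STUDIES_ENDPOINT = "https://api.microbiomedata.org/studies"


def nmdc_study_url(study_id: str) -> str:
    """Build the study URL, percent-encoding every reserved character in the ID."""
    # safe="" makes quote() encode ":" (and "/") instead of leaving them as-is.
    return f"{NMDC_STUDIES_ENDPOINT}/{quote(study_id, safe='')}"


url = nmdc_study_url("gold:Gs0154044")
```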

Also include MASSIVE study link

MASSIVE identifiers were added to the schema in microbiomedata/nmdc-schema#642
slot is massive_study_identifiers the value should be MASSIVE:MSV000090886
which resolves to https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=83574f41458a4b259621d5c32a4d82f9

Renew allocation for expanse

Develop Partnership Tracking Mechanism

Current tracking of partners is distributed and inefficient. By creating a centralized place to hold this information, we will have a better picture of how to move forward with partners.

This spreadsheet will serve as a go-to resource for understanding who we have interacted with since the beginning of the NMDC.

Add all partners for the Partnership Drive into spreadsheet
Work with leadership team to ensure all informal partners are added to the drive

GROW - Run reads based taxonomy analysis

Run reads based analysis on metagenomes

Determine/design method of 'how to enable multiple data harmonizer sheets'

@pkalita-lbl I am putting this one in progress since you're already working on it. If you already have another GH issue for this let me know.

FYI @mslarae13
From the task list for subport squad.

DOE Quarterly Report Due 1/9/23

DOE Quarterly Report Due 1/9/23
Discussed at Squad Leads meeting, will have a section on squad accomplishments
Tagged squad leads and others to update the report

https://docs.google.com/document/d/1Y4k9bHngYy1sFCrddnKaRxHSF7Yq81AhKu0O2b4ZCP4/edit#

Bioscales metabolomics

Link existing metabolomics data in MASSIVE to the NMDC study page
MASSIVE record
https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=83574f41458a4b259621d5c32a4d82f9

ingest bioscales biosamples that exist in GOLD

Website revamp - Landing Page

This issue is focused on updating the Landing Page on the NMDC website. Steps include:

mock up new landing page
get feedback on mock ups (from NMDC team, champions, ambassadors?)
update landing page in test
update based on feedback
deploy to production

Identify possible "metadata status" values

This is from the subport squad task list.

GROW - Metadata ingest for NOM only samples

This issue is to ingest biosample metadata into NMDC for samples that are not in GOLD.
Depends on

Requires defining mixs environmental triad

Sujay follow up with Donny on SPRUCE query for Bioscales and MetaG

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
Would like for Sujay to understand Donny's SPRUCE query so he can replicate for other data like Bioscales and metaG

Describe alternatives you've considered
N/A

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the users perspective, that answers the following:

Who will use this feature/enhancement? Sujay
When will they use it? As needed
How will they use it? To query other data sets
How will they test it to make sure it's working? ?
Is the request achievable? During one sprint? ?
What is your definition of done for this request? For Sujay to understand how to do this query and to document this and be able to do the query for bioscales and metaG data

Design and describe "multiple submission per study"

This is from the subport squad task list

Identifiers mural

Is your feature request related to a problem? Please describe.
The problem is making sure everyone understands how identifiers are used throughout NMDC metadata flow

Describe the solution you'd like
A mural created to show identifiers flow and where they are needed

Describe alternatives you've considered
N/A

Acceptance Criteria
Create a checklist or scenario-based acceptance criteria, from the users perspective, that answers the following:

Who will use this feature/enhancement? Entire team
When will they use it? In every meeting when identifiers come up
How will they use it? Used in meetings and other ways to make sure team is clear and understands how identifiers work for NMDC overall
How will they test it to make sure it's working? check accuracy with other team members
Is the request achievable? During one sprint? yes
What is your definition of done for this request? A complete and accurate mural

repos suggested for removal

empty https://github.com/microbiomedata/metadata_documentation
https://github.com/microbiomedata/iot_to_linkml (superseded by https://github.com/microbiomedata/sheets_and_friends and sheets-for-nmdc-submission-schema)
https://github.com/microbiomedata/setstest
https://github.com/microbiomedata/dh_testing (@turbomam and @pkalita-lbl to confirm)
https://github.com/microbiomedata/mixs (deprecated by incomplete GSC schemasheets?)
https://github.com/microbiomedata/schema_hackathon
https://github.com/microbiomedata/DataHarmonizer (No activity since February 2022. Committing directly to CIDGOH's repo, repos owned by @pkalita-lbl, or directly to nmdc-server?)
https://github.com/microbiomedata/cleanroom-schema (delete/archive and start a new nmdc-schema cookiecutter repo?)

SPRUCE missing data products

For EMSL Summer School, we discovered 3 SPRUCE samples that were on the data portal, but we couldn't find the data products from the NMDC workflows. The samples do exist on NERSC.

June2016WEW_10_10 (June2016WEW_Plot10_D2) : https://data.microbiomedata.org/details/sample/gold:Gb0153621

June2016WEW_13_40 (June2016WEW_Plot13_D5): https://data.microbiomedata.org/details/sample/gold:Gb0153630

June2016WEW_17_40 (June2016WEW_Plot17_D5): https://data.microbiomedata.org/details/sample/gold:Gb0153638

Completion Criteria

SPRUCE missing data is located and ingested
Mechanism for checking orphan data sets in the portal where we have sample metadata but no data products
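The orphan check in the second criterion amounts to a set difference between biosample IDs with metadata and biosample IDs that have at least one data product. A minimal sketch, assuming each data product record is a dict with a biosample_id key (a hypothetical field name; the real linkage between samples and data products may go through intermediate records):

```python
def find_orphan_biosamples(biosample_ids, data_products):
    """Return biosample IDs that have metadata but no associated data product."""
    with_products = {product["biosample_id"] for product in data_products}
    return sorted(set(biosample_ids) - with_products)


# Example using the three SPRUCE samples listed above plus one made-up product record.
portal_samples = ["gold:Gb0153621", "gold:Gb0153630", "gold:Gb0153638"]
products = [{"biosample_id": "gold:Gb0153630", "name": "example product"}]
orphans = find_orphan_biosamples(portal_samples, products)
```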

Ambassador Handbook updates

This issue is to track the updates for the handbook before the Ambassador 2023 launch.

@jkelliher-github @frodriguez16 please add the link to the document or break down into steps here.

Survey NMDC Staff (New & Old) for External Partners

Develop survey
Send out to NMDC Team
Ensure responses from majority of team
Enter into partnership spreadsheet

Schema and server code should distinguish between photos of PIs and logos of studies

The NMDC schema has several slots in which to store URLs pointing to images

Study.principal_investigator.PersonValue.profile_image_url
Study.study_image

I think that all images, no matter whether they are study logos or PI headshots are being stored in one single field on mongodb, and that conflation is being propagated to the server ingest and server code.

Pictures of people should not be stored in a study logo field or vice versa. Either we should

eliminate one of the fields and make the other more general
route the URLs to the appropriate field and update the server code to use a conditional or default/fallback approach to picking a picture for study pages.
I'll provide examples from mongodb tomorrow.
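The fallback option could look roughly like this on the server side. It is a sketch under assumptions: the study is a plain dict, study_image is taken to be a list of objects with a url key, and pick_study_picture is a hypothetical name; the actual schema shapes and server code may differ.

```python
from typing import Optional


def pick_study_picture(study: dict) -> Optional[str]:
    """Prefer a study logo; fall back to the PI's profile image; else return None."""
    # Assumed shape: study_image is a list of {"url": ...} objects.
    for image in study.get("study_image") or []:
        if image.get("url"):
            return image["url"]
    # Fallback: the PI headshot stored on the principal investigator value.
    pi = study.get("principal_investigator") or {}
    return pi.get("profile_image_url")
```

Keeping the two sources in separate fields and choosing between them at display time avoids storing a headshot where a logo is expected.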

Update roles/responsibilities - RACI for renewal

The roles and responsibilities matrix should be updated to reflect the renewal work.

For the pilot we used a large RACI; for the renewal we'll

incorporate RACIs into each squad using the squad template and
do a high-level RACI for the main areas of the renewal
