Comments (11)

tsibley avatar tsibley commented on July 1, 2024 1

I think it would be smart to keep the full UUID linked one way or another. It is an identifier equivalent in utility to the GenBank accession.

from augur-build.

tsibley avatar tsibley commented on July 1, 2024

> This means that sample UUID fe1a1206-21ef-45ff-8be0-9d7643eef879 would be strain A/Washington/SFS-43eef879/2019

I would really prefer to keep the entire UUID in the strain name. The whole reason for using UUIDs in the first place is that they are universally unique, a property that we lose if we truncate them. If we don't use the full UUID, then we've lost all its benefits and shouldn't have used UUIDs from the start.

I would also caution against using opaque acronyms like SFS, since they're meaningless outside of the study. Can we use something like

A/Washington/seattleflu.org/fe1a1206-21ef-45ff-8be0-9d7643eef879/2019

instead?

trvrb avatar trvrb commented on July 1, 2024

@tsibley, I'm afraid I don't agree. We should aim to be as consistent as possible with how the entire flu field treats strain names. It will be super weird if there are canonical names like B/Washington/2/2019 while we name things like B/Washington/seattleflu.org/fe1a1206-21ef-45ff-8be0-9d7643eef879/2019. It's far outside standard naming.

The strain name itself is meant to be unique, but short enough to be usable. Even A/Singapore/Infimh-16-0019/2016 was quite unwieldy. Keep in mind that each strain is tied to a unique accession provisioned by GenBank or by GISAID that gives detailed provenance information. Strain names are meant to:

  1. Provide broad virus information, i.e., A vs. B
  2. Provide broad geo information, i.e., Washington
  3. Provide a short disambiguation string (traditionally 1, 2, 3)
  4. Provide broad time information, i.e., 2019

(Field order is important too. Extra slashes are non-standard and would break parsing.)

I might even say to just name this as A/Washington/43eef879/2019. There is no way that the 8-digit hex will conflict with the CDC's 1, 2, 3 naming. (The SFS- was there for additional disambiguation, not for provenance.)
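To make the four-field convention concrete, here is a minimal Python sketch of how a short strain name could be derived from a sample UUID and how a strict four-field parser would treat the longer proposal. The function names, the `prefix` parameter, and the regular expression are assumptions for illustration only and are not code from augur-build. The pattern accepts only types A and B because those are the only ones mentioned in this thread.

```python
import re
import uuid


def strain_name(virus_type, location, sample_uuid, year, prefix=""):
    """Build a four-field strain name using the last 8 hex digits of a UUID.

    Hypothetical helper: `prefix` allows the optional "SFS-" disambiguator.
    """
    hex_digits = uuid.UUID(sample_uuid).hex  # 32 lowercase hex digits, no dashes
    short_id = prefix + hex_digits[-8:]
    return f"{virus_type}/{location}/{short_id}/{year}"


# Assumed pattern: exactly four slash-separated fields (type/location/id/year).
STRAIN_PATTERN = re.compile(
    r"^(?P<type>[AB])/(?P<location>[^/]+)/(?P<id>[^/]+)/(?P<year>\d{4})$"
)


def parse_strain(name):
    """Return the four fields as a dict, or None if the name does not fit."""
    match = STRAIN_PATTERN.match(name)
    return match.groupdict() if match else None


sample = "fe1a1206-21ef-45ff-8be0-9d7643eef879"
print(strain_name("A", "Washington", sample, 2019))
print(strain_name("A", "Washington", sample, 2019, prefix="SFS-"))
# The five-field proposal does not fit a four-field parser:
print(parse_strain(f"A/Washington/seattleflu.org/{sample}/2019"))
```

Under these assumptions the extra slash in the longer name is what makes the parse fail, which is the parsing concern raised above.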

tsibley avatar tsibley commented on July 1, 2024

> …but short enough to be usable. Even A/Singapore/Infimh-16-0019/2016 was quite unwieldy.

Ok! It seems like I don't understand how these names are used in practice, if that's considered unwieldy. (It doesn't, from my naive, outside perspective, seem unwieldy to me.)

Are these names regularly spoken, as opposed to copied/programmatically processed?

trvrb avatar trvrb commented on July 1, 2024

Yes. Regularly spoken aloud and used to point people around a tree or around a titer table.

If you'd like to keep the UUID, we can provide it as a "sample ID" in the flat-file data download, paired with the strain name.
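A rough sketch of what such a pairing could look like, assuming a two-column TSV. The column names `strain` and `sample_id`, the helper names, and the default field values are hypothetical and not taken from the actual download. The sketch also raises an error if two different UUIDs truncate to the same strain name, which is the uniqueness concern raised earlier in the thread.

```python
import csv
import io
import uuid


def build_id_table(sample_uuids, virus_type="A", location="Washington", year=2019):
    """Map each short strain name to its full UUID, refusing collisions."""
    rows = {}
    for sample_uuid in sample_uuids:
        short_id = uuid.UUID(sample_uuid).hex[-8:]
        strain = f"{virus_type}/{location}/{short_id}/{year}"
        if strain in rows and rows[strain] != sample_uuid:
            raise ValueError(f"{strain} is shared by two different UUIDs")
        rows[strain] = sample_uuid
    return rows


def to_tsv(rows):
    """Serialize the strain-to-UUID mapping as a tab-separated flat file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["strain", "sample_id"])  # hypothetical column names
    writer.writerows(rows.items())
    return buffer.getvalue()


table = build_id_table(["fe1a1206-21ef-45ff-8be0-9d7643eef879"])
print(to_tsv(table))
```

Keeping the full UUID in a side column like this preserves the link to the sample without lengthening the strain name.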

joverlee521 avatar joverlee521 commented on July 1, 2024

Add a more general identifier for each genome

joverlee521 avatar joverlee521 commented on July 1, 2024

Update format for date in shipping.augur-build-metadata

trvrb avatar trvrb commented on July 1, 2024

One additional request here: just using age_category, e.g., adult vs. child, is too coarse for analysis. I'd like to additionally have age_range_coarse, e.g., ["5 years","18 years"). I think the coarse age range will be the right resolution for the genomic work, and we won't be able to use the fine age range.

I've added this as request number 4 above.
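The `["5 years","18 years")` notation reads as a half-open interval: 5 is included and 18 is excluded. A minimal sketch of binning an age into such ranges follows. Only the 5-to-18 bin comes from this thread. The other bin edges and the function name are placeholders, not the actual definitions used in the shipping view.

```python
from bisect import bisect_right

# Hypothetical bin edges in years; only the [5, 18) bin appears in the thread.
COARSE_EDGES = [0, 5, 18, 65]


def age_range_coarse(age_years):
    """Return (lower, upper) for the half-open bin [lower, upper) holding the age.

    `upper` is None for the open-ended last bin.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    index = bisect_right(COARSE_EDGES, age_years) - 1
    lower = COARSE_EDGES[index]
    upper = COARSE_EDGES[index + 1] if index + 1 < len(COARSE_EDGES) else None
    return (lower, upper)


print(age_range_coarse(5))   # lower bound is inclusive
print(age_range_coarse(18))  # upper bound is exclusive, so 18 moves to the next bin
```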

trvrb avatar trvrb commented on July 1, 2024

Yet one more request. Can we restrict rows in shipping.augur-build-metadata to only those samples that have sequencing data? There are two reasons for this:

  1. We want to protect data privacy in these shipping views, so rather than downloading a dataset of ~20k rows with all encounters, it's safer to download a dataset of ~2k rows with just encounters that were sequenced.
  2. Dealing with the extra large metadata table is somewhat unwieldy given how scripts like select_strains.py are written.

I've added this as request number 5 above.
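For illustration, the intended row set can be sketched as a filter over metadata rows. The `sample` key and the set of sequenced sample IDs are assumptions, since the real column names are not given here. Note that a client-side filter like this only addresses the second reason. The privacy benefit requires the restriction to happen upstream in the view itself, so that unsequenced encounters are never downloaded.

```python
def restrict_to_sequenced(metadata_rows, sequenced_samples, key="sample"):
    """Keep only rows whose sample identifier has sequencing data.

    `metadata_rows` is a list of dicts; `key` is a hypothetical column name.
    """
    return [row for row in metadata_rows if row.get(key) in sequenced_samples]


rows = [
    {"sample": "s1", "date": "2019-01-02"},
    {"sample": "s2", "date": "2019-01-03"},
    {"sample": "s3", "date": "2019-01-04"},
]
print(restrict_to_sequenced(rows, {"s1", "s3"}))
```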

kairstenfay avatar kairstenfay commented on July 1, 2024

> Yet one more request. Can we restrict rows in shipping.augur-build-metadata to only those samples that have sequencing data?

@trvrb do you still only want the new shipping.metadata_for_augur_build to include samples with sequencing data? If so, is there a separate desire for a view similar to what Mike requested that contains all samples regardless of encounter or sequence data?

kairstenfay avatar kairstenfay commented on July 1, 2024

> There are a small handful of upstream fixes we need to shipping views.
>
> 1. The `date` field in `v2/shipping/augur-build-metadata` was formatted as `2019-09-25T19:37:35.483+00:00`. This should just read `2019-09-25`. I've fixed this on the augur side here: https://github.com/seattleflu/augur-build/blob/master/scripts/download_sfs_metadata.py#L25 for the time being.

This is now fixed on master.
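For readers who still receive the old timestamp format, the conversion can be sketched with the Python standard library (3.7 or later). This is an illustrative helper with a made-up name, not the code in the linked script. It keeps the calendar date as written in the timestamp and does not convert between time zones.

```python
from datetime import datetime


def to_date_string(timestamp):
    """Reduce an ISO 8601 timestamp to its YYYY-MM-DD date part."""
    return datetime.fromisoformat(timestamp).date().isoformat()


print(to_date_string("2019-09-25T19:37:35.483+00:00"))  # 2019-09-25
```

A value that is already a plain date, such as `2019-09-25`, passes through unchanged.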

> 4. Include `age_range_coarse` as a field in the shipping view.

This column is now present on master.
