Comments (11)
I think it would be smart to keep the full UUID linked one way or another. It is an identifier equivalent in utility to the GenBank accession.
from augur-build.
This means that sample UUID `fe1a1206-21ef-45ff-8be0-9d7643eef879` would be strain `A/Washington/SFS-43eef879/2019`.
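The mapping in that example appears to take the last eight hex digits of the UUID and prefix them with `SFS-`. A minimal sketch of that rule follows; the function name `strain_name` and its parameters are hypothetical, and the rule itself is inferred from the single example above rather than taken from the augur-build code.

```python
import uuid


def strain_name(sample_uuid: str, subtype: str = "A",
                location: str = "Washington", year: int = 2019,
                prefix: str = "SFS-") -> str:
    """Build a strain name from a sample UUID (hypothetical helper).

    Assumption: the short identifier is the last 8 hex digits of the UUID,
    as in the example in this thread.
    """
    # uuid.UUID validates the input and str() normalizes it to lowercase
    # hyphenated form.
    canonical = str(uuid.UUID(sample_uuid))
    short_id = canonical[-8:]
    return f"{subtype}/{location}/{prefix}{short_id}/{year}"


print(strain_name("fe1a1206-21ef-45ff-8be0-9d7643eef879"))
# A/Washington/SFS-43eef879/2019
```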
I would really prefer to keep the entire UUID in the strain name. The whole reason for using UUIDs in the first place is that they are universally unique, a property that we lose if we truncate them. If we don't use the full UUID, then we've lost all its benefits and shouldn't have used UUIDs from the start.
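To put a rough number on what truncation costs, the birthday approximation gives the chance that at least two samples share the same 8-hex-digit suffix. This is only a sketch: it assumes the truncated digits are uniformly random (as they would be for version 4 UUIDs), and the sample count of 2,000 is borrowed from the approximate figure for sequenced encounters mentioned later in this thread.

```python
import math


def collision_probability(n_samples: int, hex_digits: int = 8) -> float:
    """Approximate probability that any two of n_samples share a suffix.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2N)),
    where N = 16**hex_digits is the number of possible suffixes.
    """
    space = 16 ** hex_digits
    exponent = -n_samples * (n_samples - 1) / (2 * space)
    # -expm1(x) computes 1 - exp(x) accurately for small x.
    return -math.expm1(exponent)


print(f"{collision_probability(2000):.2e}")
```

Under these assumptions the chance of any collision among 2,000 samples is on the order of 0.05%: small, but not zero, and it grows roughly with the square of the sample count.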
I would also caution against using opaque acronyms like `SFS`, since they're meaningless outside of the study. Can we use something like `A/Washington/seattleflu.org/fe1a1206-21ef-45ff-8be0-9d7643eef879/2019` instead?
from augur-build.
@tsibley, I'm afraid I don't agree. We should aim to be as consistent as possible with how the entire flu field treats strain names. It will be super weird if there are canonical names like `B/Washington/2/2019` while we name things like `B/Washington/seattleflu.org/fe1a1206-21ef-45ff-8be0-9d7643eef879/2019`. It's far outside standard naming.
The strain name itself is meant to be unique, but short enough to be usable. Even `A/Singapore/Infimh-16-0019/2016` was quite unwieldy. Keep in mind that each strain is tied to a unique accession provisioned by GenBank or by GISAID that gives detailed provenance information. Strain names are meant to:
- Provide broad virus information, i.e. `A` vs `B`
- Provide broad geo information, i.e. `Washington`
- Provide a short disambiguation string (traditionally `1`, `2`, `3`)
- Provide broad time information, i.e. `2019`

(Field order is important too; extra slashes are non-standard and would break parsing.)
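The parsing concern can be illustrated with a small sketch. It assumes the four-field form used in this thread (type, location, identifier, year); `StrainName` and `parse_strain_name` are hypothetical names, not functions from augur, and real-world strain names can carry additional fields that this sketch does not handle.

```python
from typing import NamedTuple


class StrainName(NamedTuple):
    virus_type: str   # e.g. "A" or "B"
    location: str     # e.g. "Washington"
    identifier: str   # short disambiguation string, e.g. "2"
    year: int         # e.g. 2019


def parse_strain_name(name: str) -> StrainName:
    """Split a strain name into its four slash-separated fields."""
    fields = name.split("/")
    if len(fields) != 4:
        raise ValueError(
            f"expected 4 slash-separated fields, got {len(fields)}: {name!r}"
        )
    virus_type, location, identifier, year = fields
    return StrainName(virus_type, location, identifier, int(year))


print(parse_strain_name("B/Washington/2/2019"))

try:
    parse_strain_name(
        "B/Washington/seattleflu.org/fe1a1206-21ef-45ff-8be0-9d7643eef879/2019"
    )
except ValueError as err:
    # The extra "seattleflu.org" segment yields five fields instead of four.
    print(err)
```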
I might even say to just name this as `A/Washington/43eef879/2019`. There is no way that the 8-digit hex will conflict with the CDC's `1`, `2`, `3` naming. (The `SFS-` was there for additional disambiguation, not for provenance.)
from augur-build.
> …but short enough to be usable. Even `A/Singapore/Infimh-16-0019/2016` was quite unwieldy.
Ok! It seems like I don't understand how these names are used in practice, if that's considered unwieldy. (It doesn't, from my naive, outside perspective, seem unwieldy to me.)
Are these names regularly spoken, as opposed to copied/programmatically processed?
from augur-build.
Yes. Regularly spoken aloud and used to point people around a tree or around a titer table.
If you'd like to keep the UUID, we can provide it as a "sample ID" in the flat file data download, paired with the strain name.
from augur-build.
Add a more general identifier for each genome
from augur-build.
Update format for `date` in `shipping.augur-build-metadata`
from augur-build.
One additional request here: just using `age_category`, e.g. `adult` vs `child`, is too coarse for analysis. I'd like to additionally have `age_range_coarse`, e.g. `["5 years","18 years")`. I think the coarse age range will be the right resolution for the genomic work, and we won't be able to use the fine age range.
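The `["5 years","18 years")` notation is a half-open interval: 5 is included and 18 is not. The sketch below shows how an age could be mapped to such a bin. Only the `[5, 18)` bin appears in this thread; the other bin edges, the function name, and the rendering of the open-ended top bin are assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical lower bin edges in years. Only the [5, 18) bin is taken
# from the thread; 0 and 65 are placeholders for illustration.
EDGES = [0, 5, 18, 65]


def age_range_coarse(age_years: float) -> str:
    """Return the half-open bin containing age_years as a range string."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    # bisect_right places an age equal to an edge into the bin that
    # starts at that edge, which matches the "[lower, upper)" convention.
    i = bisect_right(EDGES, age_years) - 1
    lower = EDGES[i]
    if i + 1 < len(EDGES):
        return f'["{lower} years","{EDGES[i + 1]} years")'
    # Assumed rendering for the unbounded top bin.
    return f'["{lower} years",)'


print(age_range_coarse(5))   # ["5 years","18 years")
print(age_range_coarse(18))  # ["18 years","65 years")
```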
I've added this as request number 4 above.
from augur-build.
Yet one more request. Can we restrict rows in `shipping.augur-build-metadata` to only those samples that have sequencing data? There are two reasons for this:
- We want to protect data privacy in these shipping views, so rather than downloading a dataset of ~20k rows with all encounters, it's safer to download a dataset of ~2k rows with just encounters that were sequenced.
- Dealing with the extra large metadata table is somewhat unwieldy given how scripts like `select_strains.py` are written.
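For illustration, the requested restriction could be approximated on the client side as sketched below. This addresses only the second reason (table size); the privacy concern is only met if the view itself is restricted upstream. The column name `sample`, the tab-delimited layout, and the function name are assumptions, not the actual schema of the shipping view.

```python
import csv
import io
from typing import Dict, List, Set


def keep_sequenced(metadata_tsv: str, sequenced_ids: Set[str],
                   id_column: str = "sample") -> List[Dict[str, str]]:
    """Keep only metadata rows whose ID appears in sequenced_ids.

    The id_column name is hypothetical; adjust it to the real schema.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    return [row for row in reader if row[id_column] in sequenced_ids]


# Toy data with made-up identifiers.
metadata = "sample\tage_category\ns1\tadult\ns2\tchild\ns3\tadult\n"
rows = keep_sequenced(metadata, {"s1", "s3"})
print([row["sample"] for row in rows])  # ['s1', 's3']
```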
I've added this as request number 5 above.
from augur-build.
> Yet one more request. Can we restrict rows in `shipping.augur-build-metadata` to only those samples that have sequencing data?

@trvrb do you still only want the new `shipping.metadata_for_augur_build` to include samples with sequencing data? If so, is there a separate desire for a view similar to what Mike requested that contains all samples regardless of encounter or sequence data?
from augur-build.
There are a small handful of upstream fixes we need to shipping views.
1. The `date` field in `v2/shipping/augur-build-metadata` was formatted as `2019-09-25T19:37:35.483+00:00`. This should just read `2019-09-25`. I've fixed this on the augur side here: https://github.com/seattleflu/augur-build/blob/master/scripts/download_sfs_metadata.py#L25 for the time being.
This is now fixed on master.
4. Include `age_range_coarse` as a field in the shipping view.
This column is now present on master.
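The date fix described in item 1 amounts to keeping only the calendar date of an ISO 8601 timestamp. A minimal sketch is shown below; it is not necessarily what the linked `download_sfs_metadata.py` does, and the function name `date_only` is hypothetical. The date is taken in the timestamp's own UTC offset, with no timezone conversion.

```python
from datetime import datetime


def date_only(timestamp: str) -> str:
    """Reduce an ISO 8601 timestamp to its YYYY-MM-DD date."""
    # fromisoformat accepts the millisecond and "+00:00" offset form
    # shown in the issue (Python 3.7 or later).
    return datetime.fromisoformat(timestamp).date().isoformat()


print(date_only("2019-09-25T19:37:35.483+00:00"))  # 2019-09-25
```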
from augur-build.