(SARS-CoV-2 Illustration image credit: Davian Ho for the Innovative Genomics Institute)
Global SARS-CoV-2 sequencing efforts have produced a massive public genomic dataset that supports a wide variety of analyses. However, the two most common resources are genome assemblies (e.g. those deposited in GISAID and GenBank) and raw sequencing reads, and both limit the information that is readily available, especially about variants within SARS-CoV-2 populations. Genome assemblies contain only consensus-level information, which does not reflect the full genomic diversity within a given sample (even a single patient-derived sample represents a viral population within the host). Raw sequencing reads, on the other hand, require further analysis to extract variant information and are often prohibitively large.
Thus, we propose cov2db, a database resource for collecting low-frequency variant information for available SARS-CoV-2 data (currently there are more than 1.2 million SARS-CoV-2 sequencing datasets in SRA and ENA). Our goal is to provide an easy-to-use query system and to contribute to a database of VCF files that contain variant calls for SARS-CoV-2 samples. We hope that such an interactive database will speed up downstream analyses and encourage collaboration.
cov2db supports queries based on the following fields.
Annotation:
- Reference amino acid
- Variant amino acid
- Gene name
- Mutation type (missense, synonymous, upstream, etc.)
Variant call information:
- Position
- Allele frequency
- Reference allele
- Alternative allele
- Coverage depth
- Strand bias
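To illustrate where the variant call fields above come from, here is a minimal Python sketch that flattens one VCF data line into a record. It assumes a single ALT allele per line and that the variant caller writes `AF`, `DP`, and `SB` keys in the INFO column; these keys are common but caller-dependent. The function name, the output key names, and the numeric values in the example line are made up for illustration and do not describe the actual cov2db schema.

```python
def parse_vcf_line(line):
    """Turn one tab-separated VCF data line into a flat dict of variant call fields."""
    # The first eight VCF columns are fixed: CHROM POS ID REF ALT QUAL FILTER INFO.
    chrom, pos, _id, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs (flags carry no '=').
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value

    return {
        "chrom": chrom,
        "pos": int(pos),                      # Position
        "ref": ref,                           # Reference allele
        "alt": alt,                           # Alternative allele
        # The INFO keys below are assumptions; a missing key becomes None.
        "af": float(info_map["AF"]) if "AF" in info_map else None,  # Allele frequency
        "dp": int(info_map["DP"]) if "DP" in info_map else None,    # Coverage depth
        "sb": float(info_map["SB"]) if "SB" in info_map else None,  # Strand bias
    }


# Illustrative line only; the INFO values are invented.
example = "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\tDP=1200;AF=0.034;SB=3"
record = parse_vcf_line(example)
print(record)
```

Records shaped like this map directly onto documents in a document store, which is one reason a flat key-per-field layout is convenient for querying.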
Sample metadata: [in development]
- Sequencing device
- Library layout
- Submission date
- Study accession
- Variant caller
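As a rough idea of what a query over these fields could look like, the following Python sketch builds a MongoDB-style filter document from a few of them. The field names (`gene`, `mutation_type`, `af`, `dp`) and the helper `build_variant_filter` are hypothetical and are not taken from the cov2db schema; only the `$gte` and `$lte` comparison operators are standard MongoDB query syntax.

```python
def build_variant_filter(gene=None, mutation_type=None,
                         min_af=None, max_af=None, min_depth=None):
    """Build a MongoDB-style filter dict; field names are illustrative assumptions."""
    query = {}
    if gene is not None:
        query["gene"] = gene
    if mutation_type is not None:
        query["mutation_type"] = mutation_type

    # Allele frequency bounds are combined into a single range condition.
    af_range = {}
    if min_af is not None:
        af_range["$gte"] = min_af
    if max_af is not None:
        af_range["$lte"] = max_af
    if af_range:
        query["af"] = af_range

    if min_depth is not None:
        query["dp"] = {"$gte": min_depth}
    return query


# Low-frequency missense variants in the S gene with reasonable coverage.
low_freq_spike = build_variant_filter(
    gene="S", mutation_type="missense", max_af=0.05, min_depth=100
)
print(low_freq_spike)

# With a pymongo collection (hypothetical name `variants`), the filter
# would be passed to find():
#   for doc in variants.find(low_freq_spike):
#       ...
```

Keeping the filter construction separate from the database call makes it possible to inspect or test a query without a running MongoDB instance.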
[FILL IN WITH SAMPLE QUERY + SCREENSHOTS]
VAPr is an excellent MongoDB-based database for storing variant information. The UCSC SARS-CoV-2 genome browser also provides visualization of intrahost variants.
- Daniel Agustinho, Washington University (data acquisition, writer)
- Li Chuin Chong, Twincore GmbH/HZI-DKFZ under auspices MHH (Sysadmin, mongodb)
- Maria Jose, Pondicherry Central University (data acquisition, mongodb)
- BaiWei Lo, University of Konstanz (data acquisition, QC)
- Ramanandan Prabhakaran, Roche Canada (Sysadmin, mongodb)
- Sophie Poon (data acquisition, QC)
- Suresh Kumar (QC)
- Nick Sapoval, Rice University (Team co-lead, data acquisition, writer)
- Todd Treangen (Team Lead)