cov2db: a low frequency variant DB for SARS-CoV-2

[cov2db logo] (SARS-CoV-2 illustration image credit: Davian Ho for the Innovative Genomics Institute)


Problem

Global SARS-CoV-2 sequencing efforts have resulted in a massive genomic dataset available to the public for a variety of analyses. However, the two most common resources are genome assemblies (e.g. deposited in GISAID and GenBank) and raw sequencing reads. Both limit the information that is readily available, especially with respect to variants found within SARS-CoV-2 populations. Genome assemblies contain only consensus-level information, which does not reflect the full genomic diversity within a given sample (even a single patient-derived sample represents a viral population within the host). Raw sequencing reads, on the other hand, require further analysis to extract variant information and can often be prohibitively large.

Thus, we propose cov2db, a database resource for collecting low frequency variant information for available SARS-CoV-2 data (currently there are more than 1.2 million SARS-CoV-2 sequencing datasets in SRA and ENA). Our goal is to provide an easy-to-use query system and to contribute to a database of VCF files that contain variant calls for SARS-CoV-2 samples. We hope that such an interactive database will speed up downstream analyses and encourage collaboration.

Features

cov2db supports queries based on the following fields.

Annotation:

  • Reference amino acid
  • Variant amino acid
  • Gene name
  • Mutation type (missense, synonymous, upstream, etc.)

Variant call information:

  • Position
  • Allele frequency
  • Reference allele
  • Alternative allele
  • Coverage depth
  • Strand bias
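
As an illustration of where these variant call fields sit in a VCF record, here is a minimal Python sketch. It assumes the variant caller writes allele frequency, coverage depth, and strand bias under the INFO keys AF, DP, and SB (these are reserved keys in the VCF specification, but whether cov2db's files use them is not confirmed here), and that each record has a single alternative allele. The function name, the output key names, and the values in the example line are made up for illustration.

```python
from typing import Any, Dict


def parse_vcf_record(line: str) -> Dict[str, Any]:
    """Extract the variant call fields from one tab-separated VCF data line."""
    # The first eight VCF columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
    chrom, pos, _id, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs (or bare flags).
    info_map: Dict[str, str] = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        info_map[key] = value

    return {
        "chrom": chrom,
        "position": int(pos),
        "ref": ref,
        "alt": alt,
        # Assumption: AF, DP and SB are present and single-valued.
        "allele_frequency": float(info_map["AF"]) if "AF" in info_map else None,
        "depth": int(info_map["DP"]) if "DP" in info_map else None,
        "strand_bias": info_map.get("SB"),
    }


# Made-up example line against the SARS-CoV-2 reference accession NC_045512.2.
example = "NC_045512.2\t23403\t.\tA\tG\t100\tPASS\tDP=1500;AF=0.042;SB=3"
record = parse_vcf_record(example)
print(record)
```

Multi-allelic records, where AF holds a comma-separated list, would need extra handling that this sketch leaves out.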

Sample metadata: [in development]

  • Sequencing device
  • Library layout
  • Submission date
  • Study accession
  • Variant caller
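
To give a rough idea of how a query over these fields could be expressed, here is a minimal Python sketch that builds a MongoDB-style filter document. It assumes a MongoDB-style backend (the team roles below mention mongodb, but the actual schema is not described here). The function name and the field names `gene`, `mutation_type`, `allele_frequency`, and `depth` are hypothetical and would need to be replaced with the names used in the real collection.

```python
from typing import Any, Dict, Optional


def build_variant_filter(
    gene: Optional[str] = None,
    mutation_type: Optional[str] = None,
    min_af: Optional[float] = None,
    max_af: Optional[float] = None,
    min_depth: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a MongoDB-style filter from optional search criteria."""
    query: Dict[str, Any] = {}

    # Exact-match criteria (hypothetical field names).
    if gene is not None:
        query["gene"] = gene
    if mutation_type is not None:
        query["mutation_type"] = mutation_type

    # Range criteria use the MongoDB comparison operators $gte and $lte.
    af_range: Dict[str, float] = {}
    if min_af is not None:
        af_range["$gte"] = min_af
    if max_af is not None:
        af_range["$lte"] = max_af
    if af_range:
        query["allele_frequency"] = af_range

    if min_depth is not None:
        query["depth"] = {"$gte": min_depth}

    return query


# Example: low frequency missense variants in the S gene with adequate coverage.
low_freq_spike = build_variant_filter(
    gene="S", mutation_type="missense", max_af=0.05, min_depth=100
)
print(low_freq_spike)
```

With a pymongo collection object, a filter like this could be passed to `collection.find(...)`; the thresholds above are arbitrary illustration values, not recommended cutoffs.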

Example queries

[FILL IN WITH SAMPLE QUERY + SCREENSHOTS]

Methods

[Workflow figure: covid_freq-Group6]

Related work

VAPr is an excellent MongoDB-based database for storing variant information. The UCSC SARS-CoV-2 genome browser also provides visualization of intrahost variants.


Team members

  • Daniel Agustinho, Washington University (data acquisition, writer)
  • Li Chuin Chong, Twincore GmbH/HZI-DKFZ under auspices MHH (sysadmin, mongodb)
  • Maria Jose, Pondicherry Central University (data acquisition, mongodb)
  • BaiWei Lo, University of Konstanz (data acquisition, QC)
  • Ramanandan Prabhakaran, Roche Canada (sysadmin, mongodb)
  • Sophie Poon (data acquisition, QC)
  • Suresh Kumar (QC)
  • Nick Sapoval, Rice University (team co-lead, data acquisition, writer)
  • Todd Treangen (team lead)

