rabies's Introduction

Nextstrain repository for rabies virus

This repository contains two workflows for the analysis of rabies virus data:

  • ingest/ - Download data from GenBank, clean and curate it, and upload it to S3
  • phylogenetic/ - Filter sequences, align them, construct a phylogeny, and export it for visualization

Each folder contains a README.md with more information. The results of running both workflows are publicly visible at nextstrain.org/rabies.

Installation

Follow the standard installation instructions for Nextstrain's suite of software tools.

Quickstart

Run the default phylogenetic workflow via:

cd phylogenetic/
nextstrain build .
nextstrain view .

Documentation

rabies's People

Contributors

kimandrews, joverlee521

rabies's Issues

ingest fails

Blocked on #1


First reported in Nextstrain office hours

The ingest workflow on the default branch fails during the curate rule.
The log at ingest/logs/curate.txt contains a number of warnings, but the actual error is:

ERROR: Records do not have the same fields! Please check your input data has the same fields.
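One way to narrow down an error like this is to look for fields that appear in some records but not in others. The sketch below assumes the records passed to the curate step are newline-delimited JSON; the function names and the sample records are hypothetical and are not part of the workflow.

```python
import json
from collections import Counter


def field_set_counts(ndjson_lines):
    """Count how many records share each distinct set of field names."""
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[frozenset(record)] += 1
    return counts


def inconsistent_fields(ndjson_lines):
    """Return the fields that are present in some records but not in all."""
    counts = field_set_counts(ndjson_lines)
    if not counts:
        return set()
    all_fields = set().union(*counts)
    common_fields = set.intersection(*(set(fields) for fields in counts))
    return all_fields - common_fields


# Hypothetical sample records, for illustration only.
sample = [
    '{"accession": "A1", "strain": "x", "date": "2020"}',
    '{"accession": "A2", "strain": "y"}',
]
print(sorted(inconsistent_fields(sample)))
```

Running the same check on the real input to the curate rule would list the fields that differ between records, which is a reasonable starting point for finding the offending records.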

phylogenetic: Restore CI config

The phylogenetic CI config.yaml was removed as part of #6 because the workflow is still WIP and doesn't fully run yet.

Once the phylogenetic workflow is in a functional state, we should restore the CI config with example data.

Mixed up division/location fields for US

Noticed during 1:1 w/ @kimandrews that USA records have a mix of states and counties in the division and location fields.

Small subset of examples:

accession  accession_version  strain  date  region         country  division       location
AF394868   AF394868.1         1265          North America  USA      Monterey       California
AF394869   AF394869.1         2253          North America  USA      California
AF394870   AF394870.1         3044          North America  USA      Arizona
AF394871   AF394871.1         1566          North America  USA      Plumas County  California
AF394872   AF394872.1         2847          North America  USA      Lewis County   Washington

Looking at GenBank record for one of the examples (AF394868), the geo_loc_name is "USA: Monterey, California".

This does not follow the pattern for GenBank's geo_loc_name (<country_value>[:<region>][, <locality>]) that we expect in augur curate parse-genbank-location.
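To illustrate why these records end up with swapped fields, here is a minimal sketch of splitting a geo_loc_name value according to the expected pattern. It is a simplified stand-in written for this example, not the actual implementation in augur curate parse-genbank-location, and the function name is hypothetical.

```python
def parse_geo_loc_name(geo_loc_name):
    """Split a value of the form <country>[:<region>][, <locality>].

    Simplified illustration only: the text after the colon and before the
    first comma is treated as the division, the remainder as the location.
    """
    country, _, rest = geo_loc_name.partition(":")
    division, _, location = rest.partition(",")
    return {
        "country": country.strip(),
        "division": division.strip(),
        "location": location.strip(),
    }


# The record from this issue lists the locality before the state,
# so the state lands in the location field.
print(parse_geo_loc_name("USA: Monterey, California"))
```

Under this pattern the text before the comma is always read as the division, so a value that lists the county or city first puts the state in the location field.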

Possible solutions

  1. Add geolocation rules to correct these records for rabies
  2. Programmatically catch these mix-ups in augur curate parse-genbank-location (nextstrain/augur#1578)
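As a rough sketch of what a programmatic check could look like, the heuristic below swaps division and location for USA records when the location is a known state name and the division is not. Everything here is an assumption made for illustration: the function name, the deliberately incomplete state list, and the swap rule itself are hypothetical and do not describe what augur does or what nextstrain/augur#1578 proposes.

```python
# Illustrative subset only; a real check would need the full list of states.
US_STATES = {"Arizona", "California", "Washington"}


def fix_us_division_location(record):
    """Return a copy of the record with division/location swapped when
    the location looks like a US state and the division does not."""
    fixed = dict(record)
    division = fixed.get("division", "")
    location = fixed.get("location", "")
    if (
        fixed.get("country") == "USA"
        and location in US_STATES
        and division not in US_STATES
    ):
        fixed["division"], fixed["location"] = location, division
    return fixed


record = {"country": "USA", "division": "Monterey", "location": "California"}
print(fix_us_division_location(record))
```

Records that already have the state in the division field are returned unchanged, so the check could be applied to every USA record.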
