gs-reasmb

Scripts to re-assemble the gold standard using shovill.

Prerequisites

git clone https://github.com/pg-space/gs-reasmb.git
cd gs-reasmb
mamba create -n gs-reasmb -c conda-forge -c bioconda ncbi-datasets-cli entrez-direct sra-tools shovill snakemake-minimal
conda activate gs-reasmb
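
Before starting, it can help to confirm that the environment actually exposes the expected executables. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and the command names checked are assumptions about what each installed package provides (`datasets` from ncbi-datasets-cli, `esearch`/`efetch` from entrez-direct, `prefetch`/`fasterq-dump` from sra-tools). The repository's scripts may use only some of them.

```shell
#!/usr/bin/env bash
# Print each command name given as an argument that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed executable names for the packages installed above.
missing="$(missing_tools datasets esearch efetch prefetch fasterq-dump shovill snakemake)"
if [ -n "$missing" ]; then
  printf 'Not found on PATH:\n%s\n' "$missing"
else
  echo "All expected tools found."
fi
```

If anything is reported as missing, the environment is probably not activated or the install did not complete.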

HowTo

These steps fetch the metadata for a project, find its SRA IDs, download the samples (only those tagged as PAIRED), and assemble them.

Projects:

  • NCTC3000 (accession: PRJEB6403)
  • GEBA (accession: PRJNA30815)
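  • FDA-ARGOS (accession: PRJNA231221)

The commands below use GEBA (PRJNA30815) as the example; substitute another accession to process a different project.
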
# Download all assemblies from project
datasets download genome accession PRJNA30815 --filename PRJNA30815.zip
# Extract jsonl report
unzip -p PRJNA30815.zip ncbi_dataset/data/assembly_data_report.jsonl > PRJNA30815.jsonl
# Extract SRA project ID + remove duplicates (e.g., same assembly from RefSeq and GenBank)
python3 extract_sra_idx.py PRJNA30815.jsonl | sort -u > PRJNA30815.list
# Fetch xml metadata using project id (xml files will be downloaded to the provided directory)
bash map_idx.sh PRJNA30815.list PRJNA30815
# Summarize information from xml files
python3 ~/parse_metadatas.py PRJNA30815/ > PRJNA30815.csv
# Download and assemble samples (one after the other); add -n for a dry run
snakemake -c8 --config wd="PRJNA30815-OUT" csv="PRJNA30815.csv"
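
Since the same sequence applies to every project, the accession can be treated as a parameter. The sketch below only prints the commands for each of the three listed accessions and does not execute anything. `plan_project` is a hypothetical helper name, the command lines are copied from the steps above, and it assumes the same script locations and file naming used there.

```shell
#!/usr/bin/env bash
# Sketch: print the per-project command sequence for each listed accession.
# Nothing is downloaded or assembled here; the output is a plan to review.

# Emit the commands for one BioProject accession ($1), one per line.
plan_project() {
  local acc="$1"
  cat <<EOF
datasets download genome accession ${acc} --filename ${acc}.zip
unzip -p ${acc}.zip ncbi_dataset/data/assembly_data_report.jsonl > ${acc}.jsonl
python3 extract_sra_idx.py ${acc}.jsonl | sort -u > ${acc}.list
bash map_idx.sh ${acc}.list ${acc}
python3 ~/parse_metadatas.py ${acc}/ > ${acc}.csv
snakemake -c8 --config wd="${acc}-OUT" csv="${acc}.csv"
EOF
}

# Accessions from the Projects list: NCTC3000, GEBA, FDA-ARGOS.
for acc in PRJEB6403 PRJNA30815 PRJNA231221; do
  plan_project "$acc"
done
```

The printed lines can be reviewed and then pasted into a shell one project at a time. Adding `-n` to the `snakemake` line turns that step into a dry run.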

Contributors

ldenti, jorgeavilacartes
