Comments (6)

saramonzon commented on May 23, 2024

The files needed for the pipeline are:

  • Human reference genome + bowtie2 indexes.
  • Virus reference genome + bowtie2 indexes + blast db + gff/gtf file (currently, version NC_045512.2 from RefSeq is being used for SARS-CoV-2)

from viralrecon.

heuermh commented on May 23, 2024

What is the best way to get the GFF/GTF file?

There is one linked here
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/858/895/GCF_009858895.2_ASM985889v3/GCF_009858895.2_ASM985889v3_genomic.gff.gz

or would it be better to generate one from the full NC_045512.2 GenBank record?

drpatelh commented on May 23, 2024

Yep, it would be helpful if we could have links to where the reference files were downloaded from (especially for the virus). If any of them were generated manually from standard files (e.g. the blast db), then it would be worth listing the commands used here too.

Also, just out of interest, why did you pick NC_045512.2 as the reference sequence and not one of the others that are being used? e.g.
https://www.ncbi.nlm.nih.gov/nuccore/MN908947

I am just trying to figure out whether we should make all or just one of these available in the format required by the pipeline.

saramonzon commented on May 23, 2024

NC_045512.2 and MN908947 are exactly the same sequence: one is the RefSeq identifier and the other is the GenBank one. The GenBank record was the one used at the beginning because the RefSeq record didn't exist yet, but the sequence and coordinates are the same.

We've downloaded everything from NCBI.
Virus fasta file: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/858/895/GCF_009858895.2_ASM985889v3/GCF_009858895.2_ASM985889v3_genomic.fna.gz
Virus gff file: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/858/895/GCF_009858895.2_ASM985889v3/GCF_009858895.2_ASM985889v3_genomic.gff.gz

blast db:
makeblastdb -in NC_045512.2.fasta -parse_seqids -dbtype nucl

bowtie2 index:
bowtie2-build NC_045512.2.fasta NC_045512.2.fasta
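
For anyone wanting to reproduce these steps end to end, here is a rough sketch that strings the downloads and the two commands above together. The function names (`ref_url`, `build_refs`) and the local file names `NC_045512.2.fasta` and `NC_045512.2.gff` are only assumptions for illustration; the thread does not say how the downloaded files were renamed. It also assumes `curl`, `gunzip`, `makeblastdb` and `bowtie2-build` are on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: names below are illustrative, not part of viralrecon.

ASSEMBLY="GCF_009858895.2_ASM985889v3"
BASE_URL="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/858/895/${ASSEMBLY}"

# Print the NCBI download URL for a given file suffix (e.g. genomic.fna.gz).
ref_url() {
    printf '%s/%s_%s\n' "$BASE_URL" "$ASSEMBLY" "$1"
}

# Download and decompress the FASTA and GFF, then build the blast db
# and the bowtie2 index with the commands quoted in this thread.
build_refs() {
    local fasta="NC_045512.2.fasta"   # assumed local file name
    local gff="NC_045512.2.gff"       # assumed local file name
    curl -fsSL "$(ref_url genomic.fna.gz)" | gunzip -c > "$fasta" || return 1
    curl -fsSL "$(ref_url genomic.gff.gz)" | gunzip -c > "$gff" || return 1
    makeblastdb -in "$fasta" -parse_seqids -dbtype nucl || return 1
    bowtie2-build "$fasta" "$fasta"
}
```

Sourcing the script only defines the functions; running `build_refs` in an empty directory performs the downloads and builds the indexes next to the FASTA file.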

drpatelh commented on May 23, 2024

Amazing! Thanks @saramonzon. This will come in handy if and when we decide to upload the data to AWS iGenomes or I guess we could just provide URLs to the pipeline 🤔

drpatelh commented on May 23, 2024

This should be sorted now with this PR to host the viral genomes on test-datasets:
nf-core/test-datasets#148
