Comments (5)

mdshw5 avatar mdshw5 commented on June 10, 2024

It appears that you're trying to index the entire mm9 genome using salmon. Both salmon and rapmap are designed to work with a smaller sequence space such as what you would find in a transcriptome. Your log file shows that salmon processes 615,000,000 bases from the genome and then aborts. Depending on how many transcripts are in your feature file, a human transcriptome might be 5-10X smaller.

from salmon.

rob-p avatar rob-p commented on June 10, 2024

Hi @vd4mmind,

Indeed, @mdshw5 is spot on. The issue you're seeing results from the hash table doubling step failing to allocate sufficient memory while building a hash table for all 31-mers in the mouse genome. Aside from the memory requirements of building a quasi-index on the genome (which we're actually working to mitigate because we think it could be useful in another context), such an index won't be particularly useful for quantification. Salmon treats each entry in the multi-FASTA file as a distinct transcriptional target. Thus, even if the index did build successfully, you'd be quantifying the abundance of different chromosomes and contigs rather than transcripts. What you should do (as @mdshw5 pointed out above) is grab a file that contains the mouse transcripts (or take your mm9 genome and an appropriate GTF file and use a tool like gffread to extract the transcript sequences).
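To illustrate the point about each FASTA entry being a distinct target, here is a minimal Python sketch. It is not part of salmon; the function name `summarize_fasta` and the toy records are hypothetical. It only lists the record names and lengths that an indexer would see, which makes the difference between a genome-style and a transcript-style FASTA visible.

```python
from typing import Dict, Iterable


def summarize_fasta(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name to the number of bases in that record."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after '>'.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Toy inputs (hypothetical): a genome-style file has one record per chromosome,
# a transcript-style file has one record per transcript.
genome_like = [">chr1", "ACGTACGTAC", "GTACGT", ">chr2", "TTGGCCAA"]
transcript_like = [">NM_0001 geneA", "ACGTAC", ">NM_0002 geneB", "GTAC"]

for label, records in (("genome", genome_like), ("transcripts", transcript_like)):
    summary = summarize_fasta(records)
    print(label, "targets:", list(summary), "total bases:", sum(summary.values()))
```

Run on a real file (for example `summarize_fasta(open(path))`), the genome would report a handful of very long chromosome and contig records, while a transcript file would report many short ones.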

vd4mmind avatar vd4mmind commented on June 10, 2024

Ah yes, this is true; I realise it now. In fact, I have always run salmon on the transcripts file for human rather than the genome. Our lab does not have a transcripts FASTA file for mm9, so I will create one and then build the index on it. My bad. Thanks for the suggestions. I will report back here once the index has run. If it's not a problem, I would like to keep this ticket open until then.

vd4mmind avatar vd4mmind commented on June 10, 2024

I have a question, if you would like to answer: where can I get the transcripts.gtf file for mm9? Is there a link I can download it from, or do I have to create it on my own? I am a bit confused, and different forums are adding to my confusion, so any suggestion would be welcome.

vd4mmind avatar vd4mmind commented on June 10, 2024

I have done the required work. Sorry for bothering everyone. I downloaded the refGene.gtf file for mm9 from UCSC, which has the transcript information, and then used gffread to build transcript.fa for mm9. Finally I ran salmon index and, to my surprise, it finished in a matter of a few minutes (under 3). Thanks for all the suggestions. I always like learning something new every day. Closing the issue.
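For anyone following the same route, the two steps above can be sketched roughly as follows. This is an assumption-laden outline, not the exact commands that were run: `mm9.fa` and `mm9_index` are hypothetical names, `refGene.gtf` and `transcript.fa` are the files mentioned above, and the snippet only builds and prints the command lines rather than executing them.

```python
import shlex
from typing import List


def gffread_cmd(genome_fa: str, annotation_gtf: str, transcripts_fa: str) -> List[str]:
    """Command to extract transcript sequences from a genome plus annotation.

    gffread writes spliced transcript sequences to the file given by -w,
    reading the genome sequence from the file given by -g.
    """
    return ["gffread", "-w", transcripts_fa, "-g", genome_fa, annotation_gtf]


def salmon_index_cmd(transcripts_fa: str, index_dir: str) -> List[str]:
    """Command to build a salmon index over the transcript FASTA (-t) into -i."""
    return ["salmon", "index", "-t", transcripts_fa, "-i", index_dir]


# Hypothetical paths: mm9.fa and mm9_index are placeholders for illustration.
steps = [
    gffread_cmd("mm9.fa", "refGene.gtf", "transcript.fa"),
    salmon_index_cmd("transcript.fa", "mm9_index"),
]

for cmd in steps:
    # Print each command; pass it to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell quoting issues if a path contains spaces.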
