Comments (5)
It appears that you're trying to index the entire mm9 genome using salmon. Both salmon and rapmap are designed to work with a smaller sequence space, such as what you would find in a transcriptome. Your log file shows that salmon processed 615,000,000 bases from the genome and then aborted. Depending on how many transcripts are in your feature file, a human transcriptome might be 5-10X smaller.
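A quick way to check whether a FASTA file is genome-sized or transcriptome-sized before indexing is to count its records and bases. The following is a minimal sketch, not part of salmon; the helper name `fasta_stats` is hypothetical and it assumes an uncompressed, plain-text FASTA file.

```python
def fasta_stats(path):
    """Return (number_of_records, total_bases) for an uncompressed FASTA file.

    Hypothetical helper for a sanity check before indexing; not part of salmon.
    """
    records = 0
    bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                records += 1  # each header line starts one target sequence
            else:
                bases += len(line)  # sequence lines contribute their length
    return records, bases
```

A file with a few dozen very long records (chromosomes and contigs) and billions of bases is a genome; a transcript file typically has many thousands of much shorter records.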
from salmon.
Hi @vd4mmind,
Indeed, @mdshw5 is spot on. The issue you're seeing is the result of the hash table failing to allocate sufficient memory when it doubles in size while building a hash table for all 31-mers in the mouse genome. Beyond the memory requirements of building a quasi-index on the genome (which we're actually working to mitigate because we think it could be useful in another context), such an index won't be particularly useful for quantification. Salmon treats each entry in the multi-FASTA file as a distinct transcriptional target. Thus, even if the index did build successfully, you'd be quantifying the abundance of different chromosomes and contigs rather than transcripts. What you should do (as pointed out by @mdshw5 above) is grab a file that contains the mouse transcripts (or take your mm9 genome and an appropriate GTF file and use a tool like gffread to extract the transcript sequences).
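The two steps suggested above (extract transcript sequences with gffread, then index them with salmon) can be sketched as a small Python wrapper. This is only an illustration under stated assumptions: the function names and all file and directory names are hypothetical, and the commands use gffread's `-w` (write spliced transcript FASTA) and `-g` (genome FASTA) options and salmon's `index -t`/`-i` options.

```python
import shutil
import subprocess


def build_index_commands(genome_fa, annotation_gtf, transcripts_fa, index_dir):
    """Return the two command lines as argv lists (hypothetical helper)."""
    # Step 1: write spliced transcript sequences from the genome + annotation.
    extract = ["gffread", "-w", transcripts_fa, "-g", genome_fa, annotation_gtf]
    # Step 2: build the salmon index on the transcripts, not on the genome.
    index = ["salmon", "index", "-t", transcripts_fa, "-i", index_dir]
    return [extract, index]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


# Example with placeholder file names (assumptions, adjust to your own paths):
# run_commands(build_index_commands(
#     "mm9.fa", "refGene.gtf", "transcripts.fa", "mm9_salmon_index"))
```

Keeping command construction separate from execution makes it easy to print and inspect the commands before running them.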
Ah yes, this is actually true; I realise it now. In fact, I have always run salmon on the transcripts file for human rather than on the genome. Our lab does not have a transcripts FASTA file for mm9, so I will create one and then build the index on it. My bad. Thanks for the suggestions. I will report back here once the index is done. If it's not a problem, I would like to keep this ticket open until then.
I have a question, if you would like to answer: where can I get the transcripts.gtf file for mm9? Is there a link from which I can download it, or do I have to create it on my own? I am a bit confused, and different forums are adding to my confusion, so any suggestion would be welcome.
Done. Sorry for bothering everyone. I downloaded the refGene.gtf file for mm9 from UCSC, which has the transcript information, and then used gffread to build the transcript.fa for mm9. Finally I ran salmon index, and to my surprise it finished in a matter of minutes (under 3). Thanks for all the suggestions. I always like learning something new every day. Closing the issue.