nf-core/scrnaseq
A single-cell RNAseq pipeline for 10X genomics data
Home Page: https://nf-co.re/scrnaseq
License: MIT License
Hi,
I ran the following using the star_index of the testdata (made by default run settings of STAR genomeGenerate)
nextflow run nf-core/scrnaseq --reads 'data/S10_L001/*_R{1,2}*.fastq.gz' -profile docker -r 1.0.0 --aligner star --type 10x --chemistry V3 --fasta data/references/GRCm38.p6.genome.chr19/GRCm38.p6.genome.chr19.fa --gtf data/references/GRCm38.p6.genome.chr19/gencode.vM19.annotation.chr19.gtf --max_memory 6.GB --max_cpus 2 --star_index data/references/GRCm38.p6.genome.chr19/star/
I get the following error:
Channel `star_index` has been used twice as an output by process `makeSTARindex` and another operator
-- Check script '/root/.nextflow/assets/nf-core/scrnaseq/main.nf' at line: 365 or see '.nextflow.log' file for more details
nf-core/scrnaseq [insane_watson] - revision: 884e541 [1.0.0]
Pipeline Release : 1.0.0
Run Name : insane_watson
Reads : data/S10_L001/*_R{1,2}*.fastq.gz
Genome Reference : data/references/GRCm38.p6.genome.chr19/GRCm38.p6.genome.chr19.fa
GTF Reference : data/references/GRCm38.p6.genome.chr19/gencode.vM19.annotation.chr19.gtf
Save Reference? : false
Aligner : star
STARsolo Index : data/references/GRCm38.p6.genome.chr19/star/
Droplet Technology: 10x
Chemistry Version : V3
Max Resources : 6.GB memory, 2 cpus, 10d time per job
Container : docker - nfcore/scrnaseq:1.0.0
Shouldn't 'makeSTARindex' be switched off when using a star_index?
Best,
Momo
Hi @PeterBailey ,
I have just cloned the git repository of the "scrnaseq" pipeline, and when I try to launch it, this is the message I get:
nextflow run nf-core/scrnaseq -r 1.0.0 -profile conda,test
N E X T F L O W ~ version 19.10.0
Project `nf-core/scrnaseq` contains uncommitted changes -- Cannot switch to revision: 1.0.0
May I have some help, please?
Thank you in advance
It says in environment.yml that version 2.7.3a is broken and therefore version 2.7.2c is used. I was wondering how you checked that 2.7.3a is broken.
Thanks!
Momo
The method used by the pipeline to generate the transcript-to-gene mapping file for alevin relies on specific positions within the GTF's attributes column. This means that if the order of attributes changes (as is the case between Gencode and Ensembl GTFs), the mapping file will be invalid.
It is possible to generate a valid mapping file for virtually any GTF using this script and skipping gene names. One should then be able to pass the path to the valid file with the --txp2gene option. However, it seems that even when the option is set to the file path, the pipeline generates its own, potentially invalid, mapping file. If the file is invalid, it can be replaced with the valid one in the appropriate directory under ./work.
One possible solution is to use the above Python script to generate the mapping file, as it is already part of the pipeline.
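To illustrate the position-independent approach, here is a minimal Python sketch that looks up `transcript_id` and `gene_id` by key instead of by position in the attributes column. It is not the script referenced in this issue: the function name `txp2gene_from_gtf` is hypothetical, and the sketch assumes a standard nine-column, tab-separated GTF with `transcript` feature rows and quoted attribute values.

```python
import re

# Matches one `key "value"` pair in the GTF attributes column (column 9).
_ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def txp2gene_from_gtf(lines):
    """Yield unique (transcript_id, gene_id) pairs, looking attributes up by key.

    Hypothetical helper for illustration; only `transcript` rows are used.
    """
    seen = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attributes = dict(_ATTRIBUTE.findall(fields[8]))
        pair = (attributes.get("transcript_id"), attributes.get("gene_id"))
        if None in pair or pair in seen:
            continue
        seen.add(pair)
        yield pair
```

Because the lookup is by key, the same function handles a layout where `gene_version` sits between `gene_id` and `transcript_id` and a layout where it does not.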
Would be great to generate (extra?) output for the nf-core/scflow pipeline that allows the output matrices from this pipeline to be used directly as input to the scflow pipeline. @combiz and I discussed that a bit in the #scflow channel on Slack.
Would require some bits added, but shouldn't be too much extra effort and would enable direct downstream analysis for quite some projects - which would benefit both the scrnaseq users and scflow users ;-)
The transcript-to-gene mapping is not generated correctly for some GTFs. For example, a GTF from Ensembl (e.g. ftp://ftp.ensembl.org/pub/release-94/gtf/homo_sapiens/Homo_sapiens.GRCh38.94.gtf.gz) produces:
head results/reference_data/alevin/txp2gene.tsv
5 ENSG00000223972
5 ENSG00000223972
5 ENSG00000227232
1 ENSG00000278267
5 ENSG00000243485
5 ENSG00000243485
1 ENSG00000284332
2 ENSG00000237613
2 ENSG00000237613
3 ENSG00000268020
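In the output above, the same first-column value (for example `5`) is paired with different gene IDs, which a real transcript ID should never be. A small Python sketch of a sanity check for that symptom follows; `ambiguous_transcripts` is a hypothetical helper, not part of the pipeline, and it assumes a whitespace-separated file with the transcript in column 1 and the gene in column 2.

```python
from collections import defaultdict


def ambiguous_transcripts(rows):
    """Return first-column IDs that map to more than one gene.

    Hypothetical check: a non-empty result suggests the first column does not
    hold transcript IDs at all.
    """
    genes_by_transcript = defaultdict(set)
    for row in rows:
        fields = row.split()
        if len(fields) >= 2:
            genes_by_transcript[fields[0]].add(fields[1])
    return {
        transcript: sorted(genes)
        for transcript, genes in genes_by_transcript.items()
        if len(genes) > 1
    }
```

Running it on the first rows shown above flags `5`, since that value appears with both ENSG00000223972 and ENSG00000227232.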
I used the dev branch of this nextflow pipeline and ran the following command:
nextflow run ../scrnaseq -profile awsbatch,test --awsregion us-east-1 --awsqueue nextflow-default --aligner kallisto -w 's3://egenesis-data-processed/dangeles/tmp'
However, I got the following error:
Error executing process > 'bustools_correct_sort (S10_L001_bus_output)'
Caused by:
Process `bustools_correct_sort (S10_L001_bus_output)` terminated with an error exit status (137)
Command executed:
bustools correct -w 10x_V3_barcode_whitelist -o S10_L001_bus_output/output.corrected.bus S10_L001_bus_output/output.bus
mkdir -p tmp
bustools sort -T tmp/ -t 2 -m 6G -o S10_L001_bus_output/output.corrected.sort.bus S10_L001_bus_output/output.corrected.bus
Command exit status:
137
Command output:
(empty)
Command error:
Found 6794880 barcodes in the whitelist
Processed 0 bus records
In whitelist = 0
Corrected = 0
Uncorrected = 0
.command.sh: line 4: 520 Killed bustools sort -T tmp/ -t 2 -m 6G -o S10_L001_bus_output/output.corrected.sort.bus S10_L001_bus_output/output.corrected.bus
Work dir:
s3://egenesis-data-processed/dangeles/tmp/c1/e3c21a3a97279df7ec1e6327202896
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
I also ran this using conda, with the same results.
Any help would be appreciated. I am having other issues with bustools_correct_sort, but it seems worthwhile to fix the test first and go from there.
Thank you very much!!!
Needed modules for this pipeline:
An error occurred with the kallisto aligner while generating the reference index. The error was raised in ngs_tools/gtf/Segment.py and seems to be related to a segment of zero length. I am using GRCh38.p14 FASTA and GTF files from NCBI with appended ERCC transcripts. An update to the required kb_python version in
scrnaseq/modules/nf-core/modules/kallistobustools/ref/main.nf (lines 5 to 8 in 0bf83a8)
and
scrnaseq/modules/local/kallistobustools_count.nf (lines 5 to 8 in 0bf83a8)
will probably fix this.
Lioscro/ngs-tools#30
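The traceback in this report shows a segment being built from one exon's end and the next exon's start, and the failing interval `[1095094:1095094)` is empty, which is what two directly adjacent exons would produce. The following Python sketch shows that situation and one way to tolerate it. It is an illustration only, not the ngs_tools implementation; `introns_from_exons` is a hypothetical name and intervals are assumed to be half-open `(start, end)` tuples.

```python
def introns_from_exons(exons):
    """Return the gaps between consecutive exons as half-open (start, end) tuples.

    Exons that touch or overlap leave no gap, so they are skipped instead of
    producing an empty or negative-length interval.
    """
    introns = []
    ordered = sorted(exons)
    previous_end = ordered[0][1] if ordered else None
    for start, end in ordered[1:]:
        if start > previous_end:
            introns.append((previous_end, start))
        previous_end = max(previous_end, end)
    return introns
```

With exons `(100, 200)`, `(200, 300)` and `(400, 500)`, the first two touch at 200, so only the gap `(300, 400)` is reported.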
## Workflow execution completed unsuccessfully
Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (GCF_000001405.40_GRCh38.p14_genomic_ERCC92.fna.gz)'
Caused by:
Missing output file(s) `kb_ref_out.idx` expected by process `NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (GCF_000001405.40_GRCh38.p14_genomic_ERCC92.fna.gz)`
Command executed:
kb \
ref \
-i kb_ref_out.idx \
-g t2g.txt \
-f1 cdna.fa \
--workflow standard \
GCF_000001405.40_GRCh38.p14_genomic_ERCC92.fna.gz \
GCF_000001405.40_GRCh38.p14_genomic_ERCC92_no_gene_bar.gtf.gz
cat <<-END_VERSIONS > versions.yml
"NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF":
kallistobustools: $(echo $(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*$//')
END_VERSIONS
Command exit status:
0
Command output:
(empty)
Command error:
[2022-06-28 15:32:18,920] INFO [ref] Preparing GCF_000001405.40_GRCh38.p14_genomic_ERCC92.fna.gz, GCF_000001405.40_GRCh38.p14_genomic_ERCC92_no_gene_bar.gtf.gz
[2022-06-28 15:34:07,208] ERROR [main] An exception occurred
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/kb_python/main.py", line 856, in main
COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
File "/usr/local/lib/python3.9/site-packages/kb_python/main.py", line 168, in parse_ref
ref(
File "/usr/local/lib/python3.9/site-packages/ngs_tools/logging.py", line 62, in inner
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/kb_python/ref.py", line 393, in ref
gene_infos, transcript_infos = ngs.gtf.genes_and_transcripts_from_gtf(
File "/usr/local/lib/python3.9/site-packages/ngs_tools/gtf/__init__.py", line 190, in genes_and_transcripts_from_gtf
introns = exons.invert(transcript_interval)
File "/usr/local/lib/python3.9/site-packages/ngs_tools/gtf/SegmentCollection.py", line 108, in invert
Segment(self._segments[i].end, self._segments[i + 1].start)
File "/usr/local/lib/python3.9/site-packages/ngs_tools/gtf/Segment.py", line 27, in __init__
raise SegmentError(f'Invalid segment [{start}:{end})')
ngs_tools.gtf.Segment.SegmentError: Invalid segment [1095094:1095094)
Work dir:
s3://***
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
https://bioconda.github.io/recipes/bioconductor-alevinqc/README.html
It should be possible/easy to add nowadays.
Hi,
Thank you for a great tool. I was testing it out and am running into an issue as follows:
WARN: Found unexpected parameters:
* --reads: *_R{1,2}*fastq.gz
- Ignore this warning: params.schema_ignore_params = "reads"
Missing file pattern argument
My launch command is as follows:
nextflow run nf-core/scrnaseq --reads '*_R{1,2}*fastq.gz' --fasta human.fasta --gtf human.gtf -profile docker
Could you please let me know how to fix this?
Here are the core options
Core Nextflow options
revision : master
runName : distraught_morse
containerEngine : docker
container : nfcore/scrnaseq:1.1.0
launchDir : /mnt/irisgpfs/users/sbusi/apps/scrna
workDir : /mnt/irisgpfs/users/sbusi/apps/scrna/work
projectDir : /home/users/sbusi/.nextflow/assets/nf-core/scrnaseq
userName : sbusi
profile : docker
configFiles : /home/users/sbusi/.nextflow/assets/nf-core/scrnaseq/nextflow.config
Input/output options
input : null
input_paths : null
email : false
Thank you,
Susheel
I think it would be good to have SoupX or a similar tool as part of the pipeline.
Implement SoupX as a downstream process of all alignment methods.
Leave this kind of filtering and QC to downstream pipelines like #scflow.
TRUST4 claims they can call TCRs not only from VDJ-enriched 10x data, but also regular 10x 5' seq and other scRNA-seq protocols. Since that enables TCR analysis to a lot of datasets where this was not previously possible, I believe this would be a nice addition.
More details about TRUST4:
Run TRUST4 (enable explicitly, as not all datasets contain B or T cells) and create AIRR rearrangement output, and optionally a scirpy-compatible Anndata-file (related to #68)
Not sure if this is in scope of this pipeline, as not applicable to all datasets. Maybe also related to #bcellmagic?
Every time I run kallisto as a subworkflow, it crashes for memory reasons. However, unlike in #38, it crashes at the indexing stage instead of the bustools stage. The cause seems to be the same, though, as I get the same kind of error messages (I also tried hard-coding the memory requirements in version 1.1.0, and the pipeline then worked). As far as I can tell, the problem is that I have to request memory as 32GB (I cannot ask for 32G, as I get: string [32.G] does not match pattern ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$ (32.G)), while kallisto wants to use 32G (and this cannot be changed to GB either).
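The mismatch between the two notations can be made concrete with a short Python sketch. It reuses the validation pattern quoted in the error message (with capture groups added) and converts an accepted value such as `32.GB` into the `32G` form seen in the tool commands in these reports. The helper name `to_tool_memory` is hypothetical, and nothing here is taken from the pipeline's own code.

```python
import re

# Same pattern as in the validation error, with capture groups for number and unit.
NEXTFLOW_MEMORY = re.compile(r"^(\d+(?:\.\d+)?)\.?\s*([KMGT]?)B$")


def to_tool_memory(value):
    """Convert a Nextflow-style memory string ('32.GB') to the short form ('32G').

    Raises ValueError for strings the quoted pattern rejects, such as '32.G'.
    """
    match = NEXTFLOW_MEMORY.match(value)
    if match is None:
        raise ValueError(f"{value!r} does not match the memory pattern")
    number, unit = match.groups()
    return f"{number}{unit}"
```

For example, `to_tool_memory("32.GB")` gives `"32G"`, while `to_tool_memory("32.G")` raises, mirroring the error reported here.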
$ nextflow run nf-core/scrnaseq -r dev --max_cpus 32 --max_memory '32.GB' --outdir /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/fulltest/ --protocol '10XV3' --aligner kallisto --transcript_fasta /shared/projects/bsbii/sc_single_cell_brain/sc_gene_models_ncbi_utrs.fasta --input 'test_samples_v2.csv' --genome_fasta /shared/projects/bsbii/sc_single_cell_brain/sc_ncbi_genome.fasta --gtf /shared/projects/bsbii/sc_single_cell_brain/sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf --kb_workflow 'nucleus' -profile ifb_core
Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (sc_ncbi_genome.fasta)'
Caused by:
Process `NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (sc_ncbi_genome.fasta)` terminated for an unknown reason -- Likely it has been terminated by the external system
Command executed:
kb \
ref \
-i kb_ref_out.idx \
-g t2g.txt \
-f1 cdna.fa \
-f2 intron.fa \
-c1 cdna_t2c.txt \
-c2 intron_t2c.txt \
--workflow nucleus \
sc_ncbi_genome.fasta \
sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf
cat <<-END_VERSIONS > versions.yml
"NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF":
kallistobustools: $(echo $(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*$//')
END_VERSIONS
Command exit status:
-
Command output:
(empty)
Command error:
[2022-06-21 09:31:39,900] INFO [ref_lamanno] Preparing sc_ncbi_genome.fasta, sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf
[2022-06-21 09:32:13,709] INFO [ref_lamanno] Splitting genome sc_ncbi_genome.fasta into cDNA at tmp/tmp74iebqht
[2022-06-21 09:32:53,465] INFO [ref_lamanno] Creating cDNA transcripts-to-capture at tmp/tmp4_dnckwg
[2022-06-21 09:32:53,805] INFO [ref_lamanno] Splitting genome into introns at tmp/tmpb18oat1d
[2022-06-21 09:38:41,370] INFO [ref_lamanno] Creating intron transcripts-to-capture at tmp/tmpmbj8zrjy
[2022-06-21 09:38:51,358] INFO [ref_lamanno] Concatenating 1 cDNA FASTAs to cdna.fa
[2022-06-21 09:38:51,770] INFO [ref_lamanno] Concatenating 1 cDNA transcripts-to-captures to cdna_t2c.txt
[2022-06-21 09:38:51,792] INFO [ref_lamanno] Concatenating 1 intron FASTAs to intron.fa
[2022-06-21 09:39:06,987] INFO [ref_lamanno] Concatenating 1 intron transcripts-to-captures to intron_t2c.txt
[2022-06-21 09:39:07,161] INFO [ref_lamanno] Concatenating cDNA and intron FASTAs to tmp/tmpvz8czi6v
[2022-06-21 09:39:22,955] INFO [ref_lamanno] Creating transcript-to-gene mapping at t2g.txt
[2022-06-21 09:39:37,502] INFO [ref_lamanno] Indexing tmp/tmpvz8czi6v to kb_ref_out.idx
Work dir:
/shared/ifbstor1/projects/bsbii/sc_single_cell_brain/work/72/f969fb35db25b832205956436bb6e5
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
Nextflow version : 22.04.0
Hardware : HPC
Executor : Slurm
Container : Singularity
OS : CentOS
Version of nf-core/scrnaseq : 2.0.0 or dev
When running the pipeline using Alevin, I get the following error:
.command.sh: line 2: 551040 Segmentation fault (core dumped) salmon alevin -l ISR -1 samplesheet.csv -2 null --chromium -i salmon_index -o samplesheet_alevin_results -p 8 --tgMap txp2gene.tsv --dumpFeatures --dumpMtx
However, when I run Alevin directly (conda install) with the fastq files supplied directly, it works, eg:
salmon alevin -l ISR -1 data/SRR18163494_1.fastq.gz -2 data/SRR18163494_2.fastq.gz --chromium -i salmon_index -o samplesheet_alevin_results -p 8 --tgMap txp2gene.tsv --dumpFeatures --dumpMtx
I think it might be something to do with Alevin not being able to parse the sample sheet? Have I made an obvious mistake or is this a genuine bug?
Steps to reproduce the behaviour:
Command line:
nextflow run nf-core/scrnaseq -r 1.1.0 \
--input samplesheet.csv --fasta GRCh38.primary_assembly.genome.fa \
--gtf gencode.v39.primary_assembly.annotation.gtf \
-profile singularity,slurm -c custom_profile.conf
Samplespreadsheet.csv: (I have tried with and without headers, different strandedness)
sample,fastq_1,fastq_2,strandedness
SRR18163494,data/SRR18163494_1.fastq.gz,data/SRR18163494_2.fastq.gz,unstranded
SRR18163495,data/SRR18163495_1.fastq.gz,data/SRR18163495_2.fastq.gz,unstranded
Error:
.command.sh: line 2: 551040 Segmentation fault (core dumped) salmon alevin -l ISR -1 samplesheet.csv -2 null --chromium -i salmon_index -o samplesheet_alevin_results -p 8 --tgMap txp2gene.tsv --dumpFeatures --dumpMtx
I had expected this step to run to completion.
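The failing command shows `samplesheet.csv` itself passed to `-1` and `null` passed to `-2`, which suggests the sheet was treated as a read file rather than parsed. As a rough sketch of what a per-sample expansion could look like, the Python below reads the sheet and builds one argument list per row. `alevin_commands` is a hypothetical helper, not pipeline code; the column names are those of the sheet in this report, and the flags are copied from the command in the error message.

```python
import csv
import io


def alevin_commands(samplesheet_text, index="salmon_index", tgmap="txp2gene.tsv", threads=8):
    """Build one `salmon alevin` argument list per samplesheet row.

    Hypothetical illustration: expects a header with `sample`, `fastq_1`
    and `fastq_2` columns.
    """
    commands = []
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        commands.append([
            "salmon", "alevin", "-l", "ISR",
            "-1", row["fastq_1"],
            "-2", row["fastq_2"],
            "--chromium",
            "-i", index,
            "-o", f"{row['sample']}_alevin_results",
            "-p", str(threads),
            "--tgMap", tgmap,
            "--dumpFeatures", "--dumpMtx",
        ])
    return commands
```

Each returned list has the FASTQ paths from the sheet in the `-1` and `-2` positions, matching the direct invocation that worked.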
alevin-fry is the successor of alevin, and this pipeline should use the latest and greatest version!
@rob-p pointed out on Slack that he might have some people working on adding it here.
Replace the alevin subworkflow with an alevin-fry subworkflow.
Discussion on Slack:
https://nfcore.slack.com/archives/CHN5BV5DW/p1643207648031500
From slack:
I'm wondering what the samplesheet for scrnaseq should look like so that it works for all four pipeline branches (kb/alevin/starsolo/cellranger):
When running in kallisto mode, the pipeline fails at the bustools sort step.
I think the problem is a bug with bustools. If I replace the memory allocated to bustools (set via config.base) with a hard-coded memory slightly less than the config.base memory, the pipeline runs without issues.
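The workaround described here amounts to passing `bustools sort -m` a value somewhat below the memory granted to the task. A minimal Python sketch of that calculation follows. The 0.8 factor and the name `sort_memory_flag` are assumptions chosen for illustration, not values taken from the pipeline or from bustools.

```python
def sort_memory_flag(task_memory_gb, headroom=0.8):
    """Return a `-m` value (e.g. '4G') that stays below the task's allocation.

    `headroom` is the fraction of the allocation handed to the tool; 0.8 is an
    arbitrary illustrative default.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in the interval (0, 1]")
    usable_gb = max(1, int(task_memory_gb * headroom))
    return f"{usable_gb}G"
```

With a 6 GB allocation this yields `4G` instead of the `6G` seen in the failing test command.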
Got this via email:
This:
cdna_read = reads[0]
barcode_read = reads[1]
"""
STAR --genomeDir $index \\
--sjdbGTFfile $gtf \\
--readFilesIn $barcode_read $cdna_read \\
should be:
barcode_read = reads[0]
cdna_read = reads[1]
"""
STAR --genomeDir $index \\
--sjdbGTFfile $gtf \\
--readFilesIn $cdna_read $barcode_read \\
--
I'm facing this issue when trying to align with kallisto:
Error executing process > 'bustools_correct_sort (E9_5_extraembryonic_bus_output)'
Caused by:
Process `bustools_correct_sort (E9_5_extraembryonic_bus_output)` terminated with an error exit status (139)
Command executed:
bustools correct -w 10x_V3_barcode_whitelist -o E9_5_extraembryonic_bus_output/output.corrected.bus E9_5_extraembryonic_bus_output/output.bus
mkdir -p tmp
bustools sort -T tmp/ -t 8 -m 64G -o E9_5_extraembryonic_bus_output/output.corrected.sort.bus E9_5_extraembryonic_bus_output/output.corrected.bus
Command exit status:
139
Command output:
(empty)
Command error:
Found 6794880 barcodes in the whitelist
Processed 0 bus records
In whitelist = 0
Corrected = 0
Uncorrected = 0
Read in 0 BUS records
.command.sh: line 4: 16827 Segmentation fault bustools sort -T tmp/ -t 8 -m 64G -o E9_5_extraembryonic_bus_output/output.corrected.sort.bus E9_5_extraembryonic_bus_output/output.corrected.bus
It seems that bustools is a memory-intensive task, so this issue could be solved by allowing the user to set the -m parameter.
I've attached the associated log file.
nextflow.log
Whenever I launch scrnaseq with --aligner kallisto, it crashes at the counting step with the following error message: kb count: error: the following arguments are required: -c1, -c2. Is something missing from the previous steps?
nextflow run nf-core/scrnaseq -resume --max_memory '500.GB' -r dev -c /shared/home/hmayeur/.nextflow/config --outdir /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/fulltest/ --protocol '10XV3' --aligner kallisto --transcript_fasta /shared/projects/bsbii/sc_single_cell_brain/sc_gene_models_ncbi_utrs.fasta --input 'test_samples_v2.csv' --genome_fasta /shared/projects/bsbii/sc_single_cell_brain/sc_ncbi_genome.fasta --gtf /shared/projects/bsbii/sc_single_cell_brain/sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf --kb_workflow 'nucleus' -profile ifb_core
Nextflow version : 22.04.0
Hardware : HPC
Executor : Slurm
Container : Singularity
OS : CentOS
Version of nf-core/scrnaseq : 2.0.0 or dev
The Kallisto branch might need a fix. Currently, when running the kallisto branch on the test dataset, I get an empty genes.mtx and tcc.mtx. I think this is because of a mismatch between how the transcripts are coded in transcripts_to_genes.txt:
$> head ./results/kallisto_gene_map/transcripts_to_genes.txt
ENSMUST00000190575 ENSMUSG00000100969 1700030N03Rik
ENSMUST00000189643 ENSMUSG00000100969 1700030N03Rik
ENSMUST00000191405 ENSMUSG00000100969 1700030N03Rik
ENSMUST00000190668 ENSMUSG00000100969 1700030N03Rik
and in the transcriptome fasta:
$> head ./results/extract_transcriptome/GRCm38.p6.genome.chr19.fa.transcriptome.fa
>ENSMUST00000190575.6 gene_type=lincRNA gene_name=1700030N03Rik transcript_type=lincRNA transcript_name=1700030N03Rik-203 level=2 transcript_support_level=1 tag=basic havana_gene=OTTMUSG00000045196.2 havana_transcript=OTTMUST00000118746.1
transcripts_to_genes.txt does not have the version number appended, and I think that is why the pipeline is NOT completing successfully.
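The mismatch described here (versioned IDs such as `ENSMUST00000190575.6` in the FASTA headers, unversioned IDs in the mapping) can be checked with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, the FASTA ID is taken to be the first whitespace-delimited token of each header, and the mapping's first column is taken to hold the transcript ID.

```python
def strip_version(transcript_id):
    """Drop a trailing '.N' version suffix, e.g. 'ENSMUST00000190575.6'."""
    base, dot, version = transcript_id.rpartition(".")
    return base if dot and version.isdigit() else transcript_id


def fasta_ids(lines):
    """Yield the first whitespace-delimited token of every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def unmatched_ids(fasta_lines, t2g_lines, normalise=lambda tid: tid):
    """Return FASTA IDs that are absent from the mapping's first column."""
    known = {line.split()[0] for line in t2g_lines if line.strip()}
    return [tid for tid in fasta_ids(fasta_lines) if normalise(tid) not in known]
```

Calling `unmatched_ids(fasta, t2g)` on the two files lists every versioned ID as unmatched, while `unmatched_ids(fasta, t2g, normalise=strip_version)` shows whether stripping the suffix would reconcile them.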
The pipeline does not detect when I provide a transcriptome fasta file. Instead, it produces the following error:
$ /home/jashmore/anaconda3/envs/nextflow/bin/nextflow run nf-core/scrnaseq --reads 'data/reads/*_R{1,2}.fastq.gz' --type 10x --chemistry V2 --aligner alevin --salmon_index data/genomes/index/salmon --txp2gene data/genomes/txp2gene.csv --transcriptome_fasta data/genomes/transcriptome.fa
N E X T F L O W ~ version 19.10.0
Launching `nf-core/scrnaseq` [marvelous_mcclintock] - revision: 884e541285 [master]
Neither of --fasta or --transcriptome provided! At least one must be provided to quantify genes
The error message is also slightly wrong as there is actually no argument called "--transcriptome", instead it is "--transcriptome_fasta" in the documentation. I think the error may be caused by a mistake in the variable naming in the main.nf file (lines 107 - 110). Shouldn't the params variable refer to params.transcriptome_fasta not params.transcript_fasta?
if (!params.fasta && !params.transcript_fasta){
exit 1, "Neither of --fasta or --transcriptome provided! At least one must be provided to quantify genes"
}
Thank you,
James
Just noticed this when trying to get the logo for a presentation.
Ahoy pirates,
sister issue for theislab/sfaira#392
We would suggest to add a process to output an AnnData object. This would allow us to add new fields to Sfaira dataloaders to indicate how the raw data was processed. Detailed discussion and context in the issue above.
An option to run demuxlet or freemuxlet (Kang et al. 2018) with an input vcf file. These tools allow researchers to combine populations from different individuals into a single 10x channel and computationally deconvolve them, by using their genotypes (with demuxlet) or by naively looking for genetic differences (freemuxlet). The output from this module could be used to add labels to cells of the experiment and remove doublets.
Running demuxlet/freemuxlet post-hoc on the bam file generated from this pipeline.
Tutorial: https://www.kallistobus.tools/velocity_tutorial.html
The sticking point is getting introns programmatically from every possible GTF. I've had success with gffutils.FeatureDB.create_introns before but haven't implemented it yet
We should consider adding STAR-solo to this.
Hello,
I am trying to get the test profile up and running and came across the following error:
Ran this:
nextflow run nf-core/scrnaseq -profile test
Error:
Command error:
Loading required package: alevinQC
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
Loading required package: tximport
Error in .checkPandoc(ignorePandoc) : pandoc-citeproc is not available!
Calls: alevinQCReport -> .checkPandoc
Execution halted
Not really sure what to make of this error, but was hoping you might have some insight.
The scrna pipeline runs, but appears to fail when running the alevin_qc.r plot.
Steps to reproduce the behaviour:
Command line:
nextflow run nf-core/scrnaseq -r 1.1.0 --input './SLX-21534*_{1,2}.fq.gz' --fasta fly.fa --gtf /public/genomics/species_references/nextflow/Genome_References/Ensembl/drosophila_melanogaster/BDGP6.32/Release_105/GTF/Drosophila_melanogaster.BDGP6.32.105.gtf -config /public/singularity/containers/nextflow/lmb-nextflow/lmb.config --outdir results -dsl1 -bg
See error:
Execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 140.
The full error message was:
Error executing process > 'alevin_qc (SLX-21534.SITTG11.H37MYDRX2.s_1.r)'
Caused by:
Process alevin_qc (SLX-21534.SITTG11.H37MYDRX2.s_1.r)
terminated with an error exit status (140)
Command executed:
mv 10x_V3_barcode_whitelist SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results/alevin/whitelist.txt
alevin_qc.r SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results SLX-21534.SITTG11.H37MYDRX2.s_1.r SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results
Command exit status:
140
Command output:
(empty)
Command error:
Loading required package: alevinQC
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
Loading required package: tximport
Reading Alevin output files...
Generating summary tables...
Generating knee plot...
Work dir:
/beegfs3/swingett/testing/scrna_seq/work/14/d79f9ee52236d2f715daa068a10c24
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
The file .command.sh contains the text:
[swingett@hal d79f9ee52236d2f715daa068a10c24]$ cat .command.sh
#!/bin/bash -euo pipefail
mv 10x_V3_barcode_whitelist SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results/alevin/whitelist.txt
alevin_qc.r SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results SLX-21534.SITTG11.H37MYDRX2.s_1.r SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results
I would have expected a QC plot to have been generated here.
Have you provided the following extra information/files:
The log file is attached
Also, on the page https://nf-co.re/scrnaseq/1.1.0/output the useful link "excellent tutorial" is broken: https://www.kallistobus.tools/getting_started.html
Thanks,
Steven
Currently, the 10x version/chemistry has to be specified explicitly. If it is not known a priori from the metadata, detecting it automatically would be helpful.
Just check the length of R1 (the UMI, and with it the barcode + UMI read, got longer from 10x v2 to 10x v3).
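A minimal Python sketch of the R1-length check, for illustration only. It relies on 10x v2 R1 carrying a 16 bp barcode plus a 10 bp UMI (26 bp) and v3 a 16 bp barcode plus a 12 bp UMI (28 bp). The function names are hypothetical, and the approach would not work when R1 was sequenced longer than the barcode + UMI, which is why unknown lengths return `None` instead of a guess.

```python
import gzip
from collections import Counter
from itertools import islice

# Barcode + UMI length in R1: 16 + 10 for 10x v2, 16 + 12 for 10x v3.
R1_LENGTH_TO_CHEMISTRY = {26: "V2", 28: "V3"}


def modal_read_length(fastq_lines, n_reads=1000):
    """Return the most common sequence length among the first `n_reads` records."""
    # Sequence lines are every 4th line of a FASTQ, starting with the second.
    sequences = islice(fastq_lines, 1, 4 * n_reads, 4)
    lengths = Counter(len(sequence.strip()) for sequence in sequences)
    return lengths.most_common(1)[0][0] if lengths else None


def guess_chemistry(r1_fastq_gz, n_reads=1000):
    """Guess 'V2' or 'V3' from a gzipped R1 FASTQ; None if the length is unknown."""
    with gzip.open(r1_fastq_gz, "rt") as handle:
        return R1_LENGTH_TO_CHEMISTRY.get(modal_read_length(handle, n_reads))
```

Sampling only the first records keeps the check cheap, and using the most common length makes it tolerant of a few trimmed reads.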
As suggested by @grst on Slack: "Once we have cellranger, it should also support (10x) cite-seq out-of-the-box. IDK how difficult it would be to support this with the other aligners."
It appears that we already have a module for cellranger in the dev version of the pipeline.
h5ad/seurat are becoming default file formats for scRNAseq. It would be nice if the pipeline outputs one of the two or even both.
Output to h5ad and seurat (could be controllable via a flag.)
Pytest workflow makes it easier to maintain larger test suites and also allows outputs to be checked.
cf. nf-core/tools#605, nf-core/rnaseq#546, https://github.com/nf-core/sarek/tree/dev/tests
Testing scenarios:
Hi!
this is not necessarily an issue with the pipeline, but in order to streamline the documentation group next week for the hackathon, I'm opening issues in all repositories / pipeline repos that might need this update to switch from parameter docs to auto-generated documentation based on the JSON schema.
This will then supersede any further parameter documentation, thus making things a bit easier :-)
If this doesn't apply (anymore), please close the issue. Otherwise, I'm hoping to have some helping hands on this next week in the documentation team on Slack https://nfcore.slack.com/archives/C01QPMKBYNR
When I run the following command:
export NXF_OPTS='-Xms1g -Xmx4g'
nextflow run -r 1.0.0 nf-core/scrnaseq\
--reads 'data/H204SC19123229/raw_data/*/*_R{{1,2}}_001.fastq.gz'\
--aligner kallisto\
--type 10x\
--chemistry V3\
--genome GRCh37\
--outdir 'results/expression_counts'\
-profile singularity
I get the following error:
N E X T F L O W ~ version 19.10.0
Launching `nf-core/scrnaseq` [sharp_shaw] - revision: 884e541285 [1.0.0]
Unknown config attribute `params.genomes.GRCh37.salmon_index` -- check config file: ...
I have tried changing the -r argument to dev, using --genome GRCh38, and supplying my own --gtf file, but I always get this error.
I'm not trying to use salmon?
I think it could be nice to have a step to distinguish empty droplets from actual cells.
As far as I know, Alevin/Kallisto only perform cell calling based on "knee plots", while cellranger implements the emptyDrops algorithm. According to the emptyDrops paper the method clearly outperforms filtering based on knee plots.
Implement a process downstream of the aligner subworkflows running the emptyDrops algorithm.
This kind of filtering could be left to downstream pipelines such as #scflow.
However, IMO, it would still make sense to have this as a default even when not using scflow for downstream analysis.
STARsolo implements the emptyDrops algorithm as of version 2.7.8a, which can be activated using the --soloCellFilter EmptyDrops_CR option: https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#emptydrop-like-filtering
I don't know if emptyDrops is still state-of-the-art or if there's something more advanced by now.
Hi there!
I am not sure if the BUG label is appropriate, I guess it should be more of a HELP WANTED label.
I am running into an issue that I cannot figure out how to fix. It seems to be related to the GTF and FASTA files that are provided. I freshly downloaded both following the guide at https://ewels.github.io/AWS-iGenomes/ (see below for all pipeline parameter inputs).
Error executing process > 'extract_transcriptome (genome.fa)'
Caused by:
Process `extract_transcriptome (genome.fa)` terminated with an error exit status (1)
Command executed:
gffread -F genes.gtf -w "genome.fa.transcriptome.fa" -g genome.fa
Command exit status:
1
Command output:
(empty)
Command error:
FASTA index file genome.fa.fai created.
Warning: couldn't find fasta record for 'GL000191.1'!
Error: no genomic sequence available (check -g option!).
------------------------------------------------------
nf-core/scrnaseq v1.1.0
------------------------------------------------------
Core Nextflow options
revision : master
runName : adoring_mahavira
containerEngine : docker
container : nfcore/scrnaseq:1.1.0
launchDir : /home/chroer/Projects/MTC/test_data
workDir : /home/chroer/Projects/MTC/test_data/work
projectDir : /root/.nextflow/assets/nf-core/scrnaseq
userName : root
profile : docker
configFiles : /root/.nextflow/assets/nf-core/scrnaseq/nextflow.config
Input/output options
input : /home/chroer/Projects/MTC/test_data/fastq/*[1,2]*.fastq
input_paths : null
outdir : /home/chroer/Projects/MTC/test_data/
email : false
Mandatory arguments
barcode_whitelist : false
Reference genome options
genome : false
fasta : /home/chroer/Projects/MTC/test_data/references/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa
transcript_fasta : false
gtf : /home/chroer/Projects/MTC/test_data/references/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf
save_reference : false
Alevin Options
salmon_index : false
txp2gene : false
STARSolo Options
star_index : false
Kallisto/BUS Options
kallisto_gene_map : false
bustools_correct : true
kallisto_index : false
Generic options
email_on_fail : false
max_multiqc_email_size : 25 MB
multiqc_config : false
Max job request options
max_memory : 128 GB
max_time : 10d
Institutional config options
config_profile_name : false
config_profile_description: false
config_profile_contact : false
config_profile_url : false
[Only displaying parameters that differ from pipeline default]
------------------------------------------------------
------------------------------------------------------
executor > local (3)
[23/53e4a2] process > get_software_versions [100%] 1 of 1 ✔
[- ] process > unzip_10x_barcodes [ 0%] 0 of 1
[00/fb1078] process > extract_transcriptome (genome.fa) [100%] 1 of 1, failed: 1 ✘
[- ] process > build_salmon_index -
[- ] process > makeSTARindex -
[- ] process > build_kallisto_index -
[- ] process > build_gene_map -
[a8/af51c6] process > build_txp2gene (genes.gtf) [100%] 1 of 1 ✔
[- ] process > alevin -
[- ] process > alevin_qc -
[- ] process > star -
[- ] process > kallisto -
[- ] process > bustools_correct_sort -
[- ] process > bustools_count -
[- ] process > bustools_inspect -
[- ] process > multiqc [ 0%] 0 of 1
[- ] process > output_documentation [ 0%] 0 of 1
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/scrnaseq] Pipeline completed with errors-
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
STARsolo uses 10X-V2 chemistry, regardless of what is specified.
Steps to reproduce the behaviour:
nextflow run nf-core/scrnaseq -r 1.1.0 -params-file nf-params.json
nf-params.json
{
"chemistry": "V3",
"input": "./data/*_{1,2}.fastq.gz",
"fasta": "./data/genome.fa",
"gtf": "./data/genes.gtf",
"aligner": "star"
}
Where: genome.fa is from http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M27/GRCm39.genome.fa.gz; genes.gtf is from http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M27/gencode.vM27.annotation.gtf.gz; and the fastq files are from https://www.ncbi.nlm.nih.gov/sra/?term=SRR14597268
[13/99028a] process > get_software_versions [100%] 1 of 1 ✔
[a5/5fcfb4] process > unzip_10x_barcodes (V3) [100%] 1 of 1 ✔
[- ] process > extract_transcriptome -
[- ] process > build_salmon_index -
[8e/00790d] process > makeSTARindex (genome.fa) [100%] 1 of 1 ✔
[- ] process > build_kallisto_index -
[- ] process > build_gene_map -
[- ] process > build_txp2gene -
[- ] process > alevin -
[- ] process > alevin_qc -
[bd/676904] process > star (SRR14597268_1) [100%] 2 of 2, failed: 2, retri..
[- ] process > kallisto -
[- ] process > bustools_correct_sort -
[- ] process > bustools_count -
[- ] process > bustools_inspect -
[- ] process > multiqc -
[ae/de0382] process > output_documentation [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/scrnaseq] Pipeline completed with errors-
[8a/4103a7] NOTE: Process `star (SRR14597268_1)` terminated with an error exit status (104) -- Execution is retried (1)
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'star (SRR14597268_1)'
Caused by:
Process requirement exceed available memory -- req: 128 GB; avail: 124.4 GB
Command executed:
STAR --genomeDir star \
--sjdbGTFfile genes.gtf \
--readFilesIn SRR14597268_2.fastq.gz SRR14597268_1.fastq.gz \
--runThreadN 10 \
--twopassMode Basic \
--outWigType bedGraph \
--outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 137338953472 \
--readFilesCommand zcat \
--runDirPerm All_RWX \
--outFileNamePrefix SRR14597268_1 \
--soloType Droplet \
--soloCBwhitelist 10x_V3_barcode_whitelist
samtools index SRR14597268_1Aligned.sortedByCoord.out.bam
Command exit status:
-
Command output:
Jun 29 21:52:14 ..... started STAR run
Jun 29 21:52:15 ..... loading genome
Jun 29 21:52:30 ..... processing annotations GTF
Jun 29 21:52:39 ..... inserting junctions into the genome indices
Jun 29 21:54:04 ..... started 1st pass mapping
Jun 29 21:54:05 ..... finished 1st pass mapping
Jun 29 21:54:05 ..... inserting junctions into the genome indices
Jun 29 21:55:34 ..... started mapping
Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 28 not equal to expected 26
Read [email protected] 1 N 0 Sequence=CAGGCNAGTCCAACGCCCTTCTGCCTTT
SOLUTION: make sure that the barcode read is the second in --readFilesIn and check that is has the correct formatting
If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength
Jun 29 21:55:35 ...... FATAL ERROR, exiting
According to the STAR readme,
The default barcode lengths (CB=16b, UMI=10b) work for 10X Chromium V2. For V3, specify:
--soloUMIlen 12
This option is not specified by the pipeline. The STAR script should differ by the chemistry, as per https://www.biostars.org/p/462568/
10x v1
Whitelist, 737K-april-2014_rc.txt
CB length, 14
UMI start, 15
UMI length, 10
(courtesy ATpoint)

10x v2
Whitelist, 737K-august-2016.txt
CB length, 16
UMI start, 17
UMI length, 10

10x v3
Whitelist, 3M-Feb_2018_V3.txt
CB length, 16
UMI start, 17
UMI length, 12
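To make the proposal concrete, here is a minimal sketch of how the per-chemistry values listed above could be turned into STARsolo barcode-geometry flags. The names `SOLO_GEOMETRY`, `star_solo_args` and `expected_barcode_read_length` are hypothetical and not part of the pipeline; only the STAR flags `--soloCBlen`, `--soloUMIstart` and `--soloUMIlen` and the numbers from the list above are taken as given.

```python
# Hypothetical helper, not pipeline code: maps a 10x chemistry version to
# STARsolo barcode-geometry flags using the values quoted in this issue.
SOLO_GEOMETRY = {
    "V1": {"cb_len": 14, "umi_start": 15, "umi_len": 10},
    "V2": {"cb_len": 16, "umi_start": 17, "umi_len": 10},
    "V3": {"cb_len": 16, "umi_start": 17, "umi_len": 12},
}


def star_solo_args(chemistry: str) -> list[str]:
    """Return the STARsolo flags describing the barcode read layout."""
    try:
        geom = SOLO_GEOMETRY[chemistry.upper()]
    except KeyError:
        raise ValueError(f"Unknown 10x chemistry: {chemistry!r}") from None
    return [
        "--soloCBlen", str(geom["cb_len"]),
        "--soloUMIstart", str(geom["umi_start"]),
        "--soloUMIlen", str(geom["umi_len"]),
    ]


def expected_barcode_read_length(chemistry: str) -> int:
    """Cell barcode plus UMI length, i.e. the read length STAR checks against."""
    geom = SOLO_GEOMETRY[chemistry.upper()]
    return geom["cb_len"] + geom["umi_len"]


if __name__ == "__main__":
    for chem in ("V2", "V3"):
        print(chem, " ".join(star_solo_args(chem)),
              expected_barcode_read_length(chem))
```

With these values, V2 gives an expected barcode read length of 26 and V3 gives 28, which lines up with the "28 not equal to expected 26" message in the STAR log: the V3 reads were being checked against the V2 defaults.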
Would be happy to write a PR
I have checked the following places for your error:
Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (GRCm38.p6.genome.chr19.fa)'
Caused by:
No such property: launchDir for class: java.lang.String
Source block:
if (workflow == "standard") {
"""
kb \\
ref \\
-i kb_ref_out.idx \\
-g t2g.txt \\
-f1 cdna.fa \\
--workflow $workflow \\
$fasta \\
$gtf
cat <<-END_VERSIONS > versions.yml
${getProcessName(task.process)}:
${getSoftwareName(task.process)}: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//')
END_VERSIONS
"""
} else {
"""
kb \\
ref \\
-i kb_ref_out.idx \\
-g t2g.txt \\
-f1 cdna.fa \\
-f2 intron.fa \\
-c1 cdna_t2c.txt \\
-c2 intron_t2c.txt \\
--workflow $workflow \\
$fasta \\
$gtf
cat <<-END_VERSIONS > versions.yml
${getProcessName(task.process)}:
${getSoftwareName(task.process)}: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//')
END_VERSIONS
"""
}
Work dir:
nextflow run nf-core/scrnaseq -r dev -profile test,<docker/singularity/...>
Should not trigger an error.
Have you provided the following extra information/files:
@alexblaessle initially found it; I reproduced it with the test profile. It might also be that the kallistobustools module is not functioning properly.
Some convenient command line scripts: https://pypi.org/project/scanpy-scripts/
You're probably already working on this, but would it be possible to add CellRanger to nf-core/scrnaseq to perform alignment and preprocessing? If there is interest, I would be happy to create a PR to dev with cellranger count output (filtered_feature_bc_matrix) that is processed by Seurat.
Hi,
I'm trying to run kallisto, using a precomputed index, but it keeps failing.
nextflow run nf-core/scrnaseq --reads 'fastq/*_R{1,2}_001.fastq.gz' \
--aligner "kallisto" \
--kallisto_gene_map resources/transcripts_to_genes.txt \
--kallisto_index resources/Homo_sapiens.GRCh38.cdna.all.idx \
--chemistry "V2" \
--barcode_whitelist resources/10xv2_whitelist.txt \
--outdir results \
--type 10x \
-profile docker
I get the error "Must provide either a GTF file ('--gtf') or transcript to gene mapping ('--txp2gene') to align with Alevin", which is odd since I'm not trying to run Alevin.
Adding --gtf resources/Homo_sapiens.GRCh38.96.gtf gives me "Neither of --fasta or --transcriptome provided! At least one must be provided to quantify genes".
Adding --transcriptome_fasta resources/Homo_sapiens.GRCh38.cdna.all.fa.gz runs into issue #20.
In any case, neither --gtf nor --transcriptome_fasta should be needed for kallisto with a precomputed index!
I tried to trace the issue in main.nf but had no luck; the logic is quite complicated (I just started using Nextflow two days ago).
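For illustration, the aligner-specific check I would expect looks roughly like the sketch below. It is written in Python rather than Nextflow purely to show the logic; `missing_reference_inputs` is a hypothetical function, the dictionary keys mirror the command-line flags used above, and this is an assumption about the desired behaviour, not a description of what main.nf currently does.

```python
# Hypothetical sketch of aligner-aware parameter validation.
# Keys mirror the CLI flags from this report; nothing here is pipeline code.
def missing_reference_inputs(params: dict) -> list[str]:
    """Return human-readable problems; an empty list means the inputs suffice."""
    problems = []
    aligner = params.get("aligner")

    if aligner == "kallisto":
        # A precomputed index makes the FASTA inputs unnecessary.
        if not params.get("kallisto_index") and not (
            params.get("fasta") or params.get("transcriptome_fasta")
        ):
            problems.append(
                "kallisto: need --kallisto_index, --fasta or --transcriptome_fasta"
            )
        # A precomputed gene map makes the GTF unnecessary.
        if not params.get("kallisto_gene_map") and not params.get("gtf"):
            problems.append("kallisto: need --kallisto_gene_map or --gtf")
    elif aligner == "alevin":
        # Only Alevin should ask for a GTF or a txp2gene mapping.
        if not (params.get("gtf") or params.get("txp2gene")):
            problems.append("alevin: need --gtf or --txp2gene")

    return problems


if __name__ == "__main__":
    my_run = {
        "aligner": "kallisto",
        "kallisto_index": "resources/Homo_sapiens.GRCh38.cdna.all.idx",
        "kallisto_gene_map": "resources/transcripts_to_genes.txt",
    }
    print(missing_reference_inputs(my_run))
```

Under this logic, my command above would pass validation without --gtf or --transcriptome_fasta, and the Alevin message would only appear when Alevin is actually selected.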
I find per-sample quality control reports (number of detected cells, number of genes per cell, etc.) very useful for getting a first idea of whether the sequencing experiment was successful. As far as I can tell, this is currently only implemented for Salmon/Alevin using the AlevinQC module.
Even better would be if, in addition to one report per sample, a summary of the results were shown in the MultiQC report. This would be particularly useful for experiments with many samples (no tedious checking of individual reports).
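As a rough idea of the kind of per-sample numbers I mean, here is a small sketch. It assumes a cell-by-gene count matrix given as plain nested lists in which every row is an already-called cell; `summarise_sample` and the field names are made up for illustration and do not correspond to any existing module.

```python
from statistics import median


# Hypothetical sketch: per-sample QC summary from a cell-by-gene count matrix.
# Assumes each row is one called cell and each column one gene.
def summarise_sample(counts: list[list[int]]) -> dict:
    """Return the number of cells and the median number of detected genes per cell."""
    # A gene counts as detected in a cell if it has at least one count.
    genes_per_cell = [sum(1 for c in row if c > 0) for row in counts]
    return {
        "n_cells": len(counts),
        "median_genes_per_cell": median(genes_per_cell) if genes_per_cell else 0,
    }


if __name__ == "__main__":
    toy_matrix = [
        [1, 0, 2],  # 2 genes detected
        [0, 0, 3],  # 1 gene detected
        [4, 5, 6],  # 3 genes detected
    ]
    print(summarise_sample(toy_matrix))
```

One such dictionary per sample could then be collected into a single table for the summary report.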
Tested on DSL2 version in dev.
Either update modules to handle gzip files, or include an optional gunzip process.
The pipeline should be set up to run with default genomes and genome annotations (iGenomes?).
Where possible, pre-built indexes should also be used.
This feature was temporarily removed in #76 and needs to be re-added.