nf-core / scrnaseq

A single-cell RNAseq pipeline for 10X genomics data

Home Page: https://nf-co.re/scrnaseq

License: MIT License

HTML 1.49% R 1.96% Python 14.73% Nextflow 65.10% Groovy 16.72%

nf-core nextflow workflow pipeline 10x-genomics 10xgenomics alevin bustools cellranger kallisto

scrnaseq's Issues

Channel 'star_index' used twice

Hi,

I ran the following using the star_index of the testdata (made by default run settings of STAR genomeGenerate)

nextflow run nf-core/scrnaseq --reads 'data/S10_L001/*_R{1,2}*.fastq.gz' -profile docker -r 1.0.0 --aligner star --type 10x --chemistry V3 --fasta data/references/GRCm38.p6.genome.chr19/GRCm38.p6.genome.chr19.fa --gtf data/references/GRCm38.p6.genome.chr19/gencode.vM19.annotation.chr19.gtf --max_memory 6.GB --max_cpus 2 --star_index data/references/GRCm38.p6.genome.chr19/star/

I get the following error:

Channel `star_index` has been used twice as an output by process `makeSTARindex` and another operator

-- Check script '/root/.nextflow/assets/nf-core/scrnaseq/main.nf' at line: 365 or see '.nextflow.log' file for more details

Information output:
N E X T F L O W ~ version 20.01.0
Launching `nf-core/scrnaseq` [insane_watson] - revision: `884e541` [1.0.0]
----------------------------------------------------
nf-core/scrnaseq v1.0.0

Pipeline Release : 1.0.0
Run Name : insane_watson
Reads : data/S10_L001/*_R{1,2}*.fastq.gz
Genome Reference : data/references/GRCm38.p6.genome.chr19/GRCm38.p6.genome.chr19.fa
GTF Reference : data/references/GRCm38.p6.genome.chr19/gencode.vM19.annotation.chr19.gtf
Save Reference? : false
Aligner : star
STARsolo Index : data/references/GRCm38.p6.genome.chr19/star/
Droplet Technology: 10x
Chemistry Version : V3
Max Resources : 6.GB memory, 2 cpus, 10d time per job
Container : docker - nfcore/scrnaseq:1.0.0

Shouldn't 'makeSTARindex' be switched off when using a star_index?

Best,
Momo

Cannot switch to revision: 1.0.0

Hi @PeterBailey ,

I have just cloned the git repository of the "scrnaseq" pipeline, and when I try to launch it, this is the message I get:

nextflow run nf-core/scrnaseq -r 1.0.0 -profile conda,test
N E X T F L O W ~ version 19.10.0
Project nf-core/scrnaseq contains uncommitted changes -- Cannot switch to revision: 1.0.0

Could I get some help, please?

Thank you in advance

Question: STAR version 2.7.3a

It says in environment.yml that version 2.7.3a is broken and therefore version 2.7.2c is used. I was wondering how you checked that 2.7.3a is broken.

Thanks!
Momo

Problems with alevin's tx2gene mapping generation

The method used by the pipeline to generate the transcript-to-gene mapping file for alevin relies on specific positions within the GTF's attributes column. This means that if the order of attributes changes (as is the case between Gencode and Ensembl GTFs), the mapping file will be invalid.

It is possible to generate a valid mapping file for virtually any GTF using this script and skipping gene names. One should then be able to pass the path to the valid file via the --txp2gene option. However, even when the option is set, the pipeline seems to generate its own, potentially invalid, mapping file. As a workaround, the invalid file can be replaced with the valid one in the appropriate directory under ./work.

One possible solution is to use the above Python script to generate the mapping file, as it is already part of the pipeline.
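
To illustrate what order-independent parsing could look like, here is a minimal Python sketch. It is not the pipeline's script: the function names are hypothetical, and it assumes standard GTF lines with tab-separated columns and quoted attribute values. It looks up `transcript_id` and `gene_id` by key rather than by position, so Gencode and Ensembl attribute orders give the same result.

```python
import re

# Matches key "value" pairs in a GTF attribute column, in any order.
# Unquoted numeric attributes (e.g. `level 2`) are ignored on purpose.
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(attr_field):
    """Return a dict of quoted GTF attributes, independent of their order."""
    return dict(_ATTR_RE.findall(attr_field))


def tx2gene_from_gtf_lines(lines):
    """Yield unique (transcript_id, gene_id) pairs from 'transcript' records."""
    seen = set()
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = parse_attributes(fields[8])
        pair = (attrs.get("transcript_id"), attrs.get("gene_id"))
        if None in pair or pair in seen:
            continue
        seen.add(pair)
        yield pair
```

Writing each pair as a tab-separated line would give a two-column mapping file; whether versions should be appended to the IDs depends on the transcriptome FASTA used for indexing.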

Make output compatible with nf-core/scflow

nf-core/scrnaseq feature request

Describe the solution you'd like

Would be great to generate (extra?) output for the nf-core/scflow pipeline that allows using the output matrices from this pipeline directly in the scflow pipeline as input 👍🏻 @combiz and I discussed that a bit in the #scflow channel on slack.

Would require some bits added, but shouldn't be too much extra effort and would enable direct downstream analysis for quite some projects - which would benefit both the scrnaseq users and scflow users ;-)

Parsing of gtf to generate transcript to gene mapping file for Alevin

The parsing is not done correctly for some GTFs. For example, a GTF from Ensembl (e.g. ftp://ftp.ensembl.org/pub/release-94/gtf/homo_sapiens/Homo_sapiens.GRCh38.94.gtf.gz) produces:

head results/reference_data/alevin/txp2gene.tsv
5       ENSG00000223972
5       ENSG00000223972
5       ENSG00000227232
1       ENSG00000278267
5       ENSG00000243485
5       ENSG00000243485
1       ENSG00000284332
2       ENSG00000237613
2       ENSG00000237613
3       ENSG00000268020

Test with kallisto fails

I used the dev branch of this nextflow pipeline and ran the following command:

nextflow run ../scrnaseq -profile awsbatch,test --awsregion us-east-1 --awsqueue nextflow-default --aligner kallisto -w 's3://egenesis-data-processed/dangeles/tmp'

However, I got the following error:

Error executing process > 'bustools_correct_sort (S10_L001_bus_output)'

Caused by:
  Process `bustools_correct_sort (S10_L001_bus_output)` terminated with an error exit status (137)

Command executed:

  bustools correct -w 10x_V3_barcode_whitelist -o S10_L001_bus_output/output.corrected.bus S10_L001_bus_output/output.bus
  mkdir -p tmp
  bustools sort -T tmp/ -t 2 -m 6G -o S10_L001_bus_output/output.corrected.sort.bus S10_L001_bus_output/output.corrected.bus

Command exit status:
  137

Command output:
  (empty)

Command error:
  Found 6794880 barcodes in the whitelist
  Processed 0 bus records
  In whitelist = 0
  Corrected = 0
  Uncorrected = 0
  .command.sh: line 4:   520 Killed                  bustools sort -T tmp/ -t 2 -m 6G -o S10_L001_bus_output/output.corrected.sort.bus S10_L001_bus_output/output.corrected.bus

Work dir:
  s3://egenesis-data-processed/dangeles/tmp/c1/e3c21a3a97279df7ec1e6327202896

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

I also ran this using conda, with the same results.

Any help would be appreciated. I am having other issues with bustools_correct_sort, but it seems worthwhile to fix the test first and go from there.
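
As a side note on reading the exit status above: by shell convention, a status above 128 means the process was ended by signal number (status - 128). The small Python sketch below (the helper name is made up for illustration) decodes such values. A status of 137 maps to SIGKILL, which on Linux is often, though not always, the out-of-memory killer.

```python
import signal


def describe_exit_status(status):
    """Translate a shell exit status into a short description."""
    if status > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return f"unknown signal {status - 128}"
    return "exited normally" if status == 0 else f"error code {status}"


for status in (137, 139):
    print(status, describe_exit_status(status))
```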

Thank you very much!!!

Add BioConda recipe for alevinQC

Incorrect name of Alevin option in docs

According to the docs, the parameter that allows the specification of a transcript to gene mapping file for Salmon Alevin and AlevinQC is --txp2gene_alevin but it should be --txp2gene (see main.nf).

Modules for scrnaseq

Needed modules for this pipeline:

ngs_tools.gtf.Segment.SegmentError: Invalid segment with kallisto aligner

Description of the bug

An error occurred with the kallisto aligner during generation of the reference index. The error was raised in ngs_tools/gtf/Segment.py and seems to be related to a segment of zero length. I am using GRCh38.p14 fasta and gtf files from NCBI with appended ERCC transcripts. An update to the required kb_python version in

scrnaseq/modules/nf-core/modules/kallistobustools/ref/main.nf

Lines 5 to 8 in 0bf83a8

 conda (params.enable_conda ? 'bioconda::kb-python=0.26.3' : null)
 container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
  'https://depot.galaxyproject.org/singularity/kb-python:0.26.3--pyhdfd78af_0' :
  'quay.io/biocontainers/kb-python:0.26.3--pyhdfd78af_0' }"

and

scrnaseq/modules/local/kallistobustools_count.nf

Lines 5 to 8 in 0bf83a8

 conda (params.enable_conda ? "bioconda::kb-python=0.25.1" : null)
 container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
  'https://depot.galaxyproject.org/singularity/kb-python:0.25.1--py_0' :
  'quay.io/biocontainers/kb-python:0.25.1--py_0' }"

will probably fix this.
Lioscro/ngs-tools#30

Command used and terminal output

## Workflow execution completed unsuccessfully

Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (GCF_000001405.40_GRCh38.p14_genomic_ERCC92.fna.gz)'

Caused by:
  Missing output file(s) `kb_ref_out.idx` expected by process `NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (GCF_000001405.40_GRCh38.p14_genomic_ERCC92.fna.gz)`

Command executed:

  kb \
      ref \
      -i kb_ref_out.idx \
      -g t2g.txt \
      -f1 cdna.fa \
      --workflow standard \
      GCF_000001405.40_GRCh38.p14_genomic_ERCC92.fna.gz \
      GCF_000001405.40_GRCh38.p14_genomic_ERCC92_no_gene_bar.gtf.gz
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF":
      kallistobustools: $(echo $(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*$//')
  END_VERSIONS

Command exit status:
  0

Command output:
  (empty)

Command error:
  [2022-06-28 15:32:18,920]    INFO [ref] Preparing GCF_000001405.40_GRCh38.p14_genomic_ERCC92.fna.gz, GCF_000001405.40_GRCh38.p14_genomic_ERCC92_no_gene_bar.gtf.gz
  [2022-06-28 15:34:07,208]   ERROR [main] An exception occurred
  Traceback (most recent call last):
    File "/usr/local/lib/python3.9/site-packages/kb_python/main.py", line 856, in main
      COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
    File "/usr/local/lib/python3.9/site-packages/kb_python/main.py", line 168, in parse_ref
      ref(
    File "/usr/local/lib/python3.9/site-packages/ngs_tools/logging.py", line 62, in inner
      return func(*args, **kwargs)
    File "/usr/local/lib/python3.9/site-packages/kb_python/ref.py", line 393, in ref
      gene_infos, transcript_infos = ngs.gtf.genes_and_transcripts_from_gtf(
    File "/usr/local/lib/python3.9/site-packages/ngs_tools/gtf/__init__.py", line 190, in genes_and_transcripts_from_gtf
      introns = exons.invert(transcript_interval)
    File "/usr/local/lib/python3.9/site-packages/ngs_tools/gtf/SegmentCollection.py", line 108, in invert
      Segment(self._segments[i].end, self._segments[i + 1].start)
    File "/usr/local/lib/python3.9/site-packages/ngs_tools/gtf/Segment.py", line 27, in __init__
      raise SegmentError(f'Invalid segment [{start}:{end})')
  ngs_tools.gtf.Segment.SegmentError: Invalid segment [1095094:1095094)

Work dir:
  s3://***

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

Relevant files

No response

System information

Nextflow version: 22.04.3
Hardware: Cloud
Executor: awsbatch
Container engine:
OS: CentOS Linux
Version of nf-core/scrnaseq: 2.0.0

Add AlevinQC

https://bioconda.github.io/recipes/bioconductor-alevinqc/README.html

should be possible/easy to add nowadays 👍

Pipeline completed with errors

Hi,

Thank you for a great tool. I was testing it out and am running into an issue as follows:

WARN: Found unexpected parameters:
* --reads: *_R{1,2}*fastq.gz
- Ignore this warning: params.schema_ignore_params = "reads"

Missing file pattern argument

My launch command is as follows:

nextflow run nf-core/scrnaseq --reads '*_R{1,2}*fastq.gz' --fasta human.fasta --gtf human.gtf -profile docker

Could you please let me know how to fix this?

Here are the core options

Core Nextflow options
    revision                  : master
    runName                   : distraught_morse
    containerEngine           : docker
    container                 : nfcore/scrnaseq:1.1.0
    launchDir                 : /mnt/irisgpfs/users/sbusi/apps/scrna
    workDir                   : /mnt/irisgpfs/users/sbusi/apps/scrna/work
    projectDir                : /home/users/sbusi/.nextflow/assets/nf-core/scrnaseq
    userName                  : sbusi
    profile                   : docker
    configFiles               : /home/users/sbusi/.nextflow/assets/nf-core/scrnaseq/nextflow.config

Input/output options
    input                     : null
    input_paths               : null
    email                     : false

Thank you,
Susheel

star + samtools index (needed? or can be two processes?)

Detect and remove cell-free mRNA contamination

Is your feature request related to a problem? Please describe

I think it would be good to have SoupX or a similar tool as part of the pipeline.

Describe the solution you'd like

Implement SoupX as a downstream process of all alignment methods.

Describe alternatives you've considered

Leave this kind of filtering and QC to downstream pipelines like #scflow.

Add TRUST4 for VDJ calling

Is your feature request related to a problem? Please describe

TRUST4 claims it can call TCRs not only from VDJ-enriched 10x data, but also from regular 10x 5' seq and other scRNA-seq protocols. Since that enables TCR analysis for a lot of datasets where this was not previously possible, I believe this would be a nice addition.

More details about TRUST4:

Describe the solution you'd like

Run TRUST4 (enable explicitly, as not all datasets contain B or T cells) and create AIRR rearrangement output, and optionally a scirpy-compatible Anndata-file (related to #68)

Describe alternatives you've considered

Not sure if this is in scope of this pipeline, as not applicable to all datasets. Maybe also related to #bcellmagic?

kallisto subworkflow runs out of memory (reiteration of #38)

Description of the bug

Every time I run kallisto as a subworkflow, it crashes for memory reasons. Unlike in #38, however, it crashes at the indexing stage rather than the bustools stage. The cause seems to be the same, as I get the same kind of error messages (I also tried hard-coding the memory requirements in version 1.1.0, and the pipeline then worked). As far as I can tell, the problem is that I request 32GB of memory and kallisto wants to use 32G. I cannot request 32G instead, as I get string [32.G] does not match pattern ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$ (32.G), and this cannot be changed to GB on the kallisto side either.
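
To make the validation error concrete, the following Python sketch applies the pattern quoted in the message to a few candidate values. The pattern is copied from the error text; the helper name is only for illustration, and Python's regex engine is assumed to behave like the pipeline's validator for this simple pattern.

```python
import re

# Pattern copied verbatim from the validation error quoted in this issue.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_memory(value):
    """Return True if the string satisfies the memory pattern."""
    return MEMORY_PATTERN.match(value) is not None


for candidate in ("32.GB", "32 GB", "32.G", "32G"):
    print(candidate, is_valid_memory(candidate))
```

The trailing `B` is mandatory in the pattern, which is why `32.G` is rejected while `32.GB` is accepted.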

Command used and terminal output

$ nextflow run nf-core/scrnaseq -r dev --max_cpus 32 --max_memory '32.GB' --outdir /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/fulltest/ --protocol '10XV3' --aligner kallisto --transcript_fasta /shared/projects/bsbii/sc_single_cell_brain/sc_gene_models_ncbi_utrs.fasta --input 'test_samples_v2.csv' --genome_fasta /shared/projects/bsbii/sc_single_cell_brain/sc_ncbi_genome.fasta --gtf /shared/projects/bsbii/sc_single_cell_brain/sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf --kb_workflow 'nucleus' -profile ifb_core

Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (sc_ncbi_genome.fasta)'

Caused by:
  Process `NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (sc_ncbi_genome.fasta)` terminated for an unknown reason -- Likely it has been terminated by the external system

Command executed:

  kb \
      ref \
      -i kb_ref_out.idx \
      -g t2g.txt \
      -f1 cdna.fa \
      -f2 intron.fa \
      -c1 cdna_t2c.txt \
      -c2 intron_t2c.txt \
      --workflow nucleus \
      sc_ncbi_genome.fasta \
      sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF":
      kallistobustools: $(echo $(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*$//')
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Command error:
  [2022-06-21 09:31:39,900]    INFO [ref_lamanno] Preparing sc_ncbi_genome.fasta, sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf
  [2022-06-21 09:32:13,709]    INFO [ref_lamanno] Splitting genome sc_ncbi_genome.fasta into cDNA at tmp/tmp74iebqht
  [2022-06-21 09:32:53,465]    INFO [ref_lamanno] Creating cDNA transcripts-to-capture at tmp/tmp4_dnckwg
  [2022-06-21 09:32:53,805]    INFO [ref_lamanno] Splitting genome into introns at tmp/tmpb18oat1d
  [2022-06-21 09:38:41,370]    INFO [ref_lamanno] Creating intron transcripts-to-capture at tmp/tmpmbj8zrjy
  [2022-06-21 09:38:51,358]    INFO [ref_lamanno] Concatenating 1 cDNA FASTAs to cdna.fa
  [2022-06-21 09:38:51,770]    INFO [ref_lamanno] Concatenating 1 cDNA transcripts-to-captures to cdna_t2c.txt
  [2022-06-21 09:38:51,792]    INFO [ref_lamanno] Concatenating 1 intron FASTAs to intron.fa
  [2022-06-21 09:39:06,987]    INFO [ref_lamanno] Concatenating 1 intron transcripts-to-captures to intron_t2c.txt
  [2022-06-21 09:39:07,161]    INFO [ref_lamanno] Concatenating cDNA and intron FASTAs to tmp/tmpvz8czi6v
  [2022-06-21 09:39:22,955]    INFO [ref_lamanno] Creating transcript-to-gene mapping at t2g.txt
  [2022-06-21 09:39:37,502]    INFO [ref_lamanno] Indexing tmp/tmpvz8czi6v to kb_ref_out.idx

Work dir:
  /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/work/72/f969fb35db25b832205956436bb6e5

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Relevant files

No response

System information

Nextflow version : 22.04.0
Hardware : HPC
Executor : Slurm
Container : Singularity
OS : CentOS
Version of nf-core/scrnaseq : 2.0.0 or dev

Add FastQC

#32 was an attempt to add FastQC to the workflow. This PR is obsolete as it was against the DSL1 version.

I still believe it would be nice to have FastQC reports in addition to the downstream quality metrics (#80)

Get Alevin seg fault when reading from sample spreadsheet

Description of the bug

When running the pipeline using Alevin, I get the following error:

.command.sh: line 2: 551040 Segmentation fault      (core dumped) salmon alevin -l ISR -1 samplesheet.csv -2 null --chromium -i salmon_index -o samplesheet_alevin_results -p 8 --tgMap txp2gene.tsv --dumpFeatures –-dumpMtx

However, when I run Alevin directly (conda install) with the fastq files supplied directly, it works, eg:

salmon alevin -l ISR -1 data/SRR18163494_1.fastq.gz -2 data/SRR18163494_2.fastq.gz --chromium -i salmon_index -o samplesheet_alevin_results -p 8 --tgMap txp2gene.tsv --dumpFeatures –-dumpMtx

I think it might be something to do with Alevin not being able to parse the sample sheet? Have I made an obvious mistake or is this a genuine bug?

Steps to reproduce

Steps to reproduce the behaviour:
Command line:

nextflow run nf-core/scrnaseq -r 1.1.0 \
    --input samplesheet.csv --fasta GRCh38.primary_assembly.genome.fa \
    --gtf gencode.v39.primary_assembly.annotation.gtf \
    -profile singularity,slurm -c custom_profile.conf

Samplespreadsheet.csv: (I have tried with and without headers, different strandedness)

sample,fastq_1,fastq_2,strandedness
SRR18163494,data/SRR18163494_1.fastq.gz,data/SRR18163494_2.fastq.gz,unstranded
SRR18163495,data/SRR18163495_1.fastq.gz,data/SRR18163495_2.fastq.gz,unstranded

Error:

.command.sh: line 2: 551040 Segmentation fault      (core dumped) salmon alevin -l ISR -1 samplesheet.csv -2 null --chromium -i salmon_index -o samplesheet_alevin_results -p 8 --tgMap txp2gene.tsv --dumpFeatures –-dumpMtx

Expected behaviour

I had expected this step to run to completion.

Log files

nextflow_log_cloud.txt

System

Hardware: This error is the same on both cloud and HPC
Executor: Have used slurm (HPC) and local (cloud)
OS: Linux ubuntu

Nextflow Installation

Version: 21.10.0

Container engine

Engine: Singularity
Image tag: nfcore/scrnaseq:1.1.0

Switch from alevin to alevin-fry

Is your feature request related to a problem? Please describe

alevin-fry is the successor of alevin and this pipeline should use the latest-greatest version!

@rob-p pointed out on slack that he might have some people working on adding it here.

Describe the solution you'd like

add all required modules to nf-core/modules
replace the alevin subworkflow with an alevin-fry subworkflow.

Additional context

Discussion on slack:
https://nfcore.slack.com/archives/CHN5BV5DW/p1643207648031500

Samplesheet standard

From slack:

I'm wondering what the samplesheet for scrnaseq should look like so that it works for all four pipeline branches (kb/alevin/starsolo/cellranger):

Input files

one line per sample (e.g. pointing to a folder with technical replicates) or one line per fastq file? (the latter is used by #rnaseq, and is IMO more explicit and less error-prone)
kb, alevin and starsolo expect a pair (R1/R2) of fastq files. Alevin can merge technical replicates automatically, for the other two I couldn't figure out yet if that's possible.
cellranger expects a folder structure with one folder per sample which contains all technical replicates. If the technical replicates were sequenced on different flow cells, the same sample (or "gem") can even be distributed over multiple folders.

metadata

sample id
protocol (currently this is a global option)?
expectCells, on a per sample basis
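
As a rough illustration of the "one line per fastq pair" option, the Python sketch below groups rows of a samplesheet by sample ID, so that technical replicates of a sample end up together. The column names `sample`, `fastq_1` and `fastq_2` are an assumption borrowed from the samplesheet shown in another issue on this page, and the function name is hypothetical.

```python
import csv
from collections import defaultdict


def group_fastqs_by_sample(samplesheet_path):
    """Map each sample ID to its list of (fastq_1, fastq_2) pairs."""
    groups = defaultdict(list)
    with open(samplesheet_path, newline="") as handle:
        # Assumed header: sample,fastq_1,fastq_2 (extra columns are ignored).
        for row in csv.DictReader(handle):
            groups[row["sample"]].append((row["fastq_1"], row["fastq_2"]))
    return dict(groups)
```

Per-sample metadata such as protocol or expectCells could be carried as additional columns and checked for consistency within each group.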

bustools runs out of memory

When running in kallisto mode, the pipeline fails at the bustools sort step.

I think the problem is a bug with bustools. If I replace the memory allocated to bustools (set via config.base) with a hard-coded memory slightly less than the config.base memory, the pipeline runs without issues.
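
The workaround described above, passing bustools slightly less memory than the task was allocated, can be sketched as follows. This is only an illustration in Python: the function name and the 20% headroom are assumptions, not values taken from the pipeline.

```python
def bustools_sort_memory(task_memory_gb, headroom_fraction=0.2, min_gb=1):
    """Return a value for `bustools sort -m` below the task's allocation.

    headroom_fraction is a hypothetical safety margin left for everything
    that is not covered by the sort buffer itself.
    """
    usable = int(task_memory_gb * (1 - headroom_fraction))
    return f"{max(usable, min_gb)}G"


print(bustools_sort_memory(6))   # 4G
print(bustools_sort_memory(64))  # 51G
```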

Order of barcode /cdna not logical

Got this via email:

This:

cdna_read = reads[0]
barcode_read = reads[1]
"""
STAR --genomeDir $index \\
--sjdbGTFfile $gtf \\
--readFilesIn $barcode_read $cdna_read \\

should be:

barcode_read = reads[0]
cdna_read = reads[1]
"""
STAR --genomeDir $index \\
--sjdbGTFfile $gtf \\
--readFilesIn $cdna_read $barcode_read \\

Segmentation fault caused by bustools_correct_sort

I'm facing this issue when trying to align with kallisto:

Error executing process > 'bustools_correct_sort (E9_5_extraembryonic_bus_output)'

Caused by:
  Process `bustools_correct_sort (E9_5_extraembryonic_bus_output)` terminated with an error exit status (139)

Command executed:

  bustools correct -w 10x_V3_barcode_whitelist -o E9_5_extraembryonic_bus_output/output.corrected.bus E9_5_extraembryonic_bus_output/output.bus    
  mkdir -p tmp
  bustools sort -T tmp/ -t 8 -m 64G -o E9_5_extraembryonic_bus_output/output.corrected.sort.bus E9_5_extraembryonic_bus_output/output.corrected.bus

Command exit status:
  139

Command output:
  (empty)

Command error:
  Found 6794880 barcodes in the whitelist
  Processed 0 bus records
  In whitelist = 0
  Corrected = 0
  Uncorrected = 0
  Read in 0 BUS records
  .command.sh: line 4: 16827 Segmentation fault      bustools sort -T tmp/ -t 8 -m 64G -o E9_5_extraembryonic_bus_output/output.corrected.sort.bus E9_5_extraembryonic_bus_output/output.corrected.bus

It seems that bustools sort is a memory-intensive task, so this issue could be solved by allowing the user to set the -m parameter.

I've attached the associated log file.
nextflow.log

Missing options c1 and c2 with kallistobustools count

Description of the bug

Whenever I launch scrnaseq with --aligner kallisto, it crashes at the counting step with the following error message: kb count: error: the following arguments are required: -c1, -c2. Is something missing from the previous steps?

Command used and terminal output

nextflow run nf-core/scrnaseq -resume --max_memory '500.GB' -r dev -c /shared/home/hmayeur/.nextflow/config --outdir /shared/ifbstor1/projects/bsbii/sc_single_cell_brain/fulltest/ --protocol '10XV3' --aligner kallisto --transcript_fasta /shared/projects/bsbii/sc_single_cell_brain/sc_gene_models_ncbi_utrs.fasta --input 'test_samples_v2.csv' --genome_fasta /shared/projects/bsbii/sc_single_cell_brain/sc_ncbi_genome.fasta --gtf /shared/projects/bsbii/sc_single_cell_brain/sc_gene_models_ncbi_no_genes_no_contigs_notrnas.gtf --kb_workflow 'nucleus' -profile ifb_core

Relevant files

No response

System information

Nextflow version : 22.04.0
Hardware : HPC
Executor : Slurm
Container : Singularity
OS : CentOS
Version of nf-core/scrnaseq : 2.0.0 or dev

bustools quantification is not working

The Kallisto branch might need a fix. Currently, when running the kallisto branch on the test dataset, I get an empty genes.mtx and tcc.mtx. I think this is because of a mismatch between how the transcripts are coded in transcripts_to_genes.txt:

$> head ./results/kallisto_gene_map/transcripts_to_genes.txt
ENSMUST00000190575    ENSMUSG00000100969    1700030N03Rik
ENSMUST00000189643    ENSMUSG00000100969    1700030N03Rik
ENSMUST00000191405    ENSMUSG00000100969    1700030N03Rik
ENSMUST00000190668    ENSMUSG00000100969    1700030N03Rik

and in the transcriptome fasta:

$> head ./results/extract_transcriptome/GRCm38.p6.genome.chr19.fa.transcriptome.fa 
>ENSMUST00000190575.6 gene_type=lincRNA gene_name=1700030N03Rik transcript_type=lincRNA transcript_name=1700030N03Rik-203 level=2 transcript_support_level=1 tag=basic havana_gene=OTTMUSG00000045196.2 havana_transcript=OTTMUST00000118746.1

The transcripts_to_genes.txt file does not have the version number appended to the transcript IDs, and I think that is why the pipeline is NOT completing successfully.
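
One way to reconcile the two files is to normalize IDs before comparing them. The Python sketch below strips a trailing `.N` version from an ID and extracts the ID from a FASTA header. The function names are hypothetical, and whether the fix belongs in the mapping file or in the FASTA headers is left open here.

```python
def strip_version(identifier):
    """Drop a trailing '.N' version suffix, if present."""
    base, sep, version = identifier.rpartition(".")
    return base if sep and version.isdigit() else identifier


def fasta_header_to_id(header):
    """Return the first token of a FASTA header, without '>' and version."""
    return strip_version(header.lstrip(">").split()[0])


header = ">ENSMUST00000190575.6 gene_type=lincRNA gene_name=1700030N03Rik"
print(fasta_header_to_id(header))  # ENSMUST00000190575
```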

Pipeline does not detect transcriptome_fasta argument

The pipeline does not detect when I provide a transcriptome fasta file. Instead, it produces the following error:

$ /home/jashmore/anaconda3/envs/nextflow/bin/nextflow run nf-core/scrnaseq --reads 'data/reads/*_R{1,2}.fastq.gz' --type 10x --chemistry V2 --aligner alevin --salmon_index data/genomes/index/salmon --txp2gene data/genomes/txp2gene.csv --transcriptome_fasta data/genomes/transcriptome.fa
N E X T F L O W  ~  version 19.10.0
Launching `nf-core/scrnaseq` [marvelous_mcclintock] - revision: 884e541285 [master]
Neither of --fasta or --transcriptome provided! At least one must be provided to quantify genes

The error message is also slightly wrong as there is actually no argument called "--transcriptome", instead it is "--transcriptome_fasta" in the documentation. I think the error may be caused by a mistake in the variable naming in the main.nf file (lines 107 - 110). Shouldn't the params variable refer to params.transcriptome_fasta not params.transcript_fasta?

if (!params.fasta && !params.transcript_fasta){
  exit 1, "Neither of --fasta or --transcriptome provided! At least one must be provided to quantify genes"
}

Thank you,
James

Add nf-core logo to README.md

Just noticed this when trying to get the logo for a presentation.

Add AnnData output process

Ahoy pirates,

sister issue for theislab/sfaira#392
We suggest adding a process to output an AnnData object. This would allow us to add new fields to Sfaira dataloaders to indicate how the raw data was processed. Detailed discussion and context in the issue above.

CC @grst @davidsebfischer

Demuxlet/Freemuxlet module

Describe the solution you'd like

An option to run demuxlet or freemuxlet (Kang et al. 2018) with an input vcf file. These tools allow researchers to combine populations from different individuals into a single 10x channel and computationally deconvolve them, by using their genotypes (with demuxlet) or by naively looking for genetic differences (freemuxlet). The output from this module could be used to add labels to cells of the experiment and remove doublets.

Describe alternatives you've considered

Running demuxlet/freemuxlet post-hoc on the bam file generated from this pipeline.

Add RNA Velocity

Tutorial: https://www.kallistobus.tools/velocity_tutorial.html

The sticking point is getting introns programmatically from every possible GTF. I've had success with gffutils.FeatureDB.create_introns before but haven't implemented it yet
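
For reference, the core of intron extraction is small once the exons of a transcript are known. The Python sketch below is a library-free illustration and not the gffutils implementation: it assumes half-open `(start, end)` exon intervals for a single transcript (GTF coordinates are 1-based inclusive, so they would need converting first) and returns the gaps between them.

```python
def introns_from_exons(exons):
    """Return half-open intron intervals between the exons of one transcript."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlapping or abutting exons leave no intron between them,
            # so they are merged instead of producing an empty interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(merged, merged[1:])
    ]


print(introns_from_exons([(100, 200), (300, 400), (400, 500)]))  # [(200, 300)]
```

Merging abutting exons first avoids emitting zero-length introns, which some downstream tools reject.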

STAR-Solo addition

We should consider adding STAR-solo to this.

alevinQC -- pandoc-citeproc is not available

Hello,

I am trying to get the test profile up and running and came across the following error:

Ran this:
nextflow run nf-core/scrnaseq -profile test

Error:

Command error:
  Loading required package: alevinQC
  Registered S3 method overwritten by 'GGally':
    method from   
    +.gg   ggplot2
  Loading required package: tximport
  Error in .checkPandoc(ignorePandoc) : pandoc-citeproc is not available!
  Calls: alevinQCReport -> .checkPandoc
  Execution halted

Not really sure what to make of this error, but was hoping you might have some insight.

Problem Executing alevin_qc.r

Description of the bug

The scrna pipeline runs, but appears to fail when running the alevin_qc.r plot.

Steps to reproduce

Steps to reproduce the behaviour:

Command line:
nextflow run nf-core/scrnaseq -r 1.1.0 --input './SLX-21534*_{1,2}.fq.gz' --fasta fly.fa --gtf /public/genomics/species_references/nextflow/Genome_References/Ensembl/drosophila_melanogaster/BDGP6.32/Release_105/GTF/Drosophila_melanogaster.BDGP6.32.105.gtf -config /public/singularity/containers/nextflow/lmb-nextflow/lmb.config --outdir results -dsl1 -bg
See error:

Execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: 140.

The full error message was:

Error executing process > 'alevin_qc (SLX-21534.SITTG11.H37MYDRX2.s_1.r)'

Caused by:
Process alevin_qc (SLX-21534.SITTG11.H37MYDRX2.s_1.r) terminated with an error exit status (140)

Command executed:

mv 10x_V3_barcode_whitelist SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results/alevin/whitelist.txt
alevin_qc.r SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results SLX-21534.SITTG11.H37MYDRX2.s_1.r SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results

Command exit status:
140

Command output:
(empty)

Command error:
Loading required package: alevinQC
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
Loading required package: tximport
Reading Alevin output files...
Generating summary tables...
Generating knee plot...

Work dir:
/beegfs3/swingett/testing/scrna_seq/work/14/d79f9ee52236d2f715daa068a10c24

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

The file .command.sh contains the text:

[swingett@hal d79f9ee52236d2f715daa068a10c24]$ cat .command.sh
#!/bin/bash -euo pipefail
mv 10x_V3_barcode_whitelist SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results/alevin/whitelist.txt
alevin_qc.r SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results SLX-21534.SITTG11.H37MYDRX2.s_1.r SLX-21534.SITTG11.H37MYDRX2.s_1.r_alevin_results

Expected behaviour

I would have expected a QC plot to have been generated here.

Log files

Have you provided the following extra information/files:

The log file is attached

System

Hardware: HPC
Executor: slurm
OS: Scientific Linux 7.9 (Nitrogen)
Version 7.9

Nextflow Installation

Version: Nextflow version 22.03.1-edge build 5695

Container engine

Engine: Singularity
version: scrnaseq 1.1.0
Image tag: nfcore/scrnaseq:1.1.0

Additional context

Also, on the page https://nf-co.re/scrnaseq/1.1.0/output the useful link "excellent tutorial" is broken: https://www.kallistobus.tools/getting_started.html

Thanks,

Steven

nextflow.log

Chemistry/10X version detection

nf-core/scrnaseq feature request

Is your feature request related to a problem? Please describe

Currently, the 10X version/chemistry has to be specified by the user. Automatic detection would be helpful when this is not known a priori from the metadata.

Describe the solution you'd like

Just check the length of R1 (since the length of barcodes changed from 10Xv2 to 10Xv3).
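A minimal sketch of that idea, assuming R1 contains only the cell barcode plus UMI (26 bases for 10x v2, 28 bases for 10x v3). The function name `guess_chemistry` and the lookup table are hypothetical and not part of the pipeline.

```python
import gzip
from collections import Counter

# Assumed R1 lengths: 16 bp cell barcode + 10 bp UMI (v2) or + 12 bp UMI (v3).
R1_LENGTH_TO_CHEMISTRY = {26: "V2", 28: "V3"}


def guess_chemistry(r1_path, n_reads=1000):
    """Guess the 10x chemistry from the most common R1 length.

    Only the first `n_reads` FASTQ records are inspected. Returns None
    when the file is empty or the length matches no known layout.
    """
    opener = gzip.open if str(r1_path).endswith(".gz") else open
    lengths = Counter()
    with opener(r1_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= n_reads:
                break
            if i % 4 == 1:  # second line of each 4-line record is the sequence
                lengths[len(line.strip())] += 1
    if not lengths:
        return None
    most_common_length, _ = lengths.most_common(1)[0]
    return R1_LENGTH_TO_CHEMISTRY.get(most_common_length)
```

If R1 was sequenced longer than the barcode plus UMI, the length no longer identifies the chemistry and this sketch returns None, so a manual `--chemistry` setting would still be needed as a fallback.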

Add support for CITE-seq data processing

As suggested by @grst on Slack: "Once we have cellranger, it should also support (10x) cite-seq out-of-the-box. IDK how difficult it would be to support this with the other aligners."

It appears that we already have a module for cellranger in the dev version of the pipeline.

Automatic conversion to h5ad/seurat files after alignment.

Is your feature request related to a problem? Please describe

h5ad/seurat are becoming default file formats for scRNAseq. It would be nice if the pipeline outputs one of the two or even both.

Describe the solution you'd like

Output to h5ad and seurat (could be controllable via a flag.)

Implement tests with nf-test

Pytest workflow makes it easier to maintain larger test suites and also allows checking outputs.

cf. nf-core/tools#605, nf-core/rnaseq#546, https://github.com/nf-core/sarek/tree/dev/tests

Testing scenarios:

Run pipeline with more than one file
different parameter combinations
different filenames and folder structures for cellranger
run subworkflows with and without index
different protocols, if we manage to get the test files

SCRNASEQ: Convert params docs to JSON Schema documentation

Hi!

this is not necessarily an issue with the pipeline, but in order to streamline the documentation group next week for the hackathon, I'm opening issues in all repositories / pipeline repos that might need this update to switch from parameter docs to auto-generated documentation based on the JSON schema.

This will then supersede any further parameter documentation, thus making things a bit easier :-)

If this doesn't apply (anymore), please close the issue. Otherwise, I'm hoping to have some helping hands on this next week in the documentation team on Slack https://nfcore.slack.com/archives/C01QPMKBYNR

Unknown config attribute `params.genomes.GRCh37.salmon_index`

When I run the following command:

export NXF_OPTS='-Xms1g -Xmx4g'
nextflow run -r 1.0.0 nf-core/scrnaseq\
          --reads 'data/H204SC19123229/raw_data/*/*_R{{1,2}}_001.fastq.gz'\
          --aligner kallisto\
          --type 10x\
          --chemistry V3\
          --genome GRCh37\
          --outdir 'results/expression_counts'\
          -profile singularity

I get the following error:

N E X T F L O W  ~  version 19.10.0
Launching `nf-core/scrnaseq` [sharp_shaw] - revision: 884e541285 [1.0.0]
Unknown config attribute `params.genomes.GRCh37.salmon_index` -- check config file: ...

I have tried changing the -r argument to dev, using --genome GRCh38, and supplying my own --gtf file, but I always get this error.

I'm not trying to use salmon?

Call empty droplets

Is your feature request related to a problem? Please describe

I think it could be nice to have a step to distinguish empty droplets from actual cells.

As far as I know, Alevin/Kallisto only perform cell calling based on "knee plots", while cellranger implements the emptyDrops algorithm. According to the emptyDrops paper the method clearly outperforms filtering based on knee plots.

Describe the solution you'd like

Implement a process downstream of the aligner subworkflows running the emptyDrops algorithm.

Describe alternatives you've considered

This kind of filtering could be left to downstream pipelines such as #scflow.
However, IMO, it would still make sense to have this as a default even when not using scflow for downstream analysis.

Additional context

STARsolo implements the emptydrops algorithm as of version 2.7.8a which can be activated using the --soloCellFilter EmptyDrops_CR option: https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#emptydrop-like-filtering

I don't know if emptyDrops is still state-of-the-art or if there's something more advanced by now.

Help wanted with error - gffread related

Hi there!

I am not sure if the BUG label is appropriate, I guess it should be more of a HELP WANTED label.

I am running into an issue that I cannot figure out how to fix. It seems to be related to the GTF and FASTA files that are provided. I downloaded fresh copies of both following this guide: https://ewels.github.io/AWS-iGenomes/. See below for all pipeline parameter inputs.

Error executing process > 'extract_transcriptome (genome.fa)'
Caused by:
  Process `extract_transcriptome (genome.fa)` terminated with an error exit status (1)
Command executed:
  gffread -F genes.gtf -w "genome.fa.transcriptome.fa" -g genome.fa
Command exit status:
  1
Command output:
  (empty)
Command error:
  FASTA index file genome.fa.fai created.
  Warning: couldn't find fasta record for 'GL000191.1'!
  Error: no genomic sequence available (check -g option!).
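
The warning about 'GL000191.1' suggests that the GTF refers to a sequence name that is absent from the FASTA. One way to check this before running the pipeline is to compare the two sets of names. This is only a diagnostic sketch: the helper names are hypothetical and it is not something the pipeline itself does.

```python
import gzip


def _open_text(path):
    """Open a plain or .gz file in text mode, based on the file extension."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)


def fasta_contigs(fasta_path):
    """Sequence names in a FASTA: first word after '>' on each header line."""
    with _open_text(fasta_path) as handle:
        return {
            line[1:].split()[0]
            for line in handle
            if line.startswith(">") and line[1:].strip()
        }


def gtf_contigs(gtf_path):
    """Sequence names in a GTF: first tab-separated column of non-comment lines."""
    with _open_text(gtf_path) as handle:
        return {
            line.split("\t", 1)[0].strip()
            for line in handle
            if line.strip() and not line.startswith("#")
        }


def contigs_missing_from_fasta(fasta_path, gtf_path):
    """Names that appear in the GTF but have no record in the FASTA."""
    return sorted(gtf_contigs(gtf_path) - fasta_contigs(fasta_path))
```

A non-empty result from `contigs_missing_from_fasta("genome.fa", "genes.gtf")` would point to an annotation and genome that do not match, for example unplaced scaffolds present in only one of the two files.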

Check Documentation

I have checked the following places for your error:

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/scrnaseq v1.1.0
------------------------------------------------------

Core Nextflow options
    revision                  : master
    runName                   : adoring_mahavira
    containerEngine           : docker
    container                 : nfcore/scrnaseq:1.1.0
    launchDir                 : /home/chroer/Projects/MTC/test_data
    workDir                   : /home/chroer/Projects/MTC/test_data/work
    projectDir                : /root/.nextflow/assets/nf-core/scrnaseq
    userName                  : root
    profile                   : docker
    configFiles               : /root/.nextflow/assets/nf-core/scrnaseq/nextflow.config

Input/output options
    input                     : /home/chroer/Projects/MTC/test_data/fastq/*[1,2]*.fastq
    input_paths               : null
    outdir                    : /home/chroer/Projects/MTC/test_data/
    email                     : false

Mandatory arguments
    barcode_whitelist         : false

Reference genome options
    genome                    : false
    fasta                     : /home/chroer/Projects/MTC/test_data/references/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa
    transcript_fasta          : false
    gtf                       : /home/chroer/Projects/MTC/test_data/references/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf
    save_reference            : false

Alevin Options
    salmon_index              : false
    txp2gene                  : false

STARSolo Options
    star_index                : false

Kallisto/BUS Options
    kallisto_gene_map         : false
    bustools_correct          : true
    kallisto_index            : false

Generic options
    email_on_fail             : false
    max_multiqc_email_size    : 25 MB
    multiqc_config            : false

Max job request options
    max_memory                : 128 GB
    max_time                  : 10d

Institutional config options
    config_profile_name       : false
    config_profile_description: false
    config_profile_contact    : false
    config_profile_url        : false

[Only displaying parameters that differ from pipeline default]
------------------------------------------------------

------------------------------------------------------
executor >  local (3)
[23/53e4a2] process > get_software_versions             [100%] 1 of 1 ✔
[-        ] process > unzip_10x_barcodes                [  0%] 0 of 1
[00/fb1078] process > extract_transcriptome (genome.fa) [100%] 1 of 1, failed: 1 ✘
[-        ] process > build_salmon_index                -
[-        ] process > makeSTARindex                     -
[-        ] process > build_kallisto_index              -
[-        ] process > build_gene_map                    -
[a8/af51c6] process > build_txp2gene (genes.gtf)        [100%] 1 of 1 ✔
[-        ] process > alevin                            -
[-        ] process > alevin_qc                         -
[-        ] process > star                              -
[-        ] process > kallisto                          -
[-        ] process > bustools_correct_sort             -
[-        ] process > bustools_count                    -
[-        ] process > bustools_inspect                  -
[-        ] process > multiqc                           [  0%] 0 of 1
[-        ] process > output_documentation              [  0%] 0 of 1
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/scrnaseq] Pipeline completed with errors-
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.

STAR always uses V2 chemistry

Check Documentation

I have checked the following places for your error:

Description of the bug

STARsolo uses 10X-V2 chemistry, regardless of what is specified.

Steps to reproduce

Steps to reproduce the behaviour:

Run nextflow run nf-core/scrnaseq -r 1.1.0 -params-file nf-params.json

nf-params.json

{
    "chemistry": "V3",
    "input": "./data/*_{1,2}.fastq.gz",
    "fasta": "./data/genome.fa",
    "gtf": "./data/genes.gtf",
    "aligner": "star"
}

Where: genome.fa is from http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M27/GRCm39.genome.fa.gz; genes.gtf is from http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M27/gencode.vM27.annotation.gtf.gz; and the fastq files are from https://www.ncbi.nlm.nih.gov/sra/?term=SRR14597268

[13/99028a] process > get_software_versions     [100%] 1 of 1 ✔
[a5/5fcfb4] process > unzip_10x_barcodes (V3)   [100%] 1 of 1 ✔
[-        ] process > extract_transcriptome     -
[-        ] process > build_salmon_index        -
[8e/00790d] process > makeSTARindex (genome.fa) [100%] 1 of 1 ✔
[-        ] process > build_kallisto_index      -
[-        ] process > build_gene_map            -
[-        ] process > build_txp2gene            -
[-        ] process > alevin                    -
[-        ] process > alevin_qc                 -
[bd/676904] process > star (SRR14597268_1)      [100%] 2 of 2, failed: 2, retri..
[-        ] process > kallisto                  -
[-        ] process > bustools_correct_sort     -
[-        ] process > bustools_count            -
[-        ] process > bustools_inspect          -
[-        ] process > multiqc                   -
[ae/de0382] process > output_documentation      [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/scrnaseq] Pipeline completed with errors-
[8a/4103a7] NOTE: Process `star (SRR14597268_1)` terminated with an error exit status (104) -- Execution is retried (1)
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'star (SRR14597268_1)'

Caused by:
  Process requirement exceed available memory -- req: 128 GB; avail: 124.4 GB

Command executed:

  STAR --genomeDir star \
        --sjdbGTFfile genes.gtf \
        --readFilesIn SRR14597268_2.fastq.gz SRR14597268_1.fastq.gz  \
        --runThreadN 10 \
        --twopassMode Basic \
        --outWigType bedGraph \
        --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 137338953472 \
        --readFilesCommand zcat \
        --runDirPerm All_RWX \
        --outFileNamePrefix SRR14597268_1  \
        --soloType Droplet \
        --soloCBwhitelist 10x_V3_barcode_whitelist
  
  samtools index SRR14597268_1Aligned.sortedByCoord.out.bam

Command exit status:
  -

Command output:
  Jun 29 21:52:14 ..... started STAR run
  Jun 29 21:52:15 ..... loading genome
  Jun 29 21:52:30 ..... processing annotations GTF
  Jun 29 21:52:39 ..... inserting junctions into the genome indices
  Jun 29 21:54:04 ..... started 1st pass mapping
  Jun 29 21:54:05 ..... finished 1st pass mapping
  Jun 29 21:54:05 ..... inserting junctions into the genome indices
  Jun 29 21:55:34 ..... started mapping

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  
  EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 28 not equal to expected 26
  Read [email protected] 1 N 0   Sequence=CAGGCNAGTCCAACGCCCTTCTGCCTTT
  SOLUTION: make sure that the barcode read is the second in --readFilesIn and check that is has the correct formatting
            If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength
  
  Jun 29 21:55:35 ...... FATAL ERROR, exiting

Expected behaviour

According to the STAR readme,

The default barcode lengths (CB=16b, UMI=10b) work for 10X Chromium V2. For V3, specify:

--soloUMIlen 12

This option is not specified by the pipeline. The STAR script should differ by the chemistry, as per https://www.biostars.org/p/462568/

10x v1

Whitelist, 737K-april-2014_rc.txt
CB length, 14
UMI start, 15
UMI length, 10 (courtesy ATpoint)

10X v2

Whitelist, 737K-august-2016.txt
CB length, 16
UMI start, 17
UMI length, 10

10x v3

Whitelist, 3M-Feb_2018_V3.txt
CB length, 16
UMI start, 17
UMI length, 12
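
The layouts listed above can be turned into chemistry-specific STARsolo arguments. The following is a sketch under stated assumptions, not the pipeline's actual code: the function and dictionary names are hypothetical, `--soloType Droplet` is copied from the command shown earlier in this report, and a cell barcode start of 1 is assumed for every chemistry.

```python
# Barcode layouts copied from the list above:
# whitelist file, cell barcode (CB) length, UMI start, UMI length.
SOLO_LAYOUTS = {
    "V1": {"whitelist": "737K-april-2014_rc.txt", "cb_len": 14, "umi_start": 15, "umi_len": 10},
    "V2": {"whitelist": "737K-august-2016.txt", "cb_len": 16, "umi_start": 17, "umi_len": 10},
    "V3": {"whitelist": "3M-Feb_2018_V3.txt", "cb_len": 16, "umi_start": 17, "umi_len": 12},
}


def starsolo_barcode_args(chemistry, whitelist=None):
    """Build the barcode-related STARsolo arguments for a 10x chemistry.

    `whitelist` overrides the default whitelist file name from the table.
    """
    key = str(chemistry).upper()
    if key not in SOLO_LAYOUTS:
        raise ValueError(f"Unsupported chemistry: {chemistry!r}")
    layout = SOLO_LAYOUTS[key]
    return [
        "--soloType", "Droplet",
        "--soloCBwhitelist", whitelist or layout["whitelist"],
        "--soloCBstart", "1",  # assumption: the barcode starts at the first base
        "--soloCBlen", str(layout["cb_len"]),
        "--soloUMIstart", str(layout["umi_start"]),
        "--soloUMIlen", str(layout["umi_len"]),
    ]
```

For example, `" ".join(starsolo_barcode_args("V3", "10x_V3_barcode_whitelist"))` yields an argument string ending in `--soloUMIlen 12`, which is the option the STAR readme asks for with V3 data.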

Log files

nextflow.log

System

Hardware: AWS r5.4xlarge
Executor: local
OS: ubuntu
Version 20.04

Nextflow Installation

Version: 21.04.1.5556

Container engine

Engine: Docker
version: Docker version 20.10.7, build f0df350
Image tag: nfcore/scrnaseq:1.1.0

Additional context

Would be happy to write a PR

Kallistobustools subworkflow fails ("no such property launchDir for class: java.lang.String")

Check Documentation

I have checked the following places for your error:

Description of the bug

Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:KALLISTO_BUSTOOLS:KALLISTOBUSTOOLS_REF (GRCm38.p6.genome.chr19.fa)'

Caused by:
  No such property: launchDir for class: java.lang.String

Source block:
  if (workflow == "standard") {
          """
          kb \\
              ref \\
              -i kb_ref_out.idx \\
              -g t2g.txt \\
              -f1 cdna.fa \\
              --workflow $workflow \\
              $fasta \\
              $gtf

          cat <<-END_VERSIONS > versions.yml
          ${getProcessName(task.process)}:
              ${getSoftwareName(task.process)}: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//')
          END_VERSIONS
          """
      } else {
          """
          kb \\
              ref \\
              -i kb_ref_out.idx \\
              -g t2g.txt \\
              -f1 cdna.fa \\
              -f2 intron.fa \\
              -c1 cdna_t2c.txt \\
              -c2 intron_t2c.txt \\
              --workflow $workflow \\
              $fasta \\
              $gtf

          cat <<-END_VERSIONS > versions.yml
          ${getProcessName(task.process)}:
              ${getSoftwareName(task.process)}: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//')
          END_VERSIONS
          """
      }

Work dir:

Steps to reproduce

nextflow run nf-core/scrnaseq -r dev -profile test,<docker/singularity/...>

Expected behaviour

Should not be triggering an error 👍🏻

Log files

Have you provided the following extra information/files:

The command used to run the pipeline

Nextflow Installation

Version: 21.04.2

Container engine

Engine: Singularity

@alexblaessle initially found it, and it was reproduced with the test profile. It might also be that the kallistobustools module is not functioning properly 🤔

Do scanpy preprocessing

Some convenient command line scripts: https://pypi.org/project/scanpy-scripts/

Add CellRanger to nf-core/scrnaseq

You're probably already working on this, but would it be possible to add CellRanger to nf-core/scrnaseq to perform alignment and preprocessing? If there is interest I would be happy to create a PR to dev with cellranger count output filtered_feature_bc_matrix that is processed by seurat.

Pipeline ignores --kallisto_index

Hi,

I'm trying to run kallisto, using a precomputed index, but it keeps failing.

nextflow run nf-core/scrnaseq --reads 'fastq/*_R{1,2}_001.fastq.gz' \
   --aligner "kallisto" \
   --kallisto_gene_map resources/transcripts_to_genes.txt \
   --kallisto_index resources/Homo_sapiens.GRCh38.cdna.all.idx \
   --chemistry "V2" \
   --barcode_whitelist resources/10xv2_whitelist.txt \
   --outdir results \
   --type 10x \
   -profile docker

I get Must provide either a GTF file ('--gtf') or transcript to gene mapping ('--txp2gene') to align with Alevin, which is slightly weird since I'm not trying to run Alevin.

Adding --gtf resources/Homo_sapiens.GRCh38.96.gtf, gives me Neither of --fasta or --transcriptome provided! At least one must be provided to quantify genes

Adding --transcriptome_fasta resources/Homo_sapiens.GRCh38.cdna.all.fa.gz runs into this issue #20

Anyways, neither --gtf nor --transcriptome_fasta should be needed for kallisto with a precomputed index!
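
The behaviour I would expect can be sketched as a per-aligner check: a reference file is only required when the matching prebuilt input is missing. The rules below are my assumption of the intended logic, not what main.nf currently does, and `missing_inputs` is a hypothetical name. The parameter names are the pipeline's own.

```python
def missing_inputs(params):
    """Return messages for inputs the selected aligner still needs.

    `params` is a dict of pipeline parameters; unset values may be
    missing, None, or False.
    """
    aligner = params.get("aligner")
    has_fasta = bool(params.get("fasta") or params.get("transcript_fasta"))
    has_gtf = bool(params.get("gtf"))
    # Per aligner: (prebuilt index parameter, prebuilt gene-map parameter or None).
    prebuilt = {
        "kallisto": ("kallisto_index", "kallisto_gene_map"),
        "alevin": ("salmon_index", "txp2gene"),
        "star": ("star_index", None),
    }
    if aligner not in prebuilt:
        return [f"unknown aligner: {aligner!r}"]
    index_key, map_key = prebuilt[aligner]
    problems = []
    # A FASTA is only needed when no prebuilt index was supplied.
    if not params.get(index_key) and not has_fasta:
        problems.append(f"provide --{index_key}, or --fasta/--transcript_fasta to build it")
    if map_key is None:
        # Assumption: STAR always needs the GTF for annotation.
        if not has_gtf:
            problems.append("provide --gtf")
    elif not params.get(map_key) and not has_gtf:
        problems.append(f"provide --{map_key}, or --gtf to build it")
    return problems
```

With these rules, a kallisto run that supplies both `kallisto_index` and `kallisto_gene_map` returns an empty list, so neither `--gtf` nor a transcriptome FASTA would be requested.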

I tried to backtrack the issue in the main.nf but no luck, the logic is quite complicated (I just started using nextflow two days ago)

Improve QC reports

Is your feature request related to a problem? Please describe

I find per-sample quality control reports (number of detected cells, number of genes per cell, etc.) very useful to get a first idea of whether the sequencing experiment was successful. As far as I can tell, this is currently only implemented for Salmon/Alevin using the AlevinQC module.

Even better, in addition to one report per sample, a summary of the results could be shown in the MultiQC report. This would be particularly useful for experiments with many samples (no tedious checking of the individual reports).

Describe the solution you'd like

Produce one HTML report with summary statistics (like AlevinQC or cellranger reports) for each sample
Show summary statistics for all samples in the multiqc report.

Alevin workflow cannot handle gzipped genomes or gtf files

Tested on DSL2 version in dev.

Either update modules to handle gzip files, or include an optional gunzip process.
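
As a rough illustration of the first option, a helper can detect gzip compression from the file's first two bytes instead of trusting the extension, and then open the file as text either way. This is a generic sketch with a hypothetical helper name, not code from the pipeline or its modules.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def open_text(path):
    """Open `path` for reading as text, decompressing if it is gzipped."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")
```

Tools that need a real uncompressed file on disk would still require a separate gunzip step, which is what the second option covers.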

Default genomes and genome annotation

The pipeline should be set up to run with default genomes and genome annotations (iGenomes?).
Where possible, also pre-built indexes should be used.

This feature was temporarily removed in #76 and needs to be re-added.

	conda (params.enable_conda ? 'bioconda::kb-python=0.26.3' : null)
	container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
	'https://depot.galaxyproject.org/singularity/kb-python:0.26.3--pyhdfd78af_0' :
	'quay.io/biocontainers/kb-python:0.26.3--pyhdfd78af_0' }"

	conda (params.enable_conda ? "bioconda::kb-python=0.25.1" : null)
	container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
	'https://depot.galaxyproject.org/singularity/kb-python:0.25.1--py_0' :
	'quay.io/biocontainers/kb-python:0.25.1--py_0' }"
