parklab / scan-snv

Single cell somatic genotyper

Shell 26.75% R 14.70% Python 47.08% C 11.42% Batchfile 0.04%

scan-snv's Issues

Errors in running scansnv

I got the following error messages for about a third of my single-cell WGS samples after running the latest scansnv as of today (12/6/19). Any ideas? Among the cells with errors in the log, some produced the file somatic_genotypes.rda while others did not. Over 90% of the steps had completed for the affected cells.
Thanks for your assistance.

slurm-47955.out:Error in 0:dp : NA/NaN argument
slurm-47955.out:Error in rule scansnv_genotype_spikein_scatter:
slurm-47955.out:CalledProcessError in line 778 of /home/xxxxx/miniconda3/envs/scansnv/lib/scansnv/Snakefile:
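
To sort affected cells from unaffected ones, a small helper along these lines may be useful. It is only a sketch: the function names, the `slurm-*.out` glob, and the idea of one output directory per cell are assumptions made for illustration, not part of scansnv.

```python
import re
from pathlib import Path

# Matches lines such as "Error in 0:dp : NA/NaN argument" and
# "Error in rule scansnv_genotype_spikein_scatter:".
ERROR_PATTERN = re.compile(r"^Error in\b")


def find_error_lines(log_text):
    """Return the lines of a log that start with 'Error in'."""
    return [line for line in log_text.splitlines() if ERROR_PATTERN.match(line)]


def scan_logs(log_dir, pattern="slurm-*.out"):
    """Map each log file name (hypothetical naming) to its error lines."""
    report = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        errors = find_error_lines(path.read_text(errors="replace"))
        if errors:
            report[path.name] = errors
    return report


def missing_genotypes(cell_dirs, filename="somatic_genotypes.rda"):
    """Return the directories (assumed: one per cell) that lack the result file."""
    return [str(d) for d in cell_dirs if not (Path(d) / filename).is_file()]
```

Calling `scan_logs(".")` in the directory holding the Slurm logs lists the files that contain errors, and `missing_genotypes([...])` shows which runs never wrote somatic_genotypes.rda.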

No Output when using snakemake on slurm

Hi,
I am trying to run scan-snv on a whole chromosome on a Slurm cluster, but I get neither an error nor any output. The config.yaml seems to be correct, a coreXXX binary file is generated, and I don't see any error in ./output_dir/.snakemake/log.

Can you please let me know what might be wrong, given that no output is generated? The demo works fine, but the problem appears as soon as I try to run it in parallel.

Command:

module load anaconda3
conda activate scansnv
module load slurm-drmaa/1.2.0-dev.deca826

scansnv \
    --ref ./refs/b37_human_g1k_v37_decoy.fasta \
    --dbsnp ./refs/b37_dbsnp_138.b37.vcf \
    --shapeit-panel ./refs/1000GP_Phase3 \
    --output-dir test \
    --regions-file regions.bed \
    --bam Bulk Bulk.bam \
    --bam sc1 sc1.bam \
    --bam sc2 sc2.bam \
    --bam sc3 sc3.bam \
    --sc-sample sc1 \
    --sc-sample sc2 \
    --sc-sample sc3 \
    --bulk-sample Bulk \
    --abmodel-chunks 1 \
    --abmodel-samples-per-chunk 10000 \
    --abmodel-hsnp-chunk-size 50 \
    --hsnp-spikein-replicates 5 \
    --joblimit 10 --resume \
    --drmaa ' -p thin-shared -t 01:00:00 --cpus-per-task 1 --mem 5G'

Thanks in advance!
Monica.
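
When a run ends silently, it can help to print the end of the newest Snakemake log rather than skim the directory by hand. The sketch below assumes the logs are `*.log` files under `<output-dir>/.snakemake/log`, as described in the post; `latest_log` and `tail` are hypothetical helper names, not scansnv or Snakemake functions.

```python
from pathlib import Path


def latest_log(log_dir):
    """Return the most recently modified *.log file in log_dir, or None."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return None
    logs = list(log_dir.glob("*.log"))
    if not logs:
        return None
    return max(logs, key=lambda p: p.stat().st_mtime)


def tail(text, n=20):
    """Return the last n lines of text as a list (empty if n <= 0)."""
    if n <= 0:
        return []
    return text.splitlines()[-n:]
```

For the command above, `latest_log("test/.snakemake/log")` would pick the newest log under the `test` output directory, and `tail(log.read_text())` shows how far the workflow got before stopping.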

How can we get the germline single cell SNVs in my single cell data?

Hello,
It seems that the VCF output file contains only the somatic SNVs of the single-cell WGS data. How can I get all SNVs, both somatic and germline, for the single-cell data?

Thank you

Unable to install scan-snv

Hello,
I cannot install the scan-snv package on my system. Here is the error I got:

conda install -c bioconda -c conda-forge/label/cf201901 -c jluquette scansnv

Collecting package metadata: done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

scansnv -> eagle-phase==2.3.5=0

scansnv -> shapeit==2.r837=h09b0a5c_1

Current channels:

https://conda.anaconda.org/bioconda/linux-64

https://conda.anaconda.org/bioconda/noarch

https://conda.anaconda.org/conda-forge/label/cf201901/linux-64

https://conda.anaconda.org/conda-forge/label/cf201901/noarch

https://conda.anaconda.org/jluquette/linux-64

https://conda.anaconda.org/jluquette/noarch

https://conda.anaconda.org/conda-forge/linux-64

https://conda.anaconda.org/conda-forge/noarch

https://repo.anaconda.com/pkgs/main/linux-64

https://repo.anaconda.com/pkgs/main/noarch

https://repo.anaconda.com/pkgs/free/linux-64

https://repo.anaconda.com/pkgs/free/noarch

https://repo.anaconda.com/pkgs/r/linux-64

https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.

I have tried installing the package with conda versions 4.6.14, 4.7.14, 4.8.0, and 4.8.2, but the installation still fails.

I don't have the corresponding bulk DNA WGS.

Hi,
I want to use scansnv to find SNVs, but I don't have bulk DNA WGS data corresponding to my single-cell WGS data. What should I do?

Thanks in advance

the snakemake running step throws `TypeError: must be str, not int`

I am trying to get scansnv to run on some data. I installed everything via bioconda as described. During the snakemake step it throws an error, so I went and ran only the snakemake command with --debug and --verbose enabled, namely:

snakemake --snakefile /path/to/conda/<hash_value>/lib/scansnv/Snakefile --configfile /path/to/out_dir/config.yaml --latency-wait 12 --rerun-incomplete --debug --verbose

This gives the following traceback that I cannot make any sense of:

Building DAG of jobs...
Full Traceback (most recent call last):
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 540, in apply_input_function
    value = func(Wildcards(fromdict=wildcards), **_aux_params)
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/scansnv/Snakefile", line 101, in <lambda>
    "    -out {output.vcf}"
TypeError: must be str, not int

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/__init__.py", line 544, in snakemake
    export_cwl=export_cwl)
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/workflow.py", line 418, in execute
    dag.init()
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/dag.py", line 151, in init
    job.is_valid()
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/jobs.py", line 149, in is_valid
    self.rule.expand_params(self.wildcards_dict, self.input, self.output, resources)
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 662, in expand_params
    "threads": resources._cores})
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 564, in _apply_wildcards
    item = self.apply_input_function(item, wildcards, **aux_params)
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 542, in apply_input_function
    raise InputFunctionException(e, rule=self, wildcards=wildcards)
snakemake.exceptions.InputFunctionException: TypeError: must be str, not int
Wildcards:
gatk_mmq=60
gatk_chunk=1

InputFunctionException in line 59 of /path/to/project/.snakemake/conda/<hash_value>/lib/scansnv/Snakefile:
TypeError: must be str, not int
Wildcards:
gatk_mmq=60
gatk_chunk=1
unlocking
removed all lock

The general setup is that I have one bam file for the bulk sample and multiple separate bam files for the single cell samples. I am not sure, as this is not documented anywhere, but I provided the scansnv command with each single cell bam file with a --bam flag invocation and a --sc-sample invocation. I.e. --bam sc1 sc1.bam --bam sc2 sc2.bam [...] --sc-sample sc1 --sc-sample sc2 [...]. I think this works, as the resulting config.yaml looks reasonable, namely something like this:

bams:
  bulk: /path/to/project/mapping/hg19/samples/bulk.sorted.bam
  sc1: /path/to/project/mapping/hg19/samples/sc1.sorted.bam
  sc2: /path/to/project/mapping/hg19/samples/sc2.sorted.bam
  ...
sc_samples:
  - sc1
  - sc2
  ...
bulk_sample: bulk
humref: /path/to/ref_data/hg19_CanonicalChr/genome_bwa-0.7.9a/hg19.fa
dbsnp: /path/to/ref_data/hg19_GATKBundle-2.8/dbsnp_138.hg19.vcf
shapeit_refpanel: /path/to/ref_data/hg19_shapeit/1000GP_Phase3
abmodel_chunks: 20
abmodel_samples_per_chunk: 1000
abmodel_hsnp_chunksize: 100
abmodel_steps: 4
fdr: 1.0
min_sc_alt: 1
min_sc_dp: 1
min_bulk_dp: 1
spikein_replicates: 100
hsnp_spikein_size: 40
hsnp_spikein_nsamples: 4000
gatk_chunks: 22
gatk_regions: 
  - 1
  - 2
  - 3
  - 4
  - 5
  - 6
  - 7
  - 8
  - 9
  - 10
  - 11
  - 12
  - 13
  - 14
  - 15
  - 16
  - 17
  - 18
  - 19
  - 20
  - 21
  - 22
chrs: 
  - 1
  - 2
  - 3
  - 4
  - 5
  - 6
  - 7
  - 8
  - 9
  - 10
  - 11
  - 12
  - 13
  - 14
  - 15
  - 16
  - 17
  - 18
  - 19
  - 20
  - 21
  - 22
scripts: /path/to/project/.snakemake/conda/<hash_value>/lib/scansnv

Do you have any ideas what might be going wrong here? Any suggestions what to try or what to check?
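
One possible cause, offered as an unconfirmed guess: "must be str, not int" is the message Python 3.6 gives when a string is concatenated with an integer, and the unquoted entries under gatk_regions and chrs in the config above are read by a YAML parser as integers rather than strings. The sketch below reproduces that failure mode in isolation. `build_region_arg` is a hypothetical stand-in for a Snakefile lambda, not the actual code at line 101, and the `-L` flag is used purely for illustration.

```python
def build_region_arg(region):
    """Illustrative stand-in: concatenate a region name into a command string."""
    return "    -L " + region  # raises TypeError when region is an int


def normalize_regions(regions):
    """Convert every region or chromosome name to a string."""
    return [str(r) for r in regions]


# What a YAML loader yields for unquoted list entries such as "- 1".
regions_from_yaml = [1, 2, 22]

try:
    build_region_arg(regions_from_yaml[0])
except TypeError as exc:
    print("concatenation failed:", exc)

# After normalization the same call succeeds.
args = [build_region_arg(r) for r in normalize_regions(regions_from_yaml)]
print(args)
```

If this is indeed the problem, quoting the entries in config.yaml (for example `- "1"`) would make the parser return strings without touching the Snakefile.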

Single Cell genotypes

Hi,
I am using your SCAN-SNV to find somatic variants in our WGS data. We are concerned about the somatic genotype calls for the single cells. I am attaching four example records that pass the filter (i.e., somatic$pass == TRUE) in the table somatic from one of the somatic_genotypes.rda files. I have transposed rows and columns for easier display.

The four examples show the single-cell genotypes (row 11) as 1/1, with coverage depths (ref.1, alt.1) of (0,10), (1,14), (0,27) and (0,15).
I understand that these results come from SCAN-SNV's first step, GATK, which selects candidate variants in your pipeline. The genotype calls and coverages may not be used in later steps. Should I assume these four cases are all true somatic mutations with heterozygous genotypes, even though the raw data point to homozygous alternative alleles?

I would also like to ask why the final genotype calls would contradict the initial GATK genotype calls. Those coverages strongly suggest that the somatic variants are homozygous, as almost all the reads carry the "alt.1" allele. Put another way, is the AB model really confident enough to call a heterozygote when there are 0 reference reads and 27 alternative reads?

This is important as we are looking at somatic signatures and need the accurate genotypes for the single cells. Please advise.

Thank you very much.

Yong

sSNVs-1-1-examples.xlsx
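
The question of whether 0 reference and 27 alternative reads can be compatible with a heterozygous site comes down to what fraction of reads is expected to carry the reference allele at that locus. The sketch below is a plain binomial illustration of that point, not SCAN-SNV's actual AB model. The balanced value 0.5 and the skewed value 0.05 are assumed numbers chosen for the example, and `ref_count_probability` is a hypothetical helper name.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def ref_count_probability(ref, alt, ref_fraction):
    """P(exactly `ref` reference reads out of ref + alt) when each read
    carries the reference allele with probability `ref_fraction`."""
    return binom_pmf(ref, ref + alt, ref_fraction)


# (ref.1, alt.1) read counts quoted in the post.
sites = [(0, 10), (1, 14), (0, 27), (0, 15)]

for ref, alt in sites:
    balanced = ref_count_probability(ref, alt, 0.5)   # unbiased heterozygote
    skewed = ref_count_probability(ref, alt, 0.05)    # assumed strong allelic skew
    print(f"ref={ref} alt={alt} balanced={balanced:.3g} skewed={skewed:.3g}")
```

For the (0,27) site, a balanced heterozygote would give 0.5^27, about 7.5e-9, whereas a locus where only 5% of reads are expected to carry the reference allele would give 0.95^27, roughly 0.25. A heterozygous call on such data is therefore only plausible if the estimated local allele balance is strongly skewed.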
