parklab / scan-snv
Single cell somatic genotyper
I got the following error messages for about 1/3 of my single-cell WGS samples after running the latest scansnv as of today (12/6/19). Any ideas? Among the cells with errors in the log, some produced the file somatic_genotypes.rda while the others didn't. Over 90% of the steps had completed for those problem cells.
Thanks for your assistance.
slurm-47955.out:Error in 0:dp : NA/NaN argument
slurm-47955.out:Error in rule scansnv_genotype_spikein_scatter:
slurm-47955.out:CalledProcessError in line 778 of /home/xxxxx/miniconda3/envs/scansnv/lib/scansnv/Snakefile:
Hi,
I am trying to run scan-snv on a whole chromosome on a Slurm cluster, but I get neither an error nor any output. The config.yaml seems to be correct and a coreXXX binary file is generated, and I don't see any error in ./output_dir/.snakemake./log.
Can you please let me know what could be wrong such that no output is generated? The demo works fine, but the problem appears as soon as I try to run it in parallel.
Command:
module load anaconda3
conda activate scansnv
module load slurm-drmaa/1.2.0-dev.deca826
scansnv \
--ref ./refs/b37_human_g1k_v37_decoy.fasta \
--dbsnp ./refs/b37_dbsnp_138.b37.vcf \
--shapeit-panel ./refs/1000GP_Phase3 \
--output-dir test \
--regions-file regions.bed \
--bam Bulk Bulk.bam \
--bam sc1 sc1.bam \
--bam sc2 sc2.bam \
--bam sc3 sc3.bam \
--sc-sample sc1 \
--sc-sample sc2 \
--sc-sample sc3 \
--bulk-sample Bulk \
--abmodel-chunks 1 \
--abmodel-samples-per-chunk 10000 \
--abmodel-hsnp-chunk-size 50 \
--hsnp-spikein-replicates 5 \
--joblimit 10 --resume \
--drmaa ' -p thin-shared -t 01:00:00 --cpus-per-task 1 --mem 5G'
Thanks in advance!
Monica.
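The repeated --bam / --sc-sample pattern in the command above can be generated rather than typed by hand. The following is a minimal sketch, not part of scansnv itself: the helper name build_scansnv_args and its parameters are hypothetical, and it only assembles the three sample-related flags that appear in the command above.

```python
import shlex


def build_scansnv_args(bulk, single_cells, bam_paths):
    """Assemble the repeated --bam / --sc-sample / --bulk-sample arguments.

    bulk:         name of the bulk sample
    single_cells: list of single-cell sample names
    bam_paths:    dict mapping every sample name to its BAM path
    (Hypothetical helper; only flags shown in the command above are used.)
    """
    args = ["scansnv"]
    # One "--bam <name> <path>" triple per sample, bulk first.
    for name in [bulk] + list(single_cells):
        args += ["--bam", name, bam_paths[name]]
    # One "--sc-sample <name>" pair per single cell.
    for name in single_cells:
        args += ["--sc-sample", name]
    args += ["--bulk-sample", bulk]
    return args


bam_paths = {
    "Bulk": "Bulk.bam",
    "sc1": "sc1.bam",
    "sc2": "sc2.bam",
    "sc3": "sc3.bam",
}
args = build_scansnv_args("Bulk", ["sc1", "sc2", "sc3"], bam_paths)

# Print a shell-safe version of the argument list for inspection.
print(" ".join(shlex.quote(a) for a in args))
```

The remaining options (--ref, --dbsnp, --output-dir and so on) would be appended to the same list before the command is run.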
Hello,
It seems that the VCF output file contains only the somatic SNVs of the single-cell WGS data. How can I get all SNVs of the single-cell data, both somatic and germline?
Thank you
Hello,
I cannot install the scan-snv package on my system. Here is the error I got:
conda install -c bioconda -c conda-forge/label/cf201901 -c jluquette scansnv
Collecting package metadata: done
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
- scansnv -> eagle-phase==2.3.5=0
- scansnv -> shapeit==2.r837=h09b0a5c_1
Current channels:
- https://conda.anaconda.org/bioconda/linux-64
- https://conda.anaconda.org/bioconda/noarch
- https://conda.anaconda.org/conda-forge/label/cf201901/linux-64
- https://conda.anaconda.org/conda-forge/label/cf201901/noarch
- https://conda.anaconda.org/jluquette/linux-64
- https://conda.anaconda.org/jluquette/noarch
- https://conda.anaconda.org/conda-forge/linux-64
- https://conda.anaconda.org/conda-forge/noarch
- https://repo.anaconda.com/pkgs/main/linux-64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/free/linux-64
- https://repo.anaconda.com/pkgs/free/noarch
- https://repo.anaconda.com/pkgs/r/linux-64
- https://repo.anaconda.com/pkgs/r/noarch
To search for alternate channels that may provide the conda package you're
looking for, navigate to https://anaconda.org
and use the search bar at the top of the page.
I have tried to install the package with conda versions 4.6.14, 4.7.14, 4.8.0, and 4.8.2, but it still cannot be installed completely.
Hi,
I want to use scansnv to find SNVs, but I don't have bulk DNA WGS data corresponding to my single-cell WGS data. What should I do?
Thanks in advance
I am trying to get scansnv to run on some data. I installed everything via bioconda as described. During the snakemake step it throws an error, so I went and ran only the snakemake command with --debug and --verbose enabled, namely:
snakemake --snakefile /path/to/conda/<hash_value>/lib/scansnv/Snakefile --configfile /path/to/out_dir/config.yaml --latency-wait 12 --rerun-incomplete --debug --verbose
This gives the following traceback that I cannot make any sense of:
Building DAG of jobs...
Full Traceback (most recent call last):
File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 540, in apply_input_function
value = func(Wildcards(fromdict=wildcards), **_aux_params)
File "/path/to/project/.snakemake/conda/<hash_value>/lib/scansnv/Snakefile", line 101, in <lambda>
" -out {output.vcf}"
TypeError: must be str, not int
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/__init__.py", line 544, in snakemake
export_cwl=export_cwl)
File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/workflow.py", line 418, in execute
dag.init()
File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/dag.py", line 151, in init
job.is_valid()
File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/jobs.py", line 149, in is_valid
self.rule.expand_params(self.wildcards_dict, self.input, self.output, resources)
File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 662, in expand_params
"threads": resources._cores})
File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 564, in _apply_wildcards
item = self.apply_input_function(item, wildcards, **aux_params)
File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 542, in apply_input_function
raise InputFunctionException(e, rule=self, wildcards=wildcards)
snakemake.exceptions.InputFunctionException: TypeError: must be str, not int
Wildcards:
gatk_mmq=60
gatk_chunk=1
InputFunctionException in line 59 of /path/to/project/.snakemake/conda/<hash_value>/lib/scansnv/Snakefile:
TypeError: must be str, not int
Wildcards:
gatk_mmq=60
gatk_chunk=1
unlocking
removed all lock
The general setup is that I have one bam file for the bulk sample and multiple separate bam files for the single cell samples. I am not sure, as this is not documented anywhere, but I provided the scansnv command with each single cell bam file with a --bam flag invocation and a --sc-sample invocation, i.e. --bam sc1 sc1.bam --bam sc2 sc2.bam [...] --sc-sample sc1 --sc-sample sc2 [...]. I think this works, as the resulting config.yaml looks reasonable, namely something like this:
bams:
  bulk: /path/to/project/mapping/hg19/samples/bulk.sorted.bam
  sc1: /path/to/project/mapping/hg19/samples/sc1.sorted.bam
  sc2: /path/to/project/mapping/hg19/samples/sc2.sorted.bam
  ...
sc_samples:
- sc1
- sc2
...
bulk_sample: bulk
humref: /path/to/ref_data/hg19_CanonicalChr/genome_bwa-0.7.9a/hg19.fa
dbsnp: /path/to/ref_data/hg19_GATKBundle-2.8/dbsnp_138.hg19.vcf
shapeit_refpanel: /path/to/ref_data/hg19_shapeit/1000GP_Phase3
abmodel_chunks: 20
abmodel_samples_per_chunk: 1000
abmodel_hsnp_chunksize: 100
abmodel_steps: 4
fdr: 1.0
min_sc_alt: 1
min_sc_dp: 1
min_bulk_dp: 1
spikein_replicates: 100
hsnp_spikein_size: 40
hsnp_spikein_nsamples: 4000
gatk_chunks: 22
gatk_regions:
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
chrs:
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
scripts: /path/to/project/.snakemake/conda/<hash_value>/lib/scansnv
Do you have any ideas what might be going wrong here? Any suggestions what to try or what to check?
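One thing worth checking, offered as a guess rather than a confirmed diagnosis: YAML parses unquoted entries such as "- 1" as integers, so the gatk_regions and chrs lists in the config above load as lists of int. If the Snakefile concatenates such a value onto a command string, Python 3.6 raises exactly "TypeError: must be str, not int". The sketch below reproduces that failure and shows one possible workaround; the helper stringify_regions and the " -L " prefix are illustrative assumptions, not code taken from the scansnv Snakefile.

```python
def stringify_regions(config, keys=("gatk_regions", "chrs")):
    """Return a copy of config with the listed region entries cast to str.

    Hypothetical helper: it mimics what quoting the values in config.yaml
    (for example - '1' instead of - 1) would achieve.
    """
    fixed = dict(config)
    for key in keys:
        if key in fixed:
            fixed[key] = [str(value) for value in fixed[key]]
    return fixed


# What a YAML loader produces for unquoted "- 1", "- 2", ... entries.
config = {"gatk_regions": [1, 2, 3], "chrs": [1, 2, 3]}

# Concatenating an int onto a str fails, as in the traceback above.
try:
    " -L " + config["gatk_regions"][0]
    concat_failed = False
except TypeError:
    concat_failed = True

# After casting, the same concatenation works.
fixed = stringify_regions(config)
interval_arg = " -L " + fixed["gatk_regions"][0]

print(concat_failed, repr(interval_arg))
```

If this is indeed the cause, quoting the chromosome names in config.yaml would be the equivalent change on the configuration side.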
Hi,
I am using SCAN-SNV to find somatic variants in our WGS data. We are concerned about the somatic genotype calls for the single cells. I am attaching 4 example records that pass the filter (i.e., somatic$pass == TRUE) in the table somatic from one of the somatic_genotypes.rda files. I have transposed rows and columns for better display.
The 4 examples show the single-cell genotypes (row 11) as 1/1, and coverage depths (ref.1, alt.1) as (0,10), (1,14), (0,27) and (0,15).
I understand that these results come from SCAN-SNV’s first step, GATK, which selects candidate variants in your pipeline. The genotype calls and coverages may not be used in later steps. Should I assume these four cases are all true somatic mutations with heterozygous genotypes even though the raw data points to homozygous alternative alleles?
I would also like to ask why the final genotype calls would contradict the initial GATK genotype calls. Those coverages strongly suggest that the somatic variants are homozygous, as almost all the reads carry the “alt.1” allele. Put another way, is the AB model really so confident that it calls a heterozygote when there are 0 ref alleles and 27 alternative alleles?
This is important as we are looking at somatic signatures and need accurate genotypes for the single cells. Please advise.
Thank you very much.
Yong
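To make the numbers in the question concrete, the sketch below computes the alternate-allele fraction for the four (ref.1, alt.1) pairs, together with the binomial probability of each observation under a chosen allele balance. This is plain arithmetic and not SCAN-SNV's AB model: the function names are hypothetical, and the balance values 0.5 and 0.9 are assumptions picked only to show how strongly the probability depends on the assumed balance.

```python
from math import comb


def alt_fraction(ref, alt):
    """Fraction of reads supporting the alternate allele."""
    return alt / (ref + alt)


def binom_pmf(k, n, p):
    """Probability of exactly k alt reads out of n when each read is alt with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# (ref.1, alt.1) read counts quoted in the question.
counts = [(0, 10), (1, 14), (0, 27), (0, 15)]

for ref, alt in counts:
    depth = ref + alt
    vaf = alt_fraction(ref, alt)
    # Assumed allele balances: 0.5 (balanced het) and 0.9 (strongly skewed het).
    p_balanced = binom_pmf(alt, depth, 0.5)
    p_skewed = binom_pmf(alt, depth, 0.9)
    print(f"ref={ref} alt={alt} VAF={vaf:.3f} "
          f"P(balance=0.5)={p_balanced:.2e} P(balance=0.9)={p_skewed:.2e}")
```

Under a balance of 0.5, observing 0 ref reads in 27 is extremely unlikely, which is the intuition behind the question; under a strongly skewed balance the same observation becomes far less surprising.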