scan-snv's People

Contributors

jluquette

scan-snv's Issues

Errors in running scansnv

I got the following error messages for about 1/3 of my single-cell WGS samples after running the latest scansnv as of today (12/6/19). Any ideas? Among the cells with errors in the log, some produced the file somatic_genotypes.rda while others did not. More than 90% of the steps had completed for the problem cells.
Thanks for your assistance.

slurm-47955.out:Error in 0:dp : NA/NaN argument
slurm-47955.out:Error in rule scansnv_genotype_spikein_scatter:
slurm-47955.out:CalledProcessError in line 778 of /home/xxxxx/miniconda3/envs/scansnv/lib/scansnv/Snakefile:

No Output when using snakemake on slurm

Hi,
I am trying to run scan-snv on a whole chromosome on a Slurm cluster, but I get neither an error nor any output. The config.yaml seems to be correct, a coreXXX binary file is generated, and I don't see any errors in ./output_dir/.snakemake/log.

Can you please tell me what could be wrong so that no output is generated? The demo works fine, but the problem appears as soon as I try to run it in parallel.

Command:

module load anaconda3
conda activate scansnv
module load slurm-drmaa/1.2.0-dev.deca826

scansnv \
    --ref ./refs/b37_human_g1k_v37_decoy.fasta \
    --dbsnp ./refs/b37_dbsnp_138.b37.vcf \
    --shapeit-panel ./refs/1000GP_Phase3 \
    --output-dir test \
    --regions-file regions.bed \
    --bam Bulk Bulk.bam \
    --bam sc1 sc1.bam \
    --bam sc2 sc2.bam \
    --bam sc3 sc3.bam \
    --sc-sample sc1 \
    --sc-sample sc2 \
    --sc-sample sc3 \
    --bulk-sample Bulk \
    --abmodel-chunks 1 \
    --abmodel-samples-per-chunk 10000 \
    --abmodel-hsnp-chunk-size 50 \
    --hsnp-spikein-replicates 5 \
    --joblimit 10 --resume \
    --drmaa ' -p thin-shared -t 01:00:00 --cpus-per-task 1 --mem 5G'

Thanks in advance!
Monica.

Unable to install scan-snv

Hello,
I cannot install the scan-snv package on my system. Here is the error I got:

conda install -c bioconda -c conda-forge/label/cf201901 -c jluquette scansnv

Collecting package metadata: done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  • scansnv -> eagle-phase==2.3.5=0
  • scansnv -> shapeit==2.r837=h09b0a5c_1

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

I have tried installing the package with conda versions 4.6.14, 4.7.14, 4.8.0, and 4.8.2, but it still cannot be installed completely.
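
The two bullet lines in the error name exact pins (package name, version, and build string) that the solver could not find in the configured channels. As a minimal sketch, for illustration only, those lines can be split into their parts so that each pin can be searched for individually on anaconda.org. The function name `parse_missing_pin` is hypothetical and is not part of conda.

```python
def parse_missing_pin(line):
    """Split a line like 'scansnv -> eagle-phase==2.3.5=0' into its parts."""
    # Drop surrounding whitespace and the leading bullet, if present.
    line = line.strip().lstrip("•").strip()
    # Left of ' -> ' is the package that requires the pin.
    required_by, _, spec = line.partition(" -> ")
    # The pin itself has the form name==version=build.
    name, _, version_build = spec.partition("==")
    version, _, build = version_build.partition("=")
    return {
        "required_by": required_by,
        "name": name,
        "version": version,
        "build": build,
    }


for raw in ("  • scansnv -> eagle-phase==2.3.5=0",
            "  • scansnv -> shapeit==2.r837=h09b0a5c_1"):
    print(parse_missing_pin(raw))
```

Searching for the name and version without the build string may show whether the package exists in a channel under a different build.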

the snakemake running step throws `TypeError: must be str, not int`

I am trying to get scansnv to run on some data. I installed everything via bioconda as described. The snakemake step throws an error, so I reran only the snakemake command with --debug and --verbose enabled, namely:

snakemake --snakefile /path/to/conda/<hash_value>/lib/scansnv/Snakefile --configfile /path/to/out_dir/config.yaml --latency-wait 12 --rerun-incomplete --debug --verbose

This gives the following traceback that I cannot make any sense of:

Building DAG of jobs...
Full Traceback (most recent call last):
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 540, in apply_input_function
    value = func(Wildcards(fromdict=wildcards), **_aux_params)
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/scansnv/Snakefile", line 101, in <lambda>
    "    -out {output.vcf}"
TypeError: must be str, not int

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/__init__.py", line 544, in snakemake
    export_cwl=export_cwl)
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/workflow.py", line 418, in execute
    dag.init()
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/dag.py", line 151, in init
    job.is_valid()
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/jobs.py", line 149, in is_valid
    self.rule.expand_params(self.wildcards_dict, self.input, self.output, resources)
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 662, in expand_params
    "threads": resources._cores})
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 564, in _apply_wildcards
    item = self.apply_input_function(item, wildcards, **aux_params)
  File "/path/to/project/.snakemake/conda/<hash_value>/lib/python3.6/site-packages/snakemake/rules.py", line 542, in apply_input_function
    raise InputFunctionException(e, rule=self, wildcards=wildcards)
snakemake.exceptions.InputFunctionException: TypeError: must be str, not int
Wildcards:
gatk_mmq=60
gatk_chunk=1

InputFunctionException in line 59 of /path/to/project/.snakemake/conda/<hash_value>/lib/scansnv/Snakefile:
TypeError: must be str, not int
Wildcards:
gatk_mmq=60
gatk_chunk=1
unlocking
removed all lock

The general setup is that I have one BAM file for the bulk sample and multiple separate BAM files for the single-cell samples. This is not documented anywhere, so I am not sure it is correct, but I passed each single-cell BAM file to the scansnv command with its own --bam flag and a matching --sc-sample flag, i.e. --bam sc1 sc1.bam --bam sc2 sc2.bam [...] --sc-sample sc1 --sc-sample sc2 [...]. I think this works, as the resulting config.yaml looks reasonable, namely something like this:

bams:
  bulk: /path/to/project/mapping/hg19/samples/bulk.sorted.bam
  sc1: /path/to/project/mapping/hg19/samples/sc1.sorted.bam
  sc2: /path/to/project/mapping/hg19/samples/sc2.sorted.bam
  ...
sc_samples:
  - sc1
  - sc2
  ...
bulk_sample: bulk
humref: /path/to/ref_data/hg19_CanonicalChr/genome_bwa-0.7.9a/hg19.fa
dbsnp: /path/to/ref_data/hg19_GATKBundle-2.8/dbsnp_138.hg19.vcf
shapeit_refpanel: /path/to/ref_data/hg19_shapeit/1000GP_Phase3
abmodel_chunks: 20
abmodel_samples_per_chunk: 1000
abmodel_hsnp_chunksize: 100
abmodel_steps: 4
fdr: 1.0
min_sc_alt: 1
min_sc_dp: 1
min_bulk_dp: 1
spikein_replicates: 100
hsnp_spikein_size: 40
hsnp_spikein_nsamples: 4000
gatk_chunks: 22
gatk_regions: 
  - 1
  - 2
  - 3
  - 4
  - 5
  - 6
  - 7
  - 8
  - 9
  - 10
  - 11
  - 12
  - 13
  - 14
  - 15
  - 16
  - 17
  - 18
  - 19
  - 20
  - 21
  - 22
chrs: 
  - 1
  - 2
  - 3
  - 4
  - 5
  - 6
  - 7
  - 8
  - 9
  - 10
  - 11
  - 12
  - 13
  - 14
  - 15
  - 16
  - 17
  - 18
  - 19
  - 20
  - 21
  - 22
scripts: /path/to/project/.snakemake/conda/<hash_value>/lib/scansnv

Do you have any idea what might be going wrong here? Any suggestions on what to try or what to check?
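
One possibility worth checking, offered as an assumption rather than a diagnosis: the entries under gatk_regions and chrs in the config.yaml above are unquoted, so a YAML parser reads them as integers. A Snakefile expression that joins such a value onto a string with `+` would then raise this kind of TypeError. The sketch below reproduces that failure mode in plain Python. The `config` dictionary and both `build_region_arg` helpers are hypothetical stand-ins, not code from the scansnv Snakefile.

```python
# Hypothetical stand-in for what a YAML parser returns for unquoted
# list entries such as "- 1": Python ints, not strings.
config = {"gatk_regions": [1, 2, 3]}


def build_region_arg_unsafe(region):
    # str + int raises TypeError when region is an int.
    return "    -L " + region


def build_region_arg(region):
    # Converting explicitly works for both ints and strings.
    return "    -L " + str(region)


try:
    build_region_arg_unsafe(config["gatk_regions"][0])
    failed = False
except TypeError as err:
    failed = True
    print("TypeError:", err)

print(build_region_arg(config["gatk_regions"][0]))
```

If this turns out to be the cause, quoting the values in config.yaml (for example `- "1"`) would be another way to keep them as strings.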

Single Cell genotypes

Hi,
I am using SCAN-SNV to find somatic variants in our WGS data. We are concerned about the somatic genotype calls for the single cells. I am attaching 4 example records that pass the filter (i.e., somatic$pass == TRUE) in the table somatic from one of the somatic_genotypes.rda files. I have transposed rows and columns for better display.

The 4 examples show the single-cell genotypes (row 11) as 1/1, with coverage depths (ref.1, alt.1) of (0,10), (1,14), (0,27) and (0,15).
I understand that these results come from GATK, the first step of the SCAN-SNV pipeline, which selects candidate variants. The genotype calls and coverages may not be used in later steps. Should I assume these four cases are all true somatic mutations with heterozygous genotypes even though the raw data points to homozygous alternative alleles?

I would also like to ask why the final genotype calls would contradict the initial GATK genotype calls. Those coverages strongly suggest that the somatic variants are homozygous, as almost all the reads carry the “alt.1” allele. Put another way, is the AB model really confident enough to call a heterozygote when there are 0 ref reads and 27 alt reads?

This is important as we are looking at somatic signatures and need the accurate genotypes for the single cells. Please advise.
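
To make the question concrete, the arithmetic can be sketched with a plain binomial model. This is not SCAN-SNV's actual AB model, only an illustration: it computes the probability of observing 27 alt reads and 0 ref reads at a heterozygous site for different assumed fractions of reads carrying the alt allele. The fractions 0.5 and 0.95 are hypothetical values chosen for the example.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_alt_reads(alt, depth, alt_fraction):
    """Probability of seeing `alt` alt reads out of `depth` total reads
    when each read carries the alt allele with probability `alt_fraction`."""
    return binom_pmf(alt, depth, alt_fraction)


# Heterozygous site with balanced amplification (assumed fraction 0.5).
balanced = prob_alt_reads(27, 27, 0.5)
# Heterozygous site where amplification strongly favors the alt allele
# (assumed fraction 0.95).
skewed = prob_alt_reads(27, 27, 0.95)

print(f"balanced: {balanced:.3g}")
print(f"skewed:   {skewed:.3g}")
```

Under the balanced assumption, (0,27) is extremely unlikely for a heterozygote (roughly 7.5e-9), while under the skewed assumption it is not (roughly 0.25). Whether the AB model estimates such a skew at these four sites is exactly what the question above asks.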

Thank you very much.

Yong

sSNVs-1-1-examples.xlsx
