clincnv's People

Contributors

axelgschwind, bondarevts, fohlen, germandemidov, imgagbot, marc-sturm

Forkers

fohlen pythseq

clincnv's Issues

Germline algo changes

  • change output file name [sample]_cnvs.tsv
  • round number of decimal places (loglikelihood)
  • add CNV size (kB) column
  • add allele frequency in cohort column
  • add QC to header: number of iterations, median percentage of outliers

ClinCNV fails with an error on family samples

Hello, I analyzed 3 family samples with ClinCNV and an error occurred, even though I set the parameter minimumNumOfElemsInCluster to 1. How can I solve this error? The error output is as follows:

[1] "We run script located in folder /fuer2/03.Soft/01.Soft_project/ClinCNV-1.17.2 . All the paths will be calculated realtive to this one. If everything crashes, please, check the correctness of this path first."
[1] "START cluster allocation."
[1] "Cluster allocated."
[1] "END cluster allocation."
[1] "We are started with reading the coverage files and bed files 2022-03-14 17:20:48"
[1] "Started basic quality filtering. 2022-03-14 17:20:49"
[1] "Amount of regions after filtering of 0-covered regions 99.528"
[1] "Normalization with GC and length starts. 2022-03-14 17:21:00"
[1] "Percentage of regions remained after GC correction: 0.997975235573553"
[1] "Amount of regions after GC-extreme filtering 99.327"
[1] "Amount of regions after Systematically Low Covered regions filtering 99.327"
[1] "We start to cluster your data (you will find a plot if clustering is possible in your output directory) ./result 2022-03-14 17:21:16"
Error: umap: number of neighbors must be smaller than number of items
Execution halted
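
The last log line comes from the clustering step: UMAP requires the number of neighbors to be strictly smaller than the number of samples, so a cohort of three cannot be embedded with three or more neighbors. A minimal sketch of a guard for that condition, written in Python for illustration only. The function name, the requested value of 15 and the lower bound of 2 are assumptions of this sketch, not ClinCNV's code:

```python
def usable_neighbors(n_samples, requested_neighbors):
    """Return a neighbor count UMAP would accept, or None if the cohort is too small.

    UMAP needs n_neighbors < n_samples. This sketch additionally assumes
    that at least 2 neighbors are needed for a meaningful embedding.
    """
    largest_allowed = n_samples - 1
    if largest_allowed < 2:
        return None  # too few samples: skip clustering entirely
    return min(requested_neighbors, largest_allowed)


# Three family samples: only 2 neighbors are possible.
print(usable_neighbors(3, 15))
# A 40-sample cohort can keep the requested value.
print(usable_neighbors(40, 15))
```

With a guard of this kind, a tool can either lower the neighbor count or skip clustering instead of halting.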

Questions about the raw output file(s)'s columns

Hello, I searched for a while for a description of the columns in the "*_cnvs.tsv" output file. These are the columns:
#chr start end CN_change loglikelihood no_of_regions length_KB potential_AF genes qvalue

Some have obvious meaning, some don't (to me), is there an explanatory document somewhere?
Is CN_change code for something, or is it the actual number of copies?
loglikelihood of what?
no_of_regions, is this number of exons, or number of intervals in the input bedfile?
length_KB, length of what?
potential_AF, this seems lower than 1 always, allele frequency?
genes, I suppose this is empty unless the bed file was annotated?
qvalue, qvalue of what?

Thank you for your patience, sorry if I missed something obvious

PS. I only ask about this output file as I thought it was the best one to look through. Did I get this wrong, as well? Which of the three files makes most sense to look through?
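
For readers who want to inspect the file programmatically, here is a minimal Python sketch that reads a `*_cnvs.tsv` file into dictionaries keyed by the column names listed above. It assumes the file is tab-separated, that the column header is the line starting with `#chr`, and that any other line starting with `#` carries QC metadata. The data row in the example is made up, and `span_kb` simply computes (end - start) / 1000, which may or may not equal the tool's `length_KB` value:

```python
def read_cnvs(text):
    """Parse the content of a *_cnvs.tsv file into a list of dicts."""
    columns = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#chr"):
            # Column header line: "#chr<TAB>start<TAB>end<TAB>..."
            columns = line.lstrip("#").split("\t")
            continue
        if line.startswith("#"):
            continue  # assumed: other '#' lines are QC metadata
        if columns is None:
            raise ValueError("data row found before the '#chr' header line")
        row = dict(zip(columns, line.split("\t")))
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        rows.append(row)
    return rows


def span_kb(row):
    """Genomic span of a call in kilobases, computed from start and end."""
    return (row["end"] - row["start"]) / 1000


# Made-up example content, for illustration only.
example = (
    "#chr\tstart\tend\tCN_change\tloglikelihood\tno_of_regions"
    "\tlength_KB\tpotential_AF\tgenes\tqvalue\n"
    "chr1\t1000\t6000\t1\t25.5\t3\t5\t0.01\tGENE1\t0.001\n"
)
for call in read_cnvs(example):
    print(call["chr"], call["start"], call["end"], span_kb(call))
```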

Documentation improvements

-[ ] TSV header: each value in one line, or group logically
-[ ] TSV header: number of iterations should not include super-recall
-[ ] SEG file: make log-likelihood positive
-[ ] SEG file: remove CN column
-[ ] SEG file: add coefficient of variation or similar value

Trios sample ID

Hello, I analyzed 5 trios with ClinCNV and an error occurred. Is there a problem with my sample IDs?
Below is my sample ID file:
1_cyw,1_cywm,1_cywf
11_ywx,11_ywxm,11_ywxf
10_xjx,10_xjxm,10_xjxf
12_zxy,12_zxym,12_zxyf
13_zmh,13_zmhm,13_zmhf

Here is the error message:
Error in strsplit(genesThatHasToBeSeparated[i], split = ",") :
  non-character argument
Calls: source ... eval -> eval -> plotFoundCNVs -> unlist -> strsplit
Execution halted

Thanks

No error if BAF file is not readable

ClinCNV throws no error if a BAF file is not readable.
Please throw an error in case any file that is given via the command line cannot be opened.
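
The requested behaviour can be sketched as a pre-flight check that runs before any analysis starts. This is a Python illustration of the idea only; the function name is hypothetical and nothing here is taken from ClinCNV's code:

```python
import os


def require_readable(paths):
    """Raise an error naming every path that is not a readable regular file."""
    unreadable = [
        p for p in paths
        if not (os.path.isfile(p) and os.access(p, os.R_OK))
    ]
    if unreadable:
        raise FileNotFoundError(
            "cannot open input file(s): " + ", ".join(unreadable)
        )
```

Calling it once with every file given on the command line (BED, coverage and BAF files) reports all unreadable paths in a single message instead of failing silently later.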

Running clinCNV reports an error

Please help, thanks.
Command run: Rscript ./clinCNV.R --bed ./samples/bed_file.bed --normal ./samples/coverages_normal.cov --out result
The error is as follows:
......
Loading required package: sandwich
[1] "We start to estimate covariances between neighboring regions in germline data - may take some time 2022-03-10 17:22:41"
[1] "Tree of covariances (using 2 predictors - sum of regions' lengths and log2 of distance between regions) plotted in result 2022-03-10 17:23:28"
[1] "Calling started 2022-03-10 17:23:28"
[1] "Working with germline sample 0 2022-03-10 17:23:28"
[1] "Working with germline sample 1 2022-03-10 17:23:28"
Error in writeLines(c("#type=GENE_EXPRESSION", paste0("#track graphtype=points name="", :
cannot open the connection
Calls: source ... outputSegmentsAndDotsFromListOfCNVs -> makeTrackAnnotation -> writeLines
Execution halted

Error in 1:ncol(toyCoverageGermlineCohort) : argument of length 0 Calls: source -> withVisible -> eval -> eval

Hi,
I am running a germline analysis for 40 samples on the hg38 reference genome. The normal run (clinCNV.R --normal normal.cov --out outputFolder --bed annotatedBedFile.bed) without offtarget regions works fine, but when I add the offtarget parameters it fails with the following error.
[image: screenshot of the error]

Run command:
clinCNV.R --normal normal.cov --bed annotatedBedFile.bed --out outputFolder--normalOfftarget offtarget.cov --bedOfftarget annotatedBedFile_offtarget.bed --numberOfThreads 4 --hg38

Note: The bed file was annotated with ngs-bits BedAnnotateGC and BedAnnotateGenes. This is not a path issue, as the cov files and bed files can be read.

Does anyone know what went wrong?
Thanks :)

Is it possible to add mitochondrial DNA cytoband information?

Dear developers,

I am using your amazing tool for germline WES analysis. It works pretty well I think.
I was wondering if there was a way to use clinCNV for mitochondrial analysis. At the moment, I remove any chrM samples I have because the cytobandsHG38.txt file does not contain any chrM information. Is there a way to add chrM information to the cytobandsHG38.txt file?

Many Thanks,
Krutik

Order of the output CNVs

Hi,

I suggest we order the output CNVs exactly as they are ordered in the input Bed file or cov files respectively.
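
A minimal Python sketch of this ordering, assuming CNVs are (chromosome, start, end) tuples and that "BED order" means the order in which chromosomes first appear in the BED file, then ascending start position. The function names are hypothetical:

```python
def bed_chrom_order(bed_lines):
    """Map each chromosome to the rank of its first appearance in the BED file."""
    order = {}
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        order.setdefault(chrom, len(order))
    return order


def sort_like_bed(cnvs, order):
    """Sort (chrom, start, end) tuples by BED chromosome rank, then by start."""
    # Chromosomes missing from the BED file are placed last.
    return sorted(cnvs, key=lambda c: (order.get(c[0], len(order)), c[1]))


bed = ["chr1\t100\t200", "chr10\t5\t50", "chr2\t1\t10"]
calls = [("chr2", 1, 10), ("chr1", 500, 600), ("chr10", 5, 50), ("chr1", 100, 200)]
print(sort_like_bed(calls, bed_chrom_order(bed)))
```

Ranking by first appearance avoids both alphabetical ordering (chr1, chr10, chr2) and any hard-coded chromosome list.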

Code refactoring

  • remove parameter '--folderWithScript'
  • refactor code to split germline from somatic analysis (put it into the germline folder)
  • rename script to something other than firstStep.R, perhaps clincnv.R
  • add example data for each use-case => use as unit tests
  • check minimum version of R (3.2)

Offtarget coverage on targeted gene panel germline samples

Hi again,

I already calculated ontarget coverages. I also want to calculate the offtarget coverages to increase the overall accuracy.

Reading the docs, the steps are a bit confusing to me. I have some questions:

  1. Why are the chunks 50,000 bp? My targeted regions are exons (usually a few hundred bases), so the 3rd step (chunk offtarget into pieces of 50k) does not produce any change. Then, if I remove regions <25k (last step), the resulting bed file is obviously empty. Would you recommend a different chunk size for targeted gene panels?
  2. According to the parameter description, the offtarget file should contain a "GC-annotated" column. So, after following the steps to produce it, should I use BedAnnotateGC to annotate the final offtarget file?
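
The effect described in the first question can be reproduced with a small Python sketch of the two steps: chunking into pieces of at most 50 kb, then dropping pieces shorter than 25 kb. This is a simplified illustration; the tool used in the documentation may split regions differently, and the function names are hypothetical:

```python
def chunk_regions(regions, chunk_size=50_000):
    """Split each (chrom, start, end) region into pieces of at most chunk_size bases."""
    pieces = []
    for chrom, start, end in regions:
        pos = start
        while pos < end:
            stop = min(pos + chunk_size, end)
            pieces.append((chrom, pos, stop))
            pos = stop
    return pieces


def drop_short(regions, min_size=25_000):
    """Keep only regions that are at least min_size bases long."""
    return [r for r in regions if r[2] - r[1] >= min_size]


# A 120 kb stretch yields two 50 kb chunks plus a 20 kb remainder that is dropped.
print(drop_short(chunk_regions([("chr1", 0, 120_000)])))
# A 300-base exon is left unchanged by chunking and then removed by the filter.
print(drop_short(chunk_regions([("chr1", 1_000, 1_300)])))
```

Any input made only of exon-sized intervals ends up empty after the size filter, which matches the observation above.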

A complete guide for producing offtarget coverage files on targeted samples would be really appreciated.

Thanks a lot in advance.

Failed in Determine.gender

Rscript clinCNV.R --bed hg38_nuc.bed --normal exome_germlines.cov --colNum 4 --reanalyseCohort TRUE --polymorphicCalling YES --superRecall SUPERRECALL --mosaicism --fdrGermline 10 --lengthG 1 --maxNumGermCNVs 100 --maxNumIter 3 --numberOfThreads 24 --out result

[1] "We run script located in folder /work/sassou/ClinCNV . All the paths will be calculated realtive to this one. If everything crashes, please, check the correctness of this path first."
[1] "You've choosen to detect polymorphic regions with the help of our tool - great choice!"
[1] "You suspect your samples to be mosaic - hmmm, we will check this out...(but the mosaic CN change should not be > 1 copy different from default"
[1] "START cluster allocation."
[1] "Cluster allocated."
[1] "END cluster allocation."
[1] "We are started with reading the coverage files and bed files 2021-06-21 13:36:13"
[1] "Started basic quality filtering. 2021-06-21 13:36:15"
[1] "Amount of regions after filtering of 0-covered regions 98.575"
[1] "Normalization with GC and length starts. 2021-06-21 13:36:44"
[1] "Percentage of regions remained after GC correction: 0.998089547500539"
[1] "Amount of regions after GC-extreme filtering 98.387"
[1] "Amount of regions after Systematically Low Covered regions filtering 98.387"
[1] "We start to cluster your data (you will find a plot if clustering is possible in your output directory) result 2021-06-21 13:37:29"
[1] "You ask to clusterise intro clusters of size 10000 but size of the cohort is 5 which is not enough. We continue without clustering."
[1] "Gender estimation started 2021-06-21 13:37:43"
Error in plot.new() : could not open file 'result/genders.png'
Calls: Determine.gender -> plot -> plot -> plot.default -> plot.new
Execution halted
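
R's graphics device reports "could not open file" when it cannot create the target file, for instance when the directory does not exist or is not writable. Whether that is the cause in this run is an assumption. A minimal Python sketch of preparing the directory passed to `--out` before starting the run; the function name is hypothetical:

```python
import os


def prepare_output_dir(path):
    """Create the output directory if needed and confirm it is writable."""
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError("output directory is not writable: " + path)
    return os.path.abspath(path)
```

Returning the absolute path also makes it easy to pass an unambiguous location to the tool.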

Feature request: add to conda

Hi,

Would you be able to add your tool to bioconda? This would increase visibility and ease of use tremendously.

Conda packages also come with a free docker image in biocontainers, which is good for reproducibility.

Thanks
M

somatic run error

Hi all,
when I run ClinCNV, I get the following error:
[image: screenshot of the error]

Any suggestions? Thanks.

"the condition has length > 1"

Hello!

Just getting started with ClinCNV. I am running the test samples unsuccessfully:

Rscript clinCNV.R --bed /home/joel/Programs/ClinCNV/samples/bed_file.bed --normal /home/joel/Programs/ClinCNV/samples/coverages_normal.cov --out test_results/ --folderWithScript $PWD

[1] "We are started with reading the coverage files and bed files 2022-05-24 14:06:07"

Error in if (substring(x, 1, nchar(prefix)) == prefix) { : 
  the condition has length > 1

Calls: startsWith
Execution halted

What's happening?

Thanks in advance!

[Ends of chrom] argument is of length zero

During our run we obtained the following error message:

[image: screenshot of the error]

Our data consist of a single WGS sample aligned with BWA using GRCh38.p14 as reference.
BED and COV files were prepared as recommended in the readme.md

Any suggestion?

Handling of PAR region

PAR region in males is called with CN=2.
This should be corrected, otherwise we might miss deletions.

[image: screenshot]

error cannot open Rplot.pdf

we have to fix this issue:
-[] If --noPlot is given, no plots should be generated.
-[] ClinCNV should run without write permissions in the installation folder (otherwise you cannot run it from a container)

Error in writeLines [...] cannot open the connection

Hi ClinCNV developers,

running ClinCNV with the provided test data produced an error which we couldn't resolve. All packages are installed as indicated (install_deps_clincnv.R).

log.txt

Thanks a lot!
Best
PS: running on HPC w/ ubuntu 18.04.5.

How to format bed file for WES

Hi,

is ClinCNV able to analyse germline WES? And if so, how should the bed file be formatted?
The bed file is used to build the library with the target regions specific to the exome.

Thank you

SEG output: indicate invalid regions

Hi German,

could you add information about which input regions were skipped because of low quality?
Right now those regions cannot be easily recognized in IGV.
In CnvHunter I added them to the end of the SEG file:
https://github.com/imgag/megSAP/blob/master/test/data/vc_cnvhunter_out1.seg

For example, you could add failed regions to the SEG file and add a "QC" column that contains "qc failed" and some info why. The only drawback would be that you have to assign some CN value, e.g. CN=2 (or 1 for male gonosomes).

I opened an IGV issue to see if we can color the failed regions differently:
igvteam/igv#741

Best,
Marc

Argument is of length zero

Hello!

I've managed to run some hg19 exomes through ClinCNV, but now that I'm attempting some b37 exomes, it crashes like so:

[1] "We run script located in folder /home/joel/Programs/ClinCNV . All the paths will be calculated realtive to this one. If everything crashes, please, check the correctness of this path first."
[1] "START cluster allocation."
[1] "Cluster allocated."
[1] "END cluster allocation."
[1] "We are started with reading the coverage files and bed files 2022-08-19 14:32:58"
[1] "Started basic quality filtering. 2022-08-19 14:33:00"
[1] "Amount of regions after filtering of 0-covered regions 94.373"
Error in if (ends_of_chroms[[chrom]] < max(bedFile[bedFile[, 1] == chrom,  : 
  argument is of length zero
Execution halted

My input files are here:
https://file.io/rr1fFFpq8PzH

Thanks in advance!
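
In R, indexing a named list with `[[` and a name it does not contain returns NULL, and using NULL in an `if` comparison fails with "argument is of length zero". One plausible cause (an assumption, not confirmed in this thread) is that chromosome names in the bed file, such as b37-style `1`, do not match the names in the cytoband file, such as `chr1`. A minimal Python sketch of a check that could be run on the inputs beforehand; both function names are hypothetical:

```python
def unknown_chromosomes(bed_chroms, cytoband_chroms):
    """Return bed chromosome names that are absent from the cytoband table."""
    return sorted(set(bed_chroms) - set(cytoband_chroms))


def looks_like_prefix_mismatch(bed_chroms, cytoband_chroms):
    """True if adding or removing a 'chr' prefix would make every name match."""
    known = set(cytoband_chroms)
    names = set(bed_chroms)
    if not names or names <= known:
        return False  # nothing to check, or everything already matches
    with_prefix = {"chr" + c for c in names}
    without_prefix = {c[3:] if c.startswith("chr") else c for c in names}
    return with_prefix <= known or without_prefix <= known


bed_names = ["1", "2", "X"]
cytoband_names = ["chr1", "chr2", "chrX"]
print(unknown_chromosomes(bed_names, cytoband_names))
print(looks_like_prefix_mismatch(bed_names, cytoband_names))
```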

ClinCNV creates an unnecessary file

Hi,

ClinCNV creates a file "Rplots.pdf" in 1.14-stable in the script directory. That should not be there. Please create a commit or bugfix for 1.14 which fixes that problem.

Best,

Axel

Error when running clinCNV.R

Hi @GermanDemidov @marc-sturm @bondarevts @jakobmatthes @Fohlen

I ran into a problem when running clinCNV.R.
Do you have any suggestions for a solution?

`$Rscript $ClinCNV/clinCNV.R  \
    --bed $prepare/gcAnnotated.preparedBedHg38.bin50000.bed \
    --normal $prepare/merge_result/merge_S17.cov \
    --folderWithScript $ClinCNV \
    --scoreG 50 \
    --numberOfThreads 10 \
    --out $result/clincnv_prepare_result

Here is the error message:
[1] "We run script located in folder /hwfssz1/CS_CELL/cs_cell/marui1/software/ClinCNV . Please, specify ABSOLUTE paths, relative paths do not work for every machine. If everything crashes, please, check the correctness of this path first."
[1] "START cluster allocation."
[1] "Cluster allocated."
[1] "END cluster allocation."
[1] "We are started with reading the coverage files and bed files 2023-03-23 11:33:33"
[1] "Started basic quality filtering. 2023-03-23 11:33:35"
[1] "Amount of regions after filtering of 0-covered regions 98.565"
[1] "Coordinates in BED file are outside of the cytobands! Please check if your cytobands file matches your reference genome version!"
`

Thanks

Make available via Bioconda

Hi! Would it be possible to make this tool available via bioconda for easier installation and automatic containerization via biocontainers?

Cheers, Rike

Release 1.16 changes

  • Add analysis type header line
  • Make formatting of QC metrics uniform (name: value)
  • Change number of decimals for 'median_loglikelihood'

Wrong warning

ClinCNV shows this warning with R 4.1, which I guess should not appear:

Stdout of '/mnt/storage1/share/opt/R-4.1.0/bin/Rscript --vanilla /mnt/storage2/GRCh38/share/opt/ClinCNV-1.17.1/clinCNV.R': [1] "Your R version is too old. We can not guarantee stable work."
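
How ClinCNV performs this check is not shown on this page. One common way such a warning misfires is comparing the major and minor version numbers independently instead of as a pair. The Python sketch below contrasts the two approaches, using the minimum version 3.2 mentioned elsewhere on this page; all function names are hypothetical:

```python
def parse_version(text):
    """Turn '4.1.0' into (4, 1, 0) so versions compare component by component."""
    return tuple(int(part) for part in text.split("."))


def is_too_old(current, minimum="3.2"):
    """True only if `current` is strictly older than `minimum`."""
    return parse_version(current) < parse_version(minimum)


def naive_is_too_old(current, minimum="3.2"):
    """Flawed check, shown for contrast: tests major and minor independently."""
    cur, low = parse_version(current), parse_version(minimum)
    return cur[0] < low[0] or cur[1] < low[1]


print(is_too_old("4.1.0"))        # tuple comparison: 4.1.0 is newer than 3.2
print(naive_is_too_old("4.1.0"))  # the flawed check flags 4.1.0 because 1 < 2
```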

Documentation changes

  • links on main page to real documentation
  • use GitHub issue tracker instead of email
  • add license file to repository via GitHub
  • put documentation of each use-case (germline/somatic/trio) to one sub-page
  • document minimum version of R (3.2)

Parameter values

Hi,

Congrats for this useful tool. I have a couple of questions:

  • What is the recommended value of the maxNumGermCNVs parameter for gene panel samples (100-130 genes)? Here I understand that the default is 10000 but 2000 is suggested for WES samples, right?
  • What is the default value of the maxNumIter parameter? I didn’t find it. Do you recommend a specific value for germline calling on gene panel samples (100-130 genes)?

Thanks!

Override sample gender

Hi German,

Alex Seitz had a male patient with a large duplication on chrX, so the sample was classified as female.
Is there a way to override the gender for a sample?

Best,
Marc

The data of the case can be found here: /mnt/users/ahsturm1/Sandbox/ClinCNV/bug_gender_clustering/

Genes output

If there is no gene overlapping with a CNV, currently 'NA' is written.
Can you just leave the field blank then?
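
A minimal Python sketch of the requested behaviour, together with a reader that tolerates both the old 'NA' and the blank form. The function names are hypothetical and the comma as separator is an assumption of this sketch:

```python
def format_genes(genes):
    """Join gene names for the 'genes' column; empty string when there are none."""
    cleaned = [g for g in (genes or []) if g and g != "NA"]
    return ",".join(cleaned)


def split_genes(field):
    """Split a 'genes' field into a list, treating blank, None and 'NA' as no genes."""
    if field is None or field in ("", "NA"):
        return []
    return field.split(",")


print(repr(format_genes([])))           # blank field instead of 'NA'
print(format_genes(["GENE1", "GENE2"]))
print(split_genes("NA"))
```

Handling the missing value explicitly on the reading side also avoids passing a non-string value to a string-splitting function.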
