imgag / clincnv

Detection of copy number changes in Germline/Trio/Somatic contexts in NGS data

License: MIT License

Java 2.21% Shell 0.81% R 95.13% Python 1.85%

bioinformatics-tool bioinformatics-algorithms copy-number-variation

clincnv's Issues

Germline algo changes

change output file name [sample]_cnvs.tsv
round number of decimal places (loglikelihood)
add CNV size (kB) column
add allele frequency in cohort column
add QC to header: number of iterations, median percentage of outliers

ClinCNV fails with an error on family samples

Hello, I analyzed 3 family samples with ClinCNV and an error occurred, even though I set the parameter minimumNumOfElemsInCluster to 1. How can I solve this? The error output is as follows:

[1] "We run script located in folder /fuer2/03.Soft/01.Soft_project/ClinCNV-1.17.2 . All the paths will be calculated realtive to this one. If everything crashes, please, check the correctness of this path first."
[1] "START cluster allocation."
[1] "Cluster allocated."
[1] "END cluster allocation."
[1] "We are started with reading the coverage files and bed files 2022-03-14 17:20:48"
[1] "Started basic quality filtering. 2022-03-14 17:20:49"
[1] "Amount of regions after filtering of 0-covered regions 99.528"
[1] "Normalization with GC and length starts. 2022-03-14 17:21:00"
[1] "Percentage of regions remained after GC correction: 0.997975235573553"
[1] "Amount of regions after GC-extreme filtering 99.327"
[1] "Amount of regions after Systematically Low Covered regions filtering 99.327"
[1] "We start to cluster your data (you will find a plot if clustering is possible in your output directory) ./result 2022-03-14 17:21:16"
Error: umap: number of neighbors must be smaller than number of items
Execution halted
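
The umap message above states a hard constraint: the number of neighbours must be strictly smaller than the number of samples being embedded, so a cohort of 3 samples cannot be embedded with a larger neighbour count. A minimal Python sketch of a guard for this follows. The default of 15 neighbours is an assumption (it is the usual umap default, not something confirmed for ClinCNV), and `safe_n_neighbors` is a hypothetical helper, not part of ClinCNV.

```python
from typing import Optional

DEFAULT_N_NEIGHBORS = 15  # assumption: the usual umap default, not read from ClinCNV


def safe_n_neighbors(n_samples: int, requested: int = DEFAULT_N_NEIGHBORS) -> Optional[int]:
    """Return a neighbour count smaller than n_samples, or None to skip clustering."""
    if n_samples < 3:
        # With fewer than 3 samples there is no valid neighbour count of at least 2.
        return None
    return min(requested, n_samples - 1)


print(safe_n_neighbors(3))   # small family cohort: clamp the neighbour count
print(safe_n_neighbors(40))  # larger cohort: keep the requested value
```

Returning `None` lets the caller continue without clustering, which is what the tool already does in other logs on this page when the cohort is too small.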

Questions about the raw output file(s)'s columns

Hello, I searched for a while for a description of the columns in the "*_cnvs.tsv" output file. These are the columns:
#chr start end CN_change loglikelihood no_of_regions length_KB potential_AF genes qvalue

Some have obvious meaning, some don't (to me), is there an explanatory document somewhere?
Is CN_change code for something, or is it the actual number of copies?
loglikelihood of what?
no_of_regions, is this number of exons, or number of intervals in the input bedfile?
length_KB, length of what?
potential_AF, this seems lower than 1 always, allele frequency?
genes, I suppose this is empty unless the bed file was annotated?
qvalue, qvalue of what?

Thank you for your patience, sorry if I missed something obvious

PS. I only ask about this output file as I thought it was the best one to look through. Did I get this wrong, as well? Which of the three files makes most sense to look through?
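
For looking through the file programmatically, a minimal Python sketch is shown here. It assumes the file is tab-separated, that metadata lines start with `#`, and that the column header is the line starting with `#chr` (as in the header quoted above). The example row is made up for illustration, and `read_cnvs_tsv` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def read_cnvs_tsv(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Split a *_cnvs.tsv file into '#' metadata lines and one dict per CNV row."""
    metadata: List[str] = []
    columns: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        if line.startswith("#chr"):
            # Column header line: drop the leading '#', keep the column names.
            columns = line[1:].split("\t")
        elif line.startswith("#"):
            metadata.append(line)
        elif line.strip():
            rows.append(dict(zip(columns, line.split("\t"))))
    return metadata, rows


# Made-up example content, only to show the expected layout.
example = (
    "#analysis: made-up header line\n"
    "#chr\tstart\tend\tCN_change\tloglikelihood\tno_of_regions\t"
    "length_KB\tpotential_AF\tgenes\tqvalue\n"
    "chr1\t1000\t3250\t1\t25.5\t3\t2.250\t0.010\t\t0.001\n"
)
metadata, rows = read_cnvs_tsv(example)
print(rows[0]["CN_change"], rows[0]["length_KB"])
```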

How to prepare a WES bed file without ngs-bits?

Hi,

I don't have ngs-bits. Is there another tool I can use to prepare this bed file?
https://www.twistbioscience.com/sites/default/files/resources/2019-06/Twist_Exome_Target_hg38.bed
Thank you
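
If only the GC annotation is missing, it can be computed without ngs-bits. The Python sketch below assumes the reference sequences are already loaded into a dict (for example with a FASTA reader of your choice) and that the GC fraction belongs in the fourth column. That column layout is an assumption and should be checked against the ClinCNV documentation. `gc_fraction` and `annotate_bed_with_gc` are hypothetical helpers.

```python
from typing import Dict, Iterable, List


def gc_fraction(sequence: str) -> float:
    """GC fraction over A/C/G/T bases only; N and other symbols are ignored."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / counted


def annotate_bed_with_gc(bed_lines: Iterable[str], genome: Dict[str, str]) -> List[str]:
    """Append a GC column to each BED line (BED is 0-based, end-exclusive)."""
    annotated = []
    for line in bed_lines:
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        gc = gc_fraction(genome[chrom][int(start):int(end)])
        annotated.append(f"{chrom}\t{start}\t{end}\t{gc:.4f}")
    return annotated


# Toy genome, only to demonstrate the call.
toy_genome = {"chr1": "AACCGGTTNN"}
print(annotate_bed_with_gc(["chr1\t2\t6", "chr1\t0\t4"], toy_genome))
```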

Documentation improvements

-[ ] TSV header: each value in one line, or group logically
-[ ] TSV header: number of iterations should not include super-recall
-[ ] SEG file: make log-likelihood positive
-[ ] SEG file: remove CN column
-[ ] SEG file: add coefficient of variation or similar value

Trios sample ID

Hello, I analyzed 5 trios with ClinCNV and an error occurred. Is the problem in my sample IDs?
Below is my sample ID file:
1_cyw,1_cywm,1_cywf
11_ywx,11_ywxm,11_ywxf
10_xjx,10_xjxm,10_xjxf
12_zxy,12_zxym,12_zxyf
13_zmh,13_zmhm,13_zmhf

This is the error message:
Error in strsplit(genesThatHasToBeSeparated[i], split = ",") :
non-character argument
Calls: source ... eval -> eval -> plotFoundCNVs -> unlist -> strsplit
Execution halted

Thanks

No error if BAF file is not readable

ClinCNV throws no error if a BAF file is not readable.
Please throw an error in case any file that is given via the command line cannot be opened.
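
The requested behaviour is a fail-fast check on every path given on the command line. A minimal Python sketch of the idea is shown here; it is an illustration, not ClinCNV code, and `require_readable` is a hypothetical name.

```python
from typing import Iterable


def require_readable(paths: Iterable[str]) -> None:
    """Raise immediately if any input file cannot be opened for reading."""
    for path in paths:
        try:
            # Opening the file is the most direct readability test: it covers
            # missing files, directories, and permission problems alike.
            with open(path, "rb"):
                pass
        except OSError as exc:
            raise OSError(f"cannot open input file '{path}': {exc.strerror}") from exc
```

Running such a check before any analysis starts means that a bad BAF path stops the run with a clear message instead of being silently ignored.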

Running clinCNV reports an error

Please help, thanks.
Command: Rscript ./clinCNV.R --bed ./samples/bed_file.bed --normal ./samples/coverages_normal.cov --out result
The error is as follows:
......
Loading required package: sandwich
[1] "We start to estimate covariances between neighboring regions in germline data - may take some time 2022-03-10 17:22:41"
[1] "Tree of covariances (using 2 predictors - sum of regions' lengths and log2 of distance between regions) plotted in result 2022-03-10 17:23:28"
[1] "Calling started 2022-03-10 17:23:28"
[1] "Working with germline sample 0 2022-03-10 17:23:28"
[1] "Working with germline sample 1 2022-03-10 17:23:28"
Error in writeLines(c("#type=GENE_EXPRESSION", paste0("#track graphtype=points name="", :
cannot open the connection
Calls: source ... outputSegmentsAndDotsFromListOfCNVs -> makeTrackAnnotation -> writeLines
Execution halted
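
In R, "cannot open the connection" from writeLines means the target file could not be opened for writing. A missing or non-writable output directory is a frequent cause, although the log above does not confirm that this is what happened here. A minimal Python sketch of a pre-flight check under that assumption follows; `ensure_writable_dir` is a hypothetical helper.

```python
import os
import tempfile


def ensure_writable_dir(path: str) -> str:
    """Create the output directory if needed and verify that files can be written."""
    abs_path = os.path.abspath(path)
    os.makedirs(abs_path, exist_ok=True)
    # Creating and removing a temporary file proves write access in practice.
    with tempfile.NamedTemporaryFile(dir=abs_path):
        pass
    return abs_path
```

Passing the returned absolute path to `--out` also removes any dependence on the current working directory.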

Error in 1:ncol(toyCoverageGermlineCohort) : argument of length 0 Calls: source -> withVisible -> eval -> eval

Hi,
I am running a germline analysis for 40 samples on the hg38 reference genome. The normal run (clinCNV.R --normal normal.cov --out outputFolder --bed annotatedBedFile.bed) without offtarget regions works fine, but when I add the offtarget parameters it fails with the error shown in the title.

Run command:
clinCNV.R --normal normal.cov --bed annotatedBedFile.bed --out outputFolder--normalOfftarget offtarget.cov --bedOfftarget annotatedBedFile_offtarget.bed --numberOfThreads 4 --hg38

Note: The bed file was annotated with ngs-bits BedAnnotateGC and BedAnnotateGenes. This is not a path issue, as the cov files and bed files can be read.

Does anyone know what went wrong?
Thanks :)

Is it possible to add mitochondrial DNA cytoband information?

Dear developers,

I am using your amazing tool for germline WES analysis. It works pretty well I think.
I was wondering if there was a way to use clinCNV for mitochondrial analysis. At the moment, I remove any chrM samples I have because the cytobandsHG38.txt file does not contain any chrM information. Is there a way to add chrM information to the cytobandsHG38.txt file?

Many Thanks,
Krutik

Order of the output CNVs

Hi,

I suggest we order the output CNVs exactly as they are ordered in the input Bed file or cov files respectively.
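
A minimal Python sketch of such an ordering: chromosomes are ranked by their first appearance in the bed file, and CNVs are then sorted by that rank and by position. It assumes the bed file itself is position-sorted within each chromosome; `sort_like_bed` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


def sort_like_bed(cnvs: List[Region], bed_regions: List[Region]) -> List[Region]:
    """Order CNVs by the chromosome order of the bed file, then by position."""
    chrom_rank: Dict[str, int] = {}
    for chrom, _start, _end in bed_regions:
        # The first appearance of a chromosome in the bed file defines its rank.
        chrom_rank.setdefault(chrom, len(chrom_rank))
    return sorted(cnvs, key=lambda cnv: (chrom_rank[cnv[0]], cnv[1], cnv[2]))


bed = [("chr1", 100, 200), ("chr10", 50, 80), ("chr2", 10, 20)]
calls = [("chr2", 10, 20), ("chr10", 50, 80), ("chr1", 100, 200)]
print(sort_like_bed(calls, bed))
```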

Code refactoring

remove parameter '--folderWithScript'
refactor code to split germline from somatic analysis (put it into the germline folder)
rename script to something other than firstStep.R, perhaps clincnv.R
add example data for each use-case => use as unit tests
check minimum version of R (3.2)

Offtarget coverage on targeted gene panel germline samples

Hi again,

I already calculated ontarget coverages. I also want to calculate the offtarget coverages to increase the overall accuracy.

Reading the docs, the steps are a bit confusing to me. I have some questions:

Why are chunks 50000 bp? My targeted regions are exons (usually a few hundred bases), so the 3rd step (chunk offtarget into pieces of 50k) does not produce any change. Then, if I remove regions <25k (last step), the resulting bed file is obviously empty. Would you recommend a different chunk size for targeted gene panels?
According to the parameter description, the offtarget file should contain a "GC-annotated" column. So, after following the steps to produce it, should I use BedAnnotateGC to annotate the final offtarget file?

A complete guide to produce offtarget coverage files on targeted samples would be really appreciated.

Thanks a lot in advance.
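
The two steps named in the question (chunk into 50 kb pieces, drop pieces under 25 kb) only leave regions behind when they are applied to the space between the targets rather than to the exons themselves. Assuming that is what the documented procedure intends, the Python sketch below shows the idea for one chromosome. It is a simplification, the ngs-bits tools may split regions differently, and `offtarget_chunks` and its `padding` parameter are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, end-exclusive


def offtarget_chunks(chrom_length: int, targets: List[Interval], chunk: int = 50_000,
                     min_size: int = 25_000, padding: int = 0) -> List[Interval]:
    """Complement sorted, non-overlapping targets, chunk the gaps, drop short pieces."""
    gaps: List[Interval] = []
    cursor = 0
    for start, end in targets:
        gap_end = max(start - padding, 0)
        if gap_end > cursor:
            gaps.append((cursor, gap_end))
        cursor = max(cursor, min(end + padding, chrom_length))
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))

    pieces: List[Interval] = []
    for gap_start, gap_end in gaps:
        for piece_start in range(gap_start, gap_end, chunk):
            piece_end = min(piece_start + chunk, gap_end)
            if piece_end - piece_start >= min_size:
                pieces.append((piece_start, piece_end))
    return pieces


# Two short exon-like targets on a toy 200 kb chromosome.
print(offtarget_chunks(200_000, [(60_000, 60_300), (130_000, 130_200)]))
```

With exon-sized targets the gaps are large, so most of the chromosome survives as off-target bins. Chunking the targets themselves, as described in the question, would leave nothing after the 25 kb filter.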

Failed in Determine.gender

Rscript clinCNV.R --bed hg38_nuc.bed --normal exome_germlines.cov --colNum 4 --reanalyseCohort TRUE --polymorphicCalling YES --superRecall SUPERRECALL --mosaicism --fdrGermline 10 --lengthG 1 --maxNumGermCNVs 100 --maxNumIter 3 --numberOfThreads 24 --out result

[1] "We run script located in folder /work/sassou/ClinCNV . All the paths will be calculated realtive to this one. If everything crashes, please, check the correctness of this path first."
[1] "You've choosen to detect polymorphic regions with the help of our tool - great choice!"
[1] "You suspect your samples to be mosaic - hmmm, we will check this out...(but the mosaic CN change should not be > 1 copy different from default"
[1] "START cluster allocation."
[1] "Cluster allocated."
[1] "END cluster allocation."
[1] "We are started with reading the coverage files and bed files 2021-06-21 13:36:13"
[1] "Started basic quality filtering. 2021-06-21 13:36:15"
[1] "Amount of regions after filtering of 0-covered regions 98.575"
[1] "Normalization with GC and length starts. 2021-06-21 13:36:44"
[1] "Percentage of regions remained after GC correction: 0.998089547500539"
[1] "Amount of regions after GC-extreme filtering 98.387"
[1] "Amount of regions after Systematically Low Covered regions filtering 98.387"
[1] "We start to cluster your data (you will find a plot if clustering is possible in your output directory) result 2021-06-21 13:37:29"
[1] "You ask to clusterise intro clusters of size 10000 but size of the cohort is 5 which is not enough. We continue without clustering."
[1] "Gender estimation started 2021-06-21 13:37:43"
Error in plot.new() : could not open file 'result/genders.png'
Calls: Determine.gender -> plot -> plot -> plot.default -> plot.new
Execution halted

Handling of centromere regions

Centromere regions should be added to the arms so that we don't miss centromere CNVs like in the Array benchmark.

Feature request: add to conda

Hi,

Would you be able to add your tool to bioconda? This would increase visibility and ease of use tremendously.

Conda packages also come with a free docker image in biocontainers, which is good for reproducibility.

Thanks
M

somatic run error

Hi all,
when I run ClinCNV, the following error occurs:

any suggestions? thanks

No error if input regions are outside the cytobands.txt file

In this case an error should be thrown instead of just ignoring the regions/bins outside the defined cytobands!
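
A minimal Python sketch of the requested check: the end of each chromosome is taken from the last cytoband, and any input region on an unknown chromosome or beyond that end is reported as an error. The tuple-based inputs and the name `check_regions_within_cytobands` are illustrative assumptions, not ClinCNV's data structures.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


def check_regions_within_cytobands(bed_regions: List[Region], cytobands: List[Region]) -> None:
    """Raise ValueError listing every region that is not covered by the cytobands."""
    chrom_end: Dict[str, int] = {}
    for chrom, _start, end in cytobands:
        chrom_end[chrom] = max(end, chrom_end.get(chrom, 0))

    problems = []
    for chrom, start, end in bed_regions:
        if chrom not in chrom_end:
            problems.append(f"{chrom}:{start}-{end} (chromosome not in cytobands)")
        elif end > chrom_end[chrom]:
            problems.append(f"{chrom}:{start}-{end} (beyond last cytoband at {chrom_end[chrom]})")
    if problems:
        raise ValueError("Regions outside the cytobands: " + "; ".join(problems))
```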

"the condition has length > 1"

Hello!

Just getting started with ClinCNV. I am running the test samples unsuccessfully:

Rscript clinCNV.R --bed /home/joel/Programs/ClinCNV/samples/bed_file.bed --normal /home/joel/Programs/ClinCNV/samples/coverages_normal.cov --out test_results/ --folderWithScript $PWD

[1] "We are started with reading the coverage files and bed files 2022-05-24 14:06:07"

Error in if (substring(x, 1, nchar(prefix)) == prefix) { : 
  the condition has length > 1

Calls: startsWith
Execution halted

What's happening?

Thanks in advance!

number formatting

Please set "length_KB" and "potential_AF" fixed to 3 decimals.
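
A minimal Python sketch of fixed 3-decimal formatting. The length in kb is computed here as (end - start) / 1000, which is an assumption about how the column is derived; `format_fixed` and `length_kb` are hypothetical helpers.

```python
def length_kb(start: int, end: int) -> float:
    """Region length in kilobases (assumes end-exclusive coordinates)."""
    return (end - start) / 1000


def format_fixed(value: float, decimals: int = 3) -> str:
    """Format with a fixed number of decimals, keeping trailing zeros."""
    return f"{value:.{decimals}f}"


print(format_fixed(length_kb(1000, 3250)))
print(format_fixed(0.5))
```

Fixed-width output keeps trailing zeros, so every row in the column has the same number of decimals.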

[Ends of chrom] argument is of length zero

During our run we obtained the following error message

Our data consists of a single WGS sample aligned with BWA, using GRCh38.p14 as reference.
BED and COV files were prepared as recommended in the readme.md

Any suggestion?

Handling of PAR region

PAR region in males is called with CN=2.
This should be corrected, otherwise we might miss deletions.

Bugfix PAR regions + speeding up

no need to perform plotting
reduce the number of females for PAR regions

error cannot open Rplot.pdf

we have to fix this issue:
-[] If --noPlot is given, no plots should be generated.
-[] ClinCNV should run without write permissions in the installation folder (otherwise you cannot run it from a container)

Error in writeLines [...] cannot open the connection

Hi ClinCNV developers,

running ClinCNV with the provided test data produced an error which we couldn't resolve. All packages are installed as indicated (install_deps_clincnv.R).

log.txt

Thanks a lot!
Best
PS: running on HPC w/ ubuntu 18.04.5.

Proposed patch for ?no gene name

ClinCNV-1.18.3.patch
Patch file submitted; reported by an end user of ours as a correction from the developer.

How to format bed file for WES

Hi,

Is ClinCNV able to analyse germline WES? And if yes, how do I format the bed file correctly?
The bed file is used to build the library with the target regions specific to the exome.

Thank you

SEG output: indicate invalid regions

Hi German,

could you add information about which input regions were skipped because of low quality?
Right now those regions cannot be easily recognized in IGV.
In CnvHunter I added them to the end of the SEG file:
https://github.com/imgag/megSAP/blob/master/test/data/vc_cnvhunter_out1.seg

For example, you could add failed regions to the SEG file and add a "QC" column that contains "qc failed" and some info why. The only drawback would be that you have to assign some CN value, e.g. CN=2 (or 1 for male gonosomes).

I opened an IGV issue to see if we can color the failed regions differently:
igvteam/igv#741

Best,
Marc

Argument is of length zero

Hello!

I've managed to run some hg19 exomes through ClinCNV, but now that I'm attempting some b37 exomes, it crashes like so:

[1] "We run script located in folder /home/joel/Programs/ClinCNV . All the paths will be calculated realtive to this one. If everything crashes, please, check the correctness of this path first."
[1] "START cluster allocation."
[1] "Cluster allocated."
[1] "END cluster allocation."
[1] "We are started with reading the coverage files and bed files 2022-08-19 14:32:58"
[1] "Started basic quality filtering. 2022-08-19 14:33:00"
[1] "Amount of regions after filtering of 0-covered regions 94.373"
Error in if (ends_of_chroms[[chrom]] < max(bedFile[bedFile[, 1] == chrom,  : 
  argument is of length zero
Execution halted

My input files are here:
https://file.io/rr1fFFpq8PzH

Thanks in advance!
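
In R, looking up a missing name in a list with `[[` returns NULL, and `if (NULL < x)` fails with "argument is of length zero". So one plausible cause (an assumption, not verified against the input files) is that the bed file uses b37 names such as `1` and `MT`, while the chromosome table uses `chr`-prefixed names. The Python sketch below reports and renames such chromosomes. It only handles primary chromosomes, not unplaced contigs, and both helpers are hypothetical.

```python
from typing import Iterable, List


def unknown_chromosomes(bed_chroms: Iterable[str], known_chroms: Iterable[str]) -> List[str]:
    """Chromosome names used in the bed file that the reference table does not know."""
    known = set(known_chroms)
    return sorted({chrom for chrom in bed_chroms if chrom not in known})


def to_ucsc_name(chrom: str) -> str:
    """Map b37-style names (1, X, MT) to UCSC-style names (chr1, chrX, chrM)."""
    if chrom.startswith("chr"):
        return chrom
    if chrom == "MT":
        return "chrM"
    return "chr" + chrom


print(unknown_chromosomes(["1", "2", "1"], ["chr1", "chr2"]))
print([to_ucsc_name(name) for name in ["1", "X", "MT", "chr2"]])
```

Renaming only helps if the coordinates also match the reference build of the cytobands file.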

ClinCNV creates an unnecessary file

Hi,

ClinCNV creates a file "Rplots.pdf" in 1.14-stable in the script directory. That should not be there. Please create a commit or bugfix for 1.14 which fixes that problem.

Best,

Axel

Error when running clinCNV.R

Hi @GermanDemidov @marc-sturm @bondarevts @jakobmatthes @Fohlen

I ran into a problem when running clinCNV.R.
Do you have any suggestions for a solution?

$Rscript $ClinCNV/clinCNV.R  \
    --bed $prepare/gcAnnotated.preparedBedHg38.bin50000.bed \
    --normal $prepare/merge_result/merge_S17.cov \
    --folderWithScript $ClinCNV \
    --scoreG 50 \
    --numberOfThreads 10 \
    --out $result/clincnv_prepare_result

This is the error message:
[1] "We run script located in folder /hwfssz1/CS_CELL/cs_cell/marui1/software/ClinCNV . Please, specify ABSOLUTE paths, relative paths do not work for every machine. If everything crashes, please, check the correctness of this path first."
[1] "START cluster allocation."
[1] "Cluster allocated."
[1] "END cluster allocation."
[1] "We are started with reading the coverage files and bed files 2023-03-23 11:33:33"
[1] "Started basic quality filtering. 2023-03-23 11:33:35"
[1] "Amount of regions after filtering of 0-covered regions 98.565"
[1] "Coordinates in BED file are outside of the cytobands! Please check if your cytobands file matches your reference genome version!"

Thanks

Make available via Bioconda

Hi! Would it be possible to make this tool available via bioconda for easier installation and automatic containerization via biocontainers?

Cheers, Rike

How to create my.bed file for WGS

Hi @marc-sturm @bondarevts @jakobmatthes @Fohlen @GermanDemidov

I don't really understand how to create the my.bed file for WGS.
My current samples are only available in BAM format and the original FASTA format.
The reference genome I use is hg38.
Do I need to download the file hg38.chrom.sizes? (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes)
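
A chrom.sizes file (one `name<TAB>length` line per sequence) is enough to generate fixed-size bins as a starting bed file. A minimal Python sketch follows. The 50000 bp bin size is only borrowed from a file name seen elsewhere on this page and is not a recommendation, the result would still need the GC annotation step, and `bins_from_chrom_sizes` is a hypothetical helper.

```python
from typing import Iterable, List, Optional, Set


def bins_from_chrom_sizes(lines: Iterable[str], bin_size: int = 50_000,
                          keep: Optional[Set[str]] = None) -> List[str]:
    """Turn 'name<TAB>length' lines into fixed-size BED bins (0-based, end-exclusive)."""
    bed: List[str] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chrom, size = fields[0], int(fields[1])
        if keep is not None and chrom not in keep:
            continue  # e.g. restrict to primary chromosomes
        for start in range(0, size, bin_size):
            bed.append(f"{chrom}\t{start}\t{min(start + bin_size, size)}")
    return bed


# Toy sizes, only to demonstrate the output layout.
print(bins_from_chrom_sizes(["chrA\t120000", "chrB\t30000"]))
```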

Release 1.16 changes

Add analysis type header line
Make formatting of QC metrics uniform (name: value)
Change number of decimals for 'median_loglikelihood'

How to determine the cnv type

Hi @GermanDemidov @marc-sturm

How to determine from the result table whether the type of cnv is deletion or duplication?

thanks!
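
If the CN_change column holds the absolute copy number (an assumption; nothing on this page confirms it), the type follows from comparing it with the expected copy number of the region. A minimal Python sketch under that assumption; `cnv_type` is a hypothetical helper.

```python
def cnv_type(copy_number: float, expected: float = 2) -> str:
    """Classify a call relative to the expected copy number of the region."""
    if copy_number < expected:
        return "deletion"
    if copy_number > expected:
        return "duplication"
    return "neutral"


print(cnv_type(1))              # autosome with one copy
print(cnv_type(3))              # autosome with three copies
print(cnv_type(2, expected=1))  # e.g. chrX in a male sample, outside the PAR
```

The `expected` argument matters on sex chromosomes, where the baseline in males is 1 rather than 2.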

Wrong warning

ClinCNV shows this warning with R 4.1, which I guess should not be there:


Stdout of '/mnt/storage1/share/opt/R-4.1.0/bin/Rscript --vanilla /mnt/storage2/GRCh38/share/opt/ClinCNV-1.17.1/clinCNV.R':
[1] "Your R version is too old. We can not guarantee stable work."

Error in 1:ncol(coverages) : argument of length 0 Calls: gc_and_sample_size_normalise

Hi @marc-sturm

When I run a single sample, I get this error.
Is this tool not suitable for a single sample?

Documentation changes

links on main page to real documentation
use GitHub issue tracker instead of email
add license file to repository via GitHub
put documentation of each use-case (germline/somatic/trio) to one sub-page
document minimum version of R (3.2)

Remove (first) somatic folder and move everything into the root folder

Parameter values

Hi,

Congrats for this useful tool. I have a couple of questions:

Which is the recommended value of maxNumGermCNVs parameter for gene panel samples (100-130 genes)? Here I understand that default is 10000 but 2000 is suggested for WES samples, right?
Which is the default value of maxNumIter parameter? I didn’t find it. Do you recommend a specific value for germline calling on gene panel samples (100-130 genes)?

Thanks!

Override sample gender

Hi German,

Alex Seitz had a male patient with a large duplication on chrX, so the sample was classified as female.
Is there a way to override the gender for a sample?

Best,
Marc

The data of the case can be found here: /mnt/users/ahsturm1/Sandbox/ClinCNV/bug_gender_clustering/

Genes output

If there is no gene overlapping with a CNV, currently 'NA' is written.
Can you just leave the field blank then?
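
A minimal Python sketch of the requested output rule: overlapping genes are joined with commas (the separator seen in the `strsplit(..., split = ",")` call in one of the logs on this page), and a CNV without genes yields an empty field rather than 'NA'. `genes_field` is a hypothetical helper.

```python
from typing import Optional, Sequence


def genes_field(genes: Optional[Sequence[str]]) -> str:
    """Comma-joined gene names, or an empty string when no gene overlaps the CNV."""
    return ",".join(genes) if genes else ""


print(repr(genes_field(["BRCA1", "TP53"])))
print(repr(genes_field([])))
```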