wedge-lab / battenberg Goto Github PK

View Code? Open in Web Editor NEW

75.0 75.0 47.0 898 KB

Battenberg R package for subclonal copynumber estimation

License: GNU Affero General Public License v3.0

R 98.76% Dockerfile 0.80% Shell 0.44%

battenberg's People

Contributors

Stargazers

Watchers

battenberg's Issues

Can Battenberg be used with ASCATngs output?

Hi,

Is it possible to run the battenberg pipeline over ASCAT or ASCATngs output? For example I have the allele counts for both the tumor and the normal sample from ASCATngs, could I just feed it in to the battenberg function and run the pipeline from there?

I tried doing so, with the code below but I get an error message telling me that the unused argument (normal_name = "MCF10A"). When I remove the normal_name parameter, it fails and gives this error Error in prepare_wgs(chrom_names = chrom_names, tumourbam = tumour_data_file, : argument "normalname" is missing, with no default. I tried feeding the battenberg function analysis = "cell_line", but I get the error unused argument (analysis= "cell_line").

Any help would be greatly appreciated, thanks!

Code to try and run battenberg pipeline:
battenberg(tumourname= 'H_LS-BH-A208-01A-11D-A159-09', tumour_data_file = '~/tmpAscat/ascat/H_LS-BH-A208-01A-11D-A159-09_alleleCounts.tab', normal_name ='MCF10A', normal_data_file = '~/tmpAscat/ascat/MCF10A_0_EDHG200003860-1a_H2L5YDSXY_L3.count' , ismale=FALSE, imputeinfofile='~/battenberg/reference/hg38/imputation/impute_info.txt', g1000prefix='~/battenberg/reference/hg38/1000G_loci/1kg.phase3.v5a_GRCh38nounref_loci_chr', g1000allelesprefix='~/battenberg/reference/hg38/1000G_loci/1kg.phase3.v5a_GRCh38nounref_allele_index_chr', gccorrectprefix='~/battenberg/reference/hg38/GC_correction/1000G_GC_chr', repliccorrectprefix='~/battenberg/reference/hg38/replicationtimings/1000G_RT_chr', problemloci='~/battenberg/reference/hg38/problem_loci/probloci.txt.gz', data_type='WGS', skip_allele_counting=T, skip_preprocessing=F, skip_phasing=F, prior_breakpoints_file=NULL)

Warmest regards,
Hannan

Set seed and save to disk

Determine a seed and use that when randomising the alleles and when running impute. Save it to a file such that a rerun of BB will yield the exact same results. This negates the necessity to keep all the output.

'sv_breakpoints_file = NULL' option in callSubclones()

Hey guys!

sv_breakpoints_file = NULL or sv_breakpoints_file = NA throws an error as neither NULL == "NA" or NA == "NA" returns a TRUE/FALSE statement in R for the if statement if (!is.null(sv_breakpoints_file) & !sv_breakpoints_file == "NA").

Hope you are all well!

George

Subclones output file

Hi,
A scientist in our group has found a case where the cellular fraction (frac2_A) of the nMaj2_A and nMin2_A columns has a higher value than frac1_A from the callSubclones subroutine (version 2.2.5). We would be very grateful if you could explain why this is the case.

For example:

zcat subclones.txt.gz | awk 'BEGIN{ FS=OFS="\t" }{ print $1,$2,$3,$8,$9,$10,$11,$12,$13 }' | column -t | less -S
chr startpos endpos nMaj1_A nMin1_A frac1_A nMaj2_A nMin2_A frac2_A
...
12 19462666 19698320 3 0 0.381011741894733 3 1 0.618988258105267
...
12 37866866 83372701 2 1 0.431746072237525 3 1 0.568253927762475
...
12 83501125 133836024 2 0 0.332939966391955 2 1 0.667060033608045
...

Are these fractions supposed to be listed with major fraction on the left and minor fraction on the right, which is not what we are seeing here?
Regards
Kathryn

Move creation of segmented LogR to separate function

segmented LogR is now created within fitcopynumber. This means it is redone each time a copy number profile is (re)fitted. Moving this step to a separate function (or making it part of the segmentation output) would reduce runtime there. This could possibly be combined with moving the last two steps to only use the segmented data, removing the need for the raw data to be read in (ticket #5).

Incorrect variable name in prepare_wgs.R

Hi,

Thanks for the great package.
I was keep having the error saying "subscript out of bounds" while running prepare_wgs -> getBAFsAndLogRs.
This is the stacktrace:

Error in as.matrix(x)[i] : subscript out of bounds
In addition: There were 11 warnings (use warnings() to see them)

Enter a frame number, or 0 to exit

1: source("battenberg_wgs_debug.R")
2: withVisible(eval(ei, envir))
3: eval(ei, envir)
4: eval(ei, envir)
5: battenberg_wgs_debug.R#79: battenberg(tumourname = TUMOURNAME, normalname =
6: prepare_wgs(chrom_names = chrom_names, tumourbam = tumour_data_file, normal
7: getBAFsAndLogRs(tumourAlleleCountsFile.prefix = paste(tumourname, "_alleleF
8: mutant_data[cbind(1:len, allele_data[, 3])]
9: `[.data.frame`(mutant_data, cbind(1:len, allele_data[, 3]))

Debugging it showed me that in line 62 of prepare_wgs.R had a wrong variable:
normal_input_data = normal_input_data[chrpos_tumour %in% matched_data,]
Here, it seems that chrpos_tumour needs to be chrpos_normal for it to be match in dimensions.
If it is not written as chrpos_tumour on purpose, please fix the error. Thanks!

interpretation of [samplename]_subclones_chr*.png files

Hello,

This question is regarding the [samplename]_subclones_chr*.png. Is it the BAF what those plots are showing?

Many thanks
Jorge

Empty reference file shapeit2 chrX_nonPAR

Hi there,

The reference file ALL.v1a.shapeit2_integrated_chrX_nonPAR.GRCh38.20181129.phased.hap.gz provided here appears to be empty.

This caused an error in impute2, which did not get picked up by Battenberg:

ERROR: There are no type 2 SNPs after applying the command-line settings for this run, which makes it impossible to perform imputation. One possible reason is that you have specified an analysis interval (-int) that contains reference panel SNPs but not inference panel SNPs -- e.g., this can happen at the ends of chromosomes. Another possibility is that your genotypes and the reference panel are mapped to different genome builds, which can lead the same SNPs to be assigned different positions in different panels. If you need help fixing this error, please contact the authors.

and so the CN profiles generated had a large chunk of X with no breakpoints across a fairly large cohort of samples.

I assume this file should not be empty, so would it be possible to provide the correct version?

Thanks so much!

Callsubclones error

Hi there,

Getting an error in the callsubclones step:
Error in t.test.default(logR[logR$Chromosome == subclones$chr[i - 1] & :
not enough 'x' observations
Calls: callSubclones -> merge_segments -> t.test -> t.test.default

Not sure if it is related, but the fitcn step mentions that 'reference segment did not provide a possible solution':
[1] "ploidy=1.78246500278682,rho=0.81,goodness=84.6365342478127,percentzero=0.292220227244316, perczerAbb=0.292220227244316"
[1] "ploidy=3.51630145168029,rho=0.68,goodness=77.8766478542867,percentzero=0.289668502443199, perczerAbb=0.289668502443199"
[1] "goodnessOfFit from grid=0.52666849231385"
[1] "reference segment did not provide a possible solution"

I have this pipeline working for the majority of my samples, but would like to know if there is a way to resolve these sorts of issues. It results in the finalise step failing as it can't find the subclones.txt file. Are these samples for which there is simply no solution?

Any help would be greatly appreciated! Thanks!

chr names bug

battenberg/R/prepare_wgs.R

Lines 54 to 56 in 82553a9

 allele_data[,1] = gsub("chr","",allele_data[,1]) 

 normal_input_data[,1] = gsub("chr","",normal_input_data[,1]) 

 input_data[,1] = gsub("chr","",input_data[,1])

Hi,
Note these 'gsub' lines in 2.2.10 are causing a bug in running hg38 with "chr" prefix contigs. These lines are removing "chr" and causing the subsequent line in gc.correct.wgs to fail to match up entries between Tumor_LogR and GC_data.

locimatches = match(x = paste0(Tumor_LogR$Chromosome, "",
Tumor_LogR$Position), table = paste0(GC_data$chr, "",
GC_data$Position))

Dockerfile fails at instillation of R packages

Hello,

In an effort to implement Battenberg into a WGS pipeline that is deployable across various systems, one of the first issues I encountered was the failing build recipe used in the current Dockerfile.
When following the directions in the README, this is the outcome from the build command:

$ docker build -t battenberg:2.2.9 .
[+] Building 334.6s (13/21)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.58kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:16.04 1.2s
=> [auth] library/ubuntu:pull token for registry-1.docker.io 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 1.43MB 0.1s
=> [ 1/16] FROM docker.io/library/ubuntu:16.04@sha256:e74994b7a9ec8e2129cfc6a871f3236940006ed31091de355578492ed140 4.2s
=> => resolve docker.io/library/ubuntu:16.04@sha256:e74994b7a9ec8e2129cfc6a871f3236940006ed31091de355578492ed140a3 0.0s
=> => sha256:0ba7bf18aa406cb7dc372ac732de222b04d1c824ff1705d8900831c3d1361ff5 527B / 527B 0.2s
=> => sha256:e74994b7a9ec8e2129cfc6a871f3236940006ed31091de355578492ed140a39c 1.42kB / 1.42kB 0.0s
=> => sha256:edf232ee7dc18c57c063bc83533ef2c03c6dfae55a0124f7d372aed51cd1d9c8 1.15kB / 1.15kB 0.0s
=> => sha256:8185511cd5ad68f14aee2bac83a449a6eea2be06f0a4715b008cfe19f07a64f7 3.36kB / 3.36kB 0.0s
=> => sha256:4007a89234b4f56c03e6831dc220550d2e5fba935d9f5f5bcea64857ac4f4888 45.96MB / 45.96MB 1.2s
=> => sha256:5dfa26c6b9c9d1ccbcb1eaa65befa376805d9324174ac580ca76fdedc3575f54 852B / 852B 0.3s
=> => sha256:4c6ec688ebe374ea7d89ce967576d221a177ebd2c02ca9f053197f954102e30b 169B / 169B 0.3s
=> => extracting sha256:4007a89234b4f56c03e6831dc220550d2e5fba935d9f5f5bcea64857ac4f4888 2.2s
=> => extracting sha256:5dfa26c6b9c9d1ccbcb1eaa65befa376805d9324174ac580ca76fdedc3575f54 0.0s
=> => extracting sha256:0ba7bf18aa406cb7dc372ac732de222b04d1c824ff1705d8900831c3d1361ff5 0.0s
=> => extracting sha256:4c6ec688ebe374ea7d89ce967576d221a177ebd2c02ca9f053197f954102e30b 0.0s
=> [ 2/16] RUN apt-get update && apt-get install -y libxml2 libxml2-dev libcurl4-gnutls-dev r-cran-rgl git libssl 60.3s
=> [ 3/16] RUN mkdir /tmp/downloads 0.3s
=> [ 4/16] RUN curl -sSL -o tmp.tar.gz --retry 10 https://github.com/samtools/htslib/archive/1.7.tar.gz && mk 33.9s
=> [ 5/16] RUN curl -sSL -o tmp.tar.gz --retry 10 https://github.com/cancerit/alleleCount/archive/v4.0.0.tar.gz && 1.7s
=> [ 6/16] RUN curl -sSL -o tmp.tar.gz --retry 10 https://mathgen.stats.ox.ac.uk/impute/impute_v2.3.2_x86_64_stati 4.1s
=> [ 7/16] RUN R -q -e 'source("http://bioconductor.org/biocLite.R"); biocLite(c("gtools", "optparse", "devtools 228.1s
=> ERROR [ 8/16] RUN R -q -e 'devtools::install_github("Crick-CancerGenomics/ascat/ASCAT")' 0.6s

[ 8/16] RUN R -q -e 'devtools::install_github("Crick-CancerGenomics/ascat/ASCAT")':
#12 0.546 > devtools::install_github("Crick-CancerGenomics/ascat/ASCAT")
#12 0.547 Error in loadNamespace(name) : there is no package called 'devtools'
#12 0.547 Calls: :: ... tryCatch -> tryCatchList -> tryCatchOne ->
#12 0.547 Execution halted
executor failed running [/bin/sh -c R -q -e 'devtools::install_github("Crick-CancerGenomics/ascat/ASCAT")']: exit code: 1

After troubleshooting, the issue is the R packages to be installed via Bioconductor fail. I believe this is due to the base R version that comes with the Ubuntu 16.04 base image is too far out of date for some of the packages and their internal dependencies.
I have attached an edited version of the Dockerfile that has a successful build recipe (NOTE: the file was given a '.txt' extension for uploading purposes and to be used this extension must be removed). The changes being updating the OS base image to Ubuntu 20.04 and using Bioconductor's BiocManager installation method.
Also, I think it should be clearly stated that this build does not work with any hg38 aligned data or the newly provided hg38 reference data. This will be brought up in subsequent issues.

Dockerfile.txt

Docker Engine specs:

$ docker version
Client: Docker Engine - Community
Cloud integration: 1.0.9
Version: 20.10.5
API version: 1.41
Go version: go1.13.15
Git commit: 55c4c88
Built: Tue Mar 2 20:13:00 2021
OS/Arch: darwin/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.5
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 363e9a8
Built: Tue Mar 2 20:15:47 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0

Write out gzipped files with gzfile

Add a function like below and use that to write out data files:

writeGzFile <- function(){
   gz1 = gzfile("df1.gz","w");
   write(out1, gz1);
   close(gz1) 
}

Source: http://stackoverflow.com/questions/14225566/write-a-gzip-file-from-data-frame

reference files for GRCh38

Hello,

I was wondering if there is a Battenberg reference area for GRCh38, or a procedure to generate the relevant files.

Many thanks
Jorge

Error in gc.correct.wgs

Hello,

At some point in my Battengerg run I get the following error:

Error in [.data.frame(GC_data, , 2 + maxGCcol_short) :
undefined columns selected
Calls: gc.correct.wgs -> [ -> [.data.frame
In addition: Warning message:
In cor(GC_data[flag_nona, 3:ncol(GC_data)], Tumor_LogR[flag_nona, :
the standard deviation is zero

I was wondering if you could please help me to debug the error.

Many thanks
Jorge

Run refit using refit_suggestion.txt

Hello,

I was wondering how to run refit using the refit suggestions generated in the initial run?

Also what is the difference between the *subclones_1.txt and *subclones.txt outputs.

Many Thanks,
Rashesh

step producing file <tumour_sample>_battenberg_cn.vcf.gz

Hello Dave,

I was wondering if you could let me know which step of battenberg.pl produces these two files:
<tumour_sample>_battenberg_cn.vcf.gz
<tumour_sample>_battenberg_cn.vcf.gz.tbi

The reason for asking is that I cannot find them and yet the pipeline seems completed.

Many thanks
Jorge

Rework workflow to remove unneeded data reads

The workflow could be changed to remove the need to read in data at various steps.

The workflow could exist of essentially 4 main blocks:

preparation
haplotyping (everything that can be run in parallel)
segmentation
fitting copy number

These can be implemented in 4 separate R scripts that can be called by cgpBattenberg

Difference between the sum of copy number segments and ntot values.

Hi,
I hope you can help me to understand the output from battenberg.
In the output of the call subclones part of the pipline a "segmentation file" is produced. This file has the clonal / subclonal copy number for each segment in the genome.
Here is a section that deomonstrates my confusion

chr startpos endpos BAF pval LogR ntot nMaj1_A nMin1_A frac1_A

21 5066290 5377969 0.534632035 0.203264531 0.073106415 2.109409229 1 1 1

21 5379291 13227129 0.790123457 0.090253745 0.064570678 2.096525116 2 1 1

21 13227353 46672988 0.5 1 0.004959692 2.008640488 1 1 1

Put most simply for regions with no called clonality (eg row 2) why is the sum of the major and minor alleles not equal to ntot. What does the ntot represent? How should there differences be reconciled?

many thanks

Bug in chr names in parse.imputeinfofile

Hi,
There is a bug in the below lines causing hg38 (contigs with chr prefix) to fail.

battenberg/R/impute.R

Lines 68 to 71 in 604a5d2

 chr_names=unique(impute.info$chrom) 

 # Subset for a particular chromosome 

 if (!is.na(chrom)) { 

 impute.info = impute.info[impute.info$chrom==chr_names[chrom],]

Specifically, this line: chr_names = unique(impute.info$chrom)
makes a list of unique chromosomes names and stores it in chr_names.

But this line accesses chr_names[chrom] using the chromosome name, but chr_names is not a named vector.
For example:

> chr_names
 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9" 
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chr21" "chr22" "chrX" 
> chrom
[1] "chr10"
> chr_names[chrom]
[1] NA

Is there a version of Battenberg that can handle the "chr" prefixed contig names, specifically for hg38?

Thanks.

Missing RT_correction_hg38.zip file.

Hi,

I do not see the RT_correction_hg38.zip file in the https://ora.ox.ac.uk/objects/uuid:08e24957-7e76-438a-bd38-66c48008cf52 directory. I have some older copy of this (downloaded from some Dropbox a year ago). Is it just missing, or is it no longer being used?

Best wishes,
Paweł

battenberg_impute_1000G_v1.tar.gz can't be downloaded

Hi,
I can't download 1000G file. Seems the link is broken.

[E::hts_idx_push] Unsorted positions on sequence #6: 135999720 followed by 1

Hi,

As I posted at cancerit/cgpBattenberg#94, would you be able to add options(scipen = 999) to the battenberg R codes so that users don't have to change the default Rprofile file?

Thanks,

Taka

Move GC_SNP and SNPPOS references to ref file

These and a few more reference files can move to the SNP6 ref file.

Issues with hg38 aligned BAMs and new hg38 reference data

Hello,

With the recent release of the hg38 reference data, I have been working to run Battenberg with my hg38 aligned BAM files but have run into multiple issues so I wanted to make others aware.

First, I recommend those using looking to use the hg38 reference data to download them from the Dropbox site https://www.dropbox.com/sh/bize1n830t0mgzb/AACuzQiHbjQGqIuTJC3Xahzda?dl=0 instead of the ORA site as there seemed to be an issue with completing the download of the beagle5.zip. Also, the ORA site is completely missing the shapeit2 files that are used in the impute_info.txt file. Lastly for this point, the README.txt for the reference data is incomplete (there needs to be symlinks for the chrX files in the 1000G_loci_hg38 directory)

Second, the current version of the master branch (which is the default branch for those cloning the repo) does not support the new hg38 reference data. The hg38_ChrNotation_fix_NAP branch has implemented a fix for working with the hg38 reference data that will allow you to get past the first step (the prepare_wgs function call in the battenberg.R script) correctly.

Third, when using the hg38_ChrNotation_fix_NAP branch, the battenberg_wgs.R script requires extensive rewriting of default variables for compatibility with the hg38 reference data. The "..../" is used here as a placeholder for some user defined parent directory. The second half of the variables are specific for the use of Beagle5. Important notes for this part include: running Beagle5 requires Java 8, the JAR executable file must be downloaded https://faculty.washington.edu/browning/beagle/b5_0.html, all the reference files downloaded from the beagle5.zip need to moved into a new subdirectory that matches the GENOME_VERSION string (i.e. ..../beagle5/b38), and the JAR file must be in the ..../beagle5 directory.

IMPUTEINFOFILE = "..../imputation/impute_info.txt"
G1000PREFIX = "..../1000G_loci_hg38/1kg.phase3.v5a_GRCh38nounref_allele_index_chr"
G1000PREFIX_AC = "..../1000G_loci_hg38/1kg.phase3.v5a_GRCh38nounref_loci_chrstring_chr"
GCCORRECTPREFIX = "..../GC_correction_hg38/1000G_GC_chr"
REPLICCORRECTPREFIX = "..../RT_correction_hg38/1000G_RT_chr"
PROBLEMLOCI = "..../probloci.txt.gz"

GENOME_VERSION = "b38"
BEAGLE_BASEDIR = "..../beagle5"
BEAGLEJAR = file.path(BEAGLE_BASEDIR, "beagle.12Jul19.0df.jar")
BEAGLEREF.template = file.path(BEAGLE_BASEDIR, GENOME_VERSION, "chrCHROMNAME.1kg.phase3.v5a_GRCh38nounref.vcf.gz")
BEAGLEPLINK.template = file.path(BEAGLE_BASEDIR, GENOME_VERSION, "plink.chrCHROMNAME.GRCh38.map")

Fourth, there is a hardcoded variable in the battenberg function within the battenberg.R called "GENOMEBUILD" which is set to "hg19". This must be edited in place to "hg38".

Finally, even after all this, I have still been unsuccessful in having the pipeline execute 100% properly. I will detail the current issue in a new thread. I am grateful that the developers have produced the hg38 reference files, I just wish there also would have been a warning that they aren't fully ready for use in the current version of the software.

Ref files for hg38

Hello,

Do you currently offer reference files for hg38? If not, would you recommend just using Liftover?

Thank you!

error when producing sunRise plot

Hi,
We have a case where no copy number solutions have been found for a data set, with the appropriate message written to the cnaStatusFile. However, we then get an error when trying to produce the sunrise plot:
Error in if (psi_opt1 > 0 && rho_opt1 > 0) { : missing value where TRUE/FALSE needed
Calls: fit.copy.number -> runASCAT -> <Anonymous>
In addition: Warning message:
In log(d) : NaNs produced

Should the code exit if no copy number solutions have been found and not try to create a sunrise plot? If it is possible to produce a valid sunrise plot in this case, would it possible for you to suggest a cause for the error we are seeing?
Regards
Kathryn

difference between sublones.txt and subclones_1.txt

Hi,
when running the call_subclones command of the Battenberg pipe line two sublcones files are generated.

subclones_1.txt (generated first and usually with more CN segments
sublones.txt

this appears to be independent of the number of copy number solutions for that sample.
What is the difference?

many thanks

Jamie

Warning: invalid package '/opt/battenberg'

Hello,

I am getting the below error while building the docker image.
Jorge

Step 13/20 : RUN R -q -e 'install.packages("/opt/battenberg", repos=NULL, type="source")'
---> Running in 6d840fc92d87

install.packages("/opt/battenberg", repos=NULL, type="source")
Installing package into '/usr/local/lib/R/site-library'
(as 'lib' is unspecified)
Warning: invalid package '/opt/battenberg'
Error: ERROR: no packages specified
Warning message:
In install.packages("/opt/battenberg", repos = NULL, type = "source") :
installation of package '/opt/battenberg' had non-zero exit status

Error in write_battenberg_phasing function

Hello,

While attempting to run hg38 aligned BAMs through Battenberg (hg38_ChrNotation_fix_NAP branch) using the hg38 reference data. I encountered an error that stems from the write_battenberg_phasing function which halted execution. The log file which I've attached dictates the following:

Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
Calls: battenberg ... write_battenberg_phasing -> merge -> merge.data.frame -> fix.by

To try and troubleshoot the problem, I worked through the possible lines that could be the cause.
The bafsegmented_file as well as all SNPfiles and imputedHaplotypeFiles exist and are appear to be populated correctly.
And what makes things more confusing is that the output VCF files from this function all seem to have been produced (filename convention - [tumorname]Battenberg_phased[chrom].vcf) and are populated.

battenberg_output.log

correspondence between _A columns in the _subclones.txt and BattenbergProfile_subclones.png

Hello,

I was wondering if you could please help me to interpret the Battenberg output, which has given me two subclones. I understand the _A columns in the _subclones.txt match somehow segmetation plot in BattenbergProfile_subclones.png.

As I have two subclones, am I right saying that the columns nMaj1_A, nMin1_A contain the CN segments for one clone and the columns nMaj2_A, nMin2_A contain the CN for the other clone?

Then in the BattenbergProfile_subclones.png I see two different colours (as in ASCAT) and then thick and thin lines. Do the thick lines corrrespond to one clone and the thin ones to the other?

Thanks so much
Jorge

Check for reference files to exist before trying to load

The current pipeline returns a non-informative error when a reference file does not exist. The dopar foreach loop now yields:

Error in { : task 1 failed - "cannot open the connection" dopar

Build in ASCAT package - remove duplicate code

BB should not have the same code that is included into ASCAT. There are some issues when loading both packages at the moment as functions are overloaded.

Currently the recommended procedure is to load Battenberg before ASCAT if ASCAT is to be used and ASCAT before Battenberg when BB is to be used.

Error in if (nMinor < 0) { : missing value where TRUE/FALSE needed Calls: callSubclones -> determine_copynumber

Hi,

This is very rare but a few samples had this issue below when calling callSubclones().

Error in if (nMinor < 0) { : missing value where TRUE/FALSE needed
Calls: callSubclones -> determine_copynumber

This may be due to ploidy NA (FRAC_GENOME) in sample_rho_and_psi.txt?

rho psi ploidy distance is.best
ASCAT 1 2.15 2.10143838105564 NA NA
FRAC_GENOME 0.95 NA 2.09906082321258 0 FALSE
REF_SEG 1 2 2 1 TRUE

Thanks,

Taka

Rewrite fitcopynumber to not need full data

fitcopynumber could possibly run from just the segmented data and it's worth exploring whether this is possible. It would mean the raw data doesn't have to be read in and is not needed after impute in general.

Instruction about the result

Hello:

I just some results files from Batternberg algorithm. Is there any instruction or readme file the interpret the result? here is all the files, do you think the program is completed or not? Thanks

-rw-r--r-- 1 zhangt8 zhangt8 201M Jul 18 01:02 GT01062_2017_allelecounts.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 212M Jul 18 01:03 GT01062_2018_allelecounts.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 9.8M Jul 18 01:03 GT01062_2018_hetbaf.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 9.5M Jul 18 01:03 GT01062_2018_hetdata.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 369M Jul 18 01:05 GT01062_2018_impute_input.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 362M Jul 18 01:06 GT01062_2018_impute_output.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 18M Jul 18 01:07 GT01062_2018_logR_Baf_segmented.vcf.gz
-rw-r--r-- 1 zhangt8 zhangt8 1.2M Jul 18 01:07 GT01062_2018_logR_Baf_segmented.vcf.gz.tbi
-rw-r--r-- 1 zhangt8 zhangt8 3.0M Jul 18 01:03 GT01062_2018_other.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 3.3M Jul 18 01:03 GT01062_2018_rafseg.tar.gz
drwxr-xr-x 2 zhangt8 zhangt8 4.0K Jul 18 01:03 GT01062_2018_subclones/
-rw-r--r-- 1 zhangt8 zhangt8 4.4M Jul 18 01:03 GT01062_2018_subclones.tar.gz
drwxr-xr-x 14 zhangt8 zhangt8 512K Jul 18 01:07 tmpBattenberg/

Also, I got an error information during running:
Use of uninitialized value in concatenation (.) or string at /opt/wtsi-cgp/lib/perl5/Sanger/CGP/Battenberg/Implement.pm line 1, do you know what's issue here?

Thanks.

Read in data with readr package

The readr package is quite a bit quicker when reading in data. Part of this replace should be to put all the data parsers into a separate R file.

infile = "3e8a2c90-e747-4a22-bc9e-0b062479fec2_alleleFrequencies_chr4.txt"

system.time(read.table(infile, header=T, stringsAsFactor=F))
   user  system elapsed 
 18.045   0.364  18.562

system.time(read.delim(infile, header=T, stringsAsFactor=F, sep="\t"))
   user  system elapsed 
 22.165   0.744  22.872

library(readr)
system.time(read_tsv(infile, col_names=T))
   user  system elapsed 
 12.605   0.108  12.724

number of 'X' in sunrise plot and their correspondence to subclones.txt file

Hello,

I have done different Battenberg runs where I get either one or two 'X' in the sunrise plot. I was wondering if you could help me to understand how the number of 'X' links to the solutions in the subclones.txt.

Specifically I have two questions, any advice on those will be much appreciated.

Please correct me if I am wrong. My understanding is that a sunrise plot with a single 'X' means that there is a single tumour cell population. Then a sunrise plot with two 'X' means that there are two tumour cell populations corresponding to two states of any of the possible solutions in the subclones.txt file.
I see however that even if there is a single 'X' in the sunrise plot the solutions in the subclones.txt include two states. Does this mean that one of the stages is negligible?

I have attached a png file showing two sunrise plots with one and two 'X'. In the plot with a single 'X' I am also showing the two states of solution A.

Many thanks
Jorge

Can I skip some code to get only 2 output files?

Hello,

I am running Battenberg to get these two outputs:
[samplename]_subclones.txt, [samplename]_rho_and_psi.txt

Is there a way to skip some of the code to speed things up and get only these outputs?
Currently one run takes 5 hours or more.

Thanks,
Naama

Reference files - GRCh38 and hg19

Hi,

I was wondering whether it is possible to use Battenberg with hg19 or GRCh38? Do you provide reference file bundle?

Thanks

dev branch fails to install

FYI installation fails on the latest dev commit:

Error in parse(outFile) :                                                                                                                                                 
  /tmp/Rtmpu8a736/R.INSTALLab700ac38d/Battenberg/R/battenberg.R:307:72: unexpected ','                                                                                    
306:       # rename the original files without multisample phasing info                                                                                                   
307:       MutBAFfiles <- paste0(tumourname[sampleidx], "_chr", chrom_names),

The beagle5.zip in Battenberg reference (hg38) doesn't work.

Hello Naser,

This is Jian. When I try to download Battenberg reference (hg38) from the following link: https://ora.ox.ac.uk/objects/uuid:08e24957-7e76-438a-bd38-66c48008cf52 . I found that beagle5.zip cannot be compressed. Did you have this problem?

Thank you.

Docker build error & No permissions to output from docker...

I was not able to build the Battenberg 2.2.9 docker image sucessfully. What worked was pulling files from the github: OpenGenomics/Battenberg. After this, I was able to get the docker to run and enter the terminal.
When I run:

R CMD BATCH '--no-restore-data --no-save --args -t P2055N -n P1907T --nb /mnt/battenberg/P1907T.final.bam --tb /mnt/battenberg/P2055N.final.bam --sex female -o /mnt/battenberg/' /usr/local/bin/battenberg_wgs.R /mnt/battenberg/battenberg.Rout

I get the output:

'/usr/lib/R/bin/BATCH: 60: cannot create /mnt/battenberg/battenberg.Rout: Permission denied'

Does anyone have a solution to get permission to get permission to write the output?

Changing max_rho from 1 to 0.9 will break ascat.plotSunrise()

Hi,

As sometimes purity seems to be overestimated, I tried max_rho = 0.9 but it caused the issue below.

Error in axis(2, at = seq(0, 1/purity_max, by = 1/3/purity_max), labels = seq(purity_min, :
'at' and 'labels' lengths differ, 4 != 3
Calls: fit.copy.number -> runASCAT -> -> axis

I'm currently testing min_rho = 0, which may fix the issue but I think the axis labels should handle different min max rho values...?

Thanks,

Taka

Issues with running battenberg with hg38

Hello,

I have been trying to run Battenberg with hg38 and have been unable to do so with the master branch. I was looking through the comments and saw that a couple of months ago @pblaney recommended hg38_ChrNotation_fix_NAP but he still had issues.
Is there a stable version for hg38? What branch should I clone?

Error in callChrXsubclones function

Hello,

I'm running Battenberg for hg19 with a case that doesn't seem to contain any copy number alteration. However, I'm not be able to complete the analysis because of the following error: Error in { : task 1 failed - "missing value where TRUE/FALSE needed" .

I tracked the error to the callChrXsubclones function. I'll try to explain the reason and the two possible solutions I've thought of, but I don't know if they are correct or maybe something else needs to be changed:

In my case, nrow(BBloh) = 1 in line 62
So, it enters in the if statement of line 64, but BBloh remains 1
In line 77 seg$type = "loss" according to seg$mean and in line 84 seg$CNA = "yes"

Up to this point, the declaration of the variables seems to be correct.

Because of seg$CNA = "yes", it enters in the if statement of line 87, where the problem appears in line 126. The code of this part is:

else if (seg$type == "loss") {
seg$CN = 0
if (nrow(BBloh) > 0) {
seg$clonal = ifelse(round(abs(explogrLoss - seg$mean), digits = 2) < round(abs(sd(BBloh$LogR)/explogrLoss), digits = 2), "yes", "no")
}
else if (nrow(BBloh) == 0) {
seg$clonal = ifelse(round(abs(explogrLoss - seg$mean), digits = 2) < round(abs(BBsd_max/explogrLoss), digits = 2), "yes", "no")
}
}

As nrow(BBloh) = 1, it enters in the first if, but when the function tries to calculate the sd(BBloh$LogR), it returns seg$clonal = NA because there is only one number in the variable.
Because this reason, in line 173 if (seg$clonal == "no"), the above error appears and the program finishes incomplete.

The solutions I propose are:

In line 126: Change if (nrow(BBloh) > 0) by if (nrow(BBloh) > 1) and in line 131 else if (nrow(BBloh) == 0)by else if (nrow(BBloh) <= 1). So you don't have to calculate the sd of just one number.
Or change lines 80 and 84 so if (nrow(BBloh) == 1){seg$CNA == "no"}. Thus, it enters in the else statement of line 138 without errors.

My BBloh looks like:

   chr startpos   endpos       BAF      pval        LogR     ntot
10   9 44891284 44900981 0.7768091 0.5352291 -0.05235313 1.919503
   nMaj1_A nMin1_A frac1_A nMaj2_A nMin2_A frac2_A SDfrac_A
10       1       0       1      NA      NA      NA       NA
   SDfrac_A_BS frac1_A_0.025 frac1_A_0.975
10          NA            NA            NA

And SAMPLEsegs:

   sampleID chrom arm start.pos   end.pos n.probes    mean
1 2102635TD     X   p   2600150  58563072   375739 -0.1034
2 2102635TD     X   q  61690452 155255094   636530 -0.1074

It is very likely that the error occurred because the case doesn't have a CNA, however I believe that this situation should be contemplated (or at least given a proper error).

Unable to download Reference files GRCh37

Hello,

I am unable to download a few files from https://ora.ox.ac.uk/objects/uuid:2c1fec09-a504-49ab-9ce9-3f17bac531bc
for example battenberg_wgs_replic_correction_1000g_v3.tar.gz.
Is there another way to access these files?

Many Thanks,
Rashesh

Build in check whether foreach has completed for a chromosome

only 0 for all count values

Hi,

For some reasons I am only getting 0s at tall the positions for all the bases in all the [_alleleFrequencies_chr.txt] files. My reads are mapped to hg19, aren't the coordinates in ref files for hg38 by any chance? Or I am not really sure what I might be doing wrong.

Best
Mo

allelecounter.exe

Hello. Where and how can one get access to the allelecounter.exe that is required for this to run?
Apologies if I'm missing something really obvious but I've been searching for a while and can't find it. There is a allelecounter.PY file here https://github.com/secastel/allelecounter but this doesn't work with the script I'm trying to use that relies on the .exe
Thanks in advance for your help
CGS (CRUK CI)

plot error at plothaplotypes.23.

Hi,

Some samples had this error below at plothaplotypes.23. It seems that chrX caused some issues. Have you guys seen this error before?

Calls: plot.haplotype.data ... create.haplotype.plot -> plot -> plot.default -> localWindow -> plot.window
In addition: Warning messages:
1: In min(mut_data$Position, na.rm = T) :
no non-missing arguments to min; returning Inf
2: In max(mut_data$Position, na.rm = T) :
no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Execution halted

Thanks,

Taka

	allele_data[,1] = gsub("chr","",allele_data[,1])
	normal_input_data[,1] = gsub("chr","",normal_input_data[,1])
	input_data[,1] = gsub("chr","",input_data[,1])

	chr_names=unique(impute.info$chrom)
	# Subset for a particular chromosome
	if (!is.na(chrom)) {
	impute.info = impute.info[impute.info$chrom==chr_names[chrom],]

wedge-lab / battenberg Goto Github PK

battenberg's People

Contributors

Stargazers

Watchers

Forkers

battenberg's Issues

Recommend Projects

Recommend Topics

Recommend Org