wedge-lab / battenberg
Battenberg R package for subclonal copynumber estimation
License: GNU Affero General Public License v3.0
Hi,
Is it possible to run the Battenberg pipeline on ASCAT or ASCATngs output? For example, I have the allele counts for both the tumor and the normal sample from ASCATngs; could I feed them into the battenberg function and run the pipeline from there?
I tried doing so with the code below, but I get the error unused argument (normal_name = "MCF10A"). When I remove the normal_name parameter, it fails with Error in prepare_wgs(chrom_names = chrom_names, tumourbam = tumour_data_file, : argument "normalname" is missing, with no default. I also tried passing analysis = "cell_line" to the battenberg function, but I get the error unused argument (analysis= "cell_line").
Any help would be greatly appreciated, thanks!
Code to try and run battenberg pipeline:
battenberg(
  tumourname = 'H_LS-BH-A208-01A-11D-A159-09',
  tumour_data_file = '~/tmpAscat/ascat/H_LS-BH-A208-01A-11D-A159-09_alleleCounts.tab',
  normal_name = 'MCF10A',
  normal_data_file = '~/tmpAscat/ascat/MCF10A_0_EDHG200003860-1a_H2L5YDSXY_L3.count',
  ismale = FALSE,
  imputeinfofile = '~/battenberg/reference/hg38/imputation/impute_info.txt',
  g1000prefix = '~/battenberg/reference/hg38/1000G_loci/1kg.phase3.v5a_GRCh38nounref_loci_chr',
  g1000allelesprefix = '~/battenberg/reference/hg38/1000G_loci/1kg.phase3.v5a_GRCh38nounref_allele_index_chr',
  gccorrectprefix = '~/battenberg/reference/hg38/GC_correction/1000G_GC_chr',
  repliccorrectprefix = '~/battenberg/reference/hg38/replicationtimings/1000G_RT_chr',
  problemloci = '~/battenberg/reference/hg38/problem_loci/probloci.txt.gz',
  data_type = 'WGS',
  skip_allele_counting = TRUE,
  skip_preprocessing = FALSE,
  skip_phasing = FALSE,
  prior_breakpoints_file = NULL
)
Warmest regards,
Hannan
Determine a seed and use that when randomising the alleles and when running impute. Save it to a file such that a rerun of BB will yield the exact same results. This negates the necessity to keep all the output.
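A minimal sketch of the idea above, assuming a plain-text file is an acceptable place to store the seed. The helper name get_or_create_seed and the file name are hypothetical and not part of Battenberg; passing the same seed on to impute is left out.

```r
# Hypothetical helper: reuse a stored seed if one exists, otherwise create
# a new one and save it so that a rerun can reproduce the same random draws.
get_or_create_seed <- function(seed_file) {
  if (file.exists(seed_file)) {
    seed <- as.integer(readLines(seed_file, n = 1))
  } else {
    # Derive a seed from the clock and the process id (illustrative choice)
    seed <- as.integer((as.numeric(Sys.time()) + Sys.getpid()) %% .Machine$integer.max)
    writeLines(format(seed, scientific = FALSE), seed_file)
  }
  set.seed(seed)
  invisible(seed)
}

seed_file <- file.path(tempdir(), "example_seed.txt")

# First run: creates and stores the seed, then randomises
get_or_create_seed(seed_file)
first <- sample(c("A", "B"), 10, replace = TRUE)

# Rerun: reads the stored seed, so the randomisation is identical
get_or_create_seed(seed_file)
second <- sample(c("A", "B"), 10, replace = TRUE)

identical(first, second)
```

With the seed on disk, only that file needs to be kept to regenerate the randomised alleles.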
Hey guys!
Setting sv_breakpoints_file = NULL or sv_breakpoints_file = NA throws an error, because neither NULL == "NA" nor NA == "NA" evaluates to TRUE or FALSE in R, which the condition if (!is.null(sv_breakpoints_file) & !sv_breakpoints_file == "NA") requires.
Hope you are all well!
George
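One way the check could be made robust is sketched below. The helper name has_sv_breakpoints is hypothetical and not part of the Battenberg API; it relies on the short-circuiting && operator, so the later comparisons are never evaluated for NULL or NA.

```r
# Hypothetical helper: TRUE only when a usable file name was supplied.
# && stops at the first FALSE, so NULL and NA never reach the string comparison.
has_sv_breakpoints <- function(sv_breakpoints_file) {
  !is.null(sv_breakpoints_file) &&
    length(sv_breakpoints_file) == 1 &&
    !is.na(sv_breakpoints_file) &&
    sv_breakpoints_file != "NA"
}

res_null   <- has_sv_breakpoints(NULL)
res_na     <- has_sv_breakpoints(NA)
res_string <- has_sv_breakpoints("NA")
res_file   <- has_sv_breakpoints("svs.txt")  # "svs.txt" is a made-up file name
```

Only the last call returns TRUE; the other three return FALSE instead of raising an error inside if().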
Hi,
A scientist in our group has found a case where the cellular fraction (frac2_A) of the nMaj2_A and nMin2_A columns has a higher value than frac1_A from the callSubclones subroutine (version 2.2.5). We would be very grateful if you could explain why this is the case.
For example:
zcat subclones.txt.gz | awk 'BEGIN{ FS=OFS="\t" }{ print $1,$2,$3,$8,$9,$10,$11,$12,$13 }' | column -t | less -S
chr startpos endpos nMaj1_A nMin1_A frac1_A nMaj2_A nMin2_A frac2_A
...
12 19462666 19698320 3 0 0.381011741894733 3 1 0.618988258105267
...
12 37866866 83372701 2 1 0.431746072237525 3 1 0.568253927762475
...
12 83501125 133836024 2 0 0.332939966391955 2 1 0.667060033608045
...
Are these fractions supposed to be listed with major fraction on the left and minor fraction on the right, which is not what we are seeing here?
Regards
Kathryn
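To make the columns concrete, here is a small sketch using the three rows quoted above. It assumes that frac1_A and frac2_A are the fractions of tumour cells carrying the first and second copy number state, so that they sum to 1 and a fraction-weighted total copy number can be computed. Whether the larger fraction is meant to be listed first is the open question in the post; the weighted average does not depend on the order.

```r
# The three example segments from the post (chromosome 12)
segments <- data.frame(
  chr      = c("12", "12", "12"),
  startpos = c(19462666, 37866866, 83501125),
  endpos   = c(19698320, 83372701, 133836024),
  nMaj1_A  = c(3, 2, 2),
  nMin1_A  = c(0, 1, 0),
  frac1_A  = c(0.381011741894733, 0.431746072237525, 0.332939966391955),
  nMaj2_A  = c(3, 3, 2),
  nMin2_A  = c(1, 1, 1),
  frac2_A  = c(0.618988258105267, 0.568253927762475, 0.667060033608045)
)

# The two fractions of each segment should add up to 1
frac_sum <- segments$frac1_A + segments$frac2_A

# Fraction-weighted total copy number per segment (assumed interpretation)
avg_total_cn <- with(
  segments,
  (nMaj1_A + nMin1_A) * frac1_A + (nMaj2_A + nMin2_A) * frac2_A
)

# Flag segments where the second state is carried by more cells than the first
state2_is_larger <- segments$frac2_A > segments$frac1_A

print(cbind(segments[, c("chr", "startpos", "endpos")], frac_sum, avg_total_cn, state2_is_larger))
```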
segmented LogR is now created within fitcopynumber. This means it is redone each time a copy number profile is (re)fitted. Moving this step to a separate function (or making it part of the segmentation output) would reduce runtime there. This could possibly be combined with moving the last two steps to only use the segmented data, removing the need for the raw data to be read in (ticket #5).
Hi,
Thanks for the great package.
I keep getting a "subscript out of bounds" error while running prepare_wgs -> getBAFsAndLogRs.
This is the stacktrace:
Error in as.matrix(x)[i] : subscript out of bounds
In addition: There were 11 warnings (use warnings() to see them)
Enter a frame number, or 0 to exit
1: source("battenberg_wgs_debug.R")
2: withVisible(eval(ei, envir))
3: eval(ei, envir)
4: eval(ei, envir)
5: battenberg_wgs_debug.R#79: battenberg(tumourname = TUMOURNAME, normalname =
6: prepare_wgs(chrom_names = chrom_names, tumourbam = tumour_data_file, normal
7: getBAFsAndLogRs(tumourAlleleCountsFile.prefix = paste(tumourname, "_alleleF
8: mutant_data[cbind(1:len, allele_data[, 3])]
9: `[.data.frame`(mutant_data, cbind(1:len, allele_data[, 3]))
Debugging showed that line 62 of prepare_wgs.R uses the wrong variable:
normal_input_data = normal_input_data[chrpos_tumour %in% matched_data,]
Here, it seems that chrpos_tumour needs to be chrpos_normal for the dimensions to match.
If chrpos_tumour is not used on purpose, please fix the error. Thanks!
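The suggested fix can be illustrated on toy data. This is only a sketch: the column names, the "_" key separator and the way matched_data is built here are assumptions, not the actual prepare_wgs.R code.

```r
# Toy allele-count tables with different numbers of rows
tumour_input_data <- data.frame(chr = c("1", "1", "1"), pos = c(100L, 200L, 300L))
normal_input_data <- data.frame(chr = c("1", "1", "1", "1"), pos = c(50L, 100L, 300L, 400L))

# One chromosome_position key per row of each table
chrpos_tumour <- paste(tumour_input_data$chr, tumour_input_data$pos, sep = "_")
chrpos_normal <- paste(normal_input_data$chr, normal_input_data$pos, sep = "_")

# Positions present in both tables (assumed definition of matched_data)
matched_data <- intersect(chrpos_tumour, chrpos_normal)

# Each table is filtered with its own key vector, so the logical index
# always has the same length as the table it subsets
tumour_matched <- tumour_input_data[chrpos_tumour %in% matched_data, ]
normal_matched <- normal_input_data[chrpos_normal %in% matched_data, ]

nrow(tumour_matched) == nrow(normal_matched)
```

Indexing normal_input_data with chrpos_tumour %in% matched_data instead would apply a three-element index to a four-row table, which is the kind of mismatch described above.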
Hello,
This question is regarding the [samplename]_subclones_chr*.png files. Is it the BAF that those plots are showing?
Many thanks
Jorge
Hi there,
The reference file ALL.v1a.shapeit2_integrated_chrX_nonPAR.GRCh38.20181129.phased.hap.gz provided here appears to be empty.
This caused an error in impute2, which did not get picked up by Battenberg:
ERROR: There are no type 2 SNPs after applying the command-line settings for this run, which makes it impossible to perform imputation. One possible reason is that you have specified an analysis interval (-int) that contains reference panel SNPs but not inference panel SNPs -- e.g., this can happen at the ends of chromosomes. Another possibility is that your genotypes and the reference panel are mapped to different genome builds, which can lead the same SNPs to be assigned different positions in different panels. If you need help fixing this error, please contact the authors.
and so the CN profiles generated had a large chunk of X with no breakpoints across a fairly large cohort of samples.
I assume this file should not be empty, so would it be possible to provide the correct version?
Thanks so much!
Hi there,
Getting an error in the callsubclones step:
Error in t.test.default(logR[logR$Chromosome == subclones$chr[i - 1] & :
not enough 'x' observations
Calls: callSubclones -> merge_segments -> t.test -> t.test.default
Not sure if it is related, but the fitcn step mentions that 'reference segment did not provide a possible solution':
[1] "ploidy=1.78246500278682,rho=0.81,goodness=84.6365342478127,percentzero=0.292220227244316, perczerAbb=0.292220227244316"
[1] "ploidy=3.51630145168029,rho=0.68,goodness=77.8766478542867,percentzero=0.289668502443199, perczerAbb=0.289668502443199"
[1] "goodnessOfFit from grid=0.52666849231385"
[1] "reference segment did not provide a possible solution"
I have this pipeline working for the majority of my samples, but would like to know if there is a way to resolve these sorts of issues. It results in the finalise step failing as it can't find the subclones.txt file. Are these samples for which there is simply no solution?
Any help would be greatly appreciated! Thanks!
Lines 54 to 56 in 82553a9
Hi,
Note these 'gsub' lines in 2.2.10 are causing a bug in running hg38 with "chr" prefix contigs. These lines are removing "chr" and causing the subsequent line in gc.correct.wgs to fail to match up entries between Tumor_LogR and GC_data.
locimatches = match(x = paste0(Tumor_LogR$Chromosome, "",
Tumor_LogR$Position), table = paste0(GC_data$chr, "",
GC_data$Position))
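A minimal sketch of the mismatch and one possible remedy, which is to normalise the chromosome names on both sides before matching. The toy data, the helper strip_chr and the "_" separator are assumptions for illustration; they are not taken from gc.correct.wgs.

```r
# Hypothetical helper: drop a leading "chr" so both tables share one naming style
strip_chr <- function(x) sub("^chr", "", as.character(x))

# Tumor_LogR after the "chr" prefix has been removed, GC_data still prefixed
Tumor_LogR <- data.frame(Chromosome = c("1", "1", "2"), Position = c(100L, 200L, 150L))
GC_data    <- data.frame(chr = c("chr1", "chr1", "chr2"), Position = c(100L, 200L, 150L))

# Matching the raw names finds nothing because "1_100" never equals "chr1_100"
naive <- match(
  paste0(Tumor_LogR$Chromosome, "_", Tumor_LogR$Position),
  paste0(GC_data$chr, "_", GC_data$Position)
)

# Normalising both sides first restores the matches
fixed <- match(
  paste0(strip_chr(Tumor_LogR$Chromosome), "_", Tumor_LogR$Position),
  paste0(strip_chr(GC_data$chr), "_", GC_data$Position)
)
```

Here naive contains only NA values, while fixed points at rows 1, 2 and 3 of GC_data.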
Hello,
In an effort to implement Battenberg into a WGS pipeline that is deployable across various systems, one of the first issues I encountered was the failing build recipe used in the current Dockerfile.
When following the directions in the README, this is the outcome from the build command:
$ docker build -t battenberg:2.2.9 .
[+] Building 334.6s (13/21)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.58kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:16.04 1.2s
=> [auth] library/ubuntu:pull token for registry-1.docker.io 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 1.43MB 0.1s
=> [ 1/16] FROM docker.io/library/ubuntu:16.04@sha256:e74994b7a9ec8e2129cfc6a871f3236940006ed31091de355578492ed140 4.2s
=> => resolve docker.io/library/ubuntu:16.04@sha256:e74994b7a9ec8e2129cfc6a871f3236940006ed31091de355578492ed140a3 0.0s
=> => sha256:0ba7bf18aa406cb7dc372ac732de222b04d1c824ff1705d8900831c3d1361ff5 527B / 527B 0.2s
=> => sha256:e74994b7a9ec8e2129cfc6a871f3236940006ed31091de355578492ed140a39c 1.42kB / 1.42kB 0.0s
=> => sha256:edf232ee7dc18c57c063bc83533ef2c03c6dfae55a0124f7d372aed51cd1d9c8 1.15kB / 1.15kB 0.0s
=> => sha256:8185511cd5ad68f14aee2bac83a449a6eea2be06f0a4715b008cfe19f07a64f7 3.36kB / 3.36kB 0.0s
=> => sha256:4007a89234b4f56c03e6831dc220550d2e5fba935d9f5f5bcea64857ac4f4888 45.96MB / 45.96MB 1.2s
=> => sha256:5dfa26c6b9c9d1ccbcb1eaa65befa376805d9324174ac580ca76fdedc3575f54 852B / 852B 0.3s
=> => sha256:4c6ec688ebe374ea7d89ce967576d221a177ebd2c02ca9f053197f954102e30b 169B / 169B 0.3s
=> => extracting sha256:4007a89234b4f56c03e6831dc220550d2e5fba935d9f5f5bcea64857ac4f4888 2.2s
=> => extracting sha256:5dfa26c6b9c9d1ccbcb1eaa65befa376805d9324174ac580ca76fdedc3575f54 0.0s
=> => extracting sha256:0ba7bf18aa406cb7dc372ac732de222b04d1c824ff1705d8900831c3d1361ff5 0.0s
=> => extracting sha256:4c6ec688ebe374ea7d89ce967576d221a177ebd2c02ca9f053197f954102e30b 0.0s
=> [ 2/16] RUN apt-get update && apt-get install -y libxml2 libxml2-dev libcurl4-gnutls-dev r-cran-rgl git libssl 60.3s
=> [ 3/16] RUN mkdir /tmp/downloads 0.3s
=> [ 4/16] RUN curl -sSL -o tmp.tar.gz --retry 10 https://github.com/samtools/htslib/archive/1.7.tar.gz && mk 33.9s
=> [ 5/16] RUN curl -sSL -o tmp.tar.gz --retry 10 https://github.com/cancerit/alleleCount/archive/v4.0.0.tar.gz && 1.7s
=> [ 6/16] RUN curl -sSL -o tmp.tar.gz --retry 10 https://mathgen.stats.ox.ac.uk/impute/impute_v2.3.2_x86_64_stati 4.1s
=> [ 7/16] RUN R -q -e 'source("http://bioconductor.org/biocLite.R"); biocLite(c("gtools", "optparse", "devtools 228.1s
=> ERROR [ 8/16] RUN R -q -e 'devtools::install_github("Crick-CancerGenomics/ascat/ASCAT")' 0.6s
[ 8/16] RUN R -q -e 'devtools::install_github("Crick-CancerGenomics/ascat/ASCAT")':
#12 0.546 > devtools::install_github("Crick-CancerGenomics/ascat/ASCAT")
#12 0.547 Error in loadNamespace(name) : there is no package called 'devtools'
#12 0.547 Calls: :: ... tryCatch -> tryCatchList -> tryCatchOne ->
#12 0.547 Execution halted
executor failed running [/bin/sh -c R -q -e 'devtools::install_github("Crick-CancerGenomics/ascat/ASCAT")']: exit code: 1
After troubleshooting, I found that the R packages installed via Bioconductor fail to install. I believe this is because the base R version that comes with the Ubuntu 16.04 base image is too far out of date for some of the packages and their internal dependencies.
I have attached an edited version of the Dockerfile that has a successful build recipe (NOTE: the file was given a '.txt' extension for uploading purposes, and this extension must be removed before use). The changes are updating the OS base image to Ubuntu 20.04 and using Bioconductor's BiocManager installation method.
Also, I think it should be clearly stated that this build does not work with any hg38 aligned data or the newly provided hg38 reference data. This will be brought up in subsequent issues.
Docker Engine specs:
$ docker version
Client: Docker Engine - Community
Cloud integration: 1.0.9
Version: 20.10.5
API version: 1.41
Go version: go1.13.15
Git commit: 55c4c88
Built: Tue Mar 2 20:13:00 2021
OS/Arch: darwin/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.5
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 363e9a8
Built: Tue Mar 2 20:15:47 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
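Add a function like the one below and use it to write out data files:
writeGzFile <- function(df, filename) {
  # Open a gzip-compressed connection for writing
  gz1 <- gzfile(filename, "w")
  # Close the connection even if writing fails
  on.exit(close(gz1))
  # Tab-separated output is an assumption; adjust to the expected file format
  write.table(df, gz1, sep = "\t", quote = FALSE, row.names = FALSE)
}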
Source: http://stackoverflow.com/questions/14225566/write-a-gzip-file-from-data-frame
Hello,
I was wondering if there is a Battenberg reference area for GRCh38, or a procedure to generate the relevant files.
Many thanks
Jorge
Hello,
At some point in my Battenberg run I get the following error:
Error in `[.data.frame`(GC_data, , 2 + maxGCcol_short) :
undefined columns selected
Calls: gc.correct.wgs -> [ -> [.data.frame
In addition: Warning message:
In cor(GC_data[flag_nona, 3:ncol(GC_data)], Tumor_LogR[flag_nona, :
the standard deviation is zero
I was wondering if you could please help me to debug the error.
Many thanks
Jorge
Hello,
I was wondering how to run a refit using the refit suggestions generated in the initial run.
Also, what is the difference between the *subclones_1.txt and *subclones.txt outputs?
Many Thanks,
Rashesh
Hello Dave,
I was wondering if you could let me know which step of battenberg.pl produces these two files:
<tumour_sample>_battenberg_cn.vcf.gz
<tumour_sample>_battenberg_cn.vcf.gz.tbi
The reason for asking is that I cannot find them and yet the pipeline seems completed.
Many thanks
Jorge
The workflow could be changed to remove the need to read in data at various steps.
The workflow could consist of essentially 4 main blocks:
These can be implemented in 4 separate R scripts that can be called by cgpBattenberg
Hi,
I hope you can help me to understand the output from battenberg.
In the output of the call subclones part of the pipeline a "segmentation file" is produced. This file has the clonal / subclonal copy number for each segment in the genome.
Here is a section that demonstrates my confusion:
chr startpos endpos BAF pval LogR ntot nMaj1_A nMin1_A frac1_A
21 5066290 5377969 0.534632035 0.203264531 0.073106415 2.109409229 1 1 1
21 5379291 13227129 0.790123457 0.090253745 0.064570678 2.096525116 2 1 1
21 13227353 46672988 0.5 1 0.004959692 2.008640488 1 1 1
Put most simply: for regions with no called subclonality (e.g. row 2), why is the sum of the major and minor alleles not equal to ntot? What does ntot represent? How should these differences be reconciled?
many thanks
Hi,
There is a bug in the below lines causing hg38 (contigs with chr prefix) to fail.
Lines 68 to 71 in 604a5d2
Specifically, the line chr_names = unique(impute.info$chrom) makes a vector of unique chromosome names and stores it in chr_names.
A later line then accesses chr_names[chrom] using the chromosome name, but chr_names is not a named vector, so the lookup returns NA.
For example:
> chr_names
[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9"
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chr21" "chr22" "chrX"
> chrom
[1] "chr10"
> chr_names[chrom]
[1] NA
Is there a version of Battenberg that can handle the "chr" prefixed contig names, specifically for hg38?
Thanks.
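The behaviour shown above can be reproduced in a few lines, together with two ways the lookup could be made to work with "chr"-prefixed names. This is a sketch of the general R idiom, not the fix adopted in the package.

```r
chr_names <- c(paste0("chr", 1:22), "chrX")
chrom <- "chr10"

# Character indexing looks up element names; an unnamed vector has none
lookup_unnamed <- chr_names[chrom]

# Option 1: give the vector names so that lookup by name succeeds
named_chr_names <- setNames(chr_names, chr_names)
lookup_named <- named_chr_names[[chrom]]

# Option 2: convert the name to a position first
chrom_index <- match(chrom, chr_names)
lookup_by_index <- chr_names[chrom_index]
```

lookup_unnamed is NA, whereas lookup_named and lookup_by_index both give "chr10".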
Hi,
I do not see the RT_correction_hg38.zip file in the https://ora.ox.ac.uk/objects/uuid:08e24957-7e76-438a-bd38-66c48008cf52 directory. I have some older copy of this (downloaded from some Dropbox a year ago). Is it just missing, or is it no longer being used?
Best wishes,
Paweł
Hi,
I can't download 1000G file. Seems the link is broken.
Hi,
As I posted at cancerit/cgpBattenberg#94, would you be able to add options(scipen = 999) to the Battenberg R code so that users don't have to change their default Rprofile file?
Thanks,
Taka
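For context, a short sketch of what the option changes. With R's default settings, format() can render a round genomic position in scientific notation; a large scipen value, or scientific = FALSE on a single call, keeps it in fixed notation. The position value is made up.

```r
pos <- 100000  # hypothetical genomic position stored as a double

# Depends on the current value of the scipen option
default_text <- format(pos)

# Per-call alternative that does not touch global options
fixed_text <- format(pos, scientific = FALSE)

# Global alternative, as requested in the post; the old value is restored afterwards
old_options <- options(scipen = 999)
scipen_text <- format(pos)
options(old_options)

c(default = default_text, fixed = fixed_text, scipen = scipen_text)
```

Setting the option inside the package code would spare users from editing their Rprofile, at the cost of changing a global setting for the session.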
These and a few more reference files can move to the SNP6 ref file.
Hello,
With the recent release of the hg38 reference data, I have been working to run Battenberg with my hg38 aligned BAM files but have run into multiple issues so I wanted to make others aware.
First, I recommend that those looking to use the hg38 reference data download them from the Dropbox site https://www.dropbox.com/sh/bize1n830t0mgzb/AACuzQiHbjQGqIuTJC3Xahzda?dl=0 instead of the ORA site, as there seemed to be an issue with completing the download of beagle5.zip. Also, the ORA site is completely missing the shapeit2 files that are used in the impute_info.txt file. Lastly for this point, the README.txt for the reference data is incomplete (there need to be symlinks for the chrX files in the 1000G_loci_hg38 directory).
Second, the current version of the master branch (which is the default branch for those cloning the repo) does not support the new hg38 reference data. The hg38_ChrNotation_fix_NAP branch has implemented a fix for working with the hg38 reference data that will allow you to get past the first step (the prepare_wgs function call in the battenberg.R script) correctly.
Third, when using the hg38_ChrNotation_fix_NAP branch, the battenberg_wgs.R script requires extensive rewriting of its default variables for compatibility with the hg38 reference data. In the settings below, "..../" is a placeholder for a user-defined parent directory, and the second half of the variables are specific to Beagle5. Important notes for this part: running Beagle5 requires Java 8; the JAR executable file must be downloaded from https://faculty.washington.edu/browning/beagle/b5_0.html; all the reference files from beagle5.zip need to be moved into a new subdirectory that matches the GENOME_VERSION string (i.e. ..../beagle5/b38); and the JAR file must be in the ..../beagle5 directory.
IMPUTEINFOFILE = "..../imputation/impute_info.txt"
G1000PREFIX = "..../1000G_loci_hg38/1kg.phase3.v5a_GRCh38nounref_allele_index_chr"
G1000PREFIX_AC = "..../1000G_loci_hg38/1kg.phase3.v5a_GRCh38nounref_loci_chrstring_chr"
GCCORRECTPREFIX = "..../GC_correction_hg38/1000G_GC_chr"
REPLICCORRECTPREFIX = "..../RT_correction_hg38/1000G_RT_chr"
PROBLEMLOCI = "..../probloci.txt.gz"
GENOME_VERSION = "b38"
BEAGLE_BASEDIR = "..../beagle5"
BEAGLEJAR = file.path(BEAGLE_BASEDIR, "beagle.12Jul19.0df.jar")
BEAGLEREF.template = file.path(BEAGLE_BASEDIR, GENOME_VERSION, "chrCHROMNAME.1kg.phase3.v5a_GRCh38nounref.vcf.gz")
BEAGLEPLINK.template = file.path(BEAGLE_BASEDIR, GENOME_VERSION, "plink.chrCHROMNAME.GRCh38.map")
Fourth, there is a hardcoded variable in the battenberg function within battenberg.R called "GENOMEBUILD", which is set to "hg19". This must be edited in place to "hg38".
Finally, even after all this, I have still been unsuccessful in having the pipeline execute 100% properly. I will detail the current issue in a new thread. I am grateful that the developers have produced the hg38 reference files, I just wish there also would have been a warning that they aren't fully ready for use in the current version of the software.
Hello,
Do you currently offer reference files for hg38? If not, would you recommend just using Liftover?
Thank you!
Hi,
We have a case where no copy number solutions have been found for a data set, with the appropriate message written to the cnaStatusFile. However, we then get an error when trying to produce the sunrise plot:
Error in if (psi_opt1 > 0 && rho_opt1 > 0) { : missing value where TRUE/FALSE needed
Calls: fit.copy.number -> runASCAT -> <Anonymous>
In addition: Warning message:
In log(d) : NaNs produced
Should the code exit if no copy number solutions have been found and not try to create a sunrise plot? If it is possible to produce a valid sunrise plot in this case, would it be possible for you to suggest a cause for the error we are seeing?
Regards
Kathryn
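The error message itself comes from if() receiving NA. A sketch of a guard that treats a missing optimum as "no solution" is shown below; the helper name has_valid_optimum is hypothetical, and whether skipping the sunrise plot is the right behaviour is for the maintainers to decide.

```r
# Hypothetical helper: TRUE only when both optima are present and positive.
# isTRUE() turns an NA comparison result into FALSE instead of an error.
has_valid_optimum <- function(psi_opt1, rho_opt1) {
  isTRUE(psi_opt1 > 0 && rho_opt1 > 0)
}

# Case with no solution found: both values are missing
no_solution <- has_valid_optimum(NA_real_, NA_real_)

# Case with a usable solution (made-up values)
with_solution <- has_valid_optimum(2.1, 0.8)

if (!no_solution) {
  message("No copy number solution found; skipping the plot.")
}
```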
Hi,
When running the call_subclones command of the Battenberg pipeline, two subclones files are generated:
subclones_1.txt (generated first and usually with more CN segments)
subclones.txt
This appears to be independent of the number of copy number solutions for that sample.
What is the difference?
many thanks
Jamie
Hello,
I am getting the below error while building the docker image.
Jorge
Step 13/20 : RUN R -q -e 'install.packages("/opt/battenberg", repos=NULL, type="source")'
---> Running in 6d840fc92d87
install.packages("/opt/battenberg", repos=NULL, type="source")
Installing package into '/usr/local/lib/R/site-library'
(as 'lib' is unspecified)
Warning: invalid package '/opt/battenberg'
Error: ERROR: no packages specified
Warning message:
In install.packages("/opt/battenberg", repos = NULL, type = "source") :
installation of package '/opt/battenberg' had non-zero exit status
Hello,
While attempting to run hg38 aligned BAMs through Battenberg (hg38_ChrNotation_fix_NAP branch) using the hg38 reference data, I encountered an error that stems from the write_battenberg_phasing function and halted execution. The log file, which I've attached, shows the following:
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
Calls: battenberg ... write_battenberg_phasing -> merge -> merge.data.frame -> fix.by
To try and troubleshoot the problem, I worked through the possible lines that could be the cause.
The bafsegmented_file as well as all SNPfiles and imputedHaplotypeFiles exist and appear to be populated correctly.
And what makes things more confusing is that the output VCF files from this function all seem to have been produced (filename convention - [tumorname]Battenberg_phased[chrom].vcf) and are populated.
Hello,
I was wondering if you could please help me to interpret the Battenberg output, which has given me two subclones. I understand that the _A columns in the _subclones.txt somehow match the segmentation plot in BattenbergProfile_subclones.png.
As I have two subclones, am I right saying that the columns nMaj1_A, nMin1_A contain the CN segments for one clone and the columns nMaj2_A, nMin2_A contain the CN for the other clone?
Then in the BattenbergProfile_subclones.png I see two different colours (as in ASCAT) and then thick and thin lines. Do the thick lines correspond to one clone and the thin ones to the other?
Thanks so much
Jorge
The current pipeline returns a non-informative error when a reference file does not exist. The dopar foreach loop now yields:
Error in { : task 1 failed - "cannot open the connection" dopar
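A small sketch of how a more informative message could be produced, assuming the reference file paths are known before the parallel loop starts. The helper name check_files_exist and the example paths are hypothetical.

```r
# Hypothetical helper: stop early and name every missing file
check_files_exist <- function(paths) {
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing reference file(s): ", paste(absent, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# One file that exists and one that does not
existing_file <- tempfile(fileext = ".txt")
writeLines("placeholder", existing_file)
absent_file <- file.path(tempdir(), "does_not_exist_chr1.txt")

# Capture the error message instead of halting, for demonstration
error_message <- tryCatch(
  check_files_exist(c(existing_file, absent_file)),
  error = function(e) conditionMessage(e)
)
print(error_message)
```

Calling such a check before the foreach loop would report the offending path instead of "cannot open the connection".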
BB should not have the same code that is included into ASCAT. There are some issues when loading both packages at the moment as functions are overloaded.
Currently the recommended procedure is to load Battenberg before ASCAT if ASCAT is to be used and ASCAT before Battenberg when BB is to be used.
Hi,
This is very rare but a few samples had this issue below when calling callSubclones().
Error in if (nMinor < 0) { : missing value where TRUE/FALSE needed
Calls: callSubclones -> determine_copynumber
This may be due to ploidy NA (FRAC_GENOME) in sample_rho_and_psi.txt?
rho psi ploidy distance is.best
ASCAT 1 2.15 2.10143838105564 NA NA
FRAC_GENOME 0.95 NA 2.09906082321258 0 FALSE
REF_SEG 1 2 2 1 TRUE
Thanks,
Taka
fitcopynumber could possibly run from just the segmented data and it's worth exploring whether this is possible. It would mean the raw data doesn't have to be read in and is not needed after impute in general.
Hello:
I just got some result files from the Battenberg algorithm. Is there any instruction or readme file to help interpret the results? Here are all the files; do you think the program completed or not? Thanks
-rw-r--r-- 1 zhangt8 zhangt8 201M Jul 18 01:02 GT01062_2017_allelecounts.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 212M Jul 18 01:03 GT01062_2018_allelecounts.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 9.8M Jul 18 01:03 GT01062_2018_hetbaf.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 9.5M Jul 18 01:03 GT01062_2018_hetdata.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 369M Jul 18 01:05 GT01062_2018_impute_input.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 362M Jul 18 01:06 GT01062_2018_impute_output.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 18M Jul 18 01:07 GT01062_2018_logR_Baf_segmented.vcf.gz
-rw-r--r-- 1 zhangt8 zhangt8 1.2M Jul 18 01:07 GT01062_2018_logR_Baf_segmented.vcf.gz.tbi
-rw-r--r-- 1 zhangt8 zhangt8 3.0M Jul 18 01:03 GT01062_2018_other.tar.gz
-rw-r--r-- 1 zhangt8 zhangt8 3.3M Jul 18 01:03 GT01062_2018_rafseg.tar.gz
drwxr-xr-x 2 zhangt8 zhangt8 4.0K Jul 18 01:03 GT01062_2018_subclones/
-rw-r--r-- 1 zhangt8 zhangt8 4.4M Jul 18 01:03 GT01062_2018_subclones.tar.gz
drwxr-xr-x 14 zhangt8 zhangt8 512K Jul 18 01:07 tmpBattenberg/
Also, I got an error information during running:
Use of uninitialized value in concatenation (.) or string at /opt/wtsi-cgp/lib/perl5/Sanger/CGP/Battenberg/Implement.pm line 1, do you know what's issue here?
Thanks.
The readr package is quite a bit quicker when reading in data. Part of this replacement should be to put all the data parsers into a separate R file.
infile = "3e8a2c90-e747-4a22-bc9e-0b062479fec2_alleleFrequencies_chr4.txt"
system.time(read.table(infile, header=T, stringsAsFactor=F))
user system elapsed
18.045 0.364 18.562
system.time(read.delim(infile, header=T, stringsAsFactor=F, sep="\t"))
user system elapsed
22.165 0.744 22.872
library(readr)
system.time(read_tsv(infile, col_names=T))
user system elapsed
12.605 0.108 12.724
Hello,
I have done different Battenberg runs where I get either one or two 'X' in the sunrise plot. I was wondering if you could help me to understand how the number of 'X' links to the solutions in the subclones.txt.
Specifically I have two questions, any advice on those will be much appreciated.
Please correct me if I am wrong. My understanding is that a sunrise plot with a single 'X' means that there is a single tumour cell population. Then a sunrise plot with two 'X' means that there are two tumour cell populations corresponding to two states of any of the possible solutions in the subclones.txt file.
I see, however, that even if there is a single 'X' in the sunrise plot, the solutions in the subclones.txt include two states. Does this mean that one of the states is negligible?
I have attached a png file showing two sunrise plots with one and two 'X'. In the plot with a single 'X' I am also showing the two states of solution A.
Many thanks
Jorge
Hello,
I am running Battenberg to get these two outputs:
[samplename]_subclones.txt, [samplename]_rho_and_psi.txt
Is there a way to skip some of the code to speed things up and get only these outputs?
Currently one run takes 5 hours or more.
Thanks,
Naama
Hi,
I was wondering whether it is possible to use Battenberg with hg19 or GRCh38? Do you provide reference file bundle?
Thanks
FYI installation fails on the latest dev commit:
Error in parse(outFile) :
/tmp/Rtmpu8a736/R.INSTALLab700ac38d/Battenberg/R/battenberg.R:307:72: unexpected ','
306: # rename the original files without multisample phasing info
307: MutBAFfiles <- paste0(tumourname[sampleidx], "_chr", chrom_names),
Hello Naser,
This is Jian. When I try to download the Battenberg reference (hg38) from the following link: https://ora.ox.ac.uk/objects/uuid:08e24957-7e76-438a-bd38-66c48008cf52 , I find that beagle5.zip cannot be decompressed. Did you have this problem?
Thank you.
I was not able to build the Battenberg 2.2.9 docker image successfully. What worked was pulling files from the GitHub repository OpenGenomics/Battenberg. After this, I was able to get the docker container to run and enter the terminal.
When I run:
R CMD BATCH '--no-restore-data --no-save --args -t P2055N -n P1907T --nb /mnt/battenberg/P1907T.final.bam --tb /mnt/battenberg/P2055N.final.bam --sex female -o /mnt/battenberg/' /usr/local/bin/battenberg_wgs.R /mnt/battenberg/battenberg.Rout
I get the output:
'/usr/lib/R/bin/BATCH: 60: cannot create /mnt/battenberg/battenberg.Rout: Permission denied'
Does anyone have a solution for getting permission to write the output?
Hi,
As sometimes purity seems to be overestimated, I tried max_rho = 0.9 but it caused the issue below.
Error in axis(2, at = seq(0, 1/purity_max, by = 1/3/purity_max), labels = seq(purity_min, :
'at' and 'labels' lengths differ, 4 != 3
Calls: fit.copy.number -> runASCAT -> -> axis
I'm currently testing min_rho = 0, which may fix the issue but I think the axis labels should handle different min max rho values...?
Thanks,
Taka
Hello,
I have been trying to run Battenberg with hg38 and have been unable to do so with the master branch. I was looking through the comments and saw that a couple of months ago @pblaney recommended hg38_ChrNotation_fix_NAP but he still had issues.
Is there a stable version for hg38? What branch should I clone?
Hello,
I'm running Battenberg for hg19 with a case that doesn't seem to contain any copy number alteration. However, I'm not able to complete the analysis because of the following error: Error in { : task 1 failed - "missing value where TRUE/FALSE needed".
I tracked the error to the callChrXsubclones function. I'll try to explain the cause and the two possible solutions I've thought of, but I don't know whether they are correct or whether something else needs to be changed:
nrow(BBloh) = 1 in line 62, and BBloh remains at 1 row.
seg$type = "loss" according to seg$mean, and in line 84 seg$CNA = "yes".
Up to this point, the declaration of the variables seems to be correct.
Because seg$CNA = "yes", it enters the if statement of line 87, where the problem appears in line 126. The code of this part is:
else if (seg$type == "loss") {
  seg$CN = 0
  if (nrow(BBloh) > 0) {
    seg$clonal = ifelse(round(abs(explogrLoss - seg$mean), digits = 2) < round(abs(sd(BBloh$LogR)/explogrLoss), digits = 2), "yes", "no")
  } else if (nrow(BBloh) == 0) {
    seg$clonal = ifelse(round(abs(explogrLoss - seg$mean), digits = 2) < round(abs(BBsd_max/explogrLoss), digits = 2), "yes", "no")
  }
}
It enters the first if, but when the function tries to calculate sd(BBloh$LogR), it returns seg$clonal = NA because there is only one number in the variable.
At if (seg$clonal == "no"), the above error appears and the program finishes incomplete.
The solutions I propose are:
Replace if (nrow(BBloh) > 0) by if (nrow(BBloh) > 1), and in line 131 else if (nrow(BBloh) == 0) by else if (nrow(BBloh) <= 1), so that the sd of a single number is never calculated.
Add if (nrow(BBloh) == 1){seg$CNA = "no"}, so that it enters the else statement of line 138 without errors.
My BBloh looks like:
chr startpos endpos BAF pval LogR ntot
10 9 44891284 44900981 0.7768091 0.5352291 -0.05235313 1.919503
nMaj1_A nMin1_A frac1_A nMaj2_A nMin2_A frac2_A SDfrac_A
10 1 0 1 NA NA NA NA
SDfrac_A_BS frac1_A_0.025 frac1_A_0.975
10 NA NA NA
And SAMPLEsegs:
sampleID chrom arm start.pos end.pos n.probes mean
1 2102635TD X p 2600150 58563072 375739 -0.1034
2 2102635TD X q 61690452 155255094 636530 -0.1074
It is very likely that the error occurred because the case doesn't have a CNA; however, I believe this situation should be handled (or at least produce a proper error message).
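The root cause described above, sd() of a single value, is easy to reproduce. The sketch below also shows the first proposed remedy in miniature: fall back to another spread estimate when fewer than two values are available. The helper name safe_sd and the fallback value are hypothetical; BBsd_max only mirrors the variable name used in the quoted code.

```r
# A BBloh table with a single segment, using the LogR value from the post
BBloh <- data.frame(LogR = -0.05235313)

# The standard deviation of one value is undefined, so R returns NA
single_sd <- sd(BBloh$LogR)

# Hypothetical helper: use the fallback when sd() cannot be computed
safe_sd <- function(x, fallback) {
  if (length(x) > 1) sd(x) else fallback
}

BBsd_max <- 0.1  # made-up fallback value for illustration
spread_single <- safe_sd(BBloh$LogR, fallback = BBsd_max)
spread_many   <- safe_sd(c(-0.05, -0.10, -0.07), fallback = BBsd_max)
```

With a non-NA spread, the ifelse() comparison yields "yes" or "no" rather than NA, so the later if (seg$clonal == "no") test can be evaluated.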
Hello,
I am unable to download a few files from https://ora.ox.ac.uk/objects/uuid:2c1fec09-a504-49ab-9ce9-3f17bac531bc
for example battenberg_wgs_replic_correction_1000g_v3.tar.gz.
Is there another way to access these files?
Many Thanks,
Rashesh
Hi,
For some reason I am only getting 0s at all the positions for all the bases in all the _alleleFrequencies_chr*.txt files. My reads are mapped to hg19; are the coordinates in the reference files for hg38 by any chance? Otherwise I am not really sure what I might be doing wrong.
Best
Mo
Hello. Where and how can one get access to the allelecounter.exe that is required for this to run?
Apologies if I'm missing something really obvious but I've been searching for a while and can't find it. There is a allelecounter.PY file here https://github.com/secastel/allelecounter but this doesn't work with the script I'm trying to use that relies on the .exe
Thanks in advance for your help
CGS (CRUK CI)
Hi,
Some samples had this error below at plothaplotypes.23. It seems that chrX caused some issues. Have you guys seen this error before?
Calls: plot.haplotype.data ... create.haplotype.plot -> plot -> plot.default -> localWindow -> plot.window
In addition: Warning messages:
1: In min(mut_data$Position, na.rm = T) :
no non-missing arguments to min; returning Inf
2: In max(mut_data$Position, na.rm = T) :
no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Execution halted
Thanks,
Taka