Giter VIP home page Giter VIP logo

codex2's Introduction

CODEX2

Full-spectrum copy number variation detection by high-throughput DNA sequencing

Author

Yuchao Jiang, Nancy R. Zhang

Maintainer

Yuchao Jiang [email protected]

Description

High-throughput DNA sequencing enables detection of copy number variations (CNVs) on the genome-wide scale with finer resolution compared to array-based methods, but suffers from biases and artifacts that lead to false discoveries and low sensitivity. We describe CODEX2, a statistical framework for full-spectrum CNV profiling that is sensitive for variants with both common and rare population frequencies and that is applicable to study designs with and without negative control samples. We demonstrate and evaluate CODEX2 on whole-exome and targeted sequencing data, where biases are the most prominent. CODEX2 outperforms existing methods and, in particular, significantly improves sensitivity for common CNVs.

Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("CODEX")
BiocManager::install("WES.1KG.WUGSC")

install.packages('devtools')
devtools::install_github("yuchaojiang/CODEX2/package")

Questions?

If you have questions or encounter problems when using CODEX2, you can: (1) report directly here using the Issues tab by GitHub; (2) post in our Google user group https://groups.google.com/d/forum/codex2; (3) email us at [email protected].

Citation

Yuchao Jiang, Runjin Wang, Eugene Urrutia, Ioannis N. Anastopoulos, Katherine L. Nathanson, Nancy R. Zhang, 2018. CODEX2: full-spectrum copy number variation detection by high-throughput DNA sequencing. Genome Biology, 19 (1), 202, 2018. (link).

Running CODEX2

The figure below illustrates the two experimental designs for which CODEX2 can be applied: (i) case-control design with a group of negative control samples, where the goal is to detect CNVs disproportionately present in the ‘cases’ versus the ‘controls’; and (ii) detection of all CNVs present in all samples design, such as in the Exome Aggregation Consortium. The key innovation in CODEX2 is the usage of negative control genome regions in a genome-wide latent factor model for sample- and position-specific background correction, and the utilization of negative control samples, under a case-control design, to further improve background bias estimation under this model. The negative control genome regions defined by CODEX2 are regions that do not harbor common CNVs, but that are still allowed to harbor rare CNVs, and can be constructed from existing studies or learned from data.

R notebook with step-by-step demonstration is available here as html.

Demo code for CODEX2 is available here as Rmd.

IMPORTANT: CODEX2 for cancer genomics

  • In segmentation step, use fractional mode for somatic CNA detection (cancer is heterogenous) and interger mode for germline CNV detection (you will get CNV calls in your blood samples, which are germline).
  • For segmentation with paired tumor-normal experimental design, a modified CBS (circular binary segmentation) algorithm can be adopted, which ultilizes the pair information. Refer to the paired_tumor_normal_segmentation folder for code (not actively updated/maintained). Note that, from our experience, the default segmentation by CODEX2 (not using the pair information) does not make much difference. Normalization is the first order effect in WES study design.

CODEX2 for targeted sequencing

We've adapted CODEX2 for targeted sequencing. Instead of normalizing and segmenting each chromosome separately, for targeted sequencing, we combine all targets across the genome to perform normalization, followed by segmentation within each gene. Refer to codes below (need to source segment_targeted.R for gene-based segmentation).

Visualization by IGV

One can load CODEX2's CNV calling results into IGV for visualization by generating a tab-delimited seg file for each sample. Below is a sample code that we use in our daily practice -- for each sample, a *.seg.txt file is generated with six columns and header 'Sample', 'Chromosome','Start','End','Num_Probes','Segment_Mean', which correspond to sample name, chromosome, CNV start bp, CNV end bp, number of exonic targets, and log ratio of raw (i.e. observed) depths of coverage versus normalized (i.e. expected) coverage (deletion has a negative log ratio, duplication has a positive log ratio, copy-neutral region has a log ratio around 0).

CODEX2 for hg38?

CODEX2 by default is for hg19 reference. It can be adapted to hg38: only the calculations of GC content and mappability need to be changed; to get coverage for exons across samples stays the same (make sure that the exonic targets in the bed file are also in hg38 coordinates). To calculte GC content in hg38, you need to download the hg38 reference from Bioconductor. Then, after loading CODEX2, load the hg38 reference package and use the correct genome argument in the getgc() function to get the corresponding GC content.

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("BSgenome.Hsapiens.UCSC.hg38")

library(CODEX2)
library(BSgenome.Hsapiens.UCSC.hg38)
# The following object is masked from ‘package:BSgenome.Hsapiens.UCSC.hg19’:  Hsapiens

gc <- getgc(ref, genome = BSgenome.Hsapiens.UCSC.hg38)

For mappability, we download the 100mer mappability for hg19 from the ENCODE Project (link) and lifted over from hg19 to hg38 (link). The mappability for each exon/target/bin is taken as the mean mappability across all overlapped segments by ENCODE, weighted by the lengths of the segments.

Note that CODEX2 can also be adapted to the mouse genome, see below.

CODEX2 for mouse genome

CODEX2 can be applied to WES of the mouse genome. Only the calculation of GC content and mappability needs to be modified from the default (hg19). The library for the mm10 mouse genome sequencing needs to be loaded: BSgenome.Mmusculus.UCSC.mm10.

  • GC content can be calculated with the correct mouse genome:
library(BSgenome.Mmusculus.UCSC.mm10)
gc <- getgc(ref, genome = BSgenome.Mmusculus.UCSC.mm10)
  • Mappability pre-calculation: This step can be computationally extensive and thus parallel computing is recommended. There are two workarounds: 1) set all mappability to 1 using mapp=rep(1,length(gc)) since mappability is only used in the QC step to filter out exons with low mappability and thus should not affect the final output too much; 2) adopt QC procedures based on annotation results, e.g., filter out all exons within segmental duplication regions, which generally have low mappability.

Common questions

  • How many samples does CODEX2 need? Should I separately run samples from different batches?

    We have applied CODEX2 to data sets of sample size ranging from 30 to 500. Yes, samples from different batches are highly recommended to run separately. The Poisson latent factor can presumably capture the batch effects but if additional knowledge is available beforehand, it should be ultilized. If batch information is not available, sometimes we refer to the header within the bam files.

  • Error in glm.fit?

    Yes, we are aware that sometimes the normalize() function leads to error in glm.fit. CODEX2 adopts an iterative estimation procedure to estimate the Poisson latent factors via Poisson glm, the exon-specific bias by taking the median across all samples, and the GC content bias by fitting a non-parametric smooth.spline. We did our best to make sure that the iteration/estimation runs properly, yet sometimes the Poisson glm function in R still fails to converge due to: (1) extreme heterogeneity in the data (i.e., the data is just too noisy or the samples are from multiple batches); (2) the number of Poisson latent factors to estimate is too large. See question below.

  • What is the range of K? Which one is optimal?

    Based on our experience, very rarely do we run CODEX2 with greater than 10 latent factors (i.e., K = 1:10 suffices for most, if not all, datasets we have). The larger the K is, the longer the estimation takes. Also, refer to the previous question regarding potential pitfalls with a large value of K.

    CODEX2 includes three statistical metrics to help the users more wisely choose the optimal K: AIC, BIC, and residual variance. A pdf plot is automatically generated by the choiceofK() function. Sometimes the optimal value based on the metrics are not clear. In this case, we recommend a sanity check by focusing on known positive/negative controls and visualizing the normalization/segmentation results. For normalize2() which specifies the normal samples, the effect on different optimal K values diminishes since only the normal samples are used to estimate the exon-wise biases and latent factors.

codex2's People

Contributors

yuchaojiang avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

codex2's Issues

Problem with getcoverage

Hello,
I have Problems with running CODEX2. I want to use it for targeted panel CNV detection.
Somehow it does not find the coverage.

for: head(Y[,1:5]) it fives results like this:

129514_S1 136591_S9 136633_S1 136691_S3 136744_S2
1:26126702-26126925 0 0 0 0 0
1:26127514-26127672 0 0 0 0 0
1:26128487-26128629 0 0 0 0 0
1:26131613-26131787 0 0 0 0 0
1:26135051-26135301 0 0 0 0 0
1:26135497-26135662 0 0 0 0 0

which leads to an error in the next step:

Excluded NA exons due to extreme coverage.
Excluded 0 exons due to extreme exonic length.
Excluded 4 exons due to extreme mappability.
Excluded 1 exons due to extreme GC content.
After taking union, excluded NA out of 1160 exons in QC.
Error: logical subscript contains NAs

As fare as i can see the ref looks ok:

GRanges object with 1160 ranges and 3 metadata columns:
seqnames ranges strand | gc mapp gene
|
[1] 1 26126702-26126925 * | 86.16 1 1_gene_1
[2] 1 26127514-26127672 * | 54.72 1 1_gene_1
[3] 1 26128487-26128629 * | 51.75 1 1_gene_1
[4] 1 26131613-26131787 * | 61.14 1 1_gene_1
[5] 1 26135051-26135301 * | 65.34 1 1_gene_1
... ... ... ... . ... ... ...
[1156] X 153608030-153608175 * | 63.7 1 X_gene_39
[1157] X 153608282-153608400 * | 49.58 1 X_gene_39
[1158] X 153608574-153608748 * | 60.57 1 X_gene_39
[1159] X 153609093-153609183 * | 53.85 1 X_gene_39
[1160] X 153609222-153609578 * | 59.1 1 X_gene_39

The code i was running looks like this and is basically the same as the in the Demo

bambedObj <- getbambed(bamdir = bamdir, bedFile = bedFile,
sampname = sampname, projectname = "Codex_test")

bamdir <- bambedObj$bamdir

ref <- bambedObj$ref

sampname <- bambedObj$sampname

projectname <- bambedObj$projectname

gc <- getgc(ref)

mapp <- getmapp(ref)

gene=rep(NA,length(ref))

print(gene)

for(chr in as.matrix(unique(seqnames(ref)))){
chr.index=which(seqnames(ref)==chr)
gene[chr.index]=paste(chr,'gene',ceiling(chr.index/30),sep='')
}

values(ref) <- cbind(values(ref), DataFrame(gc, mapp, gene))

coverageObj <- getcoverage(bambedObj, mapqthres = 20)

Y <- coverageObj$Y

write.csv(Y, file = paste(projectname, '_coverage.csv', sep=''), quote = FALSE)
head(Y[,1:5])

qcObj <- qc(Y, sampname, ref, cov_thresh = c(20, Inf),
length_thresh = c(20, Inf), mapp_thresh = 0.9,
gc_thresh = c(20, 80))

Y_qc <- qcObj$Y_qc; sampname_qc <- qcObj$sampname_qc

does anybody have an idea what the problem could be?
thanks a lot in advance

segmentCBS error

Hi,

I am using the software on WES samples using h38.
The hole software runs well until this part:

> finalcall.CBS <- segmentCBS(Y_qc[chr.index,],
+                             Yhat, optK = which.max(BIC),
+                             K = 1:5,
+                             sampname_qc = sampname_qc,
+                             ref_qc = ranges(ref_qc)[chr.index],
+                             chr = chr, lmax = 400, mode = "integer")
Segmenting sample 1: WES_val18_1..
Error in rep(1:num, rep(lmax, num)) : invalid 'times' argument

First I tried 60 samples, in a second test I only used 7 samples.
Any ideas why I am getting this error?

Thank you for your help and kind regards,
Nicolas

getgc() function

chrtemp variable is named as chrX which causes paste('chr',chrtemp,sep='') to name it as "chrchrX" and fail.

getgc =function (ref, genome = NULL) {
  if(is.null(genome)){genome=BSgenome.Hsapiens.UCSC.hg19}
  gc=rep(NA,length(ref))
  for(chr in unique(seqnames(ref))){
    chr="chrX"
    message("Getting GC content for chr ", chr, sep = "")
    chr.index=which(as.matrix(seqnames(ref))==chr)
    ref.chr=IRanges(start= start(ref)[chr.index] , end = end(ref)[chr.index])
    if (chr == "X" | chr == "x" | chr == "chrX" | chr == "chrx") {
      chrtemp <- 'chrX'
    } else if (chr == "Y" | chr == "y" | chr == "chrY" | chr == 
               "chry") {
      chrtemp <- 'chrY'
    } else {
      chrtemp <- as.numeric(mapSeqlevels(as.character(chr), "NCBI")[1])
    }
    if (length(chrtemp) == 0) message("Chromosome cannot be found in NCBI Homo sapiens database!")
    chrm <- unmasked(genome[[paste('chr',chrtemp,sep='')]])
    seqs <- Views(chrm, ref.chr)
    af <- alphabetFrequency(seqs, baseOnly = TRUE, as.prob = TRUE)
    gc[chr.index] <- round((af[, "G"] + af[, "C"]) * 100, 2)
  }
  gc
}

I fix this on my data changing chrtemp <- 'chrX' to chrtemp <- 'X'

Let my know if this makes sense to you or it is just me!

BSgenome problem

I was following commands in the manual. Everything was fine until this command mapp <- getmapp(ref, genome = genome). I got an error message: Error in getmapp(ref, genome = genome) :
no slot of name "provider_version" for this object of class "BSgenome".

Thanks for any advice!

hg38

Hi,

is Codex2 able to handle hg38 genome ?
'cos during installation progress it means '(BSgenome.Hsapiens.UCSC.hg19)'

thx

getgc() failing on sex chromosomes

I have an issue with getgc() when I run the CODEX2 targeted demo. It runs fine with my .bed file, but fails when it gets to chrX. I have triple-checked the .bed file and verified that it runs for all lines up to the first chrX entry. The results from the pipeline (using a .bed file with only autosomal regions) look great. How can I get it to run with chrX and chrY?

Commands and output:

Initialize Variables

dirPath=getwd()
bamFile <- list.files(dirPath, pattern = '.q1plus..bam$')
bamdir <- file.path(dirPath, bamFile)
sampname <- substr(bamFile,20,27)
norm_index <- c(1,4:7,10:50)

bedFile <- list.files(dirPath, pattern = 'exons_25bp_pad.bed')

bambedObj <- getbambed(bamdir = bamdir, bedFile = bedFile, sampname = sampname, projectname = "q1plus")
bamdir <- bambedObj$bamdir; sampname <- bambedObj$sampname
ref <- bambedObj$ref; projectname <- bambedObj$projectname

ref
GRanges object with 55562 ranges and 0 metadata columns:
seqnames ranges strand

[1] chr1 948928-948981 *
[2] chr1 949338-949883 *
[3] chr1 955527-955778 *
[4] chr1 957555-957867 *
[5] chr1 970631-970729 *
... ... ... ...
[55558] chrY 14959138-14959277 *
[55559] chrY 14968239-14968446 *
[55560] chrY 14968532-14968795 *
[55561] chrY 14969465-14969611 *
[55562] chrY 14971178-14971366 *


seqinfo: 24 sequences from an unspecified genome; no seqlengths

Verify chromosome names match exactly

genome = BSgenome.Hsapiens.UCSC.hg19
head(seqnames(genome),25)
[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9"
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chr21" "chr22" "chrX" "chrY" "chrM"

unique(read.table(file="exons_25bp_pad_MSOC.bed")[,1])
[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9"
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chr21" "chr22" "chrX" "chrY"

unique(read.table(file="exons_25bp_pad_MSOC.bed")[,1]) == head(seqnames(genome),24)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Call getgc()

gc <- getgc(ref, genome = BSgenome.Hsapiens.UCSC.hg19)
Getting GC content for chr chr1
Getting GC content for chr chr2
Getting GC content for chr chr3
Getting GC content for chr chr4
Getting GC content for chr chr5
Getting GC content for chr chr6
Getting GC content for chr chr7
Getting GC content for chr chr8
Getting GC content for chr chr9
Getting GC content for chr chr10
Getting GC content for chr chr11
Getting GC content for chr chr12
Getting GC content for chr chr13
Getting GC content for chr chr14
Getting GC content for chr chr15
Getting GC content for chr chr16
Getting GC content for chr chr17
Getting GC content for chr chr18
Getting GC content for chr chr19
Getting GC content for chr chr20
Getting GC content for chr chr21
Getting GC content for chr chr22
Getting GC content for chr chrX
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'unmasked': no such sequence

codex2_targeted.R, getcoverage error

Hi,
This function return an error:

coverageObj=getcoverage(bambedObj,mapqthres=20)

> coverageObj=getcoverage(bambedObj,mapqthres=20)
Error in message("Getting coverage for chr ", chr, sep = "") :
argument "chr" is missing, with no default

?getcoverage
getcoverage {CODEX2}
(...)
getcoverage(bambedObj, mapqthres, chr)

Cheers
Damian Loska

Issues with qc function

Any ideas why I am getting this error?

Excluded 5316 exons due to extreme coverage.
Excluded 12333 exons due to extreme exonic length.
Excluded 12988 exons due to extreme mappability.
Excluded NA exons due to extreme GC content.
After taking union, excluded NA out of 217488 exons in QC.
Error: logical subscript contains NAs

An error is occurring-obtaining GC content and mappability stage

Hi,
I am using CODEX2 (version: 1.3.0). I have used BSgenome.Hsapiens.UCSC.hg38 as the reference genome.
After running the command for obtaining GC content, the following error has been occurring:

Getting GC content for chr chr1
Getting GC content for chr chr10
Getting GC content for chr chr11
Getting GC content for chr chr12
Getting GC content for chr chr13
Getting GC content for chr chr14
Getting GC content for chr chr15
Getting GC content for chr chr16
Getting GC content for chr chr17
Getting GC content for chr chr18
Getting GC content for chr chr19
Getting GC content for chr chr2
Getting GC content for chr chr20
Getting GC content for chr chr21
Getting GC content for chr chr22
Getting GC content for chr chr3
Getting GC content for chr chr4
Getting GC content for chr chr5
Getting GC content for chr chr6
Getting GC content for chr chr7
Getting GC content for chr chr8
Getting GC content for chr chr9
Getting GC content for chr chrM
Error in genome[[paste("chr", chrtemp, sep = "")]] : no such sequence
In addition: Warning message:
In getgc(ref, genome = genome) : NAs introduced by coercion

In addition, when I ran the command to obtain mappability, the following error has been occurring:

Error in getmapp(ref, genome = genome) :
no slot of name "metadata" for this object of class "BSgenome"

With reference to the above errors, can you please suggest some ways to fix the issue?

CODEX2, getbambed, wrong example

Hi,
in the example here:
https://github.com/yuchaojiang/CODEX2/blob/master/targeted_sequencing/codex2_targeted.R#L15-L20
there's an error when running this code:
library(CODEX2)

chr=1
# get bam directories, read in bed file, get sample names
bambedObj=getbambed(bamdir=bamdir,
                    bedFile=bedFile,
                    sampname=sampname,
                    projectname=projectname,chr)

Error in getbambed(bamdir = bamdir, bedFile = bedFile, sampname = sampname, :
unused argument (chr)

Should I use getbambed from CODEX?

EDIT: in the whole example file "codex2_targeted.R" the commands from CODEX and CODEX2 are mixed (This is my first time I use CODEX and/or CODEX2 so I'm a bit confused)

Cheers
Damian Loska

Need Help

Hello Dr Yuchao

I am having a few issues while running CODEX2. I have not used CODEX before and am unfamiliar with the Poisson model. I am sorry if i am asking some apparent questions.

My situation is this: I have 2 Truth Samples (samples whose CNV details I know are correct). I have a test batch of 35 samples.
I have run CODEX2 in these manners:
Run 1:
Sample BAMs taken - 35 from test batch | K = 1:5 | chr: 17
Output:

  sample_name chr cnv st_bp ed_bp length_kb st_exon ed_exon raw_cov norm_cov copy_no lratio mBIC
cnv1 Sample 1 chr17 dup 7687395 7689351 1.957 1307 1311 474 341 3 21.683 4.216
cnv2 Sample 1 chr17 del 19348563 19353709 5.147 799 800 49 104 1 18.038 4.88
cnv3 Sample 3 chr17 dup 3094629 3121197 26.569 1150 1156 2117 1548 3 84.334 66.263
cnv4 Sample 3 chr17 dup 28673501 28681954 8.454 874 875 206 129 3 19.022 68.109
cnv5 Sample 4 chr17 dup 73626838 73627110 0.273 532 533 604 446 3 21.891 4.485
cnv6 Sample 4 chr17 dup 19348563 19353709 5.147 799 800 389 279 3 18.219 5.588
cnv7 Sample 6 chr17 del 41209048 41215988 6.941 210 212 252 443 1 46.84 29.398

Run 2:
Sample BAMs taken - 35 from test batch | K = 1:10 | chr: 17
Output:

  sample_name chr cnv st_bp ed_bp length_kb st_exon ed_exon raw_cov norm_cov copy_no lratio mBIC
cnv1 Sample 1 chr17 dup 7687395 7689351 1.957 1307 1311 474 341 3 21.683 4.216
cnv2 Sample 1 chr17 del 19348563 19353709 5.147 799 800 49 104 1 18.038 4.88
cnv3 Sample 3 chr17 dup 3094629 3121197 26.569 1150 1156 2117 1548 3 84.334 66.263
cnv4 Sample 3 chr17 dup 28673501 28681954 8.454 874 875 206 129 3 19.022 68.109
cnv5 Sample 4 chr17 dup 73626838 73627110 0.273 532 533 604 446 3 21.891 4.485
cnv6 Sample 4 chr17 dup 19348563 19353709 5.147 799 800 389 279 3 18.219 5.588
cnv7 Sample 6 chr17 del 41209048 41215988 6.941 210 212 252 443 1 46.84 29.398

Run 3:
Sample BAMs taken - 35 from test batch | K = 1:5 | chr: 13 and 17
Output:

  sample_name chr cnv st_bp ed_bp length_kb st_exon ed_exon raw_cov norm_cov copy_no lratio mBIC
cnv1 Sample 1 chr13 dup 7687395 7689351 1.957 903 905 317 218 3 19.527 3.53
cnv2 Sample 1 chr17 dup 78403552 78410018 6.467 581 582 167 97 3 19.21 6.351
cnv3 Sample 4 chr13 dup 67647500 68331876 684.377 217 219 715 538 3 20.896 4.269
cnv4 Sample 4 chr17 del 10278985 10284601 5.617 924 925 122 208 1 19.442 8.308
cnv5 Sample 4 chr13 dup 75485511 75489741 4.231 228 229 439 319 3 18.492 11.985

Run 4:
Sample BAMs taken - 18 from test batch and 2 Truth Sample (named TS1 and TS2) | K = 1:20 | All Chr
Output:

  sample_name chr cnv st_bp ed_bp length_kb st_exon ed_exon raw_cov norm_cov copy_no lratio mBIC
cnv1 TS1 chr1 del 14862604 153993261 139130.658 791 836 6452 9534 1 295.137 279.455
cnv2 TS2 chr1 dup 14862604 153230197 138367.594 791 833 13222 10416 3 152.839 136.733
cnv3 Sample 2 chr1 del 14862604 154221443 139358.84 791 838 5468 10091 1 1255.645 1247.018
cnv4 Sample 4 chr1 dup 45609814 50117179 4507.366 558 559 269 182 3 18.066 1.993
cnv5 Sample 5 chr1 dup 151849769 151900169 50.401 353 354 820 601 3 31.968 15.849

Run 5:
Sample BAMs taken - 18 from test batch and 2 Truth Sample (named TS1 and TS2) | K = 1:5 | chr 17
Output:

  sample_name chr cnv st_bp ed_bp length_kb st_exon ed_exon raw_cov norm_cov copy_no lratio mBIC
cnv1 Sample 4 chr17 dup 5036717 5041065 4.349 127 131 980 755 3 19.839 2.346
cnv2 Sample 4 chr17 dup 8480533 8508320 27.788 305 307 301 205 3 19.54 4.606
cnv3 Sample 4 chr17 dup 36874077 36875856 1.78 590 591 751 574 3 17.492 5
cnv4 Sample 6 chr17 del 41209048 41215988 6.941 802 804 252 440 1 45.34 27.665

I am not able to get concurrent results from the above outputs.

  • I am getting the same output in Run 1 and Run 2, even if there is a change in K.
  • In Run 3, when I introduce 2 chromosomes, Chr 13 and 17, the output doesn't contain some of the observed events in Run 1 and 2.
  • In Run 4, Output states TS1 has del and TS2 has dup in Chr 1, which does not match the True CNV output.
  • In Run 5, when I only mention Chr 17, the output doesn't consider the BRCA1 deletion present in TS1, as per the True CNV output.

The code is as follows:

#Preprocessing ####
library(CODEX2)

# Set the path to the directory containing the BAM files 
dirPath = file.path('','home','bioinfo','Nilesh','CODEX2','Attempt_18B_Truth')

# Get the names of all BAM files in the directory
bamFile <- list.files(dirPath, pattern = '*.bam$')

# Create a new path by combining the directory path and the BAM file names
bamdir <- file.path(dirPath, bamFile)

# Extract the sample name from the BAM file name
sampname <- substr(bamFile,1,4)

# Set the path to the BED file
bedFile <- file.path(dirPath, "test_codex.bed")

# Create an object containing the BAM, BED, sample name, and project name
bambedObj <- getbambed(bamdir = bamdir, bedFile = bedFile, 
                       sampname = sampname, projectname = "CODEX2_18B-Truth")

# Extract the BAM directory, sample name, reference, and project name from the object
bamdir <- bambedObj$bamdir
sampname <- bambedObj$sampname
ref <- bambedObj$ref
projectname <- bambedObj$projectname
#------------------------------------------------------------------------------#

#GC Content and Mappability ####
genome = BSgenome.Hsapiens.UCSC.hg19 #hg19
#-- Defining Funcion getgc --#
# Define a function to calculate the GC content of a set of genomic ranges
getgc = function (ref, genome = NULL) {
  # If the genome argument is NULL, set it to the hg19 genome
  if(is.null(genome)){genome = BSgenome.Hsapiens.UCSC.hg19}
  
  # Create an empty vector to store the GC content for each range
  gc = rep(NA, length(ref))
  
  # Loop over each unique chromosome in the input ranges
  for (chr in unique(seqnames(ref))) {
    # Print a message to show which chromosome we are currently processing
    message("Getting GC content for chr ", chr, sep = "")
    
    # Get the indices of all ranges on the current chromosome
    chr.index = which(as.matrix(seqnames(ref)) == chr)
    
    # Create a new set of ranges that only includes the ranges on the current chromosome
    ref.chr = IRanges(start = start(ref)[chr.index], end = end(ref)[chr.index])
    
    # Check if the current chromosome is the X or Y chromosome and set the chrtemp variable accordingly
    if (chr == "X" | chr == "x" | chr == "chrX" | chr == "chrx") {
      chrtemp <- 'X'
    } else if (chr == "Y" | chr == "y" | chr == "chrY" | chr == "chry") {
      chrtemp <- 'Y'
    } else {
      chrtemp <- as.numeric(mapSeqlevels(as.character(chr), "NCBI")[1])
    }
    
    # If the current chromosome cannot be found in the NCBI Homo sapiens database, print a warning message
    if (length(chrtemp) == 0) message("Chromosome cannot be found in NCBI Homo sapiens database!")
    
    # Get the unmasked sequence for the current chromosome from the genome
    chrm <- unmasked(genome[[paste('chr',chrtemp,sep='')]])
    
    # Create a new set of sequences from the genome that only includes the ranges on the current chromosome
    seqs <- Views(chrm, ref.chr)
    
    # Calculate the base frequency for each range on the current chromosome
    af <- alphabetFrequency(seqs, baseOnly = TRUE, as.prob = TRUE)
    
    # Calculate the GC content for each range on the current chromosome
    gc[chr.index] <- round((af[, "G"] + af[, "C"]) * 100, 2)
  }
  
  # Return the GC content vector
  gc
}
#-- getgc Function end --#
# Calculate the GC content of the input ranges
gc <- getgc(ref, genome = genome)

# Calculate the mappability of the input ranges
mapp <- getmapp(ref, genome = genome)

# Create a vector of gene names for each range
gene = rep(NA, length(ref))

# Loop over each unique chromosome in the input ranges
for (chr in as.matrix(unique(seqnames(ref)))) {
  # Get the indices of all ranges on the current chromosome
  chr.index = which(seqnames(ref) == chr)
  
  # Generate a gene name for each range on the current chromosome by concatenating the chromosome name, 
  # the string "_gene_", and the index of the range divided by 30 and rounded up
  gene[chr.index] = paste(chr, '_gene_', ceiling(chr.index / 30), sep = '')
}

# Update the input ranges to include the GC content, mappability, and gene name for each range
values(ref) <- cbind(values(ref), DataFrame(gc, mapp, gene))
#------------------------------------------------------------------------------#

#Raw read Depth ####
# Calculate the coverage of the alignments
coverageObj <- getcoverage(bambedObj, mapqthres = 20)

# Get the coverage matrix from the coverage object
Y <- coverageObj$Y

# Write the coverage matrix to a CSV file
write.csv(Y, file = paste(projectname, '_coverage.csv', sep=''), quote = FALSE)

# Print the first four columns of the coverage matrix
head(Y[,1:4])

#------------------------------------------------------------------------------#

#Quality Control ####
# Perform quality control on the alignment data
qcObj <- qc(Y, sampname, ref, cov_thresh = c(20, 4000),
            length_thresh = c(20, 2000), mapp_thresh = 0.9,
            gc_thresh = c(20, 80))

# Get the filtered coverage matrix, sample names, and genomic ranges from the qc object
Y_qc <- qcObj$Y_qc
sampname_qc <- qcObj$sampname_qc
ref_qc <- qcObj$ref_qc
qcmat <- qcObj$qcmat; gc_qc <- ref_qc$gc

# Write the quality control matrix to a file
write.table(qcmat, file = paste(projectname, '_qcmat', '.txt', sep=''),
            sep = '\t', quote = FALSE, row.names = FALSE)

#------------------------------------------------------------------------------#

#Estimating library size factor for each sample ####
# Create a matrix of non-zero values from the quality-controlled coverage data
Y.nonzero <- Y_qc[apply(Y_qc, 1, function(x){!any(x==0)}),]

# Calculate the pseudo-sample median for each row in the non-zero matrix
pseudo.sample <- apply(Y.nonzero,1,function(x){exp(1/length(x)*sum(log(x)))})

# Divide each column in the non-zero matrix by the corresponding value in the pseudo-sample vector
# and take the median of the resulting values
N <- apply(apply(Y.nonzero, 2, function(x){x/pseudo.sample}), 2, median)

# Create a scatter plot of N against the total sum of reads in each sample
# and save the plot as a PDF file
pdf("Batch_Library_Size_Factor.pdf")
plot(N, apply(Y,2,sum), xlab='Estimated library size factor', ylab='Total sum of reads')
dev.off() 
#------------------------------------------------------------------------------#

#Running CODEX2 without specifying negative control samples/regions ####
# This can be run for One chromosome 
chr <- 'chr17'
#Or for Multiple chromosomes
#c('chr1','chr2','chr3','chr4','chr5','chr6','chr7','chr8','chr9','chr10','chr11','chr12','chr13','chr14','chr15','chr16','chr17','chr18','chr19','chr20','chr21','chr22','chrX','chrY')

chr.index <- which(seqnames(ref_qc)==chr)
normObj.null <- normalize_null(Y_qc = Y_qc[chr.index,],
                               gc_qc = gc_qc[chr.index],
                               K = 1:5, N = N)
Yhat.null <- normObj.null$Yhat
AIC.null <- normObj.null$AIC; BIC.null <- normObj.null$BIC
RSS.null <- normObj.null$RSS

#Choose the number of latent Poisson factors. BIC is used as the model selection metric by default. 
choiceofK(AIC.null, BIC.null, RSS.null, K = 1:5, filename = "codex2_null_choiceofK_1-5_for_18B-Truth_chr17.pdf")
#------------------------------------------------------------------------------#

#Running segmentation by CODEX2 ####
finalcall.CBS <- segmentCBS(Y_qc[chr.index,],  # recommended
                            Yhat.null, optK = which.max(BIC.null),
                            K = 1:5,
                            sampname_qc = colnames(Y_qc),
                            ref_qc = ranges(ref_qc)[chr.index],
                            chr = chr, lmax = 400, mode = "integer")
#------------------------------------------------------------------------------#

#Exporting Results ####
write.csv(finalcall.CBS, "./finalcall_CBS_K1-5_for_18B-Truth_chr17.csv", row.names = TRUE)

1 Could you explain what I am doing wrong?

2 I observed that there are two versions of the code equation for pseudo.sample, as highlighted below:

pseudo.sample <- apply(Y.nonzero,1,function(x){prod(x)^(1/length(x))})

pseudo.sample <- apply(Y.nonzero,1,function(x){exp(1/length(x)*sum(log(x)))})

Could you explain to me the difference between the equations?

3 Is it possible for gene names to be taken from the bed file directly, if a gene name column is provided?

problem and question with CODEX2

Good morning,
I'm Pasquale.
I have some major issues with using CODEX.
We conducted a sequencing of only 1 ZC3H10 gene on 96 samples (controls and patients). from the trace on IGV there seems to be a CNV, so I started using the CODEX tools.
I wonder ask yoi:

  • does it work even with such a small portion (about 1kb)? why is it worth testing samples?
  • because if I use chr <- 12 it calculates the contents of GC and mapp, but then the rest doesn't work for me, while if I use chr <- NC_000012.12 the rest works for me but not GC and mapp??
  • also this step gives me problems

qcObj <- qc(Y, sampname, ref, cov_thresh = c(20, 4000),
length_thresh = c(20, 2000), mapp_thresh = 0.9,
gc_thresh = c(20, 80))
Excluded 8 exons due to extreme coverage.
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'end': argument "ref" is missing, with no default

I would be grateful if you give me some suggestions.

missing or infinite values in inputs are not allowed

Hi,
I follow this tutorial: http://htmlpreview.github.io/?https://github.com/yuchaojiang/CODEX2/blob/master/demo/CODEX2.html

I have ~180 bam files that came from the same lab, but in different time periods (batches of 9-15 files).
When I analyze them together I get error:

Error in smooth.spline(gc_qc, z) : 
  missing or infinite values in inputs are not allowed

(maybe it comes from:

> pseudo.sample
  1:861321-861393   1:865534-865716   1:866418-866469   1:871151-871276   1:874419-874509   1:874654-874840   1:876523-876686   1:877938-878438 
              Inf               Inf               Inf               Inf               Inf               Inf               Inf               Inf 
  1:878632-878757   1:879077-879188   1:879287-879533   1:880073-880180   1:880436-880526   1:880897-881033   1:881552-881666   1:881781-881925 
              Inf  

)

are there some methods, to firstly run "batches" separately, and then merge the results and follow the tutorial?

Thanks for any advice,
Damian

CODEX2 targeted Error: subscript out of bounds

I am trying to run CODEX2 on couple of targeted resequencing samples. the bed file is broken down into intervals of 250 basepairs for the regions of interest. I got the targeted demo code from git and initiated the run. Below is my log and error that I ran into. Any help understanding this error and fixing it would be great!

Thanks!

`Getting GC content for chr 1
Getting GC content for chr 10
Getting GC content for chr 11
Getting GC content for chr 12
Getting GC content for chr 13
Getting GC content for chr 14
Getting GC content for chr 15
Getting GC content for chr 16
Getting GC content for chr 17
Getting GC content for chr 18
Getting GC content for chr 19
Getting GC content for chr 2
Getting GC content for chr 20
Getting GC content for chr 21
Getting GC content for chr 22
Getting GC content for chr 3
Getting GC content for chr 4
Getting GC content for chr 5
Getting GC content for chr 6
Getting GC content for chr 7
Getting GC content for chr 8
Getting GC content for chr 9
Getting GC content for chr X
Getting GC content for chr Y
Getting mappability for chr1
Getting mappability for chr10
Getting mappability for chr11
Getting mappability for chr12
Getting mappability for chr13
Getting mappability for chr14
Getting mappability for chr15
Getting mappability for chr16
Getting mappability for chr17
Getting mappability for chr18
Getting mappability for chr19
Getting mappability for chr2
Getting mappability for chr20
Getting mappability for chr21
Getting mappability for chr22
Getting mappability for chr3
Getting mappability for chr4
Getting mappability for chr5
Getting mappability for chr6
Getting mappability for chr7
Getting mappability for chr8
Getting mappability for chr9
Getting mappability for chrX
Getting mappability for chrY

Getting coverage for sample Sample1: read length 97.
Getting coverage for sample Sample2: read length 97.
Error in Y[, 1:5] : subscript out of bounds
Calls: head
Execution halted
`

failed in loading hg38

library(CODEX2)
library(BSgenome.Hsapiens.UCSC.hg38)

Attaching package: ‘BSgenome.Hsapiens.UCSC.hg38’

The following object is masked from ‘package:BSgenome.Hsapiens.UCSC.hg19’:

Hsapiens

The following object is masked from ‘package:BSgenome.Hsapiens.UCSC.hg19’: Hsapiens

gc <- getgc(ref, genome = BSgenome.Hsapiens.UCSC.hg38)
Error: object 'ref' not found

Some bugs and suggestion

Line 11 and Line 25 in qc.R:
You can replace end(ref) - start(ref)+1 by width(ref), for ref in of type IRranges
Line 67-68 in normalize.R:
Offset might be L[s,] , not log(L[s,]), because L[,] is already the logarithm of (Nmat * fhat * betahatmat)
as in line 54
Line 92 in codex_targeted.R:
The qcObj here is created across all chromosomes, but when call the qc function in qc.R
it seems that your qc function is for one chromosome, as in line 24,"rep(chr, length(ref))"
so when applied to targeted sequencing, we should alter the qc.R in the package
from "rep(chr, length(ref))" to 'chr' in line 24 and call it with 'chr.all' in condex_targeted.R

Error in genome[[chrtemp]] : no such sequence

Hi Yuchao,
When I tried to run the getgc function, I always get the error below. Can you please advise what's wrong? Thanks.

Getting GC content for chr 1
Getting GC content for chr 10
Getting GC content for chr 11
Getting GC content for chr 12
Getting GC content for chr 13
Getting GC content for chr 14
Getting GC content for chr 15
Getting GC content for chr 16
Getting GC content for chr 17
Getting GC content for chr 18
Getting GC content for chr 19
Getting GC content for chr 2
Getting GC content for chr 20
Getting GC content for chr 21
Getting GC content for chr 22
Getting GC content for chr 3
Getting GC content for chr 4
Getting GC content for chr 5
Getting GC content for chr 6
Getting GC content for chr 7
Getting GC content for chr 8
Getting GC content for chr 9
Getting GC content for chr M
Error in genome[[chrtemp]] : no such sequence
Calls: getgc -> unmasked -> [[ -> [[
Execution halted

Best,
Zeyan

First Pass - Normalizing Step - Error in smooth.spline

Dear Developer,

First, I'd like to thank you for all your efforts.

I sequenced some specific regions of chromosome 2, 5 and 12 using a custom panel. So my goal is to call CNV from targeted sequencing data.

n. Regions: some 3600
n, Samples: 16
Reference: HG38
R version: 3.6.3

I easily go through your demo (that perfectly fulfill my needs) up to the normalization step
Now, I designed my panel in order to use sequenced regions in chromosomes 2 and 5 as negative control region.

Despite this, I'd like to go for a 2-pass run calling CNV without considering negative control regions first. In this way, I can figure out exactly which regions on chromosome 12 I've to use as cnv_index in the second run.

I faced a problem trying to normalize my data without considering negative control regions.
This command:

chr <- 'chr12'
chr.index <- which(seqnames(ref_qc)==chr)
normObj.null <- normalize_null(Y_qc = Y_qc[chr.index,],
                               gc_qc = gc_qc[chr.index],
                               K = 1:5, N = N)

returns:

Computing normalization with k = 1 latent factors ...
Error in smooth.spline(gc_qc, z) : 
  'tol' must be strictly positive and finite
In addition: Warning message:
In matrix(nrow = nrow(Y_qc), ncol = ncol(Y_qc), data = N, byrow = TRUE) :
  data length exceeds size of matrix

After a little bit of debugging, I found that, due to problems with the library or during amplification phase, I totally missed some of the regions I was expecting to cover (some 50 small regions). I thought that maybe 0s in the coverage matrix were causing the problem.
I removed them but I still get the error message.

Am I missing something?

Thanks in advance,
Davide

some questions about CODEX2

Dear Dr.Jiang,
I am a student from China.I'm using CODEX2 to analyze copy number changes in human samples by target capture sequencing, but I am so confused by the bad results.I didn't make correct input files and there's no online tutorial of target capture sequencing.Therefore ,there are several questions I hope you could help me out :
1.how the bam files are made?Could you tell me if the way bam file of target capture sequencing data are made is the same as the WES?Are they only sorted?What are the requirements for the same batch of tested bam files e.g the difference of the read depth ?
2.how's the bed files cooked?I have the exact position of the capture region of target capture sequencing.
3.In addition,I don't know why the result shows that the duplicate and the deletion exist on the same chromosome of the same sample with high quality.
Thank you very much in advance.

Best regards,
Ziying Zhou

Error when installing

I get the following error when I try to install. It also kills the R session after. I have all of the packages listed: Rsamtools, GenomicRanges, BSgenome.Hsapiens.UCSC.hg19, S4Vectors, IRanges, Biostrings, GenomeInfoDb installed already and can load them without any problems

> devtools::install_github("yuchaojiang/CODEX2/package")
Downloading GitHub repo yuchaojiang/CODEX2@master
Skipping 7 packages not available: Rsamtools, GenomicRanges, BSgenome.Hsapiens.UCSC.hg19, S4Vectors, IRanges, Biostrings, GenomeInfoDb
✔  checking for file ‘/tmp/RtmpQrsnVv/remotes3f3115da431d/yuchaojiang-CODEX2-e47af62/package/DESCRIPTION’ ... OK
─  preparing ‘CODEX2’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  looking to see if a ‘data/datalist’ file should be added
─  building ‘CODEX2_1.3.0.tar.gz’ (10.3s)
   
* installing *source* package ‘CODEX2’ ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
/home/maherm/R/lib64/R/bin/INSTALL: line 34: 31405 Done                    echo 'tools:::.install_packages()'
     31406 Killed                  | R_DEFAULT_PACKAGES= LC_COLLATE=C "${R_HOME}/bin/R" $myArgs --slave --args ${args}
Error: Failed to install 'CODEX2' from GitHub:
  (converted from warning) installation of package ‘/tmp/RtmpQrsnVv/file3f3133e61203/CODEX2_1.3.0.tar.gz’ had non-zero exit status
> 
> 
> Terminated
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /home/maherm/R/lib64/R/lib/libRblas.so
LAPACK: /home/maherm/R/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.1

how to run CODEX2 with a pair of Tomor-Normal bam?

now,I have sevaral pairs of sorted.markup.bam to call CNV
In CODEX2.Rmd , I found following command to fill in my Tumor bam:
bamFile <- list.files(dirPath, pattern = '
.bam$')

But in the part of ###Running CODEX2 with negative control samples###

{

Running CODEX2 with negative control samples

For the case-control scenario, the normal sample index is known (samples without spike-in signals).

# Below are pre-computed demo dataset, stored as part of the CODEX2 R-package.
normObj <- normalize_codex2_ns(Y_qc = Y_qc[chr.index,],
                               gc_qc = gc_qc[chr.index], 
                               K = 1:5, norm_index = norm_index_demo,
                               N = N)
Yhat.ns <- normObj$Yhat; fGC.hat.ns <- normObj$fGC.hat;
beta.hat.ns <- normObj$beta.hat; g.hat.ns <- normObj$g.hat; h.hat.ns <- normObj$h.hat
AIC.ns <- normObj$AIC; BIC.ns <- normObj$BIC; RSS.ns <- normObj$RSS

Choose the number of latent Poisson factors. BIC is used as the model selection metric by default.

choiceofK(AIC.ns, BIC.ns, RSS.ns, K = 1:5 , filename = "codex2_ns_choiceofK.pdf")
par(mfrow = c(1, 3))
plot(1:5, RSS.ns, type = "b", xlab = "Number of latent variables", pch=20)
plot(1:5, AIC.ns, type = "b", xlab = "Number of latent variables", pch=20)
plot(1:5, BIC.ns, type = "b", xlab = "Number of latent variables", pch=20)
par(mfrow = c(1,1))

}

it seem that no normal bam can be filled in.
How can I run CODEX2 with a pair of Tomor-Normal bam?
Is it ture that Normal Bam can be the input of CODEX2?
Sincerely hoping to receive your reply.

Problem running qc

Hi,

I run qcObj <- qc(Y, sampname, ref, cov_thresh = c(20, 4000), length_thresh = c(20, 2000), mapp_thresh = 0.9, gc_thresh = c(20,80))

and get the following error

Error in apply(Y_qc, 1, median) : dim(X) must have a positive length

what is the issue?

Thanks

error with BSgenome.Hsapiens.UCSC.hg38

Hi,

chrX and chrY chromosomes do not seem to be recognized:

gc <- getgc(ref, genome = genome)
Getting GC content for chr chr1
Getting GC content for chr chr10
Getting GC content for chr chr11
Getting GC content for chr chr12
Getting GC content for chr chr13
Getting GC content for chr chr14
Getting GC content for chr chr15
Getting GC content for chr chr16
Getting GC content for chr chr17
Getting GC content for chr chr18
Getting GC content for chr chr19
Getting GC content for chr chr2
Getting GC content for chr chr20
Getting GC content for chr chr21
Getting GC content for chr chr22
Getting GC content for chr chr3
Getting GC content for chr chr4
Getting GC content for chr chr5
Getting GC content for chr chr6
Getting GC content for chr chr7
Getting GC content for chr chr8
Getting GC content for chr chr9
Getting GC content for chr chrX
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'unmasked': no such sequence

mapp <- getmapp(ref, genome = genome)
Getting mappability for chr1
Getting mappability for chr10
Getting mappability for chr11
Getting mappability for chr12
Getting mappability for chr13
Getting mappability for chr14
Getting mappability for chr15
Getting mappability for chr16
Getting mappability for chr17
Getting mappability for chr18
Getting mappability for chr19
Getting mappability for chr2
Getting mappability for chr20
Getting mappability for chr21
Getting mappability for chr22
Getting mappability for chr3
Getting mappability for chr4
Getting mappability for chr5
Getting mappability for chr6
Getting mappability for chr7
Getting mappability for chr8
Getting mappability for chr9
Getting mappability for chrX
Getting mappability for chrY
values(ref) <- cbind(values(ref), DataFrame(gc, mapp))
Error in FUN(X[[i]], ...) : object of type 'closure' is not subsettable

qcObj <- qc(Y, sampname, ref, cov_thresh = c(20, 4000),
length_thresh = c(20, 2000), mapp_thresh = 0.9,
gc_thresh = c(20, 80))
Excluded 260 exons due to extreme coverage.
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'end': argument "ref" is missing, with no default
Y_qc <- qcObj$Y_qc; sampname_qc <- qcObj$sampname_qc
ref_qc <- qcObj$ref_qc; qcmat <- qcObj$qcmat; gc_qc <- ref_qc$gc
Error in getListElement(x, i, ...) :
IRanges objects don't support [[, as.list(), lapply(), or unlist() at
the moment

seqnames(bambedObj$ref)
factor-Rle of length 32821 with 24 runs
Lengths: 2605 1405 1462 1529 1069 ... 1768 1611 1386 1709 202
Values : chr1 chr10 chr11 chr12 chr13 ... chr7 chr8 chr9 chrX chrY
Levels(24): chr1 chr10 chr11 chr12 chr13 chr14 ... chr7 chr8 chr9 chrX chrY

Questions about "Running CODEX2 without specifying negative control samples/regions"

Hi, I have 49 uncorrelated samples and run in batch with CODEX2 without specifying negative control samples/regions. But the results is quit terrible, the number of cnv were few and only detected chr1,chr2 and chr3, and even some samples could not find any cnv (One of the sample is NA12878). I was wondering where is wrong in my code:

library(CODEX2)
library(WES.1KG.WUGSC) # Load Toy data from the 1000 Genomes Project.

###################################################################################
# Pre-processing

bamFile <- list.files('/storage/wujh/Project/CNV_analysis/CODEX/bam', pattern = '*.bam$') 
bamdir <- file.path('/storage/wujh/Project/CNV_analysis/CODEX/bam', bamFile)
sampname <- substr(bamFile,1,8)
bedFile <- file.path('/home/sxuan/codex2', 'AgV6.bed')
bambedObj <- getbambed(bamdir = bamdir, bedFile = bedFile, 
                       sampname = sampname, projectname = 'CODEX2_test')
bamdir <- bambedObj$bamdir; sampname <- bambedObj$sampname
ref <- bambedObj$ref; projectname <- bambedObj$projectname


###################################################################################
# Getting GC content and mappability

gc <- getgc(ref, genome = BSgenome.Hsapiens.UCSC.hg19)
mapp <- getmapp(ref, genome = BSgenome.Hsapiens.UCSC.hg19)
values(ref) <- cbind(values(ref), DataFrame(gc, mapp))


####################################################################################
# Getting raw read depth

coverageObj <- getcoverage(bambedObj, mapqthres = 20)
Y <- coverageObj$Y
write.csv(Y, file = paste(projectname, '_coverage.csv', sep=''), quote = FALSE)


####################################################################################
# Quality control

print('Quality control')
qcObj <- qc(Y, sampname, ref, cov_thresh = c(20, Inf),
            length_thresh = c(20, Inf), mapp_thresh = 0.9,
            gc_thresh = c(20, 80))
Y_qc <- qcObj$Y_qc; sampname_qc <- qcObj$sampname_qc
ref_qc <- qcObj$ref_qc; qcmat <- qcObj$qcmat; gc_qc <- ref_qc$gc
write.table(qcmat, file = paste(projectname, '_qcmat', '.txt', sep=''),
            sep = '\t', quote = FALSE, row.names = FALSE)


###################################################################################
# Running CODEX2
print('Running CODEX2')

Y.nonzero <- Y_qc[apply(Y_qc, 1, function(x){!any(x==0)}),]
pseudo.sample <- apply(Y.nonzero,1,function(x){prod(x)^(1/length(x))})
N <- apply(apply(Y.nonzero, 2, function(x){x/pseudo.sample}), 2, median)

# Running CODEX2 without negative control samples/regions

print('Running CODEX2 without negative control samples/regions')
chr <- c(c(1:22),'X','Y')

chr.index <- which(seqnames(ref_qc)==chr)
norm_index <- c(1:length(sampname))

normObj.null <- normalize_null(Y_qc = Y_qc[chr.index,],
                           gc_qc = gc_qc[chr.index],
                           K = 1:5, N = N)
Yhat.null <- normObj.null$Yhat
AIC.null <- normObj.null$AIC; BIC.null <- normObj.null$BIC
RSS.null <- normObj.null$RSS

#Choose the number of latent Poisson factors. BIC is used as the model selection metric by default.
choiceofK(AIC.null, BIC.null, RSS.null, K = 1:5 , filename = "codex2_null_choiceofK.pdf")
par(mfrow = c(1, 3))
plot(1:5, RSS.null, type = "b", xlab = "Number of latent variables", pch=20)
plot(1:5, AIC.null, type = "b", xlab = "Number of latent variables", pch=20)
plot(1:5, BIC.null, type = "b", xlab = "Number of latent variables", pch=20)
par(mfrow = c(1,1))

##################################################################################
# Running segmentation by CODEX2
print('Running segmentation by CODEX2')
# recommended
finalcall.CBS <- segmentCBS(Y_qc[chr.index,],  
                            Yhat.null, optK = which.max(BIC.null),
                            K = 1:5,
                            sampname_qc = paste('sample',1:ncol(Y_qc),sep=''),
                            ref_qc = ranges(ref_qc)[chr.index],
                            chr = chr, lmax = 400, mode = "integer")
write.table(finalcall.CBS, file=paste('norm_null.final.cnv',sep=''), sep='\t', quote=F)

# not recommended
# finalcall.HMM <- segmentHMM(Y_qc[chr.index,],  
#                             Yhat.ns, optK = which.max(BIC.ns),
#                             K = 1:5,
#                             sampname_qc = paste('sample',1:ncol(Y_qc),sep=''),
#                             ref_qc = ranges(ref_qc)[chr.index],
#                             chr = chr, mode = "integer")


# Post-segmentation pruning and filtering are recommended based on CNV length (filter1), 
# length per exon (filter2), likelihood ratio (filter3), and number of exons (filter4)

filter1 <- finalcall.CBS$length_kb<=200
filter2 <- finalcall.CBS$length_kb/(finalcall.CBS$ed_exon-finalcall.CBS$st_exon+1)<50
finalcall.CBS.filter <- finalcall.CBS[filter1 & filter2, ]

filter3 <- finalcall.CBS.filter$lratio>40
filter4 <- (finalcall.CBS.filter$ed_exon-finalcall.CBS.filter$st_exon)>1
finalcall.CBS.filter=finalcall.CBS.filter[filter3|filter4,]

write.table(finalcall.CBS.filter, file='_norm_null.final.filtered.cnv', sep='\t', quote=F)

Any suggestions will be greatly appreciated !

the meaning of the final result

Dear Dr.J,
Thank you for your work in developing CODEX2. As mentioned in title, how to understand the title 'sample_name chr cnv st_bp ed_bp length_kb st_exon ed_exon raw_cov norm_cov copy_no lratio mBIC' in the final result.?
Best regards
Li'an Lin

Need help to understand the choice of optK

Hello Dr Yuchao

Thanks to you for your kind and clear reply. I hope you are enjoying this lovely Christmas time :-)
May I ask you one more question?
In general, the model which has the lowest BIC and RSS evaluated as the most efficient one.
However, CODEX2 determines the latent factor K which has the highest value of BIC.

Although I guess it is because of CODEX2 may have some different working principles from other programing models,
I couldn't clear what the difference is.

So could you give me some clues to clarify why in CODEX2 the K that has the highest value of BIC (instead of the lowest) is recommended as optK?

Demo code is this:

finalcall.CBS <- segmentCBS(Y_qc[chr.index,],  # recommended
                            Yhat.ns, optK = which.max(BIC.ns),
                            K = 1:5,
                            sampname_qc = colnames(Y_qc),
                            ref_qc = ranges(ref_qc)[chr.index],
                            chr = chr, lmax = 400, mode = "integer")

Need Help for Normalization w/ all chromosomes

Hello Dr Yuchao

Thanks to you for creating CODEX2, I am using this wonderful program well for work.
During running CODEX2, I had a difficulty in making a decision to make a code for normalization using all chromosomes.
So, I ask you if there's any problem with the code I used or if there is a better way.

My situation is this: I have 45 samples (without CNV details), and I want to identify which sample has germ-line variations on some specific genes in chr9. As I don’t have a specified negative control samples/regions, I want to use all chromosomes in all samples for better normalization. I checked the demo code “Running CODEX2 without specifying negative control samples/regions“, and have a question how to modify this code to use all chromosomes in all samples for normalization.

The one trial I made is:
normObj.null <- normalize_null(Y_qc = Y_qc, gc_qc = gc_qc, K = 1:4, N = N)

(cf. Given demo code is:
normObj.null <- normalize_null(Y_qc = Y_qc[chr.index,], gc_qc = gc_qc[chr.index], K = 1:5, N = N))

The whole code I used is as follows:

library(BiocManager)
library(CODEX2)
dirPath <- '/BiO/home/jekim/Data/HL_TJP2/Total-E'
bamFile <- list.files(dirPath, pattern = '*.bam$')
bamdir <- file.path(dirPath, bamFile)
bedFile <- file.path('/BiO/home/jekim/CNVkit', 'Exome_Xgen.bed')
sampname <- substr(bamFile,0,10)
bambedObj <- getbambed(bamdir = bamdir, bedFile = bedFile, 
                       sampname = sampname, projectname = "E4-all")
bamdir <- bambedObj$bamdir; sampname <- bambedObj$sampname
ref <- bambedObj$ref; projectname <- bambedObj$projectname

genome = BSgenome.Hsapiens.UCSC.hg19

getgc <- function (ref, genome = NULL) 
{
    if (is.null(genome)) {
        genome = BSgenome.Hsapiens.UCSC.hg19
    }
    gc = rep(NA, length(ref))
    for (chr in unique(seqnames(ref))) {
        message("Getting GC content for chr ", chr, sep = "")
        chr.index = which(as.matrix(seqnames(ref)) == chr)
        ref.chr = IRanges(start = start(ref)[chr.index], end = end(ref)[chr.index])
        if (chr == "X" | chr == "x" | chr == "chrX" | chr == 
            "chrx") {
            chrtemp <- "X"
        }
        else if (chr == "Y" | chr == "y" | chr == "chrY" | chr == 
            "chry") {
            chrtemp <- "Y"
        }
        else {
            chrtemp <- as.numeric(mapSeqlevels(as.character(chr), 
                "NCBI")[1])
        }
        if (length(chrtemp) == 0) 
            message("Chromosome cannot be found in NCBI Homo sapiens database!")
        chrm <- unmasked(genome[[paste("chr", chrtemp, sep = "")]])
        seqs <- Views(chrm, ref.chr)
        af <- alphabetFrequency(seqs, baseOnly = TRUE, as.prob = TRUE)
        gc[chr.index] <- round((af[, "G"] + af[, "C"]) * 100, 
            2)
     }
      gc
}
gc <- getgc(ref, genome = genome)
mapp <- getmapp(ref, genome = genome)
values(ref) <- cbind(values(ref), DataFrame(gc, mapp))
coverageObj <- getcoverage(bambedObj, mapqthres = 20)
Y <- coverageObj$Y
write.csv(Y, file = paste(projectname, '_coverage.csv', sep=''), quote = FALSE)
qcObj <- qc(Y, sampname, ref, cov_thresh = c(20, 4000),
            length_thresh = c(20, 2000), mapp_thresh = 0.9,
            gc_thresh = c(20, 80))
Y_qc <- qcObj$Y_qc; sampname_qc <- qcObj$sampname_qc
ref_qc <- qcObj$ref_qc; qcmat <- qcObj$qcmat; gc_qc <- ref_qc$gc
write.table(qcmat, file = paste(projectname, '_qcmat', '.txt', sep=''),
            sep = '\t', quote = FALSE, row.names = FALSE)
Y.nonzero <- Y_qc[apply(Y_qc, 1, function(x){!any(x==0)}),]
pseudo.sample <- apply(Y.nonzero,1,function(x){exp(1/length(x)*sum(log(x)))})
N <- apply(apply(Y.nonzero, 2, function(x){x/pseudo.sample}), 2, median)
normObj.null <- normalize_null(Y_qc = Y_qc,
                               gc_qc = gc_qc,
                               K = 1:4, N = N)
Yhat.null <- normObj.null$Yhat
AIC.null <- normObj.null$AIC; BIC.null <- normObj.null$BIC
RSS.null <- normObj.null$RSS
choiceofK(AIC.null, BIC.null, RSS.null, K = 1:4 , filename = "E4-all_choiceofK.pdf")
write.table(Yhat.null, file = paste( projectname,'_Yhat.null_test.txt',
                                     sep=''), sep='\t', quote=F, row.names=F)
chro <- c('chr1','chr2','chr3','chr4','chr5','chr6','chr7','chr8','chr9','chr10','chr11','chr12','chr13','chr14','chr15','chr16','chr17','chr18',
     'chr19','chr20','chr21','chr22','chrX','chrY')
for (chr in chro){
    chr.index <- which(seqnames(ref_qc)==chr)        
    finalcall.CBS <- segmentCBS(Y_qc[chr.index,],
Yhat.null, optK = which.max(BIC.null), K = 1:4,
sampname_qc = sampname_qc,
ref_qc = ranges(ref_qc)[chr.index],
chr = chr, lmax = 400, mode = "integer")   
write.table(finalcall.CBS, file = paste( projectname, chr, '_no_filter_test.txt',
                                     sep=''), sep='\t', quote=F, row.names=F)
 filter1 <- finalcall.CBS$length_kb<=200
    filter2 <- finalcall.CBS$length_kb/(finalcall.CBS$ed_exon-finalcall.CBS$st_exon+1)<50
    finalcall.CBS.filter <- finalcall.CBS[filter1 & filter2, ]
    filter3 <- finalcall.CBS.filter$lratio>40
    filter4 <- (finalcall.CBS.filter$ed_exon-finalcall.CBS.filter$st_exon)>1
    finalcall.CBS.filter=finalcall.CBS.filter[filter3|filter4,]
    # save filtered results
    write.table(finalcall.CBS.filter, file = paste( projectname, chr, '_filter.txt',
                                     sep=''), sep='\t', quote=F, row.names=F)
    }

and the Y_qc, gc_qc look like this:
image
image

image

  1. Is this code okay for normalization - though it doesn’t have chr indexing “Y_qc = Y_qc[chr.index,], gc_qc = gc_qc[chr.index]”?
  2. Could you suggest me a better solution for this-using all chromosomes in all samples for normalization?

chr in bambedObj is NULL / missing chr in getcoverage() in codex2_targeted.R

Dear CODEX2 developers,

I recently run CODEX2 v1.3.0 successfully in "exome mode" without controls, but I was unable to run your approach for targeted sequencing given in the codex2_targeted.R sccript.
First, I suppose that the getcoverage() call in line 22 misses argument chr. However, setting chr to bambedObj$chr fails in my case, as bambedObj$chr is NULL.

The bambedObj seems to be created as expected, at least without warnings and yielding:

>names(bambedObj)
[1] "bamdir"      "sampname"    "ref"         "projectname"

>bambedObj$ref
GRanges object with 4083 ranges and 0 metadata columns:
         seqnames               ranges strand
            <Rle>            <IRanges>  <Rle>
     [1]        1   [1634345, 1634438]      *
     [2]        1   [1634518, 1634708]      *
     [3]        1   [1634914, 1635063]      *
     [4]        1   [1635262, 1635379]      *
     [5]        1   [1635477, 1635585]      *
     ...      ...                  ...    ...
  [4079]       22 [40804797, 40804846]      *
  [4080]       22 [40804936, 40805022]      *
  [4081]       22 [40805253, 40805376]      *
  [4082]       22 [40805468, 40805529]      *
  [4083]       22 [40805685, 40805763]      *
  -------
  seqinfo: 21 sequences from an unspecified genome; no seqlengths

How can bambedObj$chr be initialized?

Many thanks in advance!

Difference between getmapp.R and mapp.R

I noticed there two different versions of function for calculating mappability and they take different inputs as arguments. What is the difference between those two? If I am using mouse genome, which one should I use?

Thanks

Differences between CODEX and CODEX2?

Hi,

The CODEX page redirects me here, and could you please tell the major differences between version one and two?

If possible, I think it is better to make an introduction of the improvements in the README documents.

Thanks a lot.

CODEX2 for low coverage WGS

We have a low pass WGS data set and I found none of them can pass the sd threshold during normalization:
nonCNV.index=apply(z.codex,1,sd) < 0.15

summary(apply(z.codex,1,sd))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.2869  0.5146  0.5487  0.5481  0.5822  0.736

What might be the concerns to increase the sd's threshold? say at median=0.55 or other fixed threshold?

The low coverage issue also found in the qc step where 104221 out of 135087 exons were excluded. (102876 due to extreme low coverage), I seem have to reduce the minimum coverage threshold as well. Any suggestions for it?

how to set sampname_qc correctly

I followed the pipeline provided in http://htmlpreview.github.io/?https://github.com/yuchaojiang/CODEX2/blob/master/demo/CODEX2.html. The output result like
image. Obviously, it miss the true sample name,how do i fix it or how to set sampname_qc parameter correctly?

finalcall.CBS <- segmentCBS(Y_qc[chr.index,], # recommended Yhat.ns, optK = which.max(BIC.ns), K = 1:5, sampname_qc = paste('sample',1:ncol(Y_qc),sep=''), ref_qc = ranges(ref_qc)[chr.index], chr = chr, lmax = 400, mode = "integer")

NaNs produced

Hello CODEX2 team,
I am using Codex to run Germline CNV for the batch of samples. I encountered an issue of NANs produced and the job was not proceeded.

Log file after getting coverage for samples are below (issue happened only at chromosome 8 but not others). Can you please help to take a look at this?

Excluded 53 exons due to extreme coverage.
Excluded 106 exons due to extreme exonic length.
Excluded 0 exons due to extreme mappability.
Excluded 23 exons due to extreme GC content.
After taking union, excluded 168 out of 6967 exons in QC.
k = 1
Iteration 1 beta diff =0.673 f(GC) diff =2.42e-08
hhat diff =0.233
hhat diff =0.000456
hhat diff =0.000289
hhat diff =0.000199
hhat diff =0.000149
hhat diff =0.000121
hhat diff =0.000107
hhat diff =0.000101
hhat diff =9.99e-05
Iteration 2 beta diff =0.000254 f(GC) diff =2.87e-12
hhat diff =0.233
hhat diff =0.000391
hhat diff =0.000245
hhat diff =0.000172
hhat diff =0.000132
hhat diff =0.000111
hhat diff =9.94e-05
Iteration 3 beta diff =4.18e-05 f(GC) diff =6.94e-13
hhat diff =0.234
hhat diff =0.000306
hhat diff =0.000178
hhat diff =0.00012
hhat diff =9e-05
Stop at Iteration 3.
AIC1 = 139378901.359
BIC1 = 139313074.174
RSS1 = 191.074
k = 2
Iteration 1 beta diff =0.673 f(GC) diff =2.42e-08
hhat diff =0.00384
hhat diff =0.00179
hhat diff =0.00156
hhat diff =0.00177
hhat diff =0.0016
hhat diff =0.000968
hhat diff =0.000497
hhat diff =0.00027
hhat diff =0.000165
hhat diff =0.000112
hhat diff =8.29e-05
Iteration 2 beta diff =0.000342 f(GC) diff =2.49e-12
hhat diff =0.12
hhat diff =0.119
hhat diff =0.000935
hhat diff =0.000615
hhat diff =0.000402
hhat diff =0.000268
hhat diff =0.000185
hhat diff =0.000134
hhat diff =0.000102
hhat diff =8.11e-05
Iteration 3 beta diff =4.38e-05 f(GC) diff =1.3e-12
hhat diff =0.119
hhat diff =0.00109
hhat diff =0.118
hhat diff =0.000502
hhat diff =0.000325
hhat diff =0.000218
hhat diff =0.000155
hhat diff =0.000116
hhat diff =9.01e-05
Stop at Iteration 3.
AIC2 = 139378994.766
BIC2 = 139247340.397
RSS2 = 167.802
...
k = 9
Iteration 1 beta diff =0.673 f(GC) diff =2.42e-08
hhat diff =0.11
hhat diff =0.0793
hhat diff =0.0524
hhat diff =0.0526
hhat diff =0.0527
hhat diff =0.0265
hhat diff =0.0264
hhat diff =0.0785
hhat diff =0.00116
hhat diff =0.000318
hhat diff =0.0524
hhat diff =0.0523
hhat diff =0.0523
hhat diff =3.6e-05
Iteration 2 beta diff =0.000382 f(GC) diff =7.2e-13
hhat diff =0.138
hhat diff =0.182
hhat diff =0.0524
hhat diff =0.0523
hhat diff =0.0528
hhat diff =0.0526
hhat diff =0.0263
hhat diff =0.0524
hhat diff =0.000283
hhat diff =0.0524
hhat diff =0.000507
Iteration 3 beta diff =3.61e-05 f(GC) diff =1.02e-13
hhat diff =0.11
hhat diff =0.0278
hhat diff =0.0528
hhat diff =0.000288
hhat diff =0.0262
hhat diff =0.0523
hhat diff =7.36e-05
Stop at Iteration 3.
AIC9 = 139346667.632
BIC9 = 138754222.974
RSS9 = 76.739
k = 10
Iteration 1 beta diff =0.673 f(GC) diff =2.42e-08
hhat diff =0.121
hhat diff =0.14
hhat diff =0.0472
hhat diff =0.0942
hhat diff =0.0472
hhat diff =0.000418
hhat diff =0.0705
hhat diff =0.0928
hhat diff =0.0707
hhat diff =0.0242
hhat diff =0.0473
hhat diff =0.000274
Iteration 2 beta diff =0.000376 f(GC) diff =6.84e-13
hhat diff =0.125
hhat diff =0.118
hhat diff =0.0707
hhat diff =0.0941
hhat diff =0.0705
hhat diff =0.0473
hhat diff =0.0238
hhat diff =0.0472
hhat diff =0.118
hhat diff =0.0943
hhat diff =0.0249
Iteration 3 beta diff =0.000182 f(GC) diff =4.19e-13
hhat diff =0.13
hhat diff =0.0944
hhat diff =0.0475
hhat diff =0.0474
hhat diff =0.0706
hhat diff =0.0706
hhat diff =0.00278
hhat diff =0.114
hhat diff =0.0942
hhat diff =0.0706
hhat diff =0.0706
hhat diff =0.000396
Iteration 4 beta diff =0.000165 f(GC) diff =1.08e-13
hhat diff =0.107
hhat diff =0.141
hhat diff =0.106
hhat diff =0.0946
hhat diff =0.138
hhat diff =0.0765
hhat diff =0.111
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
Calls: normalize2 -> lm -> lm.fit
In addition: Warning messages:
1: glm.fit: fitted rates numerically 0 occurred
2: glm.fit: fitted rates numerically 0 occurred
3: glm.fit: fitted rates numerically 0 occurred
4: step size truncated due to divergence
5: In log(L[s, normal_index]) : NaNs produced
Execution halted>

Error in genome[[chrtemp]] : no such sequence, getgc -> unmasked -> [[ -> [[

Hi,
I tried CODEX2 for some sample sets. The first two sets was just fine.
For the third set, I got the following error.

....
normalize

Getting GC content for chr chr1
Getting GC content for chr chr10
Getting GC content for chr chr11
Getting GC content for chr chr12
Getting GC content for chr chr13
Getting GC content for chr chr14
Getting GC content for chr chr15
Getting GC content for chr chr16
Getting GC content for chr chr17
Getting GC content for chr chr18
Getting GC content for chr chr19
Getting GC content for chr chr2
Getting GC content for chr chr20
Getting GC content for chr chr21
Getting GC content for chr chr22
Getting GC content for chr chr3
Getting GC content for chr chr4
Getting GC content for chr chr5
Getting GC content for chr chr6
Getting GC content for chr chr7
Getting GC content for chr chr8
Getting GC content for chr chr9
Getting GC content for chr chrM
Error in genome[[chrtemp]] : no such sequence
Calls: getgc -> unmasked -> [[ -> [[
In addition: Warning message:
In getgc(ref, genome = genome) : NAs introduced by coercion
Execution halted

Could you please suggest what could be the possible causes of this?
Many thanks.
--duangdao.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.