jianhong / atacseqqc Goto Github PK

ATAC-seq Quality Control

Home Page: https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html

R 99.03% TeX 0.97%

atacseqqc's Introduction

ATACseqQC

ATAC sequencing Quality Control

ATAC-seq, an assay for Transposase-Accessible Chromatin using sequencing, is a rapid and sensitive method for chromatin accessibility analysis. It was developed as an alternative method to MNase-seq, FAIRE-seq and DNAse-seq. Comparing to the other methods, ATAC-seq requires less amount of the biological samples and time to process. In the process of analyzing several ATAC-seq dataset produced in our labs, we learned some of the unique aspects of the quality assessment for ATAC-seq data.To help users to quickly assess whether their ATAC-seq experiment is successful, we developed ATACseqQC package partially following the guideline published in Nature Method 2013 (Greenleaf et al.), including diagnostic plot of fragment size distribution, proportion of mitochondria reads, nucleosome positioning pattern, and CTCF or other Transcript Factor footprints.

Installation

To install this package, start R and enter:

library(BiocManager)
BiocManager::install("ATACseqQC")

Documentation

To view documentation of ATACseqQC, start R and enter:

browseVignettes("ATACseqQC")

atacseqqc's People

Contributors

Stargazers

Watchers

Forkers

jakevc alexyfyf springtan mperalc jianguozhou3 micahgearhart pollytikhonova ycl6 biobit markgene sfeng666 nine-sarayut

atacseqqc's Issues

Question about the read filtering by bamQC

Hi Jianhong,

Besides the reads with low alignment scores, the reads derived from mitochondrial DNA and the PCR/optical duplicates, are there any other filters applied by bamQC?
What are the parameters used for removing the reads with low alignment scores?
Is it only reads with a MAPQ score (as in samtools view -q 30) below 30 (based on bowtie2 alignment) being removed?
Are any other reads being removed?

Thank you so much in advance!
Best,
Etienne

Error in TSSEscore calculation

Thanks for developing this pipeline. I have run it successfully with the test file, but when I try with my ATAC-Seq data I am getting an error.

#Transcription Start Site (TSS) Enrichment Score
tsse <- TSSEscore(gal1, txs)
Error in rbind(...) :
number of columns of matrices must match (see arg 27)

My data are from mouse cells, so I change all the libraries to the mm10 build.

Do you know what could have gone wrong?

Thanks

Pepe

splitBam fragment sizes for nucleosome-free DNA

Hi,

I'm using your package and have a general question regarding the sizes in which BAM file is split with splitBam to generate nucleosome-free, mononucleosome... fragments. I see that the size limits for splitting the BAM are similar to Buenrostro et al. 2013.
However, why is there a 38 bp lower limit for nucleosome-free fragments? Why are not nucleosome-free fragments simply specified as < 100 bp? I went through the ATACseqQC paper but could not find a justification for this.

thanks for your help!

should the original bam be deduped and chrM deleted for the analysis of fragSizeDist()?

Is it necessray for the title described? Thanks.

Adjust the read start sites of mouse data

Hi,
I installed ATACseqQC to offset ATAC-seq aligning.
The example code in the Guide is for human data.
I downloaded BSgenome.Mmusculus.UCSC.mm9 but I'm struggling with another part.

seqlev <- "chr1" ## subsample data for quick run
which <- as(seqinfo(Hsapiens)[seqlev], "GRanges")

I want to offset all sample data and I cannot figure out how to change seqinfo() instead of Hsapiens.

Thank you.

Processing multiple bam files at once

Hello,
Thank you very much for creating ATACseqQC!
It is a great tool!
I may have missed it but is there a way to process multiple bam files at once?
Thank you very much in advance for letting me know.
Best,
Etienne

different TSS.filter values affect the signals showing on the heatmap

Hello @jianhong
Thanks for pushing the patch fixing the NA problem. I can generate the complete heatmap now. However, the red signals of heatmap are still partially shown on the yellow background.

NTILE <- 101
dws <- ups <- 1010
sigs <- enrichedFragments(gal=objs[c("NucleosomeFree", "mononucleosome", "dinucleosome", "trinucleosome")], TSS=TSS, librarySize=librarySize, upstream = ups, downstream = dws, TSS.filter=0.5, seqlev = paste0("chr", c(1:19, "X", "Y")), n.tile = NTILE)
sigs.log2 <- lapply(sigs, function(.ele) log2(.ele+1))
ATACheatmap <- featureAlignedHeatmap(cvglists=sigs.log2, feature.gr=reCenterPeaks(peaks=TSS, width=ups+dws), upstream = ups, downstream = dws, zeroAt=0.5, n.tile=NTILE)

I found the area of red signals showing on the heatmap depends on the value of TSS.filter in sigs <- enrichedFragments(). I tried several different values of TSS.filter, but whatever the value is, I still cannot get the complete heatmap.
Here is an example for TSS.filter=10. The area of red signal becomes bigger, but still not completed.

sigs <- enrichedFragments(gal=objs[c("NucleosomeFree", "mononucleosome", "dinucleosome", "trinucleosome")], TSS=TSS, librarySize=librarySize, upstream = ups, downstream = dws, TSS.filter=10, seqlev = paste0("chr", c(1:19, "X", "Y")), n.tile = NTILE)
sigs.log2 <- lapply(sigs, function(.ele) log2(.ele+1))
ATACheatmap <- featureAlignedHeatmap(cvglists=sigs.log2, feature.gr=reCenterPeaks(peaks=TSS, width=ups+dws), upstream = ups, downstream = dws, zeroAt=0.5, n.tile=NTILE)

Here also attached my TSS score plot.

Appreciate it if you could give me some recommendations about how to determine the best TSS.filter for generating the complete heatmap!
Thanks!
Best,
YJ

Question about bamQC

Hi Jianhong,

Thank you again so much for creating ATACseqQC!

To get the report of all the QC metrics with bamQC, it looks like I should use:
bamQC(bamFile, outPath = NULL)

To get the bam files cleaned up, it looks like I should use:
bamQC(bamfile,
index = bamfile,
mitochondria = "chrM",
outPath = sub(".bam", ".clean.bam", basename(bamfile)),
doubleCheckDup = FALSE)

Am I correct?
Is there a way to get both the QC metrics and the bam files cleaned up at the same time?

Thank you very much in advance.
Best,
Etienne

Strange TSS Enrichment Result

Hello,

I am getting a very strange result for the plot of TSS enrichment (see attached image). I have tried other packages that do QC for ATACseq and they gave a much more expected enrichment plot so I am very confused as to what is going on. Any ideas? I have also attached my code and can send a minimal bam file to reproduce the result if desired.

Best, Alex.

`library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(ATACseqQC)

bamTop100 <- scanBam(BamFile("rmdup.rmmm.sort2.bam", yieldSize = 100), param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)>0]
tags
which <- as(seqinformation[seqlev], "GRanges")
gal <- readBamFile("rmdup.rmmm.sort2.bam", tag=tags, which=which, asMates=TRUE, bigFile=TRUE)

files will be output into outPath

outPath <- "splited"
dir.create(outPath)

shift the coordinates of 5'ends of alignments in the bam file

seqinformation <- seqinfo(TxDb.Hsapiens.UCSC.hg38.knownGene)
which <- as(seqinformation[seqlev], "GRanges")
gal <- readBamFile("rmdup.rmmm.sort2.bam", tag=tags, which=which, asMates=TRUE, bigFile=TRUE)
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
txs <- transcripts(TxDb.Hsapiens.UCSC.hg38.knownGene)

tsse <- TSSEscore(gal1, txs)
tsse$TSSEscore

plot(100*(-9:10-.5), tsse$values, type="b",
xlab="distance to TSS",
ylab="aggregate TSS score")

`

NA values in sigs makes heatmap not completed

Hello ATACseqQC,
I found that the heatmap of my mouse ATAC-seq was not completed. Only half was shown whatever I tried.

txs <- txs[seqnames(txs) %in% c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chrX", "chrY")]
genome <- Mmusculus
objs <- splitGAlignmentsByCut(obj=gal1, txs=txs, genome=genome, outPath="D:/HYJ/splited_Cracd_KO1/")
bamfiles <- file.path("D:/HYJ/splited_Cracd_KO1/", c("NucleosomeFree.bam", "mononucleosome.bam", "dinucleosome.bam", "trinucleosome.bam"))
TSS <- promoters(txs, upstream=0, downstream=1)
TSS <- unique(TSS)
librarySize <- estLibSize(bamfiles)
librarySize
D:/HYJ/splited_Cracd_KO1//NucleosomeFree.bam D:/HYJ/splited_Cracd_KO1//mononucleosome.bam 
                                     7552685                                      1484090 
  D:/HYJ/splited_Cracd_KO1//dinucleosome.bam  D:/HYJ/splited_Cracd_KO1//trinucleosome.bam 
                                     1497497                                            0 

NTILE <- 101
dws <- ups <- 1010
sigs <- enrichedFragments(gal=objs[c("NucleosomeFree", "mononucleosome", "dinucleosome", "trinucleosome")], TSS=TSS, librarySize=librarySize, upstream = ups, downstream = dws, TSS.filter=0.5, seqlev = paste0("chr", c(1:19, "X", "Y")), n.tile = NTILE)
sigs.log2 <- lapply(sigs, function(.ele) log2(.ele+1))
featureAlignedHeatmap(cvglists=sigs.log2, feature.gr=reCenterPeaks(peaks=TSS, width=ups+dws), upstream = ups, downstream = dws, zeroAt=0.5, n.tile=NTILE)

I think the reason may be the NA values in sigs. And the NA values may be caused by the not well-aligned bam files because there is no conservation=mm10_gscore in objs <- splitGAlignmentsByCut(obj=gal1, txs=txs, genome=genome, outPath="D:/HYJ/splited_Cracd_KO1/"). However, if I add conservation=mm10_gscore, the splitGAlignmentsByCut() will generate Error: subscript contains invalid names.

sigs
$NucleosomeFree
               [,1]       [,2]     [,3]     [,4]      [,5]     [,6]     [,7]     [,8]     [,9]     [,10]    [,11]
    [1,]         NA         NA       NA       NA        NA       NA       NA       NA       NA        NA       NA
    [2,]  1.5850972  1.5850972 0.000000 0.000000  0.000000 0.000000 0.000000 0.000000 0.000000  0.000000 0.000000
    [3,]  0.0000000  0.0000000 0.000000 0.000000  0.000000 0.000000 0.000000 2.451784 2.451784  2.451784 2.451784
    [4,]  0.0000000  0.0000000 0.000000 0.000000  0.000000 1.585097 1.585097 1.585097 1.585097  1.585097 0.000000
    [5,]  0.0000000  1.2258919 1.225892 1.225892  1.225892 1.225892 0.000000 0.000000 0.000000  0.000000 0.000000
    [6,]         NA         NA       NA       NA        NA       NA       NA       NA       NA        NA       NA
    [7,]         NA         NA       NA       NA        NA       NA       NA       NA       NA        NA       NA
    [8,]  0.0000000  0.0000000 0.000000 0.000000  0.000000 0.000000 0.000000 0.000000 0.000000  0.000000 0.000000
    [9,]         NA         NA       NA       NA        NA       NA       NA       NA       NA        NA       NA

Could you please help me with this issue?
Thanks!
Best,
YJ

shiftGAlignmentsList() error

Hello,

I've been running the function and see following error many times:
[bam_translate] RG tag "1" on read "K00180:164:HCG3YBBXX:2:1207:28057:2879" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "1" on read "K00180:164:HCG3YBBXX:2:1124:23084:46627" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.

The input bam file has @rg header with ID:1
Is there any way to output RG tags correctly.
I'm running the following R commands:

library(ATACseqQC)
library(Rsamtools)

bamfile tags to be read in

possibleTag <- combn(LETTERS, 2)
possibleTag <- c(paste0(possibleTag[1, ], possibleTag[2, ]),
paste0(possibleTag[2, ], possibleTag[1, ]))
bamTop100 <- scanBam(BamFile(in_bam, yieldSize = 100),
param = ScanBamParam(tag=possibleTag))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)==100]

shift the coordinates of 5'ends of alignments in the bam file

gal <- readBamFile(in_bam, tag=tags, asMates=T, bigFile=T)
shiftGAlignmentsList(gal, outbam=out_bam)

Thank you,
Hiroko

Request: Supporting BamFile and BamFileList as `bamFiles` arg

Would you consider supporting BamFile and BamFileList objects (from the Rsamtools package), in addition to character vectors, as the bamFiles argument to functions in this package? This actually almost works with the current code, so I don't think it will require many changes to be made.

A concrete example of where this is useful: I have a bunch of ATAC-seq BAM files where the (.bam, .bai) pair are of the form (file.bam, file.bai). Right now, I have to create a symbolic link from file.bai to file.bam.bai because this is the format ATACseqQC expects. With this suggested change, I could instead do bfl <- Rsamtools::BamFileList("file1.bam", "file2.bam"), which automatically picks up the file1.bai and file2.bai indices, and all the Rsamtools and GenomicAlignments functions used by ATACseqQC would immediately work with these BamFile and BamFileList objects.

CIGAR and query sequence are of different length, as.Bam truncated input

sorry please delete this issue. just notice the bam file has CIGAR "35M2I101M" cannot be processed.

treated as unmapped - splitGAlignmentsByCut

Hello,
Currently I am using ATACseqQC to try quality accessment of my ATAC-seq data, when I apply splitGAlignmentsByCut function for preparation of heat;ap, it gives this warning message continuously.

[W::sam_parse1] mapped mate cannot have zero coordinate: treated as unmapped

It seems that it indicates the inconsistent of mate information. However, during the alignment, by post-alignment filtration, I only remained the reads who was properly and uniquely mapped.

Here is the final status of input bam file for splitGAlignmentsByCut, checked by samtools flagstat.

48281627 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
48281627 + 0 mapped (100.00% : N/A)
48281627 + 0 paired in sequencing
24163804 + 0 read1
24117823 + 0 read2
48281627 + 0 properly paired (100.00% : N/A)
48281627 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Could you give a piece of advice is possible? Thanks in advance,

estimateLibComplexity error

Hi ATACseqQC team,
thanks for your nice package for atac-seq quality control. I get an error while i am doing
estimateLibComplexity(readsDupFreq("test.bam"))
it gives me the following error:

Error in f(t * t.scale): could not find function "f"
Traceback:

estimateLibComplexity(readsDupFreq(poorR1))
suppressWarnings({
. result = ds.rSAC.bootstrap(histFile, r = 1, times = times)
. })
withCallingHandlers(expr, warning = function(w) if (inherits(w,
. classes)) tryInvokeRestart("muffleWarning"))
ds.rSAC.bootstrap(histFile, r = 1, times = times)
f.rSACs[times]

when I try it on estimateLibComplexity(readsDupFreq("GL1.bam")) it works well but if I work with GL2.bam and GL3.bam from package extdata files similar error comes up.

I am using ATACseqQC Version: 1.12.5

could you please let me know what is difference between them and how can i resolve it?

Thanks,

Noori

error in factorFootprints function

Hi,

Thanks for making the tool.
I am trying to run factorFootprints function and getting the following error:

sigs <- factorFootprints( bamfiles=shiftedBamfile,
                            pfm=Wt1[[1]] , 
                           genome = Mmusculus, 
                         min.score="90%", 
                           upstream=100, downstream=100 )

Error in asMethod(object) : 
  cannot create a GRanges object from a Seqinfo object with NA
  seqlengths

Here is my BAM file which is an output of shiftGAlignmentsList, the data is only for mouse chr19:
https://drive.google.com/open?id=1RM6W1jJt10jrrXtMtqzfj47u4--IlQDB

Question number 2: What are the exact formatting rules for a bindingSites argument of the factorFootprints function? How many columns, in what format etc.??

Thanks in advance and regards
Tim

shiftGAlignmentsList time

Hi, congrats on the package. I'm going to integrate it in my pipelines
I am particularly interested on shiftGAlignmentsList. I think it's really useful to shift reads while keeping bam format, even paired end. This is interesting for downstream analysis purposes.
However, the importing bam file+shifting + exporting is taking a lot of time for averaged size bam files.

Just wanted to ask if there is any chance that this gets improved, maybe giving the option to use more threads? Don't know if it's possible.

Thank you,

Error when ATACseqQC load preseqR function ds.mincount.bootstrap

Dear Jianhong,

I'm using the latest R 3.5.0 and Bioconductor 3.7 (BiocInstaller 1.30.0). I had the following error:
object 'ds.mincount.bootstrap' is not exported by 'namespace:preseqR'.

I think it is because of the latest preseqR 4.0.0 package don't have ds.mincount.bootstrap anymore, which is from preseqR 3.1.2. It seems the author replace it with another function called ds.rSAC.bootstrap which does the same job.

Can you please have a look at this issue?

Thank you very much.

How do I read a BAM file？

Hi，
Thanks for developing this user-friendly tool. This is my first contact with ATACseqQC. In your tutorial, the BAM file is built into the R package. How can I use my own BAM file for analysis?
I tried a few, as shown below:
bamfile <- readBamFile("/home/vip1/atac/align/ms3468-2.last.bam")
bamfile.labels <- "ms3468-2.last"
estimateLibComplexity(readsDupFreq(bamfile))
The first two lines of code worked fine, but the third line gave me an error.
The error is : Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'testPairedEndBam' for signature '"GAlignments"

I sincerely hope to get your kind reply.

Read the shifted bam and continue the QC analysis

Hello Jianhong,

Thanks for your excellent work!

My bam file is so big that it costs a lot of resources during the running.
In order to reduce the time consumption, I tried to perform the analysis step by step. It worked well for bamQC and shiftGAlignmentsList step. But it broke when I used the shifted bam as input and then performed PTscore analysis.

Could you please help with the problem? Many thanks in advance.

bamfile <- 'shifted.bam'
gal1 <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=FALSE)
pt <- PTscore(gal1, txs)

Error in PTscore(gal1, txs) : is(obj, "GAlignments") is not TRUE

Pls help with the error... I am using hg38

objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath,
conservation=phastCons100way.UCSC.hg38)

Error in mergeNamedAtomicVectors(seqlengths(x), seqlengths(y), what = c("sequence", :
sequence chr10 has incompatible seqlengths:

in 'x': 135534747
in 'y': 133797422

error running factorFootprints -- evaluations nested too deeply

When running factorFootprints, I'm getting the following error:

Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

I have tried multiple iterations of limiting the number of expressions:

options(expressions=1000)

but no luck in resolving this error. I am also getting a C stack error if I don't limit the number of expressions:

Error: C stack usage  7970500 is too close to the limit

Seems to me like this is a recursive loop error/possible bug. If you know where this is coming from, I'd appreciate the help in resolving this. Thanks!

EDIT: Also here's the session info (attached)
session_info.txt

factorFootprints error on system.file("extdata", "GL1.bam",package="ATACseqQC")

Hi, when I follow the examples to run factorFootprints for ATACseqQC v1.12.0 on GL1.bam file from the ATACseqQC manual document or from the ATACseqQC guide, I am getting segfault. Any idea what may be causing the error? Traceback says the following:

*** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
1: cairoVersion()
2: grSoftVersion()
3: symbolType1support()
4: symbolfamilyDefault(family)
5: svg(psfilename, width = 1, height = 1, bg = NA, pointsize = 72, family = font)
6: importSVG(font, color[i], rname[i], fontface)
7: coloredSymbols(ncha, font, pfm@color, rname, fontface = fontface)
8: motifStack::plotMotifLogoA(motif)
9: (function (Profile, Mlen = 0, xlab = "Dist. to motif (bp)", ylab = "Cut-site probability", legLabels = c("For. strand", "Rev. strand"), legTitle, xlim, ylim, newpage = TRUE, motif, segmentation) { stopifnot(is(motif, "pfm")) if (newpage) grid.newpage() S <- length(Profile) W <- ((S/2) - Mlen)/2 vp <- plotViewport(margins = c(5.1, 5.1, 4.1, 2.1), name = "plotRegion") pushViewport(vp) if (missing(xlim)) { xlim <- c(0, S/2 + 1) } if (missing(ylim)) { ylim <- c(0, max(Profile) * 1.12) } vp1 <- viewport(y = 0.4, height = 0.8, xscale = xlim, yscale = ylim, name = "footprints") pushViewport(vp1) grid.lines(x = 1:(S/2), y = Profile[1:(S/2)], default.units = "native", gp = gpar(lwd = 2, col = "darkblue")) grid.lines(x = 1:(S/2), y = Profile[(S/2 + 1):S], default.units = "native", gp = gpar(lwd = 2, col = "darkred")) if (!missing(segmentation)) { if (length(segmentation) == 4) { grid.segments(x0 = c(0, segmentation[1], W, W + Mlen, S/2 - segmentation[1]), x1 = c(segmentation[1], W, W + Mlen, S/2 - segmentation[1], S/2), y0 = c(segmentation[2], segmentation[3], segmentation[4], segmentation[3], segmentation[2]), y1 = c(segmentation[2], segmentation[3], segmentation[4], segmentation[3], segmentation[2]), default.units = "native", gp = gpar(lwd = 2, col = "red", lty = 2)) } } grid.xaxis(at = c(seq(1, W, length.out = 3), W + seq(1, Mlen), W + Mlen + seq(1, W, length.out = 3)), label = c(-(W + 1 - seq(1, W + 1, length.out = 3)), rep("", Mlen), seq(0, W, len = 3))) grid.yaxis() grid.lines(x = c(W, W, 0), y = c(0, max(Profile), ylim[2]), default.units = "native", gp = gpar(lty = 2)) grid.lines(x = c(W + Mlen + 1, W + Mlen + 1, S/2), y = c(0, max(Profile), ylim[2]), default.units = "native", gp = gpar(lty = 2)) upViewport() vp2 <- viewport(y = 0.9, height = 0.2, xscale = c(0, S/2 + 1), name = "motif") pushViewport(vp2) motifStack::plotMotifLogoA(motif) upViewport() upViewport() grid.text(xlab, y = unit(1, "lines")) grid.text(ylab, x = unit(1, "line"), rot = 90) if (missing(legTitle)) { legvp <- viewport(x = unit(1, "npc") - convertX(unit(1, "lines"), unitTo = "npc"), y = unit(1, "npc") - convertY(unit(1, "lines"), unitTo = "npc"), width = convertX(unit(14, "lines"), unitTo = "npc"), height = convertY(unit(3, "lines"), unitTo = "npc"), just = c("right", "top"), name = "legendWraper") pushViewport(legvp) grid.legend(labels = legLabels, gp = gpar(lwd = 2, lty = 1, col = c("darkblue", "darkred"))) upViewport() } else { grid.text(legTitle, y = unit(1, "npc") - convertY(unit(1, "lines"), unitTo = "npc"), gp = gpar(cex = 1.2, fontface = "bold")) } return(invisible())})(Profile = c(0.0495208651379799, 0.0866615139914648, 0.12380216284495, 0.0619010814224748, 0.0990417302759597, 0.0371406488534849, 0.0742812977069698, 0.0866615139914648, 0.185703244267425, 0.012380216284495, 0.111421946560455, 0.0742812977069698, 0.0990417302759597, 0.0990417302759597, 0.0742812977069698, 0.0742812977069698, 0.111421946560455, 0.12380216284495, 0.0247604325689899, 0.0742812977069698, 0.0619010814224748, 0.12380216284495, 0.0866615139914648, 0.111421946560455, 0.0247604325689899, 0.0990417302759597, 0.0866615139914648, 0.0247604325689899, 0.111421946560455, 0.0990417302759597, 0.0619010814224748, 0.111421946560455, 0.0742812977069698, 0.0990417302759597, 0.0495208651379799, 0.0990417302759597, 0.0866615139914648, 0.12380216284495, 0.0619010814224748, 0.17332302798293, 0.136182379129445, 0.0990417302759597, 0.012380216284495, 0.0990417302759597, 0.0990417302759597, 0.136182379129445, 0.0742812977069698, 0.0619010814224748, 0.0371406488534849, 0.14856259541394, 0.12380216284495, 0.0866615139914648, 0.111421946560455, 0.12380216284495, 0.111421946560455, 0.0371406488534849, 0.111421946560455, 0.235224109405404, 0.12380216284495, 0.111421946560455, 0.0866615139914648, 0.0619010814224748, 0.12380216284495, 0.0866615139914648, 0.136182379129445, 0.111421946560455, 0.0866615139914648, 0.12380216284495, 0.0866615139914648, 0.0866615139914648, 0.198083460551919, 0.0990417302759597, 0.185703244267425, 0.136182379129445, 0.0866615139914648, 0.12380216284495, 0.111421946560455, 0.12380216284495, 0.14856259541394, 0.17332302798293, 0.160942811698435, 0.14856259541394, 0.12380216284495, 0.111421946560455, 0.210463676836414, 0.111421946560455, 0.0742812977069698, 0.0619010814224748, 0.0619010814224748, 0.0495208651379799, 0.160942811698435, 0.0866615139914648, 0.0742812977069698, 0.0495208651379799, 0.0619010814224748, 0.0371406488534849, 0.0495208651379799, 0.0247604325689899, 0.0495208651379799, 0.0619010814224748, 0.0371406488534849, 0.0742812977069698, 0.0619010814224748, 0.0247604325689899, 0.0247604325689899, 0.0371406488534849, 0.0495208651379799, 0.0371406488534849, 0.0495208651379799, 0.0371406488534849, 0, 0.160942811698435, 0.0742812977069698, 0.0619010814224748, 0.14856259541394, 0.0619010814224748, 0.259984541974394, 0.012380216284495, 0.0247604325689899, 0.0495208651379799, 0.012380216284495, 0.0495208651379799, 0.0866615139914648, 0.0371406488534849, 0.012380216284495, 0.0619010814224748, 0.0619010814224748, 0.0742812977069698, 0.14856259541394, 0.0990417302759597, 0.111421946560455, 0.0990417302759597, 0.0990417302759597, 0.235224109405404, 0.235224109405404, 0.160942811698435, 0.17332302798293, 0.136182379129445, 0.0990417302759597, 0.0619010814224748, 0.14856259541394, 0.0866615139914648, 0.0990417302759597, 0.185703244267425, 0.160942811698435, 0.17332302798293, 0.0742812977069698, 0.0990417302759597, 0.136182379129445, 0.0247604325689899, 0.12380216284495, 0.0495208651379799, 0.0990417302759597, 0.185703244267425, 0.136182379129445, 0.185703244267425, 0.0495208651379799, 0.0371406488534849, 0.0742812977069698, 0.0866615139914648, 0.160942811698435, 0.0990417302759597, 0.0619010814224748, 0.0990417302759597, 0.111421946560455, 0.136182379129445, 0.0990417302759597, 0.0742812977069698, 0.0866615139914648, 0.17332302798293, 0.14856259541394, 0.14856259541394, 0.0742812977069698, 0.0990417302759597, 0.0866615139914648, 0.12380216284495, 0.111421946560455, 0.111421946560455, 0.111421946560455, 0.0866615139914648, 0.0742812977069698, 0.0866615139914648, 0.0742812977069698, 0.0990417302759597, 0.0990417302759597, 0.0247604325689899, 0.0866615139914648, 0.0742812977069698, 0.0371406488534849, 0.0619010814224748, 0.0371406488534849, 0.12380216284495, 0.0619010814224748, 0.0619010814224748, 0.0742812977069698, 0.111421946560455, 0.0495208651379799, 0.0990417302759597, 0.17332302798293, 0.136182379129445, 0.14856259541394, 0.0742812977069698, 0.0866615139914648, 0.0495208651379799, 0.0866615139914648, 0.0866615139914648, 0.0742812977069698, 0.0990417302759597, 0.012380216284495, 0.0742812977069698, 0.0495208651379799, 0.0619010814224748, 0.0866615139914648, 0.0495208651379799, 0.0742812977069698, 0.0990417302759597, 0.160942811698435, 0, 0.0619010814224748, 0.0557109732802274, 0.0804714058492173, 0.12380216284495, 0.0495208651379799, 0.185703244267425, 0.0619010814224748, 0.0866615139914648, 0.0866615139914648, 0.0742812977069698, 0.0371406488534849, 0.0619010814224748, 0.0990417302759597, 0.0619010814224748, 0.0619010814224748, 0.0619010814224748, 0.0619010814224748, 0.14856259541394, 0.0619010814224748, 0.0742812977069698, 0.0866615139914648, 0.0866615139914648, 0.0742812977069698, 0.0742812977069698, 0.0247604325689899, 0.0495208651379799, 0.0495208651379799, 0.0619010814224748, 0.0495208651379799, 0.0990417302759597, 0.0742812977069698, 0.0247604325689899, 0.111421946560455, 0.0247604325689899, 0.12380216284495, 0.111421946560455, 0.0990417302759597, 0.0619010814224748, 0.12380216284495, 0.0866615139914648, 0.0495208651379799, 0.0990417302759597, 0.222843893120909, 0.0619010814224748, 0.0990417302759597, 0.0742812977069698, 0.0990417302759597, 0.111421946560455, 0.111421946560455, 0.0371406488534849, 0.0495208651379799, 0.0495208651379799, 0.0495208651379799, 0.0495208651379799, 0.17332302798293, 0.14856259541394, 0.0495208651379799, 0.0619010814224748, 0.136182379129445, 0.12380216284495, 0.17332302798293, 0.0495208651379799, 0.0866615139914648, 0.14856259541394, 0.136182379129445, 0.0990417302759597, 0.0866615139914648, 0.0990417302759597, 0.0990417302759597, 0.0371406488534849, 0.136182379129445, 0.0990417302759597, 0.0866615139914648, 0.0990417302759597, 0.0866615139914648, 0.14856259541394, 0.0990417302759597, 0.14856259541394, 0.160942811698435, 0.160942811698435, 0.12380216284495, 0.17332302798293, 0.222843893120909, 0.160942811698435, 0.284744974543384, 0.14856259541394, 0.17332302798293, 0.12380216284495, 0.14856259541394, 0.185703244267425, 0.0990417302759597, 0.0990417302759597, 0.12380216284495, 0.14856259541394, 0.0866615139914648, 0.0990417302759597, 0.0619010814224748, 0.0866615139914648, 0.0866615139914648, 0.0247604325689899, 0.111421946560455, 0.0371406488534849, 0.0742812977069698, 0.012380216284495, 0.0742812977069698, 0.0619010814224748, 0.0247604325689899, 0.0371406488534849, 0.0371406488534849, 0.0371406488534849, 0.0247604325689899, 0.012380216284495, 0.0742812977069698, 0.0495208651379799, 0.136182379129445, 0.160942811698435, 0.0866615139914648, 0.111421946560455, 0.0495208651379799, 0.0619010814224748, 0.0371406488534849, 0.012380216284495, 0.0247604325689899, 0.0990417302759597, 0.0371406488534849, 0.0866615139914648, 0.0495208651379799, 0.0742812977069698, 0.0742812977069698, 0.0495208651379799, 0.0619010814224748, 0.111421946560455, 0.0742812977069698, 0.136182379129445, 0.259984541974394, 0.12380216284495, 0.309505407112374, 0.160942811698435, 0.235224109405404, 0.136182379129445, 0.111421946560455, 0.111421946560455, 0.0866615139914648, 0.136182379129445, 0.111421946560455, 0.17332302798293, 0.185703244267425, 0.0742812977069698, 0.17332302798293, 0.0742812977069698, 0.0990417302759597, 0.14856259541394, 0.0866615139914648, 0.14856259541394, 0.0990417302759597, 0.17332302798293, 0.136182379129445, 0.0742812977069698, 0.0619010814224748, 0.0495208651379799, 0.0742812977069698, 0.0495208651379799, 0.12380216284495, 0.12380216284495, 0.0990417302759597, 0.136182379129445, 0.0990417302759597, 0.0866615139914648, 0.111421946560455, 0.0619010814224748, 0.0866615139914648, 0.0742812977069698, 0.0742812977069698, 0.0990417302759597, 0.0247604325689899, 0.0742812977069698, 0.12380216284495, 0.111421946560455, 0.0866615139914648, 0.136182379129445, 0.0742812977069698, 0.160942811698435, 0.0990417302759597, 0.0990417302759597, 0.0866615139914648, 0.111421946560455, 0.0495208651379799, 0.0990417302759597, 0.0742812977069698, 0.0495208651379799, 0.0742812977069698, 0.0371406488534849, 0.0247604325689899, 0.0495208651379799, 0.0495208651379799, 0.0619010814224748, 0.0742812977069698, 0.0990417302759597, 0.0990417302759597, 0.0990417302759597, 0.111421946560455, 0.111421946560455, 0.0619010814224748, 0.17332302798293, 0.0742812977069698, 0.0742812977069698, 0.198083460551919, 0.0990417302759597, 0.0990417302759597, 0.0495208651379799, 0.0247604325689899, 0.0619010814224748, 0.0247604325689899, 0.012380216284495, 0.160942811698435, 0.0619010814224748, 0.12380216284495, 0.14856259541394, 0.14856259541394, 0.0495208651379799, 0.111421946560455, 0.0309505407112374), legLabels = c("For. strand", "Rev. strand"), ylab = "Cut-site probability", Mlen = 20L, motif = new("pfm", mat = c(0.173018103662065, 0.415970901876498, 0.246011407787055, 0.164999586674382, 0.22600644787964, 0.275026866165165, 0.228982392328677, 0.269984293626519, 0.288005290567909, 0.298999752004629, 0.258989832189799, 0.154005125237662, 0.104984706952137, 0.319996693395057, 0.268000330660494, 0.307018268992312, 0.333967099280813, 0.113995205422832, 0.416053567000083, 0.135984128296272, 0.0600148797222452, 0.326031247416715, 0.553938993138795, 0.0600148797222452, 0.0519963627345623, 0.562040175250062, 0.0519963627345623, 0.333967099280813, 0.0370339753657932, 0.00495990741506159, 0.826981896337935, 0.13102422088121, 0.022980904356452, 0.960072745308754, 0.0129784244027445, 0.00396792593204927, 0.000991981483012317, 0.997024055550963, 0.000991981483012317, 0.000991981483012317, 0.245019426304042, 0.669918161527651, 0.0270314954120856, 0.0580309167562206, 0.000991981483012317, 0.689013805075639, 0.000991981483012317, 0.309002231958337, 0.000991981483012317, 0.997024055550963, 0.000991981483012317, 0.000991981483012317, 0.0500123997685377, 0.0429858642638671, 0.0170290154583781, 0.889972720509217, 0.253037943291725, 0.0729933041249897, 0.525006199884269, 0.148962552699016, 0.00396792593204927, 0.418037529966107, 0.546003141274696, 0.0319914028271472, 0.172026122179053, 0.150037199305613, 0.0549723071835992, 0.622964371331735, 0.000991981483012317, 0.000991981483012317, 0.997024055550963, 0.000991981483012317, 0.0190129784244027, 0.0629908241712821, 0.865007853186741, 0.0529883442175746, 0.192031082086468, 0.43192527072828, 0.150037199305613, 0.22600644787964), name = "motif", alphabet = "DNA", color = c(A = "#00811B", C = "#2000C7", G = "#FFB32C", T = "#D00001"), background = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25), tags = list(), markers = list()), segmentation = c(pos = 51.75, distal_abun = 0.0826919030164958, proximal_abun = 0.114054366237735, binding = 0.0615915760153625))
10: do.call(plotFootprints, args = args)
11: do.call(plotFootprints, args = args)
12: doTryCatch(return(expr), name, parentenv, handler)
13: tryCatchOne(expr, names, parentenv, handlers[[1L]])
14: tryCatchList(expr, classes, parentenv, handlers)
15: tryCatch({ args <- list(...) if (groupFlag) { args$Profile <- c(Profile[["+"]], Profile[["-"]]) args$legLabels <- c("For. strand", "Rev. strand") } else { args$Profile <- c(Profile[[1]], Profile[[2]]) args$legLabels <- names(Profile) } args$ylab <- ifelse(anchor == "cut site", "Cut-site probability", "reads density (arbitrary unit)") args$Mlen <- wid args$motif <- pwm2pfm(pfm) args$segmentation <- Profile.seg do.call(plotFootprints, args = args)}, error = function(e) { message(e)})
16: factorFootprints(shiftedBamfile, pfm = CTCF[[1]], genome = genome, min.score = "90%", seqlev = seqlev, upstream = 100, downstream = 100)

Question about readBamFile function

Hi,
First thanks to build this package it is very interesting tool.
So I'm trying to integrate your package in my ATACseq pipeline.

But I have some issue. In fact, when I trying to build the "gal" object as your exemple in the vignette, this object is empty but my bamfile is not. the fact is i'm charging my own bamfile because it's not on your package. in end I have a empty splitted bam file.

here you have the code that I run :

Loading Input files / directory

bamfile_wthDup=scanBam(bamFilePath_wthDup)
bamfile_whtoutDup=scanBam(bamFilePath_wthoutDup)

bamfile tags to be read in

possibleTag <- combn(LETTERS, 2)

possibleTag <- c(paste0(possibleTag[1, ], possibleTag[2, ]),
paste0(possibleTag[2, ], possibleTag[1, ]))

bamTop100 <- scanBam(BamFile(bamFilePath_wthoutDup, yieldSize = 100),
param = ScanBamParam(tag=possibleTag))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)>0]
tags

library(BSgenome.Cmelo.FLOCAD.CMiso1.1)

seqlev <- "CMiso1.1chr01" # sumsmple data for quick run
which <- as(seqinfo(CMiso)[seqlev], "GRanges")
gal <- readBamFile("bamfile_whtoutDup", tag=tags, which=which, asMates=TRUE, bigFile=TRUE) # TODO : here the bamfile seems to be not read
shiftedBamfile <- file.path(outPath, "shifted.bam")

`
Have any idea what's I'm doing wrong ?
Thanks in advance

factorFootprints 'seqnames' issue

Hi @jianhong

I am having an issue with the factorFootprints coding. I can run everything up to this point fine, but when I try to run this through, I get the following error. Our bioinformatics team have looked into it but can't figure it out. Any thoughts you have would be much appreciated.

Thanks,

library(BSgenome.Mmusculus.UCSC.mm10)
library(MotifDb)
CTCF <- query(MotifDb, c("CTCF"))
CTCF <- as.list(CTCF)
sigs <- factorFootprints(shiftedBamfile, pfm=CTCF[[1]],
genome=genome,
min.score="90%", seqlev=seqlev,
upstream=100, downstream=100)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'seqnames' for signature '"NULL"'

single reads and distal regions

Hi,

I have two questions regarding ATACseqQC:

Can it be applied on single end bams?
Can the analysis be restricted to distal elements (provided as a bed file)?

Thanks in advance!

Best Regards,
Attila

Error : object 'gscores' is not exported by 'namespace:GenomicScores'

I am trying to install this package on my system and am getting the following error:

> install_github("jianhong/ATACseqQC")
Downloading GitHub repo jianhong/ATACseqQC@master
from URL https://api.github.com/repos/jianhong/ATACseqQC/zipball/master
Installing ATACseqQC
'/n/apps/CentOS7/install/r-3.4.3/lib64/R/bin/R' --no-site-file --no-environ  \
  --no-save --no-restore --quiet CMD INSTALL  \
  '/tmp/RtmpH7HxxH/devtools503610f09089/jianhong-ATACseqQC-e4a0c97'  \
  --library='/home/jv2245/R/x86_64-pc-linux-gnu-library/3.4' --install-tests 

* installing *source* package ‘ATACseqQC’ ...
** R
** inst
** tests
** preparing package for lazy loading
Error : object 'gscores' is not exported by 'namespace:GenomicScores'
ERROR: lazy loading failed for package 'ATACseqQC'
* removing '/home/jv2245/R/x86_64-pc-linux-gnu-library/3.4/ATACseqQC'
Installation failed: Command failed (1)
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /n/apps/CentOS7/install/r-3.4.3/lib64/R/lib/libRblas.so
LAPACK: /n/apps/CentOS7/install/r-3.4.3/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] devtools_1.13.5

loaded via a namespace (and not attached):
 [1] httr_1.3.1           compiler_3.4.3       R6_2.2.2            
 [4] BiocInstaller_1.28.0 tools_3.4.3          withr_2.1.2         
 [7] curl_3.2             memoise_1.1.0        knitr_1.20          
[10] git2r_0.21.0         digest_0.6.15

I also tried installing after upgrading to the latest version of GenomicScores and got the same error.
Anyone else have this problem?

splitGAlignmentsByCut 'mergeBam' error

Hey!

Enjoying the package so far, just having difficulties with this error, details below.

The error arises when I try to split using any more than one chromosome as input to splitGAlignmentsByCut;

files will be output into outPath

outPath <- "splited"
dir.create(outPath)
## shift the coordinates of 5'ends of alignments in the bam file
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
which <- as(seqinfo(Mmusculus), "GRanges")[1:2]
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE)
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE, flag = scanBamFlag(isSecondaryAlignment = FALSE, isUnmappedQuery = FALSE, isNotPassingQualityControls = FALSE, isSupplementaryAlignment = FALSE))
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
genome <- Mmusculus
objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath)

Error:

Error in value[[3L]](cond) : 
  'mergeBam' 'destination' exists, 'overwrite' is FALSE
  destination: splited/NucleosomeFree.bam

There are still split BAMs in the output directory, however they only contain the 2nd chromosome in this case.

If I include more chromosomes (1-X,Y), it seems to output only the last chromosome in the list (along with same error).

Just the first chromosome (i.e. which <- as(seqinfo(Mmusculus), "GRanges")[1] results in no error, but hoping for all chromosomes in output.

I could try manually setting the mergeBam to overwrite = TRUE, but not sure this is the correct approach.

Optimizing Time and Memory Usage for Large BAM Files

Hi,
I've been trying to use ATACseqQC for quality control of some plant BAM files (exceeding 100GB in size). However, I encountered difficulties when attempting to run it, even after allocating a large amount of memory in our HPC system. I am looking for ways to optimize both time and memory usage, especially when dealing with significantly large BAM files. I couldn't find any option, for example, for the shiftGAlignmentsList function.

Many thanks in advance for your assistance.

phastCons100way.UCSC.hg19 equivalent for mm10 ?

Hi,

I work with mouse genome (mm10) and i don't find any phastCons100way.UCSC.hg19 equivalent for mm10 to do the "Split reads" part of bioconductor tutorial.

Sorry for my bad english.

Thanks in advance for your help.

no "estimateLibComplexity"

Hi,

I have tried the command for estimating library complexity, but it doesn't work.
The error is 'Error in estimateLibComplexity(readsDupFreq(bamfile)) :'
'no "estimateLibComplexity" function'. Anyone knows the solution?

Issue: loading bam file using readBamFile uses huge amount of memory causing crashes

Hi, I've been trying to get this QC package to work as part of an ATAC-seq pipeline I've been working on.
I have followed the instructions from the ATACseqQC vignette on Bioconductor, and I've been able to get plots, etc. working for individual chromosomes (I am working with Zebrafish danRer10).

However, trying to perform QC for a whole sample (single bam file) for all chromosomes 1-25 results in very heavy memory use (I've run readBamFile on a ~9 gb bam input file, and about 30 minutes later my system is reporting that 38/64gb or 80.5% of available memory is used!!)

Is this intended behaviour? Because I've tried running my pipeline that uses part of ATACseqQC, only to have it crash at the last step because the R process got killed for memory overconsumption. The system isn't exactly tiny with 64 gb RAM, yet it gets close to crashing for just one sample.

I am able to cause this on my system using the following standalone code:

fragSize <- fragSizeDist(bamfile, 'test')
library(BSgenome.Drerio.UCSC.danRer10)
seqlev <- paste0('chr', 1:25)
tags <- c("AS", "XN", "XM", "XO", "XG", "NM", "MD", "YS", "YT")
which <- as(seqinfo(Drerio)[seqlev], 'GRanges')
gal <- readBamFile(bamfile, tag = tags, which = which, asMates = T)

My bam file is properly indexed and has a size of 9524 Mb.

I am using version 1.2 of ATACseqQC on R 3.4.3. Please let me know if this has been addressed in the latest version of ATACseqQC.

Thanks
Stephen

Error: subscript contains invalid names for splitGAlignmentsByCut() function

Hello ATACseqQC,
I'm now running ATACseqQC on mouse data. All steps work fine until splitGAlignmentsByCut(), i got error Error: subscript contains invalid names. I check several times but find no spelling errors for the arguments of splitGAlignmentsByCut(). By the way, these codes work fine for running the official tutorial. Could you please help me to solve this error?
Thanks!
Best,
YJ

library(ATACseqQC)
library(ChIPpeakAnno)
library(MotifDb)
library(GenomicAlignments)
library(Rsamtools)
library(GenomicScores)
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
mm10_gscore <- getGScores("phastCons60way.UCSC.mm10")

possibleTag <- list("integer"=c("AM", "AS", "CM", "CP", "FI", "H0", "H1", "H2", "HI", "IH", "MQ", "NH", "NM", "OP", "PQ", "SM", "TC", "UQ"), "character"=c("BC", "BQ", "BZ", "CB", "CC", "CO", "CQ", "CR", "CS", "CT", "CY", "E2", "FS", "LB", "MC", "MD", "MI", "OA", "OC", "OQ", "OX", "PG", "PT", "PU", "Q2", "QT", "QX", "R2", "RG", "RX", "SA", "TS", "U2"))
bamTop100 <- scanBam(BamFile(file="D:/HYJ/postQC_sorted_Cracd_KO1.bam", yieldSize = 100), param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)>0]
seqinformation <- seqinfo(TxDb.Mmusculus.UCSC.mm10.knownGene)
which <- as(seqinformation, "GRanges")
gal <- readBamFile(bamFile="D:/HYJ/postQC_sorted_Cracd_KO1.bam", which=which, tag=tags, asMates=TRUE, bigFile=TRUE)
gal1 <- shiftGAlignmentsList(gal, outbam="D:/HYJ/shifted_postQC_sorted_Cracd_KO1.bam")

txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
genome <- Mmusculus

objs <- splitGAlignmentsByCut(obj=gal1, txs=txs, genome=genome, conservation=mm10_gscore, outPath="D:/HYJ/splited_Cracd_KO1/")
Error: subscript contains invalid names

What does the x axis mean for Nucleosome Free Regions (NFR) score plot?

Hello ATACseqQC,
Thanks for developing this amazing package.
Could you please tell me what's the meaning of the x axis of Nucleosome Free Regions (NFR) score plot? It says 'the average signals of 400 bp window as X-axis', but the title of NFR score plot shows 'NFRscore for 200bp flanking TSSs'.

Thanks!
Best,
YJ

splitGAlignmentsByCut Error in value[[3L]](cond) : 'mergeBam' 'destination' exists, 'overwrite' is FALSE destination: splitBam/NucleosomeFree.bam

Hi,
I know this is an old error, but I have the same problem. I'm using ATACseqQC 1.8.5 and R 3.6.1, and I get the same error when I run more than one chromosome. It works with no problems for one chromosome but not for multiple chromosomes. Any help is appreciated.
the error message
Error in value[3L] :
'mergeBam' 'destination' exists, 'overwrite' is FALSE
destination: splitBam/NucleosomeFree.bam

My code:

load the library

rm(list=ls())
library(ChIPpeakAnno)
library(ATACseqQC)
library(GenomicRanges)
library(EnsDb.Hsapiens.v75)
library(AnnotationDbi)
library(Rsamtools)
library(BSgenome.TroutArrlyShort.NCBI.PRJNA623027)

setwd("/localstorage/ATACseq/result/")
trout_txdb <-makeTxDbFromGFF("/localstorage/TroutNewGenome/OmyArlee_1_1/OmyArrlyShort/GCF_013265735.2_USDA_OmykA_1.1_genomic_Short.gff")
saveDb(trout_txdb, file="Trout.sqlite")
trout_Annota <- loadDb("Trout.sqlite")
txs <- transcripts(trout_txdb)

read bam file

bamfile <- ("/localstorage/ATACseq/fastq_Adapter_cut/alignments_ARl/F1B_Arlsorted_RMmitDup_short.bam")
bamfile.labels <- gsub(".bam", "", basename(bamfile))
#bam file status
#bamQC(bamfileT, outPath = NULL)

generate fragement size distribution

fragSize <- fragSizeDist(bamfile, bamfile.labels)

#bamfile tags to be read in
possibleTag <- combn(LETTERS, 2)
possibleTag <- c(paste0(possibleTag[1, ], possibleTag[2, ]),
paste0(possibleTag[2, ], possibleTag[1, ]))

bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100),
param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)>0]
tags
tags <- tags[tags!="PG"]

outPath <- "splitBam"
if (dir.exists(outPath))
{
unlink(outPath, recursive = TRUE, force = TRUE)
}
dir.create(outPath)

seqlev <-(c( "NC_048565.1","NC_048566.1"))

which <- as(seqinfo(RainbowTroutArrly)[seqlev], "GRanges")
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE)

shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=RainbowTroutArrly, outPath=outPath )
dir(outPath)

Problem with shiftGAlignmentsList()

Thanks for your package! I'm going to generate a TSSE score plot for an bulk ATAC-seq data and I use code below:

possibleTag <- list("integer"=c("AM", "AS", "CM", "CP", "FI", "H0", "H1", "H2",
"HI", "IH", "MQ", "NH", "NM", "OP", "PQ", "SM",
"TC", "UQ"),
"character"=c("BC", "BQ", "BZ", "CB", "CC", "CO", "CQ", "CR",
"CS", "CT", "CY", "E2", "FS", "LB", "MC", "MD",
"MI", "OA", "OC", "OQ", "OX", "PG", "PT", "PU",
"Q2", "QT", "QX", "R2", "RG", "RX", "SA", "TS",
"U2"))
library(Rsamtools)
bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100),
param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)>0]

outPath <- "splited"
dir.create(outPath)

library(BSgenome.Hsapiens.UCSC.hg19)
seqlev <- "chr1" ## subsample data for quick run
which <- as(seqinfo(Hsapiens)[seqlev], "GRanges")
library(ATACseqQC)
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE)
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)

It runs smoothly using data in the package, but when I use my bam file, the last command runs more than 10 hours and gives a warning:
> gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
[bam_sort_core] merging from 68 files and 1 in-memory blocks...

It seems that gal1 object is created but the shifted.bam is not. So I failed to do downstream analysis. Is there anything wrong with my steps? or any suggestions for running this command? Thanks!

Jingyu

adjusting readBamFile output for plotCorrelation

Hi,
(in ATACseqQC 1.12.0)
the example of using plotCorrelation that is written in the ATACseqQC guide works out of the box only when asMates and bigFile parameters are FALSE in readBamFile. Otherwise output format of readBamFile does not cooperate with plotCorrelation, is there a way around to modify output of readBamFile with asMates/bigFile being TRUE so that plotCorrelation can be used? And more importantly, does it make sense to use plotCorrelation in such cases?
Thanks for any help!

Interpreting the outputs of the function TSSEscore() comparing to the reference from ENCODE

Hello,

First of all, thank you for developing ATACseqQC.

I am having troubles interpreting the outputs of the function TSSEscore(). I am using the TSSE score definition used in ENCODE standards and then compare the score to the reference values used in this table in the Current Standards section form the ENCODE ATAC-seq guideline.

Fist of all I noticed the way they compute TSSE score is different than the one that is explained in the TSSEscore() documentation:

ENCODE definition: The TSS enrichment calculation is a signal to noise calculation. The reads around a reference set of TSSs are collected to form an aggregate distribution of reads centered on the TSSs and extending to 2000bp in either direction (for a total of 4000bp). This distribution is then normalized by taking the average read depth in the 100 bps at each of the end flanks of the distribution (for a total of 200bp of averaged data) and calculating a fold change at each position over that average read depth. This means that the flanks should start at 1, and if there is high read signal at transcription start sites (highly open regions of the genome) there should be an increase in signal up to a peak in the middle. We take the signal value at the center of the distribution after this normalization as our TSS enrichment metric.
TSSEscore() uses as default 1000bp upstream and downstream of the TSS.

I want to compare to the reference values indicated in the TSSE table from ENCODE. To do so I used the function as the following:

txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
tsse <- TSSEscore(gal1, txs, upstream = 2000, downstream = 2000)
tsse$TSSEscore

I got this output:

> tsse$TSSEscore
[1] 11.21616

Is this the right way to go? Also, are the values in the table from ENCODE referring to this number?

Thank you,
Sergio

shiftGAlignmentsList after samtools fixmate error

Hello-

I used samtools (see below) to sort some bam files (paired end reads) and ran ATACseqQC code and ran shiftGAlignmentsList and got an error (see below):
#The sort according to name
samtools sort -n sorted-chrM.bam -o nchrM.bam
#Add ms and MC tags for markdup to use later
samtools fixmate -rm nChrM.bam -o fixmate.bam
#Markdup needs position order
samtools sort -o fixmate.bam fixmatePos.bam
#Finally mark duplicates
samtools markdup -r fixmatePos.bam sorted-chrM-markup.bam
samtools index -@ 1 sorted-chrM-markup.bam

This error comes on:

Only single end reads
Error in names(this.mpos) <- paste(mcols(attempt to set an attribute on NULL))

Any chance you can tell me if the fixmate is causing this error?

Remove interactive() in readBamFile.R

ATACseqQC/R/readBamFile.R

Line 47 in 2996d20

if(file.size(bamFile)>1e8 && !bigFile && interactive()){

The interactive() is anti-human and redundant. This is in particular annoying in Rstudio server. I highly suggest removing this condition.

shiftGAlignmentsList Error

When I try to offset the read starting position using shiftGAlignmentsList, I get the following error:

all(elementNROWS(gal) < 3) is not TRUE
Calls: shiftGAlignmentsList -> stopifnot
Execution halted

It might be relevant that I aligned paired end reads using bwa and then sorted the file to get the BAM file.

I'm not sure what I should change in the BAM processing to get algorithm working and would appreciate any help.

Thank you!

How the cut-site probability is calculated?

Hi, I am reading https://bioconductor.org/packages/release/bioc/vignettes/ATACseqQC/inst/doc/ATACseqQC.html and for the TF footprint figure, how the y-axis cut-site probability is calculated?

Is it the average read counts for each base pair (100 bp flanking the motif) and then the average count of each base pair across all the motif occurrence was calculated? How this value is normalized?

Thanks,
Ming

Error on running splitBam

While running the command
objs <- splitBam(bamfile, tags=tags, outPath=outPath,
txs=txs, genome=genome,
conservation=phastCons100way.UCSC.hg38)
I get the following error-
Error in asMethod(object) :
cannot coerce a non-logical 'Rle' or a logical 'Rle' with NAs to an IRanges object

inconsistencies between the package code and user guide

Hi,

In the guide's section 2.4.4 (Transcription Start Site (TSS) Enrichment Score): function TSSEscore returns $TSSEscore and $values, however, today's version of the function does not return them.
In the guide's section 2.5 (plot footprints): function featureAlignedHeatmap is used, while there is not such function available in the today's version of the package.

Anyway, thanks for providing the package!

plot correlation is not symmetric

Hi Jianhong,
I used plotCorrelation to find out that which of the bam files is highly different from others but surprisingly the out put is not symmetric. I mean as the picture below shows for instance for input6-input5 there is two different colors (one of them is darker). can you please tell me what is wrong?
thanks
Noori

TF footprint figure

Hi. I created TF footprint figure according to ATACseqQC Guide (https://bioconductor.org/packages/release/bioc/vignettes/ATACseqQC/inst/doc/ATACseqQC.html).

5 red dashed lines seems to be averages of y-axis.
How are the ranges of x-axis of these 5 red dashed lines set?

Thanks,
Kenji

Output bam files become smaller after shiftGAlignmentsList

Hi,
Thanks for developing this user-friendly tool for ATAC-seq. I found some problems after running shiftGAlignmentsList, and the output BAM file is about half the size of the input file. I don't know if this is a normal situation or if ATACseqQC does some filtering.

My code :

library(ATACseqQC)
bamfile <- '/home/zfzf/atac_ljh/alignment/neg2_701504_sorted.bam'
bamfile.labels <- gsub(".bam", "", basename(bamfile))
#bamQC(bamfile, outPath=NULL)
estimateLibComplexity(readsDupFreq(bamfile))
## generate fragement size distribution
fragSize <- fragSizeDist(bamfile, bamfile.labels)


## bamfile tags to be read in
possibleTag <- list("integer"=c("AM", "AS", "CM", "CP", "FI", "H0", "H1", "H2", 
                                "HI", "IH", "MQ", "NH", "NM", "OP", "PQ", "SM",
                                "TC", "UQ"), 
                    "character"=c("BC", "BQ", "BZ", "CB", "CC", "CO", "CQ", "CR",
                                  "CS", "CT", "CY", "E2", "FS", "LB", "MC", "MD",
                                  "MI", "OA", "OC", "OQ", "OX", "PG", "PT", "PU",
                                  "Q2", "QT", "QX", "R2", "RG", "RX", "SA", "TS",
                                  "U2"))

library(Rsamtools)

bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100),
                     param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)>0]
tags

## files will be output into outPath
outPath <- "align_adjusted"
dir.create(outPath)

## shift the coordinates of 5'ends of alignments in the bam file
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
gal <- readBamFile(bamfile, tag=tags, asMates=TRUE, bigFile=TRUE)
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)

error when install ATACseqQC

Dear,

When I install ATACseqQC, the following error occurred:
RandPSSMGen.cpp:32:31: fatal error: gsl/gsl_histogram.h: No such file or directory
#include <gsl/gsl_histogram.h>
The reason is that I have install gsl (GNU scientific library) at a non-standard position. I am not an root user, So I couldn't install it in the default path /usr/local/include/gsl. I have added the path of gsl to my environmental variables. However, I still can't install ATACseqQC. May be I need to change the header of ATACseqQC source file. Could you help me?

Best regards

splitBam return a asBam error

Hi,
I have a problem with just one bam file for splitBam step.
My script do splitBam step for each chromosome of each bamfiles in bamfile list.

bamfile <-  filenames <- Sys.glob(paths = "...")
                       
wd="..."

indexBam(bamfile)


outPath <- "splited"
dir.create(outPath)

txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
genome=Mmusculus
seqlev <- c("chr1","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chrX","chrY")
which <- as(seqinfo(Mmusculus)[seqlev], "GRanges")


possibleTag <- combn(LETTERS, 2)
possibleTag <- c(paste0(possibleTag[1, ], possibleTag[2, ]),
                 paste0(possibleTag[2, ], possibleTag[1, ]))

possibleTag=possibleTag[possibleTag!="PG"]


for (i in bamfile) {
  

bamTop100 <- scanBam(BamFile(i, yieldSize = 100),
                     param = ScanBamParam(tag=possibleTag))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)==100]


bamfile.labels <- gsub(".bam", "", basename(i))

outPathsample=paste(outPath,bamfile.labels,sep = "//")

dir.create(outPathsample)



readBam <- readBamFile(i, which=which, asMates=TRUE, bigFile=TRUE)


for (chr in seqlev){


chrdir=paste(outPathsample,chr,sep="//")
  
dir.create(chrdir)
print(chr)
objs <- splitBam(i, outPath= chrdir,tags =tags,txs=txs,index = i,seqlev=chr, genome=genome)

}
}

splitBam renvoie cette erreur pourtant le fichier bam de base n'est pas tronquée.

Error in value[[3L]](cond) : 
  'asBam' truncated input file at record 3933292
  SAM file: 'C:\Users\pollep02\AppData\Local\Temp\RtmpeU0Spu\file5dc8594e76c0.sam'

My file is too big to be send with the issue.
Thanks in advance.

How to normalize cut-site probability between samples

Hello,

I want to compare cut-site probability between samples.
However, I found that the backgrounds of cut-site probabilities of each sample were linearly correlated with mapped read counts of each sample.
Background was calculated as mean cut-site probability sufficiently far from motif (+5kb ~ +10kb).
Do you have any ideas to normalize cut-site probability between samples?

Thanks in advance,

jianhong / atacseqqc Goto Github PK

atacseqqc's Introduction

ATACseqQC

Installation

Documentation

atacseqqc's People

Contributors

Stargazers

Watchers

Forkers

atacseqqc's Issues

files will be output into outPath

shift the coordinates of 5'ends of alignments in the bam file

bamfile tags to be read in

shift the coordinates of 5'ends of alignments in the bam file

Loading Input files / directory

bamfile tags to be read in

files will be output into outPath

load the library

read bam file

generate fragement size distribution

Recommend Projects

Recommend Topics

Recommend Org