r3fang / snapatac

Analysis Pipeline for Single Cell ATAC-seq

License: GNU General Public License v3.0

R 100.00%

single-cell-atac-seq single-cell-analysis bioinformatics-pipeline epigenetics machine-learning-algorithms sequencing

snapatac's Introduction

SnapATAC (Latest Updates: 2019-09-19)

SnapATAC (Single Nucleus Analysis Pipeline for ATAC-seq) is a fast, accurate and comprehensive method for analyzing single cell ATAC-seq datasets.

The latest version, SnapATAC 2.0, is available at: https://github.com/kaizhang/SnapATAC2

Requirements

Linux/Unix
Python (>= 2.7 & < 3.0) (SnapTools) (2.7 is highly recommended);
R (>= 3.4.0 & < 3.6.0) (SnapATAC) (3.6 does not work with the rhdf5 package);
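
The version ranges above can be checked before installing anything. The following is a minimal sketch, not part of SnapATAC: `ver_lt`, `in_range` and `report` are hypothetical helper names, and it assumes the interpreters are reachable on PATH as `python` and `R`.

```shell
#!/bin/sh
# ver_lt A B: succeed when dotted version A is strictly lower than B
# (compares up to three numeric fields; missing fields count as 0).
ver_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if ((x[i] + 0) < (y[i] + 0)) exit 0
            if ((x[i] + 0) > (y[i] + 0)) exit 1
        }
        exit 1   # equal versions are not "lower"
    }'
}

# in_range V MIN MAX: succeed when MIN <= V < MAX.
in_range() { ! ver_lt "$1" "$2" && ver_lt "$1" "$3"; }

# report NAME VERSION MIN MAX: print whether VERSION falls in the listed range.
report() {
    if in_range "$2" "$3" "$4"; then
        echo "$1 $2: OK"
    else
        echo "$1 $2: outside the supported range (>= $3, < $4)"
    fi
}

# Only query interpreters that are actually on PATH.
if command -v python >/dev/null 2>&1; then
    report Python "$(python --version 2>&1 | awk '{print $2}')" 2.7 3.0
fi
if command -v R >/dev/null 2>&1; then
    report R "$(R --version | awk 'NR == 1 {print $3}')" 3.4.0 3.6.0
fi
```

The upper bounds are exclusive, matching the "< 3.0" and "< 3.6.0" limits listed above.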

Pre-print

Rongxin Fang, Sebastian Preissl, Xiaomeng Hou, Jacinta Lucero, Xinxin Wang, Amir Motamedi, Andrew K. Shiau, Eran A. Mukamel, Yanxiao Zhang, M. Margarita Behrens, Joseph Ecker, Bing Ren. Fast and Accurate Clustering of Single Cell Epigenomes Reveals Cis-Regulatory Elements in Rare Cell Types. bioRxiv 615179; doi: https://doi.org/10.1101/615179

Installation

SnapATAC has two components: SnapTools and SnapATAC.

SnapTools - a Python module for pre-processing and working with snap files.
SnapATAC - an R package for clustering, annotation, motif discovery and downstream analysis.

Install snaptools from PyPI. See the FAQs for how to install snaptools. NOTE: Please use Python 2.7 if possible.

$ pip install snaptools

Install the SnapATAC R package (development version).

$ R
> library(devtools)
> install_github("r3fang/SnapATAC")

snapatac's Issues

Converting pmat to Cicero CDS

Hello, I'm sorry to bother you again. I'm having some trouble exporting the GRanges object of my peak matrix (pmat) from my snap object to a format usable in Cicero.

From their documentation, they want a 3-column file in the following format.
chr1_start_end | cellBarcode | readsInPeak

I know that this data is available in the object, but I'm not sure how to compile it all correctly while maintaining the cell barcodes. If you could help me in retrieving this information that would be very helpful. Additionally, I think this would be good to include in some of your examples as this may help in integrating other tools with yours.

Thank you!

Is snaptools supported on Windows?

I tried to install snaptools on Windows, but the installation failed.
Did it fail because snaptools does not support Windows?
This is the error output:
C:\Users\wangshiyou>pip install snaptools
Collecting snaptools
Using cached https://files.pythonhosted.org/packages/09/31/2e1e6283d860efd3c5fe7ac0e43644a916041e5fba5a5afa97e6a1ec9741/snaptools-1.4.7.tar.gz
Collecting pysam (from snaptools)
Using cached https://files.pythonhosted.org/packages/15/f6/ce0611aaa1865a616f7dc164fbf046eaf38f2b17c6d404403c56250beb93/pysam-0.15.2.tar.gz
Complete output from command python setup.py egg_info:
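'.' is not recognized as an internal or external command,
operable program or batch file.
'.' is not recognized as an internal or external command,
operable program or batch file.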
make: ./version.sh: Command not found
make: ./version.sh: Command not found
make: uname: Command not found
Makefile:228: Extraneous text after `else' directive
Makefile:231: Extraneous text after `else' directive
Makefile:231: *** only one `else' per conditional. Stop.
# pysam: no cython available - using pre-compiled C
# pysam: htslib mode is shared
# pysam: HTSLIB_CONFIGURE_OPTIONS=None
# pysam: htslib configure options: None
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\WANGSH1\AppData\Local\Temp\pip-install-kf_meyu3\pysam\setup.py", line 223, in
htslib_make_options = run_make_print_config()
File "C:\Users\WANGSH1\AppData\Local\Temp\pip-install-kf_meyu3\pysam\setup.py", line 69, in run_make_print_config
stdout = subprocess.check_output(["make", "-s", "print-config"])
File "E:\anaconda\envs\python2\lib\subprocess.py", line 395, in check_output
**kwargs).stdout
File "E:\anaconda\envs\python2\lib\subprocess.py", line 487, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['make', '-s', 'print-config']' returned non-zero exit status 2.

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in C:\Users\WANGSH~1\AppData\Local\Temp\pip-install-kf_meyu3\pysam

Bmat required before Pmat runJDA?

Hi,

I was having trouble completing the runJDA step on a pmat, and kept on getting this error:

Error in runJDA(obj = x.sp, input.mat = "pmat", bin.cov.zscore.lower = -2, :
input matrix contains empty rows, remove empty rows first
Calls: system.time -> runJDA
In addition: Warning message:
In max(numeric(0), ..., na.rm = na.rm) :

Then I realized that I must run the addBmatToSnap step before addPmatToSnap for runJDA to work on pmat. This is not very intuitive to me, as I thought the two were completely independent matrices, so I don't see why Bmat would be required for Pmat to work. Is this a bug or intended? Here is the code that ultimately worked (commenting out the addBmatToSnap line breaks it):

library(Rtsne)
library(umap)
library(GenomicRanges)
library(SnapATAC)

x.sp = createSnap(file="mapped.snap",sample="mapped",num.cores=20)
summarySnap(x.sp)


x.sp = filterCells(obj=x.sp, subset.names=c("fragment.num", "UMI"),low.thresholds=c(1000,500),high.thresholds=c(Inf, Inf))
x.sp = addBmatToSnap(obj=x.sp, bin.size=10000, num.cores=40) #bin size of genome --250, 1kb, 5kb, 10kb, or 20kb -- 10kb worked well
x.sp = addPmatToSnap(obj=x.sp, do.par=T, num.cores=40) #peak matrix

x.sp = makeBinary(x.sp, mat="pmat")

idy1 = grep("mitochondria|chloroplast", x.sp@feature)
x.sp = x.sp[,-idy1, mat="pmat"]
summarySnap(x.sp)
length(idy1)

system.time({x.jda.sp = runJDA(obj=x.sp, input.mat="pmat", bin.cov.zscore.lower=-2, bin.cov.zscore.upper=2, pc.num=50, norm.method="normOVE", max.var=5000,
		do.par=TRUE, ncell.chunk=1000, num.cores=20, seed.use=10, tmp.folder=tempdir()) })

peak_bc_matrix_mex_matrix.mtx file from 10X?

Hello,
I recently ran a single-cell ATAC-seq experiment, and the facility gave me back a BAM file, a BED file and a matrix called peak_bc_matrix...mtx. Can I start the analysis from the matrix? If so, how?
Thank you!
E.

runJaccard error

Hi Rongxin,

I am running the snATAC workflow following your tutorial (PBMC) and I got this error while running runJaccard.
Could you help with that?

Thank you very much!
Paola

x.sp = runJaccard(
x.sp,
tmp.folder=tempdir(),
mat = "bmat",
max.var=2000,
ncell.chunk=1000,
seed.use=10,
num.cores=5
)

Epoch: splitting obj into chunks ...
Epoch: scheduling CPUs ...
Epoch: calculating jaccard index for each chunk ...
Error in isOpen(con): invalid connection
Traceback:

$ snaptools snap-pre

Hello,
I'm trying to run this in my terminal. I've changed to the right directory and filled in the file name.
I'm using the human genome, so I set --genome-name=GRCh38,
then genome-size=GRCh38 .chrom.sizes\ and kept everything else the same.
$ snaptools snap-pre
--input-file=demo.bam
--output-snap=demo.snap
--genome-name=GRCh38
--genome-size=GRCh38.chrom.sizes
--min-mapq=30
--min-flen=0
--max-flen=1000
--keep-chrm=TRUE
--keep-single=TRUE
--keep-secondary=False
--overwrite=True
--min-cov=100
--verbose=True

I get this error: snaptools snap-pre: error: the following arguments are required: --genome-size,
How do I fix that?
Thank you
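
One way to narrow this down, offered as a guess rather than a diagnosis: argument errors like this often come from how the shell splits a multi-line command, for example a missing trailing backslash or a space inside a value. The sketch below does not call snaptools at all. `show_args` is a hypothetical helper that prints what the shell would actually pass, using the flag names from the post.

```shell
# show_args (hypothetical helper): print every argument on its own bracketed line,
# so stray spaces and broken line continuations become visible.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Each line except the last ends with a backslash as its very last character,
# so the shell joins them into a single command.
show_args snap-pre \
    --input-file=demo.bam \
    --output-snap=demo.snap \
    --genome-name=GRCh38 \
    --genome-size=GRCh38.chrom.sizes

# A space inside a value splits it into two separate arguments:
show_args --genome-size=GRCh38 .chrom.sizes
```

If the bracketed output shows a flag split in two, or the command stopping after the first line, the same would happen with the real snaptools call.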

snaptools dex-fastq input argument 10x barcodes

Hi,
I have a question about the --index-fastq-list argument of the dex-fastq command: is this input a file with cell indices, like the 10x barcode file for 10x FASTQ files?

I wanted to make sure it is actually the R2 file rather than the I1 file; the tutorial here says I1 (https://github.com/r3fang/SnapATAC/wiki/FAQs#CEMBA_snap).

Best regards
Sasi

error in bedGraphToBigWig

Hi Rongxin,
After installing bedGraphToBigWig, I found that I can't run it because of a missing shared library.

$bedGraphToBigWig
bedGraphToBigWig: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

Could you please give me some suggestions?

Regards!
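
The message means the dynamic loader cannot find the OpenSSL library (libssl.so.1.0.0) that this build of bedGraphToBigWig was linked against. A minimal sketch for listing every unresolved library, assuming a Linux system with `ldd` available; `filter_missing` is a hypothetical helper name:

```shell
# filter_missing: read `ldd` output on stdin and print the names of libraries
# that the dynamic loader could not resolve.
filter_missing() {
    awk '/not found/ {print $1}'
}

# Run the check only when both tools are available on this machine.
if command -v ldd >/dev/null 2>&1 && command -v bedGraphToBigWig >/dev/null 2>&1; then
    ldd "$(command -v bedGraphToBigWig)" | filter_missing
fi
```

Each name printed is a library that has to be provided, either by the system or by using a build of the tool that matches the libraries already installed.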

Human vs. Mice Step 3. Fragments-in-promoter ratio

Hello,
If I'm using the human genome, will the promoter file provided in step 3 work, or do I need a human promoter.bed file?
Also, is this step necessary? Can I move on to step 4 without going through step 3?
Thank you

error in generating snap file from bed file

Hi Rongxin,

I used your pipeline on 10X data and it performed very well.

But when I try to use it on a BED file, I get this error:
GSE96772_human_HSC.bed is not a sorted by read name!

My command is:
snaptools snap-pre \
--input-file=GSE96772_human_HSC/GSE96772_human_HSC.bed \
--output-snap=GSE96772_human_HSC/GSE96772_human_HSC.snap \
--genome-name=hg38 \
--genome-size=./hg38.chrom.sizes \
--min-mapq=30 \
--min-flen=0 \
--max-flen=1000 \
--keep-chrm=TRUE \
--keep-single=TRUE \
--keep-secondary=False \
--overwrite=True \
--min-cov=100 \
--verbose=True

My bed file is like this:

I sorted this BED file with 'sort -k1,1 -k2,2n -k4,4'.

My BED file is merged from many small BED files, where each BED file represents one cell. The data were generated by sciATAC.

Could you please tell me what this error means?
Could you please help me figure it out?

Thank you very much!
I am looking forward to seeing the results from my BED file with this great pipeline.

Regards,
Xin
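
A possible reading of the error, based only on its wording and not on the snaptools source: the input is expected to be grouped by read name (the fourth column here), whereas the sort in the post uses the coordinates as primary keys. The sketch below contrasts the two orderings on a tiny made-up file; the file names and the cellA/cellB names are illustrative.

```shell
TAB=$(printf '\t')
tmpdir=$(mktemp -d)

# Build a tiny four-column BED file (chrom, start, end, read/cell name).
printf 'chr1\t100\t200\tcellB\nchr1\t50\t150\tcellA\nchr2\t10\t90\tcellB\nchr1\t300\t400\tcellA\n' \
    > "$tmpdir/demo.bed"

# Coordinate-first sort (as in the post): names end up interleaved.
LC_ALL=C sort -t "$TAB" -k1,1 -k2,2n -k4,4 "$tmpdir/demo.bed" > "$tmpdir/by_coord.bed"

# Name-first sort: all records of one cell become contiguous.
LC_ALL=C sort -t "$TAB" -k4,4 -k1,1 -k2,2n "$tmpdir/demo.bed" > "$tmpdir/by_name.bed"

# Verify that column 4 never decreases from one line to the next.
awk -F '\t' 'NR > 1 && $4 < prev {bad = 1} {prev = $4} END {exit bad}' "$tmpdir/by_name.bed" \
    && echo "by_name.bed is grouped by name"
```

Running the same awk check on by_coord.bed fails, which is the situation the error message appears to describe.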

How to extract matrices made by "snap-pre"?

Hi, I'm thinking of feeding the cell-by-bin matrices into other scATAC analysis pipelines, such as pseudo-trajectory analysis.

To do so, I tried to extract the cell-by-bin matrix (bmat) from the mos object loaded with data(mos).
I confirmed with attributes() that mos has a "bmat" attribute; however, I couldn't get bmat with mos$bmat.
Error in mos$bmat : $ operator not defined for this S4 class

Could you show me how to extract bmat from a "snap" object?

Sample integration - advice

Hi there,

I have four 10X scATACseq libraries (from four independent tissue donors) and have run them through SnapATAC pipeline (same as in the 10X PBMC tutorial).

However there is a clear batch/sample-specific effect when I look at the UMAP visualisation (see below). I have tried both normOVE and normOVN at the runNormJaccard step. I have also performed scRNA-seq on the exact same samples, and when I normalise/integrate those matching datasets I have not seen any major sample-specific effects.

Do you have any advice on improving or troubleshooting this?

Thanks,

Hamish

Combine 10X scATAC FASTQs before dex-fastq ?

Hi there,
I just wanted to know if there is any problem in combining FASTQ files from multiple sequencing lanes from the same 10X scATAC library before the dex-fastq command, rather than after as shown in the tutorial. I have data from two NextSeq runs so each library has eight lanes total, and I have four samples. Seems much more convenient to therefore combine FASTQs together before the dex-fastq but I just wanted to check that this wouldn't create any problems later in the analysis pipeline. FASTQ files are made with standard cellranger-atac mkfastq. This is what I had in mind.

cat FASTQ/GC-HK-8248/outs/fastq_path/H3J5TBGXB/BCP003/*R1* \
FASTQ/GC-HK-8293/outs/fastq_path/H7JLKBGXB/BCP003/*R1* > BCP003_scATAC_R1_fastq.gz
cat FASTQ/GC-HK-8248/outs/fastq_path/H3J5TBGXB/BCP003/*I1* \
FASTQ/GC-HK-8293/outs/fastq_path/H7JLKBGXB/BCP003/*I1* > BCP003_scATAC_I1_fastq.gz
cat FASTQ/GC-HK-8248/outs/fastq_path/H3J5TBGXB/BCP003/*R3* \
FASTQ/GC-HK-8293/outs/fastq_path/H7JLKBGXB/BCP003/*R3* > BCP003_scATAC_R3_fastq.gz

snaptools dex-fastq --input=BCP003_scATAC_R1_fastq.gz \
--output=BCP003_scATAC_R1_fastq.dex.gz \
--index-fastq-list=BCP003_scATAC_I1_fastq.gz

snaptools dex-fastq --input=BCP003_scATAC_R3_fastq.gz \
--output=BCP003_scATAC_R3_fastq.dex.gz \
--index-fastq-list=BCP003_scATAC_I1_fastq.gz

Thanks in advance

Hamish
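
Concatenated gzip files form a valid multi-member gzip stream, and the shell expands each glob in sorted order, so the R1, I1 and R3 commands see the lanes in the same order as long as the file names differ only in the read tag. A cheap sanity check after combining is to confirm that the three outputs hold the same number of records. A minimal sketch with made-up stand-in files; `count_records` is a hypothetical helper:

```shell
tmpdir=$(mktemp -d)

# Two tiny stand-in "lanes" (one 4-line FASTQ record each), compressed separately.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$tmpdir/lane1_R1.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$tmpdir/lane2_R1.fastq.gz"

# Concatenate the compressed files directly, as in the commands above.
cat "$tmpdir"/lane*_R1.fastq.gz > "$tmpdir/combined_R1.fastq.gz"

# count_records FILE: number of FASTQ records (lines / 4) in a gzipped file.
count_records() {
    gzip -dc "$1" | awk 'END {print NR / 4}'
}

count_records "$tmpdir/combined_R1.fastq.gz"
```

Running `count_records` on the combined R1, I1 and R3 files should print the same number three times; a mismatch would indicate that a lane is missing from one of them.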

runJDA with Knit Rmarkdown

hi Rongxin Fang,

While working with SnapATAC I'm using an R Markdown file (to make it easier to show results during work discussions).
I noticed, however, that the runJDA command breaks the knit process. My code runs fine, but knitting the document fails at the runJDA command with the following error:

Error in isIncomplete(con) : invalid connection
Calls: ... handle_condition -> handle_output -> -> isIncomplete
Quitting from lines 19-36 (file.Rmd)
Error in isOpen(con) : invalid connection
Calls: ... -> evaluate_call -> -> isOpen
Execution halted

I also tried to generate a knitted document from the demo.sp dataset, and the same problem occurs there. Rmd code used:

title: "SnapATAC_test"
author: "Jsmits"
date: "May 22, 2019"
output: html_document

library('SnapATAC')

data(demo.sp)
demo.sp = makeBinary(demo.sp)
demo.sp = runJDA(
obj=demo.sp, 
input.mat="bmat", 
 bin.cov.zscore.lower=-2,
 bin.cov.zscore.upper=2,
pc.num=50,
norm.method="normOVE",
tmp.folder=tempdir(),
max.var=2000,
do.par=TRUE,
ncell.chunk=1000,
num.cores=5,
seed.use=10)

Am I doing something wrong or is it a small bug somewhere?

Greetings Jos Smits

PS: here is my session info:

R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Gentoo/Linux

Matrix products: default
BLAS/LAPACK: /home/jsmits/anaconda3/envs/P3_SnapATAC/lib/R/lib/libRblas.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] SnapATAC_1.0.0 rhdf5_2.26.2 Matrix_1.2-17

loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 plyr_1.8.4 compiler_3.5.1 RColorBrewer_1.1-2 GenomeInfoDb_1.18.2 XVector_0.22.0
[7] bitops_1.0-6 iterators_1.0.10 tools_3.5.1 zlibbioc_1.28.0 digest_0.6.19 evaluate_0.13
[13] Rtsne_0.15 lattice_0.20-38 pkgconfig_2.0.2 doSNOW_1.0.16 foreach_1.4.4 igraph_1.2.4.1
[19] yaml_2.2.0 parallel_3.5.1 xfun_0.7 GenomeInfoDbData_1.2.0 raster_2.9-5 knitr_1.23
[25] S4Vectors_0.20.1 IRanges_2.16.0 stats4_3.5.1 locfit_1.5-9.1 grid_3.5.1 snow_0.4-3
[31] bigmemory_4.5.33 bigmemory.sri_0.1.3 rmarkdown_1.12 RANN_2.6.1 sp_1.3-1 irlba_2.3.3
[37] limma_3.38.3 Rhdf5lib_1.4.3 edgeR_3.24.3 magrittr_1.5 scales_1.0.0 codetools_0.2-16
[43] htmltools_0.3.6 BiocGenerics_0.28.0 GenomicRanges_1.34.0 colorspace_1.4-1 munsell_0.5.0 RCurl_1.95-4.12
[49] doParallel_1.0.14

GREAT Input

Hi Rongxin,

What is the most facile way to generate BED files for passing to GREAT?

Thanks!
J

Problem with runCluster and leiden

Hello Rongxin,

When I tried runCluster, the following error came up. I thought it might be because I installed leidenalg under Python 3 while the package is imported from Python 2. But after I set my default Python version to Python 3 and used import_from_path(), it still does not work. Do you have any idea what is happening here? Thank you for your time!

Best,
Sean

Error in H5Dread

Hi Rongxin,

Thanks for creating such a fantastic tool!

Unfortunately I'm running into some unexpected errors when analyzing a scATAC-seq dataset (starting from .bam files). I have completed Step 6 so far (following https://github.com/r3fang/SnapATAC/blob/master/examples/10X_P50/README.md) and got both the .snap file and the .snap.qc file.

But when I ran createSnap, this error occurred.

Epoch: reading the barcode session ...
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  : 
  HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  : 
  HDF5. Dataset. Read failed.
Error in data.frame(barcode, TN, UM, PP, UQ, CM): arguments imply differing number of rows: 0, 2034
Traceback:

1. createSnap(file = "Buenrostro2018.snap", sample = "Buenrostro2018", 
 .     do.par = TRUE, num.cores = 5)
2. createSnap.default(file = "Buenrostro2018.snap", sample = "Buenrostro2018", 
 .     do.par = TRUE, num.cores = 5)
3. mclapply(as.list(seq(fileList)), function(i) {
 .     createSnapSingle(file = fileList[[i]], sample = sampleList[[i]])
 . }, mc.cores = num.cores)
4. lapply(X = X, FUN = FUN, ...)
5. FUN(X[[i]], ...)
6. createSnapSingle(file = fileList[[i]], sample = sampleList[[i]])
7. readMetaData(file)
8. readMetaData.default(file)
9. data.frame(barcode, TN, UM, PP, UQ, CM)
10. stop(gettextf("arguments imply differing number of rows: %s", 
  .     paste(unique(nrows), collapse = ", ")), domain = NA)

I tried to research online but without success. I would truly appreciate it if you could help me out here.

I'm listing some info of .qc file and also output from Step 6 in case it might be helpful for you.
from .qc

Total number of unique barcodes:             2034
TN - Total number of fragments:              495280829
UM - Total number of uniquely mapped:        180290059
SE - Total number of single ends:            0
SA - Total number of secondary alignments:   0
PE - Total number of paired ends:            180290059
PP - Total number of proper paired:          180131064
PL - Total number of proper frag len:        180115491
US - Total number of usable fragments:       180115491
UQ - Total number of unique fragments:       42089297
CM - Total number of chrM fragments:         17396249

from Step6

===== reading the barcodes and bins ======
@AM     nBinSize:1
@AM     binSizeList: [5000]
@AM     binSize:5000    nBin:627478

Many thanks!

Best,
Huidong

error in clustering by `leiden`

Hi Rongxin,
When I clustered cells with leiden, I got a package import error:

> library(leiden)
> x.sp = runCluster(
+ obj=x.sp,
+ tmp.folder=tempdir(),
+ louvain.lib="leiden",
+ seed.use=10,
+ resolution=1
+ );
Epoch: checking input parameters
Epoch: finding clusters using leiden
Error in py_module_import(module, convert = convert) :
  ImportError: No module named leidenalg

Then I installed leidenalg successfully,

/zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/bin/pip3.7 install leidenalg numpy igraph
Collecting leidenalg
  Using cached https://files.pythonhosted.org/packages/b6/cc/d76baf78a3924ba6093a3ce8d14e2289f1d718bd3bcbb8252bb131d12daa/leidenalg-0.7.0.tar.gz
Requirement already satisfied: numpy in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (1.15.4)
Collecting igraph
  Downloading https://files.pythonhosted.org/packages/91/15/7c606c483a401dfdcdd19f2688c83585ee3b5ef401bd4e0e647660ef5b3f/igraph-0.1.11-py2.py3-none-any.whl (119kB)
     |████████████████████████████████| 122kB 1.9MB/s
Collecting python-igraph>=0.7.1.0 (from leidenalg)
  Downloading https://files.pythonhosted.org/packages/0f/a0/4e7134f803737aa6eebb4e5250565ace0e2599659e22be7f7eba520ff017/python-igraph-0.7.1.post6.tar.gz (377kB)
     |████████████████████████████████| 378kB 2.2MB/s
Requirement already satisfied: ipython in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from igraph) (7.2.0)
Requirement already satisfied: traitlets>=4.2 in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from ipython->igraph) (4.3.2)
Requirement already satisfied: setuptools>=18.5 in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from ipython->igraph) (40.6.3)
Requirement already satisfied: jedi>=0.10 in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from ipython->igraph) (0.13.2)
Requirement already satisfied: pickleshare in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from ipython->igraph) (0.7.5)
Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from ipython->igraph) (2.0.7)
Requirement already satisfied: pexpect; sys_platform != "win32" in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from ipython->igraph) (4.6.0)
Requirement already satisfied: backcall in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from ipython->igraph) (0.1.0)
Requirement already satisfied: pygments in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from ipython->igraph) (2.3.1)
Requirement already satisfied: decorator in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from ipython->igraph) (4.3.0)
Requirement already satisfied: six in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from traitlets>=4.2->ipython->igraph) (1.12.0)
Requirement already satisfied: ipython-genutils in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from traitlets>=4.2->ipython->igraph) (0.2.0)
Requirement already satisfied: parso>=0.3.0 in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from jedi>=0.10->ipython->igraph) (0.3.1)
Requirement already satisfied: wcwidth in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython->igraph) (0.1.7)
Requirement already satisfied: ptyprocess>=0.5 in /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/lib/python3.7/site-packages (from pexpect; sys_platform != "win32"->ipython->igraph) (0.6.0)
Building wheels for collected packages: leidenalg, python-igraph
  Building wheel for leidenalg (setup.py) ... done
  Stored in directory: /home/wangshiyou/.cache/pip/wheels/29/55/48/5a04693a10f50297bcda23819ca23ab3470a61dd911851c8bd
  Building wheel for python-igraph (setup.py) ... done
  Stored in directory: /home/wangshiyou/.cache/pip/wheels/41/d6/02/34eebae97e25f5b87d60f4c0687e00523e3f244fa41bc3f4a7
Successfully built leidenalg python-igraph
Installing collected packages: python-igraph, leidenalg, igraph
Successfully installed igraph-0.1.11 leidenalg-0.7.0 python-igraph-0.7.1.post6

and installed leiden again. I also tried installing the development version, but it didn't work.
Have you ever run into this problem?

Regards!
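
The pip log above installs leidenalg into the anaconda3 Python 3.7 environment, while the import error is raised by whichever Python the R session is bound to, and the two need not be the same interpreter. A minimal sketch for checking which interpreters on PATH can import the module; `has_module` is a hypothetical helper and the interpreter names are common defaults, not taken from the post:

```shell
# has_module INTERPRETER MODULE: succeed when the interpreter can import the module.
has_module() {
    "$1" -c "import $2" >/dev/null 2>&1
}

# Check every Python found on PATH; adjust the names to the local setup.
for py in python python2 python3; do
    if command -v "$py" >/dev/null 2>&1; then
        if has_module "$py" leidenalg; then
            echo "$py ($(command -v "$py")): leidenalg found"
        else
            echo "$py ($(command -v "$py")): leidenalg missing"
        fi
    fi
done
```

If the module is found only under an interpreter that R is not using, pointing R at that interpreter (or installing leidenalg into the one R does use) is the likely next step.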

promoter file not found

Hi, I ran system("wget http://renlab.sdsc.edu/r3fang/share/Fang_2019/published_scATAC/PBMC_10k_10X/atac_v1_pbmc_10k_fastqs/promoter.bed") in Rstudio but got:

--2019-05-01 11:25:27--  http://renlab.sdsc.edu/r3fang/share/Fang_2019/published_scATAC/PBMC_10k_10X/atac_v1_pbmc_10k_fastqs/promoter.bed
Resolving renlab.sdsc.edu (renlab.sdsc.edu)... 198.202.90.216
Connecting to renlab.sdsc.edu (renlab.sdsc.edu)|198.202.90.216|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-05-01 11:25:27 ERROR 404: Not Found.

Thanks

About filtering cells based on FIP ratio

In your criteria for filtering cells based on this ratio, it seems that you calculate the fraction of sequencing reads that belong to the "bins" overlapping promoter regions.
However, this ratio is affected by the bin size.
(The larger the bin size, the higher the chance that a bin overlaps a promoter.)

Is there a good way to tune the FIP ratio threshold based on the bin size?

promoter.df = read.table("promoter.bed");
promoter.gr = GRanges(promoter.df[,1], IRanges(promoter.df[,2], promoter.df[,3]));
ov = findOverlaps(x.sp@feature, promoter.gr);
idy = queryHits(ov);
promoter_ratio = SnapATAC::rowSums(x.sp[,idy, mat="bmat"], mat="bmat") / SnapATAC::rowSums(x.sp, mat="bmat");
plot(log(SnapATAC::rowSums(x.sp, mat="bmat") + 1,10), promoter_ratio, cex=0.5, col="grey", xlab="log(count)", ylab="FIP Ratio", ylim=c(0,1 ));
idx = which(promoter_ratio > 0.2 & promoter_ratio < 0.8);
x.sp = x.sp[idx,]

snap-add-bmat

Hello,
I generated the .snap file.
Following the tutorial, the next step is snap-add-bmat.
I entered this command:
$ snaptools snap-add-bmat
--snap-file=scATAC_BMMC_CLL_donor.snap
--bin-size-lis 5000
--verbose=True

I get this error: error: argument : invalid choice: 'snap-add-bmat--snap-file=atac_v1_adult_brain_fresh_5k.snap' (choose from 'dex-fastq', 'index-genome', 'align-paired-end', 'align-single-end', 'snap-pre', 'snap-add-bmat', 'snap-add-pmat', 'snap-add-gmat', 'dump-fragment', 'dump-barcode', 'call-peak', 'louvain', 'snap-del')

Any help on how to deal with that?
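
The quoted invalid choice, 'snap-add-bmat--snap-file=...', shows the subcommand and the first option arriving as a single word. That is what the shell produces when a line ends in a backslash with no space before it. A minimal sketch of the effect, using `set --` so that snaptools itself is not needed; demo.snap is a made-up file name:

```shell
# A backslash-newline pair is removed entirely, so the text on the next line is
# glued to the previous word when there is no space before the backslash.
set -- snap-add-bmat\
--snap-file=demo.snap
echo "without a space: $# argument(s), first is '$1'"
no_space_count=$#

# With a space before the backslash the two words stay separate.
set -- snap-add-bmat \
    --snap-file=demo.snap
echo "with a space: $# argument(s), first is '$1'"
with_space_count=$#
```

Putting a space before every trailing backslash (or writing the command on one line) keeps the subcommand and its options as separate arguments.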

single-cell cell type labels - 10x Genomics

Hi,

I would like to know where I can find the cell type labels for a single-cell dataset such as the one at this 10x Genomics link; they are required for evaluating SnapATAC outputs:
https://support.10xgenomics.com/single-cell-atac/datasets/1.0.1/atac_v1_hgmm_500

Thanks

Warning during barcode selection

I've previously been using your P50 10X brain tutorial without issue but tried the new code for FRIP/UMI selection and get the warning:

x.sp = x.sp[which(x.sp@barcode %in% barcodes.sel$barcode)];
Warning message:
In max(i) : no non-missing arguments to max; returning -Inf

This leads to x.sp having 0 barcodes. (prior to that line it has 8033).

The barcodes data.frame has 499703 obs. of 18 variables

Prior to that line, the plots look identical and there are 4098 objects in barcodes.sel

Do you have any suggestions?

Keep up the great work with all of the additions!

chromVAR integration

Apologies for the many questions, I just want to confirm that I'm understanding the nuances of SnapATAC.

For the motif variability matrix, do we take that output and plot it as shown here:
https://greenleaflab.github.io/chromVAR/articles/Articles/Applications.html

Thanks in advance!

Merging 3+ samples

Thanks for the intuitive package and frequent updates with more functionality!

What is the procedure for combining 3 samples or more as you mention below?

Hi,

Thank you for your kind message and thanks for trying it! This is a great question.

The answer is yes. SnapTools does not support merging multiple "snap" files, but SnapATAC supports merging multiple snap objects in R. Here is one example of how SnapATAC analyzes two samples together. You can merge multiple snap objects as long as they share a cell-by-bin matrix of the same bin size. The last time I tried, I had no problem merging up to 70 samples.

The reason snaptools does not merge multiple snap files into a single one is that I am trying to avoid creating a giant matrix that slows down the read-in function in SnapATAC. Also, as you mentioned, it is nice to keep each sample separate so they can be run in parallel.

Does it answer your question?

Best
-Rongxin

Originally posted by @r3fang in #13 (comment)

Human blacklist file in step5 Bin filtration (SnapATAC)

Hello,
I'm using the human genome. Where can I find the blacklist file needed for step 5?
Thank you

scaleCountMatrix function error in step 13

Hi,

I am following these sample analysis steps:
https://github.com/r3fang/SnapATAC/blob/master/examples/10X_P50/README.md
and everything seems to work fine up to step 13.

There I run into this error:
Error in .Ops.recycle.ind(e1, len = l2) :
vector too long in Matrix - vector operation

while running this command:
x.sp = scaleCountMatrix(
obj=x.sp,
cov=SnapATAC::rowSums(x.sp, mat="bmat"),
mat="gmat",
method = "RPM"
)

Searching for this error suggests it is related to sparse matrices, but I found no specific solution. Have you seen this error before, or do you have any idea how it could be solved? I am running it on a single-cell dataset from 10x Genomics:
https://support.10xgenomics.com/single-cell-atac/datasets/1.1.0/atac_v1_pbmc_5k
and it seems you have analyzed these datasets before. I would appreciate any help.

Best

Step 13 - human genes - marker genes

I am running step 13 of SnapATAC and I downloaded the human hg19 gene list from here:
http://genome.ucsc.edu/cgi-bin/hgTables?command=start

But when I run the code, the format of my gene list and the gene names differ from those you use in marker.genes. Where can I download a human gene BED file that is compatible with the marker genes?

I would also like to know whether the marker.genes list is the same for every human dataset, or whether we should change it based on our dataset. If so, which genes should we select?

Thanks
Best

runSnapAddPmat Error

Hi Rongxin,

As discussed in the runMACS issue, I was able to call the cluster peaks successfully by updating to the most recent version of snaptools. However, I am now running into an issue on the next step when I try to add the peak x cell matrix to the snap object. I made sure I was running the latest version of Snaptools and SnapATAC before running. Here is the error:

runSnapAddPmat(
obj=combined,
tmp.folder=getwd(),
peak=peak.gr,
path.to.snaptools="/home/rziffra/.local/bin/snaptools",
buffer.size=500
)
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: adding cell-by-peak matrix into snap file ...
Traceback (most recent call last):
File "/home/rziffra/.local/bin/snaptools", line 38, in
parse_args()
File "/home/rziffra/.local/lib/python3.5/site-packages/snaptools/parser.py", line 176, in parse_args
verbose=args.verbose)
File "/home/rziffra/.local/lib/python3.5/site-packages/snaptools/add_pmat.py", line 145, in snap_pmat
dump_read(snap_file, fout_frag.name, buffer_size, None, tmp_folder, True);
File "/home/rziffra/.local/lib/python3.5/site-packages/snaptools/snap.py", line 836, in dump_read
fout.write(("\t".join(map(str, item)) + "\n").encode())
TypeError: write() argument must be str, not bytes
Error in runSnapAddPmatSingle(file, peak = peak, path.to.snaptools = path.to.snaptools, :
'runSnapAddPmat' call failed
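
The traceback shows snaptools being loaded from a python3.5 site-packages directory, and "write() argument must be str, not bytes" is a typical text-versus-bytes mismatch under Python 3, whereas the requirements above list Python >= 2.7 and < 3.0 for SnapTools. A minimal sketch for confirming which interpreter the installed launcher uses; `script_interpreter` is a hypothetical helper:

```shell
# script_interpreter FILE: print the interpreter named on the script's shebang line.
script_interpreter() {
    head -n 1 "$1" | sed -n 's/^#! *//p'
}

# Inspect the snaptools launcher if it is installed.
if command -v snaptools >/dev/null 2>&1; then
    echo "snaptools runs under: $(script_interpreter "$(command -v snaptools)")"
fi
```

If the launcher points at a Python 3 interpreter, installing snaptools under Python 2.7 and passing that launcher as path.to.snaptools is one thing to try.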

runMACS error

Hello,

I am running into the following error when I try to call peaks using the runMACS function:

peaks_C16.df = runMACS(obj = combined[which(combined@cluster==16),],tmp.folder=getwd(),output.prefix="combined",path.to.snaptools="/home/rziffra/.local/bin/snaptools",path.to.macs="/usr/local/bin/macs2",gsize="hs",buffer.size=500,macs.options="--nomodel --shift 37 --ext 73 --pvalue 1e-2 -B --SPMR --call-summits",num.cores=2)
Epoch: checking input parameters ...
Epoch: extracting fragments from each snap files ...
INFO @ Thu, 28 Mar 2019 10:43:59:
Command line: callpeak -t /media/RND/HDD-5/rziffra/V1_GW20_SC/file290a2d88f415.bed.gz -f BED -g hs --nomodel --shift 37 --ext 73 --pvalue 1e-2 -B --SPMR --call-summits -n combined
ARGUMENTS LIST:
name = combined
format = BED
ChIP-seq file = ['/media/RND/HDD-5/rziffra/V1_GW20_SC/file290a2d88f415.bed.gz']
control file = None
effective genome size = 2.70e+09
band width = 300
model fold = [5, 50]
pvalue cutoff = 1.00e-02
qvalue will not be calculated and reported as -1 in the final output.
Larger dataset will be scaled towards smaller dataset.
Range for calculating regional lambda is: 10000 bps
Broad region calling is off
Paired-End mode is off
Searching for subpeak summits is on
MACS will save fragment pileup signal per million reads

INFO @ Thu, 28 Mar 2019 10:43:59: #1 read tag files...
INFO @ Thu, 28 Mar 2019 10:43:59: #1 read treatment tags...
Exception ZeroDivisionError: 'integer division or modulo by zero' in 'MACS2.IO.Parser.GenericParser.tsize' ignored
INFO @ Thu, 28 Mar 2019 10:44:00: #1 tag size is determined as 0 bps
INFO @ Thu, 28 Mar 2019 10:44:00: #1 tag size = 0
INFO @ Thu, 28 Mar 2019 10:44:00: #1 total tags in treatment: 0
INFO @ Thu, 28 Mar 2019 10:44:00: #1 user defined the maximum tags...
INFO @ Thu, 28 Mar 2019 10:44:00: #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s)
INFO @ Thu, 28 Mar 2019 10:44:00: #1 tags after filtering in treatment: 0
Traceback (most recent call last):
File "/usr/local/bin/macs2", line 617, in
main()
File "/usr/local/bin/macs2", line 57, in main
run( args )
File "/usr/local/lib/python2.7/dist-packages/MACS2/callpeak_cmd.py", line 112, in run
info("#1 Redundant rate of treatment: %.2f", float(t0 - t1) / t0)
ZeroDivisionError: float division by zero
Error in runMACS(obj = combined[which(combined@cluster == 16), ], tmp.folder = getwd(), :
'MACS' call failed
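
The log reports "total tags in treatment: 0", and the ZeroDivisionError follows when MACS2 computes the redundant rate from that count. Whatever the upstream cause, this suggests the BED file passed to MACS2 contained no usable records. A minimal sketch of a check to run on the temporary file before peak calling follows. The function names are hypothetical and not part of SnapATAC or snaptools.

```python
import gzip


def count_bed_records(path):
    """Count non-empty, non-comment lines in a gzip-compressed BED file."""
    n = 0
    with gzip.open(path, "rt") as fin:
        for line in fin:
            line = line.strip()
            if line and not line.startswith("#"):
                n += 1
    return n


def has_fragments(path):
    """Return True when the BED file holds at least one record."""
    return count_bed_records(path) > 0
```

If `has_fragments` returns False for the file named after `-t` in the MACS2 command line, the problem lies in the fragment extraction step rather than in MACS2 itself.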

runCluster silencer error.

Hi Rongxin,

I tried running runCluster with the path to snaptools set, but it failed to produce clusters and did not report any error. The R igraph clustering method, by contrast, worked (without the snaptools path). Could you help me figure out why?
I can send you the link to RData file over Slack if you need it.

runmacsforall error python locale encoding

Hello,

I generated a snap file from cellranger bam output. Then I used snapatac to cluster the cells. However, now when I try to run the runMACSForAll() function, I get the following errors. Is there something else I need to install for the python code to run properly?

Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: checking input parameters ...
Epoch: extracting fragments from each snap files ...
Epoch: extracting fragments from each snap files ...
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002b89d875a940 (most recent call first):
Epoch: extracting fragments from each snap files ...
Epoch: extracting fragments from each snap files ...
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002b357bcc3940 (most recent call first):
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002ae4b2dd3940 (most recent call first):
Epoch: extracting fragments from each snap files ...
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002ac2cea7e940 (most recent call first):
cat: /tmp/RtmpiYdj3N/file9c57222969d.bed.gz: No such file or directory
Epoch: extracting fragments from each snap files ...
Epoch: extracting fragments from each snap files ...
Epoch: extracting fragments from each snap files ...
cat: /tmp/RtmpiYdj3N/fileb597222969d.bed.gz: No such file or directory
cat: /tmp/RtmpiYdj3N/filea067222969d.bed.gz: No such file or directory
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002b90d2200940 (most recent call first):
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002aab0111c940 (most recent call first):
cat: /tmp/RtmpiYdj3N/fileac67222969d.bed.gz: No such file or directory
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
Epoch: extracting fragments from each snap files ...
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002b3eea938940 (most recent call first):
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002b69b0418940 (most recent call first):
Epoch: extracting fragments from each snap files ...
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
cat: /tmp/RtmpiYdj3N/fileb6f7222969d.bed.gz: No such file or directory
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
cat: /tmp/RtmpiYdj3N/filea447222969d.bed.gz: No such file or directory
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002b77a3c3a940 (most recent call first):
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
cat: /tmp/RtmpiYdj3N/filea687222969d.bed.gz: No such file or directory
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002af0b68a0940 (most recent call first):
cat: /tmp/RtmpiYdj3N/filebb77222969d.bed.gz: No such file or directory
Epoch: extracting fragments from each snap files ...
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
cat: /tmp/RtmpiYdj3N/filec377222969d.bed.gz: No such file or directory
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
cat: /tmp/RtmpiYdj3N/filec5c7222969d.bed.gz: No such file or directory
Epoch: extracting fragments from each snap files ...
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002b3709e39940 (most recent call first):
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
Epoch: extracting fragments from each snap files ...
Epoch: extracting fragments from each snap files ...
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
cat: /tmp/RtmpiYdj3N/filec467222969d.bed.gz: No such file or directory
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002ba948910940 (most recent call first):
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002ac727b77940 (most recent call first):
Fatal Python error: Py_Initialize: Unable to get the locale encoding
File "/usr/local/python/2.7.14/lib/python2.7/encodings/init.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax

Current thread 0x00002b5ff2356940 (most recent call first):
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
cat: /tmp/RtmpiYdj3N/filec517222969d.bed.gz: No such file or directory
cat: /tmp/RtmpiYdj3N/filec3e7222969d.bed.gz: No such file or directory
cat: /tmp/RtmpiYdj3N/filec0d7222969d.bed.gz: No such file or directory
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
Traceback (most recent call last):
File "/usr/local/python/2.7.14/bin/macs2", line 25, in
import argparse as ap
File "/usr/local/python/2.7.14/lib/python2.7/argparse.py", line 85, in
import collections as _collections
File "/usr/local/python/2.7.14/lib/python2.7/collections.py", line 20, in
from _collections import deque, defaultdict
ImportError: No module named _collections
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘reduce’ for signature ‘"list"’
In addition: Warning message:
In parallel::mclapply(as.list(levels(obj@cluster)), function(x) { :
scheduled core 1, 2, 7, 5, 8, 3, 4, 9, 11, 15, 13, 14, 10, 12 encountered error in user code, all values of the job will be affected
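
The SyntaxError is raised on Python 2 syntax inside the 2.7 standard library, which suggests that a Python 3 interpreter is loading the Python 2.7 library. One common cause, assumed here rather than confirmed by the log, is a `PYTHONHOME` or `PYTHONPATH` value that points at a different Python version than the interpreter being started. The sketch below flags such values; `conflicting_python_vars` is a hypothetical helper.

```python
import re
import sys


def conflicting_python_vars(env, running=None):
    """List (name, value) pairs for PYTHONHOME/PYTHONPATH entries whose
    path mentions a Python major.minor version other than `running`."""
    if running is None:
        running = (sys.version_info[0], sys.version_info[1])
    # Matches both ".../python/2.7.14" and ".../lib/python2.7".
    pattern = re.compile(r"python/?(\d+)\.(\d+)", re.IGNORECASE)
    conflicts = []
    for name in ("PYTHONHOME", "PYTHONPATH"):
        value = env.get(name, "")
        versions = {(int(a), int(b)) for a, b in pattern.findall(value)}
        if any(v != running for v in versions):
            conflicts.append((name, value))
    return conflicts
```

Passing `os.environ` as `env` from the interpreter that snaptools uses shows whether the session carries a mismatched setting. An empty list means the cause should be sought elsewhere.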

10X dataset after snaptools dex-fastq

Hi,
In the "How to create snap file from 10X dataset" section, it says to run the snaptools dex-fastq module to integrate the 10X barcode into the read name, and the output names are:
Library1_1_L001_R1_001.dex.fastq.gz
Library1_1_L001_R3_001.dex.fastq.gz
Library1_2_L001_R1_001.dex.fastq.gz
Library1_2_L001_R3_001.dex.fastq.gz

However, you also say to run the rest of the pipeline using Library1_L001_R1_001.fastq.dex.gz and Library1_L001_R3_001.fastq.dex.gz.
The extension of the previous step's output differs from that of the files to use for the rest of the pipeline (.dex.fastq.gz versus .fastq.dex.gz). Is this a mistake in the names? If not, what do you mean, and where are these files?

Thanks

snap file for non-standard genome

Hello,

I'm using a plant genome and ran into the issue that the genome_name must match one pre-defined in the accepted list.

I ran this command:

snaptools snap-pre --input-file=mapped.bam --output-snap=mapped.snap --genome-name=plant --genome-size=plant.gs --min-mapq=30 --min-flen=0 --max-flen=1000 --keep-chrm=TRUE --keep-single=FALSE --keep-secondary=FALSE --overwrite=True --min-cov=100 --verbose=True

and got this error: "error: --genome-name unrecoginized genome identifier plant".

All of the steps before this worked perfectly including index creation and read mapping. Is there any specific reason the package can't be used on any arbitrary genome, and is there a quick fix that can relax the constraint of the genome matching one in the GENOMELIST?

Thanks for your help!
Vikram

Merging snap objects

I really love a lot of the ideas and functionality in SnapTools/SnapATAC, thanks for developing them both and congrats on the preprint!

Looking through the available functions, I am wondering whether there might be a way to support merging of snap objects. I could obviously grab the binary matrices and merge them easily enough, but is there a way to genuinely merge the objects so that all the other functionality in SnapTools could still be used as intended? The reason I ask is that we have several datasets that we ran in parallel to speed things up, and it would be nice to be able to combine them later for integrated analysis.

Thanks for the help!

runCluster error

Hi, when I try to find clusters using 'runCluster', I get the following error:
File "/Users/smorris/snaptools/bin/snaptools", line 36, in
from snaptools.parser import parse_args
ImportError: No module named snaptools.parser
Error in runCluster.default(x.sp, pca_dims = 1:10, k = 30, resolution = 1, :
'runCluster' call failed

How to determine the parameter of significant PC/K in KNN graph construction

I hesitate to ask such a basic question, but I'm wondering how to set reasonable parameters for the runKNN() function.

In the "Mouse Secondary Motor Cortex 10k Nuclei" example, pca.dims=2:40 was set, but I couldn't work out how to select "significant PCs" from the output of plotDimReductPW(). (The 2D plots after PC29 vs PC30 all look quite similar to me.)
Are there any tips for determining "significant PCs" for runKNN()?

In addition, are there any good ways of fine-tuning K for KNN in clustering?
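
One generic way to make the "where do the plots stop changing" judgment less subjective is an elbow heuristic on the per-component variances or eigenvalues. The sketch below is not a SnapATAC function and is not the authors' recommended procedure; `elbow_index` is a hypothetical helper, and the input is assumed to be sorted in decreasing order.

```python
def elbow_index(values):
    """Return the 0-based index of the point farthest below the straight
    line joining the first and last values (a simple elbow heuristic).

    `values` should be sorted in decreasing order, for example
    per-component variances or eigenvalues.
    """
    n = len(values)
    if n < 3:
        return 0
    first, last = values[0], values[-1]
    slope = (last - first) / (n - 1)
    best_i, best_gap = 0, 0.0
    for i, v in enumerate(values):
        gap = (first + slope * i) - v  # how far the point sits below the chord
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i
```

Components before the returned index carry most of the drop in variance. The result is a starting point to compare against the pairwise plots, not a replacement for inspecting them.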

Error in creating Pmat

Hi,

I am trying to generate a cell-by-peak matrix. Instead of running MACS in SnapATAC, I used the narrow peaks that I generated before with MACS2, and converted them to GRanges.

np <- read.table("5226cells_insertions_peaks.narrowPeak", header = F, stringsAsFactors = F)
npGr <- GRanges(np[, 1], IRanges(np[, 2], np[, 3]))

It looks OK:

GRanges object with 336764 ranges and 0 metadata columns:
           seqnames              ranges strand
              <Rle>           <IRanges>  <Rle>
       [1]     chr1         10045-10207      *
       [2]     chr1         16173-16419      *
       [3]     chr1       564481-566092      *
       [4]     chr1       566431-567376      *
       [5]     chr1       567520-568613      *
       ...      ...                 ...    ...
  [336760]     chr9 141014021-141014201      *
  [336761]     chr9 141014493-141014917      *
  [336762]     chr9 141015618-141015870      *
  [336763]     chr9 141029042-141029331      *
  [336764]     chr9 141074103-141074361      *
  -------
seqinfo: 22 sequences from an unspecified genome; no seqlengths

But when I run

hcc <- createPmat(obj = hcc,
                  peak = npGr,
                  num.cores = 3)

I got the error

Error in .M.kind(x) : not yet implemented for matrix with typeof NULL

error in addBmatToSnap

When I run x.sp = addBmatToSnap(x.sp, bin.size=5000, num.cores=1), I get the following message:

Error in value[[3L]](cond) :
  Warning @addBmat: 'AM/bin.size/idx' not found in  /zfssz2/ST_MCHRI/COHORT/wangshiyou/scATAC/data/testdata/atac_v1_adult_brain_fresh_5k_fastqs/atac_v1_adult_brain_fresh_5k.snap

I don't know what AM/bin.size/idx is, so I checked the log of snap-add-bmat. The tail of the log is:

229500000       tags, 4666.174429655075 seconds
229600000       tags, 4668.497059583664 seconds
229700000       tags, 4670.345653295517 seconds
===== reading the barcodes and bins ======
@AM     nBinSize:2
@AM     binSizeList: [5000, 100000]
@AM     binSize:5000    nBin:546206
@AM     binSize:100000  nBin:27348

It seems to have completed successfully, so I don't know what is wrong.

I also found that the result of showBinSizes on my file differs from your documentation:

> showBinSizes("atac_v1_adult_brain_fresh_5k.snap");
[1]   5000 100000

I don't have 1000, and I think that is because the snap-add-bmat parameter --bin-size-list was set to 5000 and 100000. Right?

Regards

install snaptools on Linux with Python 3.7 in Anaconda

Hi,
When I use pip3.7 and Python 3.7 installed by Anaconda, I get an error:

$pip3 install snaptools Collecting snaptools Requirement already satisfied: pysam in ./anaconda3/lib/python3.7/site-packages (from snaptools) (0.15.2) Collecting pybedtools>=0.7 (from snaptools) Using cached https://files.pythonhosted.org/packages/ca/b6/af143d5247cfe331e32c96ca92056293140eb8ce788d37842f6dcea734b4/pybedtools-0.8.0.tar.gz Requirement already satisfied: h5py in ./anaconda3/lib/python3.7/site-packages (from snaptools) (2.8.0) Requirement already satisfied: future in ./anaconda3/lib/python3.7/site-packages (from snaptools) (0.17.1) Requirement already satisfied: numpy in ./anaconda3/lib/python3.7/site-packages (from snaptools) (1.15.4) Collecting python-louvain (from snaptools) Requirement already satisfied: six in ./anaconda3/lib/python3.7/site-packages (from pybedtools>=0.7->snaptools) (1.12.0) Requirement already satisfied: networkx in ./anaconda3/lib/python3.7/site-packages (from python-louvain->snaptools) (2.2) Requirement already satisfied: decorator>=4.3.0 in ./anaconda3/lib/python3.7/site-packages (from networkx->python-louvain->snaptools) (4.3.0) Building wheels for collected packages: pybedtools Building wheel for pybedtools (setup.py) ... error ERROR: Complete output from command /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/bin/python3.7 -u -c 'import setuptools, tokenize;file='"'"'/tmp/pip-install-9zkhoi0h/pybedtools/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-0raptwhn --python-tag cp37: ERROR: running bdist_wheel The [wheel] section is deprecated. Use [bdist_wheel] instead. 
running build running build_py creating build creating build/lib.linux-x86_64-3.7 creating build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/helpers.py -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/parallel.py -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/genome_registry.py -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/stats.py -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/bedtool.py -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/version.py -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/logger.py -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/paths.py -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/main.py -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/filenames.py -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/init.py -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/settings.py -> build/lib.linux-x86_64-3.7/pybedtools creating build/lib.linux-x86_64-3.7/pybedtools/test copying pybedtools/test/test_iter.py -> build/lib.linux-x86_64-3.7/pybedtools/test copying pybedtools/test/test_len_leak.py -> build/lib.linux-x86_64-3.7/pybedtools/test copying pybedtools/test/test_helpers.py -> build/lib.linux-x86_64-3.7/pybedtools/test copying pybedtools/test/test_cbedtools.py -> build/lib.linux-x86_64-3.7/pybedtools/test copying pybedtools/test/test_issues.py -> build/lib.linux-x86_64-3.7/pybedtools/test copying pybedtools/test/tfuncs.py -> build/lib.linux-x86_64-3.7/pybedtools/test copying pybedtools/test/test_gzip_support.py -> build/lib.linux-x86_64-3.7/pybedtools/test copying pybedtools/test/regression_tests.py -> build/lib.linux-x86_64-3.7/pybedtools/test copying pybedtools/test/init.py -> build/lib.linux-x86_64-3.7/pybedtools/test copying pybedtools/test/test1.py -> build/lib.linux-x86_64-3.7/pybedtools/test copying pybedtools/test/test_scripts.py -> build/lib.linux-x86_64-3.7/pybedtools/test copying 
pybedtools/test/test_contrib.py -> build/lib.linux-x86_64-3.7/pybedtools/test creating build/lib.linux-x86_64-3.7/pybedtools/contrib copying pybedtools/contrib/intersection_matrix.py -> build/lib.linux-x86_64-3.7/pybedtools/contrib copying pybedtools/contrib/bigbed.py -> build/lib.linux-x86_64-3.7/pybedtools/contrib copying pybedtools/contrib/bigwig.py -> build/lib.linux-x86_64-3.7/pybedtools/contrib copying pybedtools/contrib/venn_maker.py -> build/lib.linux-x86_64-3.7/pybedtools/contrib copying pybedtools/contrib/init.py -> build/lib.linux-x86_64-3.7/pybedtools/contrib copying pybedtools/contrib/long_range_interaction.py -> build/lib.linux-x86_64-3.7/pybedtools/contrib copying pybedtools/contrib/plotting.py -> build/lib.linux-x86_64-3.7/pybedtools/contrib creating build/lib.linux-x86_64-3.7/pybedtools/scripts copying pybedtools/scripts/intersection_matrix.py -> build/lib.linux-x86_64-3.7/pybedtools/scripts copying pybedtools/scripts/venn_mpl.py -> build/lib.linux-x86_64-3.7/pybedtools/scripts copying pybedtools/scripts/intron_exon_reads.py -> build/lib.linux-x86_64-3.7/pybedtools/scripts copying pybedtools/scripts/venn_gchart.py -> build/lib.linux-x86_64-3.7/pybedtools/scripts copying pybedtools/scripts/init.py -> build/lib.linux-x86_64-3.7/pybedtools/scripts copying pybedtools/scripts/peak_pie.py -> build/lib.linux-x86_64-3.7/pybedtools/scripts copying pybedtools/scripts/py_ms_example.py -> build/lib.linux-x86_64-3.7/pybedtools/scripts copying pybedtools/scripts/annotate.py -> build/lib.linux-x86_64-3.7/pybedtools/scripts creating build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/init.py -> build/lib.linux-x86_64-3.7/pybedtools/test/data running egg_info creating pybedtools.egg-info writing pybedtools.egg-info/PKG-INFO writing dependency_links to pybedtools.egg-info/dependency_links.txt writing requirements to pybedtools.egg-info/requires.txt writing top-level names to pybedtools.egg-info/top_level.txt writing manifest file 
'pybedtools.egg-info/SOURCES.txt' reading manifest file 'pybedtools.egg-info/SOURCES.txt' writing manifest file 'pybedtools.egg-info/SOURCES.txt' copying pybedtools/cbedtools.cpp -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/featurefuncs.cpp -> build/lib.linux-x86_64-3.7/pybedtools creating build/lib.linux-x86_64-3.7/pybedtools/include copying pybedtools/include/bedFile.cpp -> build/lib.linux-x86_64-3.7/pybedtools/include copying pybedtools/include/fileType.cpp -> build/lib.linux-x86_64-3.7/pybedtools/include copying pybedtools/include/gzstream.cpp -> build/lib.linux-x86_64-3.7/pybedtools/include copying pybedtools/test/data/gdc.50.200.bam.bai -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/a.links.html -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/gdc.othersort.bam -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/exons.gff -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/BEAF_Mbn2_Bushey_2009.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/SuHw_Mbn2_Bushey_2009.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/venn.c.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/x.bam -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/test.fa.fai -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/gdc.gff.gz -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/test_tsses.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/test.fa -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/BEAF_Kc_Bushey_2009.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/Cp190_Mbn2_Bushey_2009.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/d.gff 
-> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/bedpe2.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/y.bam -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/small.fastq -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/gdc.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/1000genomes-example.vcf -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/snps.bed.gz -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/gdc.gff -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/164.gtf -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/expand_test.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/rmsk.hg18.chr21.small.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/gdc.sorted.bam -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/a.igv_script -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/gdc.sorted.bam.bai -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/hg38-base.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/hg19.gff -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/tag_test1.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/democonfig.yaml -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/gdc.bam -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/a.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/v.vcf -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/hg38-problem.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data 
copying pybedtools/test/data/gdc.1.100.bam -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/issue_121.bam -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/bedpe.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/CTCF_Kc_Bushey_2009.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/SuHw_Kc_Bushey_2009.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/test_bedpe.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/m1.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/Cp190_Kc_Bushey_2009.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/vcf-stderr-test.vcf -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/multibamcov_test.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/CTCF_Mbn2_Bushey_2009.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/mm9.bed12 -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/vcf-stderr-test.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/dm3-chr2L-5M.gff.gz -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/tag_test2.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/c.gff -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/test_peaks.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/rmsk.hg18.chr21.small.bed.gz -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/x.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/dm3-chr2L-5M-invalid.gff.gz -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/gdc.50.200.bam -> 
build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/venn.b.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/reads.bam -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/small.bam -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/b.bed -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/test/data/gdc.1.100.bam.bai -> build/lib.linux-x86_64-3.7/pybedtools/test/data copying pybedtools/cbedtools.pyx -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/featurefuncs.pyx -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/_Window.pyx -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/cbedtools.pxd -> build/lib.linux-x86_64-3.7/pybedtools copying pybedtools/scripts/pybedtools -> build/lib.linux-x86_64-3.7/pybedtools/scripts creating build/lib.linux-x86_64-3.7/pybedtools/scripts/examples copying pybedtools/scripts/examples/pbt_plotting_example.py -> build/lib.linux-x86_64-3.7/pybedtools/scripts/examples running build_ext building 'pybedtools.cbedtools' extension creating build/temp.linux-x86_64-3.7 creating build/temp.linux-x86_64-3.7/pybedtools creating build/temp.linux-x86_64-3.7/pybedtools/include /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cc -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -fPIC -Ipybedtools/include/ -I/zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/include/python3.7m -c pybedtools/cbedtools.cpp -o build/temp.linux-x86_64-3.7/pybedtools/cbedtools.o cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++ In file included from pybedtools/include/bedFile.h:16:0, from pybedtools/cbedtools.cpp:660: pybedtools/include/gzstream.h:35:10: fatal error: zlib.h: No such file or directory 
#include <zlib.h>
         ^~~~~~~~
compilation terminated.
error: command '/zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cc' failed with exit status 1

ERROR: Failed building wheel for pybedtools
Running setup.py clean for pybedtools
ERROR: Complete output from command /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/bin/python3.7 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-9zkhoi0h/pybedtools/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' clean --all:
ERROR: usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: -c --help [cmd1 cmd2 ...]
or: -c --help-commands
or: -c cmd --help

error: option --all not recognized

ERROR: Failed cleaning build dir for pybedtools Failed to build pybedtools Installing collected packages: pybedtools, python-louvain, snaptools Running setup.py install for pybedtools ... error ERROR: Complete output from command /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/bin/python3.7 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-9zkhoi0h/pybedtools/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-pfq4ta4c/install-record.txt --single-version-externally-managed --compile: ERROR: running install running build running build_py copying pybedtools/version.py -> build/lib.linux-x86_64-3.7/pybedtools running egg_info writing pybedtools.egg-info/PKG-INFO writing dependency_links to pybedtools.egg-info/dependency_links.txt writing requirements to pybedtools.egg-info/requires.txt writing top-level names to pybedtools.egg-info/top_level.txt reading manifest file 'pybedtools.egg-info/SOURCES.txt' writing manifest file 'pybedtools.egg-info/SOURCES.txt' running build_ext building 'pybedtools.cbedtools' extension /zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cc -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -fPIC -Ipybedtools/include/ -I/zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/include/python3.7m -c pybedtools/cbedtools.cpp -o build/temp.linux-x86_64-3.7/pybedtools/cbedtools.o cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++ In file included from pybedtools/include/bedFile.h:16:0, from pybedtools/cbedtools.cpp:660: pybedtools/include/gzstream.h:35:10: fatal error: zlib.h: No such file or directory #include <zlib.h> ^~~~~~~~ compilation terminated. 
error: command '/zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cc' failed with exit status 1
----------------------------------------
ERROR: Command "/zfssz2/ST_MCHRI/COHORT/wangshiyou/software/anaconda3/bin/python3.7 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-9zkhoi0h/pybedtools/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-pfq4ta4c/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-9zkhoi0h/pybedtools/
Do you know the reason for this? How can I solve it?

Regards!

Add doSNOW to Imports or Suggests?

Hi Rongxin,

I ran into this error during installation on a Windows machine (below). After running install.packages("doSNOW"), it worked fine.

It looks like this might only be required for runLDA(), so maybe it should go in Suggests in DESCRIPTION.

Thanks!
-Lucas

installing source package 'SnapATAC' ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called 'doSNOW'
Calls: ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
ERROR: lazy loading failed for package 'SnapATAC'
removing 'C:/Users/lucasg/Documents/R/win-library/3.6/SnapATAC'
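
As a workaround until the dependency is declared, the missing package can be detected before installing SnapATAC. The sketch below is only an illustration: `missing_packages` is a hypothetical helper name, not part of SnapATAC, and the list of packages to check is an assumption based on the error above.

```r
# Return the subset of `pkgs` whose namespace cannot be loaded
# from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Check the package named in the error message above.
to_install <- missing_packages(c("doSNOW"))
if (length(to_install) > 0) {
  message("Missing: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

Running this before `install_github("r3fang/SnapATAC")` should surface the gap up front instead of at the lazy-loading step.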

Memory Issue with small bins / merge individual snap objects in R?

Hello!

This is a really great approach to scATAC. Any help would be greatly appreciated!

I have several biologically similar samples. The ability to resolve the individual samples in UMAP and t-SNE improves when using smaller bins (10 kb, 5 kb, 500 bp). However, 500 bp bins cannot separate the most similar samples, so we would like to use an even smaller bin size, but we are running into memory issues (I think). Even when I used 1.5 TB of memory, I got the following error while running addBmatToSnap:
Epoch: reading cell-bin count matrix session ...
Error in .rbind2Csp(x, y) :
Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 92
Calls: addBmatToSnap ... rbind -> rbind2 -> rbind2 -> rbind2sparse -> .rbind2Csp
Execution halted

Do you think this is a memory issue? Or something else?
Would it be easier/possible to make individual snap objects in R then merge them?

Thank you!
Josephine
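
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: the sparse matrix classes in R's Matrix package index their entries with 32-bit integers, so a combined matrix cannot hold more than `.Machine$integer.max` non-zero entries no matter how much RAM is available. The arithmetic below is a rough sketch of that limit. The genome length and the number of non-empty bins per cell are made-up illustrative values, not figures from this dataset.

```r
# Largest count of non-zero entries a 32-bit-indexed sparse matrix can hold.
max_nnz <- .Machine$integer.max

# Hypothetical inputs for illustration only; replace with your own numbers.
genome_length <- 2.7e9   # assumed genome size in bp
bin_size      <- 500     # bin width in bp
nnz_per_cell  <- 20000   # assumed average number of non-empty bins per cell

n_bins    <- ceiling(genome_length / bin_size)
max_cells <- floor(max_nnz / nnz_per_cell)

cat("bins per genome:", n_bins, "\n")
cat("cells before the non-zero count overflows:", max_cells, "\n")
```

If the real numbers land near this ceiling, adding memory would not help, and loading fewer cells or using larger bins would be the things to try.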

UMAP and t-SNE

Hello Rongxin,

I just made a t-SNE plot (Figure 1) and want to see what the UMAP looks like. In this case, though, the UMAP plot (Figure 2) looks very weird. Do I need to change anything other than using "umap" instead of "Rtsne" and "tsne"? Also, is there a way to pull out the barcodes of a cluster rather than just the cluster index?
Thank you for your time!

Best,
Sean

x.sp = runViz(
  obj=x.sp,
  tmp.folder=tempdir(),
  dims=2,
  pca.dims=2:30,
  weight.by.sd=TRUE,
  method="Rtsne",
  fast_tsne_path=NULL,
  Y.init=NULL,
  seed.use=10,
  num.cores=5
)

plotViz(
  obj=x.sp,
  method="tsne",
  point.size=0.5,
  point.shape=19,
  point.alpha=0.8,
  point.color="cluster",
  text.add=FALSE,
  text.size=1.5,
  text.color="black",
  text.halo.add=TRUE,
  text.halo.color="white",
  text.halo.width=0.2,
  down.sample=10000,
  pdf.file.name=NULL,
  pdf.width=7,
  pdf.height=7,
  legend.add=TRUE
)

Figure1

Figure2
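
On the barcode question, a minimal sketch using toy vectors. It assumes the clustered object keeps one barcode and one cluster label per cell in parallel vectors (for a snap object these would presumably be the `x.sp@barcode` and `x.sp@cluster` slots, which is an assumption to check against your SnapATAC version). The barcodes below are invented.

```r
# Toy stand-ins for the per-cell barcode and cluster vectors.
barcodes <- c("AAAC", "AAAG", "AACT", "AAGT", "ACGT")
cluster  <- factor(c(1, 2, 2, 1, 3))

# Barcodes belonging to a single cluster.
cluster2_barcodes <- barcodes[cluster == 2]

# All clusters at once: a named list with one character vector per cluster.
barcodes_by_cluster <- split(barcodes, cluster)

print(cluster2_barcodes)
```

The same indexing works for any per-cell vector, as long as it is in the same order as the cluster labels.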

DARs.C2 = findDAR(
obj=x.sp,
mat="pmat",
cluster.pos=2,
cluster.neg=10,
bcv=0.1,
fdr=5e-2,
pvalue=1e-2,
test.method="exactTest",
seed.use=10);
Epoch: checking inputs ...
Epoch: identifying DARs for positive cluster ...
Error: NA counts not allowed
In addition: Warning message:
In DGEList(counts = data.use, group = group) :
library size of zero detected

How can I fix this?
Thanks.
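
The "library size of zero" warning suggests that at least one cell in the comparison has no counts in the matrix being tested. A minimal sketch of filtering such cells, shown on a toy matrix: it assumes rows are cells and columns are peaks, and the names `pmat` and `keep` are illustrative only. Whether the snap object can be subset the same way (for example `x.sp[keep, ]`) is an assumption to verify.

```r
library(Matrix)

# Toy cell-by-peak count matrix: 4 cells (rows) x 3 peaks (columns).
# Cell 3 has no counts at all.
pmat <- sparseMatrix(
  i = c(1, 1, 2, 4),
  j = c(1, 3, 2, 3),
  x = c(2, 1, 5, 1),
  dims = c(4, 3)
)

# Per-cell library size; cells with zero total counts are dropped.
lib_size <- Matrix::rowSums(pmat)
keep <- lib_size > 0
pmat_filtered <- pmat[keep, , drop = FALSE]

cat("cells removed:", sum(!keep), "\n")
```

Filtering before calling findDAR should avoid passing zero-count cells on to the downstream test.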

install SnapATAC permission fails

Error: ERROR: no permission to install to directory ‘/opt/R/local/lib’
Installation failed: Command failed (1)
'/opt/R/lib64/R/bin/R' --no-site-file --no-environ --no-save --no-restore  \
  --quiet CMD INSTALL '/tmp/RtmpxpKUju/devtools176f4e50941d/doSNOW'  \
  --library='/opt/R/local/lib' --install-tests
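
The log above shows the install targeting a system library the user cannot write to. One common workaround, sketched here under the assumption that a per-user library is acceptable on this machine, is to put a writable directory first on the library search path. `use_library` is a hypothetical helper name.

```r
# Create `path` if needed, put it first on the library search path,
# and return the first library path afterwards.
use_library <- function(path) {
  if (!dir.exists(path)) dir.create(path, recursive = TRUE)
  .libPaths(c(path, .libPaths()))
  invisible(.libPaths()[1])
}

# R_LIBS_USER is R's per-user library location; fall back to a
# temporary directory if it is not set.
user_lib <- Sys.getenv("R_LIBS_USER")
if (!nzchar(user_lib)) user_lib <- file.path(tempdir(), "Rlib")

use_library(user_lib)
print(.libPaths())
# install.packages("doSNOW", lib = .libPaths()[1])  # uncomment to install there
```

Packages installed afterwards in the same session go to the first entry of `.libPaths()` by default.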

align-single-end error

Hi Rongxin,

For some odd reason, align-single-end will not run unless I comment out line #116 in parser.py.

Best,

Dinh

remove cells with no counts on bmat, gmat and pmat

error in createSnap()

When running createSnap(), I got the following errors:

> x.sp = createSnap(file="atac_v1_adult_brain_fresh_5k_2.snap",sample="atac_v1_adult_brain_fresh_5k",num.cores=1)
Epoch: reading the barcode session ...
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  : 
  HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  : 
  HDF5. Dataset. Read failed.
Error in data.frame(barcode, TN, UM, PP, UQ, CM) : 
  arguments imply differing number of rows: 0, 8033

I checked my R version (3.5.1) and upgraded the rhdf5 package to version 2.26.2,
but I still get these errors. How can I fix this?

Best,
