bap's People

Contributors

caleblareau, maxbachmann

bap's Issues

.bam file parsing

using a .bam file like N711-Exp31-Sample9.ready2.bam.allele.bam

yields

bap_out/temp/filt_split/N711-Exp31-Sample9.ready2_overlapCount.rds.allele.chr6.bam

Presumably this is due to the gsub-like command running in Snakemake.
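
A small illustration of the underlying string handling (the _overlapCount.rds naming is taken from the output above; the exact substitution bap performs is an assumption). An unanchored substitution mangles names that contain ".bam" in the middle, whereas an anchored one only touches the trailing suffix:

import re

name = "N711-Exp31-Sample9.ready2.bam.allele.bam"

# Replaces the first ".bam" it finds, wherever it sits in the name
mangled = re.sub(r"\.bam", "_overlapCount.rds", name, count=1)
# Only replaces the trailing ".bam" suffix
anchored = re.sub(r"\.bam$", "_overlapCount.rds", name)

print(mangled)   # N711-Exp31-Sample9.ready2_overlapCount.rds.allele.bam
print(anchored)  # N711-Exp31-Sample9.ready2.bam.allele_overlapCount.rds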

Regenerate the fragment for the human/mouse cell line in the Nature Biotechnology paper

Hello,

I am interested in the human/mouse mix data (https://www.ncbi.nlm.nih.gov/sra?term=SRX5124174) from your Nature Biotechnology paper. Unlike the other samples in the same paper, the fragment.tsv file is not available at GEO.

I was wondering if you could share the fragment.tsv file as you did for the other samples (for example, the mouse). Alternatively, could you show me how to process the fastq files to create the fragment.tsv file?

The run has 4 pairs of FASTQ files:

SRR8310510_1.fastq.gz  SRR8310511_2.fastq.gz  SRR8310513_1.fastq.gz
SRR8310510_2.fastq.gz  SRR8310512_1.fastq.gz  SRR8310513_2.fastq.gz
SRR8310511_1.fastq.gz  SRR8310512_2.fastq.gz

I am not clear on how to chain the steps together. For example, I have the following questions:

  1. Should I use v2.1-multi with bap-barcode?
  2. How do I run bap-reanno? What does this command do? Do I simply run bap-reanno <in_bam> <out_bam>?
  3. Since the run has 4 pairs of reads, does bap bam -i support multiple bam files at the same time? If not, does that mean I should process them one by one to generate the fragment.tsv files and finally merge all of these fragment.tsv files (see the sketch after this list)?
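
On question 3, a minimal sketch of merging per-run fragment files, assuming each run is processed separately. The file names, the chrom/start/end/barcode/count column layout, and the decision to suffix barcodes with the run ID are all assumptions, not bap's behavior:

import pandas as pd

runs = ["SRR8310510", "SRR8310511", "SRR8310512", "SRR8310513"]
frames = []
for run in runs:
    # Hypothetical per-run fragment files produced by separate bap runs
    df = pd.read_csv(f"{run}.fragments.tsv.gz", sep="\t", header=None,
                     names=["chrom", "start", "end", "barcode", "count"])
    # Keep runs distinguishable; drop this if all runs share the same bead pool
    df["barcode"] = df["barcode"] + "-" + run
    frames.append(df)

merged = pd.concat(frames).sort_values(["chrom", "start", "end"])
merged.to_csv("merged.fragments.tsv.gz", sep="\t", header=False, index=False)

Whether the barcodes should be kept distinct across runs depends on whether the four runs are resequencing of the same library, so treat that line as a choice to verify rather than a rule.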

Many thanks.

weird use case

  BeadBarcode            DropBarcode                         OverlapReads  UniqueNuclear  TotalNuclear  TotalMito  TotalNC
1 cagaattatagttgttcgtac  N715_Exp68_sample12_S1_BC02127_N09         42709           3349          3534        108       34
2 cagaattattgcctttcgtac  N715_Exp68_sample12_S1_BC02127_N09         42709           4247          4505        112       46
3 cagaattcatcttattcgtac  N715_Exp68_sample12_S1_BC02127_N09         42709          19984         22448        596      158
4 cagaattccacttgttcgtac  N715_Exp68_sample12_S1_BC02127_N09         42709           6667          7188        192       51
5 cagaattcgcggcgttcgtac  N715_Exp68_sample12_S1_BC02127_N09         42709          29135         33861        831      207
6 cagaattcgtagccttcgtac  N715_Exp68_sample12_S1_BC02127_N09         42709           8022          8647        216       57
7 cagaattcgtttgattcgtac  N715_Exp68_sample12_S1_BC02127_N09         42709          47522         60270       1668      314
8 cagaattctaactcttcgtac  N715_Exp68_sample12_S1_BC02127_N09         42709          10193         10992        319       74
9 cagaattctgcgccttcgtac  N715_Exp68_sample12_S1_BC02127_N09         42709          59317         78918       2109      439

This yields uniqueNuclearFrags < 0; pointed out by @zburkett.

Not sure what the steps should be to handle it, but this is almost certainly a bad barcode set (the barcodes differ only in the middle).

Error in generating `*frag.bedpe.annotated.tsv` file

I encountered two related issues when trying to run the pipeline on my bam files mapped to a custom genome.

Initially, the pipeline was failing to generate *frag.bedpe.annotated.tsv files for chromosomes that have no reads mapped to them, because the empty *frag.bedpe.gz input files were causing errors in the downstream R script.
cell_barcodes.MAPQ30.CB.snakemake.log
After trying to work around this by removing these chromosomes from the genome file, the same job still fails, but this time it fails to generate the *frag.bedpe.annotated.tsv for the last file being analyzed, even though the relevant inputs are generated and are not empty. I am quite sure this is not a problem with a particular scaffold or its set of files, because when I remove the failing scaffold, I get the same error for the scaffold that is now the last one being processed.
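
For the first failure mode (empty *frag.bedpe.gz inputs), a minimal sketch, assuming the pipeline could check its inputs before calling the R script, of a guard that skips scaffolds with no fragments; an empty input would give the R script a table with no columns, which would be consistent with the merge error shown below:

import gzip

def bedpe_has_records(path):
    # True if the gzipped bedpe contains at least one non-empty line
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                return True
    return False

# Hypothetical usage: only hand non-empty scaffolds to the annotation step.
# frag_files = [...]  # per-scaffold *.frag.bedpe.gz paths
# to_annotate = [f for f in frag_files if bedpe_has_records(f)]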

Here is the relevant bit of the log file, and below I attach the whole file as well:

Error in merge.data.table(frags, bead_read, by = "read_name") :
  Elements listed in `by` must be valid column names in x and y
Calls: %>% -> na.omit -> merge -> merge -> merge.data.table
Execution halted
gzip: /users/asebe/aelek/proj/scATAC_nvec_Oct20/scATAC_pro/output_50m_PFA/bap2_out/temp/filt_split/cell_barcodes.MAPQ30.CB.scaffold_811.frag.bedpe.
annotated.tsv: No such file or directory

The only idea that I could think of is that this may be a latency issue, as suggested in the bap log output. If so, is there a way to run the Snakemake pipeline within bap with an increased value of the --latency-wait parameter?

Thank you in advance,
Anamaria

regex varied filtering

As per SM's request (mostly for ATAC/ChIP), the ability to variably threshold cells based on barcode regex. Potential format attached

memory overflow

In step 11,

multiprocessing.pool.MaybeEncodingError: Error sending result:

This appears to be caused by too many reads: the per-chromosome result returned by the worker becomes too large to send back through the multiprocessing pool.
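
A minimal sketch of one way around this (not bap's code): have each worker write its per-chromosome output to disk and return only the path, so nothing large is pickled back through the pool.

import multiprocessing
import os

def process_chromosome(chrom, out_dir="temp"):
    # Hypothetical worker: stream results to a per-chromosome file instead of
    # accumulating them in memory and returning them through the pool.
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, f"{chrom}.barcodes.tsv")
    with open(out_path, "w") as out:
        pass  # write one line per read/barcode here
    return out_path  # small, picklable return value

if __name__ == "__main__":
    chroms = [f"chr{i}" for i in range(1, 23)]
    with multiprocessing.Pool(4) as pool:
        paths = pool.map(process_chromosome, chroms)
    print(paths)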

Write raw barcode and corrected barcode in FASTQ name comment section so bwa -C can write the BAM tags directly

Write raw barcode and corrected barcode in FASTQ name comment section when using bap-barcode
https://github.com/caleblareau/bap/wiki/Working-with-BioRad-data

bwa supports the -C option:

-C | Append FASTA/Q comment to SAM output. This option can be used to transfer read meta information (e.g. barcode) to the SAM output. Note that the FASTA/Q comment (the string after a space in the header line) must conform to the SAM spec (e.g. BC:Z:CGTAC). Malformatted comments lead to incorrect SAM output.

lh3/bwa#317 (comment)

This avoids the need to run bap-reanno on the BAM afterwards, as the tags will already have been added by bwa.

# bwa mem: Add -C and -t.
# samtools view: Write uncompressed BAM when converting the SAM output to BAM (no point compressing it, since it is sorted in the next step)
# samtools sort: Put input BAM (-) argument last.
bwa mem -C -t 8 /path/to/hg19.fa debarcode-c001_1.fastq.gz debarcode-c001_2.fastq.gz \
    | samtools view -b -u - \
    | samtools sort -@ 4 -o debarcode.bam -

samtools index debarcode.bam
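
For the bap-barcode side, a minimal sketch of what writing the barcode into the FASTQ comment could look like. The tag choice and whether both raw and corrected barcodes can be carried are assumptions; the only grounded example is the BC:Z: tag from the man page excerpt above, and the comment must stay SAM-conformant for -C to produce valid output:

def format_fastq_record(name, seq, qual, corrected_bc):
    # Hypothetical, not bap-barcode's current output: a single SAM-conformant
    # tag in the read comment, which `bwa mem -C` copies into the BAM record.
    comment = f"BC:Z:{corrected_bc}"
    return f"@{name} {comment}\n{seq}\n+\n{qual}\n"

# Carrying the raw barcode as well (e.g. a second tab-separated tag) would need
# to be checked against how bwa -C copies the comment.
print(format_fastq_record("read1", "ACGTACGT", "FFFFFFFF", "CGTAC"), end="")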

Error running bap-frag

Hi, thanks for the nice tool. I would like to use bap-frag tool to generate fragments.tsv from the Bam file. However, I encountered this error as shown below. Could you provide insight into how to fix this? Thanks.

Jason

Fri May 01 17:16:57 HKT 2020: Starting bap-frag pipeline v0.6.5
Traceback (most recent call last):
  File "/usr/local/bin/bap-frag", line 11, in <module>
    load_entry_point('bap-atac', 'console_scripts', 'bap-frag')()
  File "/home/leetl/.local/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/leetl/.local/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/leetl/.local/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/leetl/.local/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/leetl/bap/bap/cli_bap_frag.py", line 99, in main
    bead_tag)
  File "/home/leetl/bap/bap/bapFragProjectClass.py", line 72, in __init__
    self.bap_version = get_distribution('bap').version
  File "/home/leetl/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 481, in get_distribution
    dist = get_provider(dist)
  File "/home/leetl/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 357, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/home/leetl/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/leetl/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'bap' distribution was not found and is required by the application

Missing bgzf EOF marker

[Tue Jul 10 22:59:49 2018] rule reannotate_droplets:
    input: Exp100-Sample15_bap/final/Exp100-Sample15.st.barcodeTranslate.tsv, Exp100-Sample15_bap/temp/filt_split/Exp100-Sample15.st.chr20.raw.bam
    output: Exp100-Sample15_bap/temp/drop_barcode/Exp100-Sample15.st.chr20.raw.bam, Exp100-Sample15_bap/temp/drop_barcode/Exp100-Sample15.st.chr20.raw.bam.bai
    jobid: 5
    wildcards: name=Exp100-Sample15.st.chr20.raw
[Tue Jul 10 22:59:49 2018] 
[E::bgzf_uncompress] Inflate operation failed: progress temporarily not possible, or in() / out() returned an error
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
Finished parsing bam
[Tue Jul 10 22:59:53 2018] Finished job 10.

Potential easy fix: port https://github.com/peterjc/picobio/blob/master/sambam/bgzf_add_eof.py
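
A minimal sketch of that approach: append the standard 28-byte BGZF EOF block if it is missing. This only silences the missing-EOF complaint; it does not recover any truncated or corrupted blocks.

BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000"
)

def add_bgzf_eof(bam_path):
    # Assumes the file is at least 28 bytes long
    with open(bam_path, "rb") as handle:
        handle.seek(-28, 2)                # inspect the last 28 bytes
        has_eof = handle.read(28) == BGZF_EOF
    if not has_eof:
        with open(bam_path, "ab") as handle:
            handle.write(BGZF_EOF)         # append the empty EOF block

# add_bgzf_eof("Exp100-Sample15_bap/temp/filt_split/Exp100-Sample15.st.chr20.raw.bam")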

Support .csi index

The pipeline checks that the input bam file is indexed by checking for the .bai extension, which means that indexes .csi are not allowed (although they should be).

bap/bap/cli_bap2.py

Lines 145 to 146 in 44061cd

if(not os.path.exists(input + ".bai")):
    sys.exit("Index supplied .bam before proceeding")
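
A minimal sketch of the suggested change, accepting either index flavor (pysam/htslib can read both .bai and .csi):

import os
import sys

def check_bam_index(input):
    # Accept either a .bai or a .csi index instead of requiring .bai specifically
    if not any(os.path.exists(input + ext) for ext in (".bai", ".csi")):
        sys.exit("Index the supplied .bam (.bai or .csi) before proceeding")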

proper NC flag

Requires both ends of the read pair to be considered; needs to run on each per-chromosome bam split individually.

** ABSOLUTELY MAKE THIS PARAMETERIZABLE **

could export the read name, chr, and start during the 11_* script into a plain-text format that I could run R on to produce a read name -> count vector (through a separate Snakemake call)

then, in the bam_reannotate step of the existing Snakemake call, keep a second annotation dictionary

getCounts updates

  1. need to make sure that column names are in the matrix
  2. need to make sure that each read in a peak is only counted once. Should be able to tack that onto the data frame before the dplyr call (see the sketch after this list)
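
A small illustration of point 2 in Python/pandas (the code the note refers to is R/dplyr; column names here are assumptions):

import pandas as pd

# Both mates of read r1 fall in peak1, so a naive tally would count r1 twice.
overlaps = pd.DataFrame({
    "read_name": ["r1", "r1", "r2", "r3"],
    "barcode":   ["bc1", "bc1", "bc1", "bc2"],
    "peak":      ["peak1", "peak1", "peak1", "peak2"],
})

# Drop duplicate (read, peak) pairs so each read contributes at most once per
# peak, then tally reads per barcode/peak.
deduped = overlaps.drop_duplicates(subset=["read_name", "peak"])
counts = deduped.groupby(["barcode", "peak"]).size().reset_index(name="n")
print(counts)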

bam file for bap2 input

Hi,
Does the bam file used as bap2 input require that reads have been duplicate-marked, e.g., using Picard to mark duplicates but not remove them? If I don't run duplicate marking, will the bap2 process run into issues, e.g., will the Jaccard index computations be far off?
Thanks a lot!
Yu

Significantly different clustering results from bap BAM compared to original

Hi, I'm not quite sure how to phrase this question/discussion.
I've performed a timecourse of scATAC-Seq and previously have shown that it demonstrates good temporal progression as viewed from UMAP projections. This is maintained through a variety of different modes of analysis.

  1. Appending unique identifiers and processing via snapATAC, calling peaks from all cells within a timepoint, and then merging these peak calls together. Processing and visualization using Monocle3.
  2. Above except calling peaks based on clusters within each timepoint.
  3. The same cells as 1&2, creating a peak matrix from 10X called peaks.

Despite these three different peak sets (190k, 290k, and 373k peaks), the clustering looks practically identical.

However, when I process all of my samples using bap and merge the resulting bam files, I obtain much less separation between known timepoints. I have inspected the knee plots for bap and they seem fine, and I merge an average of ~10% of my 10X-called cells, in line with expectations. Even when creating matrices between bap cells and the identical peak sets, there is a loss of separation. Calling peaks from the new bap cells also looks identical to the previous peak sets. Could this be an artifact of bap? Would the way it modifies the aligned reads lead to a loss of cell information?

Digging deeper, I tried to identify if there were any clear clusters of bap-merged cells within my non-bap corrected data. There were not. Could you provide insight into why processing via bap is resulting in loss of biological signal?

Non-Bap processed data, 10X called peaks
10x_Peaks

Bap processed data, 10X called peaks
bap_data

add other genome

Hello
After reading your bioRxiv paper, I want to test it on my scATAC-seq datasets.
However, my data are from pig, so I need a non-reference (custom) genome.
I was not able to find in the wiki how to add a new genome for bap.

Do you have a tutorial ready for that?
Thank you

implement cell barcode filtering

most obvious place is in the 13_call...R script (maybe add an additional column or something)

Careful to do this right, because the Jaccard metric can be adapted to get the number of unique fragments --> an important QC metric

Unable to install

Hi,

I was trying to install the bap package, but couldn't get it to work. Could you post some information on how to install it?
Thanks!

Sincerely,
Z

bap-barcode v2.1 generates truncated FASTQ files

Hi,

I was using bap-barcode to handle the BioRad data from your Nature Biotechnology paper. For most datasets, it generates correct FASTQ files in which the number of lines is a multiple of 4. However, for datasets SRR8994134 and SRR8994139, bap-barcode generates files whose number of lines is not a multiple of 4. Could you help me figure out where the problem might come from?
Thanks!

Zonghao

wc -l SRR8994139-c002_1.fastq.gz
539482 SRR8994139-c002_1.fastq.gz

wc -l SRR8994139-c002_2.fastq.gz
342438 SRR8994139-c002_2.fastq.gz

wc -l SRR8994134-c007_1.fastq.gz
699103 SRR8994134-c007_1.fastq.gz

wc -l SRR8994134-c007_2.fastq.gz
466513 SRR8994134-c007_2.fastq.gz
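
One thing worth noting: wc -l on a .gz file counts newlines in the compressed stream, not FASTQ records. A minimal sketch for counting decompressed lines and checking the 4-line record structure (file name taken from above):

import gzip

def fastq_line_count(path):
    # Count decompressed lines; wc -l on the .gz itself is not meaningful
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle)

n = fastq_line_count("SRR8994139-c002_1.fastq.gz")
print(n, "lines:", "OK" if n % 4 == 0 else "not a multiple of 4 (truncated or malformed)")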

Many barcodes failed to pass Knee filtering

Hi, I ran into trouble when using bap2 and have no idea what is causing it.
I have 8 libraries, all processed with the same workflow; 2 of the libraries have very few barcodes passing the knee filtering, like this
ATAC-4 beadBarcodeKnee
ATAC-4 jaccardOverlapKnee
And normal libraries like this,
ATAC-3 beadBarcodeKnee
ATAC-3 jaccardOverlapKnee
I wonder what may be causing this: bad libraries, or should I change some parameters (I used the defaults)?

bap-barcode extremely low success rate

Hi,

I followed the wiki instructions on how to debarcode the files from the Nature Biotechnology paper, and the success rate is very low, all below 5%, which makes the processed files very small. Could you take a look at this issue?

cat *.log

Parsing read pairs:
SRR8994135_1.fastq.gz
SRR8994135_2.fastq.gz

735648 reads parsed with barcodes (1.96% success)
Total reads that failed: 36787203

Of reads that could not be parsed that had valid, detectable constants:
1774 had a bad BC1 barcode sequence
40 had a bad BC2 barcode sequence
122 had a bad BC3 barcode sequence
82 had a bad Tn5 barcode sequence

-rwxrwxrwx 1 zonghao zonghao 243 Jun 7 14:34 ConvertName.sh
-rw-rw-r-- 1 zonghao zonghao 4.0M Jun 8 12:46 debarcode-c001_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 3.3M Jun 8 12:46 debarcode-c001_2.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 4.0M Jun 8 13:06 debarcode-c002_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 3.3M Jun 8 13:06 debarcode-c002_2.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 4.0M Jun 8 13:26 debarcode-c003_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 3.3M Jun 8 13:26 debarcode-c003_2.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 4.0M Jun 8 13:47 debarcode-c004_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 3.3M Jun 8 13:47 debarcode-c004_2.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 4.0M Jun 8 14:07 debarcode-c005_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 3.2M Jun 8 14:07 debarcode-c005_2.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 4.0M Jun 8 14:28 debarcode-c006_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 3.3M Jun 8 14:28 debarcode-c006_2.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 4.0M Jun 8 14:48 debarcode-c007_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 3.3M Jun 8 14:48 debarcode-c007_2.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 138K Jun 8 07:42 debarcode-c008_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 119K Jun 8 07:42 debarcode-c008_2.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 363 Jun 8 12:27 debarcode-debarcode.sumstats.log
-rw-rw-r-- 1 zonghao zonghao 0 Jun 8 12:27 debarcode.stderr.txt
-rw------- 1 zonghao zonghao 14K Jun 8 12:27 nohup.out
-rw-rw-r-- 1 zonghao zonghao 345 Jun 8 00:50 RunBAPBarcode.sh
drwxrwxr-x 2 zonghao zonghao 4.0K Jun 8 01:24 SRR8994127
-rw-rw-r-- 1 zonghao zonghao 555M Jun 7 17:34 SRR8994127_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 251M Jun 7 17:34 SRR8994127_2.fastq.gz
drwxrwxr-x 2 zonghao zonghao 4.0K Jun 8 01:56 SRR8994128
-rw-rw-r-- 1 zonghao zonghao 542M Jun 7 17:33 SRR8994128_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 245M Jun 7 17:33 SRR8994128_2.fastq.gz
drwxrwxr-x 2 zonghao zonghao 4.0K Jun 8 02:29 SRR8994129
-rw-rw-r-- 1 zonghao zonghao 555M Jun 7 17:34 SRR8994129_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 252M Jun 7 17:34 SRR8994129_2.fastq.gz
drwxrwxr-x 2 zonghao zonghao 4.0K Jun 8 03:01 SRR8994130
-rw-rw-r-- 1 zonghao zonghao 558M Jun 7 17:34 SRR8994130_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 253M Jun 7 17:34 SRR8994130_2.fastq.gz
drwxrwxr-x 2 zonghao zonghao 4.0K Jun 8 05:22 SRR8994131
-rw-rw-r-- 1 zonghao zonghao 2.3G Jun 7 17:55 SRR8994131_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 1.1G Jun 7 17:55 SRR8994131_2.fastq.gz
drwxrwxr-x 2 zonghao zonghao 4.0K Jun 8 07:42 SRR8994132
-rw-rw-r-- 1 zonghao zonghao 2.3G Jun 7 17:55 SRR8994132_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 1.1G Jun 7 17:55 SRR8994132_2.fastq.gz
drwxrwxr-x 2 zonghao zonghao 4.0K Jun 8 10:02 SRR8994133
-rw-rw-r-- 1 zonghao zonghao 2.3G Jun 7 17:55 SRR8994133_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 1.1G Jun 7 17:55 SRR8994133_2.fastq.gz
drwxrwxr-x 2 zonghao zonghao 4.0K Jun 8 12:27 SRR8994134
-rw-rw-r-- 1 zonghao zonghao 2.3G Jun 7 17:55 SRR8994134_1.fastq.gz
-rw-rw-r-- 1 zonghao zonghao 1.1G Jun 7 17:55 SRR8994134_2.fastq.gz

low memory option

Easiest way -> convert bams into RDS files before reading them in for the QC report (which you need to change anyway); verify where this gets hung up on a workstation

ValueError: invalid contig `chrM`

Hello! Thank you for the great tool :)
I have a question and I will be grateful if you can help me~

When I run "bap2 bam -i alignment.last.bam -o bap_out -r hg19", I got the following error message.
"""
Traceback (most recent call last):
File "/Users/zhangshijing/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/Users/zhangshijing/anaconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/Users/zhangshijing/Documents/scATAC-Seq/bap/bap/bin/python/20_names_split_filt.py", line 71, in writeBeadReadName
Itr = bam.fetch(str(chrom),multiple_iterators=True)
File "pysam/libcalignmentfile.pyx", line 1082, in pysam.libcalignmentfile.AlignmentFile.fetch
File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig chrM
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/zhangshijing/Documents/scATAC-Seq/bap/bap/bin/python/20_names_split_filt.py", line 102, in
toy_out = pool.map(writeBeadReadName, zip(chrs, read_barcode_file))
File "/Users/zhangshijing/anaconda3/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/Users/zhangshijing/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
ValueError: invalid contig chrM
Thu Mar 04 22:26:13 HKT 2021: Processing bam file using a Snakemake workflow. This is the most computationally intensive step.
Thu Mar 04 22:26:14 HKT 2021: ERROR: Snakemake execution failed. Check bap_out/logs/alignment.last.snakemake.log file for more information. If blank, try allocating more memory.

Before running bap2, I used GATK to remove the chrM reads from my bam file. After seeing this error, I used samtools view -h alignment.last.bam | grep -c "chrM" to check whether chrM is still present in my bam file, and the result shows there are no chrM entries in it.
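
As a side note, pysam raises "invalid contig" when the requested contig is absent from the BAM header (the @SQ lines), independently of whether any reads map to it. A minimal check of what the header declares:

import pysam

with pysam.AlignmentFile("alignment.last.bam", "rb") as bam:
    declared = set(bam.references)   # contigs listed in the @SQ header lines

for chrom in ["chr1", "chrM"]:
    print(chrom, "in header:", chrom in declared)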

I have attached the snakemake.log file and the head&tail 100 lines of the alignment.last.bam file.
alignment.last.snakemake.log
alignment_bam_head100.txt
alignment_bam_tail100.txt

Thank you!

Best Wishes,
Shijing

error message for poorly calibrated reference genome

  1. do an idxstats
  2. pull the bedtools genome file
    --> intersect; if nothing overlaps, bark (see the sketch below)
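
A minimal sketch of that check (not bap's implementation; -r is the reference flag shown elsewhere in these issues):

import sys
import pysam

def check_reference_compatibility(bam_path, genome_sizes_path):
    # Compare contigs declared in the BAM header against the bedtools genome
    # (chromosome sizes) file for the requested reference; fail loudly if they
    # share nothing.
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        bam_contigs = set(bam.references)
    with open(genome_sizes_path) as handle:
        ref_contigs = {line.split("\t")[0] for line in handle if line.strip()}
    if not bam_contigs & ref_contigs:
        sys.exit("No contigs shared between the BAM header and the reference "
                 "genome file; check that -r matches the genome used for alignment.")

The current failure mode looks like this: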
Traceback (most recent call last):
 File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 119, in worker
   result = (True, func(*args, **kwds))
 File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
   return list(map(*args))
 File "/home/akohlway/bap/bap/bin/python/11_quantBarcode_Filt.py", line 75, in getUniqueBarcodes
   Itr = bam.fetch(str(chrom),multiple_iterators=True)
 File "pysam/libcalignmentfile.pyx", line 1066, in pysam.libcalignmentfile.AlignmentFile.fetch
 File "pysam/libchtslib.pyx", line 675, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig `chr1`
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "/home/akohlway/bap/bap/bin/python/11_quantBarcode_Filt.py", line 182, in <module>
   unique_barcodes, all_barcodes = zip(*pool.map(getUniqueBarcodes, chrs))
 File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 266, in map
   return self._map_async(func, iterable, mapstar, chunksize).get()
 File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 644, in get
   raise self._value
ValueError: invalid contig `chr1`
Traceback (most recent call last):
 File "/home/akohlway/venv3/bin/bap", line 11, in <module>
   load_entry_point('bap', 'console_scripts', 'bap')()
 File "/home/akohlway/venv3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
   return self.main(*args, **kwargs)
 File "/home/akohlway/venv3/lib/python3.6/site-packages/click/core.py", line 697, in main
   rv = self.invoke(ctx)
 File "/home/akohlway/venv3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
   return ctx.invoke(self.callback, **ctx.params)
 File "/home/akohlway/venv3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
   return callback(*args, **kwargs)
 File "/home/akohlway/bap/bap/cli.py", line 226, in main
   val = file_len(p.output + "/final/" + p.name + ".barcodequants.csv") -1
 File "/home/akohlway/bap/bap/bapHelp.py", line 33, in file_len
   with open(fname) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'bap_out/final/N719_Exp77_sample15_S1.bap.barcodequants.csv'

via @kohlway1

execution_error

Hi, thanks for this awesome tool.
I am getting the error below when processing the bam files (full log attached: atac.positionsort.MAPQ30.snakemake.log):
stop_if_wrong_length("'seqnames'", ans_len) :
'seqnames' must have the length of the object to construct (1) or
length 1
Calls: findOverlaps ... makeGRangesFromDataFrame -> GRanges -> new_GRanges -> stop_if_wrong_length
Execution halted

I am using 10x scATAC-seq, but the files were processed by another pipeline (scATAC-pro), not Cell Ranger. This pipeline essentially does the same thing, except the barcode tag in the bam files is probably different, so I did not specify the barcode tag. The following was my command:
bap2 bam -i ~/mapping_result/atac.positionsort.MAPQ30.bam -o ~/atac/bap -r hg38 -c 32 -w ~/atac/barcodes.txt

I have attached an entire snakemake log file with the error reported in it. Thanks for your help!

bap-barcode generated zero outputs

bap-barcode v2.1-multi -a fastq_br/biorad_v2-multi_R1.fastq.gz -b fastq_br/biorad_v2-multi_R2.fastq.gz --nmismatches 1

Hello, I was going through the tutorial. I ran the line above on the test dataset, but the debarcoded files are empty. Also, the sumstats.log says 0 reads parsed with barcodes.

I was wondering if this is because of the demo data or because the program was not installed successfully on my side.

remove fragments in blacklist and mito when computing knee and jaccard?

The barcodeQuantSimple.csv file includes reads from the mitochondria. If the proportion of mitochondrial reads in the sample, or in some cells, is high, will it cause bad or inaccurate knee-curve results? Cell Ranger uses the peaks after peak calling to compute the knee; can bap use reads with the blacklist and mitochondrial regions excluded?
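
A rough sketch of the kind of pre-filtering the question describes (not bap's implementation; the fragment-file layout of chrom/start/end/barcode and the file paths are assumptions):

import gzip
from collections import defaultdict

def load_blacklist(bed_path):
    # Blacklist BED -> per-chromosome list of (start, end) intervals
    regions = defaultdict(list)
    with open(bed_path) as handle:
        for line in handle:
            chrom, start, end = line.split("\t")[:3]
            regions[chrom].append((int(start), int(end)))
    return regions

def barcode_counts(frag_path, blacklist, mito=("chrM", "MT")):
    # Count fragments per barcode, skipping mitochondrial and blacklisted ones
    counts = defaultdict(int)
    with gzip.open(frag_path, "rt") as handle:
        for line in handle:
            chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
            if chrom in mito:
                continue
            start, end = int(start), int(end)
            if any(s < end and start < e for s, e in blacklist.get(chrom, [])):
                continue
            counts[barcode] += 1
    return counts

# Hypothetical usage:
# blacklist = load_blacklist("hg38.blacklist.bed")
# counts = barcode_counts("fragments.tsv.gz", blacklist)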

Error in .subset2(x, i, exact = exact) : subscript out of bounds

Hello! Thank you for helping me solve the "chrM" error :)

However, I got another error message this time. My scATAC-seq data were prepared with the Bio-Rad SureCell ATAC-Seq Library Prep Kit, so I also used bap-barcode to process the raw fq.gz data. What I ran is:

  1. bap-barcode v2.1 -a test_seqtk_S1_L001_R1_1.fastq.gz -b test_seqtk_S1_L001_R2_1.fastq.gz --nmismatches 1
  2. bwa mem /reference/hg38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna debarcode-c001_1.fastq.gz debarcode-c001_2.fastq.gz | samtools view -bS - | samtools sort -@ 4 - -o debarcode.bam
  3. bap-reanno -i debarcode.bam -o reanno_debarcode.bam
  4. samtools index reanno_debarcode.bam

After I got the reanno_debarcode.bam file, I then ran bap2 bam -i reanno_debarcode.bam -o out -r hg38.

The error in snakemake.log is shown below:

Error in .subset2(x, i, exact = exact) : subscript out of bounds
Calls: [[ -> [[.data.frame ->
Execution halted
MissingOutputException in line 206 of
/Users/zhangshijing/Documents/scATAC-Seq/bap/bap/bin/snake/Snakefile.bap2.chr:
Job Missing files after 5 seconds:
out/temp/frag_overlap/reanno_debarcode.chr2_overlapCount.rds
out/temp/frag_overlap/reanno_debarcode.chr2_ncCount.tsv
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait."

I have converted the bam file, the snakemake log, and the idxstats result of the bam file to txt (attached):

reanno_debarcode_snakemake_log.txt
reanno_debarcode_idxstats.txt
reanno_debarcode_bam.txt

Thank you!
Shijing
