walaj / bxtools

Tools for analyzing 10X Genomics data

License: MIT License

Makefile 43.13% Shell 32.37% M4 1.34% C++ 21.38% C 1.79%

genomics sequencing tenxgenomics

bxtools's People

dkj tkamath1 sjackman verne91 vaibbc 1dayac vladsavelyev xtmgah adrisede fanjumeng

bxtools's Issues

Problems with the "mol" module - reading in MI tag.

Many thanks for this otherwise very capable tool. Helps a lot with handling 10X data.

I have a question regarding the "mol" module. I'm getting messages suggesting that it can't read the MI tag in any of my BAM files. I've mainly tried files generated from mice, but also files from other organisms.

The message I got was:

1e5 reads in and haven't hit MI tag yet
1e6 reads in and haven't hit MI tag yet

This is from BXLOOPCHECK from bxcommon.h. But the main issue seems to be reading in the MI tag, which led me to look into SeqLib - but I haven't isolated the issue yet.

This was run on an Ubuntu 16.04 x86-64 system. I also tried on Mac OS X and got the same result.

I include here an example line of the actual BAM file I used.

ST-J00101:68:HKHWYBBXX:2:1202:22242:1455 83 1 4347597 60 127M = 4347178 -546 TAATGAACCTGGTATTTGAGAAATTCCATTCTGATGTTTCTGATTATTAGGAACAAAAGCTTCCAGGTGCCGTAGCAACAAGGCTGGTAAGTAGTCTTTTAGAACATTTTCTCCCATAGTATCTACC AJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ DM:Z:0.250000 QT:Z:AAFFFJJJ BC:Z:AAGATCAT QX:Z:AAFFFJJJJJJJJJJJ AM:A:1 XM:A:0 TR:Z:TATCACA TQ:Z:JJJJJJJ AS:f:-2 XS:f:-72.3669 BX:Z:TCGGTAAGTAGTTGTC-1 XT:i:0 RX:Z:TCGGTAAGTAGTTGTC OM:i:60 RG:Z:CTRL-DMSO:LibraryNotSpecified:1:unknown_fc:0 PS:i:4240224 HP:i:1 PC:i:39 MI:i:50195

So there are definitely MI tags in the file, but somehow they cannot be read. Apart from these messages, it generates no output.

A stack trace gave me the following:

munmap(0x7f3e6933d000, 790528) = 0
munmap(0x7f3e693fe000, 397312) = 0
munmap(0x7f3e692dc000, 397312) = 0
munmap(0x7f3e680b1000, 397312) = 0
munmap(0x7f3e68050000, 397312) = 0
munmap(0x7f3e67fef000, 397312) = 0
munmap(0x7f3e67f8e000, 397312) = 0
munmap(0x7f3e67f2d000, 397312) = 0
munmap(0x7f3e67ecc000, 397312) = 0
munmap(0x7f3e6945f000, 200704) = 0
munmap(0x7f3e67e9b000, 200704) = 0
munmap(0x7f3e67e09000, 397312) = 0
munmap(0x7f3e67da8000, 397312) = 0
munmap(0x7f3e67d47000, 397312) = 0
munmap(0x7f3e67ce6000, 397312) = 0
munmap(0x7f3e67c85000, 397312) = 0
munmap(0x7f3e67c24000, 397312) = 0
munmap(0x7f3e67bc3000, 397312) = 0
munmap(0x7f3e67b62000, 397312) = 0
munmap(0x7f3e67b01000, 397312) = 0
munmap(0x7f3e67e6a000, 200704) = 0
close(3) = 0
write(1, "ATATAGTTAGCAAATTACACTCCACAGACCCA"..., 453) = 453
fsync(1) = -1 EINVAL (Invalid argument)
close(1) = 0
brk(0x1017000) = 0x1017000
exit_group(0) = ?
+++ exited with 0 +++

I also tried replacing the SeqLib submodule with its master branch, which led to basically the same outcome, but now with a segmentation fault.

.
.
.
munmap(0x7f0c01602000, 200704) = 0
close(3) = 0
write(1, "ATATAGTTAGCAAATTACACTCCACAGACCCA"..., 453) = 453
fsync(1) = -1 EINVAL (Invalid argument)
close(1) = 0
brk(0xd37000) = 0xd37000
exit_group(0) = ?
+++ exited with 0 +++
Segmentation fault

I've also tried going back to an earlier commit right after the "mol" module was included, but I had the same results.

I've substituted several different 10X-derived BAM files and they all gave me the same outcome.

Any suggestions? I can send the bam file by FTP separately.
Thanks,
Frank

Generate MI tags [feature request]

I've aligned reads to a de novo assembly using BWA-MEM rather than Long Ranger align. The BWA alignments have the BX tag but no MI tag. Would you consider implementing a module to group together nearby reads into molecules and adding MI:i tags? No worries at all if you consider this feature outside the scope of bxtools.

lariat (aka longranger align) groups together reads that are within 50 kbp of each other. That's a reasonable default value, but I'd find it very helpful for this to be a configurable parameter. I'd like to set it to something more conservative like perhaps 5 kbp.
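The grouping requested above can be sketched as follows. This is only an illustration, not bxtools or lariat code: the `Read` struct, the function name `assign_molecules`, and the rule "start a new molecule when the gap between consecutive read starts exceeds `max_gap`" are all assumptions made for the example, and the ids start at 0 rather than following any Long Ranger numbering.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal read record; a real tool would use its BAM library's type.
struct Read {
  std::string bx;  // barcode from the BX tag
  int32_t chr_id;  // reference sequence index
  int32_t pos;     // leftmost mapped position
  int32_t mi;      // molecule id, filled in by assign_molecules
};

// Reads sharing a barcode and chromosome join the same molecule as long as
// the gap to the previous read start is at most max_gap (assumed rule).
// Returns the number of molecules created.
int assign_molecules(std::vector<Read>& reads, int32_t max_gap) {
  // Bucket read indices by (barcode, chromosome).
  std::map<std::pair<std::string, int32_t>, std::vector<std::size_t>> groups;
  for (std::size_t i = 0; i < reads.size(); ++i)
    groups[{reads[i].bx, reads[i].chr_id}].push_back(i);

  int next_mi = 0;
  for (auto& kv : groups) {
    std::vector<std::size_t>& idx = kv.second;
    // Order the reads of this group by position.
    std::sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
      return reads[a].pos < reads[b].pos;
    });
    bool first = true;
    int32_t prev_pos = 0;
    for (std::size_t i : idx) {
      if (first || reads[i].pos - prev_pos > max_gap) {
        ++next_mi;  // gap too large (or first read): start a new molecule
        first = false;
      }
      reads[i].mi = next_mi - 1;
      prev_pos = reads[i].pos;
    }
  }
  return next_mi;
}
```

Passing `max_gap` as a parameter is what makes the 50 kbp versus 5 kbp choice configurable; a real implementation would probably measure the gap from the previous read's end rather than its start.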

fail to open new bam file when splitting

I want to split a BAM file by 10x barcode, but bxtools does not seem to generate BAM files for all barcodes. The error is "[E::hts_open_format] fail to open file 'abc.bam' Could not open BAM: abc.bam"

installation error

Hello team,

Thanks for developing bxtools.

While installing, I encountered this error.

thread_pool.c: In function ‘hts_tpool_init’:
thread_pool.c:676: warning: implicit declaration of function ‘pthread_mutexattr_settype’
thread_pool.c:676: error: ‘PTHREAD_MUTEX_RECURSIVE’ undeclared (first use in this function)
thread_pool.c:676: error: (Each undeclared identifier is reported only once
thread_pool.c:676: error: for each function it appears in.)
make[2]: *** [thread_pool.o] Error 1

Is it common?

Thanks.

Frank

Most files are empty after split

Hi! Thank you for developing this tool.
I would like to use the split function to generate one BAM file per single cell.

The structure of my bam file (obtained from Cell Ranger) is:

samtools view $BAM/possorted_genome_bam.bam | head
A00379:517:HWLKKDSX2:1:1542:2483:6668   16      chr1    3018437 0       150M1S  *       0       0       TCTTTATTCCTTCCTTGACCAAGGTATCATTGAACAGAGTGTTGTTCAGTCTCCACGTAAATGTTGGCTTTCTATTATTTATGTTGTTATTGAAGATCAGCCTTAGTCCATGGTGATCTGATAGGATGCATGGGACAATTTCGAAATTTTC       FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF       NH:i:7  HI:i:1  AS:i:136        nM:i:6  RG:Z:WT1_GEX_PC_mm10_introns:0:1:HWLKKDSX2:1  RE:A:I  xf:i:0  CR:Z:CTAGCCTAGGAATTAC   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:CTAGCCTAGGAATTAC-1 UR:Z:ACCCAACACG UY:Z:FFFFFFFFFF UB:Z:ACCCAACACG

So I don't have a BX tag. I would like instead to use the corrected barcode tag "CB"

So I used the command:

bxtools split $BAM/possorted_genome_bam.bam -a test --tag CB > $OUTPUT/count.txt

This is where it went wrong: the command generated many BAM files, of which 30 contained reads and more than 7,000 were empty.

The files that contain reads show this error message:

samtools view test.GAATAAGTCTGAGGGA-1.bam | head -n 5
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
A00379:517:HWLKKDSX2:2:1224:19768:20102 1024    chr1    6214342 255     151M    *       0       0       ATTTCGGGGCAGCAGATGAGGGCCCCAGATCTGTGCTGGTGCTCACTCGTCAGCCTCCGGTTCCCCTGTTGGGGCTGCCCCAGGTTTGGCGAGGTCGGTCTGCCGCGGCCAGAAGGTCACGCTCACCTTGGGGCCGTCCAAGGCAAGCACC       FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF       NH:i:1  HI:i:1  AS:i:149        nM:i:0  RG:Z:WT1_GEX_PC_mm10_introns:0:1:HWLKKDSX2:2  TX:Z:ENSMUST00000159618,+98,151M;ENSMUST00000191825,+801,151M   GX:Z:ENSMUSG00000090031 GN:Z:4732440D04Rik      fx:Z:ENSMUSG00000090031       RE:A:E  xf:i:17 CR:Z:GAATAAGTCTGAGGGA   CY:Z:FFFFFFFFFFFFFFFF   CB:Z:GAATAAGTCTGAGGGA-1 UR:Z:GCTCATCGCT UY:Z:FFFFFFFFFF UB:Z:GCTCATCGCT

Do you have an idea what the problem can be?

Thanks in advance for your help!

error in bxtools convert

Hello,

I used the following command to sort the bam file by 'BX' tag.
bxtools convert $bam -v -k | samtools sort - -o bx_sorted.bam

But I got an error after it had run for about 30 minutes.

.....
...at read 853,000,000 at pos 0:-1(+)
...at read 854,000,000 at pos 0:-1(+)
...starting second pass to flip chr and BX
terminate called after throwing an instance of 'std::out_of_range'
  what():  BamHeader::IDtoName - Requested ID is higher than number of sequences

Summarizing stats based on other tags (e.g. MI or PS)

Thank you Jeremiah for working on this tool. I am experiencing some strange behavior and I wanted to give you a heads up.

I would like to collect stats based on different tags (e.g., MI, PS, etc.) but, regardless of what I specify in the --tag option, I always obtain the same output (stats based on BX).

Here is the simple command I used (just for chr22):

samtools view -h $BAM chr22 | bxtools stats - -t MI > stats.MI.tsv

The reads do contain the other tags. I used the BAM file provided by the GIAB team. Below is an illustrative example of a read containing all the tags.

Another strange behavior I noticed is that the values reported in the AS column of the output are all 0s. This seems odd, since the majority of the reads have AS values different from 0.

ST-E00273:177:HMTTCCCXX:1:2120:6806:23477       105     chr22   10510039        60      101M27S chr21   8532882 -347    ATGTTTGGAATATAAAATCAGCAACTAATATGTATTTTCAAAGCATTATCAATACAGAGTGCTAAGTGACTTCACTGGGAAAGGTAGTCATATAAAGAACAGACTAATAGTCCGGGATTATTGTGAGG        <<F,7AFKF,F,,F,FKFAFK7AAFKFFKKFF,,<F7,7,,,<AK,,<,,7,A,,F,,77AF,7FFK7,,,AKA<,,,7,,7,,,AFF,F,F<FAKFKA,,,,7,,,,,,,7,,(A<AK,,<7,,<,,        DM:Z:1.236364   QT:Z:A<,F<FFA   BC:Z:TCACATCA   QX:Z:,AAF,<FFFFKFKKA<   AM:A:1  XM:A:0  TR:Z:TAGTCGC    TQ:Z:FKA,FKK    AS:f:-93  RG:Z:27058:MissingLibrary:1:HMTTCCCXX:1 XS:f:-94        BX:Z:TGAATCGCAACTGGAG-1 XT:i:0  RX:Z:TGAATCGCAACTGGAG   OM:i:60 PS:i:10464994   HP:i:2  PC:i:26 MI:i:28314638

fatal: reference is not a tree

Hi,

When I tried to install this tool, I was not able to clone some submodules.

git clone --recursive https://github.com/walaj/bxtools.git
The error message is as follows:

"Cloning into 'bxtools'...
remote: Counting objects: 69, done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 69 (delta 28), reused 69 (delta 28), pack-reused 0
Unpacking objects: 100% (69/69), done.
Checking connectivity... done.
Submodule 'SeqLib' (https://github.com/walaj/SeqLib.git) registered for path 'SeqLib'
Cloning into 'SeqLib'...
remote: Counting objects: 5188, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 5188 (delta 0), reused 0 (delta 0), pack-reused 5182
Receiving objects: 100% (5188/5188), 11.63 MiB | 1.08 MiB/s, done.
Resolving deltas: 100% (3140/3140), done.
Checking connectivity... done.
fatal: reference is not a tree: 58aa22093443f5de38e78170c4c73390e3df51e8
Unable to checkout '58aa22093443f5de38e78170c4c73390e3df51e8' in submodule path 'SeqLib'"

Best,
Danshu

Bed file of molecule extents

I'd like to create a BED file of molecule extents. For each unique molecule identifier (MI), create a bed file with the barcode (BX) and molecule (MI) and start and stop coordinate of each molecule. I then plan to use bedtools to calculate the depth of physical molecule coverage, and look for gaps in physical coverage.
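A minimal sketch of the requested output, assuming the reads have already been reduced to a hypothetical `TaggedRead` record (chromosome, 0-based start, end, BX, MI) and that a molecule id never spans two chromosomes. The function name and the column order (chr, start, end, BX, MI) are choices made for this example, not an existing bxtools format.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-read record extracted from a BAM file.
struct TaggedRead {
  std::string chr;
  int32_t start;   // 0-based start, as in BED
  int32_t end;     // end (exclusive), as in BED
  std::string bx;  // barcode
  int32_t mi;      // molecule id
};

struct Extent {
  std::string chr;
  int32_t start;
  int32_t end;
  std::string bx;
};

// Collapse reads to one interval per MI and render it as BED text.
std::string molecule_extents_bed(const std::vector<TaggedRead>& reads) {
  std::map<int32_t, Extent> extents;  // keyed by MI, so output is MI-ordered
  for (const TaggedRead& r : reads) {
    auto it = extents.find(r.mi);
    if (it == extents.end()) {
      extents.emplace(r.mi, Extent{r.chr, r.start, r.end, r.bx});
    } else {
      // Widen the interval to cover this read.
      it->second.start = std::min(it->second.start, r.start);
      it->second.end = std::max(it->second.end, r.end);
    }
  }
  std::ostringstream out;
  for (const auto& kv : extents)
    out << kv.second.chr << '\t' << kv.second.start << '\t' << kv.second.end
        << '\t' << kv.second.bx << '\t' << kv.first << '\n';
  return out.str();
}
```

The lines come out ordered by MI rather than by coordinate, so they would need a coordinate sort before being passed to tools that expect sorted BED input.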

another installation error

Hello,

While trying to install bxtools, I got the following error:

/broad/software/free/Linux/redhat_6_x86_64/pkgs/gcc_5.2.0/bin/ld:
 /broad/software/free/Linux/redhat_6_x86_64/pkgs/bzip2_1.0.6/lib/libbz2.a(bzlib.o): relocation R_X86_64_32S against `.text' can not be used when making a shared object; recompile with -fPIC
/broad/software/free/Linux/redhat_6_x86_64/pkgs/bzip2_1.0.6/lib/libbz2.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status

Could you help me with this?
Thank you in advance for your help!

-Seunghun

../configure && make doesn't work in a subdirectory

It's helpful to be able to build in a subdirectory when working with multiple architectures, such as both Linux and macOS.

mkdir build
cd build
../configure
make

make  all-recursive
Making all in SeqLib/htslib
/bin/sh: line 0: cd: SeqLib/htslib: No such file or directory
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Failed Installation

I'm having an issue installing bxtools on the cluster at the Broad Institute. I've pasted some of the output below. The issue begins when I try to run make.

Split by HP tag including unphased reads? [Feature request]

For the purposes of easily viewing coverage across the haplotypes, I split my longranger bams by haplotype. bxtools can create bam files for phased reads - is it possible to also output the unphased reads when -t HP is used? Something like an option to output to a single BAM file all the reads that don't match the specified tag would be great.
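The requested behavior amounts to routing reads without the tag into one catch-all bucket. A minimal sketch, assuming a hypothetical `TagRead` record whose `tag_value` is empty when the tag is absent; the function name and the `fallback` parameter are illustrative and are not existing bxtools options.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record: tag_value is empty when the read lacks the tag.
struct TagRead {
  std::string name;
  std::string tag_value;
};

// Group read names by tag value; reads without the tag go to `fallback`.
std::map<std::string, std::vector<std::string>> split_by_tag(
    const std::vector<TagRead>& reads, const std::string& fallback) {
  std::map<std::string, std::vector<std::string>> buckets;
  for (const TagRead& r : reads) {
    const std::string& key = r.tag_value.empty() ? fallback : r.tag_value;
    buckets[key].push_back(r.name);
  }
  return buckets;
}
```

With HP as the tag, this would yield buckets "1", "2", and, for example, "unphased", each of which could then be written to its own BAM file.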

relabel appending quality to seq on some reads

'relabel' appends the quality field to the sequence in some cases; this seems to happen mostly where the quality line starts with a non-alphanumeric character.

[email protected]:/gsap/garage-protistvector/sredmond/180417_TE_content/tmp_FCV_1$ samtools view possorted_bam.unmapped.bam | grep '1101:26606:1801'
E00536:79:HFV5MALXX:1:1101:26606:1801	77	*	0	0	*	*	0	0	TTACACTGGCTATCTTTAATGCAAAGATAGATTTTCATGAAGAGTTCTTAAAAATTGAACGAAAAAACTGGTTGTTCATTTTTGATGGTTGCTGATGATTGAATGATGGATTCTTGAGCCTTAAAAAA	JJJ<AJAJFJJJ-J7JFJJFJJJJJJF<FJJFJJJJJJJJFJF-AJFFJFJ<FJJJF-F-AF-AJJJJJJJA-FJFFJFJAJJJ<FA-7F<FFJJFJJFA-FJFJJJFJJAFJJFJ7<<FJJ---<FF	AS:i:0	XS:i:0	RX:Z:CNGACTGGTAATCCAC	QX:Z:A#AFFFJJJJJJFJJJ	BX:Z:CGGACTGGTAATCCAC-1	BC:Z:NAGGATGT	QT:Z:#AAFAFFJ	TR:Z:GTGAAGA	TQ:Z:JJJJJJJ	RG:Z:Al_FCV_1_align:LibraryNotSpecified:1:unknown_fc:0
E00536:79:HFV5MALXX:1:1101:26606:1801	141	*	0	0	*	*	0	0	NAAATGGCTCAAAAGAGATTAAAACGGCTTAAAATGGCTTGAAATGGCTCATAATAGCTTAAAAAGTGTTAATGGCGTAAAACTGCTTAAGAGGGGTTAAGCTATTATTAGCCATTTCAAGCCATTTTAAGCCGTTTTAATCTATTTTGCG	#A7A-AFJJJJFF<-<F<<-<FJJAJJJ<JJJJJJFFJJJJ7-JAFAA-A77A<7-A--<-7FF---77FFA-7<--7--A<FFF-AFA-FF--A-FJ-<AAFA<FAA-A<--7AAF7<7FA)--7<F-----7<)-<-7<77----7---	AS:i:0	XS:i:0	RX:Z:CNGACTGGTAATCCAC	QX:Z:A#AFFFJJJJJJFJJJ	BX:Z:CGGACTGGTAATCCAC-1	BC:Z:NAGGATGT	QT:Z:#AAFAFFJ	RG:Z:Al_FCV_1_align:LibraryNotSpecified:1:unknown_fc:0

[email protected]:/gsap/garage-protistvector/sredmond/180417_TE_content/tmp_FCV_1$ samtools view possorted_bam.unmapped.bx.bam | grep '1101:26606:1801'
E00536:79:HFV5MALXX:1:1101:26606:1801_CGGACTGGTAATCCAC	77	*	0	0	*	*	0	0	TTACACTGGCTATCTTTAATGCAAAGATAGATTTTCATGAAGAGTTCTTAAAAATTGAACGAAAAAACTGGTTGTTCATTTTTGATGGTTGCTGATGATTGAATGATGGATTCTTGAGCCTTAAAAAA	JJJ<AJAJFJJJ-J7JFJJFJJJJJJF<FJJFJJJJJJJJFJF-AJFFJFJ<FJJJF-F-AF-AJJJJJJJA-FJFFJFJAJJJ<FA-7F<FFJJFJJFA-FJFJJJFJJAFJJFJ7<<FJJ---<FF	AS:i:0	XS:i:0	RX:Z:CNGACTGGTAATCCAC	QX:Z:A#AFFFJJJJJJFJJJ	BC:Z:NAGGATGT	QT:Z:#AAFAFFJ	TR:Z:GTGAAGA	TQ:Z:JJJJJJJ	RG:Z:Al_FCV_1_align:LibraryNotSpecified:1:unknown_fc:0
E00536:79:HFV5MALXX:1:1101:26606:1801_CGGACTGGTAATCCAC	141	*	0	0	*	*	0	0	NAAATGGCTCAAAAGAGATTAAAACGGCTTAAAATGGCTTGAAATGGCTCATAATAGCTTAAAAAGTGTTAATGGCGTAAAACTGCTTAAGAGGGGTTAAGCTATTATTAGCCATTTCAAGCCATTTTAAGCCGTTTTAATCTATTTTGCG#A7A-AFJJJJFF<-<F<<-<FJJAJJJ<JJJJJJFFJJJJ7-JAFAA-A77A<7-A--<-7FF---77FFA-7<--7--A<FFF-AFA-FF--A-FJ-<AAFA<FAA-A<--7AAF7<7FA)--7<F-----7<)-<-7<77----7---	AS:i:0	XS:i:0	RX:Z:CNGACTGGTAATCCAC	QX:Z:A#AFFFJJJJJJFJJJ	BC:Z:NAGGATGT	QT:Z:#AAFAFFJ	RG:Z:Al_FCV_1_align:LibraryNotSpecified:1:unknown_fc:0

understanding bxtools stats output

Thanks for your useful tool!
I have generated stats for my 10x Genomics data:

bxtools stats $bam > stats.tsv

The output columns are: BX count median_isize median_mapq

If I understand correctly, BX is the barcode and count is the number of records for that barcode (pool/droplet).
I found that in my dataset there are many BXs with very low counts. I would like to filter out these low-frequency BXs, but I am not sure how to go about it.

In your example

make a list of bad tags (freq < 100)

Is "freq < 100" a general standard for filtering bad BXs? Also, since the input to bxtools stats was an unaligned BAM file containing paired-end reads, would freq = 100 in field 2 correspond to 50 read pairs for a BX?

I have also searched the literature for how to set this filtering threshold. In the paper "A hybrid approach for de novo human genome sequence assembly and phasing", I found the following sentence:
"those barcodes that were seen below a given threshold frequency (22 for library 1 and 101 for library 2, based on the lowest frequency among the number of barcodes that were detected in these libraries by 10XG’s Long Ranger software)". I do not understand what "the lowest frequency among the number of barcodes that were detected in these libraries by 10XG’s Long Ranger software" means. If I filter at the lowest frequency, nothing is filtered at all, right?

Sorry if my question is a little unrelated to your tool.

Best,
Danshu
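The filtering step discussed in this issue can be sketched as a count-then-threshold pass. This is an illustration only: the function name is hypothetical, each input element is assumed to be the barcode of one read record, and `min_count` is left to the caller because no general threshold is given here.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Return the barcodes seen fewer than min_count times.
// Each element of read_barcodes is the BX value of one read record.
std::set<std::string> low_frequency_barcodes(
    const std::vector<std::string>& read_barcodes, std::size_t min_count) {
  std::unordered_map<std::string, std::size_t> counts;
  for (const std::string& bx : read_barcodes) ++counts[bx];

  std::set<std::string> bad;
  for (const auto& kv : counts)
    if (kv.second < min_count) bad.insert(kv.first);
  return bad;
}
```

If counts are taken per read record and both mates of a pair carry the barcode, a threshold of 100 records corresponds to 50 pairs; whether a given tool counts records or pairs has to be checked against that tool.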

How do you deal with the records without 'BX' tag in the bam files?

Runtime error of ChrID in bxtools convert

Hello,

I encountered an error when using bxtools convert on a lariat-generated BAM file.

[bma@node63 Test]$ /usr/bin/time bxtools convert $bam > test.bam
terminate called after throwing an instance of 'std::invalid_argument'
what(): BamHeader::IDtoName - ID must be >= 0

I found this code in bxconvert.cpp:

      std::string chr = hdr.IDtoName(r.ChrID());  <----
      r.SetChrID(bxtags[bx]);
      r.AddZTag("CR", chr);

and this code in BamHeader.cpp:

  if (id < 0)
    throw std::invalid_argument("BamHeader::IDtoName - ID must be >= 0");

which generates this error message.

I tried printing r.ChrID() and the read name:

       if (r.ChrID() < 0) {
         std::cerr << r.ChrID() << std::endl;
         std::cerr << r.Qname().c_str() << std::endl;
         continue;
       }

and got this result:
-1
ST-E00126:314:HFL3FALXX:6:2202:30776:15953

Then I grepped for the read name and found an unmapped paired-end read:

ST-E00126:314:HFL3FALXX:6:2202:30776:15953      173     *       0       0       47S20M83S       *       0       10495934        ACATATATATATGTAACATAAGGTTCCATTAAACCTGTCGTTCGTCCAACCATTTTATAAAATATATATGTTTTCCTTTATTTTTTGTTTTCATTAATCCTATATCTGAATTTTCTTCCTCTTTCTTTTTCGATGTAAACTGAGTTTTCT   AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJFJJJJJJJJJJJJJJJJFJJJFJJJJJFJJFFJJFFFJJFFFJ7JJJJJFFFFAJ<<-<<FF-<-A777FJJJFF   XM:A:0  QX:Z:AAFFFJJJJJJJJJJJ   AM:A:0  RX:Z:AACCATGGTCGACTAT    AS:f:-141.5     RG:Z:FtTest01:LibraryNotSpecified:1:HFL3FALXX:6 XS:f:-141.5     BX:Z:AACCATGGTCGACTAT-1 XT:i:1  OM:i:0
ST-E00126:314:HFL3FALXX:6:2202:30776:15953      93      *       0       0       43S22M62S       Ft8     5680442 10495934        CCTAAAAAAATAATACCCCACGTCCTATTAACTCATCAAATTAAAATGATATTTTATTTCATAAATTGAAAGTTCTTACAAAATGATAATAATAATTGTTTATATATAACTTGGCAAGTTAACTCCT  --JJJJJJJJJJJJJJFJJJFJFJJFJJJJFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJ  XM:A:0  QX:Z:AAFFFJJJJJJJJJJJ   AM:A:0  TR:Z:CGTAACT    TQ:Z:JJJJJJJ    AS:f:-141.5     RG:Z:FtTest01:LibraryNotSpecified:1:HFL3FALXX:6  XS:f:-141.5     BX:Z:AACCATGGTCGACTAT-1 XT:i:1  RX:Z:AACCATGGTCGACTAT   OM:i:0

Is this hdr.IDtoName line necessary?
I found that it is only used to generate the CR tag in the r.AddZTag line.
I commented out these two lines to work around the error:

      // std::string chr = hdr.IDtoName(r.ChrID());
      r.SetChrID(bxtags[bx]);
      // r.AddZTag("CR", chr);
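An alternative to commenting the lines out would be a bounds-checked lookup that returns a placeholder for unmapped reads. The sketch below is self-contained and does not use SeqLib: the vector of reference names stands in for the BAM header, and `id_to_name_or` and its `fallback` argument are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Look up a reference name by ID, returning `fallback` instead of throwing
// when the ID is negative (unmapped read) or past the end of the header.
std::string id_to_name_or(const std::vector<std::string>& ref_names,
                          int32_t id, const std::string& fallback) {
  if (id < 0 || static_cast<std::size_t>(id) >= ref_names.size())
    return fallback;
  return ref_names[static_cast<std::size_t>(id)];
}
```

With such a helper, a read whose ChrID is -1 could receive a CR value such as "*" instead of aborting the run. The same check also covers the "Requested ID is higher than number of sequences" case reported for convert in another issue on this page.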

Split command results in empty id.bx.bam

Hello BXtools team.

Thank you for creating this tool.
In some cases, when running the split command as
"bxtools split input.bam -a SM4279 -m 50 > SM4279_counts.tsv",

the resulting split BAM files are all empty. For example:

0 SM_4279.TTCGAAGGTTAGAACA-1.bam
0 SM_4279.TTCGAAGTCTGTCTCG-1.bam

Has this behavior been observed before, and do you have any suggestions?
Thank you for your time,
Aaron

error installing - lzma

I'm trying to install bxtools locally on our server. When I run the command:
./configure --prefix=/home/april/local/bxtools/0.0/

Configure runs fine until I get the error:

checking for library containing lzma_end... no
configure: error: liblzma not found, please install lzma

I installed lzma locally from the resource https://tukaani.org/xz/ and added ~/lzma/5.2.3/bin/ to my path but still get this error. Is bxtools looking for another portion of lzma that I may have not installed? I can't find any library called "lzma_end" in the package I installed? Is this package necessary? Is there a way to tell bxtools to ignore this dependency?

SAM to FASTQ functionality

Say you are only given a longranger processed BAM file that you would like to use for other purposes. Many tools cannot use BAM format directly. It could be useful to support a fast SAM/BAM conversion to fastq that preserves BX tags, RX tags and even MI tags in the header.

This is pretty easy to script, but I think this could be useful if written efficiently. It would also present the opportunity to encourage some sort of standardized fastq header format for chromium data.
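A minimal sketch of such a conversion for a single record, assuming the tags to keep have already been joined into one string in SAM text form (for example "BX:Z:ACGT-1"). The function names are hypothetical. Placing the tags after the read name on the header line is only one possible convention; to my knowledge it resembles what samtools fastq -T writes and what bwa mem -C reads back.

```cpp
#include <algorithm>
#include <string>

// Reverse-complement a DNA sequence; N and other codes are left unchanged.
std::string reverse_complement(std::string seq) {
  std::reverse(seq.begin(), seq.end());
  for (char& c : seq) {
    switch (c) {
      case 'A': c = 'T'; break;
      case 'C': c = 'G'; break;
      case 'G': c = 'C'; break;
      case 'T': c = 'A'; break;
      default: break;
    }
  }
  return seq;
}

// Render one SAM record as a FASTQ record, with the tags in the header line.
std::string to_fastq(const std::string& qname, int flag, std::string seq,
                     std::string qual, const std::string& tags) {
  if (flag & 0x10) {
    // Reverse-strand alignments store SEQ reverse-complemented and QUAL
    // reversed, so restore the original read orientation.
    seq = reverse_complement(seq);
    std::reverse(qual.begin(), qual.end());
  }
  std::string header = "@" + qname;
  if (!tags.empty()) header += " " + tags;
  return header + "\n" + seq + "\n+\n" + qual + "\n";
}
```

A complete converter would also have to separate first and second mates using flags 0x40 and 0x80 and skip secondary and supplementary alignments.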

Name of this package/tool: bxtools or "snowman"?

I guess this was an easter egg? Found it!

$ bxtools
Program: bxtools  <-------------------- OK
Contact: Jeremiah Wala [ [email protected] ]
Usage: snowman <command> [options]   <-------------------- WHAT?

Commands:
(...)

While I'm here, could you please create release(s) for your software? It helps quite a lot while packaging and keeping track of versions/changes/bugs/etc...

split, stats, tile with no arguments segfaults

relabel and mol with no arguments print useful --help text. split, stats, and tile segfault.

❯❯❯ src/bxtools split
[1]    64359 segmentation fault  src/bxtools split

mol accepts --help as an option, but none of the other tools do. Please consider adding a --help option to each tool.

stats: Add median alignment score [feature request]

Please consider adding the median alignment score (AS) to the output of stats. Note that most tools, like BWA, output AS:i, whereas Long Ranger align outputs AS:f.
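One way to handle both tag types is to parse the value as a floating-point number regardless of whether the type code is i or f. The sketch below works on the SAM text form of the tag; `parse_as_tag` and `median` are hypothetical helper names and are not part of bxtools or SeqLib.

```cpp
#include <algorithm>
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Parse "AS:i:136" or "AS:f:-72.3669" into a double.
// Returns false for any other tag, type code, or malformed value.
bool parse_as_tag(const std::string& field, double& score) {
  if (field.size() < 6 || field.compare(0, 3, "AS:") != 0 || field[4] != ':')
    return false;
  if (field[3] != 'i' && field[3] != 'f') return false;
  const std::string value = field.substr(5);
  try {
    std::size_t used = 0;
    const double parsed = std::stod(value, &used);
    if (used != value.size()) return false;  // trailing characters
    score = parsed;
    return true;
  } catch (const std::exception&) {
    return false;  // not a number, or out of range
  }
}

// Median of a non-empty list of scores (the caller must check for emptiness).
double median(std::vector<double> v) {
  std::sort(v.begin(), v.end());
  const std::size_t n = v.size();
  return n % 2 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}
```

Reading the value with an integer-only accessor would miss AS:f scores, which may be worth ruling out where a stats run reports 0 for every AS value.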
