Thanks a lot for developing this great software. Really liked the interface and sp

--bamout feature doesn't split output about octopus HOT 7 CLOSED

luntergroup commented on July 26, 2024

--bamout feature doesn't split output

from octopus.

Comments (7)

dancooke commented on July 26, 2024

@mehrankr FYI: I just added a Python script to split Octopus's new tagged realigned BAM output into separate BAM files.

dancooke commented on July 26, 2024

Thanks for your question. The --bamout option takes different input depending on whether you're analysing a single sample or multiple samples. For a single sample, you need to specify the full output path (e.g. ~/octopus/minibams/realigned.bam). The reason your first command fails is that you're trying to write to an existing directory. For multi-sample analysis, the input is a directory (e.g. ~/octopus/minibams). I'll try to make this clearer in the documentation.
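
To illustrate the rule above, here is a minimal pre-flight check one could run before launching Octopus. It is only a sketch: the function name `check_bamout_path` and the `n_samples` parameter are hypothetical and not part of Octopus, and the multi-sample check (rejecting an existing regular file) is an assumption based on the statement that the input should be a directory.

```python
from pathlib import Path


def check_bamout_path(bamout, n_samples):
    """Return a warning string if the --bamout value looks wrong, else None.

    Hypothetical helper, not part of Octopus. It encodes only the rule
    described in this thread: a file path for one sample, a directory
    for several samples.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    path = Path(bamout).expanduser()
    if n_samples == 1:
        # Single sample: the value must be the full path of the output BAM,
        # so pointing it at an existing directory is the failure case.
        if path.is_dir():
            return f"{path} is an existing directory; give a full BAM path instead"
        return None
    # Multiple samples: the value should be a directory (assumption: an
    # existing regular file at that path cannot be used).
    if path.exists() and not path.is_dir():
        return f"{path} is a file; multi-sample runs expect a directory"
    return None
```

For example, `check_bamout_path("~/octopus/minibams", 1)` would return a warning if that directory already exists, which matches the failure described above.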

Note that Octopus does not produce 'split' realigned BAMs in the same way that my bamsplit script (and earlier versions of Octopus) does. Rather, all realigned reads for a given sample are written to a single BAM, and the supporting haplotype is annotated with the HI BAM tag. I think this is a better way of doing things as it allows clearer visualisation in alignment browsers that support coloured reads, such as IGV. If you need to work with 'split' BAMs, then it should be fairly straightforward to write a Python script using pysam to generate them by making use of the HI tag in the Octopus realigned BAM.
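
As a rough illustration of the HI-tag approach described above, the following sketch streams a realigned BAM with pysam and writes each read to a per-haplotype file. It is not the bamsplit script or any part of Octopus. The function names, the output naming scheme, and the "unassigned" bucket for reads without an HI tag are all assumptions made for the example, and the HI value is used verbatim as a string because the thread does not specify its format.

```python
def haplotype_key(read, tag="HI", untagged="unassigned"):
    """Return the read's haplotype tag value as a string.

    Reads lacking the tag go into a catch-all bucket (an assumption;
    the thread does not say how such reads should be handled).
    """
    return str(read.get_tag(tag)) if read.has_tag(tag) else untagged


def split_bam_by_haplotype(in_path, out_prefix, tag="HI"):
    """Write one BAM per distinct tag value; return the keys seen."""
    import pysam  # imported lazily so haplotype_key is usable without pysam

    writers = {}
    with pysam.AlignmentFile(in_path, "rb") as bam:
        try:
            # until_eof=True reads the file sequentially, so no index is needed.
            for read in bam.fetch(until_eof=True):
                key = haplotype_key(read, tag)
                if key not in writers:
                    # Hypothetical naming scheme: <prefix>.<tag>_<value>.bam
                    out_path = f"{out_prefix}.{tag}_{key}.bam"
                    writers[key] = pysam.AlignmentFile(out_path, "wb", template=bam)
                writers[key].write(read)
        finally:
            for writer in writers.values():
                writer.close()
    return sorted(writers)
```

A call such as `split_bam_by_haplotype("realigned.bam", "realigned")` would keep one output file open per distinct HI value, which keeps memory use flat but assumes the number of distinct values is small.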

mehrankr commented on July 26, 2024

Thanks Dan for getting back to me so quickly.

That makes sense. However, when I look at the output BAM file, all of my prior tags in the input BAM are excluded and there is only the RG tag.

In pysam, I only see the following for all reads:

In [1]: read.get_tags()
Out[1]: [('RG', '1')]

This, however, is for an unpaired 50 bp experiment. Does it mean that Octopus couldn't do any phasing because of the short reads?

dancooke commented on July 26, 2024

Ah... I missed that you are using v0.5.2-beta. This version of Octopus does produce split BAM files: you need to use the --split-bamout option rather than the --bamout option. In v0.6.0-beta the --split-bamout option was removed and BAM tags were added.

mehrankr commented on July 26, 2024

Thanks Dan. This worked well for all of the files I had, except one paired-end ATAC-seq library, which ended with the following debug message:

[2019-03-30 01:35:09] <DEBG> There are 38 reads in chr3_GL000221v1_random:44103-44304 after filtering
[2019-03-30 01:35:09] <DEBG> Measuring block chr3_GL000221v1_random:44203-44204 containing 1 calls
[2019-03-30 01:35:09] <DEBG> Fetched 19 unfiltered reads from chr3_GL000221v1_random:50521-50722
[2019-03-30 01:35:09] <DEBG> In sample 93-VU147T
[2019-03-30 01:35:09] <DEBG> 0 failed the IsNotMarkedQcFail filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasWellFormedCigar filter
[2019-03-30 01:35:09] <DEBG> 0 failed the IsMapped filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasValidBaseQualities filter
[2019-03-30 01:35:09] <DEBG> There are 19 reads in chr3_GL000221v1_random:50521-50722 after filtering
[2019-03-30 01:35:09] <DEBG> Measuring block chr3_GL000221v1_random:50621-50622 containing 1 calls
[2019-03-30 01:35:09] <DEBG> Fetched 48 unfiltered reads from chr4_GL000008v2_random:0-106
[2019-03-30 01:35:09] <DEBG> In sample 93-VU147T
[2019-03-30 01:35:09] <DEBG> 0 failed the IsNotMarkedQcFail filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasWellFormedCigar filter
[2019-03-30 01:35:09] <DEBG> 0 failed the IsMapped filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasValidBaseQualities filter
[2019-03-30 01:35:09] <DEBG> There are 48 reads in chr4_GL000008v2_random:0-106 after filtering
[2019-03-30 01:35:09] <INFO>                           -             100%            2h 4m                 -
[2019-03-30 01:35:09] <DEBG> Encountered an error whilst filtering, attempting to cleanup
[2019-03-30 01:35:09] <INFO> Removed 2 temporary files
[2019-03-30 01:35:10] <EROR> A program error has occurred:
[2019-03-30 01:35:10] <EROR> 
[2019-03-30 01:35:10] <EROR>     Encountered an unknown error during calling. This means there is a
[2019-03-30 01:35:10] <EROR>     bug and your results are untrustworthy.
[2019-03-30 01:35:10] <EROR> 
[2019-03-30 01:35:10] <EROR> To help resolve this error run in debug mode and send the log file to
[2019-03-30 01:35:10] <EROR> https://github.com/luntergroup/octopus/issues.
[2019-03-30 01:35:10] <INFO> ------------------------------------------------------------------------

Any ideas on what might be going wrong?

Edit:
After switching to another FASTA file that contains only the chromosomes and none of the contigs, I didn't get that error. I think there might be an issue related to small contigs, multi-mapping reads, or soft-clipped reads.

dancooke commented on July 26, 2024

It's difficult to say what caused this error without having access to the data, but bugs that caused this type of error have been fixed in versions since v0.5.2-beta (which is now fairly outdated), so it's possible that whatever bug caused this error has already been fixed. I'll close this issue now, but please open a new issue if you find this problem (or any others) with the latest version.

mehrankr commented on July 26, 2024

Really appreciate it, thanks.
