Hello,
I have used Bismark a few times in directional mode, and I have been very happy with the results so far (thanks for that!). However, I believe that I have found a bug in the non-directional mode (I am using bismark_v0.15.0).
Specifically, we are looking at amplicons, and our library contains amplicons targeting the OT and OB strands of the same region. In this case, some amplicons that should be reported as mapping to the OB strand (XG:Z:GA) are reported on the OT strand (XG:Z:CT) and thus the methylation calls are wrong. This problem is likely to affect any type of non-directional library, not just amplicons.
The problem can be seen with the following reference and reads:
Reference:
GTGAAGATTGTCGTGTTTCTTGACTAATTTCCATTGCTTGTTTCATGAGTGCATCCATTAAATCTTGAACATTCTCTGACTTCGTTTCATTTTCGAATATGGTACCAATCTTTACAGACGTTAATGAATTTTTTTTAGCATCATCTGATTTTAATATCTTTGTTTTTTCAATAGTTTTCTTTGCTTTCTTTTTTTGTGATTTTGATTCATTGCTTTCATCACTCGATAATACCATAATATCATCGATTTTTTCTTTTTGTGAAACCATATCGCAAACTTTCGTGATTACATCATCTATTAAAGCTACGACATCATGCATTAATTGCAAATGAGATACATTGCCTAATTTTATTTTCTTTATATTATCTTGAGGTAATTCAGTAGGAGATCTTTTACGTGTTTCAGTTTTTTCTTGTTCATTACGTTGATTTGTTCCAAATCCTGTATTTGTTCTGTCTATAATTTCAATCTCCTCGATACTATTTGAACGTTGAGCAATGGTGACAACATTTCTGTCACTTTCAGTTGTAAATTCCATTTCTTTTCCAATAATTTTATTACTTGTGAAAAGATTTTCTTCCTTAGTTTTCTCATTTAATTTTGAAGTATATTTCATAAAAGGGCCTTTAGCACATTTTTCAAGATGAAATTGAAGCAATTGACCTAAAGATTTTTTCAAGGGTTTTCCTTCGTGCTGCTCAAGATATTTTAATATGCTTGCATGAATTCGATAATG
Forward read:
TTAATTCATTACTTTCATCACTCAATAATACCATAATATCATCAATTTTTTCTTTTTATAAAACCATATCACAAACTTTCATAATTACATCATCTATTAAAACTACGACATCATGCATTAATTACAAATAAAATACATTACCTAATTTTATTTTCTTTATATTATCTTAAAATAATTCAATAAAAAATCTTTTACATATTTCAATTTTTTCTTATTCATTACGTTAATTT
Reverse read:
ATTATTGTTTAACGTTTAAATAGTATCGAGGAGATTGAAATTATAGATAGAATAAATATAGGATTTGGAATAAATTAACGTAATGAATAAGAAAAAATTGAAATATGTAAAAGATTTTTTATTGAATTATTTTAAGATAATATAAAGAAAATAAAATTAGGTAATGTATTTTATTTGTAATTAATGCATGATGTCGTAGTTTTAATAGATGATGTAATTATGAAAGTTTG
bismark command:
bismark --bowtie2 -minins 0 --maxins 1000 --output_dir . --sam --non_directional -N 1 -L 20 -D 20 -R 3 --score_min L,-0.8,-0.8 -p 4 /home/laurawelsh/bio/myBisulf_data/work/mapping/bismark -1 tmp1.fastq -2 tmp2.fastq
I think that I have found a potential solution for this problem.
I'm attaching a diff of my code against bismark_v0.15.0.
The fix is still incomplete: the same thing must be applied to single-end case and to the case where there is ambiguous mapping on the same strand.
Am I missing missing something or is this a real bug?
If it is a real bug, does this fix seem OK?
Thanks,
Sylvain
diff.txt