dieterich-lab / jacusa

JAVA framework for accurate SNV assessment

License: GNU General Public License v3.0

HTML 87.07% CSS 0.03% Java 12.68% XSLT 0.05% Shell 0.01% Batchfile 0.01% TeX 0.07% R 0.09%

jacusa's Issues

change VCF version to 4.2

IGV now complains about the GT tag in VCF files. Manually changing the version to 4.2 solves this.
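The manual workaround described above can be scripted. The following is a minimal sketch, assuming the VCF starts with a standard `##fileformat=VCFvX.Y` header line; the function name `set_vcf_version` is hypothetical and not part of JACUSA.

```python
import re

# Matches the VCF header line that declares the format version,
# e.g. "##fileformat=VCFv4.1".
FILEFORMAT_RE = re.compile(r"^##fileformat=VCFv\d+\.\d+\s*$")


def set_vcf_version(lines, version="4.2"):
    """Return a copy of `lines` with the ##fileformat header set to `version`."""
    out = []
    for line in lines:
        if FILEFORMAT_RE.match(line):
            # Keep a trailing newline if the original line had one.
            ending = "\n" if line.endswith("\n") else ""
            line = f"##fileformat=VCFv{version}{ending}"
        out.append(line)
    return out


example = ["##fileformat=VCFv4.0\n", "#CHROM\tPOS\tID\tREF\tALT\n"]
print("".join(set_vcf_version(example)), end="")
```

Only the version declaration is rewritten; all other header and data lines pass through unchanged.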

I am running JACUSA, but there is something wrong in the log file:
tail logfile
[ INFO ] 02:24:00 : Started screening contig scaffold232081:277-4771
[ INFO ] 02:24:00 : Started screening contig scaffold226941:87-524
[ INFO ] 02:24:00 : Started screening contig scaffold222121:216-831
[ INFO ] 02:24:00 : Started screening contig scaffold234201:3573-10807
[ INFO ] 02:24:00 : Started screening contig scaffold203801:370-869
[ INFO ] 02:24:00 : Started screening contig scaffold236361:111-991
[ INFO ] 02:24:00 : Started screening contig scaffold237141:54-346
[ INFO ] 02:24:00 : Started screening contig scaffold220141:22-1328
[ INFO ] 02:24:00 : Started screening contig scaffold239001:87-462
[ INFO ] 02:24:00 : Started screening contig scaffold233041:28-3157
Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException: 1
at jacusa.estimate.MinkaEstimateDirMultParameters.getLogLikelihood(MinkaEstimateDirMultParameters.java:187)
at jacusa.estimate.MinkaEstimateDirMultParameters.maximizeLogLikelihood(MinkaEstimateDirMultParameters.java:111)
at jacusa.method.call.statistic.AbstractDirichletStatistic.estimate(AbstractDirichletStatistic.java:229)
at jacusa.method.call.statistic.AbstractDirichletStatistic.getStatistic(AbstractDirichletStatistic.java:255)
at jacusa.method.call.statistic.dirmult.DirichletMultinomialRobustCompoundError.getStatistic(DirichletMultinomialRobustCompoundError.java:76)
at jacusa.method.call.statistic.AbstractDirichletStatistic.addStatistic(AbstractDirichletStatistic.java:145)
at jacusa.pileup.worker.AbstractCallWorker.processParallelPileup(AbstractCallWorker.java:41)
at jacusa.pileup.worker.AbstractWorker.processParallelPileupIterator(AbstractWorker.java:186)
at jacusa.pileup.worker.AbstractWorker.run(AbstractWorker.java:67)

And the result output is just 9 tmp files:
*txt_9_tmp.gz

I have no idea how to deal with it. I would appreciate it if you could give some advice.

What name to use instead of rcoverage

Suggestions:

readArrest
rarrest
Currently: rcoverage

calmd error?

Hi,

When running an older sample of ours, we get this error. I suggest that the software come to a complete halt when it sees this.

00:00:02 Thread 1: Working on contig chr1:150384085-150384085
ERROR 00:00:02 Problem with read: E00515:95:HJ7K5ALXX:5:1218:11424:65687 in sorted/mysample.Aligned.out.srt.rg-added.dedup.calmd.bam
java.lang.IllegalArgumentException: Byte 99 unknown
at lib.util.Base.valueOf(Base.java:72)
at lib.recordextended.MDRecordReferenceProvider.getReferenceBase(MDRecordReferenceProvider.java:50)
at lib.data.storage.container.SimpleMDReferenceProvider.addRecordExtended(SimpleMDReferenceProvider.java:66)
at lib.data.storage.container.ComplexSharedStorage.addRecordExtended(ComplexSharedStorage.java:47)
at lib.data.storage.container.UnstrandedCacheContainter.process(UnstrandedCacheContainter.java:47)
at lib.data.storage.container.AbstractStrandedCacheContainer.process(AbstractStrandedCacheContainer.java:80)
at lib.data.assembler.SiteDataAssembler.buildCache(SiteDataAssembler.java:57)
at lib.util.ReplicateContainer.createIterators(ReplicateContainer.java:49)
at lib.util.ConditionContainer.lambda$0(ConditionContainer.java:29)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at lib.util.ConditionContainer.updateWindowCoordinates(ConditionContainer.java:29)
at lib.worker.AbstractWorker.updateReservedWindowCoordinate(AbstractWorker.java:166)
at lib.worker.AbstractWorker.processInit(AbstractWorker.java:182)
at lib.worker.AbstractWorker.run(AbstractWorker.java:214)

The offending read:

samtools view sorted/mysample.Aligned.out.srt.rg-added.dedup.calmd.bam | grep HJ7K5ALXX | grep E00515 | grep "E00515:95:HJ7K5ALXX:5:2215:25246:23636"
E00515:95:HJ7K5ALXX:5:2215:25246:23636 163 chr1 127333446 255 144M = 127333874 578 ATCACTACTAGATAGTACATCCTTATGGATCTGCAGAAATCTGCTCCAAAGGGGTGGGCTATACTTAGTGATTGTTATATATGTTTAACAGTAACAGGAAATGCATATTAACAGCAGGAATCTTTCCTGAAAGAATCCATTACA AAFFFJFJJFFJFJFFFJJJJJJJJJA<A--<FFJAFJJJJJ7JF<-FFJ<F<-AF--AFJJFFJJJ<7JJ<AFF<FJFJJA7<FJFJJFJFFF<F-AAFJF--<AAAJFAAFJAA7)AAAFJAJ---7<<F-<A-<--7<<-7 PG:Z:MarkDuplicates RG:Z:id NH:i:1 HI:i:1 nM:i:1 AS:i:290 NM:i:1 MD:Z:138c5
E00515:95:HJ7K5ALXX:5:2215:25246:23636 83 chr1 127333874 255 150M = 127333446 -578 GAAAGTGCCTTTTATTTGATATTGGAATGGCTATTCAAGCTTGTTTCTTGGGACCATCTGCATGGAAAATTGTTTTCCAGCTCTTTACTCTGAGGTGGGGTTGGTCTTTGTCACTGTGGTAGATTTCCTGTATGCAGTAAAATGCTGGGT JJJJJJJJFJJJJJJJJJJJFJJJJJJJJJJJJJJJJJFJJJJJJJJJJJFJJFJJJFJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA PG:Z:MarkDuplicates RG:Z:id NH:i:1 HI:i:1 nM:i:1 AS:i:290 NM:i:0 MD:Z:150
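A possible reading of "Byte 99 unknown": byte 99 is the ASCII code of lowercase `c`, and the first read shown above carries `MD:Z:138c5`. The sketch below checks MD strings against the grammar given in the SAM optional-fields specification, which only allows uppercase letters. Whether the lowercase base is really what triggers the exception is an inference, not something the log confirms, and the function name `unexpected_md_bytes` is hypothetical.

```python
import re

# MD tag grammar from the SAM optional-fields specification.
MD_RE = re.compile(r"^[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*$")


def unexpected_md_bytes(md):
    """Return (character, byte value) pairs that are not digits, '^' or A-Z."""
    return [
        (ch, ord(ch))
        for ch in md
        if not (ch in "0123456789" or ch == "^" or "A" <= ch <= "Z")
    ]


for md in ("138c5", "150"):
    print(md, bool(MD_RE.match(md)), unexpected_md_bytes(md))
```

Running such a check over the MD tags of a BAM file (for example on the text output of `samtools view`) would list reads that may need their MD tags regenerated or uppercased.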

What pileup filters to use inside rcoverage

JacusaHelper for RRD

What functionalities are available in JacusaHelper for looking at RRD data? The documentation says that AddBaseChangeInfo, AddEditingFrequencyInfo, etc. are only applicable for RDD data. Why is this the case? Is there something I can run on RRD results to get a similar well-summarized output of number and location of putative A to G editing sites, for example?

Thanks.

Zip Exceptions, String Exception Failure

I think I figured out the problem

unclear -u options

Hi,

Again, many thanks for providing nice software.

Can I clarify the jacusa2 input parameters for -u?

java -Xmx2g -jar /mnts/bioinfo/src/JACUSA_v2.0.0-RC5.jar call-2 -u DirMult
= OK

java -Xmx2g -jar /mnts/bioinfo/src/JACUSA_v2.0.0-RC5.jar call-2 -u calcPvalue
java.lang.IllegalArgumentException: Unknown statistic or wrong option: calcPvalue

Also, the help could be improved by clarifying how the arguments are correctly combined:

i.e.
Default mode
-u DirMult
Calculate a pvalue based on a chi^2 approximation of the likelihood
-u calcPvalue
How do we, for instance, change any params in this part?
-u calcPvalue, showAlpha
-u DirMult, epsilon=0.001

-u Choose between different modes (Default: DirMult):
DirMult Compound Error (estimated error {0.01} + phred score)
| Adjusts variant condition
| :epsilon Fit achieved if |L1 - L2| < epsilon, where L1 and L2 correspond to old and new likelihood respectively.
| Default: 0.001
| :maxIterations Maximum number of iterations for Newton's method.
| Default: 100
| :calcPvalue Calculate a pvalue based on a chi^2 approximation of the likelihood ratio
| :showAlpha Show detailed info of Newton's method in output (not in VCF output).

Thanks

Compilation from GitHub

Hi,

I would like to compile the latest JACUSA GitHub commit but I wasn't able to find any instructions on how to do so. I have tried to manually compile random files but of course, it didn't work. Do you think it would be possible to add a few lines to the manual on how to compile the tool from git clone download?

Thanks!

Install from github?

Hi,
Adding the option to install directly from github would be excellent.

R.3.5.0.>library(devtools)
R.3.5.0.>install_github("dieterich-lab/JACUSA/tree/master/JacusaHelper")
Downloading GitHub repo dieterich-lab/JACUSA@master
from URL https://api.github.com/repos/dieterich-lab/JACUSA/zipball/master
Installation failed: Does not appear to be an R package (no DESCRIPTION)

java.lang.IllegalArgumentException: Byte 77 unknown

I'm working on the human genome (from ncbi) and I get this error message.


  INFO          00:07:20  Thread 1: Working on contig NC_000001.11:248300969-248400968
  INFO          00:07:20  Thread 1: Working on contig NC_000001.11:248676662-248776661
  java.lang.IllegalArgumentException: Byte 77 unknown
        at lib.util.Base.valueOf(Base.java:74)
        at lib.data.storage.container.FileReferenceProvider.getReferenceBase(FileReferenceProvider.java:90)
        at lib.data.storage.container.FileReferenceProvider.getReferenceBase(FileReferenceProvider.java:68)
        at lib.data.assembler.DataAssembler.createDefaultDataContainer(DataAssembler.java:50)
        at lib.data.assembler.DataAssembler.assembleData(DataAssembler.java:39)
        at lib.util.ReplicateContainer.getNullDataContainer(ReplicateContainer.java:61)
        at lib.util.ConditionContainer.getNullDataContainer(ConditionContainer.java:42)
        at jacusa.worker.CallWorker.createParallelData(CallWorker.java:41)
        at lib.worker.AbstractWorker.hasNext(AbstractWorker.java:111)
        at lib.worker.AbstractWorker.processReady(AbstractWorker.java:196)
        at lib.worker.AbstractWorker.run(AbstractWorker.java:213)

The log shows that the error seems to occur at the end of the first analyzed sequence (NC_000001.11).
Any help would be greatly appreciated.
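One thing worth checking here: byte 77 is the ASCII code of `M`, which is an IUPAC ambiguity code (A or C), and the trace goes through `FileReferenceProvider`, so the byte presumably comes from the reference FASTA. The sketch below counts reference letters outside an allowed set. The default set `ACGTNacgtn` is an assumption about what the tool accepts, and `non_acgtn_counts` is a hypothetical helper name.

```python
from collections import Counter


def non_acgtn_counts(fasta_lines, allowed="ACGTNacgtn"):
    """Count sequence characters in FASTA lines that are not in `allowed`."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # skip FASTA header lines
        for ch in line.strip():
            if ch not in allowed:
                counts[ch] += 1
    return counts


example = [">NC_000001.11 example\n", "ACGTMNNacgt\n", "RRA\n"]
for ch, n in sorted(non_acgtn_counts(example).items()):
    print(ch, ord(ch), n)
```

If ambiguity codes turn up, replacing them with `N` in a copy of the reference would be one way to test whether they cause the exception.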

java.lang.IndexOutOfBoundsException & Exception in thread "Thread-25" java.lang.StackOverflowError

Hi,
I am using JACUSA for my project.
I have gDNA (from bwa aligner) and cDNA (from STAR aligner) bam files sorted and indexed.
I used the following command with java 1.7.
(hs.bed contains chromosome 1 to Y)

java -Xmx50G -jar JACUSA_v1.3.0.jar call-2 -a H:1,D -b hs37d5.bed -p 26 -r rddsnov27aoption.out -s gDNA.bam cDNA.bam &> jacnov27aoption.log

Then I got the following errors. The process was stuck and there was no output, only tmp.gz files.

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at java.util.Collections$UnmodifiableList.get(Collections.java:1211)
at jacusa.filter.storage.DistanceFilterStorage.processRecord(DistanceFilterStorage.java:39)
at jacusa.pileup.builder.AbstractPileupBuilder.processRecord(AbstractPileupBuilder.java:358)
at jacusa.pileup.builder.AbstractPileupBuilder.adjustWindowStart(AbstractPileupBuilder.java:178)
at jacusa.pileup.iterator.AbstractWindowIterator.adjustWindowStart(AbstractWindowIterator.java:155)
at jacusa.pileup.iterator.AbstractWindowIterator.adjustCurrentGenomicPosition(AbstractWindowIterator.java:148)
at jacusa.pileup.iterator.TwoSampleIterator.hasNext(TwoSampleIterator.java:38)
at jacusa.pileup.worker.AbstractWorker.processParallelPileupIterator(AbstractWorker.java:183)
at jacusa.pileup.worker.AbstractWorker.run(AbstractWorker.java:67)

Exception in thread "Thread-25" java.lang.StackOverflowError
at org.apache.commons.math3.special.Gamma.digamma(Gamma.java:446)
at org.apache.commons.math3.special.Gamma.digamma(Gamma.java:461)
at org.apache.commons.math3.special.Gamma.digamma(Gamma.java:461)
at org.apache.commons.math3.special.Gamma.digamma(Gamma.java:461)

Complete error log attached as pdf below.

jacusaerrorforgitissue.pdf

Please suggest as how these errors could be resolved.
Thank you!
Priya

Jacusa command log in output file header

Hi,

For keeping track of things, it would be super to have a copy of the chosen command line options in the first line of any JACUSA output file, prefixed with ##. This helps keep track of versioning, databases used, etc.

Cheers
/Alistair

bed format error

Processing BED files with a header produces an error (the workaround of using a BED file with no header works).

This breaks the processing:

(the example from http://genome.ucsc.edu/FAQ/FAQformat#format1)
track name=pairedReads description="Clone Paired Reads" useScore=1
chr1 3073253 3079322

INFO 00:00:00 Computing overlap between sequence records.
java.lang.ArrayIndexOutOfBoundsException: 1
at lib.util.coordinate.provider.BedCoordinateProvider.read(BedCoordinateProvider.java:53)
at lib.util.coordinate.provider.BedCoordinateProvider.<init>(BedCoordinateProvider.java:33)
at lib.util.AbstractMethod.initCoordinateProvider(AbstractMethod.java:194)
at lib.cli.CLI.processArgs(CLI.java:181)
at lib.util.AbstractTool.run(AbstractTool.java:53)
at jacusa.JACUSA.main(JACUSA.java:99)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)

JACUSA2 Version: 2.0.0-RC5 call-1 -b mm10_ALL_GENES_plus_minus_5000bp_pos_v2.bed -r tmp ../sorted/myfile.Aligned.out.srt.rg-added.dedup.bam

java.lang.NullPointerException
at lib.worker.WorkerDispatcher.hasNext(WorkerDispatcher.java:73)
at lib.worker.WorkerDispatcher.run(WorkerDispatcher.java:80)
at lib.util.AbstractTool.run(AbstractTool.java:65)
at jacusa.JACUSA.main(JACUSA.java:99)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
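The workaround mentioned above (using a BED file without a header) can be automated. This is a minimal sketch, assuming header lines are those whose first token is `track` or `browser`, or that start with `#`, as in the UCSC format description; `strip_bed_header` is a hypothetical helper, not a JACUSA function.

```python
def strip_bed_header(lines):
    """Drop track/browser/comment lines and blank lines; keep data rows as-is."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        first_token = stripped.split()[0]
        if first_token in ("track", "browser") or first_token.startswith("#"):
            continue  # skip header and comment lines
        kept.append(line)
    return kept


example = [
    'track name=pairedReads description="Clone Paired Reads" useScore=1\n',
    "chr1\t3073253\t3079322\n",
]
print("".join(strip_bed_header(example)), end="")
```

Writing the filtered lines to a new file and passing that file to `-b` reproduces the header-free input that is reported to work.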

java.lang.Exception: Sequence Dictionaries of BAM files do not match

Hi,
I have been trying to use JACUSA using a gDNA bam and cDNA bam file with the command below:

java -jar JACUSA_v1.3.0.jar call-2 -a H:1,B,Y -c 10 -F 1024 -f B -P RF-FIRSTSTRAND,RF-FIRSTSTRAND -p 2 -r rdds_out gDNA.bam cDNA.bam

The RNA is stranded and the DNA is whole exome, both files are sorted, duplicates have been marked and they are indexed.

I have tested it with the example data you provided and it worked but now when using my own data I get the following error:

OutputWriter: rdds_out
[ INFO ] 00:00:00 : Computing overlap between sequence records.
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: java.lang.Exception: Sequence Dictionaries of BAM files do not match
at jacusa.method.AbstractMethodFactory.getSAMSequenceRecords(AbstractMethodFactory.java:168)
at jacusa.method.call.TwoSampleCallFactory.initCoordinateProvider(TwoSampleCallFactory.java:273)
at jacusa.JACUSA.main(JACUSA.java:238)
... 5 more

Do you have any suggestions on how this could be resolved?

Best wishes,
Kerstin
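To see where two BAM headers disagree, the `@SQ` lines can be compared directly. The sketch below works on header text such as that printed by `samtools view -H`. It compares sequence names and lengths only; which differences JACUSA actually treats as a mismatch (for example ordering) is not stated in this thread, so that part is an assumption. The function names are hypothetical.

```python
def parse_sq(header_text):
    """Map sequence name -> length from the @SQ lines of a SAM header."""
    seqs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        seqs[fields["SN"]] = int(fields["LN"])
    return seqs


def compare_dictionaries(a, b):
    """Return (names only in a, names only in b, shared names with different length)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    length_mismatch = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, length_mismatch


gdna = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n"
cdna = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:1\tLN:248956422\n"
print(compare_dictionaries(parse_sq(gdna), parse_sq(cdna)))
```

A typical cause that this would reveal is one file using `chr1`-style names and the other `1`-style names, or the two files being aligned to different reference builds.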

How can I get the gDNA.bam and cDNA.bam files? I don't understand what they are. Sorry, I'm just a beginner.

Stranded data:

Hi,

Just writing for a definition: In the manual, you guys write:
FR-FIRSTSTRAND STRANDED library - first strand sequenced,
FR-SECONDSTRAND STRANDED library - second strand sequenced, and
UNSTRANDED UNSTRANDED library.

So we have generated data that are sequenced with the KAPA RNA Hyperprep Kit and works like this:

1st strand cDNA synthesis using random priming;
combined 2nd strand synthesis and A-tailing, which
converts the cDNA:RNA hybrid to double-stranded
cDNA (dscDNA), incorporates dUTP into the second
cDNA strand for stranded RNA sequencing, and adds
dAMP to the 3' ends of the resulting dscDNA;
adapter ligation, where dsDNA adapters with 3' dTMP
overhangs are ligated to library insert fragments; and
library amplification, to amplify library fragments
carrying appropriate adapter sequences at both ends
using high-fidelity, low-bias PCR. The strand marked
with dUTP is not amplified, allowing strand-specific
sequencing.

Does that mean that FR-SECONDSTRAND (STRANDED library, second strand sequenced) is the right option for us? In other words, does this option mean that the second strand will be identical to the RNA sequence?

Thanks a bunch,

How is Phasing handled?

For example if I have the following reference sequence with two variants/mutations:

How would this look in the VCF output? Does it matter if mutations come from the same read? I assume that the output would look like:

1 12345 C T ...
1 12348 T A ...

How do we count low quality bases and how do they influence read end?

With base call quality filtering set to 20
and given the following Read:

Position: 1 2 3 4
Base calls: A C G T
Base Quality: 40 40 40 10

How are low quality base calls treated?
And how do they influence the read end? If at all.

Could it be that low quality bases carry RT arrest information?!
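To make the example concrete, here is a minimal sketch of a per-base quality filter applied to the read above. It assumes the threshold is inclusive (quality >= 20 passes) and says nothing about how the read end is determined, which is exactly the open question; `passing_base_calls` is a hypothetical name.

```python
def passing_base_calls(bases, qualities, min_quality=20):
    """Return (1-based position, base) pairs whose quality is at least min_quality."""
    return [
        (pos, base)
        for pos, (base, qual) in enumerate(zip(bases, qualities), start=1)
        if qual >= min_quality
    ]


print(passing_base_calls("ACGT", [40, 40, 40, 10]))
```

Under these assumptions the base call at position 4 is dropped, so the last retained base call (position 3) no longer coincides with the aligned read end (position 4). Which of the two positions should count as the read end is the decision to be made.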

Test data

Hello,

Would it be possible to make the test data available again - the https://cloud.dieterichlab.org/index.php/s/349PMjCdJl4wUwV link is not working currently.

Many thanks,
David

What is a "rare event" filter?

See title

sample in silico data links in "April 8th, 2019" manual not working?

Hello,

These links such as "http://www.age.mpg.com/software/jacusa/sample_data/hg19_chr1_gDNA_VS_cDNA.tar.gz" for "hg19_chr1_gDNA_VS_cDNA.tar.gz" do not work?

Changing the .com to .de did not work.

Sorry if the files are available somewhere else and I have not seen them. The "Test data" issue stated that they were updated.

regards,
Paul

Homopolymer filtering

For the Y pileup filter, the documentation states "Filter wrong variant calls in the vicinity of homopolymers. Default 7 (Y:length)". What do you mean by "in the vicinity"? Does it mean that the variant is embedded within a homopolymer of length at least 7, or that the variant could be outside of the homopolymer ("in its vicinity" implies outside)? I assume the length is the minimum length of the homopolymer.

Update manual - test-statistic

move relevant parts from supplementary material to manual

Wrong function names?

JACUSA/src/jacusa/pileup/builder/FRPairedEnd1PileupBuilderFactory.java

Line 5 in 2d93ebb

import jacusa.pileup.builder.inverted.FRPairedEnd1InvertedPileupBuilder;

java.lang.ArrayIndexOutOfBoundsException

Is it normal to have these kinds of exceptions in the standard out or standard error? I have them in most of my output files.

java.lang.ArrayIndexOutOfBoundsException: 42
        at jacusa.pileup.builder.WindowCache.addHighQualityBaseCall(WindowCache.java:72)
        at jacusa.pileup.builder.UnstrandedPileupBuilder.addHighQualityBaseCall(UnstrandedPileupBuilder.java:65)
        at jacusa.pileup.builder.AbstractPileupBuilder.processAlignmentMatch(AbstractPileupBuilder.java:537)
        at jacusa.pileup.builder.AbstractPileupBuilder.processRecord(AbstractPileupBuilder.java:401)
        at jacusa.pileup.builder.AbstractPileupBuilder.adjustWindowStart(AbstractPileupBuilder.java:178)
        at jacusa.pileup.iterator.AbstractWindowIterator.initLocation(AbstractWindowIterator.java:66)
        at jacusa.pileup.iterator.AbstractTwoSampleIterator.<init>(AbstractTwoSampleIterator.java:41)
        at jacusa.pileup.iterator.TwoSampleIterator.<init>(TwoSampleIterator.java:26)
        at jacusa.pileup.worker.TwoSampleCallWorker.buildIterator(TwoSampleCallWorker.java:44)
        at jacusa.pileup.worker.AbstractWorker.run(AbstractWorker.java:66)

command:

java -jar jacusa call-2 -P UNSTRANDED,UNSTRANDED -a B,S,D,H:1,I,Y,L,M -F 1024 -f V -r jacusa.vcf -p 6 wgs_align.bam rnaseq_nodups.bam &> jacusa.log

Update manual - test-statistic

move relevant parts from supplementary to manual

Asterisks in FILTER and INFO column of VCF output

As the title describes, I see asterisks in the FILTER and INFO fields in the vcf output, for example:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  wgs_tumor.bam    nodups.bam
1       14574   .       A       G       .       *       *       DP:BC   5:5,0,0,0       18:4,0,14,0
1       14590   .       G       A       .       *       *       DP:BC   5:0,0,5,0       19:12,0,7,0
1       14599   .       T       A       .       B;D     *       DP:BC   7:0,0,0,7       11:4,0,0,7
1       14673   .       G       C       .       H       *       DP:BC   8:0,2,6,0       28:0,1,27,0

I've read the VCF file format specification and nowhere can I find what '*' indicates in the output. I see that if FILTER is not 'PASS' then a semi-colon separated list of codes for filters that failed will be shown. Does this mean that lines with '*' in their FILTER column failed all filters? What about the INFO column? I see asterisks all the way down the file in the INFO column.
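While the meaning of `*` is being clarified, the rows above can at least be tallied and their sample columns decoded. The sketch below splits rows on whitespace (as in the excerpt above; a real VCF is tab separated) and parses the `DP:BC` sample fields. It does not assume any meaning for `*`, and both function names are hypothetical.

```python
from collections import Counter


def filter_tally(data_rows):
    """Count the values found in the FILTER column (7th column) of VCF data rows."""
    tally = Counter()
    for row in data_rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank and header lines
        columns = row.split()
        tally[columns[6]] += 1
    return tally


def parse_sample(format_field, sample_field):
    """Decode one sample column using the keys of the FORMAT column (DP and BC)."""
    record = dict(zip(format_field.split(":"), sample_field.split(":")))
    return {"DP": int(record["DP"]), "BC": [int(x) for x in record["BC"].split(",")]}


rows = [
    "1 14574 . A G . * * DP:BC 5:5,0,0,0 18:4,0,14,0",
    "1 14590 . G A . * * DP:BC 5:0,0,5,0 19:12,0,7,0",
    "1 14599 . T A . B;D * DP:BC 7:0,0,0,7 11:4,0,0,7",
    "1 14673 . G C . H * DP:BC 8:0,2,6,0 28:0,1,27,0",
]
print(filter_tally(rows))
print(parse_sample("DP:BC", "18:4,0,14,0"))
```

The tally shows how many sites carry `*` versus a list of filter codes, which may help when comparing against the VCF specification's `PASS` and `.` conventions.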

store read arrest and read through

Report data for sites not called as edited?

Hi,

It would be really useful to have depth and general information for a site regardless of whether it is edited or not. This matters most for call-1, but also for other modes.
Why? Because knowing the depth of reads in a position, from a sample with theoretically no editing (a good genetic control), allows "no editing, good read depth" vs "no editing - was it lack of read depth OR lack of editing?"

Cheers
/Alistair

File format for rcoverage

support for replicates
support in JacusaHelper
store read arrest and read through counts
store Beta-Statistic and p-value

java.lang.OutOfMemoryError: GC overhead limit exceeded error

Hi,
Thank you for developing this software which looks promising.

I tried to launch an analysis with Jacusa on an RNAseq dataset:

14 libraries of ~30M paired reads of species A (4 different tissues)
14 libraries of ~30M paired reads of species B (4 different tissues)

Alignments were produced with GEM (http://algorithms.cnag.cat/wiki/The_GEM_library)
I'm trying to identify SNPs segregating the 2 species sequenced (considering all tissues, all replicates), so I try this command:

java -jar JACUSA_v1.2.0.jar call-2 -p 30 -r test.res $(ls ../data/bams/with_RG_no_dup/23*bam | paste -sd ',') $(ls ../data/bams/with_RG_no_dup/25*bam | paste -sd ',') &> std_out_err.txt &

Everything apparently ran smoothly for 7 hours and then I got this error for several threads:
Exception in thread "Thread-8" java.lang.OutOfMemoryError: GC overhead limit exceeded

And then a "java.lang.ArrayIndexOutOfBoundsException" for other threads.

I attached the log of the failed run.

I would try to fix this by setting: java -Xms1024M -Xmx2048M (considering this, I'll be running 20 threads on a 64Gb RAM machine).

In case you have another proposition, please let me know.
Thanks !

Etienne
std_out_err.txt

Add rcoverage method and CLI options

unexpected command line complaint with -r

java -Xmx6g -jar JACUSA_v2.0.0-RC5.jar call-1 -b Sppl2a.bed -f V -r Sppl2a.jacusa2.vcf myfile.bam
INFO 00:00:00 Computing overlap between sequence records.

JACUSA2 Version: 2.0.0-RC5 call-1 -b Sppl2a.bed -f V -r Sppl2a.jacusa2.vcf myfile.bam

java.lang.IllegalArgumentException: Cannot set a file type if the output is not to a file.
at htsjdk.variant.variantcontext.writer.VariantContextWriterBuilder.setOutputFileType(VariantContextWriterBuilder.java:185)
at jacusa.io.format.call.VCFcallWriter.<init>(VCFcallWriter.java:59)
at jacusa.io.format.call.VCFcallFormat.createWriter(VCFcallFormat.java:41)
at lib.worker.WorkerDispatcher.<init>(WorkerDispatcher.java:53)
at lib.util.AbstractMethod.getWorkerDispatcherInstance(AbstractMethod.java:60)
at lib.util.AbstractTool.run(AbstractTool.java:64)
at jacusa.JACUSA.main(JACUSA.java:99)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)

How do we define pseudocounts in rt-arrest?

In cases where no read through or read end can be observed at a specific position we need to define pseudocount(s).

Currently, JACUSA 2.x assumes a constant pseudocount of 1 for
read through and arrest. (1 each)

Can we do better?
Considering read ends in the neighborhood of the current position?
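To illustrate the effect of the current constant pseudocount, here is a minimal sketch. Only the pseudocount of 1 for each of read arrest and read through comes from the text above; expressing the result as the ratio arrest / (arrest + through) is an assumption made for illustration, and `arrest_rate` is a hypothetical name.

```python
def arrest_rate(arrest, through, pseudocount=1):
    """Arrest fraction after adding `pseudocount` to both arrest and through counts."""
    return (arrest + pseudocount) / (arrest + through + 2 * pseudocount)


# With no observations the estimate falls back to 0.5.
print(arrest_rate(0, 0))
# With 3 arrest and 5 read-through reads: (3 + 1) / (3 + 5 + 2).
print(arrest_rate(3, 5))
```

A neighborhood-based alternative, as suggested above, would replace the constant `pseudocount` with a value derived from read ends observed at nearby positions.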

Example of valid pileup call for JACUSA

Hi Michael,
would it be possible to provide a valid example of a command line call of JACUSA's pileup mode?
I couldn't find any in the manual or elsewhere.

No special parameters, just the BAM file as input.
I tried:
java -jar /home/ralf/JACUSA_v1.2.0.jar pileup /home/ralf/mybam.sorted.bam
This just displayed the help page.

What would the command line call look like if all reference positions (including non-mismatches) are desired as output?

Many thanks
Ralf

use library type to correctly infer read end

JacusaHelper update

Add Tutorial to the manual
Move "obsolete" pileup filters in JACUSA to JacusaHelper: RareVariant, MinDifference

Dirichlet-Multinomial -> Beta-Multinomial for read arrest and read through

Merge JACUSA 1.x and JACUSA 2.x

Explanation about JACUSA output

Hi,

I am doing RDDs with JACUSA (working great !)

My test statistic scores range from 0.001 to 300.
Is this score meaningful when working without replicates?
What would be a decent/acceptable minimum (10, 100, 200)?
In the manual you mention 'base IJ columns indicate inverted base count if on negative strand’.
In this case, is the vector (A,C,G,T) inverted for RNA sample (FR-FIRSTSTRAND) on minus strand as (T,G,C,A)?
Is the following example correctly interpreted for the minus strand ('115' corresponds to C or G)?

stat	strand	bases11	        bases21		        DNA - A	DNA - C	DNA - G	DNA - T	RNA - A	RNA - C	RNA - G	RNA - T
175.09	+	0,615,0,0	0,0,0,109	=>	0	615	0	0	0	0	0	109
287.89	-	399,0,0,0	0,0,115,0	=>	399	0	0	0	0	115	0	0

About the VCF output: is the reported ALT base the one with the highest number of reads in sample 2 (after the REF base)?

Thanks !

Exception in thread "Thread-0" java.lang.NullPointerException

Hello,
I have a sorted and indexed BAM file from an experimental sample which I am attempting to probe for editing activity (specifically ADAR1 mediated A to G editing). My BAM was aligned to the mm10 genome, and allowed multi-mapping.
I am looking for mismatches in specific BED files corresponding to ERVs. I am attempting to use the following command:
java -jar JACUSA_v1.3.0.jar call-1 -b my_bed_file -r JACUSA.out my_bam_file

I end up with the following error message:
Exception in thread "Thread-0" java.lang.NullPointerException
at jacusa.pileup.builder.AbstractPileupBuilder.<init>(AbstractPileupBuilder.java:57)
at jacusa.pileup.builder.UnstrandedPileupBuilder.<init>(UnstrandedPileupBuilder.java:26)
at jacusa.pileup.builder.UnstrandedPileupBuilderFactory.newInstance(UnstrandedPileupBuilderFactory.java:20)
at jacusa.pileup.iterator.AbstractWindowIterator.createPileupBuilders(AbstractWindowIterator.java:94)
at jacusa.pileup.iterator.AbstractOneSampleIterator.<init>(AbstractOneSampleIterator.java:28)
at jacusa.pileup.iterator.OneSampleIterator.<init>(OneSampleIterator.java:24)
at jacusa.pileup.worker.OneSampleCallWorker.buildIterator(OneSampleCallWorker.java:37)
at jacusa.pileup.worker.OneSampleCallWorker.buildIterator(OneSampleCallWorker.java:1)
at jacusa.pileup.worker.AbstractWorker.run(AbstractWorker.java:66)

I also receive a number of exceptions like the following:
java.lang.IllegalStateException: No MD field present for SAMRecord: A00873:245:HL2FWDSXY:1:1345:19416:14826_GGTTT 1/2 95b aligned read.

Is there something obvious I'm missing and needs to be fixed?