Comments (8)
I see :) Let's see if this happens for others, and I'll revisit then.
from ultra.
Could you share or link to your test data so that I can try to reproduce it?
from ultra.
I recreated the alz.chunk3_tama.fa FASTA file.
You'll find everything here.
from ultra.
Thank you! I downloaded the data and ran your command on my MacBook. I could not reproduce the bug.
I ran it with both the head of master and the head of develop (commit c0ac67c and commit 8313e9b, respectively).
This is my call:
./uLTRA pipeline --t 2 --prefix alz.chunk3 --isoseq --disable_infer ~/Downloads/ultra_bug/Homo_sapiens.GRCh38.dna.chromosome.19.fasta ~/Downloads/ultra_bug/Homo_sapiens.GRCh38.104.chr.13_18_19.gtf ~/Downloads/ultra_bug/alz.chunk3_tama.fa ~/Downloads/ultra_bug/ultra_out
This is my output log of uLTRA:
creating /Users/kxs624/Downloads/ultra_bug/ultra_out
total_flanks2: 20352
total_flank_size 20395551
total_unique_segment_counter 7303161
total_segments_bad 5517888
bad 89736
total parts size: 7391334
total exons size: 20040918
min_intron: 1
Number of ref seqs in gff: 15570
Number of ref seqs in fasta: 1
6622546.0 Unique kmers in reference part sequences with abundance > 1
AAAAAAAAAAAAAAAAAAAA 538
TTTTTTTTTTTTTTTTTTTT 496
CCTCCCAAAGTGCTGGGATT 385
GCCTCCCAAAGTGCTGGGAT 382
[...]
ACGGTGAAACCCCGTCTCTA 200
GAGTGCAGTGGCGTGATCTC 200
AAAAAATTAGCCGGGCGTGG 200
CCAAAGTGCTAGGATTACAG 200
CACACCCAGCTAATTTTTGT 200
GATTGCGCCACTGCACTCCA 200
CCAGGAGGTGGAGGTTGCAG 200
11038 11038 out of 20352 sequences has been modified in masking step.
<multiprocessing.context.SpawnContext object at 0x7fc20382c820>
Environment set: <multiprocessing.context.SpawnContext object at 0x7fc20382c820>
Using 2 cores.
Filtering reads aligned to unindexed regions with minimap2
Done filtering. Reads filtered:21
batch nt: 97383 total_nt: 194765
37
36
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Time for slaMEM to find mems:45.91997790336609 seconds.
Starting aligning reads.
Nr reads: 73 nr batches: 2 [37, 36]
READ 37 RECORDS.
READ 37 RECORDS.
Number of instances solved with quadratic collinear chainer solution: 84
Number of instances solved with n*log n collinear chainer solution: 15
READ 36 RECORDS.
READ 36 RECORDS.
Number of instances solved with quadratic collinear chainer solution: 103
Number of instances solved with n*log n collinear chainer solution: 8
Time elapesd multiprocessing: 1.5507452487945557
Time to align reads:1.5510809421539307 seconds.
Time to merge SAM-files:0.00915384292602539 seconds.
21 reads were not attempted to be aligned with ultra, instead alternative aligner was used.
6 reads with primary alignments were replaced with alternative aligner because they were unaligned with uLTRA.
1 primary alignments had best fit with uLTRA.
4 primary alignments had equal fit.
3 primary alignments had best fit with alternative aligner.
Table of score-difference between alignment methods (negative number: alternative aligner better fit, positive number is uLTRA better fit)
Score is calculated as sum(identities) - sum(ins, del, subs)
Format: Score difference: Number of primary alignments
[-4294967296 - -100): 2
[-100 - -50): 0
[-50 - -20): 1
[-20 - -10): 0
[-10 - -5): 0
[-5 - -4): 0
[-4 - -3): 0
[-3 - -2): 0
[-2 - -1): 0
[-1 - 0): 0
[0 - 1): 4
[1 - 2): 0
[2 - 3): 0
[3 - 4): 0
[4 - 5): 0
[5 - 10): 0
[10 - 20): 0
[20 - 50): 0
[50 - 100): 0
[100 - 4294967296): 1
defaultdict(<class 'int'>, {'unaligned': 65, 'FSM': 4, 'NO_SPLICE': 2, 'Insufficient_junction_coverage_unclassified': 1, 'NIC_novel': 1})
total alignment coverage: 7.350311311286634
Deleting temporary files...
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_0_rc.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_1.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_0.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_1_rc.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_1.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_0_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_0.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_1_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_0_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_1.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_1_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_0.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_0.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_1_rc.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_1.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_0.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_1.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_0_rc.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/minimap2.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/minimap2_errors.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/uLTRA_batch_0.stderr
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/uLTRA_batch_1.stderr
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_after_genomic_filtering.fasta
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/indexed.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/unindexed.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/refs_sequences.fa
Done.
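For readers of the score table in the log above: the log states that the score is sum(identities) - sum(ins, del, subs). A minimal sketch of that formula follows, assuming the alignments use extended CIGAR strings with `=` and `X` operations. The function name `alignment_score` and the example CIGAR strings are hypothetical, and how uLTRA computes the score internally is not shown in this thread.

```python
import re

# Extended-CIGAR operations: '=' match, 'X' substitution, 'I' insertion, 'D' deletion.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_score(cigar: str) -> int:
    """Return sum(identities) - sum(ins, del, subs) for an extended CIGAR string."""
    identities = 0
    errors = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op == "=":
            identities += int(length)
        elif op in "IDX":
            errors += int(length)
        # Assumption: 'N' (introns), 'S'/'H' (clipping), 'P' and the ambiguous
        # 'M' operation are ignored in this sketch.
    return identities - errors


# Hypothetical alignments of the same read by two aligners.
ultra_score = alignment_score("50=1X20=1000N30=2I10=")  # 110 identities, 3 errors
alt_score = alignment_score("50=1X20=1000N25=5S")       # 95 identities, 1 error
print(ultra_score - alt_score)  # prints 13
```

A difference of 13 would be counted in the `[10 - 20)` row of the table, meaning uLTRA is the better fit for that read.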
from ultra.
My version of python (that executes uLTRA) is Python 3.8.2.
from ultra.
I'm not surprised, it seems to be very random :/
Across the 22 TB of Iso-Seq data I analyzed, I noticed some failing uLTRA jobs, but each time they succeeded after resubmission.
My uLTRA install is packaged with Python 3.9.13.
from ultra.
ouch..
I wonder if uLTRA fails because of slaMEM (a package I have no control over and don't fully understand). If the --use_NAM_seeds option is used, which calls StrobeMap instead, I would be the developer of all the steps in uLTRA, but --use_NAM_seeds is slightly lower in accuracy than the default. I'm still debating whether it is worth the tradeoff (I would probably need to evaluate on more datasets to make this decision).
from ultra.
From my point of view, it's not a big deal.
It's a very small number of failing jobs: between 0 and 3 out of 300. And as I said, they succeed on the second try.
The only annoying point (in the nf-core/isoseq context) is that the job does not stop on its own and is only killed when it reaches the walltime.
It won't stop me from using it :)
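One possible workaround for a job that hangs until the walltime, sketched here under assumptions rather than taken from uLTRA or nf-core/isoseq: wrap the command in a timeout and retry it once. The helper name `run_with_retry`, the timeout value, and the retry count are all hypothetical. Note that `subprocess.run` only kills the direct child process on timeout, so any helper processes the child has spawned may need separate cleanup.

```python
import subprocess


def run_with_retry(cmd, timeout_s, max_attempts=2):
    """Run cmd; retry if it exceeds timeout_s or exits with a non-zero status.

    Returns the 1-based number of the attempt that succeeded,
    or raises RuntimeError if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child process before raising TimeoutExpired.
            continue
        if result.returncode == 0:
            return attempt
    raise RuntimeError(f"command failed after {max_attempts} attempts: {cmd}")
```

It could be called as `run_with_retry(["./uLTRA", "pipeline", ...], timeout_s=3600)`, with the argument list filled in from the actual call and the timeout chosen well above a normal run time.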
from ultra.