ultra's Introduction

uLTRA

uLTRA is a tool for spliced alignment of long transcriptomic reads to a genome, guided by a database of exon annotations. uLTRA is particularly accurate when aligning to small exons (see some examples).

uLTRA is distributed as a Python package and is supported on Linux / OSX with Python (versions 3.4 or above).

INSTALLATION

Conda recipe

There is a bioconda recipe, docker image, and a singularity container of uLTRA created by sguizard. You can use, e.g., the bioconda recipe for an easy automated installation.

Alternative installation methods are described below.

Using the INSTALL.sh script

You can clone this repository and run the script INSTALL.sh as

git clone https://github.com/ksahlin/uLTRA.git --depth 1
cd uLTRA
./INSTALL.sh <install_directory>

The install script has been tested in a bash environment.

To run uLTRA, you need to activate the conda environment "ultra":

conda activate ultra

Without the INSTALL.sh script

You can also perform the steps below manually for more control.

1. Create conda environment

Create a conda environment called ultra and activate it

conda create -n ultra python=3 pip 
conda activate ultra

2. Install uLTRA

pip install ultra-bioinformatics

3. Install third party tools

Install namfinder and minimap2 and place the generated binaries namfinder and minimap2 in your path.

4. Verify installation

You should now have 'uLTRA' installed; try it

uLTRA --help

Each time you start a new session on your server or computer, you need to activate the conda environment "ultra" before running uLTRA:

conda activate ultra

You can also download the test data available in this repository and run:

uLTRA pipeline /your/full/path/to/test/SIRV_genes.fasta  \
               /your/full/path/to/test/SIRV_genes_C_170612a.gtf  \
               /your/full/path/to/test/reads.fa outfolder/  [optional parameters]

Entirely from source

Make sure the below-listed dependencies are installed (installation links below). All below dependencies except namfinder can be installed as pip install X or through conda.

With these dependencies installed, run

git clone https://github.com/ksahlin/uLTRA.git
cd uLTRA
./uLTRA

USAGE

uLTRA can be used with either PacBio Iso-Seq or ONT cDNA/dRNA reads.

Indexing

uLTRA index genome.fasta  /full/path/to/annotation.gtf  outfolder/  [parameters]

Important parameters:

  1. --disable_infer can speed up the indexing considerably, but it only works if you have the gene feature and transcript feature in your GTF file.
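Before passing --disable_infer, it can help to confirm that the annotation really contains both feature types. The following is a minimal sketch, not part of uLTRA: the helper names gtf_feature_types and can_disable_infer are hypothetical, and the check only inspects column 3 of the GTF, so it is a necessary condition rather than a guarantee that indexing will succeed.

```python
import io


def gtf_feature_types(handle):
    """Collect the distinct values of GTF column 3 (the feature type)."""
    types = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            types.add(fields[2])
    return types


def can_disable_infer(handle):
    """Return True if both 'gene' and 'transcript' features are present."""
    return {"gene", "transcript"} <= gtf_feature_types(handle)


# Tiny in-memory example; a real file would be opened with open("annotation.gtf").
example = io.StringIO(
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n'
    'chr1\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
)
print(can_disable_infer(example))
```

If the check returns False, running the index step without --disable_infer is the safer choice.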

Aligning

For example

uLTRA align genome.fasta reads.[fa/fq] outfolder/  --ont --t 8   # ONT cDNA reads using 8 cores
uLTRA align genome.fasta reads.[fa/fq] outfolder/  --isoseq --t 8 # PacBio isoseq reads

Important parameters:

  1. --index [PATH]: Sets a custom location to read the index from. Otherwise, uLTRA will try to read the index from outfolder/ by default.
  2. --prefix [PREFIX OF FILE]: The aligned reads will be written to outfolder/reads.sam unless --prefix is set. For example, --prefix sample_X will output the reads in outfolder/sample_X.sam.
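When uLTRA is called from a script, the two parameters above determine where the index is read from and where the SAM file ends up. The sketch below only assembles the argument list and the expected output path from the flags documented in this section; the function names ultra_align_command and expected_sam_path are hypothetical, and nothing is executed.

```python
import os


def ultra_align_command(genome, reads, outfolder, tech="ont", cores=8,
                        index=None, prefix=None):
    """Build the argument list for `uLTRA align` (hypothetical helper)."""
    if tech not in ("ont", "isoseq"):
        raise ValueError("tech must be 'ont' or 'isoseq'")
    cmd = ["uLTRA", "align", genome, reads, outfolder,
           "--" + tech, "--t", str(cores)]
    if index is not None:
        cmd += ["--index", index]    # read the index from a custom location
    if prefix is not None:
        cmd += ["--prefix", prefix]  # changes the output file name
    return cmd


def expected_sam_path(outfolder, prefix=None):
    """Output is outfolder/reads.sam unless a prefix is given."""
    return os.path.join(outfolder, (prefix or "reads") + ".sam")


cmd = ultra_align_command("genome.fasta", "reads.fq", "outfolder/",
                          tech="isoseq", index="index_dir/", prefix="sample_X")
print(" ".join(cmd))
print(expected_sam_path("outfolder/", "sample_X"))
```

The list returned by ultra_align_command could be passed to subprocess.run; keeping the index in one shared folder avoids copying it into every sample's outfolder.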

Pipeline

Perform all steps in one command:

uLTRA pipeline genome.fasta /full/path/to/annotation.gtf reads.fa outfolder/  [parameters]

Common errors

Not having a properly formatted GTF file. uLTRA requires a properly formatted GTF file. If you have a GFF file or another annotation format, it is advised to use AGAT to convert the file to GTF, as many other conversion tools do not respect the GTF format. For example, you can run AGAT as:

agat_convert_sp_gff2gtf.pl --gff annot.gff3 --gtf annot.gtf

CREDITS

Please cite

  1. Kristoffer Sahlin, Veli Mäkinen, Accurate spliced alignment of long RNA sequencing reads, Bioinformatics, Volume 37, Issue 24, 15 December 2021, Pages 4643–4651, https://doi.org/10.1093/bioinformatics/btab540

when using uLTRA. Please also cite minimap2 as uLTRA incorporates minimap2 for alignment of some genomic reads outside indexed regions. For example "We aligned reads to the genome using uLTRA [1], which incorporates minimap2 [CIT].".

LICENCE

GPL v3.0, see LICENSE.txt.

ultra's People

Contributors

ksahlin

ultra's Issues

Error: invalid feature coordinates (end<start!) at line:

Hello,
I want to map gencode.v43.transcripts.fa to GRCh38.primary_assembly.genome.fa. I ran into some problems while using uLTRA and would like to ask why this error occurs.

code:

 uLTRA index  --disable_infer  /home/data/t050326/reference/GRCh38.primary_assembly.genome.fa /home/data/t050326/reference/new/gencode.v43.annotation.gtf  /home/data/t050326/index/uLTRA
uLTRA align --t 10 --index ~/index/uLTRA --prefix ultra --max_intron 1000000  ~/reference/GRCh38.primary_assembly.genome.fa ~/reference/gencode.v43.transcripts.fa  ~/result/uLTRA/t1
samtools view -bS ultra.sam > ultra.bam
bedtools bamtobed -bed12 -i ~/result/uLTRA/t1/ultra.bam > ~/result/uLTRA/t1/ultra.bed12
bedToGenePred ~/result/uLTRA/t1/ultra.bed12 ~/result/uLTRA/t1/ultra.GenePred
genePredToGtf -utr -honorCdsStat file ~/result/uLTRA/t1/ultra.GenePred ~/result/uLTRA/t1/ultra.gtf
gffcompare -T -r ~/reference/gencode.v43.annotation.gtf -o ~/result/uLTRA/gffcompare/t1/ultra ~/result/uLTRA/t1/ultra.gtf

warning:


Error: invalid feature coordinates (end<start!) at line:
chr2    /home/data/t050326/result/uLTRA/t1/ultra.GenePred       exon    111196368       111196367       .       -       .       gene_id "ENST00000642451.1|ENSG00000222041.13|OTTHUMG00000130273.15|OTTHUMT00000330387.4|CYTOR-216|CYTOR|7512|lncRNA|_2"; transcript_id "ENST00000642451.1|ENSG00000222041.13|OTTHUMG00000130273.15|OTTHUMT00000330387.4|CYTOR-216|CYTOR|7512|lncRNA|_2"; exon_number "1"; exon_id "ENST00000642451.1|ENSG00000222041.13|OTTHUMG00000130273.15|OTTHUMT00000330387.4|CYTOR-216|CYTOR|7512|lncRNA|_2.1";
Error: invalid feature coordinates (end<start!) at line:
chr2    /home/data/t050326/result/uLTRA/t1/ultra.GenePred       exon    95834978        95834977        .       -       .       gene_id "ENST00000612307.4|ENSG00000277701.5|OTTHUMG00000188202.4|OTTHUMT00000476587.2|ENST00000612307|ENSG00000277701|530|lncRNA|_3"; transcript_id "ENST00000612307.4|ENSG00000277701.5|OTTHUMG00000188202.4|OTTHUMT00000476587.2|ENST00000612307|ENSG00000277701|530|lncRNA|_3"; exon_number "4"; exon_id "ENST00000612307.4|ENSG00000277701.5|OTTHUMG00000188202.4|OTTHUMT00000476587.2|ENST00000612307|ENSG00000277701|530|lncRNA|_3.4";
Error: invalid feature coordinates (end<start!) at line:
chr5    /home/data/t050326/result/uLTRA/t1/ultra.GenePred       exon    178039387       178039386       .       +       .       gene_id "ENST00000697483.1|ENSG00000289731.1|-|-|FAM153B-206|FAM153B|721|lncRNA|_3"; transcript_id "ENST00000697483.1|ENSG00000289731.1|-|-|FAM153B-206|FAM153B|721|lncRNA|_3"; exon_number "4"; exon_id "ENST00000697483.1|ENSG00000289731.1|-|-|FAM153B-206|FAM153B|721|lncRNA|_3.4";
Error: invalid feature coordinates (end<start!) at line:
chr5    /home/data/t050326/result/uLTRA/t1/ultra.GenePred       exon    176119655       176119654       .       +       .       gene_id "ENST00000650646.1|ENSG00000204677.13|OTTHUMG00000163459.11|OTTHUMT00000502047.1|FAM153CP-213|FAM153CP|1179|lncRNA|_2"; transcript_id "ENST00000650646.1|ENSG00000204677.13|OTTHUMG00000163459.11|OTTHUMT00000502047.1|FAM153CP-213|FAM153CP|1179|lncRNA|_2"; exon_number "15"; exon_id "ENST00000650646.1|ENSG00000204677.13|OTTHUMG00000163459.11|OTTHUMT00000502047.1|FAM153CP-213|FAM153CP|1179|lncRNA|_2.15";
Error: invalid feature coordinates (end<start!) at line:
chr10   /home/data/t050326/result/uLTRA/t1/ultra.GenePred       exon    87342323        87342322        .       -       .       gene_id "ENST00000653620.1|ENSG00000225484.7|OTTHUMG00000018585.17|OTTHUMT00000522829.1|NUTM2B-AS1-214|NUTM2B-AS1|4086|lncRNA|_2"; transcript_id "ENST00000653620.1|ENSG00000225484.7|OTTHUMG00000018585.17|OTTHUMT00000522829.1|NUTM2B-AS1-214|NUTM2B-AS1|4086|lncRNA|_2"; exon_number "3"; exon_id "ENST00000653620.1|ENSG00000225484.7|OTTHUMG00000018585.17|OTTHUMT00000522829.1|NUTM2B-AS1-214|NUTM2B-AS1|4086|lncRNA|_2.3";
Error: invalid feature coordinates (end<start!) at line:
chr10   /home/data/t050326/result/uLTRA/t1/ultra.GenePred       exon    79826485        79826484        .       -       .       gene_id "ENST00000663683.1|ENSG00000223482.10|OTTHUMG00000018671.22|OTTHUMT00000523167.1|NUTM2A-AS1-225|NUTM2A-AS1|1398|lncRNA|_2"; transcript_id "ENST00000663683.1|ENSG00000223482.10|OTTHUMG00000018671.22|OTTHUMT00000523167.1|NUTM2A-AS1-225|NUTM2A-AS1|1398|lncRNA|_2"; exon_number "5"; exon_id "ENST00000663683.1|ENSG00000223482.10|OTTHUMG00000018671.22|OTTHUMT00000523167.1|NUTM2A-AS1-225|NUTM2A-AS1|1398|lncRNA|_2.5";
Error: invalid feature coordinates (end<start!) at line:
chr16   /home/data/t050326/result/uLTRA/t1/ultra.GenePred       exon    22374887        22374886        .       +       .       gene_id "ENST00000522480.5|ENSG00000180747.16|OTTHUMG00000164329.2|OTTHUMT00000378303.1|SMG1P3-201|SMG1P3|4262|transcribed_unprocessed_pseudogene|_2"; transcript_id "ENST00000522480.5|ENSG00000180747.16|OTTHUMG00000164329.2|OTTHUMT00000378303.1|SMG1P3-201|SMG1P3|4262|transcribed_unprocessed_pseudogene|_2"; exon_number "1"; exon_id "ENST00000522480.5|ENSG00000180747.16|OTTHUMG00000164329.2|OTTHUMT00000378303.1|SMG1P3-201|SMG1P3|4262|transcribed_unprocessed_pseudogene|_2.1";
  263660 query transfrags loaded.

When I added --isoseq to the mapping process, I still got the error.

code:

uLTRA align --t 10 --index ~/index/uLTRA --prefix ultra --max_intron 1000000 --isoseq ~/reference/GRCh38.primary_assembly.genome.fa ~/reference/gencode.v43.transcripts.fa  ~/result/uLTRA/t2
samtools view -bS ultra.sam > ultra.bam
bedtools bamtobed -bed12 -i ~/result/uLTRA/t2/ultra.bam > ~/result/uLTRA/t2/ultra.bed12
bedToGenePred ~/result/uLTRA/t2/ultra.bed12 ~/result/uLTRA/t2/ultra.GenePred
genePredToGtf -utr -honorCdsStat file ~/result/uLTRA/t2/ultra.GenePred ~/result/uLTRA/t2/ultra.gtf
gffcompare -T -r ~/reference/gencode.v43.annotation.gtf -o ~/result/uLTRA/gffcompare/t2/ultra ~/result/uLTRA/t2/ultra.gtf

warning:


  252913 reference transcripts loaded.
  1443 duplicate reference transcripts discarded.
Error: invalid feature coordinates (end<start!) at line:
chr5    /home/data/t050326/result/uLTRA/t2/ultra.GenePred       exon    178039387       178039386       .       +       .       gene_id "ENST00000697483.1|ENSG00000289731.1|-|-|FAM153B-206|FAM153B|721|lncRNA|_3"; transcript_id "ENST00000697483.1|ENSG00000289731.1|-|-|FAM153B-206|FAM153B|721|lncRNA|_3"; exon_number "4"; exon_id "ENST00000697483.1|ENSG00000289731.1|-|-|FAM153B-206|FAM153B|721|lncRNA|_3.4";
  262577 query transfrags loaded.

Looking forward to your reply!
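In every reported line the exon end is exactly one less than the start, which is what a zero-length block would look like once it is written in 1-based inclusive GTF coordinates. Assuming (this is not confirmed in the thread) that such blocks are already present in the bed12 file, a sketch like the following could separate the affected records before running bedToGenePred. The function names are hypothetical and the column layout is the standard BED12 one.

```python
def has_empty_block(bed12_line):
    """Return True if any block in a BED12 record has size <= 0."""
    fields = bed12_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return False  # not a BED12 record; leave it alone
    # Column 11 holds comma-separated block sizes, often with a trailing comma.
    sizes = [int(s) for s in fields[10].split(",") if s]
    return any(size <= 0 for size in sizes)


def split_bed12(lines):
    """Split BED12 lines into (kept, flagged) lists."""
    kept, flagged = [], []
    for line in lines:
        if not line.strip():
            continue
        (flagged if has_empty_block(line) else kept).append(line)
    return kept, flagged


records = [
    "chr5\t100\t300\tread1\t60\t+\t100\t300\t0\t3\t50,0,50,\t0,100,150,\n",
    "chr5\t100\t300\tread2\t60\t+\t100\t300\t0\t2\t50,50,\t0,150,\n",
]
kept, flagged = split_bed12(records)
print(len(kept), "kept,", len(flagged), "flagged")
```

Inspecting the flagged records next to their SAM entries would show whether the empty blocks come from the alignments themselves or from the conversion steps.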

Cigar is None

<multiprocessing.context.SpawnContext object at 0x7fdd16fc21f0>
Environment set: <multiprocessing.context.SpawnContext object at 0x7fdd16fc21f0>
Using 16 cores.
Filtering reads aligned to unindexed regions with minimap2
Done filtering. Reads filtered:7587
batch nt: 16444094 total_nt: 263105501
27073
26857
27083
27130
27307
27044
26775
26847
26784
26718
26972
27147
27161
26887
27175
27244
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Time for slaMEM to find mems:1133.155166387558 seconds.
Starting aligning reads.
Nr reads: 432204 nr batches: 16 [27073, 26857, 27083, 27130, 27307, 27044, 26775, 26847, 26784, 26718, 26972, 27147, 27161, 26887, 27175, 27244]
Processed 5000 reads in batch 1
Processed 5000 reads in batch 0
Processed 5000 reads in batch 5
Processed 5000 reads in batch 4
Processed 5000 reads in batch 15
Processed 5000 reads in batch 10
Processed 5000 reads in batch 14
Processed 10000 reads in batch 0
Processed 10000 reads in batch 15
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/martin/miniconda3/envs/ultra/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/martin/miniconda3/envs/ultra/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/martin/miniconda3/envs/ultra/lib/python3.8/site-packages/modules/align.py", line 670, in align_single_helper
    return align_single(*arguments)
  File "/home/martin/miniconda3/envs/ultra/lib/python3.8/site-packages/modules/align.py", line 528, in align_single
    non_covered_regions, mam_value, mam_solution = classify_read_with_mams.main(mem_solution, ref_segment_sequences, ref_flank_sequences, parts_to_segments, \
  File "/home/martin/miniconda3/envs/ultra/lib/python3.8/site-packages/modules/classify_read_with_mams.py", line 447, in main
    add_segment_to_mam(read_seq, ref_chr_id, segment_seq, s_start, s_stop, segm_id, mam_instance, min_acc, annot_label = '_full_segment' )
  File "/home/martin/miniconda3/envs/ultra/lib/python3.8/site-packages/modules/classify_read_with_mams.py", line 312, in add_segment_to_mam
    locations, edit_distance, accuracy = edlib_alignment(exon_seq, read_seq, mode="HW", task = 'path', k = 0.4*min(len(read_seq), len(exon_seq)) )
  File "/home/martin/miniconda3/envs/ultra/lib/python3.8/site-packages/modules/classify_read_with_mams.py", line 111, in edlib_alignment
    accuracy = cigar_to_accuracy(cigar_string)
  File "/home/martin/miniconda3/envs/ultra/lib/python3.8/site-packages/modules/classify_read_with_mams.py", line 74, in cigar_to_accuracy
    result = re.split(r'[=DXSMI]+', cigar_string)
  File "/home/martin/miniconda3/envs/ultra/lib/python3.8/re.py", line 231, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: expected string or bytes-like object
"""
The above exception was the direct cause of the following exception:
 
Traceback (most recent call last):
  File "/home/martin/miniconda3/envs/ultra/bin/uLTRA", line 717, in <module>
    align_reads(args)
  File "/home/martin/miniconda3/envs/ultra/bin/uLTRA", line 504, in align_reads
    classifications, alignment_outfiles = align.align_parallel(read_batches, refs_id_lengths, args)
  File "/home/martin/miniconda3/envs/ultra/lib/python3.8/site-packages/modules/align.py", line 688, in align_parallel
    results =res.get(999999999) # Without the timeout this blocking call ignores all signals.
  File "/home/martin/miniconda3/envs/ultra/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
TypeError: expected string or bytes-like object

I added a try except for this line
result = re.split(r'[=DXSMI]+', cigar_string)
to print the cigar, and the cigar is None

Any ideas?

Cheers,
Martin
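The traceback shows re.split being called on a value that is None, so any accuracy computation needs to handle a missing CIGAR explicitly. The sketch below is not uLTRA's cigar_to_accuracy: the name cigar_to_accuracy_safe, the definition of accuracy as the fraction of '=' columns, and the choice to return 0.0 for a missing CIGAR are all assumptions made for illustration.

```python
import re

# One CIGAR operation: a length followed by one of the operators that the
# regular expression in the traceback splits on.
CIGAR_OP = re.compile(r"(\d+)([=DXSMI])")


def cigar_to_accuracy_safe(cigar_string):
    """Fraction of '=' columns among all CIGAR columns; 0.0 if no CIGAR."""
    if not cigar_string:
        return 0.0  # no alignment was reported, treat as zero accuracy
    matches = 0
    total = 0
    for length, op in CIGAR_OP.findall(cigar_string):
        total += int(length)
        if op == "=":
            matches += int(length)
    return matches / total if total else 0.0


print(cigar_to_accuracy_safe("8=1X1D"))
print(cigar_to_accuracy_safe(None))
```

Whether 0.0 is the right value for the calling code depends on how the accuracy is used downstream; skipping the segment entirely may be more appropriate.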

Non-absolute paths don't resolve

Hi! We're trying to implement this package in nf-core modules, and it's stumbling on paths that fail to resolve unless they are absolute. So basically, one has to do \$(pwd)/$gtf instead of just e.g. ./$gtf. That makes it a little less clean to implement. Any chance you could help with making the next version robust to this?

It seems that this place is (one of) the offending lines. I'm thinking one could put in the \$(pwd)/$gtf logic inside the repo here, to keep it clean in the nf-core implementation. Would also help with general portability of code. I'm thinking just adding os.path.abspath around the path calls would be enough:

ultra/uLTRA

Lines 63 to 64 in a6b6fb0

db = gffutils.create_db(fn, dbfn=database, force=True, keep_order=True, merge_strategy='merge',
sort_attribute_values=True)
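The suggestion above amounts to normalising every user-supplied path once, before it is handed to other code. A minimal sketch, where resolve_input_path is a hypothetical helper and not a function in the uLTRA code base:

```python
import os


def resolve_input_path(path):
    """Expand '~' and turn a relative path into an absolute one."""
    return os.path.abspath(os.path.expanduser(path))


# A relative path is resolved against the current working directory.
print(resolve_input_path("./annotation.gtf"))
```

Applying this to the genome, annotation, reads and output folder arguments right after argument parsing would keep the rest of the code unchanged.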

Genomes FASTA/GTF files needed to run the evaluation

Hello,

I have a few questions regarding the genome FASTA/GTF files needed in the experiment.json file to run the Snakefile:

  • For human, I am using GENCODE v42 instead of GENCODE v34, and for Drosophila, release 108 instead of 97. Is this kind of version upgrade not recommended?

  • In the experiment.json file, "TRANSCRIPTOME" : "/home/kris/source/ultra/data/all_transcripts_ENSEMBL.txt", is this the same as GENCODE transcripts.fa ?

  • For the SIRV, I am not sure which GTF file I should use. In the data link provided by "Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis" there are six different GTF files. Could you please indicate which one I should use?

Files links:
Human:
FASTA: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/GRCh38.primary_assembly.genome.fa.gz
Transcripts FASTA: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/gencode.v42.transcripts.fa.gz
GTF: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/gencode.v42.chr_patch_hapl_scaff.annotation.gtf.gz

Drosophila melanogaster:
FASTA: https://ftp.ensembl.org/pub/release-108/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.32.dna.toplevel.fa.gz
GTF: https://ftp.ensembl.org/pub/release-108/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.32.108.gtf.gz

SIRV:
https://www.lexogen.com/wp-content/uploads/2018/08/SIRV_Set1_Lot00141_Sequences_170612a-ZIP.zip

Thank you ~
Fadel

a bug of `--alignment_threshold`

ultra/uLTRA

Line 610 in a6b6fb0

pipeline_parser.add_argument('--alignment_threshold', type=int, default=0.5, help='Lower threshold for considering an alignment. \

type = float

Also, maybe an example of aligning accurate RNA sequences (almost 100% correct) could be provided.
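The effect of type=int on a fractional option can be reproduced with argparse alone. In the sketch below, build_parser and parse_threshold are hypothetical helpers and the parser is a stand-in, not uLTRA's pipeline_parser. argparse leaves a non-string default untouched, so the default of 0.5 survives, but any fractional value given on the command line is rejected.

```python
import argparse


def build_parser(threshold_type):
    """Stand-in parser with a single --alignment_threshold option."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--alignment_threshold", type=threshold_type,
                        default=0.5)
    return parser


def parse_threshold(threshold_type, argv):
    """Return the parsed threshold, or None if argparse rejects the value."""
    try:
        args = build_parser(threshold_type).parse_args(argv)
    except SystemExit:
        return None  # argparse reports the error on stderr and exits
    return args.alignment_threshold


print(parse_threshold(int, []))                                # default kept
print(parse_threshold(int, ["--alignment_threshold", "0.7"]))  # rejected
print(parse_threshold(float, ["--alignment_threshold", "0.7"]))
```

With type=float the same command-line value is accepted.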

BUG -4294967296

Hi Kristoffer,

During nf-core/isoseq test jobs, I randomly hit the following bug:

total_flanks2: 20352
total_flank_size 20395551
total_unique_segment_counter 7303161
total_segments_bad 5517888
bad 89736
total parts size: 7391334
total exons size: 20040918
min_intron: 1
Number of ref seqs in gff: 15570
Number of ref seqs in fasta: 1
6622546.0 Unique kmers in reference part sequences with abundance > 1
AAAAAAAAAAAAAAAAAAAA 538
TTTTTTTTTTTTTTTTTTTT 496
CCTCCCAAAGTGCTGGGATT 385
GCCTCCCAAAGTGCTGGGAT 382
CTCCCAAAGTGCTGGGATTA 376
CCAAAGTGCTGGGATTACAG 368
CCCAAAGTGCTGGGATTACA 367
TCCCAAAGTGCTGGGATTAC 366
TAATCCCAGCACTTTGGGAG 360
AATCCCAGCACTTTGGGAGG 359
CAAAGTGCTGGGATTACAGG 357
ATCCCAGCACTTTGGGAGGC 350
GTAATCCCAGCACTTTGGGA 339
CTGTAATCCCAGCACTTTGG 338
TGTAATCCCAGCACTTTGGG 333
CCTGTAATCCCAGCACTTTG 330
TCTCTACTAAAAATACAAAA 317
TTTTGTATTTTTAGTAGAGA 316
ATTCTCCTGCCTCAGCCTCC 289
AAAGTGCTGGGATTACAGGC 282
TTTTTGTATTTTTAGTAGAG 281
[...]
CCAGGAGGTGGAGGTTGCAG 200
GCCTGACCAACATGGTGAAA 200
11038 11038 out of 20352 sequences has been modified in masking step.
Environment set:
Using 2 cores.
Filtering reads aligned to unindexed regions with minimap2
Done filtering. Reads filtered:21
batch nt: 97383 total_nt: 194765
37
36
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Time for slaMEM to find mems:76.54179167747498 seconds.
Starting aligning reads.
Nr reads: 73 nr batches: 2 [37, 36]
BUG -4294967296

Here is the command used:

uLTRA \
    pipeline \
    --t 2 \
    --prefix alz.chunk3 \
    --isoseq --disable_infer \
    Homo_sapiens.GRCh38.dna.chromosome.19.fasta \
    Homo_sapiens.GRCh38.104.chr.13_18_19.gtf \
    alz.chunk3_tama.fa \
    ./

Re-running is enough to get the job done.
I never noticed this when I ran it locally.

Controlling (high) uLTRA RAM usage

Hi,

I'm running uLTRA on a cluster with 48 CPU cores and 196 GB RAM per node.

If I call uLTRA like this to align direct RNA reads to an indexed mammalian genome:

uLTRA align "${genome}" "${reads}" "${output_directory}" --index "${ultraIndex}" --ont --t 48

I notice that many of the uLTRA subprocesses use more RAM than is available and I get RAM errors for slaMEM etc, resulting in failure of the job.

The current workaround is to limit the number of CPUs used so that I don't exceed max node RAM.

Is there any way to control uLTRA maximum memory usage, either per CPU core or overall?

Thanks!
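The workaround described above can be written down as simple arithmetic: pick the largest core count whose combined memory use still fits on the node. This is only a sketch; cores_within_memory is a hypothetical helper, and the memory needed per worker is not documented here, so it has to be measured on a small run first. The 12 GB used in the example is an invented figure.

```python
def cores_within_memory(available_cores, node_ram_gb, ram_per_worker_gb):
    """Largest core count whose total memory use stays within the node RAM."""
    if ram_per_worker_gb <= 0:
        raise ValueError("ram_per_worker_gb must be positive")
    by_memory = int(node_ram_gb // ram_per_worker_gb)
    # Never go below one core and never above what the node offers.
    return max(1, min(available_cores, by_memory))


# 48 cores and 196 GB per node as in the post; 12 GB per worker is hypothetical.
print(cores_within_memory(48, 196, 12))
```

The result would then be passed to uLTRA through --t.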

ultra installation and run error

Hi
After three failed installation attempts with the installation script, I finally found the conda uLTRA package, which installed correctly. Next, I ran the following commands on the test data but got the error shown below:

reads="reads/alz.polished.hq.fasta.gz"
genome="GRCh38.v33p13.primary_assembly.fa"
gtfexons="gencode.v33p13.primary_assembly.annotation.exon.gtf"
ultra="/scratch/rupesh/Apps/envs/ultra/bin/uLTRA"
output_dir="HIFI/test_results"

echo "Running ultra full pipeline without creating index or alignment.."

$ultra pipeline --isoseq --t 1 $genome $gtfexons $reads $output_dir

ERRORS:

 Traceback (most recent call last):
  File "/scratch/rupesh/Apps/envs/ultra/bin/uLTRA", line 722, in <module>
    align_reads(args)
  File "/scratch/rupesh/Apps/envs/ultra/bin/uLTRA", line 350, in align_reads
    nr_reads_to_ignore, path_reads_to_align = prefilter_genomic_reads.main(ref_part_sequences, args.ref, args.reads, args.outfolder, index_folder, args.nr_cores, args.genomic_frac, args.mm2_ksize)
  File "/scratch/rupesh/Apps/envs/ultra/lib/python3.9/site-packages/modules/prefilter_genomic_reads.py", line 143, in main
    path_reads_to_align = print_read_categories(reads_unindexed, reads_indexed, reads, outfolder, SAM_file)
  File "/scratch/rupesh/Apps/envs/ultra/lib/python3.9/site-packages/modules/prefilter_genomic_reads.py", line 120, in print_read_categories
    for acc, (seq, _) in help_functions.readfq(open(reads,"r")):
  File "/scratch/rupesh/Apps/envs/ultra/lib/python3.9/site-packages/modules/help_functions.py", line 89, in readfq
    for l in fp: # search for the start of the next record
  File "/scratch/rupesh/Apps/envs/ultra/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Progress log:

Running ultra full pipeline without creating index or alignment..
total_flanks2: 467112
total_flank_size 527488947
total_unique_segment_counter 145303455
total_segments_bad 83027565
bad 1233041
total parts size: 146506923
total exons size: 359149696
min_intron: 1
Number of ref seqs in gff: 323106
Number of ref seqs in fasta: 194
Warning: Detected 147 sequences in reference fasta that are not in annotation:

KI270466.1 with length:1233
KI270706.1 with length:175055
KI270382.1 with length:4215

..
...

ACTGCAGTGGCGCAATCTCG 200
CAACCTCTGCCTCCCTGGTT 200
223114 223114 out of 467112 sequences has been modified in masking step.
Filtering reads aligned to unindexed regions with minimap2

Please help.
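Byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), which fits the reads file in the command being reads/alz.polished.hq.fasta.gz while the traceback shows it being opened as plain text. Decompressing the reads first avoids the error. For code that has to accept both forms, a sketch along these lines would work; open_reads is a hypothetical helper, not an existing uLTRA function.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_reads(path):
    """Open a FASTA/FASTQ file as text, whether or not it is gzip-compressed."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt")
    return open(path, "r")
```

Checking the magic bytes rather than the file extension also covers compressed files that were renamed without the .gz suffix.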

Bug with uLTRA align : TypeError: bad argument type for built-in operation

Hello,

I try to align fastq data with uLTRA but it generates this error :

Process Process-18:
Traceback (most recent call last):
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
self.run()
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/site-packages/modules/align.py", line 484, in align_single
read_seq, warning_log_file, min_acc)
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/site-packages/modules/classify_read_with_mams.py", line 426, in main
choord_to_exon_id, segment_to_gene, gene_to_small_segments)
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/site-packages/modules/classify_read_with_mams.py", line 266, in get_unique_segment_and_flank_choordinates
small_segments = set((small_chr_id, small_start,small_stop) for gene_id in unique_genes for (small_chr_id, small_start,small_stop) in grouper(gene_to_small_segments[gene_id], 3))
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/site-packages/modules/classify_read_with_mams.py", line 266, in
small_segments = set((small_chr_id, small_start,small_stop) for gene_id in unique_genes for (small_chr_id, small_start,small_stop) in grouper(gene_to_small_segments[gene_id], 3))
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.6/site-packages/modules/create_augmented_gene.py", line 69, in
gene_to_small_segments = defaultdict(lambda :array("L")) #defaultdict(lambda :array("b"))
TypeError: bad argument type for built-in operation
Process Process-25:
Traceback (most recent call last):
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
self.run()
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/site-packages/modules/align.py", line 484, in align_single
read_seq, warning_log_file, min_acc)
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/site-packages/modules/classify_read_with_mams.py", line 426, in main
choord_to_exon_id, segment_to_gene, gene_to_small_segments)
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/site-packages/modules/classify_read_with_mams.py", line 266, in get_unique_segment_and_flank_choordinates
small_segments = set((small_chr_id, small_start,small_stop) for gene_id in unique_genes for (small_chr_id, small_start,small_stop) in grouper(gene_to_small_segments[gene_id], 3))
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.5/site-packages/modules/classify_read_with_mams.py", line 266, in
small_segments = set((small_chr_id, small_start,small_stop) for gene_id in unique_genes for (small_chr_id, small_start,small_stop) in grouper(gene_to_small_segments[gene_id], 3))
File "/home/rialland/anaconda3/envs/ULTRA/lib/python3.6/site-packages/modules/create_augmented_gene.py", line 69, in
gene_to_small_segments = defaultdict(lambda :array("L")) #defaultdict(lambda :array("b"))
TypeError: bad argument type for built-in operation

I'm not very familiar with Python, so I can't figure out what's going on. I ran this command: uLTRA align GRCh38.p13bgz.fa FT_PIL005_1.fastq.gz /scratchbeta/rialland/LONGREAD/ --index /scratchbeta/rialland/LONGREAD/GRCh38p13 --ont --t 24 > u_5_1.sam
I'm working with uLTRA v0.1 and Python 3.5.

How can I fix it?

Thanks,

Kind regards,

Laëtitia RIALLAND

unclear/wrong argument description of uLTRA align in help

Thank you for developing this tool, I think you are approaching an important issue for long read transcriptome sequencing.

The options and parameters of uLTRA align are somewhat unclear.
In the output of uLTRA align --help, the positional arguments are described as follows:

positional arguments:
  ref                   Path to fasta file with a nucleotide sequence (e.g., gene locus) to simulate isoforms from.
  reads                 Path to fasta file with a nucleotide sequence (e.g., gene locus) to simulate isoforms from.
  outfolder             Path to fasta file with a nucleotide sequence (e.g., gene locus) to simulate isoforms from.

I.e., all arguments have the same description, which does not really fit any of them.
Also, by default, it seems that the files produced by uLTRA index have to be in the "outfolder", so they need to be copied for each processed sample. Is there a way to specify the "index" folder? Further, the final output is "reads.sam". I was looking for a way to provide a sample-specific output path and file. Also, uLTRA align seems to produce a lot of temporary files. Is there a reason to keep them? Could you add an option to delete them?

CIGAR string starts/ends with N

Hi,
I am trying to run uLTRA to map long RNA reads but have trouble in downstream analysis.

As mentioned in stringtie issues#356, some alignment records (a hanging intron with no terminal exon) start or end with an intron, i.e., the CIGAR string starts or ends with [0-9]+N. Though they match the CIGAR regexp \*|([0-9]+[MIDNSHPX=])+ in the SAM format, they are unusual.

These records are saved in this link with command samtools view sample1.bam | grep -E '60'$'\t''([0-9]+)N|([0-9]+)N'$'\t' > Nbounder.body.
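The same records can be found without a shell pattern by parsing the CIGAR string. The sketch below uses the operator set from the SAM regexp quoted above; has_terminal_intron is a hypothetical name, and ignoring soft and hard clipping when looking at the first and last operation is a design choice of this sketch that goes slightly beyond the grep command.

```python
import re

# One CIGAR operation as defined by the SAM format: a length and an operator.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHPX=])")


def cigar_ops(cigar):
    """Parse a CIGAR string into a list of (length, operator) tuples."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def has_terminal_intron(cigar):
    """True if the first or last non-clipping operation is an intron (N)."""
    ops = [op for _, op in cigar_ops(cigar) if op not in "SH"]
    return bool(ops) and (ops[0] == "N" or ops[-1] == "N")


for cigar in ("81N100=", "100=81N50=", "5S100=81N", "*"):
    print(cigar, has_terminal_intron(cigar))
```

Records flagged this way could be dropped or trimmed before they are passed to tools that expect every intron to be flanked by aligned bases.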

uLTRA + SQANTI3

Hi Kristoffer,

I'm trying to implement uLTRA as a possible aligner for SQANTI3. I think you've already heard about SQANTI, but we've been working the last year and a half to release a much more complete software and it is SQANTI3, in case you want to have a look.

SQANTI3 was designed to evaluate transcriptomes given as a GTF, but in some cases people end up with a FASTA containing all the sequences of their final transcripts (after some clustering, correction, polishing, etc.) and they also want to evaluate it. SQANTI3 takes those sequences, maps them and creates a GTF that can be evaluated easily. We have implemented GMAP, deSALT and minimap2 (this is the default), but now I would like to include uLTRA too.

What would your recommendation be for uLTRA parameters? Since sequences are, in theory, processed I was running it with --isoseq and --ignore_rc options, am I using them correctly?

Thanks!!
Fran

Out of bound reads

Hi Kristoffer,

I need your help with a result that one of the nf-core/isoseq pipeline users obtained.
He contacted me to present an issue with the pipeline.
He was able to run the complete pipeline with minimap2 and not with uLTRA.
I identified the read causing the failure and compared the bam files from both aligners.

I found the following result:

$> grep -e 'm54224_190921_125133/25952682/ccs' chunk13_aligned*.sam
chunk13_alignedMinmap2.sam:m54224_190921_125133/25952682/ccs	16	chr11_KI270721v1_random	50361	0	530M1D85M81N123M80N113M95N135M96N1325M4S	*	0	0	CTGT...CCCC	*	NM:i:11	ms:i:2025	AS:i:2158	nn:i:0	ts:A:+	tp:A:P	cm:i:713	s1:i:2218	s2:i:2218	de:f:0.0048	rl:i:15
chunk13_alignedULTRA.sam:m54224_190921_125133/25952682/ccs	272	chr11	1995179	0	210=1X39=1X2=1X261=1X14=1D77=1X7=81N4=1X118=80N113=95N135=96N420=1X167=1X112=1X403=1X212=11S	*	0	0	CTGT...CCCC	*	XA:Z:rna46708XC:Z:FSM	NM:i:11
chunk13_alignedULTRA.sam:m54224_190921_125133/25952682/ccs	16	chr1_KI270713v1_random	50361	60	210=1X39=1X2=1X261=1X14=1D77=1X7=81N4=1X118=80N113=95N135=96N420=1X167=1X112=1X403=1X219=1D1=1X2=	*0	0	CTGT...CCCC	*	XA:Z:XM_006724897.1	XC:Z:FSM	NM:i:13

In the minimap2 alignment, the read is mapped once on the chromosome chr11_KI270721v1_random at POS 50361.
With uLTRA, it's mapped on chromosomes chr11 and chr1_KI270713v1_random at POS 1995179 and 50361.

The lengths of the sequences are:

  • chr11_KI270721v1_random: 100316 bases
  • chr11: 135086622 bases
  • chr1_KI270713v1_random: 40745 bases

As you can see, the third mapping lies outside the chromosome itself.

Do you have an idea of what could cause that?
Is it possible that there is an error in the sequence name resolution? The POS value of the faulty mapping and minimap2 are the same.
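Records like the third one can be detected automatically by comparing each POS with the reference length declared in the @SQ header lines. The following sketch works on SAM text lines and uses only the standard library; the function names are hypothetical, and it checks the start position only, not the end of the alignment.

```python
def reference_lengths(sam_lines):
    """Map reference name (SN) to length (LN) from the @SQ header lines."""
    lengths = {}
    for line in sam_lines:
        if line.startswith("@SQ"):
            tags = dict(field.split(":", 1)
                        for field in line.rstrip("\n").split("\t")[1:])
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def out_of_bound_records(sam_lines):
    """Return (read, reference, pos, length) for records with POS > length."""
    sam_lines = list(sam_lines)
    lengths = reference_lengths(sam_lines)
    flagged = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        name, rname, pos = fields[0], fields[2], int(fields[3])
        if rname in lengths and pos > lengths[rname]:
            flagged.append((name, rname, pos, lengths[rname]))
    return flagged


sam = [
    "@SQ\tSN:chr11\tLN:135086622\n",
    "@SQ\tSN:chr1_KI270713v1_random\tLN:40745\n",
    "read\t272\tchr11\t1995179\t0\t10=\t*\t0\t0\tACGTACGTAC\t*\n",
    "read\t16\tchr1_KI270713v1_random\t50361\t60\t10=\t*\t0\t0\tACGTACGTAC\t*\n",
]
print(out_of_bound_records(sam))
```

The example uses the reference lengths and positions quoted in this post with shortened CIGAR strings and sequences.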

error when aligning direct RNA data during revcomp script

I was trying to use uLTRA v0.3 to align some direct RNA sequencing reads to a reference genome when it threw the error below:

Filtering reads aligned to unindexed regions with minimap2
Done filtering. Reads filtered:68052
batch nt: 75577658 total_nt: 3627727560
77733
Traceback (most recent call last):
  File "/.local/bin/uLTRA", line 645, in <module>
    align_reads(args)
  File "/.local/bin/uLTRA", line 400, in align_reads
    read_batch_temp_file_rc.write('>{0}\n{1}\n'.format(acc, help_functions.reverse_complement(help_functions.remove_read_polyA_ends(seq, args.reduce_read_ployA, 5))))
  File "/.local/lib/python3.7/site-packages/modules/help_functions.py", line 79, in reverse_complement
    rev_comp = ''.join([rev_nuc[nucl] for nucl in reversed(string)])
  File "/.local/lib/python3.7/site-packages/modules/help_functions.py", line 79, in <listcomp>
    rev_comp = ''.join([rev_nuc[nucl] for nucl in reversed(string)])
KeyError: 'U'

The error is resolved by converting U to T in fastq sequences. Just thought I'd flag this because hopefully many people will use your aligner for direct RNA data :)
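For anyone hitting the same KeyError, here is a minimal sketch of the two options: converting U to T before alignment, or using a reverse-complement routine that tolerates U. Neither function is uLTRA code; the translation table and function names are assumptions made for illustration.

```python
# Hypothetical complement table, not the one used inside uLTRA.
# U is complemented to A, so RNA input yields a DNA reverse complement.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def rna_to_dna(seq):
    """Replace uracil with thymine, preserving case."""
    return seq.replace("U", "T").replace("u", "t")


def reverse_complement(seq):
    """Reverse complement that accepts both DNA and RNA letters."""
    return seq.translate(_COMPLEMENT)[::-1]


print(rna_to_dna("AAUG"))
print(reverse_complement("AAUG"))
```

Characters outside the table (for example IUPAC ambiguity codes other than N) pass through `str.translate` unchanged, so stricter validation would be needed if those should be rejected.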

Python bindings?

Hey Kristoffer,

I was wondering whether you were considering providing Python bindings for uLTRA, similar to mappy for minimap2?

Thanks,
Chang

Alignment outside of contig

I used your tool (version 0.3, installed with pip) to align isoseq data to the human reference genome. For at least one contig I got reads beyond the last position (I renamed, sorted and binarized the output file reads.sam to CTL_S2_sorted.bam):

$ samtools view -H  CTL_S2_sorted.bam |grep KI270827.1
@SQ     SN:KI270827.1   LN:67707

That is, contig KI270827.1 has 67707 bases.
These are the start positions of the reads aligned to it:

$ samtools view CTL_S2_sorted.bam KI270827.1  | cut -f 4
134490
134490
134494
134494
134496
134501
134510
134516
134520
134520
[...]
236450
236450
236450
236450
236450
236450
236453
236464
250325
259444
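To find every affected record rather than inspecting one contig at a time, the SAM text (for example the output of `samtools view -h`) can be scanned once, reading contig lengths from the `@SQ` header lines. This is a sketch with a hypothetical function name; it only checks the start position, not the full reference span.

```python
def out_of_bounds_records(sam_lines):
    """Return (read, contig, pos, contig_length) for records starting past the contig end."""
    lengths = {}
    flagged = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # Header fields are tab-separated TAG:VALUE pairs.
                tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not a complete alignment record
        read, contig, pos = fields[0], fields[2], int(fields[3])
        if contig in lengths and pos > lengths[contig]:
            flagged.append((read, contig, pos, lengths[contig]))
    return flagged


# Small made-up example using the contig length reported above.
example = [
    "@SQ\tSN:KI270827.1\tLN:67707",
    "read1\t0\tKI270827.1\t134490\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read2\t0\tKI270827.1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
]
for record in out_of_bounds_records(example):
    print(record)
```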

Can not access local variable 'read_mems_tmp' when using --use_NAM_seeds

Hi,

I am facing the following problem when running uLTRA with GRCh38_p13_v43 and dataset ENCLB632GYO

uLTRA align /mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/data/refs/GRCh38_p13_v43/genome.fa <(zcat /mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/data/reads/ENCLB632GYO_1k.fastq.gz)  /mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/data/aligned/ --index /mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/data/refs/GRCh38_p13_v43/uLTRA.idx --prefix uLTRA__GRCh38_p13_v43__ENCLB632GYO --use_NAM_seeds --min_mem 17 --min_acc 0.6 --disable_mm2 
<multiprocessing.context.SpawnContext object at 0x7fd92ba61e10>
Environment set: <multiprocessing.context.SpawnContext object at 0x7fd92ba61e10>
Using 3 cores.
Processing reads for MEM finding
Using StrobeMap
['StrobeMap', '-n', '2', '-k', '10', '-v', '11', '-w', '35', '-C', '500', '-L', '1000', '-S', '-t', '3', '-s', '-o', '/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/data/aligned/', '/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/data/aligned/refs_sequences.fa', '/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/data/aligned/reads_tmp.fq']
Time for StrobeMap to find NAMs:266.5836818218231 seconds.
Starting aligning reads.
Nr reads: 0 nr batches: 3 [0, 0, 0]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/.snakemake/conda/c61ef846c9b16f24a589aa5281ebe56c_/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/.snakemake/conda/c61ef846c9b16f24a589aa5281ebe56c_/lib/python3.11/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
           ^^^^^^^^^^^^^^^^
  File "/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/.snakemake/conda/c61ef846c9b16f24a589aa5281ebe56c_/lib/python3.11/site-packages/modules/align.py", line 670, in align_single_helper
    return align_single(*arguments)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/.snakemake/conda/c61ef846c9b16f24a589aa5281ebe56c_/lib/python3.11/site-packages/modules/align.py", line 434, in align_single
    for (read_acc, mems), (_, mems_rc) in zip(mem_wrapper.get_mem_records(mems_path,reads), mem_wrapper.get_mem_records(mems_path_rc, reads)):
  File "/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/.snakemake/conda/c61ef846c9b16f24a589aa5281ebe56c_/lib/python3.11/site-packages/modules/mem_wrapper.py", line 151, in get_mem_records
    for chr_id in list(read_mems_tmp.keys()):
                       ^^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'read_mems_tmp' where it is not associated with a value
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/.snakemake/conda/c61ef846c9b16f24a589aa5281ebe56c_/bin/uLTRA", line 717, in <module>
    align_reads(args)
  File "/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/.snakemake/conda/c61ef846c9b16f24a589aa5281ebe56c_/bin/uLTRA", line 473, in align_reads
    classifications, alignment_outfiles = align.align_parallel(read_batches, refs_lengths, args)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/.snakemake/conda/c61ef846c9b16f24a589aa5281ebe56c_/lib/python3.11/site-packages/modules/align.py", line 688, in align_parallel
    results =res.get(999999999) # Without the timeout this blocking call ignores all signals.
             ^^^^^^^^^^^^^^^^^^
  File "/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/.snakemake/conda/c61ef846c9b16f24a589aa5281ebe56c_/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
UnboundLocalError: cannot access local variable 'read_mems_tmp' where it is not associated with a value
(/mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/.snakemake/conda/c61ef846c9b16f24a589aa5281ebe56c_) 
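The log above reports "Nr reads: 0", and an UnboundLocalError of this kind typically appears when a variable is first assigned inside a loop that never runs. Whether that is what happens inside `get_mem_records` is not confirmed here; the toy functions below are hypothetical and only illustrate the general pattern and the usual fix of initialising the variable before the loop.

```python
def collect_buggy(lines):
    """Toy parser: the dict is first bound only when a '>' header is seen."""
    for line in lines:
        if line.startswith(">"):
            read_mems_tmp = {}  # first binding happens only here
        else:
            chr_id, pos = line.split()
            read_mems_tmp.setdefault(chr_id, []).append(int(pos))
    # With empty input the name was never bound, so this raises UnboundLocalError.
    return sorted(read_mems_tmp.keys())


def collect_safe(lines):
    """Same parser, but the dict exists even when there is nothing to read."""
    read_mems_tmp = {}
    for line in lines:
        if line.startswith(">"):
            read_mems_tmp = {}
        else:
            chr_id, pos = line.split()
            read_mems_tmp.setdefault(chr_id, []).append(int(pos))
    return sorted(read_mems_tmp.keys())


print(collect_safe([]))
print(collect_safe([">read1", "chr1 10", "chr2 5"]))
```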

uLTRA runs fine with the default parameters (using slaMEM instead of StrobeMap):

uLTRA align /mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/data/refs/GRCh38_p13_v43/genome.fa <(zcat /mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/data/reads/ENCLB632GYO_1k.fastq.gz)  /mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/data/aligned/ --index /mnt/storage3/users/ahberam1/DetectSpJunBenchmark/workflow/data/refs/GRCh38_p13_v43/uLTRA.idx --prefix uLTRA__GRCh38_p13_v43__ENCLB632GYO  --disable_mm2 

<multiprocessing.context.SpawnContext object at 0x7fe4942d2090>
Environment set: <multiprocessing.context.SpawnContext object at 0x7fe4942d2090>
Using 3 cores.
batch nt: 291444 total_nt: 874331
330
331
339
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Time for slaMEM to find mems:3035.6001048088074 seconds.
Starting aligning reads.
Nr reads: 1000 nr batches: 3 [330, 331, 339]
READ 331 RECORDS.
READ 331 RECORDS.
READ 339 RECORDS.
READ 339 RECORDS.
Number of instances solved with quadratic collinear chainer solution: 395
Number of instances solved with n*log n collinear chainer solution: 164
Number of instances solved with quadratic collinear chainer solution: 445
Number of instances solved with n*log n collinear chainer solution: 217
READ 330 RECORDS.
READ 330 RECORDS.
Number of instances solved with quadratic collinear chainer solution: 375
Number of instances solved with n*log n collinear chainer solution: 239
Time elapesd multiprocessing: 1526.938672542572
Time to align reads:1526.9394431114197 seconds.
Time to merge SAM-files:0.029903173446655273 seconds.
defaultdict(<class 'int'>, {'FSM': 364, 'NIC_novel': 87, 'unaligned': 304, 'NO_SPLICE': 121, 'Insufficient_junction_coverage_unclassified': 95, 'ISM/NIC_known': 27, 'NNC': 2})
total alignment coverage: 567.6854821244016
Deleting temporary files...

Conda environment:

name: env_ultra_exp
channels:
  - bioconda
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_kmp_llvm
  - bzip2=1.0.8=h7f98852_4
  - ca-certificates=2022.12.7=ha878542_0
  - k8=0.2.5=hd03093a_2
  - ld_impl_linux-64=2.40=h41732ed_0
  - libexpat=2.5.0=hcb278e6_1
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=12.2.0=h65d4601_19
  - libgomp=12.2.0=h65d4601_19
  - libnsl=2.0.0=h7f98852_0
  - libsqlite=3.40.0=h753d276_0
  - libstdcxx-ng=12.2.0=h46fd767_19
  - libuuid=2.38.1=h0b41bf4_0
  - libzlib=1.2.13=h166bdaf_4
  - llvm-openmp=16.0.0=h417c0b6_0
  - minimap2=2.24=h7132678_1
  - ncurses=6.3=h27087fc_1
  - openssl=3.1.0=h0b41bf4_0
  - pip=23.0.1=pyhd8ed1ab_0
  - python=3.11.1=h2755cc3_0_cpython
  - readline=8.2=h8228510_1
  - setuptools=67.6.1=pyhd8ed1ab_0
  - slamem=0.8.5=hec16e2b_1
  - strobemap=0.0.2=hd03093a_1
  - tk=8.6.12=h27826a3_0
  - tzdata=2023c=h71feb2d_0
  - wheel=0.40.0=pyhd8ed1ab_0
  - xz=5.2.6=h166bdaf_0
  - zlib=1.2.13=h166bdaf_4
  - zstd=1.5.2=h3eb15da_6
  - pip:
      - argcomplete==3.0.5
      - argh==0.28.1
      - dill==0.3.6
      - edlib==1.3.9
      - gffutils==0.11.1
      - intervaltree==3.1.0
      - numpy==1.24.2
      - parasail==1.3.4
      - pyfaidx==0.7.2.1
      - pysam==0.20.0
      - simplejson==3.18.4
      - six==1.16.0
      - sortedcontainers==2.4.0
      - ultra-bioinformatics==0.0.4.2

KeyError when running test pipeline

Hi folks. Superkeen to give ultra a go... looks v cool and will solve some problems.

I've encountered an error running the test package, straight after what I think is a successful, but nonstandard code install.

Here's the observed error to start with:

$ uLTRA pipeline /home/mike.harbour/.virtualenvs/ultra/test/SIRV_genes.fasta /home/mike.harbour/.virtualenvs/ultra/test/SIRV_genes_C_170612a.gtf /home/mike.harbour/.virtualenvs/ultra/test/reads.fa ~/Desktop/ultratest/
Traceback (most recent call last):
  File "/home/mike.harbour/.virtualenvs/ultra/bin/uLTRA", line 648, in <module>
    prep_splicing(args, refs_lengths)
  File "/home/mike.harbour/.virtualenvs/ultra/bin/uLTRA", line 82, in prep_splicing
    max_intron_chr, exon_choordinates_to_id, chr_to_id, id_to_chr = augmented_gene.create_graph_from_exon_parts(db, args.flank_size, args.small_exon_threshold, args.min_segm, refs_lengths)
  File "/home/mike.harbour/.virtualenvs/ultra/lib/python3.6/site-packages/modules/create_augmented_gene.py", line 323, in create_graph_from_exon_parts
    exon_gene_ids = exon.attributes["gene_id"] # is a list of strings
  File "/home/mike.harbour/.virtualenvs/ultra/lib/python3.6/site-packages/gffutils/attributes.py", line 63, in __getitem__
    v = self._d[k]
KeyError: 'gene_id'
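The traceback shows that an exon feature had no `gene_id` attribute when it was looked up. One way to narrow this down is to check the annotation file directly for exon lines lacking that key. The sketch below is a plain-text check with a hypothetical function name, assuming GTF-style `key "value";` attributes; it does not use gffutils and does not handle semicolons inside quoted values.

```python
def lines_missing_attribute(gtf_lines, key="gene_id", feature="exon"):
    """Return 1-based line numbers of `feature` records that lack `key`."""
    missing = []
    for number, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        # Column 9 holds attributes such as: gene_id "G1"; transcript_id "T1";
        keys = {item.split()[0] for item in fields[8].split(";") if item.strip()}
        if key not in keys:
            missing.append(number)
    return missing


# Made-up records for illustration.
example_gtf = [
    'SIRV1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "SIRV1"; transcript_id "SIRV101";',
    'SIRV1\tsrc\texon\t200\t300\t.\t+\t.\ttranscript_id "SIRV101";',
]
print(lines_missing_attribute(example_gtf))
```

If this reports no lines for the file on disk, the attribute may have been lost when the annotation database was built rather than being absent from the GTF itself.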

I am a virtualenv user rather than a conda user, so as I mentioned, my install was not quite as explained in your GitHub notes, but at the moment I don't believe that is the problem. Just in case, I'll run through what I did:

  1. made new virtual environment, with python at 3.6
  2. went into environment, used pip install ultra-bioinformatics. all went well.
  3. got the slaMEM and minimap2 dependencies installed and demonstrated.
  4. put the uLTRA executable in the PATH and demonstrated it working
  5. downloaded the three test files into a testfolder (see the call above)
  6. issued call above and received error above.

I'd be very grateful for any help you can give. Hope I haven't goofed.
Best
Mike

If malformatted fastq, uLTRA silently aligns only a subset of reads

Originally reported as arising from issue #2. If the length of the quality string differs from the length of the sequence, uLTRA aligns only a subset of the reads without reporting that the FASTQ file is malformed.

It is, unfortunately, non-trivial to fix this (or even to output a warning or halt the program) with the current fq-parser, which runs much faster than other parsers. We could consider reverting to a slower parser that implements such checks.
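Until such checks exist in the parser, the input can be validated separately before alignment. The following is a minimal sketch, not uLTRA's parser: the function name is hypothetical, and it assumes strict four-line FASTQ records (no wrapped sequence or quality lines).

```python
from itertools import zip_longest


def fastq_problems(handle):
    """Return (record_number, problem) pairs for a four-line-per-record FASTQ."""
    problems = []
    stripped = (line.rstrip("\r\n") for line in handle)
    # Group the stream into chunks of four lines; a short last chunk is padded with None.
    for number, record in enumerate(zip_longest(*[stripped] * 4), start=1):
        header, seq, plus, qual = record
        if qual is None:
            problems.append((number, "truncated record"))
        elif not header.startswith("@") or not plus.startswith("+"):
            problems.append((number, "unexpected header or separator line"))
        elif len(seq) != len(qual):
            problems.append((number, "sequence and quality lengths differ"))
    return problems


good = ["@r1", "ACGT", "+", "IIII"]
bad = ["@r2", "ACGT", "+", "III"]
print(fastq_problems(good + bad))
```

The function accepts any iterable of lines, so an open file handle can be passed directly; running it once up front costs one extra pass over the reads but leaves the fast parser untouched.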

Mapping with uLTRA without GTF?

Hi
I am trying to run uLTRA without an annotation GTF (since I don't have any annotation), but it seems uLTRA needs one and gives me an error. It fails both with and without the "--disable_mm2" option.

Command I used:

conda activate ultra
uLTRA align --prefix MySample --isoseq genome.fa reads.fastq output_dir/
conda deactivate

ERROR:

FileNotFoundError: [Errno 2] No such file or directory: 'output_dir/ref_part_sequences.pickle'

I know that uLTRA tries to build its indices within the output directory, but in this case it just complains. Any idea why?
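The error names a file that the align step expected to find in the output directory. A small pre-flight check like the one below makes that explicit before a long run. It is only a sketch: the function name is hypothetical, the single file name is taken from the error message, and the assumption that this file comes from an earlier annotation-based indexing step is not confirmed in this thread.

```python
from pathlib import Path


def missing_index_files(index_dir, expected=("ref_part_sequences.pickle",)):
    """Return the expected index files that are absent from index_dir."""
    return [name for name in expected if not (Path(index_dir) / name).is_file()]


missing = missing_index_files("output_dir/")
if missing:
    print("Index files not found:", ", ".join(missing))
```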

Please help
Thanks

-best
Rupesh
