mcfrith / last-rna

License: MIT License

Python 69.40% Shell 30.60%

last-rna's Issues

About the runtime

Hello, I'm using last-train to determine the rates of insertion, deletion, and substitution between my reads and the genome:

last-train -P12 -Q0 mydb 17HanZZ0034.fastq > myseq.par

Unfortunately, the program has been running for 70 hours and has not finished. I prepared the genome without repeat-masking, but even so the run time seems too long. Is this running time normal? If it is abnormal, what can I do to speed it up? By the way, my fastq file (17HanZZ0034.fastq) is 72 GB. Thanks a lot!
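
One possible way to shorten training, offered here as an assumption rather than a confirmed fix, is to estimate the parameters from a subset of the reads instead of the whole file. The sketch below copies the first N records of a FASTQ file. The function names are hypothetical, and it assumes that every record occupies exactly four lines and that the first reads are representative of the rest.

```python
from itertools import islice


def take_fastq_records(lines, max_records):
    """Return an iterator over the lines of the first max_records records.

    Assumes each FASTQ record is exactly four lines (no wrapped sequences).
    """
    return islice(lines, 4 * max_records)


def subsample_fastq(in_path, out_path, max_records):
    """Copy the first max_records FASTQ records from in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(take_fastq_records(src, max_records))
```

For example, `subsample_fastq("17HanZZ0034.fastq", "subset.fastq", 100000)` would write a smaller file that could be passed to last-train in place of the full one; the record count is an arbitrary illustration, not a recommended value.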

Align subreads file

Hi,

I am wondering whether LAST can work with subreads files from PacBio.

Thanks,
Wei

Using MinION reads with LAST

Hello all,

I am interested in comparing two canu assemblies using LAST. The reference assembly has 25 contigs and the query assembly has 21 contigs. The assemblies are for GC rich bacteria (same species but different strains, so I expect them to be somewhat similar).

I followed the instructions provided here: https://github.com/mcfrith/last-rna/blob/master/last-long-reads.md

However, the output I get seems to just be the comments:

# make an index using the WT strain
~/bin/lastdb -P40 -uNEAR -R01 WT_db reference.contigs.fasta

# align mutant against the WT strain
~/bin/lastal -P40 WT_db query.contigs.fasta
# LAST version 885
#
# a=21 b=9 A=21 B=9 e=60 d=0 x=59 y=44 z=59 D=1e+06 E=5e+11
# R=01 u=0 s=2 S=0 M=0 T=0 m=10 l=1 n=10 k=1 w=1000 t=4.36661 j=3 Q=0
# WT_db
# Reference sequences=25 normal letters=0
# lambda=0.228526 K=0.433378
#
#    A  C  G  T
# A  6 -18 -18 -18
# C -18  6 -18 -18
# G -18 -18  6 -18
# T -18 -18 -18  6
#
# Coordinates are 0-based.  For - strand matches, coordinates
# in the reverse complement of the 2nd sequence are used.
#
# name start alnSize strand seqSize alignment
#
# batch 0
# Query sequences=21

I also tried submitting these two assemblies to the web service but received the message "no alignments were found".

I am new to LAST. Could it be that my sequences are too similar? Is this an appropriate problem for LAST?

Thanks for any advice!
~Lina

last-train: error: no alignments

Hello,
I'm comparing environmental microbial samples to the ARG database, but one of the samples fails.
When I use the following command:
"last-train -P8 -Q0 mydb CH05.fa > reads.train"
it reports an error:
"lastal: can't calculate E-values." "To proceed without E-values, set a score threshold with option -e." "last-train: error: no alignments"
I don't know how to set option -e.
When I change the command to:
last-train -P48 --revsym --matsym --gapsym -E0.05 -C2 mydb CH05.fa > reads.train
it reports:
"last-train: error: no alignments"
Can you give me some suggestions? Thank you!

retaining sam headers maf to sam

Hi, this may be a samtools question; apologies if that is the case. I'm trying to convert the SAM file to BAM with samtools view, but it complains that there is no header.

head_100_myseq.sam.txt
head_100_myseq.maf.txt

This is what I did from the fastq file:
$ awk 'NR % 4 == 2 {print ">" ++n "\n" $0}' myseq.fastq > myseq.fa
$ last-train -P3 /Reference_Sequence/chromFa/chrX/mydb myseq.fa > myseq.par
$ lastal -P8 -p myseq.par /Volumes/GRT01_8TB/FMR1_runs/Reference_Sequence/chromFa/chrX/mydb myseq.fa
$ last-split -m1e-1 >myseq.maf
$ maf-convert sam myseq.maf >myseq.sam
$ samtools view -bS myseq.sam >unsorted_myseq.bam
[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 2
[main_samview] truncated file.
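
The samtools message indicates that the SAM file has no header lines. A minimal sketch of one workaround, assuming the reference FASTA that was used to build mydb is available: compute each sequence's length and emit an `@HD` line plus one `@SQ` line per sequence, to be prepended to the SAM records. The function name is hypothetical, and the sketch assumes that the sequence name is the first whitespace-delimited word of each FASTA header and that it matches the reference names in the SAM file.

```python
def sam_header_lines(fasta_lines):
    """Build SAM header lines (@HD plus one @SQ per sequence) from FASTA lines."""
    names = []
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # Assumption: the header is non-empty and its first word is the name.
            name = line[1:].split()[0]
            names.append(name)
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    header = ["@HD\tVN:1.6\tSO:unsorted"]
    header += [f"@SQ\tSN:{n}\tLN:{lengths[n]}" for n in names]
    return header
```

Writing these lines, followed by the contents of myseq.sam, to a new file should give samtools the header it asks for, provided the names agree.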

last-train error

Hi @mcfrith,
LAST is a really nice split aligner. Now I want to align some contigs (N50 = 60 kb) to a genetically close reference (4.8 Gb). last-rna may be a good choice. But when I follow the recipes, last-train always shows: last-train: error: [Errno 2] No such file or directory: '2'.
(The repeat-masked mydb and the simple read identifiers were created successfully with LAST version 979.)

Am I making a mistake? Can last-rna handle such a task? Or how about lastal?

Thanks
lipeng

Questions on last-train in the case of multiple samples

Hello!

I am analyzing nanopore human WGS data from multiple samples.

I noticed that the training results are slightly different between samples.

I also tried merging FASTA from multiple samples and running last-train with the merged FASTA.

This also gave a slightly different result compared to the results from individual samples.

Q1: Is it better to use the training results from the merged FASTA for every sample than to use each sample's own training results?

Q2: If the training result from the merged FASTA is the better option, is there a saturation point beyond which adding more samples no longer changes the training results significantly?

With regards
Jinyoung

error in running lastdb

Hello,

I am trying to run lastdb on the human reference genome hg38. When I run the command 'lastdb -P8 -uRY4 mydb genome.fa', I get the following error:
lastdb: can't open file: RY4
Could you please explain what the problem is?
Best,
Maryam

What is the difference in effect between repeat-masking and no repeat-masking on calling tandem repeats?

Hi, I'm using tandem-genotypes to find tandem repeats (TRs) in our human PacBio HiFi reads. I found a real TR that tandem-genotypes does not call. I used a repeat-masked genome to build the index with lastdb. The parameters for lastdb, last-train, and lastal are the same as the recipe suggests.

I wonder whether the repeat-masked genome harms the alignment so that many reads get a high mismap score. Could you give any suggestions for our problem? Also, how does calling tandem repeats differ between a repeat-masked and an unmasked genome?

Thanks.

sam/bam files describing spliced alignments by N in cigar

Thank you for sharing the recipe https://github.com/mcfrith/last-rna/blob/master/last-long-reads.md.

I've aligned long RNA reads to a genome and used maf-convert to obtain sam/bam files.
The sam/bam files contain each spliced segment of a read as a separate entry, but I want a bam file where each spliced alignment is a single entry, with the introns described by 'N' in the CIGAR string.
Is there any option (or future plan) to make maf-convert generate such sam/bam files?
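
Until such an option is confirmed, the merge could be done as a post-processing step. The following is a sketch under stated assumptions, not a maf-convert feature: it takes the segments of one read as `(ref_start, cigar)` pairs that lie on the same chromosome and strand, are sorted by reference position, and are colinear in the read. It drops the clipping between neighboring segments and joins them with an `N` operation spanning each reference gap. All names are hypothetical, every reference gap is labeled `N` regardless of whether it is really an intron, and joining the SEQ and QUAL fields is not handled.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference
CLIPS = set("SH")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def ref_length(ops):
    """Number of reference bases spanned by the given operations."""
    return sum(n for n, op in ops if op in REF_CONSUMING)


def merge_spliced(segments):
    """Merge a non-empty, sorted list of (ref_start, cigar) into one CIGAR."""
    merged = []
    prev_end = None
    last = len(segments) - 1
    for i, (start, cigar) in enumerate(segments):
        ops = parse_cigar(cigar)
        if i > 0:
            # Drop clipping that faces the previous segment.
            while ops and ops[0][1] in CLIPS:
                ops.pop(0)
            gap = start - prev_end
            if gap < 0:
                raise ValueError("segments overlap on the reference")
            if gap > 0:
                merged.append((gap, "N"))
        if i < last:
            # Drop clipping that faces the next segment.
            while ops and ops[-1][1] in CLIPS:
                ops.pop()
        merged.extend(ops)
        prev_end = start + ref_length(ops)
    return segments[0][0], "".join(f"{n}{op}" for n, op in merged)
```

For instance, two entries starting at 100 with `5H10M20H` and at 210 with `15H20M` would be merged into one alignment starting at 100 with `5H10M100N20M`.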

Prepare a genome with or without repeat-masking

Hi, I was trying to run tandem-genotypes to detect tandem repeats in my ONT data, but I have some questions about preparing a genome. I see there are two options in this step: prepare a genome with or without repeat-masking. If I care more about accuracy than running time, should I prepare a genome without repeat-masking? Or which option do you recommend? Thanks a lot.

How to visualize output from the long reads tutorial

Hi all,

I have MinION data for two closely related microbial strains. I assembled the two strains into two assemblies of four and five contigs.

I then followed your notes regarding using LAST to align long reads (https://github.com/mcfrith/last-rna/blob/master/last-long-reads.md)

Now I have a .maf file, but I can't seem to visualize it with either Geneious or IGV.

Do you have suggestions for how I can visualize the MAF file? I am used to dealing with VCF format, but I have not been able to convert the MAF file that LAST produces.
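
As a stopgap for inspecting the alignments, the coordinates can be pulled out of the MAF file directly. The sketch below is a minimal reader, not part of LAST: it assumes blocks are separated by blank lines or "a" lines and that each "s" line has the fields name, start, alnSize, strand, seqSize, alignment, which is the order LAST prints in its header comments. The function name is hypothetical.

```python
def maf_blocks(lines):
    """Yield, per MAF block, a list of (name, start, aln_size, strand, seq_size)."""
    block = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "s":
            name, start, size, strand, seq_size = fields[1:6]
            block.append((name, int(start), int(size), strand, int(seq_size)))
        elif (not fields or fields[0] == "a") and block:
            # A blank line or a new "a" line closes the current block.
            yield block
            block = []
    if block:
        yield block
```

Each yielded block could then be written as one row of a tab-delimited table (reference name, start, size, query name, start, size, strand) for plotting or for loading into a spreadsheet.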

Thanks for any advice!
~Lina

'std::bad_alloc'

Hi, I was trying to run tandem-genotypes on HG01891 PacBio long reads of chromosome 1. Following the website, I first aligned the long reads to the reference genome using LAST. I ran lastdb -P8 -uNEAR mydb genome.fa on my reference genome and then ran last-train -P8 -Q0 mydb myseq.fq > myseq.par to generate the .par file. These two steps finished successfully.

However, when I run lastal -P8 -p myseq.par mydb myseq.fq | last-split > myseq.maf, I get:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
terminate called recursively
terminate called recursively
terminate called recursively

This error is not really informative and I have no idea what is causing that. I would appreciate your help.