mcfrith / last-rna
License: MIT License
Hello, I'm using last-train to determine the rates of insertion, deletion, and substitutions between my reads and the genome:
last-train -P12 -Q0 mydb 17HanZZ0034.fastq > myseq.par
Unfortunately, the program has been running for 70 hours and has not finished. Although I chose to prepare the genome without repeat-masking, the run time seems too long. Is this running time normal? If it is abnormal, what can I do to speed it up? By the way, my FASTQ file (17HanZZ0034.fastq) is 72G in size. Thanks a lot!
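One approach that may help with an input this large, offered as a general suggestion rather than the maintainer's answer, is to train on a subset of the reads and reuse the resulting .par file when aligning the full set. The sketch below assumes a FASTQ file with exactly four lines per record; take_first_reads, subset.fastq and the figure of 100000 reads are hypothetical choices for illustration.

```shell
# take_first_reads FASTQ N
# Print the first N records of a FASTQ file.
# Assumes each record occupies exactly 4 lines (no wrapped sequences).
take_first_reads() {
    head -n "$(( $2 * 4 ))" "$1"
}

# Example use (hypothetical output name and read count):
#   take_first_reads 17HanZZ0034.fastq 100000 > subset.fastq
#   last-train -P12 -Q0 mydb subset.fastq > myseq.par
```

The first reads in a file are not necessarily representative of the whole run, so a random sample may be preferable if the file is ordered in some way.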
Hi,
I am just wondering whether LAST can work with subreads files from PacBio.
Thanks,
Wei
Hello all,
I am interested in comparing two canu assemblies using LAST. The reference assembly has 25 contigs and the query assembly has 21 contigs. The assemblies are for GC rich bacteria (same species but different strains, so I expect them to be somewhat similar).
I followed the instructions provided here: https://github.com/mcfrith/last-rna/blob/master/last-long-reads.md
However, the output I get seems to just be the comments:
# make an index using the WT strain
~/bin/lastdb -P40 -uNEAR -R01 WT_db reference.contigs.fasta
# align mutant against the WT strain
~/bin/lastal -P40 WT_db query.contigs.fasta
# LAST version 885
#
# a=21 b=9 A=21 B=9 e=60 d=0 x=59 y=44 z=59 D=1e+06 E=5e+11
# R=01 u=0 s=2 S=0 M=0 T=0 m=10 l=1 n=10 k=1 w=1000 t=4.36661 j=3 Q=0
# WT_db
# Reference sequences=25 normal letters=0
# lambda=0.228526 K=0.433378
#
# A C G T
# A 6 -18 -18 -18
# C -18 6 -18 -18
# G -18 -18 6 -18
# T -18 -18 -18 6
#
# Coordinates are 0-based. For - strand matches, coordinates
# in the reverse complement of the 2nd sequence are used.
#
# name start alnSize strand seqSize alignment
#
# batch 0
# Query sequences=21
I also tried submitting these two assemblies to the web service but received the message "no alignments were found".
I am new to LAST. Could it be that my sequences are too similar? Is this an appropriate problem for LAST?
Thanks for any advice!
~Lina
Hello,
I'm comparing environmental microbial samples to the ARG database, but one of the samples fails.
When I use the following command :
"last-train -P8 -Q0 mydb CH05.fa > reads.train"
It reported an error:
"lastal: can't calculate E-values." "To proceed without E-values, set a score threshold with option -e." "last-train: error: no alignments"
I don't know how to set option -e.
When I change the command to:
last-train -P48 --revsym --matsym --gapsym -E0.05 -C2 mydb CH05.fa > reads.train
It reported:
"last-train: error: no alignments"
Can you give me some suggestions? Thank you!
Hi, this may be a samtools question, apologies if that is the case. I'm trying to convert the SAM file to BAM with samtools view, but it complains that there is no header.
head_100_myseq.sam.txt
head_100_myseq.maf.txt
This is what I did from the fastq file:
$ awk 'NR % 4 == 2 {print ">" ++n "\n" $0}' myseq.fastq > myseq.fa
$ last-train -P3 /Reference_Sequence/chromFa/chrX/mydb myseq.fa > myseq.par
$ lastal -P8 -p myseq.par /Volumes/GRT01_8TB/FMR1_runs/Reference_Sequence/chromFa/chrX/mydb myseq.fa
$ last-split -m1e-1 >myseq.maf
$ maf-convert sam myseq.maf >myseq.sam
$ samtools view -bS myseq.sam >unsorted_myseq.bam
[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 2
[main_samview] truncated file.
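The error suggests that samtools expects @SQ header lines naming each reference sequence and its length. As a rough sketch, not necessarily the recommended LAST workflow, such lines could be generated from the reference FASTA and prepended to the SAM file. The function name sam_sq_header and the file name genome.fa (standing for whichever FASTA was used to build mydb) are hypothetical.

```shell
# sam_sq_header FASTA
# Print one SAM @SQ header line per sequence in a FASTA file.
# The sequence name is the text after ">" up to the first whitespace.
sam_sq_header() {
    awk '
        /^>/ {
            # Emit the previous sequence before starting a new one.
            if (name != "") printf "@SQ\tSN:%s\tLN:%d\n", name, len
            name = substr($1, 2)
            len = 0
            next
        }
        { len += length($0) }   # sequence lines may be wrapped
        END { if (name != "") printf "@SQ\tSN:%s\tLN:%d\n", name, len }
    ' "$1"
}

# Example use (hypothetical reference file name):
#   { sam_sq_header genome.fa; cat myseq.sam; } | samtools view -b - > unsorted_myseq.bam
```

The names and lengths must match the sequences that the reads were actually aligned to, so the FASTA passed in should be the same one given to lastdb.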
Hi @mcfrith,
LAST is a really nice split aligner. Now I want to align some contigs (N50 = 60 kb) to genetically close references (4.8 Gb). last-rna may be a good choice. But when I follow the recipes, last-train always shows:
last-train: error: [Errno 2] No such file or directory: '2'
(The repeat-masked mydb and the simple read identifiers were created successfully with LAST version 979.)
Am I making some error? Can last-rna handle such a task? Or how about lastal?
Thanks
lipeng
Hello!
I am analyzing nanopore human WGS data from multiple samples.
I noticed that the training results are slightly different between samples.
I also tried merging FASTA from multiple samples and running last-train with the merged FASTA.
This also gave a slightly different result compared to the results from individual samples.
Q1: Is it better to use the training results from the merged FASTA than to use each sample's own training results for that sample?
Q2: If training on the merged FASTA is the better option, is there a saturation point beyond which adding more samples no longer changes the training results significantly?
With regards
Jinyoung
Hello,
I am trying to run lastdb on the human reference genome hg38. When I run the command 'lastdb -P8 -uRY4 mydb genome.fa', I get the following error:
lastdb: can't open file: RY4
Could you please explain what the problem is?
Bests,
Maryam
Hi, I'm using tandem-genotypes to find tandem repeats (TR) in our human PacBio HiFi reads. I found a real TR that is not successfully called by tandem-genotypes. I used a repeat-masked genome to build the index with lastdb. The parameters of lastdb, last-train and lastal are the same as this recipe suggests.
I wonder whether the repeat-masked genome harms the alignment, so that many reads get a high mismap score. Could you give any suggestions on our problem? Also, how does calling tandem repeats differ between a repeat-masked genome and an unmasked one?
Thanks.
Thank you for sharing the recipe https://github.com/mcfrith/last-rna/blob/master/last-long-reads.md.
I've aligned long RNA reads to a genome, and used maf-convert to obtain sam/bam files.
The sam/bam files contain each spliced region of a read as a separate entry, but I want a bam file where spliced alignments are described by 'N' in the CIGAR (one entry per spliced alignment).
Is there any option (or future plan) to make maf-convert generate such sam/bam files?
Hi, I was trying to run tandem-genotypes to detect tandem repeats in my ONT data. But I have some questions about preparing a genome. I see there are two options in this step: prepare a genome with or without repeat-masking. If I care more about effectiveness and accuracy than running time, should I prepare a genome without repeat-masking? Or which option do you recommend? Thanks a lot.
Hi all,
I have MinION data for two closely related microbial strains. I assembled the two strains into two assemblies of four and five contigs.
I then followed your notes regarding using LAST to align long reads (https://github.com/mcfrith/last-rna/blob/master/last-long-reads.md)
Now I have a .maf file, but I can't seem to visualize it with either Geneious or IGV.
Do you have suggestions for how I can visualize the MAF file? I am used to dealing with VCF format, but I have not been able to convert the MAF file that LAST produces.
Thanks for any advice!
~Lina
Hi, I was trying to run tandem-genotypes on HG01891 PacBio long reads of chromosome 1. Based on the website, I first aligned the long reads to the reference genome using LAST. I ran
lastdb -P8 -uNEAR mydb genome.fa
on my reference genome and then ran
last-train -P8 -Q0 mydb myseq.fq > myseq.par
to generate the .par file. These two steps finished successfully.
However, when I run
lastal -P8 -p myseq.par mydb myseq.fq | last-split > myseq.maf
I get:
terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc terminate called recursively terminate called recursively terminate called recursively
This error is not really informative and I have no idea what is causing that. I would appreciate your help.