haplokit / vechat
Correcting errors in noisy long reads using variation graphs
License: GNU General Public License v3.0
Dear developers,
Just a quick question:
Can VeChat be run on GPU nodes?
Thank you for your time,
Virginia
When a reads file is compressed with gzip, VeChat splits it into chunks ("--split") without taking the compression into account, so the chunks are malformed.
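One way to avoid malformed chunks is to decompress on the fly and split on record boundaries rather than on raw bytes. The sketch below is not VeChat's implementation: `open_reads` and `split_fastq` are hypothetical helper names, and it assumes strict four-line FASTQ records.

```python
import gzip
import itertools


def open_reads(path):
    """Open a reads file as text, decompressing if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def split_fastq(path, records_per_chunk, prefix):
    """Write uncompressed FASTQ chunks of at most `records_per_chunk` records each.

    Assumes strict four-line FASTQ records (hypothetical helper, not VeChat code).
    """
    chunk_paths = []
    with open_reads(path) as fh:
        for index in itertools.count(1):
            # Read whole records only, so no chunk ends in the middle of a read.
            lines = list(itertools.islice(fh, 4 * records_per_chunk))
            if not lines:
                break
            chunk_path = f"{prefix}.{index}.fastq"
            with open(chunk_path, "w") as out:
                out.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Because the chunks are written uncompressed, downstream steps that expect plain text can read them directly.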
vechat no longer appears available through conda.
I managed to install from source but got the following errors with ONT reads:
vechat ONT.fastq.gz -t 8 --platform ont -o ONT.vechat.corrected.fa
[ 14%] Built target edlib
[ 42%] Built target spoa
[ 85%] Built target vechat_racon
[100%] Built target vechat_racon_exe
Performing the 1 iteration for error correction...
thread 'main' panicked at 'File is probably empty: FileTooShort', src/file.rs:30:14
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
perform variation graph based (haplotype-aware) error correction
[racon::Polisher::initialize] loaded target sequences 76.695651 s
[racon::Polisher::initialize] loaded sequences 70.578166 s
[racon::Polisher::initialize] error: empty overlap set!
Performing the 2 iteration for error correction...
thread 'main' panicked at 'File is probably empty: FileTooShort', src/file.rs:30:14
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
perform linear sequence based error correction
[racon::Polisher::initialize] error: empty target sequences set!
Hey, I have used your software and everything runs without errors. However, something strange happens with the result and I have not been able to understand what is going on. The reads I have are approximately 6 kb long, but after running vechat they are all reduced to 5.5 kb. I have tried modifying the parameters, but it makes no difference. Any idea what could be happening? Thanks in advance.
For large eukaryotic genomes, the file overlap.paf may be very large. I think VeChat can be optimized in two ways to deal with this:
First I tried to install using conda with 'conda install -c bioconda vechat'. conda complained that vechat was not found.
Then I tried to install from source. Running 'python ./scripts/vechat -h', I got:
gmake: Makefile: No such file or directory
gmake: *** No rule to make target `Makefile'. Stop.
Any help is greatly appreciated.
Is this repo being maintained? vechat is no longer available on conda
Error message:
$ vechat <input01>.fq -t 48 --platform ont -o <input01>.vechatCorrected.fq
Performing the 1 iteration for error correction...
Killed
perform variation graph based (haplotype-aware) error correction
[racon::Polisher::initialize] loaded target sequences 0.293457 s
[racon::Polisher::initialize] loaded sequences 0.275359 s
[racon::Polisher::initialize] error: empty overlap set!
Performing the 2 iteration for error correction...
perform linear sequence based error correction
[racon::Polisher::initialize] error: empty target sequences set!
Context:
I tried to error-correct a set of ONT reads using vechat v1.1.1 installed via bioconda and ended up with the error message above. I tried using vechat with two different sets of ONT input data, let's call them input01 and input19 here. In both cases I ended up with the error message above.
input | protocol | mean read coverage | file size |
---|---|---|---|
input01 | amplicon seq | 10059 | 95 MB |
input19 | WGS | 4365 | 276 MB |
Hence, data is present and plentiful.
Question:
Why do I end up with the error message given the vechat command above? Should I adjust some parameters?
Additional:
I don't expect the files to be corrupted. Tools like canu and mosdepth work fine on the same files. Also, I already retried using vechat with different thread numbers (-t 1, 16, 48).
Related issues:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
After running the command "<longread_file>.fastq.gz --split -t 30 --platform ont -o /path/.fasta",
the following error is shown: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Dear developers,
I am trying to figure out how to set the options of the vechat_hpc.sh script to correct large ONT long-read fastq files (~30 Gb).
Could you please provide some additional information on using this script on an HPC cluster with ONT reads? Is that possible?
Thank you so much for your time!
Virginia
Hello. I am a beginner with vechat.
I installed vechat to reduce errors in my PacBio CLR reads.
First, I installed all of the following dependencies.
Then I installed from source with git clone https://github.com/HaploKit/vechat.git, since installation using conda didn't work.
During the build I got the message "make[1]: warning: Clock skew detected. Your build may be incomplete."
After that I ran the script below.
I got the error "sh: fpa: command not found", and the failed run finished around 4 hours later.
My script is here:
#!/bin/bash
#SBATCH --job-name=vechat
#SBATCH --partition=largemem
#SBATCH --mem=300G
#SBATCH --cpus-per-task=30
#SBATCH --mail-user=xxxx
#SBATCH --mail-type=BEGIN,FAIL,END
#SBATCH -e err_vechat.%A_%a
vechat ./Sgig_genome.fa -t 8 --platform pb --split -o reads.corrected_sgigorg.fa
The error message I have obtained from vechat output is
sh: fpa: command not found
[racon::Polisher::initialize] loaded target sequences 46.690481 s
[racon::Polisher::initialize] error: empty sequences set!
These same messages continue until the end of the file.
My out file (reads.corrected_sgigorg.fa) was empty.
Is this caused by a problem with my vechat installation, or is something wrong with my environment?
PS: Sorry, I think it is working now. I reinstalled fpa with cargo and changed the path to ~/bin for the vechat command.
However, I am still worried about the make[1] warning that I got during installation.
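The "sh: fpa: command not found" message means the shell could not find fpa on PATH when vechat invoked it. A small pre-flight check like the one below can catch this before a multi-hour job starts. It is only a sketch: `missing_tools` is a hypothetical helper, and the tool list is an assumption (only fpa is named in this report), so extend it with the dependencies listed in the README.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed list: only fpa is named in the report above.
missing = missing_tools(["fpa"])
print("missing from PATH:", ", ".join(missing) if missing else "none")
```

Running this inside the same batch script, just before the vechat command, checks the environment the job actually sees rather than the login shell.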
Hi,
I tried to run vechat as instructed on the GitHub page (all required dependencies are installed):
python ./scripts/vechat -h
and got an error:
File "vechat", line 215
subprocess.check_call(['cmake', *cmake_cfg_args], stdout=sys.stdout, stderr=sys.stderr)
^
SyntaxError: invalid syntax
Could you help me figure out what is going on, please?
Thanks
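Unpacking with `*` inside a list literal was added in Python 3.5 (PEP 448), so this SyntaxError is what an older interpreter reports for that line, for example when `python` resolves to Python 2. The sketch below shows the syntax next to an equivalent form that older interpreters also parse; the values in `cmake_cfg_args` are made up for illustration.

```python
import sys

# Hypothetical argument list; the real values come from the vechat script.
cmake_cfg_args = ["-DCMAKE_BUILD_TYPE=Release", ".."]

# Python >= 3.5 (PEP 448): unpacking inside a list display.
new_style = ["cmake", *cmake_cfg_args]

# Equivalent form that also parses on older interpreters.
old_style = ["cmake"] + list(cmake_cfg_args)

assert new_style == old_style
print("running under Python %d.%d" % sys.version_info[:2])
```

Checking `python --version` shows which interpreter runs the script; if it is older than 3.5, invoking the script with `python3` may be enough.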
vechat gm.ont.fastq.gz --platform ont --split -t 24 -o gm.ont.cor.fa
Performing the 1 iteration for error correction...
processing chunk 1...
#after a while......
Traceback (most recent call last):
File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 303, in <module>
run_error_correction(
File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 48, in run_error_correction
sub_query_sequences = extract_sub_sequences(sequences,overlap,chunk_target_sequence)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 76, in extract_sub_sequences
mode=fq_or_fa(chunk_target_sequence)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 152, in fq_or_fa
s = fr.readline()[0]
^^^^^^^^^^^^^
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
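Byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), so the traceback is consistent with `fq_or_fa` reading the compressed file as plain text. A minimal sketch of a gzip-aware format check follows. It reuses the function name from the traceback, but it is an assumption about how such a check could look, not the project's actual code, and the return values "fastq" and "fasta" are chosen for illustration.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def fq_or_fa(path):
    """Guess FASTQ vs FASTA from the first character, for plain or gzipped input."""
    with open(path, "rb") as raw:
        compressed = raw.read(2) == GZIP_MAGIC
    opener = gzip.open if compressed else open
    with opener(path, "rt") as fr:
        first = fr.readline()[:1]  # slicing avoids IndexError on an empty file
    if first == "@":
        return "fastq"
    if first == ">":
        return "fasta"
    raise ValueError(f"{path}: not a FASTA/FASTQ file (first character {first!r})")
```

Checking the magic bytes rather than the ".gz" extension also covers compressed files that were renamed.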
Can you release a Docker container or fix the Conda installation? I am having trouble getting all the required dependencies.
Old versions of "split" lack the "--additional-suffix" option, which makes VeChat terminate with an error. I don't know in which version this option was introduced, but it was definitely absent in 8.4 and present in 8.30. It may be worth stating the minimum version of "split" in the "Installation and dependencies" paragraph, or changing the VeChat source code so that it works with older versions of "split".
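If older "split" versions need to be supported, one option is to run "split" without "--additional-suffix" and rename the output files afterwards. The sketch below shows only the rename step in Python; `add_suffix` is a hypothetical helper, and the prefix and suffix passed to it are placeholders, not the values VeChat uses.

```python
import glob
import os


def add_suffix(prefix, suffix):
    """Append `suffix` to every file whose name starts with `prefix`.

    Emulates `split --additional-suffix=SUFFIX` for files already written
    by a plain `split ... PREFIX` call. Returns the new paths, sorted.
    """
    renamed = []
    for path in sorted(glob.glob(glob.escape(prefix) + "*")):
        if path.endswith(suffix):
            continue  # already renamed, e.g. on a re-run
        os.rename(path, path + suffix)
        renamed.append(path + suffix)
    return renamed
```

For example, after `split -l 4000 reads.fa chunk_`, calling `add_suffix("chunk_", ".fa")` would turn `chunk_aa` into `chunk_aa.fa`.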