haplokit / vechat

Correcting errors in noisy long reads using variation graphs

License: GNU General Public License v3.0

CMake 3.56% Meson 0.37% Python 20.16% C++ 67.75% Shell 8.18%

vechat's People

vikash84 jelber2 schaudge ningshuang-yao mkyriak

vechat's Issues

VeChat and GPU

Dear developers,

Just a quick question:

Can VeChat be run on GPU nodes?

Thank you for your time,
Virginia

Wrong splitting for compressed files

When a file with reads is compressed with gzip, VeChat splits it into chunks (with "--split") without taking the compression into account, so the chunks end up malformed.
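One possible way to avoid the problem described above is to decompress on the fly and split on record boundaries. The following is only a sketch, not VeChat's actual code: the function names `open_reads` and `split_fastq`, the chunk file naming, and the `records_per_chunk` parameter are all hypothetical, and it assumes four-line FASTQ records.

```python
import gzip
import itertools
import os


def open_reads(path):
    """Open a reads file as text, decompressing it if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def split_fastq(path, out_dir, records_per_chunk=100000):
    """Write uncompressed FASTQ chunks into out_dir and return their paths.

    Assumption: every record spans exactly four lines (no wrapped sequences).
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    with open_reads(path) as reads:
        while True:
            # Take whole records only, so that no chunk ends mid-record.
            lines = list(itertools.islice(reads, 4 * records_per_chunk))
            if not lines:
                break
            name = "chunk_%d.fastq" % (len(chunk_paths) + 1)  # hypothetical naming
            chunk_path = os.path.join(out_dir, name)
            with open(chunk_path, "w") as out:
                out.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Because the chunks are cut from the decompressed text stream, each one is a valid FASTQ file on its own, whether or not the input was gzipped.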

errors with ONT

vechat no longer appears available through conda.

I managed to install from source but got the following errors with ONT reads:
vechat ONT.fastq.gz -t 8 --platform ont -o ONT.vechat.corrected.fa

[ 14%] Built target edlib
[ 42%] Built target spoa
[ 85%] Built target vechat_racon
[100%] Built target vechat_racon_exe
Performing the 1 iteration for error correction...
thread 'main' panicked at 'File is probably empty: FileTooShort', src/file.rs:30:14
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
perform variation graph based (haplotype-aware) error correction
[racon::Polisher::initialize] loaded target sequences 76.695651 s
[racon::Polisher::initialize] loaded sequences 70.578166 s
[racon::Polisher::initialize] error: empty overlap set!
Performing the 2 iteration for error correction...
thread 'main' panicked at 'File is probably empty: FileTooShort', src/file.rs:30:14
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
perform linear sequence based error correction
[racon::Polisher::initialize] error: empty target sequences set!

reducing length reads

Hey, I have used your software and everything works perfectly. However, something strange happens with the result, and I have not been able to understand what is going on. My reads are approximately 6 kb long, but after running vechat they are all reduced to 5.5 kb. I have tried modifying the parameters, but it makes no difference. Any idea what could be happening? Thanks in advance.

Two ideas on optimization

For large eukaryotic genomes, the file overlap.paf may be very large. I think VeChat can be optimized in two ways to deal with this:

1. Instead of writing overlap.paf, it can write overlap.paf.gz. This can be achieved by compressing the output of fpa with " | gzip -1 >". Racon can take gzipped overlap files as input.
2. It is probably worth adding a parameter that sets the minimum overlap length. If the reads' N50 is, for example, 20 kbp, the minimum overlap can safely be raised from the default 500 bp to, for example, 5000 bp. This would not only decrease the size of the PAF file but probably also speed up error correction by skipping short overlaps.
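The two ideas above can be sketched together in Python. This is an illustration rather than a patch to VeChat: the function names and the `min_overlap` parameter are hypothetical, and it assumes that "overlap length" can be approximated by the alignment block length stored in column 11 of a standard 12-column PAF record.

```python
import gzip


def filter_paf(lines, min_overlap=500):
    """Yield PAF lines whose alignment block length (column 11) is at least min_overlap."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip anything that is not a complete PAF record
        if int(fields[10]) >= min_overlap:
            yield line


def write_filtered_paf_gz(paf_in, paf_gz_out, min_overlap=500):
    """Filter a plain-text PAF file and write the result gzip-compressed at level 1."""
    with open(paf_in) as src, gzip.open(paf_gz_out, "wt", compresslevel=1) as dst:
        dst.writelines(filter_paf(src, min_overlap))
```

Compression level 1 mirrors the suggested "gzip -1": it trades some compression ratio for speed, which matters when the overlap file is very large.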

Cannot install

First I tried to install using conda with 'conda install -c bioconda vechat', but conda reported that vechat could not be found.

Then I tried to install from source. Running 'python ./scripts/vechat -h' gave:

gmake: Makefile: No such file or directory
gmake: *** No rule to make target `Makefile'. Stop.

Any help is greatly appreciated.

no longer available on conda

Is this repo being maintained? vechat is no longer available on conda

Error: Empty overlap set / empty target sequence set

Error message:

$ vechat <input01>.fq -t 48 --platform ont -o <input01>.vechatCorrected.fq
Performing the 1 iteration for error correction...
Killed
perform variation graph based (haplotype-aware) error correction
[racon::Polisher::initialize] loaded target sequences 0.293457 s
[racon::Polisher::initialize] loaded sequences 0.275359 s
[racon::Polisher::initialize] error: empty overlap set!
Performing the 2 iteration for error correction...
perform linear sequence based error correction
[racon::Polisher::initialize] error: empty target sequences set!

Context:
I tried to error-correct a set of ONT reads using vechat v1.1.1 installed via bioconda and ended up with the error message above. I tried using vechat with two different sets of ONT input data, let's call them input01 and input19 here. In both cases I ended up with the error message above.

input	protocol	mean read coverage	file size
input01	amplicon seq	10059	95 MB
input19	WGS	4365	276 MB

Hence, data is present and plenty.

Question:
Why do I end up with the error message given the vechat command above? Should I adjust some parameters?

Additional:
I don't expect the files to be corrupted. Tools like canu and mosdepth work fine on the same files. Also, I already retried using vechat with different thread numbers (-t 1, 16, 48).

Related issues:

Possibly related to #15 .
Possibly related to #3 . My worker instance got 16 GB main memory assigned. I presume that should be enough given the input file sizes?

While working with ONT data

Running the command "<longread_file>.fastq.gz --split -t 30 --platform ont -o /path/.fasta" fails with the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

vechat_hpc.sh with ont long reads

Dear developers,

I am trying to work out how to set the options of the vechat_hpc.sh script to correct large ONT long-read FASTQ files (~30 Gb).
Could you please provide some additional information on using this script on an HPC cluster with ONT reads? Is that possible?

Thank you so much for your time!

Virginia

Error- sh: fpa: command not found

Hello. I am a beginner with "vechat".
I have installed vechat to reduce errors in my CLR reads from PacBio.

First, I installed all the required dependencies.
Then I installed from source using git clone https://github.com/HaploKit/vechat.git, since installation using conda didn't work.
During the build I got the message "make[1]: warning: Clock skew detected. Your build may be incomplete."

After that I ran the script below.
I got the error "sh: fpa: command not found", and the run finished with errors around 4 hours later.

My script is here:
#!/bin/bash
#SBATCH --job-name=vechat
#SBATCH --partition=largemem
#SBATCH --mem=300G
#SBATCH --cpus-per-task=30
#SBATCH --mail-user=xxxx
#SBATCH --mail-type=BEGIN,FAIL,END
#SBATCH -e err_vechat.%A_%a

vechat ./Sgig_genome.fa -t 8 --platform pb --split -o reads.corrected_sgigorg.fa

The error message I obtained from the vechat output is:
sh: fpa: command not found
[racon::Polisher::initialize] loaded target sequences 46.690481 s
[racon::Polisher::initialize] error: empty sequences set!
These same messages continue until the end of the file.

My output file (reads.corrected_sgigorg.fa) was empty.

Is this caused by a problem with my vechat installation, or is something wrong with my environment?

PS: Sorry, I think it's working now. I reinstalled fpa with cargo and changed the path for the vechat command to ~/bin.
However, I am still worried about the make[1] warning that I got during installation.
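A failure like "sh: fpa: command not found" only shows up once the pipeline is already running, so a quick check before submitting a long job can save hours. The sketch below is an assumption-laden illustration: `missing_tools` is a hypothetical helper, and the list of tools to check (here only fpa, the one named in the error above) would need to be adapted to the actual installation.

```python
import shutil


def missing_tools(tools):
    """Return the names from `tools` that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # "fpa" is the executable named in the error message; extend the list as needed.
    missing = missing_tools(["fpa"])
    if missing:
        raise SystemExit("Not found on PATH: " + ", ".join(missing))
    print("All required tools were found.")
```

Running this inside the same environment as the batch job (for example at the top of the SLURM script) checks the PATH that the job will actually see.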

"invalid syntax" error returned by running vechat

Hi,

I tried to run vechat as instructed on the GitHub page (all required dependencies are installed):

python ./scripts/vechat -h

and got an error:

File "vechat", line 215
    subprocess.check_call(['cmake', *cmake_cfg_args], stdout=sys.stdout, stderr=sys.stderr)
                                    ^
SyntaxError: invalid syntax

Could you please help me figure out what is going on?

Thanks
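The caret in the error above points at `*cmake_cfg_args`. Unpacking with `*` inside a list literal was added in Python 3.5 (PEP 448), so this SyntaxError is consistent with the script being run by an older interpreter, for example when `python` resolves to Python 2. The sketch below shows an equivalent construction that parses on older interpreters, plus a version check. Both function names are hypothetical and not part of vechat.

```python
import sys


def build_cmake_command(cmake_cfg_args):
    """Return the cmake argument list without star-unpacking inside a list literal."""
    return ["cmake"] + list(cmake_cfg_args)


def supports_list_unpacking(version_info=sys.version_info, minimum=(3, 5)):
    """Return True if the interpreter is new enough for `[x, *rest]` syntax (PEP 448)."""
    return tuple(version_info[:2]) >= minimum
```

In practice, checking the output of `python --version` and running the script with a Python 3 interpreter is probably the simpler fix.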

error

vechat gm.ont.fastq.gz --platform ont --split -t 24 -o gm.ont.cor.fa

Performing the 1 iteration for error correction...
processing chunk 1...

#after a while......

Traceback (most recent call last):
File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 303, in
run_error_correction(
File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 48, in run_error_correction
sub_query_sequences = extract_sub_sequences(sequences,overlap,chunk_target_sequence)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 76, in extract_sub_sequences
mode=fq_or_fa(chunk_target_sequence)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 152, in fq_or_fa
s = fr.readline()[0]
^^^^^^^^^^^^^
File "", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
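Byte 0x8b in position 1 is the second byte of the gzip magic number (0x1f 0x8b), which suggests that a gzip-compressed file is being read as plain UTF-8 text. A gzip-aware version of the format check could look like the sketch below. It reuses the name `fq_or_fa` from the traceback, but the body and the return values are assumptions, not the code that ships with vechat.

```python
import gzip


def open_text(path):
    """Open a file as text, transparently decompressing gzip input."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fq_or_fa(path):
    """Return 'fastq' or 'fasta' based on the first character of the file."""
    with open_text(path) as fr:
        first = fr.readline()[:1]  # slicing avoids an IndexError on empty files
    if first == "@":
        return "fastq"
    if first == ">":
        return "fasta"
    raise ValueError("Unrecognized or empty sequence file: %s" % path)
```

Until something like this is in place, decompressing the reads before running vechat with "--split" is a possible workaround.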

Request: Docker Container or Conda (TODO)

Can you release a Docker container or fix the installation through Conda? I am having trouble getting all the required dependencies.

VeChat doesn't work with old versions of "split"

Old versions of "split" don't have the option "--additional-suffix", which makes VeChat terminate with an error. I don't know in which version this option was introduced, but it was definitely absent in 8.4 and present in 8.30. It may be worth stating the minimum version of "split" in the "Installation and dependencies" section, or changing the VeChat source code so that it works with older versions of "split".
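Since the exact version that introduced the option is not known here, one alternative to a version check is to probe the installed "split" for the feature itself. The sketch below assumes that "split --help" lists the option when it is supported. Both function names are hypothetical.

```python
import subprocess


def help_mentions_option(help_text, option="--additional-suffix"):
    """Return True if the given help text mentions the option."""
    return option in help_text


def split_supports_additional_suffix(split_exe="split"):
    """Run `split --help` and report whether --additional-suffix is listed."""
    try:
        result = subprocess.run(
            [split_exe, "--help"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        return False  # the executable is missing or cannot be run
    return help_mentions_option(result.stdout)
```

A caller could use the result either to fail early with a clear message or to fall back to renaming the chunk files after splitting.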