vechat's People

Contributors

haplokit, jelber2, tbrekalo

vechat's Issues

VeChat and GPU

Dear developers,

Just a simple question:

Can VeChat be executed on GPU nodes?

Thank you for your time,
Virginia

Wrong splitting for compressed files

When a file with reads is compressed with gzip, VeChat splits it into chunks ("--split") without taking into account that it is compressed. As a result, the chunks are malformed.
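One way to avoid this, sketched below, is to decompress while reading and cut only on record boundaries. This is not VeChat's code: the function names `open_maybe_gzip` and `split_fastq` and the chunk naming scheme are hypothetical, and the sketch assumes four-line FASTQ records (no wrapped sequence lines).

```python
import gzip
import itertools


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file, detected by the gzip magic bytes."""
    with open(path, "rb") as raw:
        magic = raw.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def split_fastq(path, records_per_chunk, prefix):
    """Write chunks made of whole FASTQ records and return the chunk paths.

    Assumes every record is exactly 4 lines long.
    """
    chunk_paths = []
    with open_maybe_gzip(path) as reads:
        for index in itertools.count():
            # Take the next block of complete records from the decompressed stream.
            lines = list(itertools.islice(reads, 4 * records_per_chunk))
            if not lines:
                break
            chunk_path = f"{prefix}.{index}.fastq"
            with open(chunk_path, "wt") as out:
                out.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Because the chunks are written uncompressed, downstream steps that expect plain text can read them directly.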

errors with ONT

vechat no longer appears available through conda.

I managed to install from source but got the following errors with ONT reads:
vechat ONT.fastq.gz -t 8 --platform ont -o ONT.vechat.corrected.fa

[ 14%] Built target edlib
[ 42%] Built target spoa
[ 85%] Built target vechat_racon
[100%] Built target vechat_racon_exe
Performing the 1 iteration for error correction...
thread 'main' panicked at 'File is probably empty: FileTooShort', src/file.rs:30:14
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
perform variation graph based (haplotype-aware) error correction
[racon::Polisher::initialize] loaded target sequences 76.695651 s
[racon::Polisher::initialize] loaded sequences 70.578166 s
[racon::Polisher::initialize] error: empty overlap set!
Performing the 2 iteration for error correction...
thread 'main' panicked at 'File is probably empty: FileTooShort', src/file.rs:30:14
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
perform linear sequence based error correction
[racon::Polisher::initialize] error: empty target sequences set!

reducing length reads

Hey, I have used your software and everything runs without errors. However, something strange happens with the result, and I have not been able to understand what is going on. My reads are approximately 6 kb long, but after running vechat they are all shortened to about 5.5 kb. I have tried modifying the parameters, but that doesn't help. Any idea what could be happening? Thanks in advance.

Two ideas on optimization

For large eukaryotic genomes, the file overlap.paf may be very large. I think VeChat can be optimized in two ways to deal with this:

  1. Instead of writing overlap.paf, it could write overlap.paf.gz. This can be achieved by compressing the output of fpa with " | gzip -1 >". Racon can take gzipped overlap files as input.
  2. It's probably worth adding a parameter that sets the minimum overlap length. If the reads' N50 is, for example, 20 kbp, the minimum overlap can safely be raised from the default 500 bp to, for example, 5000 bp. This would not only decrease the size of the paf file, but would probably also speed up error correction by skipping short overlaps.
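To illustrate both ideas together, here is a minimal sketch that filters PAF records by overlap length and writes the result gzip-compressed at level 1. It is a stand-alone illustration, not part of VeChat or fpa: the function names and the 5000 bp default are assumptions, and "overlap length" is taken here to be the shorter of the query and target spans (PAF columns 3-4 and 8-9).

```python
import gzip


def filter_paf(lines, min_overlap):
    """Yield PAF lines whose query and target spans are both >= min_overlap."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip lines that lack the 12 mandatory PAF columns
        query_span = int(fields[3]) - int(fields[2])
        target_span = int(fields[8]) - int(fields[7])
        if min(query_span, target_span) >= min_overlap:
            yield line


def write_filtered_paf(in_path, out_path, min_overlap=5000):
    """Filter a plain-text PAF file and write it gzip-compressed (fast level 1)."""
    with open(in_path, "rt") as src, gzip.open(out_path, "wt", compresslevel=1) as dst:
        dst.writelines(filter_paf(src, min_overlap))
```

For example, `write_filtered_paf("overlap.paf", "overlap.paf.gz")` would keep only overlaps of at least 5000 bp on both reads.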

Cannot install

First I tried to install using conda with 'conda install -c bioconda vechat'. conda complained that vechat was not found.

Then I tried to install from source. With this: 'python ./scripts/vechat -h', I got:

gmake: Makefile: No such file or directory
gmake: *** No rule to make target `Makefile'. Stop.

Any help is greatly appreciated.

Error: Empty overlap set / empty target sequence set

Error message:

$ vechat <input01>.fq -t 48 --platform ont -o <input01>.vechatCorrected.fq
Performing the 1 iteration for error correction...
Killed
perform variation graph based (haplotype-aware) error correction
[racon::Polisher::initialize] loaded target sequences 0.293457 s
[racon::Polisher::initialize] loaded sequences 0.275359 s
[racon::Polisher::initialize] error: empty overlap set!
Performing the 2 iteration for error correction...
perform linear sequence based error correction
[racon::Polisher::initialize] error: empty target sequences set!

Context:
I tried to error-correct a set of ONT reads using vechat v1.1.1 installed via bioconda and ended up with the error message above. I tried using vechat with two different sets of ONT input data, let's call them input01 and input19 here. In both cases I ended up with the error message above.

input    protocol      mean read coverage  file size
input01  amplicon seq  10059               95 MB
input19  WGS           4365                276 MB

Hence, data is present and plenty.

Question:
Why do I end up with the error message given the vechat command above? Should I adjust some parameters?

Additional:
I don't expect the files to be corrupted. Tools like canu and mosdepth work fine on the same files. Also, I already retried using vechat with different thread numbers (-t 1, 16, 48).

Related issues:

  • Possibly related to #15.
  • Possibly related to #3. My worker instance has 16 GB of main memory assigned. I presume that should be enough given the input file sizes?

While working with ONT data

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

After running the command "<longread_file>.fastq.gz --split -t 30 --platform ont -o /path/.fasta",
the following error is shown: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

vechat_hpc.sh with ont long reads

Dear developers,

I am trying to figure out how to set the options of the vechat_hpc.sh script to correct large ONT long-read fastq files (~30 Gb).
Is it possible to use this script on an HPC system with ONT reads? If so, could you please provide some additional information on how to do it?

Thank you so much for your time!

Virginia

Error- sh: fpa: command not found

Hello. I am a beginner with vechat.
I have installed vechat to reduce errors in my CLR reads from PacBio.

First, I installed all of the required dependencies.
Then I installed from source with git clone https://github.com/HaploKit/vechat.git, since installation using conda didn't work.
During the build I got the message "make[1]: warning: Clock skew detected. Your build may be incomplete."

After that I ran the script below.
I got the error "sh: fpa: command not found", and the failed run finished about 4 hours later.

My script is here
#!/bin/bash
#SBATCH --job-name=vechat
#SBATCH --partition=largemem
#SBATCH --mem=300G
#SBATCH --cpus-per-task=30
#SBATCH --mail-user=xxxx
#SBATCH --mail-type=BEGIN,FAIL,END
#SBATCH -e err_vechat.%A_%a

vechat ./Sgig_genome.fa -t 8 --platform pb --split -o reads.corrected_sgigorg.fa

The error message I have obtained from vechat output is
sh: fpa: command not found
[racon::Polisher::initialize] loaded target sequences 46.690481 s
[racon::Polisher::initialize] error: empty sequences set!
These same messages continue until the end of the file.

My output file (reads.corrected_sgigorg.fa) was empty.

Is this caused by a problem with my vechat installation, or is something wrong with my environment?

PS: Sorry, I think that it's working now. I reinstalled fpa with cargo and changed the path for the vechat command to ~/bin.
However, I am still worried about the make[1] warning that I got during installation.

"invalid syntax" error returned by running vechat

Hi,

I tried to run vechat as instructed on the GitHub page (all dependencies are installed):

python ./scripts/vechat -h

and got an error:

File "vechat", line 215
    subprocess.check_call(['cmake', *cmake_cfg_args], stdout=sys.stdout, stderr=sys.stderr)
                                    ^
SyntaxError: invalid syntax

Could you please figure out what is going on?

Thanks
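A likely cause, though not confirmed in this thread: the `[..., *args]` unpacking inside a list literal is only valid from Python 3.5 onward, so a `python` command that resolves to Python 2 (or an older Python 3) rejects the file before running any of it. Running the script with `python3` is the simplest workaround. The sketch below shows an equivalent way to build the same argument list that also parses on older interpreters; `build_cmake_command` and `run_cmake` are hypothetical helper names, not functions from the vechat script.

```python
import subprocess
import sys


def build_cmake_command(cmake_cfg_args):
    """Return ['cmake', arg1, arg2, ...] without using [*args] unpacking."""
    return ["cmake"] + list(cmake_cfg_args)


def run_cmake(cmake_cfg_args):
    """Run cmake with the given arguments, forwarding its output."""
    subprocess.check_call(
        build_cmake_command(cmake_cfg_args),
        stdout=sys.stdout,
        stderr=sys.stderr,
    )
```

Note that a runtime version check placed in the same file would not help here, because a SyntaxError is raised while the file is being parsed.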

error

vechat gm.ont.fastq.gz --platform ont --split -t 24 -o gm.ont.cor.fa

Performing the 1 iteration for error correction...
processing chunk 1...

#after a while......

Traceback (most recent call last):
  File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 303, in <module>
    run_error_correction(
  File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 48, in run_error_correction
    sub_query_sequences = extract_sub_sequences(sequences,overlap,chunk_target_sequence)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 76, in extract_sub_sequences
    mode=fq_or_fa(chunk_target_sequence)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yfy/miniconda3/envs/assembly/bin/vechat", line 152, in fq_or_fa
    s = fr.readline()[0]
        ^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
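The byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), which suggests that a gzip-compressed file is being read as plain text. A minimal sketch of a format check that reads gzip input transparently is shown below. It reuses the name `fq_or_fa` from the traceback, but the body and the return values "fastq" and "fasta" are assumptions, not the actual VeChat implementation.

```python
import gzip


def fq_or_fa(path):
    """Return 'fastq' or 'fasta' based on the first character of the file.

    Gzip-compressed input is detected by its magic bytes and decompressed
    on the fly instead of being decoded as UTF-8 text.
    """
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt") as handle:
        first = handle.readline()[:1]
    if first == "@":
        return "fastq"
    if first == ">":
        return "fasta"
    raise ValueError("cannot tell FASTA from FASTQ: {!r}".format(path))
```

Until something like this is in place, decompressing the input before running vechat with "--split" should avoid the error.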

VeChat doesn't work with old versions of "split"

Old versions of "split" don't have the option "--additional-suffix", which makes VeChat terminate with an error. I don't know in which version this option was introduced, but it was definitely absent in 8.4 and present in 8.30. It may be worth either stating the minimum version of "split" in the "Installation and dependencies" paragraph or changing the source code of VeChat so that it works with older versions of "split".
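Since the exact version that introduced the option is unknown, one alternative to a version check is to probe the installed "split" directly. The sketch below does this by searching the output of "split --help" for the option name; the helper names are hypothetical and this is not something VeChat currently does.

```python
import subprocess


def help_mentions(help_text, option="--additional-suffix"):
    """Return True if the given help text documents the option."""
    return option in help_text


def split_supports_additional_suffix():
    """Probe the installed split by inspecting the output of `split --help`."""
    try:
        result = subprocess.run(
            ["split", "--help"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return False  # split is not installed at all
    return help_mentions(result.stdout)
```

If the probe returns False, a caller could run "split" without the option and rename the resulting chunk files afterwards.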
