MECAT: an ultra-fast mapping, error correction and de novo assembly tool for single-molecule sequencing reads

mecat's Introduction

We have released a new version, MECAT2. Please download the new version; this version will no longer be updated.

Introduction
Installation
Quick Start
Input Format
Program Descriptions
Citation
Contact
Update Information

Introduction

MECAT is an ultra-fast Mapping, Error Correction and de novo Assembly Tool for single-molecule sequencing (SMRT) reads. MECAT employs novel alignment and error correction algorithms that are much more efficient than state-of-the-art aligners and error correction tools. MECAT can be used to assemble large genomes de novo. For example, on a 32-thread computer with a 2.0 GHz CPU, MECAT takes 9.5 days to assemble a human genome from 54x SMRT data, which is 40 times faster than the current PBcR-MHAP pipeline. We also used MECAT to assemble a diploid human genome from 102x SMRT data in only 25 days. The latter assembly greatly improves on the quality of the previous genome assembled from the 54x haploid SMRT data. MECAT's performance was compared with the PBcR-MHAP pipeline, FALCON and Canu (v1.3) on five real datasets. The quality of the contigs assembled by MECAT is the same as or better than that of the PBcR-MHAP pipeline and FALCON. Here are some comparisons on the 32-thread computer with a 2.0 GHz CPU and 512 GB of RAM:

Genome	Pipeline	Thread number	Total running time (h)	NG50 of genome
E.coli	FALCON	16	1.21	4,635,129
	PBcR-MHAP	16	1.29	4,652,272
	Canu	16	0.71	4,648,002
	MECAT	16	0.24	4,649,626
Yeast	FALCON	16	2.16	587,169
	PBcR-MHAP	16	4.2	818,229
	Canu	16	5.11	739,902
	MECAT	16	0.91	929,350
A.thaliana	FALCON	16	223.84	7,583,032
	PBcR-MHAP	16	188.7	9,610,192
	Canu	16	118.57	8,315,338
	MECAT	16	10.73	12,600,961
D.melanogaster	FALCON	16	140.72	15,664,372
	PBcR-MHAP	16	101.22	13,627,256
	Canu	16	69.34	14,179,324
	MECAT	16	9.58	18,111,159
Human(54X)	PBcR-MHAP(f)	32	5750	1,857,788
	PBcR-MHAP(s)	32	13000	4,320,471
	MECAT	32	230.54	4,878,957

MECAT consists of four modules:

mecat2pw, a fast and accurate pairwise mapping tool for SMRT reads
mecat2ref, a fast and accurate reference mapping tool for SMRT reads
mecat2cns, a tool that corrects noisy reads based on their pairwise overlaps
mecat2canu, a modified and more efficient version of the Canu pipeline. Canu is a customized version of the Celera Assembler that is designed for high-noise single-molecule sequencing

MECAT is written in C, C++, and Perl. It is open source and distributed under the GPLv3 license.

Installation

The commands below assume that the current directory is /public/users/chenying/smrt_asm. Replace this path with your own working directory.

Install MECAT:

git clone https://github.com/xiaochuanle/MECAT.git
cd MECAT
make 
cd ..

After installation, all the executables are found in MECAT/Linux-amd64/bin. The folder name Linux-amd64 varies with the operating system. For example, on macOS the executables are put in MECAT/Darwin-amd64/bin.

Install HDF5:

wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8/hdf5-1.8.15-patch1/src/hdf5-1.8.15-patch1.tar.gz
tar xzvf hdf5-1.8.15-patch1.tar.gz
mkdir hdf5
cd hdf5-1.8.15-patch1
./configure --enable-cxx --prefix=/public/users/chenying/smrt_asm/hdf5
make
make install
cd ..

The header files of HDF5 are in hdf5/include. The library files of HDF5 are in hdf5/lib (on some systems they are put in hdf5/lib64 instead, so check which one exists).

Install dextract:

git clone https://github.com/PacificBiosciences/DEXTRACTOR.git
cp MECAT/dextract_makefile DEXTRACTOR
cd DEXTRACTOR
export HDF5_INCLUDE=/public/users/chenying/smrt_asm/hdf5/include
export HDF5_LIB=/public/users/chenying/smrt_asm/hdf5/lib
make -f dextract_makefile
cd ..

Add the paths to the environment:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/public/users/chenying/smrt_asm/hdf5/lib
export PATH=/public/users/chenying/smrt_asm/MECAT/Linux-amd64/bin:$PATH
export PATH=/public/users/chenying/smrt_asm/DEXTRACTOR:$PATH

Quick Start

Using MECAT to assemble a genome involves 4 steps. Here we take the assembly of the E. coli genome as an example and go through each step in order. The options of each command are described in the Program Descriptions section.

Assembling PacBio Data

We download the reads ecoli_filtered.fastq.gz from the MHAP website. By decompressing it we obtain ecoli_filtered.fastq.

Step 1, use mecat2pw to detect overlapping candidates

mecat2pw -j 0 -d ecoli_filtered.fastq -o ecoli_filtered.fastq.pm.can -w wrk_dir -t 16

Step 2, correct the noisy reads based on their pairwise overlapping candidates.

mecat2cns -i 0 -t 16 ecoli_filtered.fastq.pm.can ecoli_filtered.fastq corrected_ecoli_filtered.fasta

Step 3, extract the longest 25X corrected reads

extract_sequences corrected_ecoli_filtered.fasta corrected_ecoli_25x.fasta 4800000 25

Step 4, assemble the longest 25X corrected reads using mecat2canu

mecat2canu -trim-assemble -p ecoli -d ecoli genomeSize=4800000 ErrorRate=0.02 maxMemory=40 maxThreads=16 useGrid=0 Overlapper=mecat2asmpw -pacbio-corrected corrected_ecoli_25x.fasta
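
The four PacBio steps above can be collected into a small driver script. The following is a minimal sketch in Python and is not part of MECAT: the helper names `pacbio_commands` and `run_pipeline`, their parameters, and the derived file names are hypothetical, while the command-line options are copied from the commands above. Running the commands assumes the MECAT executables are on PATH.

```python
import subprocess


def pacbio_commands(reads, genome_size, threads=16, prefix="ecoli", coverage=25):
    """Build the four quick-start commands for PacBio reads (hypothetical helper)."""
    candidates = f"{reads}.pm.can"                     # output of step 1
    corrected = f"corrected_{prefix}.fasta"            # output of step 2
    longest = f"corrected_{prefix}_{coverage}x.fasta"  # output of step 3
    return [
        # Step 1: detect overlapping candidates
        ["mecat2pw", "-j", "0", "-d", reads, "-o", candidates,
         "-w", "wrk_dir", "-t", str(threads)],
        # Step 2: correct the noisy reads
        ["mecat2cns", "-i", "0", "-t", str(threads), candidates, reads, corrected],
        # Step 3: extract the longest corrected reads
        ["extract_sequences", corrected, longest, str(genome_size), str(coverage)],
        # Step 4: assemble (error rate and memory values taken from the example)
        ["mecat2canu", "-trim-assemble", "-p", prefix, "-d", prefix,
         f"genomeSize={genome_size}", "ErrorRate=0.02", "maxMemory=40",
         f"maxThreads={threads}", "useGrid=0", "Overlapper=mecat2asmpw",
         "-pacbio-corrected", longest],
    ]


def run_pipeline(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For the E. coli example, `run_pipeline(pacbio_commands("ecoli_filtered.fastq", 4800000))` would run the four steps in sequence.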

Assembling Nanopore Data

Download MAP006-PCR-1_2D_pass.fasta.

Step 1, use mecat2pw to detect overlapping candidates

mecat2pw -j 0 -d MAP006-PCR-1_2D_pass.fasta -o candidates.txt -w wrk_dir -t 16 -x 1

Step 2, correct the noisy reads based on their pairwise overlapping candidates.

mecat2cns -i 0 -t 16 -x 1 candidates.txt MAP006-PCR-1_2D_pass.fasta corrected_ecoli.fasta

Step 3, extract the longest 25X corrected reads

extract_sequences corrected_ecoli.fasta corrected_ecoli_25x.fasta 4800000 25

Step 4, assemble the longest 25X corrected reads using mecat2canu

mecat2canu -trim-assemble -p ecoli -d ecoli genomeSize=4800000 ErrorRate=0.06 maxMemory=40 maxThreads=16 useGrid=0 Overlapper=mecat2asmpw -nanopore-corrected corrected_ecoli_25x.fasta

After step 4, the assembled genome is given in file ecoli/ecoli.contigs.fasta. Details of the contigs can be found in file ecoli/ecoli.layout.tigInfo.

Input Format

MECAT is capable of processing FASTA, FASTQ, and H5 format files. However, H5 files must first be converted to FASTA format by running DEXTRACTOR/dextract before running MECAT. For example:

find pathto/raw_reads -name "*.bax.h5" -exec readlink -f {} \; > reads.fofn
while read -r line; do dextract -v "$line" >> reads.fasta; done < reads.fofn

The resulting reads.fasta file is the input file for MECAT.
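
The same conversion loop can be written in Python. This is a minimal sketch, not part of MECAT or DEXTRACTOR: the helper names `list_bax_files` and `extract_to_fasta` are hypothetical, and the sketch assumes, as the shell loop above does, that `dextract -v` is on PATH and writes the FASTA records to standard output.

```python
import subprocess
from pathlib import Path


def list_bax_files(raw_dir):
    """Return the resolved absolute paths of all *.bax.h5 files under raw_dir."""
    return sorted(str(p.resolve()) for p in Path(raw_dir).rglob("*.bax.h5"))


def extract_to_fasta(raw_dir, out_fasta="reads.fasta"):
    """Append the dextract output of every .bax.h5 file to out_fasta."""
    with open(out_fasta, "ab") as out:
        for path in list_bax_files(raw_dir):
            # Mirrors: dextract -v "$line" >> reads.fasta
            subprocess.run(["dextract", "-v", path], stdout=out, check=True)
```

Using `check=True` makes the script stop at the first file that dextract fails on, instead of silently producing an incomplete reads.fasta.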

Program Descriptions

We describe in detail each module of MECAT, including their options and output formats.

`mecat2pw`

options

The command for running mecat2pw is

mecat2pw -j [task] -d [fasta/fastq] -w [working folder] -t [# of threads] -o [output] -n [# of candidates] -a [overlap size] -k [# of kmers] -g [0/1] -x [0/1]

The options are:

-j [task], job name, 0 = detect overlapping candidates only, 1 = output overlaps in M4 format, default = 1. If the goal is to correct noisy reads, outputting overlapping candidates is enough.
-d [fasta/fastq], reads file name in FASTA or FASTQ format.
-w [working folder], a directory for storing temporary results; it is created if it does not exist.
-t [# of threads], number of CPU threads used for overlapping, default=1.
-o [output], output file name
-n [# of candidates], number of candidates considered for gapped extension, default=100. Since each chunk is about 2GB in size, the number of candidates (NC) should be set according to the genome size (GS): for GS < 20M, set NC to 200; for 20M < GS < 200M, set NC to 100; for GS > 200M, set NC to 50.
-a [overlap size], only output overlaps with length >= a. Default: 2000 if x is set to 0, 500 if x is set to 1.
-k [# of kmers], two blocks between two reads having >= k kmer matches will be considered as a matched block pair. Default: 4 if x is set to 0, 2 if x is set to 1.
-g [0/1], output the gapped extension start point (1) or not (0), default=0.
-x [0/1], sequencing platform: 0 = Pacbio, 1 = Nanopore. Default: 0.
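
The guidance for the -n option can be written as a small lookup. This is a minimal sketch of that rule only; the function name `suggested_num_candidates` is hypothetical, and because the text above does not say what to use at exactly 20M or 200M, the handling of those two boundary values is an assumption.

```python
def suggested_num_candidates(genome_size):
    """Suggested value for mecat2pw -n, given the genome size in bases."""
    if genome_size < 20_000_000:
        return 200
    if genome_size < 200_000_000:  # boundary handling is an assumption
        return 100
    return 50
```

For the 4.8 Mb E. coli genome this gives 200, and for a human-sized genome it gives 50.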

output format

If the job is detecting overlapping candidates (-j 0), the results are output in can format. Each result occupies one line with 9 fields:

[A ID] [B ID] [A strand] [B strand] [A gapped start] [B gapped start] [voting score] [A length] [B length]

If the job is outputting overlaps (-j 1), mecat2pw writes the results in M4 format, one result per line. The fields of the M4 format are given in order below:

[A ID] [B ID] [% identity] [voting score] [A strand] [A start] [A end] [A length] [B strand] [B start] [B end] [B length]

If the -g option is set to 1, two more fields indicating the extension starting points are given:

[A ID] [B ID] [% identity] [voting score] [A strand] [A start] [A end] [A length] [B strand] [B start] [B end] [B length] [A ext start] [B ext start]

In the strand field, 0 stands for the forward strand and 1 stands for the reverse strand. All positions are zero-based and relative to the forward strand, regardless of which strand the sequence is mapped to. Here are some examples:

44 500 83.6617 30 0 349 8616 24525 0 1 10081 21813

353 500 83.2585 28 0 10273 18410 22390 1 0 10025 21813

271 500 80.4192 13 0 14308 19585 22770 1 4547 10281 21813

327 501 89.8652 117 0 10002 22529 22529 1 9403 21810 21811

328 501 90.8777 93 0 0 10945 22521 1 0 10902 21811

In the examples above, read 500 overlaps with reads 44, 353, 271, 327 and 328.
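
For downstream scripting, M4 lines like the ones above can be parsed into named fields. This is a minimal sketch, not code from MECAT: the names `M4Record`, `parse_m4` and `overlap_lengths` are hypothetical, read IDs are assumed to be integers as in the examples, and intervals are assumed to be half-open (start inclusive, end exclusive), which is inferred from the examples where an end position equals the read length.

```python
from typing import NamedTuple, Optional


class M4Record(NamedTuple):
    a_id: int
    b_id: int
    identity: float
    score: int
    a_strand: int
    a_start: int
    a_end: int
    a_len: int
    b_strand: int
    b_start: int
    b_end: int
    b_len: int
    a_ext_start: Optional[int] = None  # present only when -g 1 was used
    b_ext_start: Optional[int] = None


def parse_m4(line):
    """Parse one whitespace-separated M4 line with 12 or 14 fields."""
    fields = line.split()
    if len(fields) not in (12, 14):
        raise ValueError(f"expected 12 or 14 fields, got {len(fields)}")
    values = [int(fields[0]), int(fields[1]), float(fields[2])]
    values += [int(x) for x in fields[3:]]
    return M4Record(*values)


def overlap_lengths(rec):
    """Aligned span on read A and on read B (assumes half-open intervals)."""
    return rec.a_end - rec.a_start, rec.b_end - rec.b_start
```

Applied to the last example line, `overlap_lengths(parse_m4("328 501 90.8777 93 0 0 10945 22521 1 0 10902 21811"))` gives the spans 10945 and 10902.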

memory consumption

Before overlapping is conducted, the reads are split into several chunks. Each chunk is about 2GB in size so that the overlapping can be run on a computer with 8GB of RAM.

`mecat2ref`

options

mecat2ref is used for mapping SMRT reads to a reference genome. The command is

mecat2ref -d [reads] -r [reference] -w [folder] -t [# of threads] -o [output] -b [# of results] -m [output format] -x [0/1]

The meanings of each option are as follows:

-d [reads], reads file name in FASTA/FASTQ format
-r [reference], reference genome file name in FASTA format
-w [folder], a directory for storing temporary results
-t [# of threads], number of working CPU threads
-o [output], output file name
-b [# of results], output the best b alignments
-m [output format], output format: 0 = ref, 1 = M4, 2 = SAM, default = 0
-x [0/1], sequencing platform: 0 = Pacbio, 1 = Nanopore. Default: 0.

output format

mecat2ref outputs results in one of the three formats: the ref format, the M4 format, and the SAM format.

For the ref format, each result occupies three lines in the form:

[read name] [ref name] [ref strand] [voting score] [read start] [read end] [read length] [ref start] [ref end]

mapped read subsequence

mapped reference subsequence

The strands of the reads are always forward. In the [ref strand] field, F indicates forward strand while R indicates reverse strand. All the positions are zero-based and relative to the forward strand. Here is an example:

1	gi|556503834|ref|NC_000913.3|	F	10	2	58	1988134	1988197

AAT-AGCGCCTGCCAGGCG-TCTTTT--CCGGCCATTGT-CGCAG--CACTGTAACGCGTAAAA

AATTAGCGCCTGCCAGGCGGTCTTTTTTCCGGCCATTGTTCGCAGGG-ACTGTAACGCGTAAAA

In this example, read 1 is mapped to the reference gi|556503834|ref|NC_000913.3|.

memory consumption

Index for the genome: genomeSize * 8 bytes
Compressed index for each CPU thread: genomeSize * 0.1 * t bytes
Local alignment: 100M * t + 1G bytes
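
The three terms above can be added up to get a rough memory estimate before launching mecat2ref. This is a minimal sketch of that arithmetic only; the function name `mecat2ref_memory_bytes` is hypothetical, and reading "100M" as 100 * 10^6 bytes and "1G" as 10^9 bytes is an assumption.

```python
def mecat2ref_memory_bytes(genome_size, threads):
    """Rough mecat2ref memory estimate in bytes, following the three terms above."""
    index = genome_size * 8                        # index for the genome
    compressed = genome_size * threads // 10       # genomeSize * 0.1 * t
    local_alignment = 100 * 10**6 * threads + 10**9  # 100M * t + 1G
    return index + compressed + local_alignment
```

For a 4.8 Mb genome and 16 threads the estimate is dominated by the local alignment term (2.6 GB out of roughly 2.65 GB).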

`mecat2cns`

mecat2cns is an adaptive error correction tool for high-noise single-molecule sequencing reads. It is as accurate as pbdagcon and as fast as FalconSense. The input to mecat2cns can be in either can format or M4 format. The command for running mecat2cns is

mecat2cns [options] overlaps-file reads output

The options are

-x [0/1], sequencing platform: 0 = Pacbio, 1 = Nanopore. Default: 0.
-i [input type], input format, 0 = can, 1 = M4
-t [# of threads], number of CPU threads for consensus
-p [batch size], size of the batches into which the reads are partitioned
-r [ratio], minimum mapping ratio
-a [overlap size], overlaps with length >= a will be used.
-c [coverage], minimum coverage, default=6
-l [length], minimum length of the corrected sequence

If x is 0, then the default values for the other options are:

-i 1 -t 1 -p 100000 -r 0.9 -a 2000 -c 6 -l 5000

If x is 1, then the default values for the other options are:

-i 1 -t 1 -p 100000 -r 0.4 -a 400 -c 6 -l 2000

If the input is in M4 format, the overlap results in [overlaps-file] must contain the gapped extension start points, which means the option -g of mecat2pw must be set to 1; otherwise mecat2cns will fail to run. Also note that the memory requirement of mecat2cns is about 1/4 of the total size of the reads. For example, if the reads have a total size of 1GB, then mecat2cns will occupy about 250MB of memory.

output format

The corrected sequences are given in FASTA format. The header of each corrected sequence consists of four components separated by underscores:

>A_B_C_D

where

A is the original read id
B is the left-most effective position
C is the right-most effective position
D is the length of the corrected sequence

By effective position we mean a position in the original sequence that is covered by at least c reads, where c is the argument to the option -c.
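
A header in this form can be split back into its four components. This is a minimal sketch, not code from MECAT: the names `CorrectedHeader` and `parse_corrected_header` are hypothetical, and splitting from the right is a design choice so that a read id that itself contains underscores would still be handled.

```python
from typing import NamedTuple


class CorrectedHeader(NamedTuple):
    read_id: str  # A: original read id
    left: int     # B: left-most effective position
    right: int    # C: right-most effective position
    length: int   # D: length of the corrected sequence


def parse_corrected_header(header):
    """Parse a '>A_B_C_D' FASTA header line written by mecat2cns."""
    name = header.lstrip(">").split()[0]
    read_id, left, right, length = name.rsplit("_", 3)
    return CorrectedHeader(read_id, int(left), int(right), int(length))
```

The header values in any call, such as `parse_corrected_header(">12_0_5000_4980")`, are made up for illustration and do not come from a real run.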

`extract_sequences`

extract_sequences is used to extract the longest 25X or 40X of sequences from the corrected data. The command is

extract_sequences [the input fasta file from mecat2cns] [the output filename] [genome size]  [coverage]
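
The idea of keeping the longest reads up to a target coverage can be illustrated as follows. This is a minimal sketch of the selection idea, assuming the target is genome size times coverage in bases; it is not the implementation used by extract_sequences, and the function name `pick_longest` is hypothetical. Whether the real tool stops just before or just after reaching the target is not stated here, so the stopping rule is an assumption.

```python
def pick_longest(lengths, genome_size, coverage):
    """Keep the longest reads until their total length reaches the target."""
    target = genome_size * coverage
    kept, total = [], 0
    for n in sorted(lengths, reverse=True):
        if total >= target:
            break
        kept.append(n)
        total += n
    return kept
```

With genomeSize 4800000 and coverage 25, as in the Quick Start, the target is 120,000,000 bases of corrected reads.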

`mecat2canu`

mecat2canu is a modified and more efficient version of the Canu pipeline. mecat2canu accelerates Canu by replacing its overlapper MHAP with mecat2asmpw, which is a customized version of mecat2pw. The options of mecat2canu are the same as those of Canu, except that its overlapper is replaced by mecat2asmpw. The command for assembling corrected PacBio reads is

mecat2canu -d [working-folder] -p [file-prefix] -trim-assemble errorRate=[fraction error] \
    Overlapper=mecat2asmpw genomeSize=[genome size] \
    maxMemory=[host memory size] maxThreads=[# of CPU threads] useGrid=0 \
    -pacbio-corrected reads-name

The command for assembling corrected Nanopore reads is

mecat2canu -d [working-folder] -p [file-prefix] -trim-assemble errorRate=[fraction error] \
    Overlapper=mecat2asmpw genomeSize=[genome size] \
    maxMemory=[host memory size] maxThreads=[# of CPU threads] useGrid=0 \
    -nanopore-corrected reads-name

After assembling, the results are given in the folder working-folder. The assembled genome is given in the file working-folder/file-prefix.contigs.fasta and the details of the contigs are given in the file working-folder/file-prefix.layout.tigInfo.

`Citation`

Chuan-Le Xiao, Ying Chen, Shang-Qian Xie, Kai-Ning Chen, Yan Wang, Yue Han, Feng Luo, Zhi Xie. MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nature Methods, 2017, 14: 1072-1074

`Contact`

Chuan-Le Xiao, [email protected]
Ying Chen, [email protected]
Feng Luo, [email protected]

`Update Information`

Updates in MECAT V1.3 (2017.12.18):

Correct a text error in the HDF5 installation instructions.
Update the makefile for dextract.
Update the citation.

Updates in MECAT V1.2 (2017.5.22):

Add a trimming module in mecat2canu to improve the integrity of the assembly.
Add support for Nanopore data.
Improve the sensitivity of mecat2ref.

MECAT v1.1 replaced the old MECAT. Some bugs were fixed and some new functions were added:

1. We added tools for extracting reads from raw H5 format files.
2. Some bugs in running mecat2canu were fixed.

mecat's Issues

run PacBio and Nanopore together

Hi,

Is it possible to run PacBio and Nanopore reads together with MECAT, and if so, how should it be run? Is it similar to Canu?

thanks

extract_sequences error: [main, 53] system() error. Error code is 139.

Hello,

I'm using extract_sequences to extract the longest reads from PacBio raw data, but I got the following error:

step 1: convert fasta to fastq
[main, 53] system() error. Error code is 139.

The command I used:

/home/software/MECAT/Linux-amd64/bin/extract_sequences all.fasta 50X 2000000000 50

Do you know how to solve this? Thanks in advance!

Too little corrected FASTA data

Dear all,
I have NTS plant data. After the mecat2pw, mecat2cns and extract_sequences steps, only ~4.4G of corrected data is left, although my raw data is about 220G. When I use Canu to correct the same data, nearly 70G is left. I then used mecat2canu to assemble the plant genome (~2.1G). The assembly from the Canu-corrected data looks correct at about 2G, but the assembly from the MECAT-corrected data is only ~400M.

Have you encountered this problem?
I would appreciate your suggestions.

metagenome assembly

Hi Dr. Xiao,
I am using MECAT to assemble metagenomic Nanopore data. All the steps run correctly, but I am confused about the genomeSize parameter in step 3 (extract sequences). A metagenome is a pool of genomes from different species, so I cannot predict the genome size as I could for a single species.
Could you give me some suggestions about the genomeSize parameter?

      Best Regards
      Xiangyu

canu failed with 'didn't find 'unitigging/3-overlapErrorAdjustment/oea.files' to add to store, yet overlapper finished'.

We got this error during the 4th step, with the following mecat2canu command:

mecat2canu -trim-assemble -p O_australiensis -d O_australiensis_mecat1st_40x genomeSize=960000000 ErrorRate=0.02 maxMemory=1000 maxThreads=40 useGrid=0 Overlapper=mecat2asmpw -pacbio-corrected corrected_40x_for_O_australiensis_mecat1st_40x.fasta

Don't panic, but a mostly harmless error occurred and canu failed.

canu failed with 'didn't find '/<snip>/unitigging/3-overlapErrorAdjustment/oea.files' to add to store, yet overlapper finished'.

The unitigging directory is on a local disk, which is also exported via NFS to our other servers. Yet when we run the same mecat2canu command on another server (older and slower, and via NFS), it runs fine. The only difference on the other server is maxMemory=160 maxThreads=16.

mecat2cns error

Hello,
I used this software for my genome assembly on a cluster system with more than 2T of memory, but I always get the following error:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/opt/gridview//pbs/dispatcher/mom_priv/jobs/1963687.gvadmin.SC: line 17: 23500 Aborted                 (core dumped) /public/home/cotton/software/MECAT/Linux-amd64/bin/mecat2cns -i 0 -t ${cpu} nh.all.subreads.fastq.pm.can ${pbcio} corrected_nh.all.subreads.fasta

Any suggestions? Thanks.

Error when running es_fasta2fastq

Sorry to disturb you. I installed MECAT and tried it with the E. coli data today, but I ran into an error (maybe in step 3). The error message is as follows:

[processing ecoli_filtered.fasta.pm.can.part0] begins.
[processing ecoli_filtered.fasta.pm.can.part0] takes 1102.27 secs.
step 1: convert fasta to fastq
sh: es_fasta2fastq: command not found
[main, 53] system() error. Error code is 32512.

It says 'sh: es_fasta2fastq: command not found', but I can find 'es_fasta2fastq' in my MECAT directory. Here are the output files:
278M Jun 21 14:59 corrected_ecoli_filtered.fasta
77M Jun 21 14:40 ecoli_filtered.fasta.pm.can
159M Jun 21 14:40 ecoli_filtered.fasta.pm.can.part0
42 Jun 21 14:40 ecoli_filtered.fasta.pm.can.partition_files
87 Jun 21 14:32 fileindex.txt
77M Jun 21 14:40 r_0
95M Jun 21 14:32 vol0

Could you give me some advice on how to solve it?
Thank you very much~
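
The "Error code" values in reports like this one look like raw wait statuses returned by the C `system()` call. That is an assumption based on the message text, not something confirmed in the MECAT source. Under that assumption, and with the encoding conventionally used on Linux, the number can be decoded as sketched below; the function name `decode_system_status` is hypothetical.

```python
def decode_system_status(status):
    """Decode a raw wait status using the conventional Linux layout."""
    signal_number = status & 0x7F
    if signal_number == 0:
        # Normal exit: the exit code is stored in the next 8 bits.
        return {"exited": True, "exit_code": (status >> 8) & 0xFF}
    # Killed by a signal: bit 0x80 flags a core dump.
    return {"exited": False, "signal": signal_number,
            "core_dumped": bool(status & 0x80)}
```

With this decoding, 32512 corresponds to exit code 127, which shells use for "command not found" and which agrees with the `sh: es_fasta2fastq: command not found` line. That would suggest checking whether the directory containing es_fasta2fastq is on PATH. The value 139 reported in another issue would decode to signal 11 (a segmentation fault) with a core dump.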

Problem converting to FASTA format by running DEXTRACTOR/dextract

Hi, when I run "while read line; do dextract -v $line >> reads.fasta ; done < reads.fofn", I get this error:
"(null): zlib library is not present, check build/installation"
Could anyone give me some advice?
Many thanks.

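Assembling a highly heterozygous genome with MECAT

The genome assembled by MECAT is still larger than the predicted genome size, even after removing haplotigs with purge_haplotigs. Is there any way to phase a diploid genome, as falcon-unzip does?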

MAX_SEQ_SIZE too low (crash with reads > 500 kb)

Hi,

I am trying to run MECAT and have a few reads > 500kb in size which leads to a crash with a somewhat cryptic error message giving the rsize. I found MAX_SEQ_SIZE defined as 500000 in common/defs.h which leads to the mecat2pw crash. Is it safe to just increase this? If not it would be good if MECAT could split longer reads to avoid crashes.
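
Until the limit is raised, it can help to find out which reads would trigger the crash before running mecat2pw. This is a minimal sketch in Python, not part of MECAT: the function names are hypothetical, the value 500000 is the one quoted above from common/defs.h, and treating a read of exactly that length as too long is an assumption.

```python
MAX_SEQ_SIZE = 500000  # value quoted from common/defs.h in the issue above


def read_fasta_lengths(handle):
    """Yield (name, length) for each record in a FASTA text stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def oversized_reads(handle, limit=MAX_SEQ_SIZE):
    """List the reads whose length is at or above the limit (assumption: >=)."""
    return [(n, size) for n, size in read_fasta_lengths(handle) if size >= limit]
```

For example, `with open("reads.fasta") as fh: print(oversized_reads(fh))` lists the reads that could then be removed or split by hand.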

alignment

Hi,
In mecat2ref_aux.cpp, lines 100 and 101, would it be more suitable to replace the code
left_ref_size = min(L2, (long)(L * 1.2));
right_ref_size = min(R2, (long)(R * 1.2));
with
left_ref_size = min(L2, (long)(L * (1 + ddfs_cutoff)));
right_ref_size = min(R2, (long)(R * (1 + ddfs_cutoff)));
?

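Awesome. Starring this so I can study it later.

MECAT maintenance

MECAT may be the only software that can outperform Canu, and it was developed independently by ** people, which is something to be proud of!
However, its later maintenance has not been ideal, and users who run into problems do not know how to solve them. I hope Professor Xiao's team can help out more. Please do not let this excellent software sink into obscurity!
Thank you very much.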

make -f dextract_makefile Error

my command is "make -f dextract_makefile CC='/usr/bin/gcc' "

/usr/bin/gcc -O3 -Wall -Wextra -Wno-unused-result -fno-strict-aliasing -I/data/langna/softwares/MECAT/hdf5/include -L/data/langna/softwares/MECAT/hdf5/lib -o dextract dextract.c DB.c QV.c -lhdf5
/usr/bin/gcc -O3 -Wall -Wextra -Wno-unused-result -fno-strict-aliasing -o dexta gdexta.c DB.c QV.c
gcc: error: gdexta.c: No such file or directory
make: *** [dexta] Error 1

installation problem with dextract

Hi,
I am installing DEXTRACT. The dextract_makefile is as follows.
`CFLAGS = -O3 -Wall -Wextra -Wno-unused-result -fno-strict-aliasing
CC = gcc

all: dextract dexta undexta dexqv undexqv

dextract:
${CC} $(CFLAGS) -I$(HDF5_INCLUDE) -L$(HDF5_LIB) -o dextract dextract.c DB.c QV.c -lhdf5

dexta:
${CC} ${CFLAGS} -o dexta gdexta.c DB.c QV.c

undexta:
${CC} ${CFLAGS} -o undexta undexta.c DB.c QV.c

dexqv:
${CC} ${CFLAGS} -o dexqv dexqv.c DB.c QV.c

undexqv:
${CC} ${CFLAGS} -o undexqv undexqv.c DB.c QV.c`

When I run make -f dextract_makefile, I get the error below. Could anyone give me a copy of the gdexta.c file? Many thanks.
gcc -O3 -Wall -Wextra -Wno-unused-result -fno-strict-aliasing -o dexta gdexta.c DB.c QV.c
gcc: error: gdexta.c: No such file or directory
make: *** [dexta] Error 1

Run into error message with mecat2cns: assertion 'size == size_in_ovlp' failed

Dear All,

I have successfully installed MECAT and tested it with the given E. coli data.
But when I ran it on my own dataset (mecat2pw finished and produced a 170G can file), it crashed in the mecat2cns step.
Here is the final output. Any advice?

best regards,
Shengfeng.

...
lm_sp_raw_seq.pm.can.part180 contains reads 18000001 --- 18099999
lm_sp_raw_seq.pm.can.part181 contains reads 18100000 --- 18112879
[partition_candidates] takes 138020.07 secs.
[load_fasta_db] begins.
[load_fasta_db] takes 20540.66 secs.
[processing lm_sp_raw_seq.pm.can.part0] begins.
[GetSequence, 33] assertion 'size == size_in_ovlp' failed
tech = 0
./mecat_lamprey.batch: line 12: 69208 Aborted (core dumped) mecat2cns -i 0 -t 60 lm_sp_raw_seq.pm.can raw_fasta/lm_sp_raw_seq.fasta lm_sp_corrected_seq.fasta

Program does not support gcc/g++ 7.x

The make command for MECAT fails when using gcc/g++ 7.x or later. I rolled back my gcc/g++ to 5.0 and it worked.

Choice of error rate

I am new to MECAT and am attempting to assemble amplicon-based sequence reads for a virus genome. I am wondering about the optimal set of parameters for a genome of length 16000 with amplicon sequences of length 1500-2500. In particular, how do I set the error rate parameter? Why can't MECAT deduce this from the data?

canu failed with 'failed to adjust overlap error rates. Made 2 attempts, jobs still failed'.

Dear All,

I am working with Nanopore whole-genome 1D data. I ran Canu for assembly (see the command below) and got this error: "canu failed with 'failed to adjust overlap error rates. Made 2 attempts, jobs still failed'." I looked for existing reports of the same error and found the same issue in metacanu.

Command:
"canu -p magna -d Magnapothe -assemble genomeSize=41m -nanopore-raw Magnaporthe.fastq stopOnReadQuality=false "

Please help me to fix this error.
Thanks
Krithika

extract_sequences issue

Hi,
When I assemble the given dataset, the first and second steps run correctly, but the extract_sequences command fails with the following error:

MECAT /HDD1/software/MECAT/MECAT/Linux-amd64/bin/extract_sequences corrected_ecoli_filtered.fasta corrected_ecoli_25x.fasta 48000000 25
step 1: convert fasta to fastq
sh: 1: es_fasta2fastq: not found
[main, 53] system() error. Error code is 32512.

I would like to know why this happens. Thanks for your help.

[ Mecat2cns ERROR ]

My command is "mecat2cns -i 0 -t 16 reads.fastq.pm.can reads.fastq correct_reads". I got this error: [dw, 433] assertion 'ch >= 0 && ch <= 4' failed, Aborted (core dumped).

input_type: 0
reads reads.fastq.pm.can
output reads.fastq
m4 correct_reads
number of threads: 16
batch size: 100000
mapping ratio: 0.9
align size: 2000
cov: 6
min size: 5000
partition files: 10
tech: 0
[partition_candidates] begins.
Lid = 0, Rid = 100000
reads.fastq.pm.can.part0 contains reads 0 --- 44047
[partition_candidates] takes 5.34 secs.
[load_fasta_db] begins.
[load_fasta_db] takes 20.88 secs.
[processing reads.fastq.pm.can.part0] begins.
[dw, 433] assertion 'ch >= 0 && ch <= 4' failed
Aborted (core dumped)

Thanks!
Looking forward to replies.

Hey there- I was able to get through overall adjustment so that looks good- could you let me know how to restart the mecat pipeline at the appropriate step? It was not clear how to do this.

MECAT can automatically detect the breakpoint of a task in all four steps. Your error occurred in step 4, while running mecat2canu, so you should resubmit the step 4 command with the same assembly folder. mecat2canu will check all the files in this folder, find the breakpoint, and continue the task from there.

About SLURM support

Does this tool support the SLURM system? If not, would it be hard to add this function?

high memory usage for mecat2cns

The README notes that the memory requirement of mecat2cns is about 1/4 of the total size of the reads. For example, if the reads have a total size of 1GB, then mecat2cns will occupy about 250MB of memory.

However, my PacBio reads are 45GB in size, and when I run the following steps the memory usage rises to ~1TB.

mecat2pw -j 0 -d reads.fa -o reads.fa.can -w dir_pw -t 24 -n 50 -x 1
mecat2cns -i 0 -t 32 -x 1 reads.fa.can reads.fa reads.correct.fa

(I use Nanopore mode instead of PacBio mode to keep more corrected reads.)

mecat2cns always gets killed

The following is the final part of the output log of mecat2cns.

========================================================

hsy.subreads.overlap.can.part87 contains reads 8700000 --- 8799996
hsy.subreads.overlap.can.part88 contains reads 8800000 --- 8899998
hsy.subreads.overlap.can.part89 contains reads 8900001 --- 8999998
hsy.subreads.overlap.can.part90 contains reads 9000002 --- 9043615
[partition_candidates] takes 7542.52 secs.
[load_fasta_db] begins.
[load_fasta_db] takes 435.02 secs.
[processing hsy.subreads.overlap.can.part0] begins.
mecat_pipe.sh: line 14: 103199 Killed mecat2cns -i 0 -t 24 -x $tech $output $subreads $corrected_reads

========================================================

Do you have any suggestions about this problem?

The corresponding code is pasted below:

File: reads_correction_can.cpp

========================================================

char process_info[1024];
for (std::vector::iterator iter = partition_file_vec.begin(); iter != partition_file_vec.end(); ++iter)
{
    sprintf(process_info, "processing %s", iter->file_name.c_str());
    DynamicTimer dtimer(process_info);
    consensus_one_partition_can(iter->file_name.c_str(), iter->min_seq_id, iter->max_seq_id, rco, reads, out);
}

=========================================================

Problem with step 3 (extract_sequences): no result files

It succeeded only once, right after I installed the software (steps 1 to 4 were all OK),

but when I repeated it, it failed at step 3 and no result files were generated.

command:
extract_sequences ecoli_filtered.fastq.corrected.fasta corrected_ecoli_25x.fasta 4800000 25

no corrected_ecoli_25x.fasta.fasta.qual ,
no corrected_ecoli_25x.fasta.fasta.qv
no corrected_ecoli_25x.fasta.fasta.frg
no corrected_ecoli_25x.fasta.fasta

Also, there were no errors or warnings, as shown below.

Can anyone help fix it?

[root@localhost Genome_Assembly]# extract_sequences ecoli_filtered.fastq.corrected.fasta corrected_ecoli_25x.fasta 4800000 25

step 1: convert fasta to fastq
step 2: convert fastq to CA
step 3

Starting file 'ecoli_filtered.fastq.corrected.fasta.fastq.libname.frg'.

Processing SINGLE-ENDED SANGER QV encoding reads from:
'/home/workplace/Genome_Assembly/ecoli_filtered.fastq.corrected.fasta.fastq'

GKP finished with no alerts or errors.

step 4
Longest picked cutoff: 12691
Scanning store to find libraries used.
Added 0 reads to maintain mate relationships.
Dumping 0 fragments from unknown library (version 1 has these)
Dumping 7639 fragments from library IID 1
step 5
Longest picked cutoff: 12691
Scanning store to find libraries used and reads to dump.
Added 0 reads to maintain mate relationships.
Dumping 0 fragments from unknown library (version 1 has these)
Dumping 7639 fragments from library IID 1

seg fault in 3-overlapErrorAdjustment

There's a small bug in mecat2canu/src/overlapErrorAdjustment/FindError.C that, with some data sets, can cause repeatable segmentation fault errors. Specifically, if there's a gap in read ids of greater than FRAGS_PER_BATCH, Extract_Needed_Frags() can generate a null list (no reads or basepairs) and then attempt to load a read anyway (causing a seg fault).

It's not an algorithm error, it's just a loop that needs to be skipped (if the looping continues properly, it'll catch up to the higher read id after some boring null extractions). So, it's a simple fix - just check hiID vs fi at the top of the routine:

--- findErrors.C.orig 2018-03-08 11:22:00.553731726 -0600
+++ findErrors.C 2018-03-08 11:22:29.777560012 -0600
@@ -86,6 +86,7 @@
uint32 ii = 0; // Index into reads arrays
uint32 fi = G->olaps[lastOlap].b_iid; // Actual ID we're extracting

+ if (hiID < fi) return;
assert(loID <= fi);

fprintf(stderr, "Extract_Needed_Frags()-- Loading used reads between "F_U32" and "F_U32".\n",

mecat2canu job failed

Hi
I use MECAT to assemble a 300 Mb genome from PacBio subreads. In the last step, mecat2canu gives the error "canu failed with 'failed to adjust overlap error rates. Made 2 attempts, jobs still failed'". I set ErrorRate to 0.02 and to 0.04 in different attempts and got the same error message.

The log shows messages such as "1 read error detection jobs failed", "read error detection attempt 2 begins with 132 finished, and 1 to compute", and "10 overlap error adjustment jobs failed", ending with the failure to adjust overlap error rates.

scheduler support

Does MECAT support queueing software other than SLURM? We are planning to run MECAT in AWS CfnCluster and want to know whether it supports SGE, Torque or SLURM.

mecat2cns error: failed to open file with mode ios::in

Hello all,
When I run the test data following the Nanopore demo, I get the following error:

When I reviewed the source code I found that, but I know little about C++. Can anyone help?

Thanks

Mecat2pw failing

Hi,
I am trying to use MECAT, but mecat2pw fails without giving an error message. Could you help me resolve this issue?
Here is the command line I used:

/env/cns/src/mecat/MECAT/Linux-amd64/bin/mecat2pw -j 0 -d /env/cns/bigtmp1/benchmark_BWW/reads/quality/40/quality_40x_reads.fastq -o candidatex.txt -w wrk_dir -t 36 -x 1

And here is the log obtained :

[split_raw_dataset] begins.
[split_raw_dataset, 264] split '/env/cns/bigtmp1/benchmark_BWW/reads/quality/40/quality_40x_reads.fastq' (3202440 reads, 24000004577 nucls) into 12 volumes.
[split_raw_dataset] takes 395.80 secs.
[create_ref_index] begins.
36 threads are used for filling offset lists.
[create_ref_index] takes 105.06 secs.
[process volume 0] begins.
[process_one_volume, 864] processing wrk_dir/vol0

[process volume 0] takes 104.01 secs.
[process volume 1] begins.
[process_one_volume, 864] processing wrk_dir/vol1

[process volume 1] takes 112.72 secs.
[process volume 2] begins.
[process_one_volume, 864] processing wrk_dir/vol2

[process volume 2] takes 113.07 secs.
[process volume 3] begins.
[process_one_volume, 864] processing wrk_dir/vol3

[process volume 3] takes 114.04 secs.
[process volume 4] begins.
[process_one_volume, 864] processing wrk_dir/vol4

[process volume 4] takes 115.10 secs.
[process volume 5] begins.
[process_one_volume, 864] processing wrk_dir/vol5

[process volume 5] takes 116.07 secs.
[process volume 6] begins.
[process_one_volume, 864] processing wrk_dir/vol6

[process volume 6] takes 115.44 secs.
[process volume 7] begins.
[process_one_volume, 864] processing wrk_dir/vol7

[process volume 7] takes 116.48 secs.
[process volume 8] begins.
[process_one_volume, 864] processing wrk_dir/vol8

[process volume 8] takes 116.77 secs.
[process volume 9] begins.
[process_one_volume, 864] processing wrk_dir/vol9

wrk_dir/fileindex.txt
number of kmers: 1398798847
thread 0: 0     1297892
thread 1: 1297893       2981413
thread 2: 2981414       4326667
thread 3: 4326668       6260602
thread 4: 6260603       8536815
thread 5: 8536816       10321368
thread 6: 10321369      12237351
thread 7: 12237352      13698567
thread 8: 13698568      15261056
thread 9: 15261057      16638257
thread 10: 16638258     18083329
thread 11: 18083330     20063103
thread 12: 20063104     21560774
thread 13: 21560775     24552058
thread 14: 24552059     27954682
thread 15: 27954683     30637011
thread 16: 30637012     32571964
thread 17: 32571965     33918469
thread 18: 33918470     35835911
thread 19: 35835912     37563779
thread 20: 37563780     39716932
thread 21: 39716933     42268996
thread 22: 42268997     44957182
thread 23: 44957183     47094526
thread 24: 47094527     49323826
thread 25: 49323827     50917614
thread 26: 50917615     52727203
thread 27: 52727204     54431528
thread 28: 54431529     55832592
thread 29: 55832593     58176780
thread 30: 58176781     59584561
thread 31: 59584562     61328386
thread 32: 61328387     62925375
thread 33: 62925376     64352369
thread 34: 64352370     65851021
thread 35: 65851022     67108863
rsize = 610810  500000
/var/spool/slurmd/job4935283/slurm_script: line 2:  7638 Abandon                 (core dumped) /env/cns/src/mecat/MECAT/Linux-amd64/bin/mecat2pw -j 0 -d /env/cns/bigtmp1/benchmark_BWW/reads/quality/40/quality_40x_reads.fastq -o candidatex.txt -w wrk_dir -t 36 -x 1

canu failed with 'failed to adjust overlap error rates. Made 2 attempts, jobs still failed'.

Hi,
I recently used mecat2 to assemble PacBio reads. The first three steps completed successfully, but the last step, mecat2canu, failed with the output shown below. The command was:

mecat2canu -assemble -p emu_50x -d emu_50x genomeSize=1.3g ErrorRate=0.015 maxMemory=200 maxThreads=16 useGrid=0 Overlapper=mecat2asmpw -pacbio-corrected corrected_Emu_F_Pacbio_Copenhagen_merge_all_50x.fasta.fasta

Could you give me some suggestions?

-- read error detection attempt 2 begins with 476 finished, and 2 to compute.

-- Starting concurrent execution on Sun Jan 15 11:20:25 2017 with 6292.5 GB free disk space (2 processes; 2 concurrently)
/public/home/lijing/mecat_emu_copenhagen/emu_50x/unitigging/3-overlapErrorAdjustment/red.sh 201 > /public/home/lijing/mecat_emu_copenhagen/emu_50x/unitigging/3-overlapErrorAdjustment/red.000201.out 2>&1
/public/home/lijing/mecat_emu_copenhagen/emu_50x/unitigging/3-overlapErrorAdjustment/red.sh 254 > /public/home/lijing/mecat_emu_copenhagen/emu_50x/unitigging/3-overlapErrorAdjustment/red.000254.out 2>&1
-- Finished on Sun Jan 15 11:27:06 2017 (401 seconds) with 6292.4 GB free disk space

--
-- 82 overlap error adjustment jobs failed:
-- job /public/home/lijing/mecat_emu_copenhagen/emu_50x/unitigging/3-overlapErrorAdjustment/0001.oea FAILED.

Don't panic, but a mostly harmless error occurred and canu failed.

canu failed with 'failed to adjust overlap error rates. Made 2 attempts, jobs still failed'.

Sun Jan 15 11:29:03 CST 2017

The final output directory looks like this. There is no contigs.fasta or unitigs.fasta file.

total 56K
drwxr-xr-x 2 lijing users 36K Jan 15 11:27 canu-logs
drwxr-xr-x 2 lijing users 4.0K Jan 6 12:55 canu-scripts
drwxr-xr-x 6 lijing users 4.0K Jan 14 03:42 unitigging
-rw-r--r-- 1 lijing users 6.2K Jan 15 11:29 unitigging.html
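
When jobs in unitigging/3-overlapErrorAdjustment are reported as FAILED, it can help to see which *.oea outputs exist and what the per-job *.out logs (such as the red.000201.out file named above) end with. The following is a minimal sketch under that assumption; the function name `summarize_adjustment_dir` is hypothetical and not part of MECAT or Canu.

```python
from pathlib import Path


def summarize_adjustment_dir(directory, tail=5):
    """Return the *.oea files present and the last `tail` lines of each *.out log.

    `directory` is assumed to be the 3-overlapErrorAdjustment folder
    shown in the log above.
    """
    d = Path(directory)
    # Outputs that were actually written.
    oea_files = sorted(p.name for p in d.glob("*.oea"))
    # Last few lines of every job log, keyed by file name.
    log_tails = {}
    for log in sorted(d.glob("*.out")):
        lines = log.read_text(errors="replace").splitlines()
        log_tails[log.name] = lines[-tail:]
    return oea_files, log_tails


# Example use (path taken from the log above):
# oea, tails = summarize_adjustment_dir(
#     "emu_50x/unitigging/3-overlapErrorAdjustment")
# for name, lines in tails.items():
#     print(name, *lines, sep="\n  ")
```

The tail of each log is often where the underlying failure is printed, which is more informative than the summary line "jobs still failed".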

Add support for reading gzip-compressed reads

Currently, programs such as mecat2pw cannot read gzip-compressed reads.
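
Until compressed input is supported, one workaround is to decompress the reads to a plain file first and pass that file to mecat2pw. The sketch below uses only Python's standard `gzip` and `shutil` modules; the function name `decompress_reads` and the file names are illustrative assumptions.

```python
import gzip
import shutil
from pathlib import Path


def decompress_reads(gz_path, out_path):
    """Stream-decompress a gzip-compressed FASTA/FASTQ file to plain text.

    The data is copied in chunks, so the whole file never has to fit in memory.
    """
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)
    return Path(out_path)


# Example use (hypothetical file names):
# decompress_reads("reads.fastq.gz", "reads.fastq")
# then run: mecat2pw -j 0 -d reads.fastq -o reads.pm.can -w wrk_dir -t 16
```

The decompressed copy needs disk space equal to the uncompressed size of the reads.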

OSX: make does not work

Hi,

I downloaded your software, looked through the files, and found a ready-to-use "Makefile".
Then I ran "make" and got this error:

c++ -o ../Darwin-amd64/obj/extract_sequences/src/extract_sequences/extract_sequences.o -c -MD -D_GLIBCXX_PARALLEL -pthread -O3 -Wall -Isrc/extract_sequences src/extract_sequences/extract_sequences.cpp
c++ -o ../Darwin-amd64/bin/extract_sequences -pthread -lm -fopenmp -L../Darwin-amd64/bin ../Darwin-amd64/obj/extract_sequences/src/extract_sequences/extract_sequences.o -les
clang: error: unsupported option '-fopenmp'
make[1]: *** [../Darwin-amd64/bin/extract_sequences] Error 1
make: *** [extractSequences] Error 2

OSX uses clang by default, which might be the reason.

canu configuration error

Hi,
When I run mecat2canu, I get a configuration error:

Don't panic, but a mostly harmless error occurred and canu failed.

canu failed with 'task meryl failed to find a configuration to run on'.

Did I install MECAT correctly, or do I need to set up the configuration myself?

mecat2cns has an error

Hi,
When I used MECAT to process my Sequel data, I got an error. The command was as follows:

 mecat2cns -i 0 -t 20 Sequel.RunS013.004.fq.can Sequel.RunS013.004.fq corrected_Sequel.RunS013.004.fa

And the error output was:
input_type: 0
reads Sequel.RunS013.004.fq.can
output Sequel.RunS013.004.fq
m4 corrected_Sequel.RunS013.004.fa
number of threads: 20
batch size: 100000
mapping ratio: 0.9
align size: 2000
cov: 6
min size: 5000
partition files: 10
tech: 0
[partition_candidates] begins.
Lid = 0, Rid = 800000
Sequel.RunS013.004.fq.can.part0 contains reads 0 --- 99998
Sequel.RunS013.004.fq.can.part1 contains reads 100000 --- 199975
Sequel.RunS013.004.fq.can.part2 contains reads 200001 --- 299999
Sequel.RunS013.004.fq.can.part3 contains reads 300000 --- 307724
Sequel.RunS013.004.fq.can.part5 contains reads 527201 --- 599996
Sequel.RunS013.004.fq.can.part6 contains reads 600002 --- 699999
Sequel.RunS013.004.fq.can.part7 contains reads 700000 --- 739850
[partition_candidates] takes 6.05 secs.
[load_fasta_db] begins.
[load_fasta_db] takes 104.28 secs.
[processing Sequel.RunS013.004.fq.can.part0] begins.
[GetSequence, 33] [GetSequence, 33] [GetSequence, 33] [GetSequence, 33] assertion 'size == size_in_ovlp' failedassertion 'size == size_in_ovlp' failedassertion 'size == size_in_ovlp' failed
assertion 'size == size_in_ovlp' failed
[GetSequence, 33] [GetSequence, 33] assertion 'size == size_in_ovlp' failed

assertion 'size == size_in_ovlp' failed
[GetSequence, 33] Aborted (core dumped)

What is the matter? Please give me some advice to solve it. Thank you very much!

Wang

undefined reference to multiple functions when trying to make Dextract

Hi,

I am trying to install dextract following the instructions in the readme. However, when I type
make -f dextract_makefile
I get
gcc -O3 -Wall -Wextra -Wno-unused-result -fno-strict-aliasing -I/home/.../hdf5/include -L/home.../hdf5/lib -o dextract dextract.c DB.c QV.c -lhdf5
/tmp/ccwqsPQR.o: In function `main':
dextract.c:(.text.startup+0x111): undefined reference to `parse_filter'
dextract.c:(.text.startup+0x228): undefined reference to `initBaxData'
dextract.c:(.text.startup+0x2ee): undefined reference to `getBaxData'
dextract.c:(.text.startup+0x308): undefined reference to `nextSubread'
dextract.c:(.text.startup+0x32b): undefined reference to `nextSubread'
dextract.c:(.text.startup+0x34c): undefined reference to `evaluate_bax_filter'
dextract.c:(.text.startup+0xbfe): undefined reference to `sam_close'
dextract.c:(.text.startup+0xcec): undefined reference to `sam_open'
dextract.c:(.text.startup+0xd02): undefined reference to `sam_header_process'
dextract.c:(.text.startup+0xd36): undefined reference to `sam_record_extract'
dextract.c:(.text.startup+0xd49): undefined reference to `SAM_EOF'
dextract.c:(.text.startup+0xd5c): undefined reference to `evaluate_bam_filter'
dextract.c:(.text.startup+0x112b): undefined reference to `getBaxData'
dextract.c:(.text.startup+0x1303): undefined reference to `sam_open'
dextract.c:(.text.startup+0x1445): undefined reference to `parse_filter'
dextract.c:(.text.startup+0x1757): undefined reference to `printBaxError'
collect2: error: ld returned 1 exit status
make: *** [dextract] Error 1

Is there something I am missing?

Thanks

Updating to latest Canu version to support PBSpro

Hi,
Do you have any plans to update MECAT to the latest Canu version to support PBSpro?

Thank you in advance,

Michal

Are reads with repetitive regions filtered out by MECAT? We do want to error-correct repetitive regions

We select high-confidence reads, those with an overlap ratio of more than 0.9, for error correction, so reads with repetitive regions should be corrected well by these high-confidence reads. This was validated on the Arabidopsis thaliana dataset in our paper. Our error correction also yields more output than FALCON and Canu.

segmentation faults, crashes of mecat2asmpw, mecat2asmpwConvert, and ovStoreBuild

Using current version of MECAT.

MECAT output: mecat.log (note that some pathnames have been trimmed for readability)
Crashes start happening during "mecat2asmpw attempt 0" (Wed Sep 12 02:24:10 2018)

crashes from system log: messages.log

Segmentation fault during mecat2pw

Tried MECAT today. Keep getting a segmentation fault when running mecat2pw. This happens when I run it both locally and in the grid (SLURM). See below.

CentOS Linux release 7.2.1511 with gcc version 4.8.5.

-bash-4.2$ mecat2pw -j 0 -d reads.fasta -o reads.pm.can -w . -t 16
[split_raw_dataset] begins.
[split_raw_dataset, 264] split 'reads.fasta' (58523 reads, 587045250 nucls) into 1 volumes.
[split_raw_dataset] takes 6.95 secs.
./fileindex.txt
[create_ref_index] begins.
number of kmers: 527965185
16 threads are used for filling offset lists.
thread 0: 0     2925180
thread 1: 2925181       6716638
thread 2: 6716639       11639811
thread 3: 11639812      15479344
thread 4: 15479345      18937044
thread 5: 18937045      24412997
thread 6: 24412998      29871967
thread 7: 29871968      33818810
thread 8: 33818811      38575003
thread 9: 38575004      43806732
thread 10: 43806733     49092545
thread 11: 49092546     53228032
thread 12: 53228033     57123580
thread 13: 57123581     60952578
thread 14: 60952579     64466559
thread 15: 64466560     67108863
[create_ref_index] takes 204.85 secs.
[process volume 0] begins.
[process_one_volume, 834] processing ./vol0

Segmentation fault

mecat2ref error

Dear Dr Xiao:
An error occurred while I was using this software: "Segmentation fault".
All parameters were left at their default values.
I would also like to ask whether this software supports PBS.
Thanks

support for .gz

Will MECAT support the .gz format in the next version?

Compilation Error

Hi,
I git cloned and ran make. Came up against a problem:

git clone https://github.com/PacificBiosciences/DEXTRACTOR.git
Cloning into 'DEXTRACTOR'...
remote: Counting objects: 221, done.
remote: Total 221 (delta 0), reused 0 (delta 0), pack-reused 221
Receiving objects: 100% (221/221), 197.58 KiB | 0 bytes/s, done.
Resolving deltas: 100% (150/150), done.
Checking connectivity... done.
gcc -O3 -Wall -Wextra -Wno-unused-result -fno-strict-aliasing -I/share/michelmorelab/rwmwork/fletcher/CabernetPrograms/test/MECAT/aux_tools/hdf5/include -L/share/michelmorelab/rwmwork/fletcher/CabernetPrograms/test/MECAT/aux_tools/hdf5/lib -o /share/michelmorelab/rwmwork/fletcher/CabernetPrograms/test/MECAT/Linux-amd64/bin/dextract DEXTRACTOR/dextract.c DEXTRACTOR/DB.c DEXTRACTOR/QV.c -lhdf5
/usr/bin/ld: cannot open output file /share/michelmorelab/rwmwork/fletcher/CabernetPrograms/test/MECAT/Linux-amd64/bin/dextract: No such file or directory
collect2: error: ld returned 1 exit status
make[1]: *** [dextract] Error 1
make[1]: Leaving directory `/share/michelmorelab/rwmwork/fletcher/CabernetPrograms/test/MECAT/aux_tools/dextractor'
make: *** [dextract] Error 2

I was able to bypass it (I think) by setting BUILD_DIR in the primary makefile, but it might be good to fix this for future downloads. The program is currently running (mecat2pw).

Thanks
Kyle

Strongly shrinking after correction

My original PacBio data contains 28.4 Gb of reads longer than 5 kb. After correcting them with mecat2pw and mecat2cns using all default parameters, only 8.3 Gb is left. What could be the problem with my data?
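
To quantify this kind of shrinkage, the read and base counts of the input and of the corrected output can be compared directly. The sketch below is a plain-Python counter, not a MECAT tool; the function name `total_bases` is hypothetical, and it assumes FASTQ records occupy exactly four lines each.

```python
def total_bases(path):
    """Return (read_count, base_count) for a FASTA or 4-line FASTQ file."""
    reads = bases = 0
    with open(path) as handle:
        first = handle.read(1)
        handle.seek(0)
        if first == "@":
            # FASTQ: the sequence is the second line of every 4-line record.
            for i, line in enumerate(handle):
                if i % 4 == 1:
                    reads += 1
                    bases += len(line.strip())
        else:
            # FASTA: a sequence may span several lines after each '>' header.
            for line in handle:
                if line.startswith(">"):
                    reads += 1
                else:
                    bases += len(line.strip())
    return reads, bases


# Example use (hypothetical file names):
# _, raw = total_bases("raw_reads.fastq")
# _, corrected = total_bases("corrected_reads.fasta")
# print(f"retained fraction: {corrected / raw:.2f}")
```

For the figures quoted above, the retained fraction is 8.3 / 28.4, or roughly 29%.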

extract_sequences appends a fasta suffix

Hi
extract_sequences appends a .fasta suffix to the output name, so the created file is called output.fasta.fasta. Either that, or the documentation on GitHub has a mistake.

Thank you in advance.

Michal

How much coverage for extract_sequences

Hi,
Should 25X always be provided to MECAT, or is there a rule for identifying the optimal coverage?

Thank you in advance.

Michal
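
For reference, "N X coverage" means that the summed length of the reads is N times the genome size. The sketch below only illustrates that arithmetic; it assumes the longest reads are preferred, and it is not a description of how extract_sequences is implemented. Both function names are hypothetical.

```python
def coverage(total_bases, genome_size):
    """Coverage is the total number of bases divided by the genome size."""
    return total_bases / genome_size


def longest_reads_for_coverage(read_lengths, genome_size, target_coverage):
    """Pick the longest reads until their summed length reaches the target.

    Returns the chosen read lengths and their total length.
    """
    budget = target_coverage * genome_size
    chosen, total = [], 0
    for length in sorted(read_lengths, reverse=True):
        if total >= budget:
            break
        chosen.append(length)
        total += length
    return chosen, total


# Example: 25X of a 4.8 Mb genome corresponds to a budget of 120 Mb.
# chosen, total = longest_reads_for_coverage(lengths, 4_800_000, 25)
# print(coverage(total, 4_800_000))
```

If fewer bases are available than the budget requires, the function simply returns all reads, and `coverage` reports the value actually reached.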

Compilation error MacOS

Trying to build on Mac OS Sierra using gcc-6 and getting the following error at the dextractor build step. Same error if I use default clang to build.

echo /usr/local/MECAT
/usr/local/MECAT
cd aux_tools/dextractor && make PATH_HDF5=/usr/local/MECAT/aux_tools/hdf5 BUILD_DIR=/usr/local/MECAT/Darwin-amd64/bin
gcc-6 -O3 -Wall -Wextra -Wno-unused-result -fno-strict-aliasing -I/usr/local/MECAT/aux_tools/hdf5/include -L/usr/local/MECAT/aux_tools/hdf5/lib -o /usr/local/MECAT/Darwin-amd64/bin/dextract DEXTRACTOR/dextract.c DEXTRACTOR/DB.c DEXTRACTOR/QV.c -lhdf5
ld: can't open output file for writing: /usr/local/MECAT/Darwin-amd64/bin/dextract, errno=2 for architecture x86_64
collect2: error: ld returned 1 exit status
make[1]: *** [dextract] Error 1
make: *** [dextract] Error 2

perl Df.so undefined symbol

Hi,

I am trying to assemble corrected PacBio reads from a human sample and get the following error.

$ mecat2canu -trim-assemble -p simon10X -d SIMON genomeSize=3g ErrorRate=0.02 maxMemory=1500 maxThreads=128 useGrid=0 Overlapper=mecat2asmpw -pacbio-corrected simon_seeds_10X.fasta
perl: symbol lookup error: /data/Bioinfo/bioinfo-proj-jmontenegro/Programs/MECAT/Linux-amd64/bin/lib/canu/lib/perl5/x86_64-linux-thread-multi/auto/Filesys/Df/Df.so: undefined symbol: Perl_Gthr_key_ptr

Do you have any suggestion on what to do to solve it?

Cheers,

Juan Montenegro

problem with ‘failed to adjust overlap error rates’

Hi, when I tried to assemble PacBio reads with MECAT, the same problem happened again.
The MECAT version I used is v1.2, and the commands were as follows.

date
mecat2pw -j 0 -d 00.data/Emu_F_Pacbio_Copenhagen_merge_all.fastq -w wk.dir -t 24 -o emu.fastq.pm.can -n 50 -a 2000 -k 4 -g 0 -x 0
date
mecat2cns -x 0 -i 0 -t 24 -p 100000 -r 0.9 -a 2000 -c 6 -l 7000 emu.fastq.pm.can 00.data/Emu_F_Pacbio_Copenhagen_merge_all.fastq emu.corrected.fa
date
mecat2canu -trim-assemble -d wk_result -p emu errorRate=0.015 genomeSize=1.3g maxMemory=200 maxThreads=24 useGrid=0 Overlapper=mecat2asmpw -pacbio-corrected emu.corrected.fa
date

At the mecat2canu step, the following error appears, and there is no *.oea file in the directory. Could anyone give me some help? Many thanks.

61 overlap error adjustment jobs failed:
/public/home/lijing/01.sd.project/01.genome/01.assembly/02.mecat/01.emu_copenhagen/wk_result/unitigging/3-overlapErrorAdjustment/0001.oea FAILED.
...
Don't panic, but a mostly harmless error occurred and canu failed.
canu failed with 'failed to adjust overlap error rates. Made 2 attempts, jobs still failed'.
