Hi,
I am assembling a 150 Mb genome using 2x150 bp Illumina reads (~70x coverage) and PacBio reads at 25x coverage.
When running SparseAssembler, I use the following command:
./SparseAssembler g 15 k 99 LD 0 GS 310000000 NodeCovTh 1 EdgeCovTh 0 i1 ./lumina/lili/LILI_R1_001.fastq i2 ./illumina/lili/LILI_R2_001.fastq
The SparseAssembler step finishes but produces very small contigs. I tried different parameters; a higher k gives longer, but still small, contigs:
stats for Contigs.txt
sum = 307302645, n = 1810137, ave = 169.77, largest = 55932
N50 = 170, n = 434542
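For reference, figures like the ones above can be recomputed with a minimal sketch such as the following. It assumes Contigs.txt is FASTA-formatted and that N50 is the length of the contig at which the running total, taken from longest to shortest, first reaches half of the summed length. The function names are illustrative and not part of SparseAssembler.

```python
def contig_lengths(path):
    """Return the sequence lengths found in a FASTA file (assumed format)."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return (N50, number of contigs needed to reach half the total length)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    return 0, 0  # empty input


# Hypothetical usage:
# lengths = contig_lengths("Contigs.txt")
# print(sum(lengths), len(lengths), max(lengths), n50(lengths))
```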
When I then run DBG2OLC with the following command:
./DBG2OLC k 17 AdaptiveTh 0.0001 KmerCovTh 2 MinOverlap 25 RemoveChimera 1 Contigs ./data/sparseAssembler/Contigs.txt f ./pacbio/LILI_cell1_m54278_180927_190653.subreads.fasta
I get the following stdout:
Example command:
For third-gen sequencing: DBG2OLC LD1 0 Contigs contig.fa k 17 KmerCovTh 2 MinOverlap 20 AdaptiveTh 0.005 f reads_file1.fq/fa f reads_file2.fq/fa
For sec-gen sequencing: DBG2OLC LD1 0 Contigs contig.fa k 31 KmerCovTh 0 MinOverlap 50 PathCovTh 1 f reads_file1.fq/fa f reads_file2.fq/fa
Parameters:
MinLen: min read length for a read to be used.
Contigs: contig file to be used.
k: k-mer size.
LD: load compressed reads information. You can set to 1 if you have run the algorithm for one round and just want to fine tune the following parameters.
PARAMETERS THAT ARE CRITICAL FOR THE PERFORMANCE:
If you have high coverage, set large values to these parameters.
KmerCovTh: k-mer matching threshold for each solid contig. (suggest 2-10)
MinOverlap: min matching k-mers for each two reads. (suggest 10-150)
AdaptiveTh: [Specific for third-gen sequencing] adaptive k-mer threshold for each solid contig. (suggest 0.001-0.02)
PathCovTh: [Specific for Illumina sequencing] occurence threshold for a compressed read. (suggest 1-3)
Author: Chengxi Ye [email protected].
last update: Jun 11, 2015.
Loading contigs.
80213454 k-mers in round 1.
41789206 k-mers in round 2.
Analyzing reads...
File1: /home/ls752/genomes/data/raw_data/pacbio/LILI_cell1_m54278_180927_190653.subreads.fasta
Long reads indexed.
Total Kmers: 0
Matching Unique Kmers: 0
Compression time: 0 secs.
Scoring method: 3
Match method: 2
Loading long read index
Loading file: ReadsInfoFrom_LILI_cell1_m54278_180927_190653.subreads.fasta
0 reads loaded.
And the following stderr:
/var/spool/slurmd/job32968/slurm_script: line 15: 44064 Floating point exception(core dumped) ./DBG2OLC k 17 AdaptiveTh 0.0001 KmerCovTh 2 MinOverlap 25 RemoveChimera 1 Contigs ./sparseAssembler/Contigs.txt f ./pacbio/LILI_cell1_m54278_180927_190653.subreads.fasta
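Since the log reports "Total Kmers: 0" and "0 reads loaded", one thing worth ruling out is that the long-read file is empty or not plain FASTA at the path given. The following is only a minimal sketch of such a check, assuming an uncompressed FASTA file; the function name is made up for illustration and is unrelated to DBG2OLC.

```python
def fasta_summary(path):
    """Count records and total bases in an uncompressed FASTA file (assumed format)."""
    records = 0
    bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                records += 1  # header line starts a new read
            else:
                bases += len(line)  # sequence line, possibly wrapped
    return records, bases


# Hypothetical usage with the file named in the command above:
# print(fasta_summary("./pacbio/LILI_cell1_m54278_180927_190653.subreads.fasta"))
```

A result of zero records would mean the reads are not being read from that path as FASTA at all, which would point at the input rather than at the parameters.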
Am I overlooking something? Would you suggest any changes to my parameters to increase the contig length?
Many thanks for creating and maintaining a hybrid assembler.
Best regards,
L