(WIP) A hifiasm fork (https://github.com/chhylp123/hifiasm) for metagenome assembly.
# Install hifiasm-meta (g++ and zlib required)
git clone https://github.com/xfengnefx/hifiasm-meta.git
cd hifiasm-meta && make
# Run
hifiasm_meta -t32 -oasm reads.fq.gz 2>asm.log
hifiasm_meta -t32 -S -o asm reads.fq.gz 2>asm.log  # use -S if the dataset has high redundancy, or if overlap and error correction take far too long
Raw unitig graph: asm.r_utg*.gfa
Cleaned unitig graph: asm.p_utg*.gfa
Contig graph: asm.p_ctg*.gfa and asm.a_ctg*.gfa
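Downstream tools often expect FASTA rather than GFA. The sketch below is not part of hifiasm-meta; it assumes only that the graphs follow the standard GFA layout, where each `S` line carries a segment name and its sequence in tab-separated columns. The function names and the demo segment names are made up for illustration. For a real run, open whichever `asm.p_ctg*.gfa` file was produced instead of the in-memory string.

```python
import io

def gfa_segments(handle):
    """Yield (name, sequence) for every S (segment) line of a GFA stream."""
    for line in handle:
        if not line.startswith("S\t"):
            continue  # skip header (H), link (L) and any other record types
        fields = line.rstrip("\n").split("\t")
        yield fields[1], fields[2]

def gfa_to_fasta(gfa_handle, fasta_handle, width=80):
    """Write every segment as a FASTA record; return the number of records."""
    n = 0
    for name, seq in gfa_segments(gfa_handle):
        fasta_handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            fasta_handle.write(seq[i:i + width] + "\n")
        n += 1
    return n

# Tiny in-memory example; segment names and sequences are hypothetical.
demo = (
    "H\tVN:Z:1.0\n"
    "S\ts0.ctg000001l\tACGTACGT\tLN:i:8\n"
    "S\ts1.ctg000002c\tGGCC\n"
    "L\ts0.ctg000001l\t+\ts1.ctg000002c\t-\t0M\n"
)
out = io.StringIO()
count = gfa_to_fasta(io.StringIO(demo), out, width=4)
print(count)
print(out.getvalue(), end="")
```

The generator keeps only one line in memory at a time, which matters for graphs with long contigs.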
Unitig/Contig naming: ^s[0-9]+\.[uc]tg[0-9]{6}[lc], where the s[0-9]+ prefix is a subgraph label.
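The naming pattern above can be used to group sequences by subgraph. The following is a minimal sketch, not code from hifiasm-meta: the helper names are hypothetical, and the trailing `l`/`c` is returned as-is because this document does not spell out its meaning (in stable hifiasm it marks linear versus circular, which is assumed, not confirmed, to carry over here).

```python
import re
from collections import defaultdict

# Same pattern as in the README, with capture groups added.
NAME_RE = re.compile(r"^s([0-9]+)\.([uc])tg([0-9]{6})([lc])")

def parse_name(name):
    """Split a unitig/contig name into its parts, or return None if it does not match."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    subgraph, kind, serial, suffix = m.groups()
    return {
        "subgraph": int(subgraph),                      # the s[0-9]+ label
        "kind": "unitig" if kind == "u" else "contig",  # utg vs ctg
        "serial": int(serial),                          # six-digit number
        "suffix": suffix,                               # raw 'l' or 'c'
    }

def group_by_subgraph(names):
    """Map subgraph label -> list of names carrying that label."""
    groups = defaultdict(list)
    for name in names:
        parsed = parse_name(name)
        if parsed is not None:
            groups[parsed["subgraph"]].append(name)
    return dict(groups)

# Hypothetical names, for illustration only.
names = ["s12.ctg000345c", "s12.ctg000346l", "s3.utg000001l", "not_a_contig"]
print(parse_name(names[0]))
print(group_by_subgraph(names))
```

Names that share a subgraph label were connected in the assembly graph, which is a quick way to spot unseparated strains.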
Based on the limited available test data, real datasets are unlikely to require read selection; mock datasets, however, might need it.
Non-release commits may have extra debug outputs for dev/debug purposes, even without -V.
Bin files are one-way compatible with the stable hifiasm for now: stable hifiasm can use hifiasm_meta's bin files, but not vice versa, because hifiasm_meta needs to store extra information from the overlap and error correction step.
See also README_ha.md, the stable hifiasm doc.
# Interface
-B Name of bin files. Allows using bin files from other directories.
# Read selection
-S Enable read selection.
--force-preovec Force kmer frequency-based read selection. (Otherwise, selection is skipped if the total number of read overlaps looks realistic.)
--lowq-10 Lower 10% quantile kmer frequency threshold, runtime. A lower value means fewer reads kept, if read selection is triggered. [150]
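To make "lower 10% quantile kmer frequency threshold" concrete, here is a toy illustration. It is explicitly not hifiasm_meta's implementation: the function names, the way the quantile is indexed, and the keep rule (a read is kept when its lower-10% quantile kmer frequency is at or below the threshold) are all assumptions, chosen only to be consistent with the statement that a lower value keeps fewer reads.

```python
from collections import Counter

def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def lower_quantile_freq(read, counts, k, q=0.10):
    """Dataset-wide frequency found at the lower q quantile of a read's k-mers."""
    freqs = sorted(counts[km] for km in kmers(read, k))
    if not freqs:
        return 0  # read shorter than k
    return freqs[int(q * (len(freqs) - 1))]

def select_reads(reads, k, threshold):
    """ASSUMED rule: keep reads whose lower-quantile k-mer frequency is <= threshold."""
    counts = Counter(km for r in reads for km in kmers(r, k))
    return [r for r in reads if lower_quantile_freq(r, counts, k) <= threshold]

# Five copies of an "abundant" read plus one "rare" read (made-up sequences).
reads = ["ACGTACGT"] * 5 + ["TTGCAAGC"]
print(len(select_reads(reads, k=4, threshold=5)))
print(len(select_reads(reads, k=4, threshold=3)))
```

Under this assumed rule, reads from highly covered (redundant) regions are the ones dropped, and lowering the threshold drops more of them.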
Read selection needs to be sped up; it currently contains a blocking sequential step.
Sheep fecal material dataset: wall clock 17.2 h on 48 cpus, peak memory 183.3 GB.
(Figure: Bandage plot of the primary contig graph.)
Evaluation tool | Criteria | Count | Bases or relative ratio |
---|---|---|---|
custom | contig >1Mb | 320 | 638Mb |
custom | contig >100kb | 3307 | 1.30Gb |
custom | circular contig >1Mb | 147 | 340Mb |
Barrnap | contig with all three types of rRNA | 1012 | 643Mb |
CheckM | genome completeness >90% | 177 | 423Mb |
CheckM | of the above, contamination >5% | 2 | 1.10% |
CheckM | genome completeness >25% | 444 | 706Mb |
CheckM | of the above, contamination >5% | 5 | 1.10% |
CheckV | high quality virus genome* | 186 | 10Mb |
prodigal | genes predicted | 27039 | - |
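The "custom" rows are simple length summaries. A sketch of how such counts could be reproduced from contig names and lengths is shown below; it is not the script used for the table. It assumes that 1 Mb means 1,000,000 bp, that the thresholds are strict (">"), and that a trailing `c` in the contig name marks a circular contig. The input records are invented.

```python
def summarize(contigs, min_len, circular_only=False):
    """Return (count, total bases) of contigs strictly longer than min_len.

    contigs: iterable of (name, length) pairs.
    circular_only: if True, keep only names ending in 'c' (assumed to mean circular).
    """
    picked = [
        length
        for name, length in contigs
        if length > min_len and (not circular_only or name.endswith("c"))
    ]
    return len(picked), sum(picked)

# Invented (name, length) records, for illustration only.
contigs = [
    ("s0.ctg000001c", 2_500_000),
    ("s0.ctg000002l", 1_200_000),
    ("s1.ctg000003l", 150_000),
    ("s2.ctg000004c", 40_000),
]
print(summarize(contigs, 1_000_000))
print(summarize(contigs, 100_000))
print(summarize(contigs, 1_000_000, circular_only=True))
```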
Mock community ATCC MSA-1003 (with -S --lowq-10 50)
Strain | Abundance | Assembly status | Strain | Abundance | Assembly status |
---|---|---|---|---|---|
Acinetobacter baumannii | 0.18% | pass | Lactobacillus gasseri | 0.18% | pass |
Bacillus cereus | 1.80% | pass | Neisseria meningitidis | 0.18% | pass |
Bacteroides vulgatus | 0.02% | fragmented | Porphyromonas gingivalis | 18.00% | pass* |
Bifidobacterium adolescentis | 0.02% | lost | Pseudomonas aeruginosa | 1.80% | pass |
Clostridium beijerinckii | 1.80% | pass | Rhodobacter sphaeroides | 18.00% | pass |
Cutibacterium acnes | 0.18% | pass | Schaalia odontolytica | 0.02% | lost |
Deinococcus radiodurans | 0.02% | fragmented | Staphylococcus aureus | 1.80% | pass |
Enterococcus faecalis | 0.02% | fragmented | Staphylococcus epidermidis | 18.00% | pass |
Escherichia coli | 18.00% | pass | Streptococcus agalactiae | 1.80% | pass* |
Helicobacter pylori | 0.18% | pass | Streptococcus mutans | 18.00% | pass |
*: not one clean circle.
Mock community Zymo D6331, standard input library: wall clock 15.7 h on 32 cpus, peak memory 121.7 GB.
Strains | Abundance | Assembly status | Strains | Abundance | Assembly status |
---|---|---|---|---|---|
Akkermansia muciniphila | 1.36% | pass | Escherichia coli JM109 | 8.37% | unseparated* |
Bacteroides fragilis | 13.13% | pass | Faecalibacterium prausnitzii | 14.39% | pass |
Bifidobacterium adolescentis | 1.34% | pass | Fusobacterium nucleatum | 3.78% | pass |
Candida albicans | 1.61% | fragmented | Lactobacillus fermentum | 0.86% | pass |
Clostridioides difficile | 1.83% | pass | Methanobrevibacter smithii | 0.04% | 3 contigs |
Clostridium perfringens | 0.00% | lost | Prevotella corporis | 5.37% | partial |
Enterococcus faecalis | 0.00% | lost | Roseburia hominis | 3.88% | pass |
Escherichia coli B1109 | 8.44% | unseparated* | Saccharomyces cerevisiae | 0.18% | fragmented |
Escherichia coli B2207 | 8.32% | pass | Salmonella enterica | 0.02% | unseparated* |
Escherichia coli B3008 | 8.25% | unseparated* | Veillonella rogosae | 11.02% | pass |
Escherichia coli B766 | 7.83% | pass | | | |
*: E. coli strains other than B2207 and B766 are present in one subgraph.