hifiasm-meta

(WIP) A hifiasm fork (https://github.com/chhylp123/hifiasm) for metagenome assembly.

Getting Started

# Install hifiasm-meta (g++ and zlib required)
git clone https://github.com/xfengnefx/hifiasm-meta.git
cd hifiasm-meta && make

# Run
hifiasm_meta -t32 -o asm reads.fq.gz 2>asm.log
hifiasm_meta -t32 -S -o asm reads.fq.gz 2>asm.log   # enable read selection (-S) if the dataset is highly redundant, or if overlap and error correction take too long

Output files

Raw unitig graph: asm.r_utg*.gfa

Cleaned unitig graph: asm.p_utg*.gfa

Contig graph: asm.p_ctg*.gfa and asm.a_ctg*.gfa

Unitig/contig naming: ^s[0-9]+\.[uc]tg[0-9]{6}[lc], where s[0-9]+ is the subgraph label and the trailing l or c marks a linear or circular sequence.
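The GFA outputs can be inspected with standard text tools. The following is a minimal sketch in POSIX sh and awk. It assumes that sequences are stored inline in column 3 of each GFA S line and that the last letter of a name is l (linear) or c (circular). The helper names gfa_to_fasta and ctg_by_subgraph are hypothetical and are not part of hifiasm-meta.

```shell
#!/bin/sh
# Hypothetical helpers for hifiasm-meta GFA output (not shipped with hifiasm-meta).
# Assumes sequences are inline in column 3 of "S" lines and that the last
# letter of a contig name is l (linear) or c (circular).

# Print every segment of a GFA file as a FASTA record.
gfa_to_fasta() {
  awk -F'\t' '$1 == "S" { print ">" $2; print $3 }' "$1"
}

# Print "subgraph<TAB>linear<TAB>circular" counts per subgraph label.
ctg_by_subgraph() {
  awk -F'\t' '
    $1 == "S" && $2 ~ /^s[0-9]+\.[uc]tg[0-9][0-9][0-9][0-9][0-9][0-9][lc]$/ {
      sg = $2
      sub(/\..*$/, "", sg)              # keep only the s[0-9]+ prefix
      seen[sg] = 1
      if ($2 ~ /c$/) circ[sg]++
      else lin[sg]++
    }
    END {
      for (sg in seen) printf "%s\t%d\t%d\n", sg, lin[sg], circ[sg]
    }
  ' "$1"
}

# Example usage (hifiasm-meta writes asm.p_ctg*.gfa; adjust the file name):
#   gfa_to_fasta asm.p_ctg.gfa > asm.p_ctg.fa
#   ctg_by_subgraph asm.p_ctg.gfa | sort
```

The output order of the for-in loop in awk is unspecified, so the output should be piped through sort when a stable order matters.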

Special Notes

Based on the limited available test data, real datasets are unlikely to require read selection; mock datasets, however, might need it.

Non-release commits may print extra debug output, even without -V.

Bin files are one-way compatible with stable hifiasm for now: stable hifiasm can use hifiasm_meta's bin files, but not vice versa, because hifiasm_meta stores extra information from the overlap and error-correction step.

Switches

See also README_ha.md, the stable hifiasm doc.

# Interface
-B               Name of bin files. Allows using bin files from other
                 directories.

# Read selection
-S               Enable read selection.
--force-preovec  Force k-mer frequency-based read selection
                 (otherwise, selection is skipped if the total number
                 of read overlaps looks realistic).
--lowq-10        Lower 10% quantile k-mer frequency threshold, runtime.
                 A lower value means fewer reads are kept, if read
                 selection is triggered. [150]
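The --lowq-10 description can be read as a per-read test on k-mer frequencies. The following is a hypothetical illustration of that reading, not hifiasm-meta's implementation. It counts k-mers naively in awk, using the forward strand only and no canonical k-mers, then takes the lower 10% quantile of each read's k-mer frequencies. A read is kept only when that quantile is below the threshold, which reproduces the stated behavior that a lower value keeps fewer reads. How hifiasm-meta actually treats reads above the threshold is not described here, and the function name lowq10_filter is made up.

```shell
#!/bin/sh
# Hypothetical illustration of a "lower 10% quantile k-mer frequency"
# threshold; NOT hifiasm-meta's read-selection code. K-mers are counted
# naively on the forward strand only, so this is for toy inputs.
# Usage: lowq10_filter K THRESHOLD FILE   (FILE: one read sequence per line)
# Prints the reads it would keep: those whose lower 10% quantile k-mer
# frequency is below THRESHOLD.
lowq10_filter() {
  awk -v k="$1" -v thresh="$2" '
    NR == FNR {                          # pass 1: count every k-mer
      for (i = 1; i + k - 1 <= length($0); i++) cnt[substr($0, i, k)]++
      next
    }
    {                                    # pass 2: score each read
      n = 0
      for (i = 1; i + k - 1 <= length($0); i++) f[++n] = cnt[substr($0, i, k)]
      if (n == 0) next                   # read shorter than k
      for (i = 2; i <= n; i++) {         # insertion sort, ascending
        v = f[i]; j = i - 1
        while (j >= 1 && f[j] > v) { f[j + 1] = f[j]; j-- }
        f[j + 1] = v
      }
      q = f[int((n - 1) * 0.1) + 1]      # nearest-rank 10% quantile
      if (q < thresh) print
    }
  ' "$3" "$3"
}

# Example on an unwrapped 4-line FASTQ (toy sizes only):
#   gzip -dc reads.fq.gz | awk 'NR % 4 == 2' > reads.txt
#   lowq10_filter 21 150 reads.txt > kept.txt
```

Reads from highly redundant regions have high frequencies even at their low quantile, so they are the first to fail the test as the threshold decreases.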

Known issues

Read selection needs to be sped up; it currently contains a blocking sequential step.

Preliminary results

Sheep fecal material dataset: wall clock 17.2 h on 48 cpus, peak memory 183.3 GB.

[Figure: Bandage plot of the primary contig graph]

Evaluation tool  Criteria                             Count  Bases or relative ratio
custom           contig >1Mb                            320  638Mb
                 contig >100kb                         3307  1.30Gb
                 circular contig >1Mb                   147  340Mb
Barrnap          contig with all three types of rRNA   1012  643Mb
CheckM           genome completeness >90%               177  423Mb
                 (of these) contamination >5%             2  1.10%
                 genome completeness >25%               444  706Mb
                 (of these) contamination >5%             5  1.10%
CheckV           high quality virus genome*             186  10Mb
prodigal         genes predicted                      27039  -
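The custom rows above (contig >1Mb, contig >100kb, circular contig >1Mb) are simple length tallies. The script used for this table is not given, so the following is only a sketch. It assumes that 1Mb means 1,000,000 bp and that contig length is the length of the inline sequence, or the LN:i: tag when the sequence is *. It also assumes circular contigs are those whose names end in c. The helper name ctg_stats is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count contigs longer than MIN_LEN bp in a GFA file
# and sum their lengths. Assumes the sequence is inline in column 3 of "S"
# lines (or "*" with an LN:i: tag) and that circular contig names end in c.
# Usage: ctg_stats FILE MIN_LEN [circular]
# Output: "<count><TAB><total bases>"
ctg_stats() {
  awk -F'\t' -v min="$2" -v mode="${3:-all}" '
    $1 != "S" { next }
    {
      len = length($3)
      if ($3 == "*") {                   # sequence omitted: use LN:i: tag
        len = 0
        for (i = 4; i <= NF; i++)
          if ($i ~ /^LN:i:/) len = substr($i, 6) + 0
      }
      if (mode == "circular" && $2 !~ /c$/) next
      if (len > min) { n++; bases += len }
    }
    END { printf "%d\t%.0f\n", n, bases }
  ' "$1"
}

# Example (1Mb taken as 1,000,000 bp; adjust the file name to asm.p_ctg*.gfa):
#   ctg_stats asm.p_ctg.gfa 1000000            # contig >1Mb
#   ctg_stats asm.p_ctg.gfa 100000             # contig >100kb
#   ctg_stats asm.p_ctg.gfa 1000000 circular   # circular contig >1Mb
```

The total uses %.0f rather than %d because some awk implementations overflow %d past 2^31, and metagenome assemblies easily exceed that many bases.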

Mock community ATCC MSA-1003 (with -S --lowq-10 50)

Strain                        Abundance  Assembly status
Acinetobacter baumannii       0.18%      pass
Bacillus cereus               1.80%      pass
Bacteroides vulgatus          0.02%      fragmented
Bifidobacterium adolescentis  0.02%      lost
Clostridium beijerinckii      1.80%      pass
Cutibacterium acnes           0.18%      pass
Deinococcus radiodurans       0.02%      fragmented
Enterococcus faecalis         0.02%      fragmented
Escherichia coli              18.00%     pass
Helicobacter pylori           0.18%      pass
Lactobacillus gasseri         0.18%      pass
Neisseria meningitidis        0.18%      pass
Porphyromonas gingivalis      18.00%     pass*
Pseudomonas aeruginosa        1.80%      pass
Rhodobacter sphaeroides       18.00%     pass
Schaalia odontolytica         0.02%      lost
Staphylococcus aureus         1.80%      pass
Staphylococcus epidermidis    18.00%     pass
Streptococcus agalactiae      1.80%      pass*
Streptococcus mutans          18.00%     pass

*: not one clean circle.

Mock community Zymo D6331, standard input library: wall clock 15.7 h on 32 cpus, peak memory 121.7 GB.

Strain                        Abundance  Assembly status
Akkermansia muciniphila       1.36%      pass
Bacteroides fragilis          13.13%     pass
Bifidobacterium adolescentis  1.34%      pass
Candida albicans              1.61%      fragmented
Clostridioides difficile      1.83%      pass
Clostridium perfringens       0.00%      lost
Enterococcus faecalis         0.00%      lost
Escherichia coli B1109        8.44%      unseparated*
Escherichia coli B2207        8.32%      pass
Escherichia coli B3008        8.25%      unseparated*
Escherichia coli B766         7.83%      pass
Escherichia coli JM109        8.37%      unseparated*
Faecalibacterium prausnitzii  14.39%     pass
Fusobacterium nucleatum       3.78%      pass
Lactobacillus fermentum       0.86%      pass
Methanobrevibacter smithii    0.04%      3 contigs
Prevotella corporis           5.37%      partial
Roseburia hominis             3.88%      pass
Saccharomyces cerevisiae      0.18%      fragmented
Salmonella enterica           0.02%      unseparated*
Veillonella rogosae           11.02%     pass

*: E. coli strains other than B2207 and B766 appear in a single subgraph.

Contributors

chhylp123, lh3, xfengnefx
