CLASSICO

CLASSICO is a tool for propagating SNPs along a Newick tree. Based on the distribution of SNPs given in a SNP table, the tool identifies the branches that lead to the root nodes of monophyletic, paraphyletic and polyphyletic clades of SNPs. The algorithm first distributes the SNPs within the reconstructed clades with Fitch's algorithm [1]. However, nodes that Fitch would label at random, because multiple bases are possible for them, are left ambiguous. Unresolved bases are only propagated as long as all children of a node are labeled with an unresolved base. Then, the branches that lead to the root nodes of monophyletic, paraphyletic and polyphyletic clades are computed and saved. In addition, CLASSICO reports statistics on the allele count and the SNP count for each clade type.
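The Fitch step with ambiguity left in place can be sketched as follows. This is a minimal Java illustration under stated assumptions, not CLASSICO's implementation: the class `FitchSketch`, its `Node` type and the methods `bottomUp`, `topDown` and `isUnresolved` are hypothetical, only one SNP position is handled, and CLASSICO's specific rule for propagating unresolved bases is not modeled. The bottom-up pass is the standard Fitch step (intersection of the children's base sets if non-empty, otherwise their union). In the top-down pass a node takes its parent's base when that base is among its candidates, and a node left with more than one candidate stays ambiguous instead of receiving a random base.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Fitch's algorithm for one SNP position, where
// ambiguous nodes are kept ambiguous instead of being resolved at random.
final class FitchSketch {
    enum Base { A, C, G, T }

    static final class Node {
        final String name;
        final Base observed;                 // only set for leaves (from the SNP table)
        final List<Node> children = new ArrayList<>();
        Set<Base> candidates;                // filled in by bottomUp

        Node(String name) { this(name, null); }
        Node(String name, Base observed) { this.name = name; this.observed = observed; }

        Node add(Node child) { children.add(child); return this; }
    }

    // Bottom-up pass: intersect the children's candidate sets if the
    // intersection is non-empty, otherwise take their union.
    static Set<Base> bottomUp(Node node) {
        if (node.children.isEmpty()) {
            node.candidates = EnumSet.of(node.observed);
            return node.candidates;
        }
        Set<Base> inter = null;
        Set<Base> union = EnumSet.noneOf(Base.class);
        for (Node child : node.children) {
            Set<Base> s = bottomUp(child);
            union.addAll(s);
            if (inter == null) inter = EnumSet.copyOf(s);
            else inter.retainAll(s);
        }
        node.candidates = inter.isEmpty() ? union : inter;
        return node.candidates;
    }

    // Top-down pass: a node adopts its parent's base if that base is one of
    // its candidates; otherwise its candidate set is left unchanged.
    static void topDown(Node node, Base parentBase) {
        if (parentBase != null && node.candidates.contains(parentBase)) {
            node.candidates = EnumSet.of(parentBase);
        }
        Base here = node.candidates.size() == 1 ? node.candidates.iterator().next() : null;
        for (Node child : node.children) topDown(child, here);
    }

    // More than one candidate base means the node stays unresolved.
    static boolean isUnresolved(Node node) {
        return node.candidates.size() > 1;
    }
}
```

For the tree ((A,A),(C,G)), the root ends up with the candidates {A, C, G} and the (C,G) node with {C, G}, so both are reported as unresolved, while the (A,A) node is resolved to A. For ((A,C),A), the (A,C) node starts with {A, C} after the bottom-up pass and is resolved to A by the top-down pass, because its parent is A.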

Additionally, CLASSICO provides the option to resolve unresolved bases based on nodes close to the unresolved base in the phylogenetic tree, the so-called neighborhood of a node. Three methods for defining the neighborhood are implemented: the only-parent, parent-sibling and cladewise method. Each method extends the neighborhood iteratively until its depth equals the specified relative maximum depth parameter. For each base, a score is computed in which nodes closer to the unresolved base are weighted more heavily than nodes further away. The base with the highest score becomes the resolved base for the entire clade of unresolved bases; if two or more bases share the highest score, the base remains unresolved. After the resolution, CLASSICO propagates the SNPs and computes the phylogenetic clades again.
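The scoring and tie rule can be illustrated with a small sketch. The exact weighting CLASSICO uses is not given here, so this Java example assumes a linear weight of `maxDepth - depth + 1` and takes an absolute `maxDepth` instead of the relative `relmaxdepth` value. The names `ResolveSketch`, `Neighbor` and `resolve` are hypothetical, and collecting the neighborhood with the only-parent, parent-sibling or cladewise method is left out.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of neighborhood-based resolution; not CLASSICO's code.
final class ResolveSketch {
    enum Base { A, C, G, T }

    // A labeled node in the neighborhood and its distance, in tree levels,
    // from the clade of unresolved bases (1 = closest).
    record Neighbor(Base base, int depth) { }

    // Assumed linear weighting: depth 1 gets weight maxDepth, depth maxDepth
    // gets weight 1, so closer nodes count more.
    static Optional<Base> resolve(List<Neighbor> neighborhood, int maxDepth) {
        Map<Base, Integer> score = new EnumMap<>(Base.class);
        for (Neighbor n : neighborhood) {
            if (n.depth() < 1 || n.depth() > maxDepth) continue; // outside the neighborhood
            score.merge(n.base(), maxDepth - n.depth() + 1, Integer::sum);
        }
        Base best = null;
        int bestScore = Integer.MIN_VALUE;
        boolean tie = false;
        for (Map.Entry<Base, Integer> e : score.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
                tie = false;
            } else if (e.getValue() == bestScore) {
                tie = true; // another base reaches the current maximum
            }
        }
        // No candidate, or several bases share the highest score: stay unresolved.
        return (best == null || tie) ? Optional.empty() : Optional.of(best);
    }
}
```

With `maxDepth = 3`, neighbors A at depth 1 and C at depth 2 give A a score of 3 and C a score of 2, so A is chosen. Adding a further C at depth 3 raises C to 3 as well, and the tie leaves the base unresolved. Integer weights are used so that ties are detected exactly rather than through floating-point comparison.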

Flags

Required:

  • snptable: Path to SNP Table (see example file Data/mini_snp.tsv)
  • nwk: Path to Newick file (see example file Data/mini_nwk.nwk)
  • out: Output directory

Optional:

  • clades: Set of phylogenetic clade types to compute, i.e. monophyletic, paraphyletic and polyphyletic clades (default: mono, para, poly)
  • resolve: Specifies whether unresolved bases should be resolved (default: false)
  • method: Neighborhood extension method, i.e. only-parent, parent-sibling or cladewise (default: only-parent)
  • relmaxdepth: Relative maximum depth, a value in the range 0 to 1 (default: 0.2)
  • help: Prints the help menu

Output files

Standard:

  • mono.txt: List of monophyletic roots
  • para.txt: List of paraphyletic roots
  • poly.txt: List of polyphyletic roots
  • IDdistribution.txt: Distribution of internal IDs to taxa labels
  • Statistics.txt: Allele and SNP statistics

Additional output if resolution specified:

  • [FilenameSNPTable]_resolved.tsv: SNP table after resolution
  • mono_resolved.txt: List of monophyletic roots after resolution
  • para_resolved.txt: List of paraphyletic roots after resolution
  • poly_resolved.txt: List of polyphyletic roots after resolution
  • Statistics_resolved.txt: Allele and SNP statistics after resolution

Compilation

The .jar file was built using Java version 17.0.5. To build the tool for another Java version, run the following commands:

cd src

javac -cp ../lib/jcommander-1.82.jar *.java

jar cvfm classicoV2.jar META-INF/MANIFEST.MF *

  • Note: The compiled version requires the jcommander library to be present in the expected directory (../lib/).

Running jar

Simple Example:

java -jar src/classicoV2.jar --snptable Data/mini_snp.tsv --nwk Data/mini_nwk.nwk --out Data

Advanced example with resolution of unresolved bases:

java -jar src/classicoV2.jar --snptable Data/mini_snp.tsv --nwk Data/mini_nwk.nwk --out Data --resolve --method cladewise --relmaxdepth 0.5

Repository structure

The source code and a compiled .jar file are in the src directory. The lib directory contains the jcommander framework (https://jcommander.org), which is used for parsing the input parameters. The Analysis directory stores the scripts and resulting plots of all analyses. The Data directory contains a mini example, the validation dataset, the Mycobacterium leprae and Treponema pallidum datasets, and the additional Mycobacterium leprae dataset used for the runtime and memory analysis. The outputs are also stored in the Data directory.

References

[1] Walter M. Fitch. Toward defining the course of evolution: minimum change for a specific tree topology. Systematic Biology, 20(4):406–416, 1971.

Contributors

neelie01, mwittep, thharbig