
seqalign's Introduction

This repository contains my solutions to three sequence alignment problems: pairwise, triple, and multiple sequence alignment. All three programs read their input sequences in FASTA format and write the resulting alignment in FASTA format. In all three algorithms, the indel rate is converted to a gap penalty by taking the base-2 logarithm of the specified indel rate. Gap penalties are linear; I do not use affine gap penalties. The algorithms currently use only the BLOSUM62 matrix for (mis)match scores, but this may be expanded in the future.
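The indel-rate conversion described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the function name `gap_penalty` and the range check are assumptions made for the example. Note that with a base-2 logarithm an indel rate of 0.5 gives a penalty of exactly -1, while 0.1 gives roughly -3.32; a penalty of -1 for a rate of 0.1 would correspond to a base-10 logarithm, so it is worth checking the scripts to see which base is actually applied.

```python
import math


def gap_penalty(indel_rate: float) -> float:
    """Convert an indel rate in (0, 1] to a non-positive gap penalty.

    Hypothetical helper: uses the base-2 logarithm, as the text describes.
    """
    if not 0.0 < indel_rate <= 1.0:
        raise ValueError("indel rate must be in the interval (0, 1]")
    return math.log2(indel_rate)


# A rate of 0.5 maps to exactly -1 under log base 2.
print(gap_penalty(0.5))
# A rate of 0.1 maps to about -3.32 under log base 2 (and to -1 under log base 10).
print(round(gap_penalty(0.1), 2), math.log10(0.1))
```

Because the penalty is a single per-gap-character value, a gap of length k costs k times this value (a linear, non-affine scheme).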

  • pairSeqAlign: Perform pairwise sequence alignment

    • This is simply an implementation of the Needleman-Wunsch algorithm (exact solution)
    • Usage: pairSeqAlign.py [-h] [-i INPUT] [-o OUTPUT] [-p INDEL]
      • If no input file is specified (via -i), the program reads from standard input
      • If no output file is specified (via -o), the program writes to standard output
      • If no indel rate is specified, a default value of 0.1 is used (which corresponds to a gap penalty of -1)
    • Run ./pairSeqAlign.py -h or ./pairSeqAlign.py --help for usage help
  • tripleSeqAlign: Perform triple sequence alignment

    • This is simply a three-dimensional extension of the Needleman-Wunsch algorithm (exact solution)
    • Usage: tripleSeqAlign.py [-h] [-i INPUT] [-o OUTPUT] [-p INDEL]
      • If no input file is specified (via -i), the program reads from standard input
      • If no output file is specified (via -o), the program writes to standard output
      • If no indel rate is specified, a default value of 0.1 is used (which corresponds to a gap penalty of -1)
    • Run ./tripleSeqAlign.py -h or ./tripleSeqAlign.py --help for usage help
  • multiSeqAlign: Perform multiple sequence alignment

    • This is a progressive alignment heuristic (inexact solution)
    • First, a distance matrix is computed from the sequence data
      • If a tree is provided (via -t), the distance matrix is computed from the tree
      • If a tree is not provided, the distance matrix is computed using the resulting scores of all n(n-1) pairwise sequence alignments
    • The initial alignment is constructed from the three closest sequences using tripleSeqAlign
    • One by one, the remaining sequences are added to the alignment in order of their distance from the initial three sequences, using a Needleman-Wunsch-like algorithm I created to align a single sequence with an existing multiple sequence alignment
    • Usage: multiSeqAlign.py [-h] [-i INPUT] [-o OUTPUT] [-p INDEL] [-t TREE]
      • If no input file is specified (via -i), the program reads from standard input
      • If no output file is specified (via -o), the program writes to standard output
      • If no indel rate is specified, a default value of 0.1 is used (which corresponds to a gap penalty of -1)
    • Run ./multiSeqAlign.py -h or ./multiSeqAlign.py --help for usage help
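
For readers unfamiliar with the Needleman-Wunsch algorithm that pairSeqAlign is based on, the following is a minimal sketch of global pairwise alignment with a linear gap penalty. It is not the repository's implementation: the function name `needleman_wunsch` and its parameters are hypothetical, and a simple match/mismatch score stands in for the BLOSUM62 lookup to keep the example self-contained.

```python
def needleman_wunsch(a, b, gap=-1.0, match=1.0, mismatch=-1.0):
    """Globally align strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The match/mismatch scoring is a
    stand-in assumption; a substitution matrix lookup could replace sub().
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, row_a, row_b = needleman_wunsch("GATTACA", "GATACA")
print(score)
print(row_a)
print(row_b)
```

The table has (n+1)(m+1) cells, so time and memory grow with the product of the two sequence lengths. The three-dimensional extension used for triple alignment follows the same idea with a cube of cells and seven possible moves per cell instead of three.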

REQUIREMENTS

Currently, both pairSeqAlign and tripleSeqAlign are fully functional without any additional dependencies. multiSeqAlign is also fully functional without additional dependencies when no tree is specified. If you wish to specify a tree file (which allows for a more efficient distance matrix computation), DendroPy must be installed.

CONTRIBUTORS

niemasd
