ProgProt (Programming proteins with Python)

This repository contains scripts and programs that were initially used for the BIO314 course at IISER Pune. Some scripts were course assignments; most of the others were written purely for the joy of writing and discovery. The main concepts covered are:

Biological Sequence Alignment

  1. Global Sequence Alignment
  • The Needleman-Wunsch algorithm for aligning two protein or nucleotide sequences is implemented in global_dp.py.
  • The dynamic programming approach is split into three stages (initialization, computation, and traceback) to find the best sequence alignment.
  • A variant with affine gap penalties is in global_dp_affine.py.
  2. Local Sequence Alignment
  • The Smith-Waterman algorithm for the local alignment of two sequences is implemented in local_dp.py.
  • The approach mirrors the global case, but shows every possible local alignment; these arise primarily from multiple maxima in the score matrix.
  • A variant with affine gap penalties is in local_dp_affine.py.
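
The three stages of global alignment can be sketched as follows. This is a minimal illustration with a linear gap penalty, not the code in global_dp.py; the function name `needleman_wunsch` and the default scores (match=1, mismatch=-1, gap=-2) are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b using a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative defaults, not those used by the repository scripts.
    """
    n, m = len(a), len(b)

    # Initialization: the first row and column accumulate gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Computation: each cell is the best of a diagonal (match/mismatch),
    # an upward move (gap in b), or a leftward move (gap in a).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback: walk from the bottom-right cell back to the top-left cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

When several moves tie, this sketch prefers the diagonal, so it returns only one of the possibly many optimal alignments.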

Comparison between Smith-Waterman and Needleman-Wunsch:

| Property | Smith-Waterman algorithm | Needleman-Wunsch algorithm |
| --- | --- | --- |
| Initialization | First row and first column are set to 0 | First row and first column are filled according to the gap penalty (affine, linear, etc.) |
| Scoring | Negative scores are set to 0 | Scores can be negative |
| Traceback | Begin at the highest score, end when a 0 is encountered | Begin at the lower-right cell of the matrix, end at the top-left cell |
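
The Smith-Waterman column of this comparison can be sketched as follows. This is an illustration under assumed scores (match=2, mismatch=-1, linear gap=-1), not the code in local_dp.py; the name `smith_waterman` is hypothetical. It starts one traceback from every cell that holds the maximum score, so multiple maxima yield multiple local alignments.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Locally align strings a and b using a linear gap penalty.

    Returns (best_score, alignments), where alignments holds one
    (aligned_a, aligned_b) pair per cell that reaches the best score.
    The scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)

    # Initialization: first row and first column stay at 0.
    score = [[0] * (m + 1) for _ in range(n + 1)]

    # Scoring: identical to the global case, except negatives are clamped to 0.
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            best = max(best, score[i][j])

    if best == 0:
        return 0, []  # no pair of residues scores above zero

    # Traceback: start at every maximal cell, stop when a 0 is reached.
    alignments = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if score[i][j] != best:
                continue
            top, bottom = [], []
            r, c = i, j
            while r > 0 and c > 0 and score[r][c] > 0:
                sub = match if a[r - 1] == b[c - 1] else mismatch
                if score[r][c] == score[r - 1][c - 1] + sub:
                    top.append(a[r - 1])
                    bottom.append(b[c - 1])
                    r, c = r - 1, c - 1
                elif score[r][c] == score[r - 1][c] + gap:
                    top.append(a[r - 1])
                    bottom.append("-")
                    r -= 1
                else:
                    top.append("-")
                    bottom.append(b[c - 1])
                    c -= 1
            alignments.append(("".join(reversed(top)), "".join(reversed(bottom))))

    return best, alignments
```

For example, `smith_waterman("AC", "ACTAC")` finds two maximal cells, one for each occurrence of "AC" in the second sequence. Ties within a single traceback are still resolved in favor of the diagonal move.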

Next Generation Sequencing (NGS)

  • The advent of NGS techniques has led to a notable application of the Burrows–Wheeler transform (BWT). In NGS, DNA is fragmented into small pieces, of which the first few bases are sequenced, yielding several million "reads", each 30 to 500 base pairs ("DNA characters") long.
  • To reduce the memory required for sequence alignment, we use the BWT as a data-compression algorithm.
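
As a rough illustration of the transform itself, the sketch below builds the BWT by sorting all rotations of the text and then inverts it. The function names `bwt` and `inverse_bwt` and the "$" end marker are assumptions made for this example, not names taken from the repository. The naive construction is only practical for short strings.

```python
def bwt(text, sentinel="$"):
    """Return the Burrows-Wheeler transform of text.

    A unique sentinel is appended so the transform can be inverted.
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    text += sentinel
    # Sort every rotation of the text and read off the last column.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Recover the original text from its Burrows-Wheeler transform."""
    n = len(last_column)
    table = [""] * n
    # Repeatedly prepend the last column and re-sort; after n rounds the
    # table holds all rotations of the original text in sorted order.
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The rotation that ends with the sentinel is the original text.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

The transform does not shorten the text by itself; it tends to group identical characters into runs, which a later compression step can exploit.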

Hidden Markov Models (HMM)

  • Evaluation (Forward Algorithm)
  • Decoding (Viterbi Algorithm)
  • Learning (Baum-Welch Algorithm or the Forward-Backward algorithm) (TBA)
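
The evaluation and decoding problems can be sketched on a toy model. The two-state model below (a GC-rich state "H" and an AT-rich state "L") and all of its probabilities are hypothetical values chosen for illustration; they are not taken from the repository. Both functions assume a non-empty observation sequence.

```python
# Hypothetical two-state DNA model, used only for illustration.
STATES = ("H", "L")
START_P = {"H": 0.5, "L": 0.5}
TRANS_P = {
    "H": {"H": 0.5, "L": 0.5},
    "L": {"H": 0.4, "L": 0.6},
}
EMIT_P = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def forward(obs, states, start_p, trans_p, emit_p):
    """Evaluation: probability of obs, summed over all hidden paths."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Decoding: most probable hidden path and its joint probability."""
    prob = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    path = {s: [s] for s in states}
    for symbol in obs[1:]:
        new_prob, new_path = {}, {}
        for s in states:
            # Best predecessor for state s at this position.
            prev = max(states, key=lambda r: prob[r] * trans_p[r][s])
            new_prob[s] = prob[prev] * trans_p[prev][s] * emit_p[s][symbol]
            new_path[s] = path[prev] + [s]
        prob, path = new_prob, new_path
    last = max(states, key=lambda s: prob[s])
    return path[last], prob[last]
```

Calling `forward("GGCACTGAA", STATES, START_P, TRANS_P, EMIT_P)` gives the likelihood of the sequence under the model, and `viterbi` with the same arguments gives the most probable state path. The sketch multiplies raw probabilities, which underflow on long sequences; working with log-probabilities is the usual remedy.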

How to run these scripts locally?

  1. Clone this GitHub repository using:

git clone https://github.com/Anantha-Rao12/Bioinformatics-BIO314

  2. Create a Python (>=3.2) virtual environment and call it 'bioinfo-BIO314'.

    • Linux/Mac: python3 -m venv bioinfo-BIO314
    • Windows: python -m venv bioinfo-BIO314
    • A new directory called "bioinfo-BIO314" will be created.
  3. Activate the virtual environment by running the following.

    • Linux/Mac: source bioinfo-BIO314/bin/activate
    • Windows: .\bioinfo-BIO314\Scripts\activate
  4. In the new virtual environment, run pip3 install -r requirements.txt to install all dependencies. On Windows, use pip instead of pip3.

  5. Run python3 global_dp.py for global alignment or python3 local_dp.py for local alignment.

References:

  1. Eddy, S. R. (2004). What is dynamic programming? Nature Biotechnology, 22(7), 909-910.
  2. Durbin, R., Eddy, S. R., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Illustrated ed.). Cambridge University Press.
