Motif Finder

Welcome to Motif Finder! This is a command-line utility that takes a FASTA file and a few parameters and (hopefully) returns some motifs prevalent in the sequences.

Installation

If you have the Rust toolchain installed, you can install motif_finder with:

cargo install motif_finder

If you do not have the Rust toolchain installed, you can install it with rustup, the official Rust installer (https://rustup.rs).

If you don't want to install the toolchain, you can instead download a precompiled binary for your platform from the repository's releases page.

If your platform isn't included, you can build it for your platform by cloning this repository:

git clone https://github.com/nithishbn/MotifFinder.git

and running

cargo build --release

in the source directory. This leaves an executable in the target/release/ directory, which you can then run from the command line as motif_finder.

Data format

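This tool accepts any FASTA file, but it is designed around a particular approach to motif finding: searching the sequences that lie upstream of transcript alignment sites or gene start sites, as described in the two subsections that follow.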

RNASeq

By using RNASeq data and aligning it back to a reference genome, we can identify the alignment sites of transcripts. Using these alignment sites, we can generate the set of sequences x bp upstream of the site in which to look for motifs, specifically for transcription factor binding sites.

This method involves finding an organism with RNASeq data, a reference genome, and a few bioinformatics tools including samtools, bamtools, and bedtools.

Gene Start Sites

The same approach can be applied using gene annotation files, which also let you identify the upstream sites. By compiling the x bp sequence upstream of each gene start site in a known genome, we can similarly generate a set of sequences in which to look for motifs.
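To make "x bp upstream" concrete, here is a minimal Python sketch of the extraction step. It is an illustration only and not part of motif_finder: the function names, the 0-based coordinate convention, and the strand handling are all assumptions, and on real data you would normally let tools such as bedtools do this work on genome files.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def upstream_sequence(chrom_seq, site, strand, x):
    """Return up to x bp upstream of a site.

    Assumptions (hypothetical conventions, not taken from the tool):
    `site` is the 0-based position of the first transcribed base and
    `strand` is "+" or "-".
    """
    if strand == "+":
        start = max(0, site - x)  # clip at the start of the chromosome
        return chrom_seq[start:site].upper()
    # On the minus strand, upstream lies at higher coordinates, so take
    # the bases after the site and reverse-complement them.
    end = min(len(chrom_seq), site + 1 + x)
    return reverse_complement(chrom_seq[site + 1:end])


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

The resulting FASTA text can be written to a file and passed to motif_finder as its input.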

Examples

You can try to find the motifs present in promoters.fasta, a set of 4 promoters known in P. tricornutum, a relatively unknown diatom species.

De novo

Gibbs Sampler

Gibbs Sampler is an algorithm that iteratively searches for the best set of motifs in a set of sequences. At each iteration it throws out one motif at random and picks a replacement from the same sequence, and it continues until all iterations are finished.

motif_finder promoters.fasta -e 4 -k 10 -o promotifs.txt gibbs -t 100 -r 100
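For readers who want to see the idea in code, the following Python sketch shows a textbook Gibbs sampler for motif finding. It is not the tool's Rust implementation: the function names, the pseudocount of 1, and the mismatch-based score are assumptions chosen for illustration, and the input is assumed to contain only uppercase A, C, G, and T.

```python
import random


def profile_with_pseudocounts(motifs, k):
    """Column-wise base frequencies, adding a pseudocount of 1 per base."""
    profile = []
    for i in range(k):
        counts = {b: 1 for b in "ACGT"}
        for m in motifs:
            counts[m[i]] += 1
        total = sum(counts.values())
        profile.append({b: c / total for b, c in counts.items()})
    return profile


def kmer_probability(kmer, profile):
    """Probability of a k-mer under a profile."""
    p = 1.0
    for i, base in enumerate(kmer):
        p *= profile[i][base]
    return p


def score(motifs):
    """Mismatches against the column-wise consensus (lower is better)."""
    return sum(len(col) - max(col.count(b) for b in "ACGT")
               for col in zip(*motifs))


def gibbs_sampler(seqs, k, iterations, rng):
    # Start from one randomly chosen k-mer per sequence.
    motifs = []
    for s in seqs:
        start = rng.randrange(len(s) - k + 1)
        motifs.append(s[start:start + k])
    best = list(motifs)
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # throw out one motif at random
        profile = profile_with_pseudocounts(motifs[:i] + motifs[i + 1:], k)
        kmers = [seqs[i][j:j + k] for j in range(len(seqs[i]) - k + 1)]
        weights = [kmer_probability(km, profile) for km in kmers]
        # Resample the discarded motif in proportion to its probability.
        motifs[i] = rng.choices(kmers, weights=weights, k=1)[0]
        if score(motifs) < score(best):
            best = list(motifs)
    return best
```

Because the result depends on the random starting point, the sampler is usually run several times and the best-scoring set of motifs is kept.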

Randomized Motif Search

Randomized Motif Search is an algorithm that starts from a randomly chosen set of motifs and iteratively replaces them with better-scoring ones until the score cannot be improved anymore.

motif_finder promoters.fasta -e 4 -k 10 -o promotifs.txt randomized -r 100
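The loop can be sketched in Python as follows. This is the textbook form of randomized motif search, written under assumptions rather than copied from the tool: the function names, the pseudocounts, the scoring function, and the `best_of_runs` helper are all hypothetical, and sequences are assumed to contain only uppercase A, C, G, and T.

```python
import random


def build_profile(motifs, k):
    """Column-wise base frequencies, adding a pseudocount of 1 per base."""
    profile = []
    for i in range(k):
        counts = {b: 1 for b in "ACGT"}
        for m in motifs:
            counts[m[i]] += 1
        total = sum(counts.values())
        profile.append({b: c / total for b, c in counts.items()})
    return profile


def most_probable_kmer(seq, k, profile):
    """The k-mer in seq with the highest probability under the profile."""
    best_kmer, best_p = seq[:k], -1.0
    for j in range(len(seq) - k + 1):
        kmer = seq[j:j + k]
        p = 1.0
        for i, base in enumerate(kmer):
            p *= profile[i][base]
        if p > best_p:
            best_kmer, best_p = kmer, p
    return best_kmer


def score(motifs):
    """Mismatches against the column-wise consensus (lower is better)."""
    return sum(len(col) - max(col.count(b) for b in "ACGT")
               for col in zip(*motifs))


def randomized_motif_search(seqs, k, rng):
    # Start from one randomly chosen k-mer per sequence.
    starts = [rng.randrange(len(s) - k + 1) for s in seqs]
    best = [s[i:i + k] for s, i in zip(seqs, starts)]
    motifs = best
    while True:
        profile = build_profile(motifs, k)
        motifs = [most_probable_kmer(s, k, profile) for s in seqs]
        if score(motifs) < score(best):
            best = motifs
        else:
            return best  # stop once the score no longer improves


def best_of_runs(seqs, k, runs, seed=0):
    """Repeat the search from fresh random starts and keep the best result."""
    rng = random.Random(seed)
    return min((randomized_motif_search(seqs, k, rng) for _ in range(runs)),
               key=score)
```

A single run often stops at a poor local optimum, which is why repeating the search from many random starts matters.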

Median String

Median String is an algorithm that compares candidate k-mers against every sequence by Hamming distance and returns the k-mer whose total distance to all sequences is smallest. This algorithm is very slow, and its running time grows quickly with k, but it can produce very accurate, if short, motifs. Be careful when using large k values.

motif_finder promoters.fasta -e 4 -k 8 -o promotifs.txt median
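The Python sketch below shows the classic brute-force form of median string, which tries all 4^k possible k-mers. Whether motif_finder enumerates candidates in exactly this way is an assumption, as are the function names; the sketch is only meant to show why large k values get expensive.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(pattern, seq):
    """Smallest Hamming distance between pattern and any k-mer of seq."""
    k = len(pattern)
    return min(hamming(pattern, seq[i:i + k])
               for i in range(len(seq) - k + 1))


def median_string(seqs, k):
    """Return (k-mer, total distance) minimizing the summed distance."""
    best_pattern, best_total = None, float("inf")
    for letters in product("ACGT", repeat=k):  # 4**k candidates
        pattern = "".join(letters)
        total = sum(min_distance(pattern, s) for s in seqs)
        if total < best_total:
            best_pattern, best_total = pattern, total
    return best_pattern, best_total
```

With k = 8 this already means 65,536 candidates, each compared against every window of every sequence, and every extra base multiplies the work by four.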

Find Motifs

Find Motif takes an existing motif and an edit distance (the maximum allowed distance between the motif and a matching stretch of sequence) and finds every position in the input file where such a match occurs. It prints the matches to the console.

motif_finder promoters.fasta -e 4 find_motif CTCAGCG 0 --quiet
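As a rough picture of what this search does, the Python sketch below scans every window of every FASTA record. It deliberately simplifies: it uses Hamming distance (substitutions only) in place of a true edit distance, so it only coincides with an edit-distance search when the distance is 0, as in the command above. The function names and the shape of the returned tuples are assumptions, not the tool's output format.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def find_motif(records, motif, max_distance):
    """Return (name, 0-based position, window, distance) for each match."""
    k = len(motif)
    hits = []
    for name, seq in records:
        for pos in range(len(seq) - k + 1):
            window = seq[pos:pos + k]
            # Hamming distance: count mismatching positions.
            distance = sum(a != b for a, b in zip(motif, window))
            if distance <= max_distance:
                hits.append((name, pos, window, distance))
    return hits
```

For example, `find_motif(parse_fasta(open("promoters.fasta").read()), "CTCAGCG", 0)` would list the exact occurrences of the motif in each record.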

Alignment

If you wish to align the motifs you've generated back to the sequences from which they were generated, in order to identify the motif with the highest local alignment score over all sequences, you can run the same commands as above with the -a flag:

motif_finder promoters.fasta -e 4 -k 8 -a -o promotifs.txt randomized -r 100

This will generate alignments for the motifs after identifying the motifs.
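One plausible reading of "highest locally scored" is a Smith-Waterman style local alignment of each motif against each sequence. The Python sketch below follows that reading; the scoring values (match +1, mismatch -1, gap -1), the function names, and the rule of summing scores across sequences are all assumptions and may not match what the -a flag actually computes.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def best_motif(motifs, seqs):
    """Motif with the highest summed local alignment score over all sequences."""
    return max(motifs,
               key=lambda m: sum(local_alignment_score(m, s) for s in seqs))
```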

Other flags

verbosity - set verbosity with the --quiet or --verbose flags. --quiet offers some performance improvements for large input files and large k values.
