GOWSH

Perl homology searcher based on webscrapping and heuristic approaches. It's supposed to look up in HomoloGene, Ensemble and Inparanoid after running Bidirectional best hit algorithm (BDBH).

Getting Started

Clone the repo on local:

git clone https://github.com/carrascomj/gowsh

Add script to path (on your bash initialization file; e.g., ~/bashrc):

export PATH=$PATH:"path/to/gowsh/bin"

The program requires additional packages that can be installed with cpanm, if not already done:

cpanm JSON Data::Dumper Bio::SeqIO LWP::Simple File::Basename Getopt::Long XML::Parser

Alternatively, one could install WebAPIsGOWSH as an usual perl package (on 'gowsh/' directory):

perl Makefile.PL
make
make install

Finally, formatdb and blast+ are both required.

Usage

gowsh.pl is the main script. The program takes command-line arguments with the following options:

gowsh.pl --gfile|go|glist "path_to_file|GOid|list" --tfile|torg "path_to_file|organism"
    [--modelf|modelo] "path_to_file|organism" --out "outfile" --preserve

    --gfile path_to_file: input, genes as multiFASTA
    --go GOid: input, Genetic Ontology ID (as in AmiGO)
    --glist list: input, blank separated list gene IDs
    --tfile path_to_file: multiFASTA containing proteins of genome of target organism
    --torg organism: target organism name (genus and specie)
    --modfile path_to_file: optional, multiFASTA containing proteins of genome of model organism
    --modorg organism: optional, model organism name (genus and specie)
    --out "outfile": optional, name of output file; default "GOWSH_output.txt"
    --preserve: optional, if it's added, (nearly) all files generated will be preserved.

Running the test

The script can be tested wit the following command:

gowsh.pl --go 0048507 --modorg "arabidopsis thaliana" --torg "oryza sativa"

You can compare the output with the file "t/GOWSH_outputq1.tsv".

The program will then parse the input file, download both genomes from NCBI and try to match homologues.

What I Learned

This code was developed as a project for one subjects of my BSc in Biotechnology (UPM). To sum up, I learned the following concepts:

Webscrapping biological information using Perl and mygene API.
Use of Entrez E-utilities programmatic access API from NCBI.
Use of Ensembl REST API.
Run BLAST on local using blast+.
Heuristic algorithms to account for homology.
How to build a Perl package.
How to write a README.md.

carrascomj / gowsh Goto Github PK

gowsh's Introduction

GOWSH

Getting Started

Usage

Running the test

What I Learned

gowsh's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent