
dentas's Introduction

DeNTAS: De Novo Transcriptome Analysis & Statistics

This file contains a basic description of the DeNTAS analysis pipeline, details of the software requirements, and instructions for running the program from a local machine. Please see the DeNTAS documentation for further information <link here?>

DeNTAS Summary

DeNTAS is a software tool for the statistical analysis and visualisation of transcriptome datasets generated by de novo assembly of RNA-seq data.

User input:

  1. Raw assembled transcripts in FASTA format with FPKM values
  2. Organism that the samples were derived from
  3. Experimental group of each sample
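Because every transcript must arrive with an FPKM value, it can help to check the FASTA input before submitting it. The sketch below is a minimal, hypothetical reader: it ASSUMES the FPKM value sits in each header as a `FPKM=<number>` token, which this README does not specify, so the pattern would need adjusting to whatever header format your assembler writes.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# ASSUMPTION: each header carries its FPKM as a "FPKM=<number>" token,
# e.g. ">transcript_1 len=9 FPKM=12.5". Adjust to your assembler's headers.
FPKM_PATTERN = re.compile(r"FPKM=([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def read_fasta_with_fpkm(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each transcript ID to (sequence, FPKM); reject records without an FPKM value."""
    records: Dict[str, Tuple[str, float]] = {}
    current_id: Optional[str] = None
    current_fpkm = 0.0
    seq_parts: List[str] = []

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = ("".join(seq_parts), current_fpkm)
            header = line[1:]
            match = FPKM_PATTERN.search(header)
            if match is None:
                raise ValueError(f"No FPKM value in header: {header!r}")
            current_id = header.split()[0]  # transcript ID = first word of the header
            current_fpkm = float(match.group(1))
            seq_parts = []
        elif current_id is None:
            raise ValueError("Sequence data found before the first FASTA header")
        else:
            seq_parts.append(line)  # sequences may be wrapped over several lines

    if current_id is not None:
        records[current_id] = ("".join(seq_parts), current_fpkm)
    return records
```

Usage would be along the lines of `with open("transcripts.fa") as handle: records = read_fasta_with_fpkm(handle)`, where the file name is only an example.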

Analysis:

  1. Identification of transcripts via local BLAST on Apocrita (QMUL's high-performance computing cluster)
  2. Differential gene expression analysis and statistics conducted in R using the limma package

Results returned:

  1. Unsupervised data exploration: principal component analysis (PCA) and hierarchical clustering (dendrogram) plots, both for the full set of genes and for the subset of genes determined to be significantly differentially expressed
  2. Volcano plots depicting each experimental comparison. Genes with a log fold change > 2 and an adjusted p-value < 0.05 are highlighted in red and labelled
  3. Heatmap depicting the top 100 differentially regulated genes
  4. Interactive table listing all differentially regulated genes by RefSeq ID, gene symbol and full gene name, showing their E-value for each experimental comparison
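The volcano-plot highlighting rule is a simple two-condition filter. DeNTAS does this in R, so the Python below is only an illustration of the rule as stated (log fold change > 2 and adjusted p-value < 0.05), using the column meanings of limma's `topTable` output (`logFC`, `adj.P.Val`); the `GeneResult` type and function name are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class GeneResult(NamedTuple):
    symbol: str
    log_fc: float  # log fold change (limma topTable column "logFC")
    adj_p: float   # adjusted p-value (limma topTable column "adj.P.Val")


def volcano_highlights(results: Iterable[GeneResult],
                       lfc_threshold: float = 2.0,
                       p_threshold: float = 0.05) -> List[str]:
    """Return the symbols of genes to colour red and label on the volcano plot."""
    return [r.symbol for r in results
            if r.log_fc > lfc_threshold and r.adj_p < p_threshold]
```

As written, the rule only picks up up-regulated genes; comparing `abs(r.log_fc)` instead would also highlight strongly down-regulated ones, and which behaviour DeNTAS intends is not stated here.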

Setting up the local server

DeNTAS harnesses high-performance cluster computing to speed up the analysis. Running the software from a local machine therefore requires the user to:

1) Have secure (SSH) access to Apocrita without the need for a password

ssh-keygen                                    # press Enter 3 times to accept the defaults
ssh-copy-id <username>@login.hpc.qmul.ac.uk   # enter your Apocrita password when prompted
ssh <username>@login.hpc.qmul.ac.uk           # test connectivity; no password should be requested
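Since the app later needs to reach Apocrita without any prompt, a non-interactive check is more telling than a manual login. This is a sketch, not part of DeNTAS: it runs `ssh` with the standard `BatchMode=yes` option, which makes `ssh` fail rather than ask for a password, so a zero exit status means key-based login works. The function names and the 10-second timeout are illustrative choices.

```python
import subprocess
from typing import List

APOCRITA_HOST = "login.hpc.qmul.ac.uk"


def ssh_check_command(username: str, host: str = APOCRITA_HOST) -> List[str]:
    # BatchMode=yes: never prompt, so a missing key makes ssh exit non-zero.
    # "true" is a no-op remote command; we only care whether login succeeds.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
            f"{username}@{host}", "true"]


def passwordless_ssh_works(username: str, host: str = APOCRITA_HOST) -> bool:
    try:
        result = subprocess.run(ssh_check_command(username, host),
                                capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh not installed, or the connection hung
    return result.returncode == 0
```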

2) Use this secure access so that the app connects directly to the remote server, Apocrita, when it is run.

This needs a one-time setup on the user's device. After obtaining the app folder, the user must replace the code text "" with their own Apocrita username in every script file in the app/scripts/apocrita directory.

This is easy to do with a text editor that can find every instance of "" and edit them all at once (for example, 'Sublime' offers this functionality).
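The same substitution can be scripted instead of done by hand. A minimal sketch: the directory `app/scripts/apocrita` comes from the instructions above, but the placeholder string is passed in as an argument, so use whatever placeholder your copy of the scripts actually contains. The function name is hypothetical, and it assumes the scripts are UTF-8 text files.

```python
from pathlib import Path


def replace_placeholder(directory: str, placeholder: str, username: str) -> int:
    """Replace every occurrence of `placeholder` with `username`; return files changed."""
    if not placeholder:
        raise ValueError("placeholder must be a non-empty string")
    changed = 0
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if placeholder in text:
            path.write_text(text.replace(placeholder, username), encoding="utf-8")
            changed += 1
    return changed
```

For example, `replace_placeholder("app/scripts/apocrita", "<placeholder>", "<your username>")`, run once from the directory that contains `app`.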

Running DeNTAS

Once the local machine has been set up as a local server with a direct connection to the remote server (Apocrita), run the app from a terminal by changing into the app's directory and using the following command:

python app.py

How to Use DeNTAS Well

For DeNTAS to work effectively, the user should submit suitable de novo assembled transcriptome data together with known FPKM values.

Contact Us for Troubleshooting

dentas's People

Contributors

jg-nicholson, yebadi, kristina2345, daniavicente

Watchers

James Cloos, Nazrath

dentas's Issues

Widen Species Selection

Instead of being limited to Pteropus alecto, the user will be given a choice of species from a drop-down list: Pteropus alecto, mouse (Mus musculus) or human (Homo sapiens).

  1. I will try to set up a database for each major species
  2. We need a drop-down list incorporated into the Flask app
  3. Once the user picks a species, we then need to specify that THIS is the database we run BLAST against.
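Step 3 amounts to a lookup from the species chosen in the drop-down (in Flask, something like `request.form["species"]`) to a BLAST database, followed by building the `blastn` command. The sketch below is hypothetical throughout: the database paths, the choice of `blastn` and the tabular output format are assumptions. It passes an argument list to `subprocess.run` instead of a string to `os.system`, which avoids shell-quoting problems; it does not use Biopython.

```python
import subprocess
from typing import Dict, List

# HYPOTHETICAL database names; each would need to be built beforehand (e.g. with makeblastdb).
SPECIES_DATABASES: Dict[str, str] = {
    "Pteropus alecto": "blastdb/pteropus_alecto",
    "Mus musculus": "blastdb/mus_musculus",
    "Homo sapiens": "blastdb/homo_sapiens",
}


def blast_command(species: str, query_fasta: str, output_path: str,
                  threads: int = 4) -> List[str]:
    """Build a blastn command against the database for the chosen species."""
    try:
        database = SPECIES_DATABASES[species]
    except KeyError:
        raise ValueError(f"Unsupported species: {species!r}") from None
    return ["blastn", "-query", query_fasta, "-db", database,
            "-out", output_path, "-outfmt", "6",  # tabular output
            "-num_threads", str(threads)]


def run_blast(species: str, query_fasta: str, output_path: str, threads: int = 4) -> None:
    # check=True raises CalledProcessError if blastn exits with an error.
    subprocess.run(blast_command(species, query_fasta, output_path, threads), check=True)
```

The same dictionary keys could also populate the drop-down options, so the list shown to the user and the set of usable databases cannot drift apart.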

Documentation: we can use GitHub / pandoc

Hi all,
Google pandoc; the steps are:
  1. Install pandoc.
  2. Edit your Word document as needed.
  3. Run pandoc from the Linux or Windows command line. ...
  4. Update the ChangeLog.
  5. Commit both files with git: git add file.docx file.md, then git commit.

We can use this to write the documentation collectively :)

Or a group Google Doc, I don't mind!

Suggestion to overcome groups issue

Could we take the number of columns from the user with raw_input and use this value throughout, to get past the three-group barrier? If this proves too difficult to finish before the deadline, we could instead have a set of hard-coded R scripts for different numbers of groups and call the appropriate script according to the number of groups the end user picks in Flask.
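One way past a fixed number of groups is to generate every pairwise comparison from whatever group names the user enters. The sketch below is a hypothetical helper, not existing DeNTAS code: it builds `"B-A"` strings of the kind limma's `makeContrasts(contrasts = ..., levels = design)` accepts, which the R script could receive as command-line arguments. Group names would need to be valid R names for this to work in limma.

```python
from itertools import combinations
from typing import List


def pairwise_contrasts(groups: List[str]) -> List[str]:
    """Build one "B-A" contrast string per pair of groups, for any number of groups."""
    if len(groups) < 2:
        raise ValueError("At least two experimental groups are needed")
    if len(set(groups)) != len(groups):
        raise ValueError("Group names must be unique")
    # combinations() preserves input order, so earlier groups act as the
    # baseline in every contrast that involves them.
    return [f"{later}-{earlier}" for earlier, later in combinations(groups, 2)]
```

With `n` groups this yields `n * (n - 1) / 2` contrasts, e.g. `["Ctrl", "TrtA", "TrtB"]` gives `TrtA-Ctrl`, `TrtB-Ctrl` and `TrtB-TrtA`. In Python 3, `input()` replaces `raw_input()` for reading the names.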

Flagged duplicates issue in module.py

See the comments in module.py: duplicates are being deleted without directly considering why they occur. The E-values are all the highest for each respective duplicate, but handling this is another layer of optimisation and may be unnecessary.

Multi user functionality

To enable this, we can incorporate Flask's intrinsic session ID.

We just need to assign a "user session" ID to each uploaded file and carry it through every temporary file generated during the analysis.
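A sketch of carrying a per-user ID through the temporary files, under stated assumptions: Flask's default session is a signed cookie rather than a server-side store keyed by an ID, so this version generates its own ID with `uuid4` (it could be kept in the session, e.g. `session.setdefault("analysis_id", new_session_id())`). The function names and the one-directory-per-session layout are hypothetical.

```python
import os
import uuid


def new_session_id() -> str:
    return uuid.uuid4().hex  # 32 lowercase hex characters


def session_file_path(base_dir: str, session_id: str, filename: str) -> str:
    """Return a path for `filename` inside this session's own temp directory."""
    # Reject anything that is not a uuid4 hex string, so a tampered cookie
    # cannot point outside base_dir.
    if len(session_id) != 32 or any(c not in "0123456789abcdef" for c in session_id):
        raise ValueError("Invalid session ID")
    safe_name = os.path.basename(filename)  # drop any directory parts from the upload name
    directory = os.path.join(base_dir, session_id)
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, safe_name)
```

Keeping each user's files in a separate directory means later steps (BLAST output, R plots) only need the session ID to find their inputs, and cleanup is a single directory removal.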

Blast - to do list

  • Convert the code to use Biopython rather than passing commands to os.system? This would be the more correct way of doing things
  • Decide on an approach to manage the issue of Transcript to gene ID mapping redundancy
  • Incorporate an E-value threshold into our analysis
  • Discuss with Adrian the possibility of running BLAST remotely on Apocrita. If we can't, we'll have to run it locally on a laptop and reduce the size of the input data
  • Tackle the issue of transcripts not identified by BLAST but present in multiple samples or at high FPKM?
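The E-value threshold and part of the redundancy problem can both be handled while reading BLAST's tabular output (`-outfmt 6`, whose default 12 columns put the query ID first, the subject ID second and the E-value eleventh). This sketch keeps, for each transcript, the single hit with the lowest E-value at or below the threshold; the cut-off of `1e-5` is an assumed example, not a value the project has chosen. It does not resolve several transcripts mapping to the same gene.

```python
import csv
from typing import Dict, Iterable, Tuple


def best_hits(tabular_lines: Iterable[str],
              max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Return {transcript: (subject ID, E-value)} for the best hit passing the threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > max_evalue:
            continue  # E-value threshold
        # Lower E-value = more significant hit; ties keep the first one seen.
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best
```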

Flask/html - to do list

  • Get the user to input the number of experimental groups and their names
  • User login -> we can then email a pdf of the results to the user on completion of analysis
  • Make some sort of loading page/GIF to show whilst the results are being computed
  • how to manage multiple users?
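For the loading page, one common pattern is to start the analysis in the background and let the page poll a status value until it reads "finished". This is a minimal sketch with hypothetical names: it uses in-process threads, so it only suits a single-process development server; a real deployment with several workers would need a shared job store or a task queue.

```python
import threading
import uuid
from typing import Callable, Dict, Optional


class JobRegistry:
    """Runs analyses in background threads and records their status for polling."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._status: Dict[str, str] = {}
        self._threads: Dict[str, threading.Thread] = {}

    def submit(self, analysis: Callable[[], object]) -> str:
        job_id = uuid.uuid4().hex

        def worker() -> None:
            try:
                analysis()
                outcome = "finished"
            except Exception:  # record the failure instead of losing it in the thread
                outcome = "failed"
            with self._lock:
                self._status[job_id] = outcome

        thread = threading.Thread(target=worker, daemon=True)
        with self._lock:
            self._status[job_id] = "running"
            self._threads[job_id] = thread
        thread.start()
        return job_id

    def status(self, job_id: str) -> str:
        with self._lock:
            return self._status.get(job_id, "unknown")

    def wait(self, job_id: str, timeout: Optional[float] = None) -> None:
        with self._lock:
            thread = self._threads[job_id]
        thread.join(timeout)
```

A Flask route for the loading page could return `registry.status(job_id)`, and the page (with its GIF) would refresh until the status changes.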

R - to do list

  • make the script fully soft-coded
  • derive names for graphs/variables etc from myArgs group list
  • make work with different number of input groups?
  • introduce new functionality? perhaps gene ontology visualisations

Improving the Apocrita functionality

Hi,

I'm going to implement some changes to improve the functionality with Apocrita, but James, as we're using your Apocrita access, only you can change the script that's being called within Apocrita.

For the BLAST job, I've increased the number of threads to 4, in line with what is being requested on Apocrita.

However, qsub jobs default to 1 core. According to the Apocrita guide (http://docs.hpc.qmul.ac.uk/using/), you need to add this line to the job script:

#$ -pe smp 4       # Request 4 CPU cores

and you can lower the RAM request from 3 GB to 1 GB per core.
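If the app writes the job script itself, the core and memory requests can be kept in one place. This sketch assembles Grid Engine directives of the kind the Apocrita guide describes: `-pe smp` for cores and `h_vmem`, assumed here to be the per-core memory request mentioned above. The `-cwd`, `-j y` and one-hour `h_rt` lines are illustrative defaults, not settings taken from the project's script.

```python
def apocrita_job_script(command: str, cores: int = 4,
                        mem_per_core_gb: int = 1, runtime: str = "1:0:0") -> str:
    """Return the text of a qsub job script that requests `cores` cores."""
    lines = [
        "#!/bin/bash",
        "#$ -cwd",                           # run in the submission directory
        "#$ -j y",                           # merge stdout and stderr into one log
        f"#$ -pe smp {cores}",               # request `cores` CPU cores
        f"#$ -l h_vmem={mem_per_core_gb}G",  # memory per core (assumed per-core request)
        f"#$ -l h_rt={runtime}",             # wall-clock limit (illustrative)
        "",
        command,
        "",
    ]
    return "\n".join(lines)
```

Passing a command such as `blastn ... -num_threads ${NSLOTS}` keeps the BLAST thread count tied to the cores actually granted, since Grid Engine sets `NSLOTS` inside the job.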

Oh yeah, the BLAST OS workaround is totally fine, but I would suggest rewriting the FPKM part that runs after the BLAST section in Python rather than using the cmd functions.
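As a sketch of what the FPKM step could look like in plain Python: given each transcript's FPKM and a transcript-to-gene mapping (for example, derived from the BLAST results), it sums FPKM over all transcripts assigned to the same gene. Summing is an assumed way of handling mapping redundancy (keeping the maximum is another option); transcripts without a gene are dropped here, which is exactly the open question about unidentified high-FPKM transcripts.

```python
from collections import defaultdict
from typing import Dict


def gene_fpkm(transcript_fpkm: Dict[str, float],
              transcript_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Sum FPKM over all transcripts assigned to the same gene."""
    totals: Dict[str, float] = defaultdict(float)
    for transcript, fpkm in transcript_fpkm.items():
        gene = transcript_to_gene.get(transcript)
        if gene is not None:  # unidentified transcripts are left out
            totals[gene] += fpkm
    return dict(totals)
```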
