yebadi / dentas

DeNTAS: De Novo Transcriptome Analysis and Statistics - a user-friendly web-app.

Python 14.65% R 10.23% HTML 41.72% CSS 0.75% JavaScript 27.36% Shell 5.29%

analysis statistics web-app bioinformatics

dentas's Introduction

DeNTAS: De Novo Transcriptome Analysis & Statistics

This file contains a basic description of the DeNTAS analysis pipeline, details of the software requirements and user instructions for running the program from a local machine. Please see the DeNTAS documentation for further information <link here?>

DeNTAS Summary

DeNTAS is a software tool for the statistical analysis and visualisation of transcriptome datasets generated by de novo assembly of RNA-seq data.

User input:

Raw assembled transcripts in FASTA format with FPKM values
Organism that samples were derived from
Experimental groups from the samples

Analysis:

Identification of transcripts via local BLAST on Apocrita (QMUL's high-performance computing cluster)
Differential gene expression analysis and statistics conducted in R using the limma package

Results returned:

Unsupervised data exploration: principal component and hierarchical clustering (dendrogram) plots, both for the full set of genes and for the subset of genes determined to be significantly differentially expressed
Volcano plots depicting each experimental comparison. Genes with a logFoldChange > 2 and an adjusted p-value < 0.05 are highlighted in red and labelled
Heatmap depicting the top 100 differentially regulated genes
Interactive table listing all differentially regulated genes by RefSeq ID, gene symbol and full gene name, showing their e-value for each experimental comparison
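
The volcano-plot highlighting rule can be sketched in Python. This is only an illustration: DeNTAS performs this step in R with limma, and the record layout (`gene`, `logFC`, `adj_p`) and the function name are assumptions made for the example. The sketch applies the threshold exactly as written above (logFoldChange > 2); whether strongly down-regulated genes are also highlighted is not stated here.

```python
# Hypothetical records; in DeNTAS these values come from limma in R.
def highlighted_genes(results, lfc_cutoff=2.0, p_cutoff=0.05):
    """Return the names of genes passing both volcano-plot thresholds."""
    return [
        row["gene"]
        for row in results
        if row["logFC"] > lfc_cutoff and row["adj_p"] < p_cutoff
    ]


example = [
    {"gene": "geneA", "logFC": 3.1, "adj_p": 0.001},  # passes both cut-offs
    {"gene": "geneB", "logFC": 2.5, "adj_p": 0.20},   # fails the p-value cut-off
    {"gene": "geneC", "logFC": 0.4, "adj_p": 0.01},   # fails the fold-change cut-off
]
print(highlighted_genes(example))  # ['geneA']
```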

Setting up the local server

DeNTAS harnesses the power of high-performance cluster computing to increase the speed of analysis. Running the software on a local machine therefore has the following requirements. The user must:

1) Have secure access to Apocrita without the need for a password

>> ssh-keygen (hit enter 3 times)

>> ssh-copy-id <username>@login.hpc.qmul.ac.uk (input your Apocrita password)

>> ssh <username>@login.hpc.qmul.ac.uk (test connectivity)

2) Utilise this secure access to enable direct connection to the remote-server, Apocrita, when the app is run.

This needs a one-time setup on the user's device. Once the user has obtained the app folder, they must change the code text "" to their own Apocrita username in all of the script files found in the "app/scripts/apocrita" directory.

This is easily done in a text editor that can find every instance of the word "" and edit them all at once (for example, 'Sublime' offers this functionality).
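
As an alternative to a text editor, the same one-time change could be scripted. The sketch below is a minimal example and not part of DeNTAS: the function name `set_username` is hypothetical, and the placeholder string must be supplied by the user because it is not reproduced in this file.

```python
from pathlib import Path


def set_username(script_dir, placeholder, username):
    """Replace every occurrence of `placeholder` in the files under script_dir.

    Returns the number of files that were changed.
    """
    if not placeholder:
        raise ValueError("placeholder must be a non-empty string")
    changed = 0
    for path in sorted(Path(script_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text()
        except UnicodeDecodeError:
            continue  # skip files that are not plain text
        if placeholder in text:
            path.write_text(text.replace(placeholder, username))
            changed += 1
    return changed
```

A call would look like `set_username("app/scripts/apocrita", placeholder, "abc123")`, where `placeholder` holds the text to be replaced and `abc123` stands in for a real Apocrita username.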

Running DeNTAS

Now that the user's machine has been set up as a local server with a direct connection to the remote server (Apocrita), the user can start the app from the terminal by changing into the app's directory and running the following command:

python app.py

How to Use DeNTAS Well

The user should submit appropriate de novo transcriptome data with the known FPKM values in order for DeNTAS to function effectively.

Contact Us for Troubleshooting

dentas's Issues

Widen Species Selection

Instead of limiting it to just Pteropus alecto, we will give the user a choice of species from a drop-down list: they can pick Pteropus alecto, mouse or human (Homo sapiens).

I will try and set up each major DB
We need a drop-down list incorporated into Flask
Once the user picks the species, we then need to specify that THIS is the DB we will carry out the BLAST against.
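
A rough sketch of the species-to-database step, written in plain Python so it does not depend on how the Flask form ends up looking. The database names are made up for illustration, and "Mus musculus" is assumed to be the mouse species meant above.

```python
from html import escape

# Hypothetical database names; the real ones depend on how each BLAST
# database is built on Apocrita.
SPECIES_DB = {
    "Pteropus alecto": "pteropus_alecto_db",
    "Mus musculus": "mus_musculus_db",
    "Homo sapiens": "homo_sapiens_db",
}


def species_options():
    """Build the <option> tags for the drop-down list."""
    return "\n".join(
        f'<option value="{escape(name)}">{escape(name)}</option>'
        for name in SPECIES_DB
    )


def database_for(species):
    """Map the submitted species to its BLAST database, rejecting unknowns."""
    try:
        return SPECIES_DB[species]
    except KeyError:
        raise ValueError(f"Unsupported species: {species!r}") from None
```

Keeping the mapping in one dictionary means the drop-down and the BLAST call cannot drift apart: adding a species is a one-line change.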

Documentation - we can use github / pandoc

Hi all,
google pandoc
Install pandoc.
Edit your Word document as needed.
Run pandoc from the linux or Windows command line. ...
Update the ChangeLog.
Commit both files with git: git add file.docx file.md, then git commit

we can use this to collectively write the documentation :)

Or a group Google Doc, I don't mind!

Suggestion to overcome groups issue

We could take the number of columns from the user via raw_input and use this value throughout, which would overcome the three-group barrier. Alternatively, if this proves too difficult to do before the deadline, we can keep a set of hardcoded R scripts for various group numbers and call the appropriate script according to the number of groups the end user picks in Flask.
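
One way to avoid hardcoding the group count is to derive every pairwise comparison from the list of group names. The sketch below is only a suggestion: the function name is hypothetical, and the "B-A" string form is chosen because limma's makeContrasts accepts expressions of that shape, so the strings could be handed to the R script as arguments.

```python
from itertools import combinations


def pairwise_contrasts(groups):
    """Return every pairwise comparison as a 'B-A' style string."""
    if len(groups) < 2:
        raise ValueError("need at least two experimental groups")
    if len(set(groups)) != len(groups):
        raise ValueError("group names must be unique")
    return [f"{b}-{a}" for a, b in combinations(groups, 2)]


print(pairwise_contrasts(["control", "treatA", "treatB"]))
# ['treatA-control', 'treatB-control', 'treatB-treatA']
```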

Flagged duplicates issue in module.py

See comments in module.py: the duplicates are being deleted without directly considering why they occur. The e-values are all the highest for each respective duplicate, but handling this is another layer of optimisation and may be supererogatory.
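
If we do want an explicit rule, one option is to keep a single hit per transcript chosen by e-value. A lower e-value means a more significant BLAST hit, so this sketch keeps the lowest. It is one possible rule, not a description of what module.py currently does, and the tuple layout is an assumption.

```python
def best_hit_per_transcript(hits):
    """Keep one hit per transcript: the one with the lowest e-value.

    `hits` is an iterable of (transcript_id, gene_id, evalue) tuples.
    Ties keep the hit seen first.
    """
    best = {}
    for transcript_id, gene_id, evalue in hits:
        current = best.get(transcript_id)
        if current is None or evalue < current[1]:
            best[transcript_id] = (gene_id, evalue)
    return best
```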

Multi user functionality

To enable this, we can incorporate the intrinsic session ID present in Flask.

We just need to assign a "user session" ID to each of the uploaded files and carry it through every temp file generated during the analysis.
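
A minimal sketch of the per-user file naming, using only the standard library. Generating the ID with `uuid` and giving each session its own folder are assumptions for illustration; in the app the ID could be stored in Flask's `session` object.

```python
import uuid
from pathlib import Path


def new_session_id():
    """Create a random identifier for one user's analysis run."""
    return uuid.uuid4().hex


def session_file(base_dir, session_id, filename):
    """Return a per-session path, creating the session folder if needed."""
    folder = Path(base_dir) / session_id
    folder.mkdir(parents=True, exist_ok=True)
    # Path(filename).name drops any directory part of an uploaded file name.
    return folder / Path(filename).name
```

Every temp file for one run then lives under the same folder, so two users uploading a file with the same name no longer overwrite each other.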

Blast - to do list

Convert code to run using Biopython rather than passing commands to os.system? This would be the more correct way of doing things
Decide on an approach to manage the issue of Transcript to gene ID mapping redundancy
Incorporate an E-value threshold into our analysis
Discuss with Adrian the possibility of running blast remotely on Apocrita - If we can't we'll have to run it locally on a laptop & reduce the size of the input data
Tackle the issue of transcripts not identified by Blast but present in multiple samples/at high FPKM?
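
As a stopgap before any move to Biopython, the os.system call could be swapped for the standard library's `subprocess` with an argument list, which also gives a natural place for the E-value threshold. This sketch uses `subprocess`, not Biopython. The flags shown are standard BLAST+ options, while the program name, file names and the 1e-5 default are assumptions.

```python
import subprocess


def blast_command(program, query, database, out_file, evalue=1e-5, threads=4):
    """Build the BLAST+ argument list (nothing is executed here)."""
    return [
        program,
        "-query", query,
        "-db", database,
        "-out", out_file,
        "-outfmt", "6",                # tabular output
        "-evalue", str(evalue),        # discard hits above this E-value
        "-num_threads", str(threads),
    ]


def run_blast(command):
    """Run the command without a shell and raise if BLAST exits with an error."""
    subprocess.run(command, check=True)
```

Passing a list instead of a single shell string avoids quoting problems with file names, and `check=True` turns a failed BLAST run into a Python exception rather than a silent empty result file.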

Flask/html - to do list

Get the user to input the number of experimental groups and their names
User login -> we can then email a pdf of the results to the user on completion of analysis
make some sort of loading page/GIF to show whilst the results are being computed
how to manage multiple users?

R - to do list

make the script fully soft-coded
derive names for graphs/variables etc from myArgs group list
make work with different number of input groups?
introduce new functionality? perhaps gene ontology visualisations

Improving the Apocrita functionality

Hi,

I'm going to implement some changes to improve the functionality with Apocrita; but James, as we're using your Apocrita access, only you can change the script that's being called within Apocrita.

For the BLAST job I've increased the number of threads to 4, in line with what's being requested on Apocrita.

Furthermore, the default for qsub jobs is 1 core.
I was reviewing the Apocrita guide (http://docs.hpc.qmul.ac.uk/using/): you need to add this line to the script:

#$ -pe smp 4       # Request 4 CPU cores

and you can lower the RAM per core to 1 GB instead of 3 GB.
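
To keep the core count and memory request in one place, the submission script could be generated from Python. This is a sketch only: the function name is hypothetical, and the `h_vmem` and `h_rt` resource names are the usual Grid Engine ones, assumed here to match what the Apocrita guide asks for.

```python
def job_script(command, cores=4, mem_per_core_gb=1, runtime_hours=1):
    """Return the text of a Grid Engine submission script."""
    lines = [
        "#!/bin/bash",
        "#$ -cwd                      # Run from the current directory",
        f"#$ -pe smp {cores}          # Request CPU cores",
        f"#$ -l h_vmem={mem_per_core_gb}G   # Memory per core",
        f"#$ -l h_rt={runtime_hours}:0:0    # Runtime limit",
        command,
    ]
    return "\n".join(lines) + "\n"
```

Using the same `cores` value for the `-pe smp` request and for the BLAST thread count keeps the two in step.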

Oh yeah, the BLAST OS workaround is totally fine, but I would suggest maybe rescripting the FPKM implementation part after the BLAST section to run in Python and not use the cmd functions.