refinery-platform / refinery-platform

The Refinery Platform is a data management, analysis and visualization system for bioinformatics and computational biology applications. The platform consists of three major components: a data repository with rich metadata capabilities, a workflow engine based on the popular Galaxy system, and visualization tools to support the exploration and interpretation of results at all stages of the analysis process.

Home Page: http://www.refinery-platform.org

License: Other

Python 42.63% HTML 11.82% CSS 3.50% JavaScript 38.48% XSLT 1.41% Ruby 0.01% Puppet 0.75% Shell 0.25% HCL 1.15%

refinery-platform gehlenborglab galaxy-project refinery

refinery-platform's Introduction

Refinery Platform

Additional information about how to administer and develop Refinery can be found in the wiki
Production deployments require access to Amazon Web Services
Refinery supports the latest version of Chrome (Linux and OS X), Firefox (Linux and OS X), and Safari (OS X)

Installing and Launching for Development

Prerequisites

Install Git (2.19.0+), Vagrant (2.2.0+) and Virtualbox (5.2.20+)
Add SSH key to your GitHub account
Note: this procedure has only been tested on local development machines running OS X 10.10+

Configure and Load Virtual Machine

$ git clone git@github.com:refinery-platform/refinery-platform.git
$ cd refinery-platform
$ vagrant up

The above step should take about 15 minutes depending on the speed of your machine and Internet connection. If you get an error, retry with:

$ vagrant provision

Open http://192.168.50.50:8000/ in your web browser.

Configure Deployment Environment on the Host

Create a Python 2.7 virtual environment (optional but recommended, assumes virtualenvwrapper is installed):

$ mkvirtualenv -a $(pwd) refinery-deployment

Install deployment tools (assumes header files for Python are installed):

$ pip install -r deployment/requirements.txt

Install Pre-Commit Hooks

Use fabricrc.sample to update or initialize Fabric configuration, for example:

$ cp fabricrc.sample ~/.fabricrc

To pull the latest code and update Refinery installation:

$ fab vm update

Refinery Operations on the VM

Connect to the initialized VM:

$ vagrant ssh
$ workon refinery-platform
$ ./manage.py [command]

Log in to Django admin UI (http://192.168.50.50:8000/admin/) with the default superuser account (username: admin, password: refinery).

Please see installation notes for more details, including information on how to configure Galaxy for this setup.

Troubleshooting

Refinery deployment requires a lot of external dependencies. You might have to run vagrant provision repeatedly to install all dependencies successfully. Any errors in the output of vagrant provision indicate that you have to re-run the command.
If you run into a build error on OS X when trying to install Fabric, set the C include path and try again: export C_INCLUDE_PATH=/usr/local/include
If you have a VPN connection running, you may need to disconnect and reconnect before you can access the VM. In some cases you may have to reboot the host machine.
To make sure all the required services are running after the VM was restarted or shut down, you need to provision again: vagrant reload --provision or vagrant up --provision

refinery-platform's Issues

add execute flag to workflow engines

Pivotal Tracker story 27410303 (Psalm Haseley - Apr 2, 2012)

add support to retrieve galaxy file type from history

Pivotal Tracker story 27550029 (Nils Gehlenborg - Apr 4, 2012)

Test Issue 2

Created with Eclipse Mylyn.

Finish galaxy analysis pipeline, download results to a stored directory

Finish up galaxy workflows and download results to a selected directory
Tasks:

Download analysis results to directory

Pivotal Tracker story 27458343 (Richard Park - Apr 3, 2012)

accept ISArchives with and without subdirectories in the ISA-Tab parser
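
One way to handle both layouts is to look only at the base name of each archive member. The sketch below assumes the archive is a zip file and relies on the ISA-Tab convention that the investigation file is named i_*.txt; find_investigation_file and make_archive are hypothetical helpers, not part of the Refinery code base.

```python
import io
import posixpath
import zipfile


def find_investigation_file(archive):
    """Return the archive path of the first i_*.txt member, or None.

    Only the base name is inspected, so the file is found whether it
    sits at the archive root or inside a subdirectory.
    """
    for name in sorted(archive.namelist()):
        base = posixpath.basename(name)
        if base.startswith("i_") and base.endswith(".txt"):
            return name
    return None


def make_archive(paths):
    """Build an in-memory zip of empty files (for illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for path in paths:
            zf.writestr(path, "")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


flat = make_archive(["i_investigation.txt", "s_study.txt"])
nested = make_archive(["archive/i_investigation.txt", "archive/s_study.txt"])
print(find_investigation_file(flat))
print(find_investigation_file(nested))
```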

Pivotal Tracker story 32709133 (Nils Gehlenborg - Jul 12, 2012)

version data sets

Data sets (core) will contain one or more instances of an ISA-TAB investigation. Only the most recent one will be "active" (e.g. for indexing and searching).

Pivotal Tracker story 28338013 (Nils Gehlenborg - Apr 19, 2012)

Duplicate base resource slugs are possible

The slugs are not enforced to be unique.
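
In Django, uniqueness is usually enforced at the database level with unique=True on the field. Independent of that, the application can resolve collisions before saving. A minimal sketch of the latter, assuming the set of existing slugs is available; make_unique_slug is a hypothetical helper and the suffix scheme is only one possible choice:

```python
def make_unique_slug(slug, existing):
    """Return slug unchanged, or with a numeric suffix if already taken."""
    if slug not in existing:
        return slug
    counter = 2
    # Try slug-2, slug-3, ... until an unused candidate is found.
    while "{}-{}".format(slug, counter) in existing:
        counter += 1
    return "{}-{}".format(slug, counter)


taken = {"chip-seq", "chip-seq-2"}
print(make_unique_slug("rna-seq", taken))
print(make_unique_slug("chip-seq", taken))
```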

web UI for importing ISA-Tab files

submit a zipped ISA-Tab file via POST, unzip and parse
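
The unzip step could look roughly like the sketch below. It assumes the POST body is already available as bytes and leaves out the web framework and the ISA-Tab parser; unpack_upload and the isatab_ directory prefix are made up for illustration.

```python
import io
import os
import tempfile
import zipfile


def unpack_upload(payload):
    """Extract a zipped upload into a fresh temporary directory.

    Returns (directory, sorted relative file paths). Raises ValueError
    if the payload is not a zip archive.
    """
    stream = io.BytesIO(payload)
    if not zipfile.is_zipfile(stream):
        raise ValueError("upload is not a zip archive")
    target = tempfile.mkdtemp(prefix="isatab_")
    with zipfile.ZipFile(stream) as archive:
        archive.extractall(target)
    extracted = []
    for root, _dirs, files in os.walk(target):
        for name in files:
            full_path = os.path.join(root, name)
            extracted.append(os.path.relpath(full_path, target))
    return target, sorted(extracted)


# Build a small zip in memory to stand in for the uploaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("i_investigation.txt", "")
    zf.writestr("s_study.txt", "")

directory, files = unpack_upload(buffer.getvalue())
print(files)
```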

Pivotal Tracker story 28890601 (Ilya Sytchev - May 1, 2012)

readme file has incorrect git clone url

currently git clone url in readme is:

git clone git@github.com:parklab/refinery-platform.git

Running it gives:
Cloning into 'refinery-platform'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

git clone https://github.com/parklab/refinery-platform.git

worked for me.

Instructions for creating django super user didn't work

Instructions indicated to:
django-admin.py createsuperuser

no createsuperuser command found

using

python manage.py createsuperuser

from /vagrant/refinery dir worked for me

ISA-Tab Logging

modify ISA-Tab processing to log into a single file

Pivotal Tracker story 32490181 (Nils Gehlenborg - Jul 9, 2012)

explore options for Django logging and monitoring

I was wondering if you have time to look into logging in Django and how we could use this for Refinery. It looks like Django 1.3 introduced extensive logging facilities, but I'm specifically interested in whether there is already an extension or project with a UI to centrally monitor the log messages for a Django project (such as Refinery). Maybe this has been implemented or could be implemented in the Django admin interface.

In Refinery we would like to avoid exposing all errors to the end users because there is not much that they can do if a problem occurs so I imagine that we will have (a) a log monitor for the system administrator with all messages and (b) log monitors for project administrators/users that contains only messages that they can address themselves (problems with user uploaded files, approaching disk quotas, etc.).

For now all you'd need to do is to read up on this and get an overview of what is out there that could help us with these tasks. The next step would be to work on (a) and figure out what the best way is to implement filters, search, notifications etc. for log messages. Once we have figured this out (b) should be easy to implement.
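
The split between (a) and (b) maps onto handlers and filters in Python's standard logging module, which Django's logging builds on. The sketch below is only an illustration: the user_actionable flag, the logger name and the in-memory ListHandler are assumptions standing in for a real monitor UI.

```python
import logging


class UserActionableFilter(logging.Filter):
    """Pass only records explicitly flagged as actionable by users."""

    def filter(self, record):
        return getattr(record, "user_actionable", False)


class ListHandler(logging.Handler):
    """Collect messages in memory (stand-in for a real log monitor)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("refinery.sketch")
logger.setLevel(logging.INFO)
logger.propagate = False

admin_monitor = ListHandler()   # (a) receives every message
user_monitor = ListHandler()    # (b) receives only flagged messages
user_monitor.addFilter(UserActionableFilter())
logger.addHandler(admin_monitor)
logger.addHandler(user_monitor)

logger.error("database connection lost")
logger.warning("disk quota almost reached", extra={"user_actionable": True})

print(admin_monitor.messages)
print(user_monitor.messages)
```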

Pivotal Tracker story 26823249 (Nils Gehlenborg - Mar 21, 2012)

Figure out Expansion Factor of SPP workflow

First test on fisher. 2x2 inputs and experiments totaling about 2 GB create a final analysis set of about 6.5 GB. The total expansion factor is about 2-3x.
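
For reference, the arithmetic behind the estimate. The issue does not say whether the 6.5 GB includes the original inputs, so both readings are shown; the second one is an assumption.

```python
input_gb = 2.0    # combined size of inputs and experiments (from the issue)
output_gb = 6.5   # size of the final analysis set (from the issue)

# Reading 1: the final set is compared directly with the inputs.
ratio_total = output_gb / input_gb

# Reading 2 (assumption): the final set includes the inputs, so only
# the newly generated data counts towards the expansion.
ratio_new = (output_gb - input_gb) / input_gb

print(round(ratio_total, 2), round(ratio_new, 2))
```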

Pivotal Tracker story 27409219 (Psalm Haseley - Apr 2, 2012)

fix new converter extra tab at end

Pivotal Tracker story 33435297 (Psalm Haseley - Jul 26, 2012)

Update to Django 1.4

Pivotal Tracker story 29236651 (Nils Gehlenborg - May 8, 2012)

use galaxy workflow output flag to indicate what files are output

Tasks:

flag outputs in Galaxy workflow editor
update get_history_file_list() to (by default) ignore visible=false files
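
A rough sketch of the filtering step, assuming each history entry is a dict with a boolean "visible" key. visible_files and the include_hidden parameter are hypothetical names, not the actual signature of get_history_file_list().

```python
def visible_files(history_contents, include_hidden=False):
    """Return history entries, skipping hidden ones unless requested."""
    if include_hidden:
        return list(history_contents)
    # Entries without a "visible" key are treated as visible.
    return [item for item in history_contents if item.get("visible", True)]


contents = [
    {"name": "peaks.bed", "visible": True},
    {"name": "intermediate.sam", "visible": False},
]
print([item["name"] for item in visible_files(contents)])
print(len(visible_files(contents, include_hidden=True)))
```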

Pivotal Tracker story 27611871 (Nils Gehlenborg - Apr 5, 2012)

Correctly identify number of steps in workflow. Differs from number of steps defined in workflow file.

Pivotal Tracker story 33081827 (Richard Park - Jul 19, 2012)

refactor file store to use symlinks

create symlinks to files added to the file store from local file system
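
A minimal sketch of the idea using os.symlink. add_to_store and the file_store directory name are hypothetical, and the example assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile


def add_to_store(source_path, store_dir):
    """Link a local file into the store directory instead of copying it."""
    os.makedirs(store_dir, exist_ok=True)
    link_path = os.path.join(store_dir, os.path.basename(source_path))
    # Point the link at the absolute source path so it stays valid
    # regardless of the current working directory.
    os.symlink(os.path.abspath(source_path), link_path)
    return link_path


workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "reads.fastq")
with open(source, "w") as handle:
    handle.write("@read1\n")

link = add_to_store(source, os.path.join(workdir, "file_store"))
print(os.path.islink(link))
```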

Pivotal Tracker story 32560893 (Ilya Sytchev - Jul 10, 2012)

get number of steps for a given Galaxy workflow

The number of steps for an expanded workflow is now stored in the analysis model.

Pivotal Tracker story 27294253 (Nils Gehlenborg - Mar 30, 2012)

Put link to E-R diagram in Wiki

Pivotal Tracker story 27409145 (Psalm Haseley - Apr 2, 2012)

Test Issue

Something needs to be fixed.

store analysis results in repository

Once an analysis has completed the results (i.e. specific output files) have to be stored in the repository together with information about how they were created.

Pivotal Tracker story 26784185 (Nils Gehlenborg - Mar 21, 2012)

create logging monitor for admin

Pivotal Tracker story 26784421 (Nils Gehlenborg - Mar 21, 2012)

store original isa-tab files and mage-tab files in file store

Tasks:

extend data model
store files (zipped)

Pivotal Tracker story 26872163 (Nils Gehlenborg - Mar 22, 2012)

extend parser to work with ISA-Tab files that are a combination of multiple studies

Pivotal Tracker story 27478955 (Psalm Haseley - Apr 3, 2012)

write Refinery architecture overview

Pivotal Tracker story 26823669 (Nils Gehlenborg - Mar 21, 2012)

progress updates for Galaxy workflows

Use states of the current history to implement this.
Tasks:

implement monitoring with callbacks
implemented monitoring via analysis_manager for workflows
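
Deriving progress from history states could be sketched as follows. It assumes the state of each dataset in the history has already been fetched as a string, with "ok" marking a finished dataset and "error" a failed one; the function names are hypothetical.

```python
def history_progress(states):
    """Return the percentage of datasets that finished successfully."""
    if not states:
        return 0.0
    finished = sum(1 for state in states if state == "ok")
    return 100.0 * finished / len(states)


def history_failed(states):
    """Return True if any dataset in the history reports an error."""
    return any(state == "error" for state in states)


states = ["ok", "ok", "running", "queued"]
print(history_progress(states))
print(history_failed(states))
```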

Pivotal Tracker story 26811059 (Nils Gehlenborg - Mar 21, 2012)

move logger settings into settings.py from local settings

Pivotal Tracker story 33084467 (Nils Gehlenborg - Jul 19, 2012)

Refinery doesn't work in Firefox (or Internet Explorer)

Nothing initializes in Firefox if an analysis is selected. IGV viewing works.

implement basic file store API

create(), read(), import(), delete() and update()
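
The issue only lists the operation names, so the semantics below are guesses: an in-memory stand-in in which create() registers a source location and import marks the file as fetched. Since import is a reserved word in Python, the sketch calls it import_file().

```python
import uuid


class FileStore:
    """In-memory stand-in for the file store operations listed above."""

    def __init__(self):
        self._items = {}

    def create(self, source):
        """Register a source location and return its new identifier."""
        item_id = str(uuid.uuid4())
        self._items[item_id] = {"source": source, "imported": False}
        return item_id

    def read(self, item_id):
        """Return a copy of the stored record, or None if unknown."""
        item = self._items.get(item_id)
        return dict(item) if item is not None else None

    def import_file(self, item_id):
        """Mark the file as imported into the store."""
        self._items[item_id]["imported"] = True

    def update(self, item_id, source):
        """Replace the source location of an existing record."""
        self._items[item_id]["source"] = source

    def delete(self, item_id):
        """Remove a record; return True if it existed."""
        return self._items.pop(item_id, None) is not None


store = FileStore()
item_id = store.create("/data/reads.fastq")
store.import_file(item_id)
print(store.read(item_id))
```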

Pivotal Tracker story 28888117 (Nils Gehlenborg - May 1, 2012)

Integrating Refinery with Galaxy. Step 1: Creating a Galaxy Instance

I am following the instructions for importing a workflow from Galaxy into Refinery.
Step 1 is to create a Galaxy instance in Refinery.

The instructions say to use the administration interface, which I can't seem to find.
At first I thought my Refinery user didn't have enough permissions to see the administration interface (perhaps my creation of the Django superuser failed, though an account was created).

Then I found the create_galaxy_instance command in manage.py, but there were no details on how to use it.

Using my imagination I tried:
python manage.py create_galaxy_instance galaxyAPIURL
But that didn't work either.

Add ability to galaxy to download a collection of files, zip, and return a tar ball via the cluster

Figure out a way for galaxy to zip a collection of files with the proper names via the cluster to prevent the front end from being bogged down by extra processing

Pivotal Tracker story 32495693 (Richard Park - Jul 9, 2012)

edit process_isatab command

takes in a directory of isa-archives or a single isa-archive
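
Accepting either form can be reduced to normalizing the argument into a list of archive paths. A small sketch, assuming archives are zip files; collect_archives and the suffix parameter are hypothetical and not taken from the process_isatab command.

```python
import os
import tempfile


def collect_archives(path, suffix=".zip"):
    """Return a list of archive paths for a directory or a single file."""
    if os.path.isdir(path):
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if name.lower().endswith(suffix)
        )
    return [path]


workdir = tempfile.mkdtemp()
for name in ("b.zip", "a.zip", "notes.txt"):
    open(os.path.join(workdir, name), "w").close()

print([os.path.basename(p) for p in collect_archives(workdir)])
print(len(collect_archives(os.path.join(workdir, "a.zip"))))
```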

Pivotal Tracker story 33121935 (Psalm Haseley - Jul 20, 2012)

serve genome sequence in given window

Pivotal Tracker story 35637161 (Nils Gehlenborg - Sep 6, 2012)

Clicking "View in IGV" multiple times in quick succession will lead to multiple sets of launch buttons in the dialog

Reproduce: click the "View in IGV" button multiple times in quick succession after selecting a subset of files.

Possible cause: Multiple requests get sent but previous requests do not get aborted or their responses do not get handled correctly.

put our Galaxy extensions into the main branch

Pivotal Tracker story 27410481 (Psalm Haseley - Apr 2, 2012)

add support for api keys

Pivotal Tracker story 28337969 (Nils Gehlenborg - Apr 19, 2012)

Create fixture for New ISA-Tab Model

Pivotal Tracker story 27409117 (Psalm Haseley - Apr 2, 2012)

merge isa-tab importing backend code for view and command line input

Right now there are two ISA-Tab processing tasks and they should be merged to avoid duplication of code (DRY). See data_set_manager.tasks.py (towards the end of the file).

Pivotal Tracker story 33677665 (Nils Gehlenborg - Jul 31, 2012)

add instructions on how to set up Sentry to wiki

Pivotal Tracker story 33084397 (Nils Gehlenborg - Jul 19, 2012)

find out best practices for Python/Django documentation

What is the difference between Sphinx and PyDocs? What does Read the Docs offer? Can we download the docs from RTD easily?

We want a solution that would also allow us to write brief tutorials and step-by-step descriptions on how to perform certain tasks, see e.g. http://django-haystack.readthedocs.org/en/latest/tutorial.html

Pivotal Tracker story 27557737 (Nils Gehlenborg - Apr 4, 2012)

Remove References to Refinery Repository from data_set_manager views

lines 17 and 123

Pivotal Tracker story 33123325 (Psalm Haseley - Jul 20, 2012)

Remove references to Refinery Repository in Analysis Manager

Pivotal Tracker story 33122351 (Psalm Haseley - Jul 20, 2012)

read through eXFrame's createISA.pl to get an idea of what they're doing

Pivotal Tracker story 27478911 (Psalm Haseley - Apr 3, 2012)

add support to file store to rename files

Pivotal Tracker story 32742007 (Nils Gehlenborg - Jul 13, 2012)

Assignment of data files to workflow inputs in the data set view is not working correctly

Steps to reproduce:
Open a data set view (/data_sets//). Select a workflow and assign inputs. Add or remove a column, or change a facet filter. Assign inputs.

Observed behavior:
Workflow input assignment is reset after view columns are added/removed or if filters are applied. Inputs are not available for selection until the workflow is deselected then selected again.

Expected behavior:
Input assignments should not be changed by applying filters or selecting/removing view columns. If input assignments are reset, they should be available for selection without having to de-select then re-select the workflow.

explain to richard, psalm and ilya how pivotal tracker works

A quick overview - mention what "icebox" means.

Pivotal Tracker story 26822389 (Nils Gehlenborg - Mar 21, 2012)

Error starting a workflow

TypeError exception is sometimes raised when trying to run a workflow from Available Samples page.

Steps to reproduce:

Import an ISA-Tab file.
Proceed to the Available Samples page.
Select a workflow and its inputs, and click Run Workflow.

Analysis Results page loads but progress bars are not displayed. Data is not imported in Galaxy. No errors appear in Django server console but the following output appears in Celery console:

[2012-07-13 17:52:40,053: WARNING/PoolWorker-1] analysis_manager.chord_postprocessing called
[2012-07-13 17:52:40,065: WARNING/PoolWorker-1] analysis_manger.download_history_files called
[2012-07-13 17:52:40,122: ERROR/MainProcess] Task analysis_manager.tasks.chord_postprocessing[d5c88166-9d1d-47ac-bacc-375b950cd927] raised exception: TypeError("cannot concatenate 'str' and 'NoneType' objects",)
Traceback (most recent call last):
  File "/Users/isytchev/environments/refinery/lib/python2.7/site-packages/celery/execute/trace.py", line 47, in trace
    return cls(states.SUCCESS, retval=fun(*args, **kwargs))
  File "/Users/isytchev/environments/refinery/lib/python2.7/site-packages/celery/app/task/__init__.py", line 247, in __call__
    return self.run(*args, **kwargs)
  File "/Users/isytchev/environments/refinery/lib/python2.7/site-packages/celery/app/__init__.py", line 175, in run
    return fun(*args, **kwargs)
  File "/Users/isytchev/workspace/Refinery/refinery/analysis_manager/tasks.py", line 74, in chord_postprocessing
    postprocessing_taskset = download_history_files(analysis)
  File "/Users/isytchev/environments/refinery/lib/python2.7/site-packages/celery/app/task/__init__.py", line 247, in __call__
    return self.run(*args, **kwargs)
  File "/Users/isytchev/environments/refinery/lib/python2.7/site-packages/celery/app/__init__.py", line 175, in run
    return fun(*args, **kwargs)
  File "/Users/isytchev/workspace/Refinery/refinery/analysis_manager/tasks.py", line 399, in download_history_files
    download_list = connection.get_history_file_list(analysis.history_id)
  File "/Users/isytchev/workspace/Refinery/refinery/galaxy_connector/connection.py", line 124, in get_history_file_list
    print self.get_history_contents( history_id )
  File "/Users/isytchev/workspace/Refinery/refinery/galaxy_connector/connection.py", line 98, in get_history_contents
    return self.get( "histories" + "/" + history_id + "/" + "contents" )
TypeError: cannot concatenate 'str' and 'NoneType' objects
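
The traceback shows history_id arriving as None at the string concatenation in get_history_contents. One defensive option is to validate the ID before building the request path, so the failure names the missing value. The sketch below is written for Python 3, and history_contents_path is a hypothetical helper, not the actual fix applied in Refinery.

```python
def history_contents_path(history_id):
    """Build the API path for a history's contents, rejecting a missing ID."""
    if history_id is None:
        raise ValueError("analysis has no Galaxy history ID")
    return "histories/{}/contents".format(history_id)


print(history_contents_path("abc123"))
try:
    history_contents_path(None)
except ValueError as error:
    print("rejected:", error)
```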

specify file store

interfaces with: Analysis Manager, Data Set Manager (Repository), external repositories, Connectors?
includes cache, analysis input/output directories, original meta data files, data files for the local repository

Pivotal Tracker story 26871757 (Nils Gehlenborg - Mar 22, 2012)

implement JavaScript tdf parser

Pivotal Tracker story 28259971 (Nils Gehlenborg - Apr 18, 2012)