ioam / lancet



Launch jobs, organize the output, and dissect the results

Home Page: http://ioam.github.io/lancet/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

lancet's Introduction

Lancet

Launch jobs, organize the output, and dissect the results.

More information about Lancet may be found on Lancet's website, including a quickstart example, user documentation, tutorials, and details of publications involving Lancet.

To obtain Lancet from GitHub:

git clone https://github.com/ioam/lancet.git

Like many IOAM projects, Lancet depends on param, so make sure to install it with pip install param.

Introduction

Lancet is designed to help you organize the output of your research tools, store it, and dissect the data you have collected. The output of a single simulation or analysis rarely contains all the data you need; Lancet helps you generate data from many runs and analyse it using your own Python code.

Parameter spaces often need to be explored for the purpose of plotting, tuning, or analysis. Lancet helps you extract the information you care about from potentially enormous volumes of data generated by such parameter exploration.
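As a rough illustration of what exploring a parameter space produces, the sketch below enumerates every combination of a few parameter values as a list of dictionaries, one per run, using the standard library's `itertools.product`. The helper name `cartesian_specs` and the parameters `alpha` and `seed` are hypothetical; Lancet expresses the same idea declaratively with its own classes (for example `Args(...) * Range(...)`).

```python
import itertools


def cartesian_specs(**dimensions):
    """Return one dict per combination of the given parameter values (hypothetical helper)."""
    keys = list(dimensions)
    value_lists = [dimensions[k] for k in keys]
    # itertools.product varies the last dimension fastest, as nested loops would.
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


specs = cartesian_specs(alpha=[0.1, 0.2], seed=[1, 2, 3])
```

Each dict can be handed to one simulation or analysis run, and the same list doubles as a table of metadata when the results are collected afterwards.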

Features

A simple, useful core with advanced functionality strictly optional. Use what you need without learning all components.
All components use a declarative style, helping to ensure reproducibility.
Succinctly express high-dimensional parameter spaces without nested loops.
Easily interface with external tools using ShellCommand, or quickly build flexible, reusable interfaces to advanced simulators or analysis tools. Currently, the Topographica neural simulator is actively supported.
Seamlessly switch from running jobs locally to launching jobs on a compute cluster.
Keep your output organized together with key metadata for reproducibility.
Quickly load your data and view the parameters from previous runs.
Integrates well with other popular tools such as IPython Notebook and the pandas data analysis library.
Actively used in scientific research and publication.
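Interfacing with an external tool largely comes down to turning each parameter spec into a command line. A minimal sketch of that translation follows; `spec_to_argv`, the `simulate` executable, and the `--key value` convention are assumptions for illustration and do not reproduce the actual options of Lancet's ShellCommand.

```python
def spec_to_argv(executable, spec, prefix="--"):
    """Build an argv list such as ['simulate', '--alpha', '0.5'] from a spec dict."""
    argv = [executable]
    for key in sorted(spec):  # sorted keys keep the command line deterministic
        argv += [prefix + key, str(spec[key])]
    return argv


argv = spec_to_argv("simulate", {"seed": 1, "alpha": 0.5})
```

Returning a list rather than a single string lets `subprocess.run(argv)` pass each value as a separate argument, so values containing spaces need no shell quoting.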

Contributors

The following people have contributed to Lancet's design and implementation:

Jean-Luc Stevens: Original coding and design

Marco Elver: Python 3 fork; cleaned up many aspects of the design.

James A. Bednar: For supporting the development of a solution that works with any tool, not just Topographica.

Philipp Rudiger: Testing, feedback and suggestions.

lancet's People


melver prabhuramachandran mjabri philippjfr experimentaccount0 experimentaccount4 jbampton mattkram

lancet's Issues

Put the job number first in the Grid Engine qsub job name

It would be really helpful if jobs launched via Grid Engine could list the job number first in the job name, so that it is visible via qstat. Currently, although the number is included in the job name, qstat usually truncates it.

I'd really appreciate this change because I often launch large parameter searches with jobs that can run for up to 48 hours and hog 8 CPUs, and I sometimes realize relatively early that certain parameters won't work out. Rather than waste the resources, I then cancel these jobs selectively.

In the past I have had to work out which job-IDs correspond to which parameters by looking at the first or last job and counting from there. This is obviously insane (and error-prone) when there are >100 jobs and becomes even more difficult when some jobs have already completed. I'd therefore hugely appreciate if we could prepend the job number to the job name submitted to qsub, making identifying specific jobs much easier.

Currently the job name is built like this:

job_name = "%s_%s_tid_%d" % (self.batch_name, self.job_timestamp, tid)

Producing this qstat output:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
7817272 0.00941 divisive_g s1048519     r     07/22/2014 20:10:03 ecdf@eddie491                      2        
7817306 0.00941 divisive_g s1048519     t     07/22/2014 23:43:15 ecdf@eddie320                      2 
....

I suggest changing it to this:

job_name = "t%d_%s_%s" % (tid, self.batch_name, self.job_timestamp)

Output:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
7817272 0.00941 t0_divisiv s1048519     r     07/22/2014 20:10:03 ecdf@eddie491                      2        
7817306 0.00941 t1_divisiv s1048519     t     07/22/2014 23:43:15 ecdf@eddie320                      2
....
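The effect of the proposed ordering can be checked without a cluster by truncating names the way the `name` column above does. The sketch assumes a 10-character column, which matches the output shown; the batch name `divisive_gain` and the timestamp string are made-up values consistent with the truncated `divisive_g` entries, and the helper names are hypothetical.

```python
QSTAT_NAME_WIDTH = 10  # assumption: width of qstat's "name" column in the output above


def current_job_name(batch_name, job_timestamp, tid):
    return "%s_%s_tid_%d" % (batch_name, job_timestamp, tid)


def proposed_job_name(batch_name, job_timestamp, tid):
    return "t%d_%s_%s" % (tid, batch_name, job_timestamp)


def as_shown_by_qstat(job_name):
    # Simulate the column truncation seen in the qstat listing.
    return job_name[:QSTAT_NAME_WIDTH]


batch, stamp = "divisive_gain", "2014-07-22_2010"  # hypothetical values
current = [as_shown_by_qstat(current_job_name(batch, stamp, t)) for t in range(200)]
proposed = [as_shown_by_qstat(proposed_job_name(batch, stamp, t)) for t in range(200)]
```

With the current scheme all 200 visible names collapse to `divisive_g`; with the task ID first every visible name stays distinct, and only the batch-name fragment gets shorter as the ID gains digits.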

python setup.py develop does not work

python setup.py develop does not work out of the box with Python 2.7.x and Lancet.

Subtraction from a cartesian product

While reading your interesting paper and experimenting with lancet, I tried the following:

from lancet import Args, Range
params = Args(arg1=1.0) * Range('arg2', 1, 3, steps=3)
ap = params - Args(arg1=1.0, arg2=2.0)

But this doesn't work. I can imagine scenarios where this is just as natural as extending a parameter space with addition, and I was wondering why it is not supported.
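Subtraction is straightforward once a parameter space is viewed as a finite list of specs: keep every spec that does not appear in the subtracted set. The sketch below uses plain dictionaries rather than Lancet's `Args` and `Range`; `linspace`, `product`, and `subtract` are hypothetical helpers, and `linspace` assumes that `Range('arg2', 1, 3, steps=3)` yields the inclusive values 1.0, 2.0, 3.0.

```python
def linspace(start, end, steps):
    """Evenly spaced values from start to end inclusive (assumed Range semantics)."""
    if steps == 1:
        return [float(start)]
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


def product(left, right):
    """Cartesian product of two spec lists, merging each pair of dicts."""
    return [{**a, **b} for a in left for b in right]


def subtract(specs, removed):
    """Drop every spec that exactly equals one of the removed specs."""
    return [s for s in specs if s not in removed]


params = product([{"arg1": 1.0}], [{"arg2": v} for v in linspace(1, 3, 3)])
ap = subtract(params, [{"arg1": 1.0, "arg2": 2.0}])
```

Because matching relies on exact equality, values produced by floating-point arithmetic may need rounding or a tolerance before comparison, which is one design question such an operator would have to settle.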

Launcher should only re-run unsuccessful runs

Launcher currently re-launches everything when called again. Instead, Launcher should only re-run a previous job if the earlier invocation failed (with a non-zero exit status) or if a forced re-run is explicitly requested. This would also make it easy to change parameters and add further parameter ranges to explore without re-running everything.
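A minimal sketch of the requested selection logic, assuming the exit status of each earlier run has been recorded (for example in a log) and keyed by its parameters. `spec_key` and `specs_to_run` are hypothetical names and are not part of Lancet.

```python
def spec_key(spec):
    """Order-independent, hashable key for a spec (values must be hashable)."""
    return tuple(sorted(spec.items()))


def specs_to_run(specs, previous_status, force=False):
    """Select specs that never ran or exited non-zero, unless force is set.

    previous_status maps spec_key(spec) -> exit status of an earlier run.
    """
    if force:
        return list(specs)
    return [s for s in specs if previous_status.get(spec_key(s)) != 0]


specs = [{"arg": 1}, {"arg": 2}, {"arg": 3}]
status = {spec_key({"arg": 1}): 0, spec_key({"arg": 2}): 1}  # {"arg": 3} is new
pending = specs_to_run(specs, status)
```

New parameter ranges appended to `specs` are picked up automatically, since they have no recorded status.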

RuntimeError: maximum recursion depth exceeded

The current git HEAD seems to have trouble running the basic example given on the lancet homepage:

>>> import lancet
>>> integers = lancet.Range('integer', 100, 115)
RuntimeError: maximum recursion depth exceeded while calling a Python object

/Users/user/Code/lancet/lancet/core.pyc in __len__(self)
    431         return DataFrame(self.specs) if DataFrame else "Pandas not available"
    432 
--> 433     def __len__(self): return len(self.specs)
    434 
    435 

/usr/local/lib/python2.7/site-packages/param/parameterized.pyc in __get__(self, obj, objtype)
    372         # Parameterized class); objtype is never None
    373 
--> 374         if not obj:
    375             result = self.default
    376         else:

/Users/user/Code/lancet/lancet/core.pyc in __len__(self)
    431         return DataFrame(self.specs) if DataFrame else "Pandas not available"
    432 
--> 433     def __len__(self): return len(self.specs)
    434 
    435 

/usr/local/lib/python2.7/site-packages/param/parameterized.pyc in __get__(self, obj, objtype)
    372         # Parameterized class); objtype is never None
    373 
--> 374         if not obj:
    375             result = self.default
    376         else:
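Reading the traceback, the two frames alternate: Lancet's `__len__` reads `self.specs`, and param's descriptor `__get__` evaluates `if not obj:`, which asks Python for the instance's truth value and therefore calls `__len__` again. The self-contained sketch below reproduces that loop with a hypothetical `BuggyParameter` descriptor (not param's real class) and shows the usual remedy, testing `obj is None` instead of truthiness. It illustrates the failure mode under these assumptions and is not a patch to param or Lancet.

```python
class BuggyParameter:
    """Minimal stand-in for a descriptor that tests `if not obj:`."""

    def __init__(self, default):
        self.default = default

    def __set_name__(self, owner, name):
        self.attr = "_" + name  # per-instance storage, e.g. "_specs"

    def __get__(self, obj, objtype=None):
        # `not obj` needs the instance's truth value; with no __bool__ defined,
        # Python falls back to __len__, which reads this descriptor again.
        if not obj:
            return self.default
        return getattr(obj, self.attr, self.default)


class FixedParameter(BuggyParameter):
    def __get__(self, obj, objtype=None):
        # Only class-level access (obj is None) should return the default.
        if obj is None:
            return self.default
        return getattr(obj, self.attr, self.default)


class BuggyArgs:
    specs = BuggyParameter(default=[])

    def __init__(self, specs):
        self._specs = specs

    def __len__(self):
        return len(self.specs)  # same shape as core.py line 433 in the traceback


class FixedArgs:
    specs = FixedParameter(default=[])

    def __init__(self, specs):
        self._specs = specs

    def __len__(self):
        return len(self.specs)


def triggers_recursion(args):
    try:
        len(args)
    except RecursionError:  # a RuntimeError subclass on Python 3.5+
        return True
    return False
```

For example, `triggers_recursion(BuggyArgs([{"integer": 100}]))` is true, while `len(FixedArgs([{"integer": 100}, {"integer": 101}]))` simply returns 2.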

Launcher.call should return information on the output

This would make it easy to get back the stdout/stderr along with any files generated without having to write a lot of code.
The returned information should essentially be a combination of what Log and FilePattern provide.

For example:

launcher = Launcher(...)
output = launcher()
results = [float(stdout.read()) for stdout in output.stdout]

Ideally the output should contain enough information that it should be very easy to generate HoloMaps or pandas dataframes from this output without having to write a lot of code.
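One way to picture the requested return value is a list of per-run records pairing each parameter spec with its exit status and captured output. The sketch below uses only `subprocess.run`; `RunRecord` and `launch` are hypothetical names rather than Lancet's `Launcher`, `Log`, or `FilePattern` API, and it runs jobs sequentially in the current process rather than on a cluster.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical per-run result: the spec plus what the process produced."""
    spec: dict
    returncode: int
    stdout: str
    stderr: str


def launch(command_for, specs):
    """Run one command per spec sequentially and collect its output."""
    records = []
    for spec in specs:
        proc = subprocess.run(command_for(spec), capture_output=True, text=True)
        records.append(RunRecord(spec, proc.returncode, proc.stdout, proc.stderr))
    return records


# Example: each "job" squares its parameter and prints the result.
specs = [{"x": 1}, {"x": 2}, {"x": 3}]
output = launch(lambda s: [sys.executable, "-c", f"print({s['x']} ** 2)"], specs)
results = [float(r.stdout) for r in output]
# One flat row per run, suitable for e.g. pandas.DataFrame(rows).
rows = [dict(r.spec, returncode=r.returncode, result=float(r.stdout)) for r in output]
```

Keeping the spec inside each record is what makes the final step cheap: the parameters and results already sit side by side, so building a table needs no bookkeeping.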

Make lancet packages be noarch

Now that noarch is generally well supported, we should make this pure-Python package be a noarch conda package to avoid ultra-confusing problems like holoviz/datashader#532 due to having no Lancet package for a particular Python version.
