lancet's Introduction

Lancet

Launch jobs, organize the output, and dissect the results.

More information about Lancet may be found on Lancet's website. There you will find a quickstart example, user documentation, tutorials, and details of publications involving Lancet.

To obtain Lancet from GitHub:

git clone git://github.com/ioam/lancet.git

In common with many projects on IOAM, Lancet uses param, so make sure to install it with pip install param.

Introduction

Lancet is designed to help you organize the output of your research tools, store it, and dissect the data you have collected. The output of a single simulation or analysis rarely contains all the data you need; Lancet helps you generate data from many runs and analyse it using your own Python code.

Parameter spaces often need to be explored for the purpose of plotting, tuning, or analysis. Lancet helps you extract the information you care about from potentially enormous volumes of data generated by such parameter exploration.

Features

  • A simple, useful core with advanced functionality strictly optional. Use what you need without learning all components.
  • All components use a declarative style, helping to ensure reproducibility.
  • Succinctly express high-dimensional parameter spaces, without nested loops.
  • Easily interface with external tools using ShellCommand, or quickly build flexible, reusable interfaces to advanced simulators or analysis tools. Currently, the Topographica neural simulator is actively supported.
  • Seamlessly switch from running jobs locally to launching jobs on a compute cluster.
  • Keep your output organized together with key metadata for reproducibility.
  • Quickly load your data and view the parameters from previous runs.
  • Integrates well with other popular tools such as IPython Notebook and the pandas data analysis library.
  • Actively used in scientific research and publication.
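
To illustrate what expressing a parameter space without nested loops can look like, here is a minimal sketch in plain Python. It is not Lancet's implementation or API: `linear_range` and `product_space` are hypothetical helpers, and the parameter names `x` and `seed` are made up for the example.

```python
from itertools import product

def linear_range(key, start, stop, steps):
    """Return `steps` evenly spaced settings of `key`, from start to stop inclusive."""
    if steps == 1:
        return [{key: float(start)}]
    width = (stop - start) / (steps - 1)
    return [{key: start + i * width} for i in range(steps)]

def product_space(*dimensions):
    """Combine dimensions (each a list of dicts) into their Cartesian product."""
    return [
        {name: value for part in combo for name, value in part.items()}
        for combo in product(*dimensions)
    ]

# Two dimensions declared independently, then combined in one expression.
xs = linear_range("x", 0, 10, steps=3)      # x = 0.0, 5.0, 10.0
seeds = [{"seed": s} for s in (1, 2)]       # seed = 1, 2
space = product_space(xs, seeds)            # 3 * 2 = 6 parameter sets

for spec in space:
    print(spec)
```

Each dimension is declared once, and adding another dimension only means passing one more argument to `product_space`, rather than adding another level of loop nesting.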

Contributors

The following people have contributed to Lancet's design and implementation:

Jean-Luc Stevens: Original coding and design

Marco Elver: Python 3 fork; cleaned up many aspects of the design.

James A. Bednar: For supporting the development of a solution that works with any tool and not just Topographica.

Philipp Rudiger: Testing, feedback and suggestions.

lancet's Issues

Put the job number first in the Grid Engine qsub job name

It would be really helpful if jobs launched via GridEngine listed the job number first in the job name, so that it is visible via qstat. Currently, although the job number is included in the job name, qstat usually truncates the name before the number appears.

I'd really appreciate this change because I often launch large parameter searches with jobs that can run for up to 48 hours and hog 8 CPUs, and realize relatively early that certain parameters won't work out. Rather than wasting the resources, I then cancel these jobs selectively.

In the past I have had to work out which job-IDs correspond to which parameters by looking at the first or last job and counting from there. This is obviously insane (and error-prone) when there are >100 jobs, and becomes even more difficult when some jobs have already completed. I'd therefore hugely appreciate it if we could prepend the job number to the job name submitted to qsub, making it much easier to identify specific jobs.

Currently the job name is built like this:

job_name = "%s_%s_tid_%d" % (self.batch_name, self.job_timestamp, tid)

Producing this qstat output:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
7817272 0.00941 divisive_g s1048519     r     07/22/2014 20:10:03 ecdf@eddie491                      2        
7817306 0.00941 divisive_g s1048519     t     07/22/2014 23:43:15 ecdf@eddie320                      2 
....

I suggest changing it to this:

job_name = "t%d_%s_%s" % (tid, self.batch_name, self.job_timestamp)

Output:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
7817272 0.00941 t0_divisiv s1048519     r     07/22/2014 20:10:03 ecdf@eddie491                      2        
7817306 0.00941 t1_divisiv s1048519     t     07/22/2014 23:43:15 ecdf@eddie320                      2
....
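
The effect of the two format strings can be checked with a short sketch. It assumes that qstat shows only the first 10 characters of the job name, as in the listings above; the batch name and timestamp are hypothetical values chosen for illustration, and the helper functions are not part of Lancet.

```python
QSTAT_NAME_WIDTH = 10  # assumption: width of the name column in the listings above

def current_name(batch_name, timestamp, tid):
    """Job name as currently built: the task id comes last."""
    return "%s_%s_tid_%d" % (batch_name, timestamp, tid)

def proposed_name(batch_name, timestamp, tid):
    """Job name as proposed: the task id comes first."""
    return "t%d_%s_%s" % (tid, batch_name, timestamp)

def as_shown_by_qstat(job_name, width=QSTAT_NAME_WIDTH):
    """Truncate the job name the way the qstat name column does."""
    return job_name[:width]

# Hypothetical batch name and timestamp, for illustration only.
batch, stamp = "my_long_batch", "2014-07-22_2010"

old = [as_shown_by_qstat(current_name(batch, stamp, t)) for t in range(3)]
new = [as_shown_by_qstat(proposed_name(batch, stamp, t)) for t in range(3)]

print(old)  # every task shows the same truncated name
print(new)  # each task shows its own id
```

With the task id first, the truncated names stay distinct as long as the id itself fits within the visible width.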

Subtraction from a cartesian product

While reading your interesting paper and experimenting with lancet, I tried the following:

from lancet import Args, Range
params = Args(arg1=1.0) * Range('arg2', 1, 3, steps=3)
ap = params - Args(arg1=1.0, arg2=2.0)

But this doesn't work. I can imagine scenarios where subtraction is just as natural as extending a parameter space with addition, so I was wondering why it is not supported.
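
As a rough sketch of the requested behaviour, subtraction can be modelled on plain lists of parameter dictionaries. This is not Lancet code: `subtract` is a hypothetical helper, and the sketch assumes that the `Range` above yields the values 1.0, 2.0 and 3.0 for `arg2`.

```python
def subtract(specs, excluded):
    """Return the specs that do not appear in `excluded`, preserving order."""
    return [spec for spec in specs if spec not in excluded]

# Stand-in for Args(arg1=1.0) * Range('arg2', 1, 3, steps=3),
# assuming the range expands to 1.0, 2.0, 3.0.
product_specs = [{"arg1": 1.0, "arg2": value} for value in (1.0, 2.0, 3.0)]

# Stand-in for "params - Args(arg1=1.0, arg2=2.0)".
remaining = subtract(product_specs, [{"arg1": 1.0, "arg2": 2.0}])

for spec in remaining:
    print(spec)
```

Under this reading, subtraction is simply a filter over the expanded parameter sets, which mirrors how addition concatenates them.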

Launcher should only re-run unsuccessful runs

Launcher currently re-launches everything when called again. Instead, Launcher should only re-run a previous run if the earlier invocation failed (with a non-zero exit status) or if it is explicitly asked to force a re-run. This would also make it easy to change the parameters and add additional parameter ranges to explore without having to worry about re-running everything.
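
The selection logic described here could look roughly like the following sketch. It is only an illustration under stated assumptions: `spec_key` and `select_runs` are hypothetical helpers, not part of Lancet, and the exit statuses of the earlier invocation are assumed to be available as a dictionary keyed by parameter set.

```python
def spec_key(spec):
    """Hashable identity for a parameter dictionary."""
    return tuple(sorted(spec.items()))

def select_runs(specs, exit_statuses, force=False):
    """Choose which parameter sets to launch.

    A spec is launched if it was never run before, if its recorded
    exit status is non-zero, or if `force` is set.
    """
    if force:
        return list(specs)
    return [s for s in specs if exit_statuses.get(spec_key(s)) != 0]

# Hypothetical record of an earlier invocation: x=1 succeeded, x=2 failed.
previous = {spec_key({"x": 1}): 0, spec_key({"x": 2}): 1}

# The parameter space has since been extended with x=3.
requested = [{"x": 1}, {"x": 2}, {"x": 3}]

to_run = select_runs(requested, previous)
print(to_run)  # the failed run and the new one, but not the successful one
```

Because unseen parameter sets have no recorded status, extending the parameter space only launches the new runs plus any earlier failures.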

RuntimeError: maximum recursion depth exceeded

The current git HEAD seems to have trouble running the basic example given on the lancet homepage:

>>> import lancet
>>> integers = lancet.Range('integer', 100, 115)
RuntimeError: maximum recursion depth exceeded while calling a Python object
/Users/user/Code/lancet/lancet/core.pyc in __len__(self)
    431         return DataFrame(self.specs) if DataFrame else "Pandas not available"
    432 
--> 433     def __len__(self): return len(self.specs)
    434 
    435 

/usr/local/lib/python2.7/site-packages/param/parameterized.pyc in __get__(self, obj, objtype)
    372         # Parameterized class); objtype is never None
    373 
--> 374         if not obj:
    375             result = self.default
    376         else:

/Users/user/Code/lancet/lancet/core.pyc in __len__(self)
    431         return DataFrame(self.specs) if DataFrame else "Pandas not available"
    432 
--> 433     def __len__(self): return len(self.specs)
    434 
    435 

/usr/local/lib/python2.7/site-packages/param/parameterized.pyc in __get__(self, obj, objtype)
    372         # Parameterized class); objtype is never None
    373 
--> 374         if not obj:
    375             result = self.default
    376         else:
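
The alternating frames in the traceback suggest a cycle: `__len__` reads `self.specs`, the descriptor's `__get__` evaluates `if not obj`, and truth-testing an object that defines `__len__` but not `__bool__` calls `__len__` again. The sketch below reproduces that pattern with stand-in classes. It is not the code of param or Lancet, and the identity test shown is only one possible way to break the cycle.

```python
class TruthTestingDescriptor:
    """Stand-in descriptor that truth-tests the instance, as in the traceback."""
    def __init__(self, default):
        self.default = default

    def __set_name__(self, owner, name):
        self.storage = "_" + name

    def __get__(self, obj, objtype=None):
        if not obj:  # bool(obj) falls back to obj.__len__()
            return self.default
        return obj.__dict__.get(self.storage, self.default)

class IdentityTestingDescriptor(TruthTestingDescriptor):
    """Same descriptor, but checks for None by identity instead."""
    def __get__(self, obj, objtype=None):
        if obj is None:  # never calls obj.__len__()
            return self.default
        return obj.__dict__.get(self.storage, self.default)

class BrokenSpecs:
    specs = TruthTestingDescriptor(default=[])

    def __init__(self, specs):
        self.__dict__["_specs"] = list(specs)

    def __len__(self):
        return len(self.specs)  # re-enters __get__, which calls __len__ again

class WorkingSpecs:
    specs = IdentityTestingDescriptor(default=[])

    def __init__(self, specs):
        self.__dict__["_specs"] = list(specs)

    def __len__(self):
        return len(self.specs)

try:
    len(BrokenSpecs([1, 2, 3]))
    broken_result = "no error"
except RecursionError:
    broken_result = "recursion"

working_result = len(WorkingSpecs([1, 2, 3]))
print(broken_result, working_result)
```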

Launcher.__call__ should return information on the output

  1. This would make it easy to get back the stdout/stderr along with any files generated without having to write a lot of code.
  2. The returned information should essentially be a combination of what Log and FilePattern provide.

For example:

launcher = Launcher(...)
output = launcher()
results = [float(stdout.read()) for stdout in output.stdout] 

Ideally, the output should contain enough information to make it very easy to generate HoloMaps or pandas DataFrames from it without having to write a lot of code.
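
One possible shape for such a return value is sketched below. Everything in it is hypothetical: `LaunchOutput` and `launch` are illustrative names rather than Lancet's API, jobs are run locally with the current Python interpreter, and generated files (the part that FilePattern would cover) are left out.

```python
import io
import subprocess
import sys
from dataclasses import dataclass, field

@dataclass
class LaunchOutput:
    """Hypothetical return value: one entry per job, in launch order."""
    specs: list = field(default_factory=list)
    stdout: list = field(default_factory=list)       # file-like objects
    stderr: list = field(default_factory=list)       # file-like objects
    returncodes: list = field(default_factory=list)

def launch(specs, script):
    """Run `script` once per spec, passing the spec's values as arguments."""
    output = LaunchOutput()
    for spec in specs:
        args = [sys.executable, "-c", script] + [str(v) for v in spec.values()]
        done = subprocess.run(args, capture_output=True, text=True)
        output.specs.append(spec)
        output.stdout.append(io.StringIO(done.stdout))
        output.stderr.append(io.StringIO(done.stderr))
        output.returncodes.append(done.returncode)
    return output

# A toy job that prints the square of its first argument.
square = "import sys; print(float(sys.argv[1]) ** 2)"

output = launch([{"x": 2}, {"x": 3}], square)
results = [float(stdout.read()) for stdout in output.stdout]
print(results)
```

Keeping the parameter sets alongside the captured output means the two can be zipped into rows for a table without re-deriving which job produced which result.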

Make lancet packages be noarch

Now that noarch is generally well supported, we should make this pure-Python package be a noarch conda package to avoid ultra-confusing problems like holoviz/datashader#532 due to having no Lancet package for a particular Python version.
