codalab-worksheets's Introduction

What is CodaLab?

CodaLab is an open-source web-based platform that enables researchers, developers, and data scientists to collaborate, with the goal of advancing research fields where machine learning and advanced computation are used. CodaLab helps to solve many common problems in data-oriented research through its online community, where people can share worksheets and participate in competitions.

To see CodaLab Competitions in action, visit codalab.lisn.fr.

Codabench, the next-gen of CodaLab Competitions, is out. Try it out!

Documentation

Community

The CodaLab community forum is hosted on Google Groups.

Quick installation (for Linux!)

To participate in competitions, or even organize your own, you don't need to install anything; you just need to sign in to an instance of the platform (e.g. this one). If you wish to configure your own instance of CodaLab Competitions, here are the instructions:

Install docker and add your user to the docker group, if you haven't already

$ wget -qO- https://get.docker.com/ | sh
$ sudo usermod -aG docker $USER

Clone this repo and get the default environment setup

$ git clone https://github.com/codalab/codalab-competitions
$ cd codalab-competitions
$ cp .env_sample .env
$ pip install docker-compose
$ docker-compose up -d

Now you should be able to access http://localhost/

More details on how to configure your own instance:

License

Copyright (c) 2013-2015, The Outercurve Foundation. Copyright (c) 2016-2021, Université Paris-Saclay. This software is released under the Apache License 2.0 (the "License"); you may not use the software except in compliance with the License.

The text of the Apache License 2.0 can be found online at: http://www.opensource.org/licenses/apache2.0.php

Cite CodaLab Competitions in your research

@article{codalab_competitions_JMLR,
  author  = {Adrien Pavao and Isabelle Guyon and Anne-Catherine Letournel and Dinh-Tuan Tran and Xavier Baro and Hugo Jair Escalante and Sergio Escalera and Tyler Thomas and Zhen Xu},
  title   = {CodaLab Competitions: An Open Source Platform to Organize Scientific Challenges},
  journal = {Journal of Machine Learning Research},
  year    = {2023},
  volume  = {24},
  number  = {198},
  pages   = {1--6},
  url     = {http://jmlr.org/papers/v24/21-1436.html}
}

codalab-worksheets's People

Contributors

adiprerepa, andrewjgaut, andyjin2000, bkgoksel, candicegjing, cpoulain, dependabot[bot], epicfaace, fabeschan, jizhen-wang, kashizui, klopyrev, kovach, leilenah, levilian, matt-f-wu, maxwang7, mergify[bot], nelson-liu, nikhilxb, percyliang, pranavjain, pujun-ai, raisins, teetone, w0nche0l, wwwjn, yiboliu, yipenghe, yuqijin

codalab-worksheets's Issues

cl make from running or failed bundles

I am typically training models that take several days to converge.
Every epoch, model parameters are saved.
I often want to extract those parameters and analyze them, while the job continues to run.

Currently, if I try to call cl make 0x34dsaf1/params, Codalab will wait until the job finishes. However, since you never know when a neural net might converge, my jobs are configured to run indefinitely, until I manually decide to kill them. Unfortunately, manually killing a job gives it a "failed" status, and codalab refuses to make bundles from failed bundles.

So, the only way to "snapshot" the parameters is to cl download 0x34dsaf1/params and then cl upload dataset params. This ruins provenance and is very prone to mistakes.

Ideally, calling cl make would create a bundle which is a snapshot of those parameters at a particular point in time. The metadata could say when the snapshot was taken. A snapshot of the directory contents could also be taken. This is not perfect provenance, but it's much better than nothing.

when referencing many bundles (^1-10), use worksheet.get_bundle_uuids

Currently, in bundle_cli.py, we're getting one bundle at a time, which is really slow.

For example, in do_rm_command(), we are calling
bundle_uuids = [worksheet_util.get_bundle_uuid(client, worksheet_uuid, bundle_spec) for bundle_spec in args.bundle_spec]
This should be replaced with one call.
Most commands (e.g., kill, add, perm) should be changed.
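
A minimal sketch of the batching idea, not the actual client code. `FakeClient` and its two methods are hypothetical stand-ins (the bulk method name is borrowed from the issue title); the call counter only illustrates how many server round trips each approach would make.

```python
class FakeClient:
    """Hypothetical stand-in for the bundle client; counts round trips."""

    def __init__(self, table):
        self.table = table  # maps bundle_spec -> uuid
        self.calls = 0      # number of simulated server requests

    def get_bundle_uuid(self, worksheet_uuid, bundle_spec):
        self.calls += 1
        return self.table[bundle_spec]

    def get_bundle_uuids(self, worksheet_uuid, bundle_specs):
        self.calls += 1
        return [self.table[spec] for spec in bundle_specs]


def resolve_one_at_a_time(client, worksheet_uuid, bundle_specs):
    # Current pattern: one request per bundle spec.
    return [client.get_bundle_uuid(worksheet_uuid, s) for s in bundle_specs]


def resolve_batched(client, worksheet_uuid, bundle_specs):
    # Proposed pattern: a single request for all specs.
    return client.get_bundle_uuids(worksheet_uuid, bundle_specs)
```

Both functions return the same list; only the number of requests differs, which is what matters when each request crosses the network.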

cl download recursive

The command:
cl download -r
should download the bundle and its dependencies, just like 'cl cp -r'.
This is useful for reproducing results locally (outside of CodaLab).
This would write the dependent bundles into the main bundle directory with the keys determined by the targets. (This is what is present when the bundle actually runs.)

fix argument parsing

tudor@tudor-258:~/prediction$ cl run runner:runner lib:lib -t "python run_experiment.py ds/ner-conll2003-eng-tag/train ds/ner-conll2003-eng-tag/dev FINerProblem"

fails whereas

tudor@tudor-258:~/prediction$ cl run runner:runner lib:lib "python run_experiment.py ds/ner-conll2003-eng-tag/train ds/ner-conll2003-eng-tag/dev FINerProblem" -t

succeeds. Looks like the -t argument isn't declared correctly.
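
One possible explanation, assuming the CLI is built on Python's argparse: the default `parse_args` matches a run of positional arguments in one go, so positionals that are split by a flag can be rejected as unrecognized. The sketch below is not the real `cl run` parser; the argument names and the meaning of `-t` are assumptions. It declares `-t` as a boolean flag and uses `parse_intermixed_args` (Python 3.7+), which accepts either ordering.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="cl run")
    # Hypothetical declarations: targets followed by the command string.
    parser.add_argument("items", nargs="+", help="targets, then the command")
    # Boolean flag that takes no value (its meaning here is assumed).
    parser.add_argument("-t", "--tail", action="store_true")
    return parser


def parse(argv):
    # parse_intermixed_args collects positionals even when an optional
    # flag appears in the middle of them.
    return build_parser().parse_intermixed_args(argv)
```

With this setup, placing `-t` before or after the command string yields the same parsed result.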

Ignore certain files when uploading a bundle

When uploading a folder containing code, I often want to ignore large subfolders which contain data, saved parameters, IPython caches, the .git folder, etc.

It would be very useful to have a .clignore file, with the same semantics as .gitignore.
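
A rough sketch of how such a filter could work. It implements only a small subset of .gitignore semantics (comments, blank lines, shell-style wildcards, matching on any path component); negation with `!` and anchored patterns are not handled. All function names are hypothetical.

```python
import fnmatch
import os


def load_patterns(text):
    """Parse ignore-file text: skip blank lines and '#' comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))
    return patterns


def is_ignored(rel_path, patterns):
    """True if the relative path or any of its components matches a pattern."""
    parts = rel_path.split("/")
    for pattern in patterns:
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False


def files_to_upload(root, patterns):
    """Walk root and return the relative paths that are not ignored."""
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root).replace(os.sep, "/")
        prefix = "" if rel_dir == "." else rel_dir + "/"
        # Prune ignored directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not is_ignored(prefix + d, patterns)]
        kept.extend(prefix + f for f in filenames if not is_ignored(prefix + f, patterns))
    return sorted(kept)
```

Pruning `dirnames` in place means large ignored folders such as `.git` are never traversed, which is the main cost saving when uploading.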

let each bundle have its own permissions

Currently, bundles inherit permissions from the worksheets that they belong to. But we need finer-grained control (so that I can have someone else's bundle on my worksheet in a read-only manner). This requires a schema change to the DB.

cl cp sync metadata

Make cl cp copy over the metadata and update the bundle - don't skip entirely if the bundle already exists. This allows us to keep the metadata consistent across bundles.

The annoying thing is that while the bundle is immutable, the metadata of the bundle is mutable (and must be in order to write descriptions about it and transfer ownership, etc.). Which means that the uuid doesn't actually specify everything.

If the user is frequently transferring bundles between two instances, then it's his/her responsibility to know which one is the master version.

Worksheet content disappears after adding unicode characters

To reproduce:

  1. Create a new worksheet and add some content.

    > cl new bug-sandbox
    > cl work bug-sandbox
    > cl wedit
    # (Add some content and then save)
    > cl print
    ### Worksheet: http://localhost:12800::bug-sandbox(0x...)
    ### Owner: ...
    ### Permissions: ...
    Parent:
    [Worksheet ...]
    This is an example sentence.
    
  2. Edit the worksheet, add a Unicode character, and then save.

    > cl wedit
    # (Add ♥ and then save)
    
  3. The following error will appear, and the worksheet will be reset to the placeholder comments. All other contents are wiped out.

    Traceback (most recent call last):
      File "/home/ppasupat/.local/codalab-cli/codalab/bin/../../codalab/bin/cl.py", line 67, in <module>
        run_cli()
      File "/home/ppasupat/.local/codalab-cli/codalab/bin/../../codalab/bin/cl.py", line 58, in run_cli
        cli.do_command(sys.argv[1:])
      File "/home/ppasupat/.local/codalab-cli/codalab/lib/bundle_cli.py", line 337, in do_command
        command_fn(remaining_args, parser)
      File "/home/ppasupat/.local/codalab-cli/codalab/lib/bundle_cli.py", line 1155, in do_wedit_command
        client.update_worksheet(worksheet_info, new_items)
      File "/home/ppasupat/.local/codalab-cli/codalab/client/remote_bundle_client.py", line 136, in inner
        return getattr(self.proxy, command)(*args, **kwargs)
      File "/usr/lib/python2.7/xmlrpclib.py", line 1224, in __call__
        return self.__send(self.__name, args)
      File "/usr/lib/python2.7/xmlrpclib.py", line 1578, in __request
        verbose=self.__verbose
      File "/usr/lib/python2.7/xmlrpclib.py", line 1264, in request
        return self.single_request(host, handler, request_body, verbose)
      File "/usr/lib/python2.7/xmlrpclib.py", line 1297, in single_request
        return self.parse_response(response)
      File "/usr/lib/python2.7/xmlrpclib.py", line 1473, in parse_response
        return u.close()
      File "/usr/lib/python2.7/xmlrpclib.py", line 793, in close
        raise Fault(**self._stack[0])
    xmlrpclib.Fault: <Fault 1: "<type 'exceptions.UnicodeEncodeError'>:'latin-1' codec can't encode character u'\\u2665' in position 0: ordinal not in range(256)">
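
The fault message can be reproduced in isolation. This is a Python 3 sketch (the traceback above is from Python 2.7), and where exactly the server performs the latin-1 encoding is not shown in the report, so the helper name is hypothetical. It only demonstrates that U+2665 cannot be represented in latin-1 but can in UTF-8.

```python
heart = "\u2665"  # the character from the report


def can_encode(text, codec):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


latin1_ok = can_encode(heart, "latin-1")  # False: outside range(256)
utf8_ok = can_encode(heart, "utf-8")      # True
```

Whatever layer does the encoding would likely need to use UTF-8 (or reject the edit before overwriting the worksheet) to avoid losing content.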
    

cl wedit doesn't work with sublime text?

I set the editor to Sublime Text with export EDITOR=subl.

When I call cl wedit, Sublime Text opens with the worksheet as expected, but the process behind cl wedit doesn't wait for me to make changes. It immediately shows UsageError: No change made; aborting.

This error might not be specific to Sublime Text.
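
A likely cause is that the `subl` launcher returns immediately unless it is given `-w` (`--wait`), so setting `EDITOR="subl -w"` may help. That only works if the CLI splits `EDITOR` into a program and its flags. How `cl wedit` actually launches the editor is not shown here, so the helpers below are a hypothetical sketch of that splitting.

```python
import os
import shlex
import subprocess


def editor_argv(editor_setting, path):
    """Split EDITOR into program plus flags, then append the file path."""
    return shlex.split(editor_setting) + [path]


def edit_file(path):
    # Blocks until the editor process exits, then returns its exit code.
    editor = os.environ.get("EDITOR", "vi")
    return subprocess.call(editor_argv(editor, path))
```

Any GUI editor that forks into the background would show the same symptom, which matches the note that the error might not be specific to Sublime Text.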

support multiple genpaths per column

In a table schema, allow multiple genpaths, for example:
% schema foo
% add iters (/output.map:currIter /options.map:totalIters) "format $1/$2"

The specification here is not final.
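
Since the specification is not final, the following is only one possible reading of the `format $1/$2` post-processor: `$1`, `$2`, ... refer to the values of the listed genpaths in order. The function name is hypothetical.

```python
import re


def apply_format(template, values):
    """Replace $1, $2, ... with the corresponding 1-based entry of values."""
    def substitute(match):
        index = int(match.group(1)) - 1
        return str(values[index])

    return re.sub(r"\$(\d+)", substitute, template)
```

For the example schema, `values` would hold the results of `/output.map:currIter` and `/options.map:totalIters`.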

cl new does not switch to the new worksheet

The help text for cl new says "Create a new worksheet and make it the current one." But after calling cl new [name], the active worksheet is not switched to the new worksheet.

Cannot use / in table schema postprocessing regex

In the schema syntax

% add <key-name> <genpath> [<post-processor>]

One of the post-processors is regular expression substitution, s/<old>/<new>.
The problem is that <old> will not accept /, even when it is escaped as \/ or [/]
I had to use \x2f, which is a hacky workaround.
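
A sketch of one way to accept `\/` inside the pattern: split the spec only on slashes that are not preceded by a backslash, then unescape. This is not the existing parser; the function names are hypothetical, and the sketch does not handle an escaped backslash directly before a delimiter.

```python
import re


def parse_substitution(spec):
    """Parse 's/<old>/<new>' where a literal slash is written as '\\/'."""
    parts = re.split(r"(?<!\\)/", spec)
    if len(parts) != 3 or parts[0] != "s":
        raise ValueError("expected s/<old>/<new>: %r" % spec)
    old, new = (p.replace("\\/", "/") for p in parts[1:])
    return old, new


def apply_substitution(spec, text):
    old, new = parse_substitution(spec)
    return re.sub(old, new, text)
```

With this approach the `\x2f` workaround would no longer be needed for the escaped form.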

only allow user to read/write data and temp files

This issue applies when the CodaLab instance is running on a server that people have user accounts on, but you don't want them to access all of the CodaLab files.

This means that every time a new bundle directory is created, we need to restrict its permissions.
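
A minimal sketch of restricting a newly created directory, assuming a POSIX system and that the directory should be accessible only to the account that owns it. The helper name and the choice of mode 0o700 are assumptions.

```python
import os
import stat
import tempfile


def make_private_dir(path):
    """Create path (if needed) and restrict it to the owning user."""
    os.makedirs(path, exist_ok=True)
    # chmod explicitly: the mode passed to makedirs is filtered by the umask.
    os.chmod(path, stat.S_IRWXU)  # 0o700
    return path


if __name__ == "__main__":
    demo = make_private_dir(os.path.join(tempfile.mkdtemp(), "bundle"))
    print(oct(stat.S_IMODE(os.stat(demo).st_mode)))
```

Calling `os.chmod` after creation avoids depending on the umask of whichever process creates the bundle directory.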

worker on run throws start_bundle error

cl run date

on the worker side it reports

cl worker
...
...
=== INTERNAL ERROR: start_bundle() takes exactly 4 arguments (5 given)
Traceback (most recent call last):
  File "/codalab/src/codalab-cli/codalab/objects/work_manager.py", line 114, in start_bundle
    status = self.machine.start_bundle(bundle, self.bundle_store, self.get_parent_dict(bundle), username)
TypeError: start_bundle() takes exactly 4 arguments (5 given)

Note my cl commands are running via remote service http://localhost:2800

make cl cp faster

Currently, cp does a lot of unnecessary copying/zipping/unzipping. Download the zip file and directly stream it to the other instance. (Ideally, we would not go through the client, but then we would have to deal with authentication, which is more complex.)

MySQL Error when installing CodaLab

Hi Guys,

I was doing a fresh install of CodaLab into Ubuntu Server, and when running ./setup.sh, I received the following error message:

image

Conversely, I saw that in the release notes the setup.sh script was recently updated to install MySQL by default. Has this change not been integrated somehow?

cl logout

Should remove auth tokens from state.json for the current worksheet.

first change status to queued rather than running

In work_manager, change status to 'queued' rather than 'running', which means it's submitted to the job queue, but not necessarily running yet. Only when it's actually running, should we use 'running'.

issues uploading multiple files

tudor@tudor-258:~/prediction$ cl up dataset deptrees/train deptrees/dev                                                 
Traceback (most recent call last):                                                                                      
  File "/home/tudor/codalab-cli/codalab/bin/../../codalab/bin/cl.py", line 67, in <module>                              
    run_cli()                                                                                                           
  File "/home/tudor/codalab-cli/codalab/bin/../../codalab/bin/cl.py", line 58, in run_cli                               
    cli.do_command(sys.argv[1:])                                                                                        
  File "/home/tudor/codalab-cli/codalab/lib/bundle_cli.py", line 329, in do_command                                     
    command_fn(remaining_args, parser)                                                                                  
  File "/home/tudor/codalab-cli/codalab/lib/bundle_cli.py", line 458, in do_upload_command                              
    print client.upload_bundle(args.path, {'bundle_type': args.bundle_type, 'metadata': metadata}, worksheet_uuid, args.follow_symlinks)
  File "/home/tudor/codalab-cli/codalab/client/remote_bundle_client.py", line 157, in upload_bundle                     
    if path_util.path_is_url(path):                                                                                     
  File "/home/tudor/codalab-cli/codalab/lib/path_util.py", line 421, in path_is_url                                     
    if path.startswith(prefix + '://'):                                                                                 
AttributeError: 'list' object has no attribute 'startswith'                                                             
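
The traceback suggests that a list of paths reached a function written for a single string. A sketch of one way to handle both cases follows; it is written for Python 3, the prefix list is an assumption, and `normalize_paths` and `classify` are hypothetical helpers rather than existing CodaLab functions.

```python
URL_PREFIXES = ("http", "https", "ftp")  # assumed list, for illustration


def path_is_url(path):
    """True if a single path string looks like a URL."""
    return any(path.startswith(prefix + "://") for prefix in URL_PREFIXES)


def normalize_paths(path_or_paths):
    """Accept either one path or a list of paths; always return a list."""
    if isinstance(path_or_paths, str):
        return [path_or_paths]
    return list(path_or_paths)


def classify(path_or_paths):
    """Return (path, is_url) pairs, checking each path individually."""
    return [(p, path_is_url(p)) for p in normalize_paths(path_or_paths)]
```

Normalizing at the entry point keeps `path_is_url` simple while letting `cl up` accept several paths.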

allow wcp/cp to copy dependencies for runs

Use 'cl cp/wcp -d' to copy the dependencies of run (not make) bundles, so that we can rerun things. Note that this is not a fully recursive copy (which might be too much).

make cl wcp copy all items at once

Currently, cl wcp copies one worksheet item at a time, which is slow. Instead, copy all the worksheet items first before going over the items.

Cl search returning bundle permission errors.

When searching for a simple sleep run, I am getting permission errors. I believe this is happening because user Raisins has edit access to a worksheet shared by another user, and that worksheet contains bundles called run-sleep.

cl search sleep
Traceback (most recent call last):
  File "/Users/Dave/Work/codalab/src/codalab-cli/codalab/bin/../../codalab/bin/cl.py", line 67, in <module>
    run_cli()
  File "/Users/Dave/Work/codalab/src/codalab-cli/codalab/bin/../../codalab/bin/cl.py", line 58, in run_cli
    cli.do_command(sys.argv[1:])
  File "/Users/Dave/Work/codalab/src/codalab-cli/codalab/lib/bundle_cli.py", line 320, in do_command
    command_fn(remaining_args, parser)
  File "/Users/Dave/Work/codalab/src/codalab-cli/codalab/lib/bundle_cli.py", line 686, in do_search_command
    bundle_infos = client.get_bundle_infos(bundle_uuids)
  File "/Users/Dave/Work/codalab/src/codalab-cli/codalab/client/remote_bundle_client.py", line 149, in inner
    raise PermissionError(e.faultString[index + 1:])
codalab.common.PermissionError: User Raisins(4) does not have sufficient permissions on bundles [u'0x41fa7ccce4af46729741d13a4430aa11', u'0x3c85781dd65242f4a194630b10e802d8', u'0x936c961f5423467b8bfabeaf0f80e1e5', u'0x1f19e667756340a79157396d909a5c75', u'0x04c1ff46b5aa4b5eb7d39c60969c0c4c', u'0x33a27d65615545039ea15937853d734c', u'0xadd67d0b100f4e9d86f2597b5bef61ef', u'0xd195388fd49c4898bc5ddcea16212a6f', u'0x6eef40d89cb545f2a67716bd77a5600b', u'0x6e2a9cd269314d45b4e4e466be3078a2', u'0x27353c82a9654920af1f9c3c2aecb77c', u'0x4016aba940d349e9b2e8c413d0c3230d', u'0x96fa52a4cb0f456a92042c6b98315b7d', u'0x8bc6439f4d4e43d696e907a362ddbf82', u'0x9d393372574b415aa682c9789ed27127', u'0x895c4f2925894450a1f33b515307b94d', u'0x066945d0c7094792b467f5c17a31539c', u'0x6ce47fca4afe41439d48c2749dd35be5', u'0x67792d263c484ff1890d240b0b21f24d'] (have ['all', 'all', 'all', 'all', 'all', 'all', 'all', 'none', 'all', 'all', 'all', 'none', 'all', 'all', 'all', 'all', 'all', 'all', 'all'], need read).
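
One option is for search to drop bundles the user cannot read instead of failing the whole query. The sketch below assumes the permission names seen in the error message plus a `read` level between them; the ordering and the function name are assumptions.

```python
# Hypothetical ordering of permission levels, lowest to highest.
LEVELS = {"none": 0, "read": 1, "all": 2}


def readable_uuids(uuids, permissions, need="read"):
    """Keep only the bundles whose permission meets the required level."""
    return [
        uuid
        for uuid, have in zip(uuids, permissions)
        if LEVELS[have] >= LEVELS[need]
    ]
```

Applied to the lists in the error above, this would filter out the two bundles with `none` and return the rest.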

support multithreading

cl server is currently single-threaded. Make this multi-threaded so one user doesn't block another.
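
Assuming the server is an XML-RPC server (the client tracebacks in other issues go through xmlrpclib), a common way to handle requests concurrently is the standard-library threading mix-in. This is a Python 3 sketch with a hypothetical `ping` function, not the actual `cl server` code; any shared state such as database connections would still need to be made thread-safe.

```python
from socketserver import ThreadingMixIn
from xmlrpc.server import SimpleXMLRPCServer


class ThreadedXMLRPCServer(ThreadingMixIn, SimpleXMLRPCServer):
    """Handle each request in its own thread."""

    daemon_threads = True  # do not block interpreter exit on open requests


def make_server(host="127.0.0.1", port=0):
    # port=0 asks the OS for a free port; logRequests=False keeps output quiet.
    server = ThreadedXMLRPCServer((host, port), logRequests=False)
    server.register_function(lambda: "pong", "ping")
    return server
```

Calling `make_server().serve_forever()` would then serve each client on a separate thread, so one slow request no longer blocks the others.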

be able to sort by fields that are not metadata

Want to be able to do:

cl search .mine .sort=/output.map:errorRate

The errorRate is on the file system, so we can't incorporate it directly into the SQL query, but we can use it to post-process the results.
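
The post-processing step could look like the sketch below. `read_value` is a hypothetical callback that reads the field (for example `/output.map:errorRate`) from a bundle's files and returns `None` when it is missing.

```python
def sort_by_file_value(bundles, read_value, reverse=False):
    """Sort bundles by a value read from each bundle's files.

    Bundles with no readable value (read_value returns None) go last.
    """
    keyed = [(read_value(b), b) for b in bundles]
    present = [(v, b) for v, b in keyed if v is not None]
    missing = [b for v, b in keyed if v is None]
    present.sort(key=lambda pair: pair[0], reverse=reverse)
    return [b for _, b in present] + missing
```

The SQL query would still select the candidate bundles; only the ordering is done after the file values have been read.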

cl download appears to block other actions

While waiting for a long cl download, I try to open a new terminal and continue working.
However, everything stalls until cl download reaches a point when (I think) it has zipped the bundle. Multi-threading could make this nicer.

hide bundles from a worksheet

This removes the bundle from the worksheet, but does not remove the bundle itself.
Example: cl hide ^3
Note: ^3 resolves to the bundle, which might appear multiple times on the worksheet.
Sugar for hide and add: cl mv ^3

Not sure this command is super necessary since we can do everything via editing. Especially since what happens to the markup around a bundle becomes a bit confusing.

Simple Runs failing on worker

Any cl run command returns failed.

$  cl run date 
0x9d306531d25d4eaa83b56f79474d092c

$  cl info 0x9d306531d25d4eaa83b56f79474d092c -v
bundle_type          : run
uuid                 : 0x9d306531d25d4eaa83b56f79474d092c
data_hash            : 0xf50cf41086b9325bc4195508f2e202c85e08c647
state                : failed
command              : date
owner                : user2(4)
name                 : run-date
created              : 2015-02-17 11:42:12
data_size            : 68
request_cpus         : 0
request_gpus         : 0
exitcode             : 127
job_handle           : 27126
temp_dir             : /Users/Dave/.codalab/temp/0x9d306531d25d4eaa83b56f79474d092c
=== contents ===
name  size
----------

From the worker

$  cl worker
2015-02-17 19:41:47: Running worker loop (num_iterations = None, sleep_time = 1)
2015-02-17 19:42:13: 1 CREATED bundles => 1 STAGED, 0 FAILED; 18 bundles still waiting on dependencies.
LocalMachine.start_bundle: copying dependencies of 0x9d306531d25d4eaa83b56f79474d092c to /Users/Dave/.codalab/temp/0x9d306531d25d4eaa83b56f79474d092c
/bin/sh: stdbuf: command not found
work_manager: 0x9d306531d25d4eaa83b56f79474d092c (running): {'job_handle': '27126', 'bundle': RunBundle(uuid='0x9d306531d25d4eaa83b56f79474d092c', name='run-date'), 'success': False, 'exitcode': 127}
Worker.finalize_bundle: installing dependencies to /Users/Dave/.codalab/temp/0x9d306531d25d4eaa83b56f79474d092c (copy=False)
BundleStore.upload: hashing /Users/Dave/.codalab/temp/0x9d306531d25d4eaa83b56f79474d092c
-- END BUNDLE: RunBundle(uuid='0x9d306531d25d4eaa83b56f79474d092c', name='run-date') [failed]

The error is "command not found", but I am able to execute this command outside of CodaLab:

$  date
Tue Feb 17 11:45:30 PST 2015

Other commands (python, sleep, etc.) fail in the same way.
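
The worker log points at `stdbuf` rather than `date` as the missing command: `stdbuf` is part of GNU coreutils and is typically not installed by default on macOS, and exit code 127 is the shell's "command not found" status. A sketch of a guard follows; how the worker actually builds its command line is not shown, so the `-o0` flag and the function name are assumptions.

```python
import shutil


def wrap_command(command, which=shutil.which):
    """Prefix command with 'stdbuf -o0' only when stdbuf is installed."""
    if which("stdbuf") is not None:
        return "stdbuf -o0 " + command
    return command
```

The `which` parameter is injectable only so the behavior can be checked without depending on what is installed locally.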

cl upload URL slow

Currently, uploading a URL with cl upload stalls the entire bundle service. Don't do that.

make "ls" list files in a bundle instead of in the worksheet

There's unexpected behavior where

  • calling cat on a bundle with multiple files will list the files
  • calling cat on a bundle with one file will actually print the entire file

To keep things consistent with the unix semantics of cat and ls we should

  1. have cat only print files and throw an error if it's called on a non-file (e.g. a folder bundle)
  2. have ls list the contents of bundles instead of worksheets, since we already have print for the latter

If you guys think this is a reasonable idea I can make a pull request

When defining a schema, provide an option to hide certain rows

Just a suggested feature:

% schema runs_only
% addschema run
% hide bundle_type program

% display table runs_only
[program code]{0xc2803d826c244148b3bd26a6e8a4bc9e}
[run evaluation]{0xa3422744f2f1469c8c7b7a98b5c4f613}
[program code]{0x4a8ff2269dde4aeea88c9a0b9141f9dc}
[run evaluation]{0x1608155ee5894a9d80c96957126f570a}
[program code]{0x0e52113413744396869daa4474adcecd}
...

This can be useful when a user has been in the pattern of:

  1. modifying the code
  2. submitting a run
  3. repeat

cl wcp failing on copy from local to test env

It is failing on a foreign key constraint. This might be because the bundle already exists on the test env.

Traceback (most recent call last):
  File "/codalab-cli/codalab/bin/../../codalab/bin/cl.py", line 67, in <module>
    run_cli()
  File "/codalab-cli/codalab/bin/../../codalab/bin/cl.py", line 58, in run_cli
    cli.do_command(sys.argv[1:])
  File "/codalab-cli/codalab/lib/bundle_cli.py", line 342, in do_command
    command_fn(remaining_args, parser)
  File "/codalab-cli/codalab/lib/bundle_cli.py", line 1317, in do_wcp_command
    self.copy_bundle(source_client, source_bundle_info['uuid'], dest_client, dest_worksheet_uuid)
  File "/codalab-cli/codalab/lib/bundle_cli.py", line 558, in copy_bundle
    print dest_client.upload_bundle(source_path, info, dest_worksheet_uuid, False)
  File "/codalab-cli/codalab/client/remote_bundle_client.py", line 177, in upload_bundle
    result = self.upload_bundle_zip(remote_file_uuid, info, worksheet_uuid, follow_symlinks)
  File "/codalab-cli/codalab/client/remote_bundle_client.py", line 137, in inner
    return getattr(self.proxy, command)(*args, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1297, in single_request
    return self.parse_response(response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1473, in parse_response
    return u.close()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 793, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault 1: "<class 'sqlalchemy.exc.IntegrityError'>:(IntegrityError) (1452, 'Cannot add or update a child row: a foreign key constraint fails (`bundlesdb`.`bundle_dependency`, CONSTRAINT `bundle_dependency_ibfk_2` FOREIGN KEY (`parent_uuid`) REFERENCES `bundle` (`uuid`))') 'INSERT INTO bundle_dependency (child_uuid, child_path, parent_uuid, parent_path) VALUES (%s, %s, %s, %s), (%s, %s, %s, %s)' ('0xf150a31a891e4173a4cfc90638c143e7', 'input', '0x89e1235ca35e4e07a7dfaa33626058f7', '', '0xf150a31a891e4173a4cfc90638c143e7', 'sort.py', '0xbe37d8b9fac845a4a26abc73271d8177', '')">
