
papermill's Introduction


papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.

Papermill lets you:

  • parameterize notebooks
  • execute notebooks

This opens up new opportunities for how notebooks can be used. For example:

  • Perhaps you have a financial report that you wish to run with different values on the first or last day of a month, or at the beginning or end of the year. Using parameters makes this task easier.
  • Do you want to run a notebook and, depending on its results, choose a particular notebook to run next? You can now programmatically execute a workflow without having to copy and paste from notebook to notebook manually.

Papermill takes an opinionated approach to notebook parameterization and execution based on our experiences using notebooks at scale in data pipelines.

Installation

From the command line:

pip install papermill

For the optional io dependencies, you can install individual bundles like s3 or azure, or use all to get everything. To have Black format the injected parameters, add the black extra.

pip install papermill[all]
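For example, to install only the s3 bundle mentioned above:

pip install papermill[s3]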

Python Version Support

This library currently supports Python 3.8+. As minor Python versions are officially sunset by the Python organization, papermill will similarly drop support for them.

Usage

Parameterizing a Notebook

To parameterize your notebook, designate a cell with the tag parameters.

[image: enabling the parameters tag in Jupyter]

Papermill looks for the parameters cell and treats its values as defaults for the parameters passed in at execution time. Papermill adds a new cell tagged injected-parameters containing the input parameters, which override the defaults. If no cell is tagged with parameters, the injected cell is inserted at the top of the notebook.

Additionally, if you rerun notebooks through papermill, it will reuse the injected-parameters cell from the prior run, replacing it with the new run's inputs.
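
For illustration, a sketch of what this looks like, assuming the notebook's parameters cell contains alpha = 0.1 and the run passes alpha=0.6:

# cell tagged "parameters" (defaults)
alpha = 0.1

# cell papermill injects, tagged "injected-parameters"
alpha = 0.6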


Executing a Notebook

The two ways to execute the notebook with parameters are: (1) through the Python API and (2) through the command line interface.

Execute via the Python API

import papermill as pm

pm.execute_notebook(
   'path/to/input.ipynb',
   'path/to/output.ipynb',
   parameters = dict(alpha=0.6, ratio=0.1)
)

Execute via CLI

Here's an example of a local notebook being executed and output to an Amazon S3 account:

$ papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1

NOTE: If you use multiple AWS accounts, and you have properly configured your AWS credentials, then you can specify which account to use by setting the AWS_PROFILE environment variable at the command-line. For example:

$ AWS_PROFILE=dev_account papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1

In the above example, two parameters are set using -p (--parameters also works): alpha and l1_ratio. Parameter values that look like booleans or numbers will be interpreted as such. Here are the different ways users may set parameters:

$ papermill local/input.ipynb s3://bkt/output.ipynb -r version 1.0

Using -r or --parameters_raw, users can set parameters one by one. However, unlike -p, the parameter will remain a string, even if it may be interpreted as a number or boolean.

$ papermill local/input.ipynb s3://bkt/output.ipynb -f parameters.yaml

Using -f or --parameters_file, users can provide a YAML file from which parameter values should be read.
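
For the example above, parameters.yaml would contain:

alpha: 0.6
l1_ratio: 0.1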

$ papermill local/input.ipynb s3://bkt/output.ipynb -y "
alpha: 0.6
l1_ratio: 0.1"

Using -y or --parameters_yaml, users can directly provide a YAML string containing parameter values.

$ papermill local/input.ipynb s3://bkt/output.ipynb -b YWxwaGE6IDAuNgpsMV9yYXRpbzogMC4xCg==

Using -b or --parameters_base64, users can provide a YAML string, base64-encoded, containing parameter values.
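
One way to produce such a string; the value in the example above encodes the same alpha and l1_ratio parameters:

$ printf 'alpha: 0.6\nl1_ratio: 0.1\n' | base64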

When using YAML to pass arguments, through -y, -b or -f, parameter values can be arrays or dictionaries:

$ papermill local/input.ipynb s3://bkt/output.ipynb -y "
x:
    - 0.0
    - 1.0
    - 2.0
    - 3.0
linear_function:
    slope: 3.0
    intercept: 1.0"

Supported Name Handlers

Papermill supports the following name handlers for input and output paths during execution:

  • Local file system: local
  • HTTP, HTTPS protocol: http://, https://
  • Amazon Web Services: AWS S3 (s3://)
  • Azure: Azure DataLake Store (adl://), Azure Blob Store (abs://)
  • Google Cloud: Google Cloud Storage (gs://)

Development Guide

Read CONTRIBUTING.md for guidelines on how to setup a local development environment and make code changes back to Papermill.

For development guidelines look in the DEVELOPMENT_GUIDE.md file. This should inform you on how to make particular additions to the code base.

Documentation

We host the Papermill documentation on ReadTheDocs.


papermill's Issues

S3 Interface and Testing

From #110 (review): I wanted to keep this issue open to discuss improving our s3 code and testing.

Michel:

Ideally, we would want to use a library to mock boto: either moto (https://github.com/spulec/moto) or placebo (https://github.com/garnaat/placebo) is a good choice. There may be others. Idk. Any preferences?
However, you/we may want to ask yourselves/ourselves whether such a module (s3.py) should really exist. It probably makes sense to use a library such as s3fs (https://github.com/dask/s3fs). This is what Pandas uses for S3, and given that this project already has a dependency on Pandas, it will not add an "exotic" dependency. Not to say that tests should not be written, but if we were to spend time on it we may as well refactor the code to use s3fs.

Matt:

I generally agree with that approach. I've used https://github.com/jubos/fake-s3 in the past, launched with a before hook -- but it adds Ruby as a test dependency, so I wouldn't recommend it here.

I haven't used s3fs before, but your argument sounds solid. There may be a case for adding a minimal test here without too much refactoring, then doing a bigger PR with the swap-over. I'd leave that judgement call to you, but I can probably help with s3fs changes/testing later on as well.

`record`ing non-JSON'able objects

I tried to record a numpy array but that fails somewhere in a step to clean the JSON. I think the problem is that there is no builtin way to convert a numpy array to JSON.

What is the recommendation for outputting numpy arrays? For example I could imagine that "arrays tend to be too big, so store them somewhere else and record the path to it" is one option. An alternative would be to add a step to pickle/serialize types that can't be stored as JSON in record before dumping them into the notebook.

I'd be up for implementing the latter option but wanted to discuss ideas first.
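
In the meantime, a minimal workaround sketch, assuming the pm.record API of this era: convert the array to a plain (JSON-serializable) list first.

import numpy as np
import papermill as pm

arr = np.linspace(0.0, 1.0, 5)
pm.record("my_array", arr.tolist())  # nested lists survive the JSON cleaning step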

Use parameter types to convert arguments

Right now, all arguments passed as -p foo 2 are created as floats in the notebook. When using parameters to specify shapes of numpy arrays and the like, you end up having to convert them to integers first. We could use the type of the parameter's default value as a guide and convert arguments to that type.

So a "parameters" cell with foo = 2.123 and papermill ... -p foo 2 would result in 2. but a cell with foo = 2 would result in 2.

TraitError: The 'timeout' trait of an ExecutePreprocessor instance must be an integer

When trying to execute a notebook I get the following error:

TraitError: The 'timeout' trait of an ExecutePreprocessor instance must be an integer, but a value of None <type 'NoneType'> was specified.

Code:

pm.execute_notebook(
   'TEST.ipynb',
   'TEST-out.ipynb',
   parameters = dict(test_param='3333')
)

Notebook: TEST.zip
Full stacktrace

Using:
papermill: 0.12.1
Jupyter-notebook: 5.3.1
Python 2.7.11 |Anaconda 2.2.0 (32-bit)| (default, Mar 4 2016, 15:18:41)

Get code coverage to ~90%

We can do it and anyone can help. 😄

I've tagged this as Hacktoberfest friendly; feel free to ask questions.

Requirements

Must have knowledge:

  • Python

Optional (you can learn it):

Issue Creation

It's OK if people want to create issues for each of the untested areas and then go ahead and tackle them.

Need CONTRIBUTING.md

A guide for new contributors would help with sprints & all kinds of new contributors.

misleading warning during papermill run

With the following papermill run,

$ papermill -p x 1 -p y 70 template.ipynb out.ipynb
/usr/local/lib/python2.7/site-packages/jupyter_client/connect.py:157: RuntimeWarning: Failed to set sticky bit on u'/var/folders/kd/cylz4mhs1_9cpsjh0_c_gzfr0000gn/T': [Errno 1] Operation not permitted: '/var/folders/kd/cylz4mhs1_9cpsjh0_c_gzfr0000gn/T'
  RuntimeWarning,

it does write out the out.ipynb file, though the RuntimeWarning made me think that it completely failed. This raises a few things for me:

  • Should warnings be suppressed / collated?
  • Should we have a --verbose / -v mode to show progress at the CLI?

Universal wheel dependency issue

Mostly a note for me to fix the issue, but we're building a universal wheel while Python 2 has a stricter requirements section for ipython. This means we're pushing that stricter requirement onto Python 3 installs that use the wheel.

error message suddenly popped up

Hi,

I have been using papermill and nbconvert to parameterize and convert Jupyter Notebooks. It worked really well until recently, when I started getting the following message:

/home/florathecat/anaconda3/lib/python3.6/site-packages/nbconvert/filters/datatypefilter.py:41: UserWarning: Your element with mimetype(s) dict_keys(['application/papermill.record+json']) is not able to be represented.
mimetypes=output.keys())

The notebooks are still parameterized and converted fine; it's just that the error message bugs me. I have tried updating conda, Python, and nbconvert, and reinstalling papermill. None of these seem to work. I'd appreciate it if somebody could help.

Yun

Hide all input for output notebook

From a user:

It would be cool if papermill had an option to hide code cell input when generating an output notebook. This could be very helpful when generating a nightly report and not wanting to see the code that generates the report.

Simple enough. nteract uses metadata.inputHidden as a boolean value to indicate if a cell's input is hidden.

I suppose a flag like --hide-inputs or something? Maybe if we wanted to be more opinionated about the naming we'd call it --report-mode or --mode=report?

Thus far, convergence on the flag name is --report-mode.
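
Usage with that flag would then look like:

$ papermill input.ipynb output.ipynb -p alpha 0.6 --report-mode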

Need a CHANGELOG

We should add a changelog file and include how to update / automate its population in RELEASE.md.

Recognizing and setting parameter cells

Hey all!

I'd love to make a menu option in nteract to denote a parameter cell in a single click within the nteract web app. As for design, for the moment just to make this simple with our current setup, I'd just be adding it to our menu:

[screenshot: nteract menu, 2018-01-25]

When it is a parameterized cell, we can show a special border on top, or otherwise indicate that it's a parameter cell. I haven't thought much about the design here, other than that we want some way to see it visually as users.

It's easy enough for me to mark it with a tag under the covers; however, it would be really nice to just set it in the cell's metadata (or even use the metadata.name attribute):

{
  "cells": [
    {
      "cell_type": "code",
      "metadata": {
        "papermill": {
          "parameter_cell": true
        }
      },
      "outputs": [],
      "source": [
        "x = 3\n",
        "y=3"
      ]
    },
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 2
}

What do you all think? Would papermill be open to looking for this metadata in addition to the current tagging mechanism?

Begin using pytest to simplify tests

Let's start using pytest for future tests, since pytest tests are simpler to write and maintain. This should result in better code coverage over time.

Feature Request: Dryrun

It would be handy if we had a dryrun mode which parameterized a notebook and saved it to the output path without actually executing the cells. This would enable preparing notebooks for execution elsewhere or with alternative kernels in an upstream process. Should be relatively simple to add.
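
A sketch of how this could look through the Python API; prepare_only is the flag name papermill later adopted for this behavior:

import papermill as pm

pm.execute_notebook(
   'path/to/input.ipynb',
   'path/to/output.ipynb',
   parameters=dict(alpha=0.6),
   prepare_only=True,  # inject parameters but skip cell execution
)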

Can't capture pipes if kernel dies.

The kernels today usually capture stdout and stderr messages directly and buffer them into the cell JSON contents. But if one is running papermill and the kernel dies (e.g. OOM, kill -9), the in-flight messages get lost. I'm trying to figure out the best approaches for capturing these logs in these events.

There is an attempt to explore capturing pipes on the kernel process via:
ipython/ipykernel#315

papermill fails to install

When installing outside of a development setup, setup.py attempts to read requirements-dev.txt (which, of course, isn't present there). This currently fails all papermill installs from pip:

papermill/setup.py

Lines 26 to 27 in 4e950d0

test_req_path = os.path.join(os.path.dirname('__file__'), 'requirements-dev.txt')
test_required = [req.strip() for req in read(test_req_path).splitlines() if req.strip()]

Papermill hangs if kernel is killed

Today papermill is hanging when kernels get OOM killed instead of returning with an error status code within some reasonable timeframe.

Steps to reproduce:

  • Make a notebook which sleeps indefinitely.
  • Call papermill on the notebook
  • Kill -9 the kernel process
  • See papermill not exit (ever)

Append instead of replace parameter cell

Papermill overwrites the contents of the parameter-tagged cell, which is non-intuitive and easy to mess up. Instead, it would be more reasonable if the tagged cell's values were used as defaults and the injected contents were simply appended after the cell's definition. Repeated names would still be overwritten, and non-papermill execution could use sane values without breaking papermill execution as it works today. This would allow for easier assignment of defaults without adding a new cell. Today, all the notebooks I've seen for papermill leave a blank cell for parameters, which has to be explained to each person who sees a papermill notebook as opposed to a normal notebook, when there shouldn't be so much of a difference.

Debug / Skip cell tag

It'd be useful to have an additional tag which tells papermill to skip particular cells. This would enable cells which have lots of prints, long outputs, or graph outputs that aren't needed for every run to be visible in the notebook but not run on each papermill execution.

execute module is missing

Traceback

$ ipython
Python 3.6.1 (default, Apr  4 2017, 09:40:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import papermill
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-790e320f196c> in <module>()
----> 1 import papermill

~/code/src/github.com/nteract/papermill/papermill/__init__.py in <module>()
      5 del get_versions
      6
----> 7 from execute import (
      8     execute_notebook,
      9     set_environment_variable_names,

ModuleNotFoundError: No module named 'execute'

Release 0.12.0

I could really use some of the recent changes in my current tasks. Any objections or particular PRs people want in this release? Was going to maybe wait for #100.

Refactor python2 handling after increasing test coverage

There's some inconsistency in how Python 2 is handled alongside Python 3. Once test coverage is at roughly 80%, we should reconsider which libraries are used to best support Python 2 and 3, so that the code base handles both with a minimum of "if 2" or "if 3" branches. Focus on selecting best practices for the migration from 2 to 3.

nbconvert not working after running execute_notebook in papermill

I'm using papermill to parameterize some notebooks which I'm then exporting as html reports using nbconvert. But, I've run into issues with nbconvert erroring out if I use it after running pm.execute_notebook(). Specifically, this line is the cause since it hijacks nbconvert's Preprocessor.preprocess method: https://github.com/nteract/papermill/blob/master/papermill/execute.py#L141

I ended up getting it to work by saving the original nbconvert preprocess method and then reassigning it after I've used papermill. But, I was wondering, is there a better way?
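
A sketch of that save-and-restore workaround, assuming the patch is applied when papermill is imported:

from nbconvert.preprocessors import Preprocessor

original_preprocess = Preprocessor.preprocess  # keep a reference before papermill patches it

import papermill as pm
pm.execute_notebook('input.ipynb', 'output.ipynb')

Preprocessor.preprocess = original_preprocess  # restore so nbconvert works normally again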

Notebook validation failed

I get a notebook validation failure:

Notebook validation failed: {u'TEST_All': 7.2} is not valid under any of the given schemas:
{
 "TEST_All": 7.2
}

The code is the most basic possible:

import papermill as pm
pm.record("TEST_All",7.2)


Using:
papermill: 0.12.1
Jupyter-notebook: 5.3.1
Python 2.7.11 |Anaconda 2.2.0 (32-bit)| (default, Mar 4 2016, 15:18:41)

Any help is appreciated
Attached notebook: TEST.zip

Quoted parameters cause Python parse failures

Inputs with escaped or wrapped double quotes inside (-p foo '{"bar":"baz"}') cause notebook execution to fail with syntax errors. The example above results in foo = "{"bar":"baz"}" in the notebook, which isn't valid Python because of the wrapping quotes.
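
One workaround in the meantime is to pass the value through -y, so it is parsed as YAML into a dict rather than injected as a quoted string:

$ papermill local/input.ipynb s3://bkt/output.ipynb -y 'foo: {"bar": "baz"}'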

Cell execution duration

On a notebook that has been executed, it would be helpful if the cell run duration was stored with the cell output. Ideally this could be visualized as well, perhaps something like:

In [5]: 
1:42

python2 install?

Can papermill be installed in a docker container with python2.7? My docker container does not have a connection to the internet. I have successfully installed the dependencies (botocore, boto3, tqdm, click, s3transfer). When I try to install using pip install --user papermill-0.12.3-py3-none-any.whl I get the message papermill-0.12.3-py3-none-any.whl is not a supported wheel on this platform. Any ideas?

metadata info not set up in the cell

I was going through the example "Displaying Plots and Images Saved by Other Notebooks". However, when I tried to display plots in another notebook, it failed.
I found that the cell that has the plot has the keys [u'output_type', u'data', u'metadata'], and the data field actually has the plot, but the metadata is an empty dictionary. It seems to me the metadata is not correctly set up for the cell, but I am not sure what happened.
My ipython version is 5.3.0, python is 2.7, matplotlib version is 1.5.1.

Setup some papermill "jobs" on CircleCI

Either as part of this repo or a new repo called something like "papermill-example-workflow", create a regularly running job on CircleCI that will run a notebook via papermill and post the result somewhere. It should be a nice way to demonstrate how to use papermill while also operating as a good functional test.

CircleCI will need to be enabled for the repo for this to work.

Use name not tags for unique cell objects

In

if "parameters" in cell.metadata.tags:
parameters_indices.append(idx)
if not parameters_indices:
raise PapermillException("No parameters tag found")
elif len(parameters_indices) > 1:
raise PapermillException("Multiple parameters tags found")
return parameters_indices[0]

it assumes that only one cell in the notebook defines parameters and raises an error if this is not the case.

We have name as a cell-level metadata field in the nbformat spec that specifically requires values to be unique across the notebook. If you want to maintain this uniqueness qualifier it would be cleaner to require setting the name on a cell rather than a tag.

If the only reason to not do this is that the UI for setting a name vs. tags is more inconvenient, that suggests we have a frontend UI issue. In that case… using tags is just a compromise made for the current state of front-ends and should probably be deprecated as soon as possible in favour of setting name properly.
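
A hypothetical sketch of what a name-based lookup could replace the tag scan with:

from papermill.exceptions import PapermillException

def find_parameters_cell_index(nb):
    # nbformat requires cell metadata names to be unique across the notebook,
    # so at most one cell can match.
    for idx, cell in enumerate(nb.cells):
        if cell.metadata.get("name") == "parameters":
            return idx
    raise PapermillException("No cell named 'parameters' found")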

Dask (or threadpool) friendly functions

Provide plain functions that let you run a parametrized notebook directly on a dask client.

from dask.distributed import Client
import papermill

client = Client()

futures = []
for param1 in range(20):
    future = client.submit(papermill.execute_notebook, notebook, param1=param1)
    # Future<NotebookNode>
    futures.append(future)

summary = client.submit(summarize_notebooks, futures)
df = summary.result()
# DataFrame<PapermillNotebookSummary>

Problem when outputting long dataframes

For example

The target notebook, 'test.ipynb',

import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=200)
df = pd.DataFrame(np.random.randn(200,4), index=dates, columns=list('ABCD'))
df

The code running it

import papermill as pm
pm.execute_notebook(
   notebook='test.ipynb',
   output='test_out.ipynb',
)

When I open the output notebook, the problem appears (screenshot attached).

However, if I use df.head(20), there is no such problem.

Default to first cell or prefix a new cell for parameters

When a user doesn't specify a parameters-tagged cell, it would be nice if papermill defaulted to some sane, logical behavior. Nominally, parameterizing the beginning of the notebook seems reasonable and would make adopting existing notebooks quicker when they have naive inputs.

read_notebooks should be able to take a glob

Instead of having to have a special directory to read from, I'd like to be able to read in a collection of notebooks like this:

pm.read_notebooks('out-*.ipynb')

I'm assuming this could probably be done with the glob module.
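
A minimal sketch with the stdlib glob module, assuming the single-file read_notebook API of this era:

import glob
import papermill as pm

notebooks = [pm.read_notebook(path) for path in sorted(glob.glob('out-*.ipynb'))]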

Possibility to start a parametrized notebook server instead of only headless execution / write parametrized notebook to *.ipynb

I love the run_notebook option and this is probably the main use case, but sometimes I'd like to be able to start an interactive notebook server with the parameters predetermined (or better, have a service to which I don't have direct access, like Binder, do it)

a la

papermill save --input input.ipynb --parameter a='hello world' --output output.ipynb
jupyter notebook.ipynb

or better

papermill serve input.ipynb --parameter a='hello world' 

Would others see this as useful as well?

Is it possible to run with Jupyter Notebook (web env)?

As I see in the doc, papermill requires

Parameterizing a Notebook.
To parameterize your notebook designate a cell with the tag parameters. Papermill looks for the parameters cell and replaces those values with the parameters passed in at execution time.

However, I can't find a place to add a tag to a cell in the original Jupyter Notebook UI. Is this only available in nteract?

Master build failing

Looks like a dependency broke somewhere. Discovered it in #88, where I thought I had messed up, but a clean master recreates the issue now for me. I have an older virtualenv where the tests pass, while a fresh env fails. The error is with traitlets, but that version didn't change, so it's another package interacting with it. If it doesn't resolve itself beforehand, I'll look more deeply at it on Monday.

Broken pip list

$ pip freeze -l
ansiwrap==0.8.3
attrs==17.4.0
backports-abc==0.5
backports.shutil-get-terminal-size==1.0.0
bleach==2.1.2
boto3==1.5.14
botocore==1.8.28
certifi==2017.11.5
chardet==3.0.4
click==6.7
codecov==2.0.13
configparser==3.5.0
coverage==4.4.2
decorator==4.2.0
docutils==0.14
entrypoints==0.2.3
enum34==1.1.6
funcsigs==1.0.2
functools32==3.2.3.post2
futures==3.2.0
html5lib==1.0.1
idna==2.6
ipykernel==4.7.0
ipython==5.5.0
ipython-genutils==0.2.0
ipywidgets==7.1.0
Jinja2==2.10
jmespath==0.9.3
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.1
jupyter-console==5.2.0
jupyter-core==4.4.0
MarkupSafe==1.0
mistune==0.8.3
mock==2.0.0
nbconvert==5.3.1
nbformat==4.4.0
notebook==5.2.2
numpy==1.14.0
pandas==0.22.0
pandocfilters==1.4.2
pathlib2==2.3.0
pbr==3.1.1
pexpect==4.3.1
pickleshare==0.7.4
pluggy==0.6.0
prompt-toolkit==1.0.15
ptyprocess==0.5.2
py==1.5.2
Pygments==2.2.0
pytest==3.3.2
pytest-cov==2.5.1
python-dateutil==2.6.1
pytz==2017.3
PyYAML==3.12
pyzmq==16.0.3
qtconsole==4.3.1
requests==2.18.4
s3transfer==0.1.12
scandir==1.6
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.11.0
terminado==0.8.1
testpath==0.3.1
textwrap3==0.9.1
tornado==4.5.3
tqdm==4.19.5
traitlets==4.3.2
urllib3==1.22
wcwidth==0.1.7
webencodings==0.5.1
widgetsnbextension==3.1.0

Working pip list

$ pip freeze -l
ansiwrap==0.8.3
astroid==1.5.3
attrs==17.3.0
backports-abc==0.5
backports.functools-lru-cache==1.4
backports.shutil-get-terminal-size==1.0.0
bleach==2.1.1
boto3==1.4.7
botocore==1.7.44
certifi==2017.11.5
chardet==3.0.4
click==6.7
codecov==2.0.10
configparser==3.5.0
coverage==4.4.2
decorator==4.1.2
docutils==0.14
entrypoints==0.2.3
enum34==1.1.6
funcsigs==1.0.2
functools32==3.2.3.post2
future==0.16.0
futures==3.1.1
html5lib==1.0b10
idna==2.6
ipykernel==4.6.1
ipython==5.5.0
ipython-genutils==0.2.0
ipywidgets==7.0.4
isort==4.2.15
Jinja2==2.10
jmespath==0.9.3
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.1.0
jupyter-console==5.2.0
jupyter-core==4.4.0
lazy-object-proxy==1.3.1
MarkupSafe==1.0
mccabe==0.6.1
mistune==0.8.1
mock==2.0.0
nbconvert==5.3.1
nbformat==4.4.0
notebook==5.2.1
numpy==1.13.3
pandas==0.21.0
pandocfilters==1.4.2
papermill==0.11.5
pathlib2==2.3.0
pbr==3.1.1
pexpect==4.3.0
pickleshare==0.7.4
pkginfo==1.4.1
pluggy==0.6.0
prompt-toolkit==1.0.15
ptyprocess==0.5.2
py==1.5.2
Pygments==2.2.0
pylint==1.7.4
pytest==3.3.1
pytest-cov==2.5.1
python-dateutil==2.6.1
pytz==2017.3
PyYAML==3.12
pyzmq==16.0.3
qtconsole==4.3.1
requests==2.18.4
requests-toolbelt==0.8.0
s3transfer==0.1.11
scandir==1.6
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.11.0
terminado==0.7
testpath==0.3.1
textwrap3==0.9.1
tornado==4.5.2
tqdm==4.19.4
traitlets==4.3.2
twine==1.9.1
urllib3==1.22
wcwidth==0.1.7
webencodings==0.5.1
widgetsnbextension==3.0.7
wrapt==1.10.11

Error on record() with new versions

Versions above 0.11 seem to raise an error when running the record() function.
Using the example from the readme file:

"""notebook.ipynb"""
import papermill as pm

pm.record("hello", "world")
pm.record("number", 123)
pm.record("some_list", [1, 3, 5])
pm.record("some_dict", {"a": 1, "b": 2})`

gives:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-447d49aec58c> in <module>()
      2 import papermill as pm
      3 
----> 4 pm.record("hello", "world")
      5 pm.record("number", 123)
      6 pm.record("some_list", [1, 3, 5])

~\Anaconda3\envs\mestrado\lib\site-packages\papermill\api.py in record(name, value)
     33     # IPython.display.display takes a tuple of objects as first parameter
     34     # `http://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html#IPython.display.display`
---> 35     ip_display(({RECORD_OUTPUT_TYPE: {name: value}},), raw=True)
     36 
     37 

~\Anaconda3\envs\mestrado\lib\site-packages\IPython\core\display.py in display(include, exclude, metadata, transient, display_id, *objs, **kwargs)
    293     for obj in objs:
    294         if raw:
--> 295             publish_display_data(data=obj, metadata=metadata, **kwargs)
    296         else:
    297             format_dict, md_dict = format(obj, include=include, exclude=exclude)

~\Anaconda3\envs\mestrado\lib\site-packages\IPython\core\display.py in publish_display_data(data, metadata, source, transient, **kwargs)
    118         data=data,
    119         metadata=metadata,
--> 120         **kwargs
    121     )
    122 

~\Anaconda3\envs\mestrado\lib\site-packages\ipykernel\zmqshell.py in publish(self, data, metadata, source, transient, update)
    115         if transient is None:
    116             transient = {}
--> 117         self._validate_data(data, metadata)
    118         content = {}
    119         content['data'] = encode_images(data)

~\Anaconda3\envs\mestrado\lib\site-packages\IPython\core\displaypub.py in _validate_data(self, data, metadata)
     48 
     49         if not isinstance(data, dict):
---> 50             raise TypeError('data must be a dict, got: %r' % data)
     51         if metadata is not None:
     52             if not isinstance(metadata, dict):

TypeError: data must be a dict, got: {'application/papermill.record+json': {'hello': 'world'}}

Use argument values as default values

I was half expecting the values in the cell tagged as "parameters" to work as default values. This would be convenient for things like random seeds.

Namespaced parameters syntax improvement

Passing objects representing dicts is painful in papermill right now. You have to pass each value individually and catch the corresponding names inside the notebook. But when you have a large group of parameters all together (in, say, a JSON object), it'd be very convenient to specify those associated values on the input side without restructuring things inside the notebook.

Specifically, a syntax along the lines of -p foo.bar.baz value could be passed on the command line, resulting in a dict of the form foo = {"bar": {"baz": "value"}} in the parameters cell. For repeated paths the dict could be enriched, such that a second param of the form -p foo.bar.baz2 value2 would combine to form foo = {"bar": {"baz": "value", "baz2": "value2"}}. These parameters wouldn't need to share prefix paths, so -p foo.bar2 baz3 would augment the top-level foo dict instead of the nested bar dict.

This would enable passing dynamic or many-valued parameters as individual, human-readable inputs while getting a clean hierarchy of dicts on the output side.

Given the merge behavior of each assignment, it could also merge with existing dict variables: provide foo = {'default': 'values'} beforehand and have it augmented via command-line pass-through.
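
A hypothetical sketch of the dotted-path merge; merge_dotted is an illustrative helper, not papermill API:

def merge_dotted(params, dotted_key, value):
    # Fold "foo.bar.baz" into nested dicts, creating levels as needed.
    node = params
    *path, leaf = dotted_key.split(".")
    for key in path:
        node = node.setdefault(key, {})
    node[leaf] = value
    return params

params = {}
merge_dotted(params, "foo.bar.baz", "value")
merge_dotted(params, "foo.bar.baz2", "value2")
merge_dotted(params, "foo.bar2", "baz3")
assert params == {"foo": {"bar": {"baz": "value", "baz2": "value2"}, "bar2": "baz3"}}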

Scala parameter mapping needs to indicate long status for larger integers

// Parameters
val RUN_TS = 1528866180240
==========================================
Name: Compile Error
Message: <console>:5: error: integer number too large
val RUN_TS = 1528866180240
             ^
<console>:6: error: ';' expected but 'val' found.
val RUN_DATE_MINUS_1 = 20180612
^

StackTrace: 

Should instead populate as val RUN_TS = 1528866180240L
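
A hypothetical sketch of the fix in Python; SCALA_INT_MAX and scala_literal are illustrative names:

SCALA_INT_MAX = 2**31 - 1

def scala_literal(name, value):
    # Append Scala's Long suffix once the value no longer fits in an Int.
    if isinstance(value, int) and abs(value) > SCALA_INT_MAX:
        return "val {} = {}L".format(name, value)
    return "val {} = {}".format(name, value)

assert scala_literal("RUN_TS", 1528866180240) == "val RUN_TS = 1528866180240L"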
