nteract / papermill
Parameterize, execute, and analyze notebooks
Home Page: http://papermill.readthedocs.io/en/latest/
License: BSD 3-Clause "New" or "Revised" License
I was half expecting the values in the cell tagged as "parameters" to work as default values. This would be convenient for things like random seeds.
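A minimal sketch of the behavior being asked for, assuming the tagged cell keeps its assignments as defaults:

# cell tagged "parameters" -- these values would act as defaults
seed = 42
alpha = 0.1

# cell papermill injects when run with, e.g., -p alpha 0.5; seed keeps its default
alpha = 0.5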
Provide plain functions that let you run a parameterized notebook directly on a Dask client.

import papermill
from dask.distributed import Client

client = Client()
futures = []
for param1 in range(20):
    # hypothetical call shape: papermill would handle the output path internally
    future = client.submit(papermill.execute_notebook, notebook, param1=param1)
    # Future<NotebookNode>
    futures.append(future)
summary = client.submit(summarize_notebooks, futures)  # summarize_notebooks: proposed helper
df = summary.result()
# DataFrame<PapermillNotebookSummary>
Mostly a note for me to fix the issue, but we're building a universal wheel while Python 2 has a stricter requirements section for ipython. This means we're pushing that stricter requirement to Python 3 installs that use the wheel.
When a user doesn't specify a parameters-tagged cell, it would be nice if papermill defaulted to some sane logical setting; a sketch follows below. Treating the beginning of the notebook as the parameters cell seems reasonable and would make adoption of existing notebooks with naive inputs quicker.
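A rough sketch of such a fallback, assuming an nbformat NotebookNode as input:

def find_parameters_cell(nb):
    # prefer an explicitly tagged cell
    for idx, cell in enumerate(nb.cells):
        if "parameters" in cell.metadata.get("tags", []):
            return idx
    # otherwise fall back to the first code cell in the notebook
    for idx, cell in enumerate(nb.cells):
        if cell.cell_type == "code":
            return idx
    return None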
Inputs with escaped or wrapped double quotes inside (-p foo '{"bar":"baz"}') cause notebook execution to fail with syntax errors. The example above results in foo = "{"bar":"baz"}" in the notebook, which isn't valid Python with the wrapping quotes.
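Until the CLI handles this, one workaround is to pass structured values through the Python API, where no string re-quoting happens (a sketch, assuming the 0.x execute_notebook signature):

import papermill as pm

pm.execute_notebook(
    'input.ipynb',
    'output.ipynb',
    parameters={'foo': {'bar': 'baz'}},  # dict survives translation intact
)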
Make sure we work across Python 2 and Python 3.
I was going through the example "Displaying Plots and Images Saved by Other Notebooks". However, when I tried to display plots in another notebook, it failed.
I found that the cell that has the plot has the keys [u'output_type', u'data', u'metadata'], and the data field actually has the plot, but the metadata is an empty dictionary. It seems to me the metadata is not correctly set up for the cell, but I am not sure what happened.
My IPython version is 5.3.0, Python is 2.7, and matplotlib is 1.5.1.
We are currently monkey patching the preprocessor in the execute.py module. We should instead be subclassing the Preprocessor class as intended.
https://github.com/nteract/papermill/blob/master/papermill/execute.py#L22
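A minimal sketch of the subclassing approach, assuming nbconvert's ExecutePreprocessor as the base (names here are illustrative, not the final design):

from nbconvert.preprocessors import ExecutePreprocessor

class PapermillExecutePreprocessor(ExecutePreprocessor):
    """Override preprocess rather than patching it onto nbconvert at import time."""
    def preprocess(self, nb, resources):
        # papermill-specific bookkeeping could happen around the parent call
        nb, resources = super(PapermillExecutePreprocessor, self).preprocess(nb, resources)
        return nb, resources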
Hey all!
I'd love to make a menu option in nteract to denote a parameter cell in a single click within the nteract web app. As for design, for the moment, to keep this simple with our current setup, I'd just be adding it to our menu.
When it is a parameterized cell, we can show a special border on the top or otherwise indicate that it's a parameter cell. I haven't thought much about the design here other than that we want some way for users to see it visually.
It's easy enough for me to mark it with a tag under the covers; however, it would be really nice to just set it in the cell's metadata (or even using the metadata.name attribute):
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "papermill": {
          "parameter_cell": true
        }
      },
      "outputs": [],
      "source": [
        "x = 3\n",
        "y = 3"
      ]
    }
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 2
}
What do you all think? Would papermill be open to looking for this metadata in addition to the current tagging mechanism?
Maybe .ipynb and .json are the only reasonable choices here? Maybe just a warning?
I'm using papermill to parameterize some notebooks which I'm then exporting as HTML reports using nbconvert. But I've run into issues with nbconvert erroring out if I use it after running pm.execute_notebook(). Specifically, this line is the cause, since it hijacks nbconvert's Preprocessor.preprocess method: https://github.com/nteract/papermill/blob/master/papermill/execute.py#L141
I ended up getting it to work by saving the original nbconvert preprocess method and then reassigning it after I've used papermill. But, I was wondering, is there a better way?
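For reference, a sketch of that workaround, assuming papermill applies the patch when it is imported:

# grab the original before importing papermill (assumption: the patch happens at import)
from nbconvert.preprocessors import Preprocessor
_original_preprocess = Preprocessor.preprocess

import papermill as pm
pm.execute_notebook('input.ipynb', 'output.ipynb')

# restore so subsequent nbconvert exports behave normally
Preprocessor.preprocess = _original_preprocess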
We can do it and anyone can help.
I've tagged this as Hacktoberfest friendly, feel free to ask questions.
Must have knowledge:
Optional (you can learn it):
It's OK if people want to create issues for each of the untested areas and then go ahead and tackle them.
Either as part of this repo or a new repo called something like "papermill-example-workflow", create a regularly running job on CircleCI that will run a notebook via papermill and post the result somewhere. It should be a nice way to demonstrate how to use papermill while also operating as a good functional test.
CircleCI will need to be enabled for the repo for this to work.
Hi,
I have been using papermill and nbconvert to parameterize and convert Jupyter notebooks. It worked really well until recently, when I started getting the following messages:
/home/florathecat/anaconda3/lib/python3.6/site-packages/nbconvert/filters/datatypefilter.py:41: UserWarning: Your element with mimetype(s) dict_keys(['application/papermill.record+json']) is not able to be represented.
mimetypes=output.keys())
The notebooks are still parameterized and converted fine; it is just that the error message bugs me. I have tried updating conda, Python, and nbconvert, and reinstalling papermill. None of these seem to work. I'd appreciate it if somebody can help.
Yun
Follow on to the work from #74 to more clearly identify the original cell that was tagged as a parameters cell and the inserted cell with passed parameters.
On a notebook that has been executed, it would be helpful if the cell run duration were stored with the cell output. Ideally this could be visualized as well, perhaps something like:
In [5]:
1:42
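One possible shape for this (purely illustrative field names, stored per cell alongside the existing outputs):

cell_metadata = {
    "papermill": {
        "start_time": "2018-01-05T12:00:00.000Z",
        "end_time": "2018-01-05T12:01:42.000Z",
        "duration": 102.0,  # seconds; rendered as 1:42 above
    }
}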
Versions above 0.11 seem to present an error when running the record() function.
As in the example in the README file:
"""notebook.ipynb"""
import papermill as pm
pm.record("hello", "world")
pm.record("number", 123)
pm.record("some_list", [1, 3, 5])
pm.record("some_dict", {"a": 1, "b": 2})`
gives:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-447d49aec58c> in <module>()
2 import papermill as pm
3
----> 4 pm.record("hello", "world")
5 pm.record("number", 123)
6 pm.record("some_list", [1, 3, 5])
~\Anaconda3\envs\mestrado\lib\site-packages\papermill\api.py in record(name, value)
33 # IPython.display.display takes a tuple of objects as first parameter
34 # `http://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html#IPython.display.display`
---> 35 ip_display(({RECORD_OUTPUT_TYPE: {name: value}},), raw=True)
36
37
~\Anaconda3\envs\mestrado\lib\site-packages\IPython\core\display.py in display(include, exclude, metadata, transient, display_id, *objs, **kwargs)
293 for obj in objs:
294 if raw:
--> 295 publish_display_data(data=obj, metadata=metadata, **kwargs)
296 else:
297 format_dict, md_dict = format(obj, include=include, exclude=exclude)
~\Anaconda3\envs\mestrado\lib\site-packages\IPython\core\display.py in publish_display_data(data, metadata, source, transient, **kwargs)
118 data=data,
119 metadata=metadata,
--> 120 **kwargs
121 )
122
~\Anaconda3\envs\mestrado\lib\site-packages\ipykernel\zmqshell.py in publish(self, data, metadata, source, transient, update)
115 if transient is None:
116 transient = {}
--> 117 self._validate_data(data, metadata)
118 content = {}
119 content['data'] = encode_images(data)
~\Anaconda3\envs\mestrado\lib\site-packages\IPython\core\displaypub.py in _validate_data(self, data, metadata)
48
49 if not isinstance(data, dict):
---> 50 raise TypeError('data must be a dict, got: %r' % data)
51 if metadata is not None:
52 if not isinstance(metadata, dict):
TypeError: data must be a dict, got: {'application/papermill.record+json': {'hello': 'world'}}
It'd be useful to have an additional tag which tells papermill to skip particular cells. This would let cells with lots of prints, long outputs, or graph outputs that aren't needed on every run remain visible in the notebook without running on each papermill execution.
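A rough sketch of what the execution side could do, assuming a hypothetical "skip" tag and an nbformat notebook:

import nbformat

nb = nbformat.read('input.ipynb', as_version=4)
# drop cells carrying the hypothetical "skip" tag before handing off for execution
nb.cells = [
    cell for cell in nb.cells
    if 'skip' not in cell.metadata.get('tags', [])
]
nbformat.write(nb, 'prepared.ipynb')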
Today papermill hangs when kernels get OOM-killed instead of returning with an error status code within some reasonable timeframe.
Steps to reproduce:
We'd like to see papermill be able to instrument/request memory usage. This is currently in progress with jupyter/jupyter#264.
Ideally, we'd stick the metrics within the metadata per cell in a similar way to how we do timing (duration, start time, end time, etc.).
Mostly a note to self, as I'll be traveling for a few weeks and would like to check this out: https://github.com/hz-inova/run_jnb
For example, the target notebook, test.ipynb:
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=200)
df = pd.DataFrame(np.random.randn(200,4), index=dates, columns=list('ABCD'))
df
The code running it:
import papermill as pm
pm.execute_notebook(
notebook='test.ipynb',
output='test_out.ipynb',
)
When I open the output, it displays incorrectly. However, if I use df.head(20), there is no such problem.
With the following papermill run,
$ papermill -p x 1 -p y 70 template.ipynb out.ipynb
/usr/local/lib/python2.7/site-packages/jupyter_client/connect.py:157: RuntimeWarning: Failed to set sticky bit on u'/var/folders/kd/cylz4mhs1_9cpsjh0_c_gzfr0000gn/T': [Errno 1] Operation not permitted: '/var/folders/kd/cylz4mhs1_9cpsjh0_c_gzfr0000gn/T'
RuntimeWarning,
it does write out the out.ipynb file, though the RuntimeWarning made me think that it completely failed. This raises a few things for me: should there be a --verbose / -v mode to show progress at the CLI?

Just noticed while writing a blog post linking to you! ;)
Can papermill be installed in a Docker container with Python 2.7? My Docker container does not have a connection to the internet. I have successfully installed the dependencies (botocore, boto3, tqdm, click, s3transfer). When I try to install using pip install --user papermill-0.12.3-py3-none-any.whl I get the message papermill-0.12.3-py3-none-any.whl is not a supported wheel on this platform. Any ideas?
I tried to record a numpy array but that fails somewhere in a step to clean the JSON. I think the problem is that there is no builtin way to convert a numpy array to JSON.
What is the recommendation for outputting numpy arrays? For example, I could imagine that "arrays tend to be too big, so store them somewhere else and record the path to it" is one option. An alternative would be to add a step to pickle/serialize types that can't be stored as JSON in record before dumping them into the notebook.
I'd be up for implementing the latter option but wanted to discuss ideas first; a sketch of both workarounds follows.
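For what it's worth, both options can be approximated today from user code (a sketch, assuming the 0.x record API):

import numpy as np
import papermill as pm

arr = np.arange(6).reshape(2, 3)

# option A: make the value JSON-serializable before recording it
pm.record('my_array', arr.tolist())

# option B: persist the array elsewhere and record only the path
np.save('my_array.npy', arr)
pm.record('my_array_path', 'my_array.npy')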
Traceback
$ ipython
Python 3.6.1 (default, Apr 4 2017, 09:40:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import papermill
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-1-790e320f196c> in <module>()
----> 1 import papermill
~/code/src/github.com/nteract/papermill/papermill/__init__.py in <module>()
5 del get_versions
6
----> 7 from execute import (
8 execute_notebook,
9 set_environment_variable_names,
ModuleNotFoundError: No module named 'execute'
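The traceback points at an absolute import inside the package; the likely fix (my reading, not verified against the source) is a relative import in papermill/__init__.py:

from .execute import execute_notebook, set_environment_variable_names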
In papermill/papermill/execute.py (lines 157 to 163 at 7b717e7), it makes the assumption that only one cell will have any parameters being defined in the notebook and raises an error if this is not the case.
We have name as a cell-level metadata field in the nbformat spec that specifically requires values to be unique across the notebook. If you want to maintain this uniqueness qualifier, it would be cleaner to require setting the name on a cell rather than a tag.
If the only reason not to do this is that the UI for setting a name vs. tags is more inconvenient, that suggests we have a frontend UI issue. In that case, using tags is just a compromise made for the current state of front-ends and should probably be deprecated as soon as possible in favour of setting name properly.
From a user:
It would be cool if papermill had an option to hide code cell input when generating an output notebook. This could be very helpful when generating a nightly report and not wanting to see the code that generates the report.
Simple enough. nteract uses metadata.inputHidden as a boolean value to indicate if a cell's input is hidden.
I suppose a flag like --hide-inputs or something? Maybe if we wanted to be more opinionated about the naming we'd call it --report-mode or --mode=report?
Thus far convergence on flag name is --report-mode.
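A sketch of what the flag could do to the output notebook, using the nteract metadata convention mentioned above:

import nbformat

nb = nbformat.read('output.ipynb', as_version=4)
for cell in nb.cells:
    if cell.cell_type == 'code':
        cell.metadata['inputHidden'] = True  # nteract-style hint to hide inputs
nbformat.write(nb, 'report.ipynb')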
We should add a changelog file and include how to update / automate its population in RELEASE.md.
Right now all arguments passed as -p foo 2 are created as floats in the notebook. When using parameters to specify shapes of numpy arrays and the like, you end up having to convert them to integers first. We could use the type of the parameter's existing value as a guide and convert arguments to that type.
So a "parameters" cell with foo = 2.123 and papermill ... -p foo 2 would result in foo = 2.0, but a cell with foo = 2 would result in foo = 2.
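A sketch of that coercion rule, keyed off the default's type (hypothetical helper, not papermill API):

def coerce(raw, default):
    """Cast the CLI string using the type of the default from the parameters cell."""
    if default is None:
        return raw
    if isinstance(default, bool):  # check bool before int: bool subclasses int
        return raw.lower() in ('1', 'true', 'yes')
    return type(default)(raw)

coerce('2', 2.123)  # -> 2.0
coerce('2', 2)      # -> 2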
The kernels today usually capture stdout and stderr messages directly and buffer them into the cell JSON contents. But if one is running papermill and the kernel dies (e.g. OOM, kill -9), the active messages get lost. I'm trying to figure out the best approaches for capturing these logs in these events.
There is an attempt to explore capturing pipes on the kernel process via:
ipython/ipykernel#315
It would be handy if we had a dry-run mode which parameterized a notebook and saved it to the output path without actually executing the cells. This would enable preparing notebooks for execution elsewhere or with alternative kernels in an upstream process. It should be relatively simple to add.
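A sketch of how this might surface in the API (the flag name is hypothetical; nothing here is implemented yet):

import papermill as pm

pm.execute_notebook(
    'input.ipynb',
    'prepared.ipynb',
    parameters={'alpha': 0.5},
    prepare_only=True,  # inject parameters, skip kernel execution
)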
I love the run_notebook option and this is probably the main use case, but sometimes I'd like to be able to start an interactive notebook server with the parameters predetermined (or better, have a service to which I don't have direct access, like Binder, do so), à la
papermill save --input input.ipynb --parameter a='hello world' --output output.ipynb
jupyter notebook.ipynb
or better
papermill serve input.ipynb --parameter a='hello world'
Would others see this as useful as well?
As I see in the docs, papermill requires "Parameterizing a Notebook":
"To parameterize your notebook, designate a cell with the tag parameters. Papermill looks for the parameters cell and replaces those values with the parameters passed in at execution time."
However, I can't find a place to add a tag to a cell in the original Jupyter Notebook. Is it only available in nteract?
// Parameters
val RUN_TS = 1528866180240
==========================================
Name: Compile Error
Message: <console>:5: error: integer number too large
val RUN_TS = 1528866180240
^
<console>:6: error: ';' expected but 'val' found.
val RUN_DATE_MINUS_1 = 20180612
^
StackTrace:
It should instead populate as val RUN_TS = 1528866180240L.
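A sketch of what a Scala-aware translation step could do for integer parameters (function name hypothetical):

def translate_scala_int(value):
    """Append an 'L' suffix when the value overflows Scala's 32-bit Int."""
    if not (-2**31 <= value < 2**31):
        return '{}L'.format(value)
    return str(value)

translate_scala_int(1528866180240)  # -> '1528866180240L'
translate_scala_int(20180612)       # -> '20180612'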
Let's start using pytest with future tests since they are simpler to write and maintain. This should result in better code coverage over time.
Passing an HTTP route as the notebook input results in an io.read error from trying to use the local handler. Instead it should check for http-prefixed URIs, as sketched below.
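A sketch of scheme-based dispatch (the handler names here are placeholders, not papermill's actual registry):

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

def pick_handler(path):
    """Route reads/writes by URI scheme instead of assuming local files."""
    scheme = urlparse(path).scheme
    if scheme in ('http', 'https'):
        return 'http_handler'
    if scheme == 's3':
        return 's3_handler'
    return 'local_handler'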
When trying to execute a notebook I get the following error:
TraitError: The 'timeout' trait of an ExecutePreprocessor instance must be an integer, but a value of None <type 'NoneType'> was specified.
Code:
pm.execute_notebook(
'TEST.ipynb',
'TEST-out.ipynb',
parameters = dict(test_param='3333')
)
Notebook: TEST.zip
Full stacktrace
Using:
papermill: 0.12.1
Jupyter-notebook: 5.3.1
Python 2.7.11 |Anaconda 2.2.0 (32-bit)| (default, Mar 4 2016, 15:18:41)
I could really use some of the recent changes in my current tasks. Any objections or particular PRs people want in this release? Was going to maybe wait for #100.
Looks like a dependency broke somewhere. Discovered it in #88, where I thought I had messed up, but a clean master now recreates the issue for me. I have an older virtualenv where the tests pass while a fresh env fails. The error is with traitlets, but that version didn't change, so it's another package interacting with it. If it doesn't resolve itself beforehand I'll look more deeply at it on Monday.
Broken pip list
$ pip freeze -l
ansiwrap==0.8.3
attrs==17.4.0
backports-abc==0.5
backports.shutil-get-terminal-size==1.0.0
bleach==2.1.2
boto3==1.5.14
botocore==1.8.28
certifi==2017.11.5
chardet==3.0.4
click==6.7
codecov==2.0.13
configparser==3.5.0
coverage==4.4.2
decorator==4.2.0
docutils==0.14
entrypoints==0.2.3
enum34==1.1.6
funcsigs==1.0.2
functools32==3.2.3.post2
futures==3.2.0
html5lib==1.0.1
idna==2.6
ipykernel==4.7.0
ipython==5.5.0
ipython-genutils==0.2.0
ipywidgets==7.1.0
Jinja2==2.10
jmespath==0.9.3
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.1
jupyter-console==5.2.0
jupyter-core==4.4.0
MarkupSafe==1.0
mistune==0.8.3
mock==2.0.0
nbconvert==5.3.1
nbformat==4.4.0
notebook==5.2.2
numpy==1.14.0
pandas==0.22.0
pandocfilters==1.4.2
pathlib2==2.3.0
pbr==3.1.1
pexpect==4.3.1
pickleshare==0.7.4
pluggy==0.6.0
prompt-toolkit==1.0.15
ptyprocess==0.5.2
py==1.5.2
Pygments==2.2.0
pytest==3.3.2
pytest-cov==2.5.1
python-dateutil==2.6.1
pytz==2017.3
PyYAML==3.12
pyzmq==16.0.3
qtconsole==4.3.1
requests==2.18.4
s3transfer==0.1.12
scandir==1.6
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.11.0
terminado==0.8.1
testpath==0.3.1
textwrap3==0.9.1
tornado==4.5.3
tqdm==4.19.5
traitlets==4.3.2
urllib3==1.22
wcwidth==0.1.7
webencodings==0.5.1
widgetsnbextension==3.1.0
Working pip list
$ pip freeze -l
ansiwrap==0.8.3
astroid==1.5.3
attrs==17.3.0
backports-abc==0.5
backports.functools-lru-cache==1.4
backports.shutil-get-terminal-size==1.0.0
bleach==2.1.1
boto3==1.4.7
botocore==1.7.44
certifi==2017.11.5
chardet==3.0.4
click==6.7
codecov==2.0.10
configparser==3.5.0
coverage==4.4.2
decorator==4.1.2
docutils==0.14
entrypoints==0.2.3
enum34==1.1.6
funcsigs==1.0.2
functools32==3.2.3.post2
future==0.16.0
futures==3.1.1
html5lib==1.0b10
idna==2.6
ipykernel==4.6.1
ipython==5.5.0
ipython-genutils==0.2.0
ipywidgets==7.0.4
isort==4.2.15
Jinja2==2.10
jmespath==0.9.3
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.1.0
jupyter-console==5.2.0
jupyter-core==4.4.0
lazy-object-proxy==1.3.1
MarkupSafe==1.0
mccabe==0.6.1
mistune==0.8.1
mock==2.0.0
nbconvert==5.3.1
nbformat==4.4.0
notebook==5.2.1
numpy==1.13.3
pandas==0.21.0
pandocfilters==1.4.2
papermill==0.11.5
pathlib2==2.3.0
pbr==3.1.1
pexpect==4.3.0
pickleshare==0.7.4
pkginfo==1.4.1
pluggy==0.6.0
prompt-toolkit==1.0.15
ptyprocess==0.5.2
py==1.5.2
Pygments==2.2.0
pylint==1.7.4
pytest==3.3.1
pytest-cov==2.5.1
python-dateutil==2.6.1
pytz==2017.3
PyYAML==3.12
pyzmq==16.0.3
qtconsole==4.3.1
requests==2.18.4
requests-toolbelt==0.8.0
s3transfer==0.1.11
scandir==1.6
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.11.0
terminado==0.7
testpath==0.3.1
textwrap3==0.9.1
tornado==4.5.2
tqdm==4.19.4
traitlets==4.3.2
twine==1.9.1
urllib3==1.22
wcwidth==0.1.7
webencodings==0.5.1
widgetsnbextension==3.0.7
wrapt==1.10.11
Instead of having to have a special directory to read from, I'd like to be able to read in a collection of notebooks like this:
pm.read_notebooks('out-*.ipynb')
I'm assuming this could probably be done with the glob module.
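That intuition seems right; a sketch under the assumption that pm.read_notebook handles single files:

import glob
import papermill as pm

notebooks = [pm.read_notebook(path) for path in sorted(glob.glob('out-*.ipynb'))]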
Passing objects representing dicts is painful in papermill right now. You have to pass each value individually and catch the corresponding names inside the notebook. But when you have a large group of parameters all together (in say a json object) it'd be very convenient to specify those associated values on the input side but not inside the notebook.
Specifically, a syntax along the lines of -p foo.bar.baz value could be passed to the command line, resulting in the parameters cell getting a dict of the form foo = {"bar": {"baz": "value"}}. For repeated paths the dict could be enriched, such that a second param of the form -p foo.bar.baz2 value2 would combine to form foo = {"bar": {"baz": "value", "baz2": "value2"}}. These parameters wouldn't need to share prefix paths, so -p foo.bar2 baz3 would augment the top foo dict instead of the nested foo -> bar dict.
This would enable passing dynamic or many-valued parameters that belong together as individual human-readable inputs, and getting a clean hierarchy of dicts on the output.
Given the merge behavior of each assignment, it could also be used to merge with existing dict variables: provide foo = {'default': 'values'} beforehand and have augmentation via the command-line pass-through.
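A sketch of the merge logic the CLI would need (hypothetical helper):

def assign_path(params, dotted_key, value):
    """Build nested dicts from a dotted parameter name, merging repeated prefixes."""
    keys = dotted_key.split('.')
    node = params
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return params

params = {}
assign_path(params, 'foo.bar.baz', 'value')
assign_path(params, 'foo.bar.baz2', 'value2')
assign_path(params, 'foo.bar2', 'baz3')
# params == {'foo': {'bar': {'baz': 'value', 'baz2': 'value2'}, 'bar2': 'baz3'}}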
From #110 (review), I wanted to keep the issue open for discussing improvements to our S3 code and testing.
Michel:
Ideally, we would want to use a library to mock boto: either moto (https://github.com/spulec/moto) or placebo (https://github.com/garnaat/placebo) is a good choice. There may be others. Idk. Any preferences?
However, you/we may want to ask yourselves/ourselves whether such a module (s3.py) should really exist. It probably makes sense to use a library such as s3fs (https://github.com/dask/s3fs). This is what pandas uses for S3, and given that this project already has a dependency on pandas, it will not add an "exotic" dependency. Not to say that tests should not be written, but if we were to spend time on it we may as well refactor the code and use s3fs.
Matt:
I generally agree with that approach. I've used https://github.com/jubos/fake-s3 in the past with a before-hook launch, but it adds Ruby as a dependency to tests, so I wouldn't recommend it here.
I haven't used s3fs before but your argument sounds solid. There may be a case to be made for adding a minimal test here without too much refactoring and then doing a bigger PR with the swap-over. I'd leave that judgement call to you, but I can probably help with s3fs changes/testing later on as well.
I get a notebook validation failure:
Notebook validation failed: {u'TEST_All': 7.2} is not valid under any of the given schemas:
{
"TEST_All": 7.2
}
The code is the most basic possible:
import papermill as pm
pm.record("TEST_All",7.2)
Using:
papermill: 0.12.1
Jupyter-notebook: 5.3.1
Python 2.7.11 |Anaconda 2.2.0 (32-bit)| (default, Mar 4 2016, 15:18:41)
Any help is appreciated. Attached notebook: TEST.zip.
There's a bit of inconsistency in how Python 2 is being used alongside Python 3. After test coverage is at ~80%, we should consider refactoring which libraries are used to best support Python 2 and Python 3, so that the code base essentially handles both with a minimum of "if 2" or "if 3" branches. Focus on selecting the best practices for migration from 2 to 3.
Papermill overwrites the contents of the parameters-tagged cell, which is non-intuitive and easy to mess up. Instead, it would be more reasonable if the tagged cell's values were treated as defaults and papermill simply appended its contents to the cell definition. Repeated names would still be overwritten, and non-papermill execution could enumerate sane values without breaking papermill execution as it works today. But it would allow for easier assignment of defaults without adding a new cell. Today all the notebooks I've seen for papermill leave a blank cell for parameters, which has to be explained to each person who sees a papermill notebook as opposed to a normal notebook, when there shouldn't be so much of a difference.
A guide for new contributors would help with sprints & all kinds of new contributors.
When not in development, an install attempts to read requirements-dev.txt (when, of course, it shouldn't be there). This currently fails all papermill installs from pip: setup.py, lines 26 to 27 at 4e950d0.
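A sketch of a guard for setup.py (assuming the dev requirements file is simply absent from the sdist):

import os

def read_reqs(fname):
    """Read a requirements file only if it actually ships with the package."""
    if not os.path.exists(fname):
        return []
    with open(fname) as f:
        return [line.strip() for line in f if line.strip() and not line.startswith('#')]

dev_reqs = read_reqs('requirements-dev.txt')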