rossant / ipycache

Defines a %%cache cell magic in the IPython notebook to cache results of long-lasting computations in a persistent pickle file

License: BSD 3-Clause "New" or "Revised" License


ipycache's Introduction

ipycache

Defines a %%cache cell magic in the IPython notebook to cache results and outputs of long-lasting computations in a persistent pickle file. Useful when some computations in a notebook are long and you want to easily save the results in a file.


Installation

Latest PyPI release:

pip install ipycache

Latest development version:

pip install git+https://github.com/rossant/ipycache.git

Usage

In IPython, execute the following:

%load_ext ipycache

Then, create a cell with:

%%cache mycache.pkl var1 var2
var1 = 1
var2 = 2

When you execute this cell the first time, the code is executed, and the variables var1 and var2 are saved in mycache.pkl in the current directory along with the outputs. Rich display outputs are only saved if you use the development version of IPython. When you execute this cell again, the code is skipped, the variables are loaded from the file and injected into the namespace, and the outputs are restored in the notebook.
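The mechanism described above can be sketched in plain Python. This is a minimal illustration, assuming the cell body is available as a source string and that replaying plain printed text is enough; `run_cached` is a hypothetical name, not ipycache's API, and rich display outputs are not handled.

```python
import contextlib
import io
import os
import pickle


def run_cached(path, var_names, cell, namespace):
    """Hypothetical sketch of the %%cache idea (not ipycache's actual code).

    First call: execute `cell` in `namespace`, capture what it prints, and
    pickle the requested variables together with that output.
    Later calls: skip execution, inject the variables, replay the output.
    Returns True if the code was executed, False if it was skipped.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            saved = pickle.load(f)
        namespace.update(saved["vars"])      # inject variables into namespace
        print(saved["stdout"], end="")       # restore the captured output
        return False

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(cell, namespace)
    print(buffer.getvalue(), end="")          # show the output on the first run too
    saved = {
        "vars": {name: namespace[name] for name in var_names},
        "stdout": buffer.getvalue(),
    }
    with open(path, "wb") as f:
        pickle.dump(saved, f)
    return True
```

On the second call the cell source is never executed, which is why editing a cached cell has no effect until the cache is forced or deleted.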

Alternatively, use $file_name instead of mycache.pkl, where file_name is a variable holding the path to the cache file.

Use the --force or -f option to force the cell's execution and overwrite the file.

Use the --read or -r option to prevent the cell's execution and always load the variables from the cache. An exception is raised if the file does not exist.
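The run-or-load decision implied by --force and --read can be summarized as a small function. This is a sketch of the documented behaviour, not ipycache's code; `should_execute` is a hypothetical name, and treating the two flags as mutually exclusive is an assumption.

```python
import os


def should_execute(path, force=False, read=False):
    """Hypothetical helper mirroring the documented --force/--read rules.

    Returns True if the cell should run (and the cache be rewritten),
    False if the variables should be loaded from `path` instead.
    """
    if force and read:
        # Assumption: combining the two flags is treated as an error.
        raise ValueError("--force and --read are mutually exclusive")
    if force:
        return True                       # always run and overwrite the file
    if read:
        if not os.path.exists(path):
            raise FileNotFoundError(f"cache file {path!r} does not exist")
        return False                      # never run, always load
    return not os.path.exists(path)       # default: run only if no cache yet
```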

Use the --cachedir or -d option to specify the cache directory. You can specify a default directory in the IPython configuration file of your profile (typically ~/.ipython/profile_default/ipython_config.py) by adding the following line:

c.CacheMagics.cachedir = "/path/to/mycache"

If both a default cache directory and the --cachedir option are given, the latter is used.
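The precedence rule can be expressed in a few lines. This is a minimal sketch; `resolve_cache_path` and its parameter names are hypothetical, with `default_cachedir` standing in for the value of c.CacheMagics.cachedir.

```python
import os


def resolve_cache_path(filename, cli_cachedir=None, default_cachedir=None):
    """Hypothetical sketch: --cachedir wins over the configured default."""
    cachedir = cli_cachedir or default_cachedir
    if cachedir:
        cachedir = os.path.expanduser(cachedir)   # allow "~/..." paths
        return os.path.join(cachedir, filename)
    return filename  # relative to the current working directory
```

One side effect of os.path.join worth knowing: if `filename` is itself an absolute path, the directory argument is discarded and the absolute path is used as-is.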


ipycache's Issues

Adding tags for corresponding releases

It would be really helpful if we could have tags for corresponding releases here. In particular, this would help us check out that release and test it before we package and ship it in conda-forge.

cc @jochym

Semi-automatic caching of outputs by input hash.

I would like to be able to cache the output of several different cells. The problem is that each cache file seems to store exactly one output so that I must use a different file for each cell, which is cumbersome.

Ideally, I would like to be able to first specify a cache file (I'm not sure of the optimal syntax) like:

%%cache --set-cache "my_cache.pkl"

Then cache several output cells with something as simple as:

%%cache
!hg sum

Another cell might be

%%cache
from my_module import f
from line_profiler import LineProfiler
profile = LineProfiler()
profile.add_function(f)
profile.run('f()')
profile.print_stats()

Ideally, each output would be stored in a dictionary whose key is a hash of the cell's input, so that the cell is executed if needed but, more importantly, the appropriate output is restored when one has several cells.

This mechanism would be defeated by cells that have identical inputs:

%%cache
!time

but I could live with that. (To get around this, some sort of context of the cell in the original notebook would be needed, but I think this might be a can of worms.)
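The proposed store (one file, a dictionary keyed by a hash of each cell's input) might look like the following. This is a hypothetical sketch of the feature request, not something ipycache provides; `OutputCache` and `get_or_run` are invented names, and `run` stands in for executing the cell and capturing its output.

```python
import hashlib
import os
import pickle


class OutputCache:
    """Hypothetical sketch: many cells share one pickle file.

    Outputs live in a dict keyed by the SHA-256 of each cell's source, so
    cells with identical input share one entry (the limitation noted above).
    """

    def __init__(self, path):
        self.path = path
        self.entries = {}
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.entries = pickle.load(f)

    @staticmethod
    def key(cell_source):
        return hashlib.sha256(cell_source.encode("utf-8")).hexdigest()

    def get_or_run(self, cell_source, run):
        """Return the cached output for this cell, or call run() and store it."""
        k = self.key(cell_source)
        if k not in self.entries:
            self.entries[k] = run()
            with open(self.path, "wb") as f:
                pickle.dump(self.entries, f)
        return self.entries[k]
```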

Backwards Compatibility

I have not delved into the source yet, so I am not sure how easy this would be to implement. As a minimal syntax, I propose that this feature only work if a cache file has first been specified with --set-cache "my_cache.pkl"; otherwise, the default behaviour continues. Only if this has been set would a bare %%cache line work as described (otherwise, the usual error would be raised).

Is this feasible, or is there a better way to "freeze" the output of calculations?

Michael.

P.S. My use case is interactive profiling and improving code. I want to freeze the previous profiling outputs so that the notebook becomes a log of the profiling process. I don't yet see much need for caching variables, but I do need to cache the output, so perhaps this extension is not the best fit for my needs; still, it almost works.

Migrate from Travis CI to GitHub Actions

This repo is currently using Travis CI for testing, but Travis CI has long stopped supporting open-source projects the way it used to (you now have to apply initially, and keep renewing access for Travis CI, which may be cancelled or rejected at any time).

In contrast, GitHub offers GitHub Actions as a very simple (and similar) CI system, which is free for open-source projects. I'm happy to help migrate this project from Travis CI to GitHub Actions (I've done this for other projects I own; no affiliation with either Travis or GitHub).

@rossant — since you mentioned on another issue that you're no longer maintaining this project, I'm happy to help maintain the repo if you'd like to give me access.

allow code evaluation in path spec

Hi,

I'd like to automatically generate cache-file names so that they are neatly organized in a folder structure by notebook, and I'd like to do that in the cache magic itself, so that the following works:

%%cache {get_path('test.pkl')} var
# create var

I implemented this in the following commit. If you like it, I can set up a PR for it.

ihrke/ipycache@617f408

cPickle import error

ImportError: No module named 'cPickle'

On Python 3.5
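The usual compatibility idiom for this error is to fall back to the standard pickle module. This is a sketch of that idiom, not necessarily the fix that was merged into ipycache.

```python
try:
    import cPickle as pickle  # Python 2: C-accelerated module
except ImportError:
    import pickle             # Python 3: pickle already uses its C implementation
```

Note that pickle ships with Python's standard library on every platform, so there is nothing to install with pip.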

automatically re-run cell if content changed

I think it would be convenient to re-run cells whose content has changed.
One way is to hash the cells' code and put it into the pickle file and compare it when re-running.
We could enable/disable this behavior with some configuration flag, e.g., --enable-auto-rerun.
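A minimal sketch of the hash-and-compare idea, assuming the hash is stored in the same pickle under a reserved key; `needs_rerun`, `save_with_hash`, and the `_cell_md5` key are hypothetical. MD5 is used here only for change detection, not for security.

```python
import hashlib
import os
import pickle


def _md5(cell_source):
    return hashlib.md5(cell_source.encode("utf-8")).hexdigest()


def needs_rerun(path, cell_source):
    """Rerun if there is no cache yet or the cell's code has changed."""
    if not os.path.exists(path):
        return True
    with open(path, "rb") as f:
        saved = pickle.load(f)
    return saved.get("_cell_md5") != _md5(cell_source)


def save_with_hash(path, cell_source, variables):
    """Store the variables together with the MD5 of the cell's source."""
    data = dict(variables)
    data["_cell_md5"] = _md5(cell_source)   # reserved key (an assumption)
    with open(path, "wb") as f:
        pickle.dump(data, f)
```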

NameError for `moves` in Py3 conditional

At this line: https://github.com/rossant/ipycache/blob/master/ipycache.py#L32

I can't tell what moves is supposed to be, so I'm just pointing out the bug.

Update license copyright line, wording, and file extension

The current version of LICENSE.md has a few issues:

1. The copyright is listed as "Copyright (c) 2013, Cyrille Rossant", but there have been other contributors since then besides Cyrille.

   Solution options: (a) switch this to "Copyright (c) 2013 Cyrille Rossant and contributors", or (b) as some projects do, "Copyright (c) 2013 ipycache authors", with relevant AUTHORS.txt and CONTRIBUTORS.txt files in the repo to clarify. @rossant, I'm fine with (a) if you'd like to keep your name in the copyright line as the original author and primary contributor.

2. There's strange wording in the license (ipycache/LICENSE.md, lines 16 to 18 in 4042ed1):

   * Neither the name of border nor the names of its contributors
     may be used to endorse or promote products derived from this
     software without specific prior written permission.

   This looks like the license text was copied from another project or company where the name "border" was relevant, but the actual BSD-3-Clause license from OSI says:

   Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

   Solution: I would like to change this to match the original wording, since the name "border" is not relevant here.

3. The file extension is .md, but it's really a plain-text file and written as such.

   Solution is easy: switch the file to a .txt extension.

@rossant, what are your thoughts on (1)? Would you prefer option (a) or (b)? I think items (2) and (3) are non-controversial; please let me know if there's some history I should be aware of for (2), and whether you have any concerns about the proposed changes.

Thanks!

Import pickle

How do I install pickle or cPickle in Python 3 on Ubuntu 16.04?
Is there any documentation for pickle? Please share it.

Should fail if not all variables could be saved/loaded?

I tried the code using commas between the variable names (by accident)
%%cache myvars.pkl a, b, c

This saves only c (and prints warnings about not finding "a," and "b," in the namespace).

Upon rerunning the code, it will only load c and not throw an error again.

I would suggest that the default behaviour be to raise a warning if not all of the specified variables could be saved, and to make sure the cell is not skipped when it is rerun later.

Great and very useful package!!!
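One way to implement the suggestion is an all-or-nothing save that also tolerates commas in the variable list. This is a hypothetical sketch; `parse_var_names` and `save_all_or_nothing` are invented names, and accepting commas is a design choice, not current ipycache behaviour.

```python
import pickle
import re


def parse_var_names(spec):
    """Split 'a b c' or 'a, b, c' into ['a', 'b', 'c']."""
    return [name for name in re.split(r"[,\s]+", spec) if name]


def save_all_or_nothing(path, spec, namespace):
    """Pickle every requested variable, or write nothing and raise."""
    names = parse_var_names(spec)
    missing = [name for name in names if name not in namespace]
    if missing:
        # No cache file is written, so a later run re-executes the cell.
        raise NameError("not in namespace: " + ", ".join(missing))
    with open(path, "wb") as f:
        pickle.dump({name: namespace[name] for name in names}, f)
```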

Add automatic tests with IPython notebooks

Clarify BSD license as BSD-3-Clause

We were adding ipycache to conda-forge, but there seems to be a bit of confusion about the license. Is this intended to be BSD 3-clause?

cc @jochym

install fails due to missed README file

setup.py looks for a README file, but it cannot be found.

In setup.py (line 23) we have:

long_description=read('README'),

It should be:

long_description=read('README.md'),

Use of cloudpickle

Warning: I'm not much of a python developer!
I just tried to include the use of cloudpickle in case pickle fails:
https://github.com/fabianrost84/ipycache
For me this comes very handy as I often like to pickle lambda functions.
What do you think about including the (possibly optional) use of cloudpickle in ipycache?
I'm also not sure how this relates to #28.

ipycache without ipy

It would be useful to split ipycache into two parts: one that can be used without IPython/Jupyter, like a memoization decorator, and a magic built on top of it. This would make it usable in both modes once the scripts from a notebook are integrated into a package or application.
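The IPython-free half of that split could be a decorator that persists a function's result to a pickle file. This is a minimal sketch; `disk_cache` is a hypothetical name, and it deliberately ignores the call arguments, so one file holds exactly one result.

```python
import functools
import os
import pickle


def disk_cache(path):
    """Hypothetical decorator: persist a function's result in a pickle file.

    The first call computes and saves; later calls (even in new processes)
    just load the file. Arguments are ignored: one file = one result.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

A %%cache-style magic could then be a thin layer that wraps the cell in such a function and pushes the result into the user namespace.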

ipycache.load_vars method is vulnerable

import os
import pickle
import ipycache

class Test(object):
    def __init__(self):
        self.a = 1

    def __reduce__(self):
        return (os.system, ('ls',))

tmpdaa = Test()
with open("a-file.pickle", 'wb') as f:
    pickle.dump(tmpdaa, f)

ipycache.load_vars('a-file.pickle', '')

Hi, calling ipycache.load_vars on malicious data causes command execution: if an attacker shares a malicious pickle file online and a user loads it, arbitrary commands are executed.
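One mitigation is the "restricting globals" pattern from Python's pickle documentation: subclass pickle.Unpickler and refuse to resolve any global outside an allowlist. This is a sketch, not a change ipycache has made; `restricted_loads` and the allowlist contents are assumptions.

```python
import builtins
import io
import pickle

# Only these globals may be referenced by a pickle; everything else
# (os.system, subprocess.Popen, ...) is refused.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Load pickle bytes while refusing arbitrary callables."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

The trade-off is that objects such as NumPy arrays or pandas DataFrames are pickled by reference to functions in those libraries, so they would be rejected unless those names are added to the allowlist. In practice, the safest rule remains to load only cache files you created yourself or fully trust.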

Accept arguments starting with $ (eval)

As in IPython: for example, %%cache $filename would save the variables to the file referred to by the filename variable.
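Python's string.Template already implements the $name / ${name} syntax, so one possible approach is to expand the magic's argument line against the user namespace before parsing it. This is a hypothetical sketch; `expand_line` is an invented helper.

```python
import string


def expand_line(line, namespace):
    """Hypothetical helper: replace $name in a magic's argument line.

    safe_substitute leaves unknown names untouched instead of raising.
    """
    return string.Template(line).safe_substitute(namespace)
```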

Exceptions not visible when using %%cache

Exceptions raised in a cell with %%cache are not displayed:

In[0]:
%load_ext ipycache
Out[0] :
/home/daniilhayrapetyan/.virtualenvs/test/lib/python3.6/site-packages/IPython/config.py:13: ShimWarning: The `IPython.config` package has been deprecated since IPython 4.0. You should import from traitlets.config instead.
  "You should import from traitlets.config instead.", ShimWarning)
/home/daniilhayrapetyan/.virtualenvs/test/lib/python3.6/site-packages/ipycache.py:17: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.
  from IPython.utils.traitlets import Unicode
In[1] :
%%cache _delete_me.pkl
raise Exception
Out[1] :
[Saved variables '' to file 'home/daniilhayrapetyan/.../_delete_me.pkl'.]

I am using JupyterLab, with the following versions:

Python: 3.6.7 (default, Oct 22 2018, 11:32:17) [GCC 8.2.0]
ipycache: 0.1.4
jupyterlab: 1.1.1

Also, when using tqdm_notebook from tqdm, the widget is not displayed. I thought this might be helpful for debugging the issue.

Option to automatically save newly created variables in the cell

%%cache [file] --autovars automatically detects the variables that have been created in the cell, by looking at the namespace dictionaries before and after the cell's execution.
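The namespace-diff approach can be sketched with exec() on a plain dict. This is a hypothetical illustration of the proposed option, not existing ipycache code; note that it detects new or rebound names only, not objects mutated in place.

```python
def run_and_detect_new_vars(cell, namespace):
    """Hypothetical --autovars sketch: diff the namespace around exec().

    Reports names that are new or now bound to a different object.
    In-place mutations of existing objects are NOT detected.
    """
    before = dict(namespace)  # shallow copy keeps references to the old objects
    exec(cell, namespace)
    return sorted(
        name for name, value in namespace.items()
        if not name.startswith("__")          # skip __builtins__ and friends
        and (name not in before or before[name] is not value)
    )
```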

Alternative caching backends

Hi! ipycache is great, but one issue I've run into is that raw pickles are slow and big, especially for large arrays. In the past I've tried a bunch of alternatives (pickle+gzip, HDF5, etc.), so I implemented a couple of these as alternative backends in ipycache here: https://github.com/dimatura/ipycache/tree/npyz. They all have trade-offs, but I think something like this could be pretty useful overall. Any interest in a PR? I'd be willing to clean things up.
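As an illustration of the simplest of these alternatives, a pickle+gzip backend needs only the standard library. This is a sketch; `save_vars_gz` and `load_vars_gz` are hypothetical names, not functions from the linked branch.

```python
import gzip
import pickle


def save_vars_gz(path, variables):
    """Hypothetical backend: pickle the variables into a gzip stream."""
    with gzip.open(path, "wb") as f:
        pickle.dump(variables, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_vars_gz(path):
    """Inverse of save_vars_gz."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

Compression trades CPU time for disk space, so whether it helps depends on how compressible the data is; formats such as .npz or HDF5 avoid pickle's overhead for arrays but need third-party libraries.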

Add caching to `pip` in GitHub Actions

These docs should help; we should be able to apply this directly to our config:

https://github.com/actions/cache/blob/main/examples.md#python---pip

invalid syntax

File "", line 3
%%cache company_facebook_data.pkl company_facebook_data
^
SyntaxError: invalid syntax

I am using Python 3.4.

Only pickle the results when the cell executes with no errors.

Often I get an error in the cell, but the results are pickled anyway. So once you fix the error, you have to run the cell with the -f flag to update the cache.

It seems more convenient, as well as more intuitive, for the caching to detect the error and not save the results when the cell exits with an error.
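A minimal sketch of the requested behaviour, assuming the cell is run with exec() so that errors surface as exceptions; `run_then_cache` is a hypothetical name. Inside IPython, run_cell does not raise but returns an ExecutionResult whose success attribute could be checked instead.

```python
import os
import pickle


def run_then_cache(path, cell, namespace, var_names):
    """Hypothetical sketch: write the cache only if the cell succeeds.

    An exception propagates to the caller (so it is displayed) and no
    cache file is written, so the next run re-executes without -f.
    """
    exec(cell, namespace)  # raises on error; nothing below runs
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({name: namespace[name] for name in var_names}, f)
    # Rename into place so a failed dump never leaves a truncated cache.
    os.replace(tmp, path)
```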

Move most config from `setup.py` to `setup.cfg` to simplify static config management

We are currently using setup.py and have code to read in a README.md file as a string, but we don't need this to be in code, since none of our configuration is dynamic. Instead, we can have a very simple setup.py and move our config into a static setup.cfg instead.

Docs:

output in cached cells is delayed

Hi,

when running a cell under the %%cache magic, the output is delayed until the evaluation has finished. For example, in the following code, nothing is printed until all 10 seconds have passed.

%%cache 'test.pkl' var
import time
for i in range(10):
    time.sleep(1)
    print(i)
var=2

I usually insert progress-bars or other on-the-fly output to long computations so that I'm able to estimate how long they are going to take.
Is it possible to enable dynamic output in ipycache? I'm not sure what would be the best way to proceed since I'm not very familiar with ipython's internals.
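One common way to get live output while still capturing it is a "tee" stream that records each write and forwards it immediately to the real stdout. This is a hypothetical sketch; whether the text then appears incrementally in the notebook also depends on how the kernel forwards flushed stdout.

```python
import io


class Tee:
    """Hypothetical stream that records output while forwarding it live."""

    def __init__(self, live):
        self.live = live               # e.g. the original sys.stdout
        self.captured = io.StringIO()  # copy kept for the cache file

    def write(self, text):
        self.captured.write(text)
        self.live.write(text)
        self.live.flush()              # push each chunk out immediately
        return len(text)

    def flush(self):
        self.live.flush()
```

Usage would be along the lines of `with contextlib.redirect_stdout(Tee(sys.stdout)): ...`, saving `tee.captured.getvalue()` afterwards.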

Is it possible to keep previous output (e.g., figures) when re-run (and skip) a cached cell?

At present, the output will be replaced by "Skipped the cell's code and loaded variables...", and all the figures are gone.

I've tried to set the default value of --verbose to False in the source code (I don't know how to disable a default True option). This will disable any new print out, but still clear previous output.

I think support for long-running cells (like this %%cache magic) is a major feature that the IPython notebook currently lacks. If some cells could be skipped while leaving their output figures intact when re-running the entire notebook, it would be very useful, at least for data analysis. See my question on Stack Overflow.

Thanks, rossant! I really like your project (and registered an account just to make this comment :)

Replace deprecated, unmaintained `nose` with another solution

As noted in our GitHub Actions config, we had to exclude Python 3.10 and 3.11 due to nose incompatibility that will not be fixed, so we need to migrate to another option (nose2 or pytest):

ipycache/.github/workflows/master.yml, lines 16 to 23 in ca8a1fb:

# Tests running via `nosetests` on Python 3.10 fail with the error:
#
# "AttributeError: module 'collections' has no attribute 'Callable'"
#
# Per https://github.com/nose-devs/nose/issues/1099, nose is no longer
# maintained and so this will not be fixed; we need to migrate to either
# nose2 or pytest, per the discussion on the above issue.
python: [ '3.7', '3.8', '3.9' ]

However, nosetests appears to have been removed in the GitHub Actions images for Ubuntu and macOS, as evidenced by this failure with Python 3.8 on ubuntu-18.04:

Run nosetests
/home/runner/work/_temp/1b158be5-bebc-467e-a8c7-2491044116ae.sh: line 1: nosetests: command not found
Error: Process completed with exit code 127.

We see similar failures with Python (3.8, 3.9) on Ubuntu (18.04, 20.04) and macOS (10.15, 11).

Although we can probably quick-fix this with pip install nose or adding it to test_requirements.txt, we might as well use this opportunity to instead migrate to a newer tool to enable us to test with newer Python versions as well.

Install `nose` explicitly as a quick fix to unblock license clarification

#60 is failing tests due to missing command nosetests; in #61, we initially decided to replace nose with nose2 instead of the quick fix, but #62 shows it's not as simple.

We can put in a quick fix to manually install nose in test_requirements.txt for now to unblock #60 to address #41, and come back for a more complete solution later.

module 'IPython.utils.io' has no attribute 'stdout'

I just installed this package, tried to use it, and got this error (I did run %load_ext ipycache in a previous cell):

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [37], line 1
----> 1 get_ipython().run_cell_magic('cache', "'myvars.pkl' a b c", '\ndatalist=[]\nfor path in os.listdir(\'data/\'):\n    datalist.append(pd.read_excel(\'data/\'+path,decimal=","))\ndata=pd.concat(datalist).set_index(["N_FACTURE","N_LIGNES"])\n\n# data.dtypes==object\n# data.select_dtypes(include=[object])\ndata.head(5)\n')

File c:\Users\amine\OneDrive\Documents\Project\.venv\lib\site-packages\IPython\core\interactiveshell.py:2362, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2360 with self.builtin_trap:
   2361     args = (magic_arg_s, cell)
-> 2362     result = fn(*args, **kwargs)
   2363 return result

File c:\Users\amine\OneDrive\Documents\Project\.venv\lib\site-packages\ipycache.py:396, in CacheMagics.cache(self, line, cell)
    394             pass
    395     path = os.path.join(cachedir, path)
--> 396 cache(cell, path, vars=vars,
    397       force=args.force, verbose=not args.silent, read=args.read,
    398       # IPython methods
    399       ip_user_ns=ip.user_ns,
    400       ip_run_cell=ip.run_cell,
    401       ip_push=ip.push,
    402       ip_clear_output=clear_output
    403       )

File c:\Users\amine\OneDrive\Documents\Project\.venv\lib\site-packages\ipycache.py:265, in cache(cell, path, vars, ip_user_ns, ip_run_cell, ip_push, ip_clear_output, force, read, verbose)
    261 cell_md5 = hashlib.md5(cell.encode()).hexdigest()
    263 if do_save(path, force=force, read=read):
    264     # Capture the outputs of the cell.
--> 265     with capture_output_and_print() as io:
    266         try:
    267             ip_run_cell(cell)

File c:\Users\amine\OneDrive\Documents\Project\.venv\lib\site-packages\ipycache.py:228, in capture_output_and_print.__enter__(self)
    225 stdout = stderr = outputs = None
    226 if self.stdout:
    227     #stdout = sys.stdout = StringIO()
--> 228     stdout = sys.stdout = myStringIO(out=IPython.utils.io.stdout)
    229 if self.stderr:
    230     #stderr = sys.stderr = StringIO()
    231     stderr = sys.stderr = myStringIO(out=self.sys_stderr)

AttributeError: module 'IPython.utils.io' has no attribute 'stdout'

Caching a whole notebook

Hi,

What about enabling caching of a whole notebook? With special magic, all the cells would be automatically cached and the user would not have to think about it in each cell. This could be especially useful as we may not be aware in advance of the time a calculation will take.

I think this could be some magic on top of #13.

P.S.: Another approach, if you do not want to implement global caching, would be to measure the calculation time of each cell and automatically cache the cell if it exceeds a given threshold. However, I am not sure IPython provides this information.
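The threshold idea can be sketched independently of IPython by timing a computation and caching only slow results. This is a hypothetical illustration; `run_and_maybe_cache`, the dict-like `store`, and the default threshold are all assumptions.

```python
import time


def run_and_maybe_cache(func, store, key, threshold_s=10.0):
    """Hypothetical sketch: cache a result only if it was slow to compute.

    `store` is any dict-like object (e.g. a dict later pickled to disk).
    """
    if key in store:
        return store[key]
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    if elapsed >= threshold_s:
        store[key] = result   # worth caching: recomputing would be slow
    return result
```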

deprecation warning for IPython.config import

C:\Python\Python27\lib\site-packages\IPython\config.py:13: ShimWarning: The `IPython.config` package has been deprecated. You should import from traitlets.config instead.
  "You should import from traitlets.config instead.", ShimWarning)
C:\Python\Python27\lib\site-packages\IPython\utils\traitlets.py:5: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.
  warn("IPython.utils.traitlets has moved to a top-level traitlets package.")

Include the license file in PyPI package

There's an in-progress PR #45 on this, but given the potential file extension change in #68 and the migration from setup.py to setup.cfg in #69, we may need to adjust this.

Here's an easy way to include the license file via setup.cfg:

[metadata]
license_files = LICENSE.txt

Allow to save cache at arbitrary location

This magic is just what I was looking for in my workflow. One thing I would like to see is being able to save the cache file to arbitrary location, not necessarily to the folder with the notebook.

Simply trying to input the file name with the full path didn't work.

I don't know how far you wanted to go with this magic, but given its usefulness to my own work, I would be happy to contribute myself.

Spaces in directoryname

With %%cache one can specify which file the caching is done in, and that file can be in a directory other than $pwd. However, these path specs cannot contain spaces, and one cannot use ' or " to quote the string:

%%cache -d '/tmp/a b' c.pkl d
d=2

returns:
IOError: [Errno 2] No such file or directory: u"/home/bla'/tmp/a bl'/c.pkl"
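Splitting the argument line with shlex.split, which honours shell-style quotes, keeps a quoted directory together as one argument. The parser below is a hypothetical sketch that mirrors the options described in the README, not ipycache's actual argument parser.

```python
import argparse
import shlex

# Hypothetical parser mirroring the options described in the README.
parser = argparse.ArgumentParser(prog="cache")
parser.add_argument("to", help="pickle file name")
parser.add_argument("vars", nargs="*", help="variables to cache")
parser.add_argument("-d", "--cachedir")
parser.add_argument("-f", "--force", action="store_true")
parser.add_argument("-r", "--read", action="store_true")


def parse_magic_line(line):
    # shlex.split honours quotes, so '/tmp/a b' stays one argument.
    return parser.parse_args(shlex.split(line))
```

One caveat: shlex uses POSIX rules by default, so backslashes act as escape characters; on Windows, forward slashes in paths are the safer choice.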

Test on Python 3

integrate with cachey (suggestion)

cachey seems to solve an issue with regular caching: the conflict between limited space and limited computation time. This is especially important for notebooks, where the cache can grow quite large during data analysis.
To integrate the two caching libraries, ipycache would need to store all of its cached data in a single file.

https://github.com/mrocklin/cachey

Configure Travis CI with nosetests

Python 2.7 and 3.3
Latest stable IPython and master

Unintended exception when trying to cache variables that do not exist

C:\PyKit\Python27\lib\site-packages\ipycache.py in cache(cell, path, vars, ip_user_ns, ip_run_cell, ip_push, force, read, verbose)
129 except KeyError:
130 raise ValueError(("Variable '{0:s}' could not be found in the "
--> 131 "interactive namespace").format(var))
132 # Save the cache in the pickle file.
133 save_vars(path, cache)

NameError: global name 'var' is not defined

Remove 2 of the 3 README files in repo?

This repo currently has 3 README files:

Of these, README.md appears to be the most recently-updated one, and it is the one that is automatically rendered by GitHub upon loading the repo, so maybe it makes sense to remove the other 2 to keep it clean and easier to maintain? Or maybe I'm missing a reason to maintain the README in multiple markup languages?