tkem / cachetools
Extensible memoizing collections and decorators
License: MIT License
The main advantage of `@cachedmethod` over the function decorators is that cache properties can be set at runtime. The example class should therefore have a constructor with a `cachesize` parameter. This may also avoid questions regarding locking, though these should be addressed at some point.
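The runtime-configuration idea can be sketched with a stripped-down, stdlib-only stand-in for `@cachedmethod` (the decorator body and the plain-`dict` cache below are illustrative, not the cachetools implementation):

```python
import operator

def cachedmethod(cache):
    # Minimal sketch: look up the cache on the instance at call time,
    # so cache properties can be chosen per instance, at runtime.
    def decorator(method):
        def wrapper(self, *args):
            c = cache(self)
            key = (method.__name__,) + args
            if key in c:
                return c[key]
            result = method(self, *args)
            c[key] = result
            return result
        return wrapper
    return decorator

class Example:
    def __init__(self, cachesize=128):
        # the cache is configured in the constructor, per instance
        self.cachesize = cachesize
        self.cache = {}  # stand-in for e.g. an LRU cache of that size

    @cachedmethod(operator.attrgetter('cache'))
    def compute(self, n):
        return n * n
```

Since the cache is fetched through a callable at every invocation, two instances can use entirely different cache objects, which is exactly what the function decorators cannot offer.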
It may be confusing if expired items show up when iterating over `TTLCache.items()`, or in its string representation. Therefore, a custom `TTLCache.__iter__` method should be defined, which fast-forwards over expired items on start.
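A fast-forwarding `__iter__` might look like this toy TTL mapping (names and internal layout are illustrative only):

```python
import time

class TinyTTL:
    """Toy TTL mapping whose __iter__ skips expired items."""

    def __init__(self, ttl, timer=time.monotonic):
        self._ttl = ttl
        self._timer = timer
        self._data = {}  # key -> (value, expiry time)

    def __setitem__(self, key, value):
        self._data[key] = (value, self._timer() + self._ttl)

    def __iter__(self):
        # fast-forward over expired items instead of exposing them
        now = self._timer()
        return iter([k for k, (_, exp) in self._data.items() if exp > now])
```

Note that iteration only hides expired items; it does not delete them, so iterating never mutates the underlying dict.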
TBD: `TTLCache.currsize` (maybe ignore for now).

The `Changes` file is already - almost - in RST format, so it should be renamed to `CHANGES.rst`, and the link in `README.rst` should be updated accordingly.
Making the `choice` function configurable may provide additional use cases for `RRCache`.
In the `cachedfunc` implementation, you are providing a `lock` argument; why not in `cachedmethod`? It seems that multiple threads could be sharing the same instance of an object and would also require a locking facility. Am I wrong?
BTW, shouldn't the locking mechanism be implemented in the `Cache` objects? (In that case, there would be no need for this argument in either `cachedfunc` or `cachedmethod`.)
Calling `cache.clear()` must be locked in `cache_clear()`. We might get away with unlocked access to `cache.size` and `cache.maxsize` in `cache_info()`, but proper locking should be added in case these are implemented as properties.
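The intended locking discipline could look like this sketch (a hypothetical wrapper, not the actual decorator code):

```python
from threading import RLock

class LockedCacheWrapper:
    """Sketch: mutating operations hold a lock; plain stat reads may not."""

    def __init__(self, cache):
        self._cache = cache
        self._lock = RLock()
        self.hits = self.misses = 0

    def cache_clear(self):
        # cache.clear() mutates shared state, so it must be locked
        with self._lock:
            self._cache.clear()
            self.hits = self.misses = 0

    def cache_info(self):
        # unlocked reads are tolerable for plain attributes, but would
        # need the lock if size/maxsize were implemented as properties
        return (self.hits, self.misses, len(self._cache))
```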
Delayed until more hands-on experience using this is available.
Allows a per-item TTL, but for simplicity the TTL of an existing item cannot be changed.
Travis allows testing with 3.5-nightly.
This module needs a cache that supports per-item time-to-live values.
`@functools.lru_cache` clears the hits and misses counts in `cache_clear()`. The Python 3 style decorators should do the same.
For backward compatibility, `@cachedmethod` adds the method to the key function's arguments, to make it easier to use with shared per-instance caches. This is deprecated in favor of custom key functions, and should be documented.
Use a `collections.Counter` to keep track of usage counts. Though not the fastest way possible, `Counter` has some features that should make the implementation quite simple. This may also serve as an example/blueprint for other cache implementors. Do not use the documented method for getting the least common elements, though, since this will perform a full item sort; use a simple `min` instead.
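The approach can be sketched as follows (a toy `TinyLFU`, not the cachetools class):

```python
import collections

class TinyLFU(dict):
    """Toy LFU cache keeping usage counts in a collections.Counter."""

    def __init__(self, maxsize):
        self._maxsize = maxsize
        self._counter = collections.Counter()

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        self._counter[key] += 1
        return value

    def __setitem__(self, key, value):
        while len(self) >= self._maxsize and key not in self:
            self.popitem()
        dict.__setitem__(self, key, value)
        self._counter[key] += 1

    def popitem(self):
        # Avoid Counter.most_common(), which sorts all items;
        # a single min() over the counts is sufficient.
        key = min(self._counter, key=self._counter.__getitem__)
        value = dict.pop(self, key)
        del self._counter[key]
        return key, value
```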
Since the `Cache.getsizeof()` method is public, users may call it, passing a cache element. For some derived caches, especially `LFUCache`, `LRUCache` and `TTLCache`, `getsizeof()` does not take a user-level cache element as its argument, but a `(value, link)` or `(value, counter)` tuple.
Less abstract, and factorial actually benefits from memoization.
Should use the same wording as documentation/index.rst for the introductory section.
The extra import probably outweighs any gains; it also improves Python >= 3.3 `unittest` coverage.
Add an example showing how to use `getsizeof` with `sys.getsizeof()` and/or `len()` for sequences/strings.
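Such an example might look like this sketch (the `SizedCache` class is invented here for illustration):

```python
import sys

class SizedCache(dict):
    """Toy cache whose total size is the sum of getsizeof(value) per item.
    Overwriting an existing key is not handled in this sketch."""

    def __init__(self, maxsize, getsizeof=lambda value: 1):
        self._maxsize = maxsize
        self._getsizeof = getsizeof
        self.currsize = 0

    def __getitem__(self, key):
        return dict.__getitem__(self, key)[0]

    def __setitem__(self, key, value):
        size = self._getsizeof(value)       # computed once, at insertion
        while self and self.currsize + size > self._maxsize:
            _, (_, evicted) = self.popitem()
            self.currsize -= evicted
        dict.__setitem__(self, key, (value, size))  # size stored with value
        self.currsize += size
```

With `getsizeof=len` the budget counts characters of cached strings; `getsizeof=sys.getsizeof` would count the in-memory size of the Python objects instead.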
Since the hashes of tuples need to be recomputed each time, objects returned by `makekey` could compute the hash once and store it in a slot. See Python 3's `lru_cache` implementation for details (it uses a `list` subclass).
The base `Cache` class should be made public (again), so that individual caching strategies can be built on top of it. Core features:

- `getsizeof` should be a static method returning `1` by default
- `getsizeof` should be called only once, at item insertion; this probably means the result has to be stored alongside the item's value

When trying to use a cache on a function that accepts dict args, an error is raised because dict is not hashable.
>>> import cachetools
>>> @cachetools.ttl_cache(maxsize=1, ttl=1)
... def whatever(a, b):
...     pass
...
>>> whatever({1:"A"}, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cachetools/decorators.py", line 39, in wrapper
    result = cache[key]
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cachetools/ttlcache.py", line 69, in __getitem__
    link = cache_getitem(self, key)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cachetools/cache.py", line 39, in __getitem__
    return self.__mapping[key][0]
TypeError: unhashable type: 'dict'
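Until custom key functions are supported, one workaround is to build the cache key yourself and freeze unhashable arguments first; `freezekey` below is a hypothetical helper, not part of cachetools:

```python
def freezekey(*args, **kwargs):
    """Hypothetical key function: convert dicts/lists into hashable forms."""
    def freeze(obj):
        if isinstance(obj, dict):
            # a frozenset of (key, frozen value) pairs is hashable
            return frozenset((k, freeze(v)) for k, v in obj.items())
        if isinstance(obj, list):
            return tuple(freeze(v) for v in obj)
        return obj
    return tuple(freeze(a) for a in args) + tuple(
        (k, freeze(v)) for k, v in sorted(kwargs.items()))
```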
Would have caught 732d792.
If/when a generic `@cachedfunc` decorator gets added, the current function decorators should be moved to a submodule and documented as add-ons. Since we can't break the API before v2.0, they should still be available, i.e. imported by the main module, in v1.x.
`CacheInfo.currsize` as reported by `cache_info()` is set to `len(cache)`. For caches that use `getsizeof()` to compute their size, this should be changed to `cache.size`.
Similar to `TTLCache.ttl` and `TTLCache.timer`, the `choice` function used by an `RRCache` should be available as a read-only property.
Derived from `ValueError`, raised if `getsizeof(value) > maxsize`. May be caught in decorators to bypass caching. This way, `maxsize=0` may be supported, for improved `@lru_cache` compatibility.
The "missing" stuff is somewhat orthogonal to the rest of the library. Instead of providing this as a constructor argument, it should be made a class decorator, which could be applied to `dict` as well (details TBD). Cache classes would only have to provide basic `__missing__` support, with the default method raising a `KeyError`. Constructor arguments should be deprecated. The documentation should use an extra chapter, describing this as an "alternative way" of doing memoization.
Regarding performance, it might be desirable not to derive `TTLCache` from `LRUCache`, but to duplicate the LRU implementation.
The `CacheTestMixin` class works well for cache implementations, but the state of the decorator tests could be improved.
- Make the `__missing__` implementation more intuitive.
- In the `Cache` base class, separate the cached size from the value for similar reasons. If no `getsizeof` is given, this can be optimized away. Implementation details TBD.
- Perform some code cleanup, to bring this more in line with PEP 8; for example, add underscores to functions that should not be exported.
- Use default static or class methods that can be overridden per instance.

See https://github.com/tkem/cachetools/blob/master/cachetools/ttlcache.py#L45
At least with Python 2.7, this throws some other exception (can't remember which), as `key` cannot be formatted with `%r`. The problem is in `key[1]`, which is `()` without any kwargs or `('kwarg1', 'value1', ...)` with kwargs. I think the kwargs should go into `key[0]` along with the other args, and `key[1]` should be left empty?
Some `TTLCache` operations may behave erratically if a cache item expires in the middle of the operation. This especially concerns the "mixin" methods such as `get`, `pop` and `setdefault`, which have been changed for `__missing__` support to first check for item presence using `__contains__`.
If, for example, a cache item expires after `TTLCache.get` calls `__contains__`, but before `__getitem__` is called, an exception will be raised unexpectedly.
So all operations should behave as if the timer was frozen at the start of the outermost method call. Since passing optional `time` arguments to each and every method call is unwieldy, at least for "special" methods, other implementation strategies need to be evaluated.
Note this does not affect normal cache operations, i.e. `__getitem__`, `__setitem__` and `popitem`, and does not affect decorators.
Hi,
I feel what is missing from this library is an evict callback. When an item is evicted, maybe you want to do something with the item - for example persist it to storage somewhere because it might have changed during its lifetime in the cache.
Do you have something like this planned?
See #36 for use case.
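One possible shape for such a hook, as a sketch (the `on_evict` parameter name is invented here, not an existing cachetools API):

```python
class EvictCallbackCache(dict):
    """Toy cache calling a callback when an item is evicted to make room
    (explicit deletions deliberately do not trigger it)."""

    def __init__(self, maxsize, on_evict=None):
        self._maxsize = maxsize
        self._on_evict = on_evict

    def __setitem__(self, key, value):
        while len(self) >= self._maxsize and key not in self:
            evicted_key, evicted_value = self.popitem()
            if self._on_evict is not None:
                # e.g. persist the possibly-modified item to storage
                self._on_evict(evicted_key, evicted_value)
        dict.__setitem__(self, key, value)
```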
- `typed`: `True` and `False` should be accepted as magic values for backwards compatibility, but eventually get deprecated.
- `key` in method wrapper.

Hi,
Thanks for your cache implementation. I was hitting the "maximum recursion limit" when pickling a large LRU cache. I think this is due to the recursive nature of the doubly linked list you use for ordering the values. My workaround was to pickle only this list using a loop, instead of the list+dict (this also saves space).
The code below was added to lru.py to fix the problem. I'm posting it here in case you're interested.
Thanks!
Marco
def __getstate__(self):
    # Walk the doubly linked list iteratively, so pickling a large
    # cache does not hit the maximum recursion depth.
    c = self.__root
    l = []
    while c.prev is not self.__root:
        l.append((c.prev.key, c.prev.value))
        c = c.prev
    state = {
        "currsize": self.currsize,
        "maxsize": self.maxsize,
        "ordering": l,
    }
    # hasattr(self, "__missing") would not apply name mangling,
    # so access the attributes directly instead.
    try:
        state["missing"] = self.__missing
    except AttributeError:
        state["missing"] = None
    try:
        state["getsizeof"] = self.__getsizeof
    except AttributeError:
        state["getsizeof"] = None
    return state

def __setstate__(self, state):
    self.__init__(state["maxsize"], state["missing"], state["getsizeof"])
    for key, value in reversed(state["ordering"]):
        self[key] = value
Downloading cachetools-0.7.0.tar.gz
  Running setup.py egg_info for package cachetools
    warning: no files found matching 'Changes'
It seems that currently you can only pop individual items by a key.
It also looks like the `cache` object isn't saved onto the wrapped function by the `cachedfunc` decorator; that is done in `cachedmethod` only.
Derived from `KeyError`.
Deleting items in a normally non-mutating method may lead to unexpected behavior, e.g. a RuntimeError when iterating over the cache elements. By simply raising a KeyError for expired elements and leaving the underlying dict as is, this can be avoided. Expired items should be removed on the next mutating operation.
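The lazy-expiry strategy can be sketched like this (a toy mapping, not the actual `TTLCache` code):

```python
import time

class LazyTTL(dict):
    """Expired items raise KeyError on access; actual removal is deferred
    to the next mutating operation, so iteration never mutates the dict."""

    def __init__(self, ttl, timer=time.monotonic):
        self._ttl = ttl
        self._timer = timer

    def __getitem__(self, key):
        value, expires = dict.__getitem__(self, key)
        if expires <= self._timer():
            raise KeyError(key)   # report as missing, but don't delete here
        return value

    def __setitem__(self, key, value):
        self._expire()            # mutating operation: sweep expired items
        dict.__setitem__(self, key, (value, self._timer() + self._ttl))

    def _expire(self):
        now = self._timer()
        for k in [k for k, (_, exp) in list(dict.items(self)) if exp <= now]:
            dict.__delitem__(self, k)
```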
A generic `@cachedfunc` decorator should work with any mutable mapping, similar to `@cachedmethod`. One issue is with `cache_info()` support for `currsize` and `maxsize` (which `dict` does not have), but this might be handled with optional "getter" parameters, e.g.

    def cachedfunc(cache_factory, typed=False, currsize=len, maxsize=None, lock=None)

Note that to be generally usable as a decorator, the first argument should be a factory function, which could be used like

    @cachedfunc(dict)
    def myfun(arg1, arg2)

In the `@lru_cache` compatible decorators, this could be passed a lambda, e.g.

    def lru_cache(maxsize=128, typed=False, getsizeof=None, lock=RLock):
        return cachedfunc(lambda: LRUCache(maxsize, getsizeof),
                          typed,
                          currsize=operator.attrgetter('currsize'),
                          maxsize=operator.attrgetter('maxsize'),
                          lock=lock)
Alternatively, an `info()` method could be added to the `Cache` base class, tracking cache hits and misses, but this would probably conflict with how cache implementations use the base class's `__getitem__`. For example, `TTLCache` and `LRUCache` call `Cache.__getitem__` during `__setitem__`, which would increment cache hits/misses.
On the other hand, providing `currsize` and `maxsize` for `@cachedfunc` might call for similar support in `@cachedmethod`, which has to be evaluated separately.
If the `cache` method provided to the `@cachedmethod` decorator returns `None`, the wrapped method should be called directly. Also consider making the `cache` parameter a property of the wrapper.
Having exceptions as nested classes looks somewhat unpythonic...
`cachetools` now features much more than cache classes, so the introductory `pycon` samples should also show use of the decorators, and maybe custom keys.
Similar to the other function decorators, a `@ttl_cache` decorator should be added. `ttl` can have an arbitrary default value; `600` (10 minutes) looks good to me.
I.e. allow `lock=None` for all decorators. Keep the default for Python 3 `lru_cache` compatibility.
- Add a `__missing__` method to the `Cache` base class, which will be called from `__getitem__` if `key` cannot be found. Make sure that `__missing__` behaves as specified in `collections` and elsewhere, e.g. is only called from `__getitem__` and not from `get`. This may involve providing extra methods in the base class, and possibly derived classes, e.g. `__contains__`.
- `Cache.__missing__` raises `KeyError`.
- Add a `missing` keyword argument to the `Cache` constructor. If provided, `missing(key)` will be called from `Cache.__missing__`, and the return value will be added to the cache (respecting `maxsize` and possibly discarding items).
- Derived classes should support the `missing` keyword argument. `TTLCache` will probably need to be modified the most; maybe need to provide `ExpiredError` support in the `Cache` base class.
- Use `missing` instead of `__setitem__`.
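The first three points can be sketched with a plain `dict` subclass (`MissingCache` is illustrative only, not the proposed implementation):

```python
class MissingCache(dict):
    """Sketch of __missing__ support with an optional missing(key) callback."""

    def __init__(self, maxsize, missing=None):
        self._maxsize = maxsize
        self._missing = missing

    def __missing__(self, key):
        # dict invokes __missing__ from __getitem__ only, never from get()
        if self._missing is None:
            raise KeyError(key)
        value = self._missing(key)
        if len(self) < self._maxsize:   # respect maxsize (eviction elided)
            self[key] = value
        return value
```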