tkem / cachetools
Extensible memoizing collections and decorators
License: MIT License
The main advantage of `@cachedmethod` over the function decorators is that cache properties can be set at runtime. The example class should therefore have a constructor with a `cachesize` parameter. This may also avoid questions regarding locking, though these should be addressed at some point.
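The runtime-configuration idea can be sketched with a stripped-down, stdlib-only stand-in for `@cachedmethod` (the decorator body and the plain-`dict` cache below are illustrative, not the cachetools implementation):

```python
import operator

def cachedmethod(cache):
    # Minimal sketch: look up the cache on the instance at call time,
    # so cache properties can be chosen per instance, at runtime.
    def decorator(method):
        def wrapper(self, *args):
            c = cache(self)
            key = (method.__name__,) + args
            if key in c:
                return c[key]
            result = method(self, *args)
            c[key] = result
            return result
        return wrapper
    return decorator

class Example:
    def __init__(self, cachesize=128):
        # the cache is configured in the constructor, per instance
        self.cachesize = cachesize
        self.cache = {}  # stand-in for e.g. an LRU cache of that size

    @cachedmethod(operator.attrgetter('cache'))
    def compute(self, n):
        return n * n
```

Since the cache is fetched through a callable at every invocation, two instances can use entirely different cache objects, which is exactly what the function decorators cannot offer.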
It may be confusing if expired items show up when iterating over `TTLCache.items()`, or in its string representation. Therefore, a custom `TTLCache.__iter__` method should be defined, which fast-forwards over expired items on start.
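A fast-forwarding `__iter__` might look like this toy TTL mapping (names and internal layout are illustrative only):

```python
import time

class TinyTTL:
    """Toy TTL mapping whose __iter__ skips expired items."""

    def __init__(self, ttl, timer=time.monotonic):
        self._ttl = ttl
        self._timer = timer
        self._data = {}  # key -> (value, expiry time)

    def __setitem__(self, key, value):
        self._data[key] = (value, self._timer() + self._ttl)

    def __iter__(self):
        # fast-forward over expired items instead of exposing them
        now = self._timer()
        return iter([k for k, (_, exp) in self._data.items() if exp > now])
```

Note that iteration only hides expired items; it does not delete them, so iterating never mutates the underlying dict.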
TBD: `TTLCache.currsize` (maybe ignore for now).

The `Changes` file is already - almost - in RST format, so it should be renamed to `CHANGES.rst`, and the link in `README.rst` should be updated accordingly.
Making the `choice` function configurable may provide additional use cases for `RRCache`.
In the `cachedfunc` implementation, you are providing a `lock` argument; why not in `cachedmethod`? It seems that multiple threads could be sharing the same instance of an object and would also require a locking facility. Am I wrong?
BTW, shouldn't the locking mechanism be implemented in the `Cache` objects? (In that case, there would be no need for this argument in either `cachedfunc` or `cachedmethod`.)
Calling `cache.clear()` must be locked in `cache_clear()`. We might get away with unlocked access to `cache.size` and `cache.maxsize` in `cache_info()`, but proper locking should be added in case these are implemented as properties.
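The intended locking discipline could look like this sketch (a hypothetical wrapper, not the actual decorator code):

```python
from threading import RLock

class LockedCacheWrapper:
    """Sketch: mutating operations hold a lock; plain stat reads may not."""

    def __init__(self, cache):
        self._cache = cache
        self._lock = RLock()
        self.hits = self.misses = 0

    def cache_clear(self):
        # cache.clear() mutates shared state, so it must be locked
        with self._lock:
            self._cache.clear()
            self.hits = self.misses = 0

    def cache_info(self):
        # unlocked reads are tolerable for plain attributes, but would
        # need the lock if size/maxsize were implemented as properties
        return (self.hits, self.misses, len(self._cache))
```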
Delayed until more hands-on experience using this is available.
Allows a per-item TTL, but for simplicity the TTL of an existing item cannot be changed.
Travis allows testing with 3.5-nightly.
This module needs a cache that supports per-item time-to-live values.
`@functools.lru_cache` clears the hits and misses counts in `cache_clear()`. The Python 3 style decorators should do the same.
For backward compatibility, `@cachedmethod` adds the method to the key function's arguments, to make it easier to use with shared per-instance caches. This is deprecated in favor of custom key functions, and should be documented.
Use a `collections.Counter` to keep track of usage counts. Though not the fastest way possible, `Counter` has some features that should make the implementation quite simple. This may also serve as an example/blueprint for other cache implementors. Do not use the documented method for getting the least common elements, though, since this will perform a full item sort; use a simple `min` instead.
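The approach can be sketched as follows (a toy `TinyLFU`, not the cachetools class):

```python
import collections

class TinyLFU(dict):
    """Toy LFU cache keeping usage counts in a collections.Counter."""

    def __init__(self, maxsize):
        self._maxsize = maxsize
        self._counter = collections.Counter()

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        self._counter[key] += 1
        return value

    def __setitem__(self, key, value):
        while len(self) >= self._maxsize and key not in self:
            self.popitem()
        dict.__setitem__(self, key, value)
        self._counter[key] += 1

    def popitem(self):
        # Avoid Counter.most_common(), which sorts all items;
        # a single min() over the counts is sufficient.
        key = min(self._counter, key=self._counter.__getitem__)
        value = dict.pop(self, key)
        del self._counter[key]
        return key, value
```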
Since the `Cache.getsizeof()` method is public, users may call it, passing a cache element. For some derived caches, especially `LFUCache`, `LRUCache` and `TTLCache`, `getsizeof()` does not take a user-level cache element as its argument, but a `(value, link)` or `(value, counter)` tuple.
Less abstract, and factorial actually benefits from memoization.
Should use the same wording as documentation/index.rst for the introductory section.
The extra import probably outweighs any gains; it also improves Python >= 3.3 `unittest` coverage.
Add an example showing how to use `getsizeof` with `sys.getsizeof()` and/or `len()` for sequences/strings.
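Such an example might look like this sketch (the `SizedCache` class is invented here for illustration):

```python
import sys

class SizedCache(dict):
    """Toy cache whose total size is the sum of getsizeof(value) per item.
    Overwriting an existing key is not handled in this sketch."""

    def __init__(self, maxsize, getsizeof=lambda value: 1):
        self._maxsize = maxsize
        self._getsizeof = getsizeof
        self.currsize = 0

    def __getitem__(self, key):
        return dict.__getitem__(self, key)[0]

    def __setitem__(self, key, value):
        size = self._getsizeof(value)       # computed once, at insertion
        while self and self.currsize + size > self._maxsize:
            _, (_, evicted) = self.popitem()
            self.currsize -= evicted
        dict.__setitem__(self, key, (value, size))  # size stored with value
        self.currsize += size
```

With `getsizeof=len` the budget counts characters of cached strings; `getsizeof=sys.getsizeof` would count the in-memory size of the Python objects instead.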
Since the hashes of tuples need to be recomputed each time, objects returned by `makekey` could compute the hash once and store it in a slot. See Python 3's `lru_cache` implementation for details (it uses a `list` subclass).
The base `Cache` class should be made public (again), so that individual caching strategies can be built on top of it. Core features:

- `getsizeof` should be a static method returning `1` by default
- `getsizeof` should be called only once, at item insertion; this probably means the result has to be stored alongside the item's value

When trying to use a cache on a function that accepts dict args, an error is raised because dict is not hashable.
>>> import cachetools
>>> @cachetools.ttl_cache(maxsize=1, ttl=1)
... def whatever(a, b):
...     pass
...
>>> whatever({1:"A"}, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cachetools/decorators.py", line 39, in wrapper
    result = cache[key]
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cachetools/ttlcache.py", line 69, in __getitem__
    link = cache_getitem(self, key)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cachetools/cache.py", line 39, in __getitem__
    return self.__mapping[key][0]
TypeError: unhashable type: 'dict'
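Until custom key functions are supported, one workaround is to build the cache key yourself and freeze unhashable arguments first; `freezekey` below is a hypothetical helper, not part of cachetools:

```python
def freezekey(*args, **kwargs):
    """Hypothetical key function: convert dicts/lists into hashable forms."""
    def freeze(obj):
        if isinstance(obj, dict):
            # a frozenset of (key, frozen value) pairs is hashable
            return frozenset((k, freeze(v)) for k, v in obj.items())
        if isinstance(obj, list):
            return tuple(freeze(v) for v in obj)
        return obj
    return tuple(freeze(a) for a in args) + tuple(
        (k, freeze(v)) for k, v in sorted(kwargs.items()))
```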
Would have caught 732d792.
If/when a generic `@cachedfunc` decorator gets added, the current function decorators should be moved to a submodule and documented as add-ons. Since we can't break the API before v2.0, they should still be available, i.e. imported by the main module, in v1.x.
`CacheInfo.currsize` as reported by `cache_info()` is set to `len(cache)`. For caches that use `getsizeof()` to compute their size, this should be changed to `cache.size`.
Similar to `TTLCache.ttl` and `TTLCache.timer`, the `choice` function used by an `RRCache` should be available as a read-only property.
Derived from `ValueError`, raised if `getsizeof(value) > maxsize`. May be caught in decorators to bypass caching. This way, `maxsize=0` may be supported, for improved `@lru_cache` compatibility.
The "missing" stuff is somewhat orthogonal to the rest of the library. Instead of providing this as a constructor argument, it should be made a class decorator, which could be applied to `dict` as well (details TBD). Cache classes would only have to provide basic `__missing__` support, with the default method raising a `KeyError`. Constructor arguments should be deprecated. The documentation should use an extra chapter, describing this as an "alternative way" of doing memoization.
Regarding performance, it might be desirable not to derive `TTLCache` from `LRUCache`, but to duplicate the LRU implementation.
The `CacheTestMixin` class works well for cache implementations, but the state of the decorator tests could be improved.
- Make the `__missing__` implementation more intuitive.
- In the `Cache` base class, separate the cached size from the value for similar reasons. If no `getsizeof` is given, this can be optimized away. Implementation details TBD.
- Perform some code cleanup, to bring this more in line with PEP 8; for example, add underscores to functions that should not be exported.
- Use default static or class methods that can be overridden per instance.

See https://github.com/tkem/cachetools/blob/master/cachetools/ttlcache.py#L45
At least with Python 2.7, this throws some other exception (can't remember which), as `key` cannot be formatted with `%r`. The problem is in `key[1]`, which is `()` without any kwargs or `('kwarg1', 'value1', ...)` with kwargs. I think the kwargs should go into `key[0]` along with the other args, and `key[1]` should be left empty?
Some `TTLCache` operations may behave erratically if a cache item expires in the middle of the operation. This especially concerns the "mixin" methods such as `get`, `pop` and `setdefault`, which have been changed for `__missing__` support to first check for item presence using `__contains__`.
If, for example, a cache item expires after `TTLCache.get` calls `__contains__`, but before `__getitem__` is called, an exception will be raised unexpectedly.
So all operations should behave as if the timer was frozen at the start of the outermost method call. Since passing optional `time` arguments to each and every method call is unwieldy, at least for "special" methods, other implementation strategies need to be evaluated.
Note this does not affect normal cache operations, i.e. `__getitem__`, `__setitem__` and `popitem`, and does not affect decorators.
Hi,
I feel what is missing from this library is an evict callback. When an item is evicted, maybe you want to do something with the item - for example persist it to storage somewhere because it might have changed during its lifetime in the cache.
Do you have something like this planned?
See #36 for use case.
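One possible shape for such a hook, as a sketch (the `on_evict` parameter name is invented here, not an existing cachetools API):

```python
class EvictCallbackCache(dict):
    """Toy cache calling a callback when an item is evicted to make room
    (explicit deletions deliberately do not trigger it)."""

    def __init__(self, maxsize, on_evict=None):
        self._maxsize = maxsize
        self._on_evict = on_evict

    def __setitem__(self, key, value):
        while len(self) >= self._maxsize and key not in self:
            evicted_key, evicted_value = self.popitem()
            if self._on_evict is not None:
                # e.g. persist the possibly-modified item to storage
                self._on_evict(evicted_key, evicted_value)
        dict.__setitem__(self, key, value)
```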
- `typed`: `True` and `False` should be accepted as magic values for backwards compatibility, but eventually get deprecated.
- `key` in method wrapper.

Hi,
Thanks for your cache implementation. I was hitting the "maximum recursion limit" when pickling a large LRU cache. I think this is due to the recursive nature of the doubly linked list you use for ordering the values. My workaround was to pickle only this list using a loop, instead of the list+dict (this also saves space).
The code below was added to lru.py to fix the problem. I'm posting it here in case you're interested.
Thanks!
Marco
def __getstate__(self):
    # Walk the doubly linked list iteratively, so pickling a large
    # cache does not hit the maximum recursion depth.
    c = self.__root
    l = []
    while c.prev is not self.__root:
        l.append((c.prev.key, c.prev.value))
        c = c.prev
    state = {
        "currsize": self.currsize,
        "maxsize": self.maxsize,
        "ordering": l,
    }
    # hasattr(self, "__missing") would not apply name mangling,
    # so access the attributes directly instead.
    try:
        state["missing"] = self.__missing
    except AttributeError:
        state["missing"] = None
    try:
        state["getsizeof"] = self.__getsizeof
    except AttributeError:
        state["getsizeof"] = None
    return state

def __setstate__(self, state):
    self.__init__(state["maxsize"], state["missing"], state["getsizeof"])
    for key, value in reversed(state["ordering"]):
        self[key] = value
Downloading cachetools-0.7.0.tar.gz
  Running setup.py egg_info for package cachetools
    warning: no files found matching 'Changes'
It seems that currently you can only pop individual items by a key.
It also looks like the `cache` object isn't saved onto the wrapped function by the `cachedfunc` decorator; that is done in `cachedmethod` only.
Derived from `KeyError`.
Deleting items in a normally non-mutating method may lead to unexpected behavior, e.g. a RuntimeError when iterating over the cache elements. By simply raising a KeyError for expired elements and leaving the underlying dict as is, this can be avoided. Expired items should be removed on the next mutating operation.
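The lazy-expiry strategy can be sketched like this (a toy mapping, not the actual `TTLCache` code):

```python
import time

class LazyTTL(dict):
    """Expired items raise KeyError on access; actual removal is deferred
    to the next mutating operation, so iteration never mutates the dict."""

    def __init__(self, ttl, timer=time.monotonic):
        self._ttl = ttl
        self._timer = timer

    def __getitem__(self, key):
        value, expires = dict.__getitem__(self, key)
        if expires <= self._timer():
            raise KeyError(key)   # report as missing, but don't delete here
        return value

    def __setitem__(self, key, value):
        self._expire()            # mutating operation: sweep expired items
        dict.__setitem__(self, key, (value, self._timer() + self._ttl))

    def _expire(self):
        now = self._timer()
        for k in [k for k, (_, exp) in list(dict.items(self)) if exp <= now]:
            dict.__delitem__(self, k)
```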
A generic `@cachedfunc` decorator should work with any mutable mapping, similar to `@cachedmethod`. One issue is with `cache_info()` support for `currsize` and `maxsize` (which `dict` does not have), but this might be handled with optional "getter" parameters, e.g.

    def cachedfunc(cache_factory, typed=False, currsize=len, maxsize=None, lock=None)

Note that to be generally usable as a decorator, the first argument should be a factory function, which could be used like

    @cachedfunc(dict)
    def myfun(arg1, arg2)

In the `@lru_cache` compatible decorators, this could be passed a lambda, e.g.

    def lru_cache(maxsize=128, typed=False, getsizeof=None, lock=RLock):
        return cachedfunc(lambda: LRUCache(maxsize, getsizeof),
                          typed,
                          currsize=operator.attrgetter('currsize'),
                          maxsize=operator.attrgetter('maxsize'),
                          lock=lock)
Alternatively, an `info()` method could be added to the `Cache` base class, tracking cache hits and misses, but this would probably conflict with how cache implementations use the base class's `__getitem__`. For example, `TTLCache` and `LRUCache` call `Cache.__getitem__` during `__setitem__`, which would increment cache hits/misses.
On the other hand, providing `currsize` and `maxsize` for `@cachedfunc` might call for similar support in `@cachedmethod`, which has to be evaluated separately.
If the `cache` method provided to the `@cachedmethod` decorator returns `None`, the wrapped method should be called directly. Also consider making the `cache` parameter a property of the wrapper.
Having exceptions as nested classes looks somewhat unpythonic...
`cachetools` now features much more than cache classes, so the introductory `pycon` samples should also show use of the decorators, and maybe custom keys.
Similar to the other function decorators, a `@ttl_cache` decorator should be added. `ttl` can have an arbitrary default value; `600` (10 minutes) looks good to me.
I.e. allow `lock=None` for all decorators. Keep the default for Python 3 `lru_cache` compatibility.
- Add a `__missing__` method to the `Cache` base class, which will be called from `__getitem__` if `key` cannot be found. Make sure that `__missing__` behaves as specified in `collections` and elsewhere, e.g. is only called from `__getitem__` and not from `get`. This may involve providing extra methods in the base class, and possibly derived classes, e.g. `__contains__`.
- `Cache.__missing__` raises `KeyError`.
- Add a `missing` keyword argument to the `Cache` constructor. If provided, `missing(key)` will be called from `Cache.__missing__`, and the return value will be added to the cache (respecting `maxsize` and possibly discarding items).
- Derived classes should support the `missing` keyword argument. `TTLCache` will probably need to be modified the most; maybe need to provide `ExpiredError` support in the `Cache` base class.
- Use `missing` instead of `__setitem__`.
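The first three points can be sketched with a plain `dict` subclass (`MissingCache` is illustrative only, not the proposed implementation):

```python
class MissingCache(dict):
    """Sketch of __missing__ support with an optional missing(key) callback."""

    def __init__(self, maxsize, missing=None):
        self._maxsize = maxsize
        self._missing = missing

    def __missing__(self, key):
        # dict invokes __missing__ from __getitem__ only, never from get()
        if self._missing is None:
            raise KeyError(key)
        value = self._missing(key)
        if len(self) < self._maxsize:   # respect maxsize (eviction elided)
            self[key] = value
        return value
```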