dgilland / cacheout Goto Github PK
View Code? Open in Web Editor NEWA caching library for Python
Home Page: https://cacheout.readthedocs.io
License: MIT License
A caching library for Python
Home Page: https://cacheout.readthedocs.io
License: MIT License
Can this library be used in Tornado or sanic ? Will it be safe?
@cache.memoize(ttl=5, typed=True)
- default time units are never explained. So I need to look all over the code to figure out what 5
means.
cache = Cache(maxsize=256, ttl=0, timer=time.time, default=None) # defaults
- timer is callable I suppose. What is the specification of the callable? What should it return?
It's a great project. In fact this is the first python cache where I see a human operatable cache.set
method instead of intricate annotated set mechanisms (which is present as well). But ttl is my second question and it is almost not described in the documentation.
Might be fixed by 937d2df, but thought I'd report just in case. Follow on issue from #4.
In [1]: import cacheout
In [2]: c = cacheout.LRUCache()
In [3]: c.add('foo', 'bar')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-3-09536a570d71> in <module>()
----> 1 c.add('foo', 'bar')
~/anaconda3/lib/python3.6/site-packages/cacheout/cache.py in add(self, key, value, ttl)
271 """
272 with self._lock:
--> 273 self._add(key, value, ttl=ttl)
274
275 def _add(self, key, value, ttl=None):
~/anaconda3/lib/python3.6/site-packages/cacheout/cache.py in _add(self, key, value, ttl)
274
275 def _add(self, key, value, ttl=None):
--> 276 if self._has(key):
277 return
278 self._set(key, value, ttl=ttl)
~/anaconda3/lib/python3.6/site-packages/cacheout/cache.py in _has(self, key)
181 def _has(self, key):
182 # Use get method since it will take care of evicting expired keys.
--> 183 return self._get(key, default=_NOTSET) is not _NOTSET
184
185 def size(self):
~/anaconda3/lib/python3.6/site-packages/cacheout/lru.py in _get(self, key, default)
14 def _get(self, key, default=None):
15 value = super()._get(key, default=default)
---> 16 self._cache.move_to_end(key)
17 return value
KeyError: 'foo'
I'm going to write the cache statistics module. Besides cache hits/misses and cache frequency, are there any statistics that need to be recorded?
I don't understand what cache frequency means. Does it mean the number of times a cache entry was accessed?
To-Do List:
I think the policy in the link should be "least-frequently-used eviction policy", not "least-recently-used eviction policy".
Line 11 in 1f4b78c
I think on_get
callback may just be simple like this:
cache = Cache(on_get=on_get)
The callback function.
Callable[[key: Hashable, value: Any], None]
In fact, I've hardly ever seen a on-get listener in cache libraries. Do we really need it? 🤷
This is the code I'm using python=3.6
from cacheout import Cache
import time
import os
import re
TIMED_WINDOW = os.getenv('TIMED_WINDOW', 3)
MAX_EVENTS = 256
c = Cache(maxsize=MAX_EVENTS, ttl=TIMED_WINDOW, timer=time.time)
c.set('v1', 1)
c.set('v2', 2)
c.set('v3', 3)
time.sleep(1)
print(c.get_many(re.compile(r"v.*")))
print(c.get_many(re.compile(r"a.*")))
c.set('a1', 1)
c.set('a2', 2)
c.set('a3', 3)
time.sleep(1)
print(c.get_many(re.compile(r"v.*")))
print(c.get_many(re.compile(r"a.*")))
time.sleep(1)
print(c.get_many(re.compile(r"v.*")))
print(c.get_many(re.compile(r"a.*")))
time.sleep(1)
print(c.get_many(re.compile(r"v.*")))
print(c.get_many(re.compile(r"a.*")))
time.sleep(1)
print(c.get_many(re.compile(r"v.*")))
print(c.get_many(re.compile(r"a.*")))
time.sleep(1)
print(c.get_many(re.compile(r"v.*")))
print(c.get_many(re.compile(r"a.*")))
Traceback (most recent call last):
File "windowed.py", line 24, in <module>
print(c.get_many(re.compile(r"v.*")))
File "C:\tools\Anaconda3\envs\py36\lib\site-packages\cacheout\cache.py", line 248, in get_many
return {key: self.get(key, default=default) for key in self._filter(iteratee)}
File "C:\tools\Anaconda3\envs\py36\lib\site-packages\cacheout\cache.py", line 248, in <dictcomp>
return {key: self.get(key, default=default) for key in self._filter(iteratee)}
File "C:\tools\Anaconda3\envs\py36\lib\site-packages\cacheout\cache.py", line 500, in _filter
yield from filter(filter_by, target)
RuntimeError: OrderedDict mutated during iteration
I think that when it try to get many it is removing the cache by ttl rule.
First of all, on-set callbcak and RemovalCause.SET
coincide.
We can replace RemovalCause.SET
with on-set callbcak.
cache = Cache(on_set=on_set)
Callable[[key: Hashable, new_value: Any, old_value: Any], None]
When are you planning to add support for layered caching? Has there already been put effort into? I'd be interested in "guided" contribution 😏
just a thought :)
Maybe I'm not understanding the semantics, but this is not what I would expect:
$ ipython
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import cacheout
In [2]: cacheout.__version__
Out[2]: '0.10.1'
In [3]: c = cacheout.LRUCache()
In [4]: c.get('foo', default='bar')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-4-09ce1ce78da4> in <module>()
----> 1 c.get('foo', default='bar')
~/anaconda3/lib/python3.6/site-packages/cacheout/cache.py in get(self, key, default)
214 """
215 with self._lock:
--> 216 return self._get(key, default=default)
217
218 def _get(self, key, default=None):
~/anaconda3/lib/python3.6/site-packages/cacheout/lru.py in _get(self, key, default)
14 def _get(self, key, default=None):
15 value = super()._get(key, default=default)
---> 16 self._cache.move_to_end(key)
17 return value
KeyError: 'foo'
Looking at the traceback, it appears that the move_to_end() call shouldn't occur if the default value was returned, but there's no way to know whether the value returned is the default or just happened to exist in the cache with the same value as the default. Maybe _get() needs to return an is_default bool?
Ubuntu 14.04
This lib looks pretty neat!
However, I found it because I was looking for persistent caching, so to disk.
Do you have any plans to implement this?
If you have a cache miss, and you want to call the default callable to get the right value for the cache, are the calls to that function (at least for a single key) single threaded? That is, if I have 100 calls all hit at once for the same key, but it's missing from the cache, is the callable only going to be called once?
This is a specific dimension of thread safety. The functions I want to use are thread-safe, but they're sometimes very expensive (several seconds). I don't want to startup 100 identical calls to replace the one cache value...
from cacheout.lfu import LFUCache
cache = LFUCache(maxsize=4, ttl=0, timer=time.time, default=None)
for i in range(4):
cache.add(1,1)
cache.add(2,1)
for i in range(3):
cache.add(3,1)
cache.add(4,1)
cache.add(5,1)
cache.add(6,1)
cache.keys()
output should be "odict_keys([1, 2, 4, 6])
"
but is instead "odict_keys([1, 2, 3, 4, 6])
"
After a cached expires (with finite ttl) the in
operator still returns True
(that the item is in the cache) until the get()
method is called and corrects the internal state. It should be consistent with get()
and return False
right after expiration happens. Observed in the latest cachout==0.11.2.
It seems that the __contains__()
method is not checking self.expired(key)
, in contrast to _get()
which does. That would be the cause.
In addition there's a method has()
which internally calls get()
and checks expiration and distinguishes non-presence from None
value. Does it mean that not checking expiration in __contains__
is intentional?
import cacheout
# works for all implementations (descendants) the Cache class
cache = cacheout.Cache(ttl=1)
Only calls to the in
operator. After expiration it returns the wrong value.
cache.set('foo', 'bar')
for i in range(3):
print('foo' in cache)
time.sleep(1.5)
# True
# True # <-- wrong
# True # <-- wrong
Call the get()
modifies the state and the in
operator returns correct value afterwards.
cache.set('foo', 'bar')
for i in range(3):
print('foo' in cache, cache.get('foo'))
time.sleep(1.5)
# True bar
# True None # <-- wrong
# False None
cache.set('foo', 'bar')
for i in range(3):
print(cache.get('foo'), 'foo' in cache)
time.sleep(1.5)
# bar True
# None False # OK
# None False
Is there any option to add an "eviction callback" when a key is deleted?
I personally looking for a cache/TTLdict structure that I can tell when keys are evicted from it (by timeout).
What I need is to add expire time for specific keys. Currently cacheout provides api "set" to create new entry or update existing entry. So a simple way is:
v = cache.get(k, None)
if v is not None:
cache.set(k, v, ttl=100)
else:
raise KeyError(k)
Will you consider to provide apis like "expire" in Redis to update expire time directly?
Hi,
I have used cacheout in my app, and it works like charm. but, later we have moved from threads to processes.
The problem is cacheout is thread-safe not process safe, so I want to share an instance of CacheManger between all processes.
is it a good idea?
Hope to add ARC algorithm, very support your work.
I have a flask
app based on gunicorn
. Can cacheout
share data among gunicorn
threads?
1、How to get a single key and how long it will expire
2、How to extend the expiration time of a single key
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.