Giter VIP home page Giter VIP logo

Comments (12)

linsomniac avatar linsomniac commented on August 12, 2024

I would support the idea of adding an option or method to allow disabling the key checking.

If you are very performance sensitive, you might want to consider testing the python-memcache2 module I'm currently developing on github. I'm expecting it will be faster, particularly for dealing with larger values retrieved, because it uses buffered reads instead of readline() (which reads a character at a time).

The API is still in flux, but the code is well tested, so it may be worth trying. But it is not a simple drop-in replacement.

from python-memcached.

benhoyt avatar benhoyt commented on August 12, 2024

Looks interesting, thanks. I was going to try porting my benchmark to it, however, my benchmark is heavy on set_multi and get_multi, which python-memcache2 doesn't seem to support, so it's almost certain to be slower for that reason (for my use case).

You mention readline() reads a character at a time, which I don't think is true, either for socket.makefile(), or for the home-made version of readline() that python-memcached uses, which buffers 4KB at a time.

from python-memcached.

linsomniac avatar linsomniac commented on August 12, 2024

Hmm, I could have sworn it was using the socket version. In that case, I can't explain why the new version is faster, but it would explain why it's only 10-30% faster. :-)

I do have a TODO to support getting multiple keys, but I'm not sure how to approach it, I hadn't looked at the existing code. It looks like the existing code hashes the keys to collect what servers they go to, then does the read/writes of the keys matching those servers. That's the approach I figured I'd have to take.

Part of why I made the MemcacheValue was because of my thinking about supporting getting multiple keys. With MemcacheValue you can tell what key a value is associated with, as opposed to returning a dictionary.

It is something that is on the list to support though.

from python-memcached.

benhoyt avatar benhoyt commented on August 12, 2024

Sounds good. I guess personally I'm not a big fan of the more complex string subclass rather than just return a str or dict of strs. In any case, I don't want to sidetrack the discussion...

So how about adding check_keys=True to the memcache.Client constructor so you can turn that off, or at least check_control_chars=True so you can turn that expensive operation off?

from python-memcached.

linsomniac avatar linsomniac commented on August 12, 2024

I'm still in the experimentation stage, and so I'm trying things. I figured "get" with CAS set needed a way to return the additional CAS data, which is why I augmented the string return type. However, for the lower level code, that's probably not very appropriate. Maybe I should add a "get_cas" and a "get_multi" or something, and have them return dictionaries or lists of dictionaries. And I guess a "get_cas_multi"? Then a higher level interface can provide the MemcacheValue return type?

I DO kind of like the new ability to do:

value = mc.get('key', get_cas=True)
value.set('new value')

To keep the CAS around and use it for the update, so I like the code pattern that results from MemcacheValue.

I'm quite interested in the feedback and discussion about the current interface.

Anyway, yes, I could see "get" growing a check_keys=True attribute to disable the checking of the keys.

from python-memcached.

linsomniac avatar linsomniac commented on August 12, 2024

What about changing the loop in check_key to:

for char_value in map(ord, key):
     if char_value <33 or char_value == 127:

Or:

reduce(lambda result,x: (result or x<33 or x==127), map(ord, key), False)

from python-memcached.

linsomniac avatar linsomniac commented on August 12, 2024

I've pushed a commit that implements check_key, and it passes the limited tests in memcache.py, but I'd like more review and testing of it. I also added the simplest of the performance improvement patches that I suggested above.

I'd love to push this out with the release I'm working on making in the next day or so, but I understand if you can't look at it that quickly.

3d9c130

from python-memcached.

benhoyt avatar benhoyt commented on August 12, 2024

These things need to be profiled, and I'm afraid the the gain going from the original to the map(ord, key) version is very minimal. What you need to do to really speed this up is to move the character-by-character loop to C. You can do this either by 1) using str.translate() to remove all the control chars and then compare that the length is the same, by 2) using regular expressions to detect that all the chars in the key are valid. As the timings below show, using str.translate() is slightly faster.

The fastest loop-in-Python version is achieved by only calling ord(c) once on each character. But it's still about an order of magnitude slower than the translate or regex versions.

The version with reduce and the lambda (apart from being hard to understand), is the slowest, because it's calling a Python function (the lambda) every time around the loop, and function calls are relatively slow.

Here's the full list below, with performance number using the "timeit" module. I'm using keys of length 36, as this is about how long the keys are in our application. I'd definitely change check_key() to use str.translate() if I were you:

C:\>python -m timeit -s "key = 'abcdefghijklmnopqrstuvwxyz0123456789'"
    "reduce(lambda result,x: (result or x<33 or x==127), map(ord, key), False)"
100000 loops, best of 3: 8.75 usec per loop

C:\>python -m timeit -s "key = 'abcdefghijklmnopqrstuvwxyz0123456789'"
    "any(ord(c) < 33 or ord(c) == 127 for c in key)"
100000 loops, best of 3: 6.85 usec per loop

C:\>python -m timeit -s "key = 'abcdefghijklmnopqrstuvwxyz0123456789'"
    "for c in key:" " if ord(c) < 33 or ord(c) == 127: break"
100000 loops, best of 3: 4.98 usec per loop

C:\>python -m timeit -s "key = 'abcdefghijklmnopqrstuvwxyz0123456789'"
    "for c in map(ord, key):" " if c < 33 or c == 127: break"
100000 loops, best of 3: 4.83 usec per loop

C:\>python -m timeit -s "key = 'abcdefghijklmnopqrstuvwxyz0123456789'"
    "for c in key:" " d = ord(c)" " if d < 33 or d == 127: break"
100000 loops, best of 3: 4.1 usec per loop

C:\>python -m timeit -s "key = 'abcdefghijklmnopqrstuvwxyz0123456789'; import re; valid_re = re.compile('[\x21-\x7e\x80-\xff]+$')"
    "valid_re.match(key)"
1000000 loops, best of 3: 0.63 usec per loop

C:\>python -m timeit -s "key = 'abcdefghijklmnopqrstuvwxyz0123456789'; ctrls = ''.join(chr(i) for i in range(33)) + chr(127)"
    "len(key) != len(key.translate(None, ctrls))"
1000000 loops, best of 3: 0.549 usec per loop

from python-memcached.

benhoyt avatar benhoyt commented on August 12, 2024

I think what you've done with adding check_key to all the functions is okay. However, I think it'd be simpler for users if you just specified check_keys on the Client() constructor, because then performance-sensitive users could just set up their client with Client(servers, check_keys=False) and then use get/set/etc as normal -- makes for a simpler API.

from python-memcached.

linsomniac avatar linsomniac commented on August 12, 2024

Ok, I've pushed a new rev which has the Client change and uses the suggested translate. Can you please review?

from python-memcached.

benhoyt avatar benhoyt commented on August 12, 2024

That's great, thanks! The code looks good, and here's are my performance results (this is a benchmark that does get_multi/set_multi with 100 keys at a time, and then 1000 of those operations):

# original
Total time for sets: 2.820885 seconds
Total time for gets: 2.926937 seconds

# new check_key() with str.translate() and default of check_key=True
Total time for sets: 2.310891 seconds
Total time for gets: 2.276601 seconds

# new check_key() with str.translate() and check_key=False
Total time for sets: 2.008033 seconds
Total time for gets: 2.013389 seconds

So the str.translate() change is a significant improvement by itself (18% faster), and then with check_key=False it's a total of 29% faster in this benchmark.

Can I make one more minor request: that the Client parameter be called check_keys (plural), as it's saying "check all my keys" and not just "check my key"?

from python-memcached.

linsomniac avatar linsomniac commented on August 12, 2024

I've pushed out a 1.50 release that includes these changes.

from python-memcached.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.