bitmapist's Issues

A Compliment and Quandary

A co-worker of mine showed me this library. I think this is an excellent use of bitmaps! I think I see another interesting extension of this bitmapist library and I wanted to get your feedback.

I work for a company that needs an almost real-time recommendation engine, and I started thinking about applying bitmaps to collaborative filtering. Collaborative filtering needs a matrix of size [users * products].

The idea would be to run SETBIT users:20160614:1 [product_id] 1 for each product a user likes, so that each user key holds a bitmap of liked products. You would also need the inverse, SETBIT products:20160614:1 [user_id] 1, so that each product key indexes the users who like it.

Wait, I think I see a problem with this. How would we represent an empty cell in the matrix? A co-occurrence matrix has three states (Like, Dislike, Unknown). I guess you could combine bitmaps in some way, or maybe my bit operations are rusty.

The main benefit would be the storage savings. A quick calculation based on the traffic a site like ours sees in a day gives about 26.25 MB (700 products * 300k users, one bit per pair).
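The estimate can be checked with a few lines of arithmetic. This is only a sketch, assuming a dense matrix with one bit per (user, product) pair and decimal megabytes. The two-bitmap encoding for Like / Dislike / Unknown is a hypothetical option, not something the library provides.

```python
# Back-of-the-envelope storage estimate for a dense user x product bit matrix.
# Assumption: one bit per (user, product) pair, sizes in decimal megabytes.
PRODUCTS = 700
USERS = 300_000


def bitmap_megabytes(rows: int, cols: int, bitmaps: int = 1) -> float:
    """Return the size in MB of `bitmaps` dense bit matrices of rows x cols."""
    total_bits = rows * cols * bitmaps
    return total_bits / 8 / 1_000_000


one_state = bitmap_megabytes(USERS, PRODUCTS)
# Hypothetical encoding of Like / Dislike / Unknown: two bitmaps, where
# "unknown" is a cell that is set in neither of them.
three_state = bitmap_megabytes(USERS, PRODUCTS, bitmaps=2)

print(one_state)    # 26.25
print(three_state)  # 52.5
```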

I understand that you are probably not interested in having this be part of this repo. I'd mainly like your feedback and advice. Thanks!

Calling `delete_all_events` when db is empty throws exception

This happens when using bitmapist-server as the backend. Possibly with redis too.

def delete_all_events(system='default'):
    """
    Delete all events from the database.
    """
    cli = get_redis(system)
    keys = cli.keys('trackist_*')  # <- None
    if len(keys) > 0:
        cli.delete(*keys)

  File "../lib/python3.7/site-packages/bitmapist/__init__.py", line 272, in delete_all_events
    if len(keys) > 0:
TypeError: object of type 'NoneType' has no len()
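A defensive variant could treat a missing result as an empty list. The sketch below assumes the backend's keys() may return None, as reported above. FakeClient is a hypothetical stand-in for the real client, and this delete_all_events takes the client as an argument (instead of calling get_redis) only so the example is self-contained.

```python
class FakeClient:
    """Hypothetical stand-in for a Redis-like client whose keys() may return None."""

    def __init__(self, keys_result):
        self._keys_result = keys_result
        self.deleted = []

    def keys(self, pattern):
        return self._keys_result

    def delete(self, *keys):
        self.deleted.extend(keys)
        return len(keys)


def delete_all_events(cli):
    """Delete all events; tolerate a backend that returns None for "no keys"."""
    keys = cli.keys('trackist_*') or []  # None and [] both mean "nothing to delete"
    if keys:
        cli.delete(*keys)
    return len(keys)


print(delete_all_events(FakeClient(None)))                        # 0
print(delete_all_events(FakeClient(['trackist_active_2015-3'])))  # 1
```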

UniqueEvents query is very slow

I'm trying to store user likes in Redis, using one bitmap per question to record which users liked it. However, reading the ids back through UniqueEvents is very slow.

In [1]: from bitmapist import mark_unique
In [2]: mark_unique("question_likes:1234", 567463)
In [3]: mark_unique("question_likes:1234", 5637363)
In [4]: mark_unique("question_likes:1234", 7363)
In [5]: mark_unique("question_likes:1234", 731263)
In [6]: mark_unique("question_likes:1234", 731263)
In [7]: from bitmapist import UniqueEvents
In [38]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:uids = UniqueEvents('question_likes:1234').get_uuids()
:t = time()
:for i in uids:
:    print i
:print "elapsed time:", time() - t
:--
7363
567463
731263
5637363
elapsed time: 14.1893291473

I'm running the latest bitmapist on Redis server v=4.0.9 on Linux Mint.

Here is the DEBUG OBJECT output for the key:

127.0.0.1:6379> DEBUG OBJECT trackist_question_likes:1234_u
Value at:0x7f979b6a51e0 refcount:1 encoding:raw serializedlength:8036 lru:12900027 lru_seconds_idle:252

Consider removing "BitOpNot" to avoid misuse.

TL;DR

This is not really a "code bug", but rather a potential misuse issue with the "BitOpNot" operator. "BitOpNot" does exactly what it is supposed to do (flip the bits). However, since we use bitmapist as a stats tool, we expect it to give us the complement of a set within the whole population, which "BitOpNot" does not provide.

Example

Suppose you have a total of 100 users. You use bitmapist to mark active users with the event "active". Assume you have 5 active users and the bitmap data looks like this:

1011011

Now you want to count how many active users you have:

print len(MonthEvents('active', now.year, now.month))
# prints 5, the correct number

Now suppose you want to know how many inactive users you have:

print len(BitOpNot(MonthEvents('active', now.year, now.month)))
# prints 2

Here the negation of the "active user set" gives us a size of 2 instead of 95.

The fundamental problem is that the variable-length bitmap only contains information about who is active; it does not contain information about the population size.
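One way to get the expected count is to make the population explicit. The sketch below is plain Python, not bitmapist code: integers stand in for bitmaps, the user ids are made up to match the "5 active users" example, and a population of ids 0..99 is assumed.

```python
# Plain-Python illustration; integers stand in for Redis bitmaps.
POPULATION = 100                  # assumption: user ids 0..99 exist
active_ids = [0, 2, 3, 5, 6]      # five made-up active users

active = 0
for uid in active_ids:
    active |= 1 << uid            # set the bit for each active user


def count(bitmap: int) -> int:
    """Number of set bits."""
    return bin(bitmap).count("1")


# NOT restricted to the stored length (7 bits), as described in the issue.
stored_bits = active.bit_length()
naive_not = ~active & ((1 << stored_bits) - 1)

# NOT restricted to the known population instead.
everyone = (1 << POPULATION) - 1
inactive = everyone & ~active

print(count(active))     # 5
print(count(naive_not))  # 2
print(count(inactive))   # 95
```

With real bitmaps, the same idea would presumably mean keeping a bitmap of all known users and intersecting it with the negated set.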

should have a method to return an iterable of all set values

I use something like

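    def yield_values(self):
        cli = self.redis_client
        # GET returns the raw bitmap string, or None if the key does not exist
        s = cli.get(self.redis_key) or b''
        # bytearray yields integers on both Python 2 and Python 3
        for byte_index, c in enumerate(bytearray(s)):
            # bits are numbered from the most significant bit of each byte
            bits = format(c, '08b')
            for i, b in enumerate(bits):
                if b == '1':
                    # absolute offset, not the position within the byte
                    yield byte_index * 8 + i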

    def get_values(self):
        return list(self.yield_values())

on yipit's class based version

Total Events

Is there currently a way to determine the total number of occurrences for a given event?

Would there be a way to achieve this within the scope of Week|Month|YearEvents()?
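For context, a bitmap keeps one bit per user, so repeated marks for the same user collapse into a single bit. Counting occurrences would therefore need a separate counter. The sketch below is plain Python under that assumption: the set stands in for the bitmap, and the Counter is a hypothetical per-event counter, not part of bitmapist.

```python
from collections import Counter

marked_users = set()      # stands in for the bitmap: one bit per user
occurrences = Counter()   # hypothetical separate counter per event name


def mark(event_name, user_id):
    """Record the user once in the 'bitmap' and count every occurrence."""
    marked_users.add(user_id)
    occurrences[event_name] += 1


for uid in [7363, 731263, 731263, 567463]:
    mark("question_likes", uid)

print(len(marked_users))              # 3 unique users
print(occurrences["question_likes"])  # 4 total occurrences
```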

Behavior of NOT operator on empty bitmaps

Hello,

I noticed a little quirk on NOT operators for empty bitmaps.

Say for example that a bitmap represents a byte of 0's (0000 0000).
When this byte is negated, it should give back (1111 1111) or a value of 255.
Instead, I am getting back another 0.

On non-empty bitmaps, the NOT operator seems to work, but only up to the highest flagged bit.

Currently, if some bitmaps are empty, I have to manually flag/mark an event with a large dummy ID in order to make the NOT operator work.

Are there better ways of accomplishing this?
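The behaviour is consistent with negation being applied byte by byte over whatever is stored: an event that was never marked has no bytes at all, so there is nothing to flip. The sketch below simulates a bytewise NOT in plain Python. bitwise_not and popcount are hypothetical helpers, not Redis or bitmapist calls.

```python
def bitwise_not(data: bytes) -> bytes:
    """Flip every bit of every stored byte; the length never changes."""
    return bytes(~b & 0xFF for b in data)


def popcount(data: bytes) -> int:
    """Number of set bits across all bytes."""
    return sum(bin(b).count("1") for b in data)


never_marked = b""         # no key stored: zero bytes
one_zero_byte = b"\x00"    # a stored byte with all bits cleared

print(popcount(bitwise_not(never_marked)))   # 0
print(popcount(bitwise_not(one_zero_byte)))  # 8
print(bitwise_not(one_zero_byte)[0])         # 255
```

This also shows why marking a large dummy ID works as a workaround: it forces bytes to exist up to that offset.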

Solution for large ids by adding O(1) lookup

In your documentation it reads:

Using Redis bitmaps you can store events for millions of users in a very little amount of memory (megabytes). You should be careful about using huge ids (e.g. 2^32 or bigger) as this could require larger amounts of memory.

A potential solution is to create a hash table that keeps track of huge ids and maps them to smaller indexes.

For example, a user with an id of 192329230202 could be mapped to the smaller index 1 in the bitmap. This would require an O(1) lookup before a `SETBIT`, so it shouldn't affect time performance, but it would require more space on disk.

Steps to implement:

  1. A new user performs a signup event.
  2. Find the users_counter for the current day with GET "users_counter:20160614", which would respond with something like 2.
  3. Add the new User(id:192329230202) to the user_index table and reassign it to User(internal_id:3). This would do something like SET "users_index:20160614:192329230202" 3
  4. Once this has completed successfully, run INCR users_counter:20160614
  5. Run SETBIT "events:search:20160614" 3 1 to set the bit for the signup feature at the User(3) index, instead of near the end of the bitmap, which would require creating and storing empty bits.

This would most likely add complexity to how you query the data, and you would have to store a reference to look up each user. In my proposal I used a different lookup per day to reset the bitmap indexes every day, but this might be more trouble than just maintaining one large ongoing table.
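The steps above can be sketched in memory as follows. This is only an illustration of the mapping idea: the dict and counter stand in for the per-day Redis keys, and IdIndex and set_bit are hypothetical names. Against a real server, the lookup and the increment would presumably need to be atomic.

```python
class IdIndex:
    """Maps large external ids to small, dense bitmap offsets.

    In-memory illustration only: the dict and counter stand in for the
    per-day Redis keys described in the steps.
    """

    def __init__(self):
        self._index = {}    # external id -> internal offset
        self._counter = 0   # last offset handed out

    def offset_for(self, external_id: int) -> int:
        """Return the existing offset, or allocate the next one."""
        if external_id not in self._index:
            self._counter += 1
            self._index[external_id] = self._counter
        return self._index[external_id]


def set_bit(bitmap: bytearray, offset: int) -> None:
    """Set one bit, growing the bitmap with zero bytes when needed."""
    byte, bit = divmod(offset, 8)
    if byte >= len(bitmap):
        bitmap.extend(b"\x00" * (byte - len(bitmap) + 1))
    bitmap[byte] |= 0x80 >> bit   # most significant bit first


index = IdIndex()
bitmap = bytearray()              # stands in for the event bitmap

set_bit(bitmap, index.offset_for(192329230202))
set_bit(bitmap, index.offset_for(10204510554222024))
set_bit(bitmap, index.offset_for(192329230202))   # same user, same offset

print(index.offset_for(192329230202))  # 1
print(len(bitmap))                     # 1
```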

Let me know your thoughts and whether this is something you are interested in promoting into an enhancement.

Storing extra data

Is there a more efficient way to store extra data for scenarios like 'user x replied question y correctly|falsely in z seconds'?

I think implementations such as

mark_event('question:y:x:1/0', z)

would be neither effective nor useful for queries.

Some examples on this would be very helpful.

Populating the data

How do I populate data into Bitmapist so that I can play with your queries?

Do you have a utility to populate domain- or application-specific data, run a bunch of the queries you describe, and gauge how fast they run?

That would be helpful.
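A small script along the following lines could generate test data. It is only a sketch: populate is a hypothetical helper, and it takes the marking function as a parameter so that it can be dry-run without a backend. The `mark` callable is expected to accept (event_name, user_id), the way mark_event is called elsewhere on this page.

```python
import random


def populate(mark, event_names, user_count, marks_per_event, seed=0):
    """Mark random users for each event via the supplied `mark` callable."""
    rng = random.Random(seed)   # fixed seed for repeatable data
    total = 0
    for name in event_names:
        for _ in range(marks_per_event):
            mark(name, rng.randrange(1, user_count + 1))
            total += 1
    return total


# Dry run that records the calls in a list instead of using a live backend.
calls = []
n = populate(lambda name, uid: calls.append((name, uid)),
             ["active", "song:add", "song:play"],
             user_count=1000, marks_per_event=50)
print(n)  # 150
```

With a running backend, passing bitmapist's mark_event as `mark` should populate real bitmaps that the query classes can then be timed against.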

Krishna

Error running cohort demo

Running

from bitmapist import mark_event
from bitmapist import cohort as bitmapist_cohort

mark_event('active', 123)
mark_event('song:add', 123)
mark_event('song:play', 123)

html_form = bitmapist_cohort.render_html_form(
    action_url='/_Cohort',
    selections1=[ ('Are Active', 'active'), ],
    selections2=[ ('Played song', 'song:play'), ],
    time_group='days',
    select1='active',
    select2='song:play'
)

dates_data = bitmapist_cohort.get_dates_data('active','song:play', 'days','default')

html_data = bitmapist_cohort.render_html_data(dates_data, 'days')

I get:

  File "/usr/local/lib/python2.7/dist-packages/bitmapist/cohort/__init__.py", line 191, in get_dates_data
    for i in range(0, date_range):
UnboundLocalError: local variable 'date_range' referenced before assignment

add mako to setup.py

ImportError: No module named mako.lookup

Your cohort package requires mako, yet it's not listed in the install_requires.

ResponseError: bit offset is not an integer or out of range

>>> myid
10204510554222024
>>> mark_event('active', myid)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/sardor/.virtualenvs/tarjimonlar/local/lib/python2.7/site-packages/bitmapist/__init__.py", line 173, in mark_event
    client.execute() 
  File "/home/sardor/.virtualenvs/tarjimonlar/local/lib/python2.7/site-packages/redis/client.py", line 2578, in execute
    return execute(conn, stack, raise_on_error)
  File "/home/sardor/.virtualenvs/tarjimonlar/local/lib/python2.7/site-packages/redis/client.py", line 2492, in _execute_transaction
    self.raise_first_error(commands, response)
  File "/home/sardor/.virtualenvs/tarjimonlar/local/lib/python2.7/site-packages/redis/client.py", line 2526, in raise_first_error
    raise r
ResponseError: Command # 1 (SETBIT trackist_active_2015-3 10204510554222024 1) of pipeline caused error: bit offset is not an integer or out of range
>>> 
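The error is consistent with Redis requiring SETBIT offsets to be non-negative integers smaller than 2^32, which the id above exceeds. A caller could check ids before marking. The sketch below is a hypothetical helper, not part of bitmapist.

```python
MAX_OFFSET = 2 ** 32   # Redis bit offsets must be smaller than this


def is_valid_offset(user_id) -> bool:
    """True if user_id can be used directly as a SETBIT offset."""
    if isinstance(user_id, bool) or not isinstance(user_id, int):
        return False
    return 0 <= user_id < MAX_OFFSET


print(is_valid_offset(123))                # True
print(is_valid_offset(10204510554222024))  # False
print(is_valid_offset("user-123"))         # False
```

Ids that fail this check, including string ids, would first have to be mapped to smaller integers.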

Can user IDs only be integers?

Apparently the user ID is used as the bit offset, so whenever I mark an event, a 1 is set at the offset equal to the user ID. Does this mean user IDs can never be strings?
