evandempsey / fp-growth Goto Github PK
View Code? Open in Web Editor NEWPython implementation of the Frequent Pattern Growth algorithm
License: ISC License
Python implementation of the Frequent Pattern Growth algorithm
License: ISC License
`import pyfpgrowth
item_list = [[11, 13],
[4, 12, 13],
[4, 13, 14, 17],
[7, 11, 13, 14, 17],
[2, 4, 13, 14],
[13, 14],
[2, 7, 11, 13, 14],
[4, 13, 14, 15],
[6, 11, 13],
[11, 13],
[11, 13, 14, 17],
[7, 11, 13, 14],
[2, 11, 12, 13, 14],
[7, 11, 13],
[11, 13],
[7, 8, 13, 14, 17],
[0, 7, 11, 13],
[2, 11, 13, 14, 15],
[7, 11, 12, 13, 14],
[11, 12, 13, 14]]
patterns = pyfpgrowth.find_frequent_patterns(item_list, 1)`
My code and dataset is defined as above. The problem is when I try patterns = pyfpgrowth.find_frequent_patterns(item_list[:15], 1), I can find the key (7,) in patterns, but when I use the whole item_list, key(7,) is missing. Is this a bug or I get something wrong with the algorithm?
Thanks.
I have dataset transaction, when tried wih your lib, i got confidence value more than 1.0, example 1.5, 1.2,,
link dataset -> http://pastebin.com/LTLVPGgC
Frequency for only one element is not counted in function zip_patterns. It can be fixed by adding frequency of root in the return value of that function. Btw really nice code, well organized and very clean.
Is there some way to extract antecedent and consequent from the rules...??
find_frequent_patterns(transactions, threshold)
is not affected by the threshold value. It returns the same dictionary (same itemsets) when I set it to 0.0 or 1.0.
It should return an empty dict for 0.0
Basically, find_frequent_patterns(my_transactions, 0.0) == find_frequent_patterns(my_transactions, 1.0)
is True
.
Is this how it is supposed to be used?
EDIT: Figured it out! Works just fine!
patterns = fpg.find_frequent_patterns(transactions, 399)
print "patterns:",patterns
rule = fpg.generate_association_rules(patterns,0.23)
print "rule:",rule
patterns: {(11623791,): 718, (1200977055, 1201023876, 1201116375, 1201268126, 1201562424): 805, (1201207701,): 2217, (1201451040,): 445, (1201345366,): 1844, (1201294590,): 3846, (1200745609,): 1229, (1201023876, 1201116375, 1201562424): 830, (1201532784,): 1535, (1201486603,): 2022, (11738534,): 3125, (1201350860,): 476, (1200977055, 1201268126): 962, (1201557016,): 1151, (1201542715,): 1785, (1201023876, 1201116375, 1201268126): 816, (1200419767,): 656, (1201457994,): 1677, (1201268126,): 1320, (1201113796,): 503, (1200891159,): 1929, (1201322512,): 1538, (10104273,): 861, (1201613420,): 4840, (1200354823,): 1541, (1200432941,): 497, (1201305683,): 406, (1200457081,): 501, (10814383,): 2000, (1201499030,): 1498, (1201052503,): 2485, (1201157090,): 1463, (1200163273,): 555, (1201631999,): 3614, (11249445,): 1426, (1200961536,): 2807, (1201383817,): 617, (1201467922,): 3863, (1200427431,): 658, (1200790003,): 3589, (1200982196,): 2010, (1201562424,): 3043, (1201273973,): 1382, (1201503232,): 2539, (1201394998,): 583, (1200159324,): 630, (1201570908,): 1860, (1201403614,): 3257, (1201273966,): 1145, (1201023876, 1201116375): 830, (1200977055, 1201023876, 1201116375, 1201562424): 812, (1201477137,): 2218, (1200333505,): 1286, (1200437714,): 1706, (1200743214,): 3354, (1201513338,): 2445, (1201474472,): 2813, (1201322043,): 762, (1200885766, 1201352808): 478, (10647548,): 568, (1201094551,): 686, (1201121314,): 518, (1201462793,): 2257, (1201578718,): 18168, (1201202122,): 873, (1201140837,): 499, (1201345317,): 2145, (1201462348,): 1218, (1200297191,): 4166, (1201503232, 1201613420): 1306, (1201494873,): 917, (1201494927,): 614, (1200786101,): 515, (1200974125,): 930, (1200697819,): 1062, (1201392758,): 735, (1201164589, 1201273973): 518, (1200957738,): 2042, (1201271746,): 2369, (1200385629,): 2088, (1201138670,): 2535, (1201392771,): 612, (1201054494,): 2307, (1201526185,): 1783, (11204831,): 4291, (1201395562,): 992, (10643879,): 1706, (1200335185,): 449, (1200691591,): 1967, (1201429522,): 760, (1200242349,): 2003, (1201504568,): 955, (1200283758,): 479, (1201023876, 1201116375, 1201268126, 1201562424): 816, (1201149664,): 5958, (1201349905,): 1444, (1200172387,): 1984, (1201208643,): 2195, (1200977055, 1201023876, 1201116375, 1201268126): 805, (1201352808, 1201457994): 451, (1201169598,): 2809, (1201518719,): 482, (1201522186,): 1887, (1200708773,): 1107, (11146699,): 3246, (1201470054,): 792, (1200977055, 1201023876, 1201562424): 935, (1201216728,): 1167, (1200371059,): 1981, (11708313,): 549, (1200352602,): 1639, (1201446427,): 737, (1201123510,): 536, (1200311105,): 666, (1200212508,): 837, (1201468581,): 5954, (1201578729,): 550, (1201213230,): 4866, (1200812613,): 3369, (1201027792,): 552, (1201265492,): 490, (1200301742,): 513, (1200455993,): 2665, (1201270886,): 3641, (11702895,): 1040, (1200735969,): 490, (1201116375, 1201268126, 1201562424): 823, (1201349181,): 519, (1201432931,): 778, (1200785918,): 511, (1200250761,): 402, (1201424420,): 2467, (1201332927,): 2825, (10386250,): 513, (1201343088,): 409, (1201421617,): 421, (1201023876, 1201562424): 1291, (1200777787,): 973, (1200106637,): 1013, (1201268126, 1201562424): 1314, (11006454,): 659, (1201234460,): 475, (1200799042,): 400, (1201237981,): 1253, (1201352808,): 9725, (1201079811,): 720, (1200961550,): 3528, (1201527994,): 2485, (1200977055, 1201562424): 1014, (1201393734,): 481, (1200404231,): 640, (1201088308,): 715, (1201116375, 1201562424): 844, (1201564142,): 815, (1201148486,): 1044, (11144143,): 1910, (11240917,): 1220, (1200277793,): 1722, (1200415164,): 1925, (11253335,): 8420, (1201557730,): 419, (1201116375, 1201268126): 823, (1200228384,): 1684, (1200423973,): 1601, (1201097392,): 448, (1201284505,): 1586, (1201033228,): 497, (1201461902,): 622, (1200223668,): 577, (1200977055, 1201268126, 1201562424): 959, (1201352808, 1201532784): 472, (1200450459,): 2716, (1200368622,): 631, (1201270977,): 480, (11676618,): 1816, (1201079287,): 8783, (11121832,): 571, (1201096003,): 1002, (1201406468,): 683, (1200977055, 1201116375, 1201562424): 815, (1201172110,): 399, (1201558444,): 2911, (1201023876, 1201268126, 1201562424): 1221, (1200220152,): 6894, (11161428,): 541, (1201287663,): 781, (1201466389,): 456, (1201357052,): 2881, (1201306914,): 1849, (1200084847,): 1208, (1201492862,): 1546, (1201487585,): 2581, (1201443312,): 427, (1201478539,): 978, (1201328361,): 1158, (1200301745,): 2233, (1201286776,): 2917, (1200285785,): 706, (1200977055, 1201116375, 1201268126, 1201562424): 808, (10447424,): 6067, (1200977055, 1201023876, 1201268126): 896, (1200982785,): 601, (1201390492,): 1768, (1201164589, 1201273966): 429, (1200910312,): 613, (1200290334,): 814, (1201030623,): 2313, (1200977055, 1201023876, 1201116375): 812, (1200869612,): 1898, (1201187602,): 1308, (1200885766,): 2558, (1201341673,): 1448, (1201504572,): 592, (1200377480,): 3378, (11501859,): 2479, (1201363125,): 2223, (1201577226,): 2591, (1201536832,): 1903, (1200904137,): 1108, (1201093617,): 1401, (1200977055, 1201023876, 1201268126, 1201562424): 895, (11698907,): 1880, (11223667,): 2141, (1201065998,): 527, (1200743220,): 626, (1200991695,): 646, (1200070389,): 654, (1201023876, 1201268126): 1222, (1200689921,): 3034, (1201492855,): 1386, (1201502466,): 2214, (11240936,): 650, (1201577961,): 513, (1200279428,): 456, (1200979992,): 2950, (1201338800,): 3385, (11567723,): 532, (10973190,): 494, (1200743212,): 799, (1201592045,): 5235, (1200977055, 1201116375, 1201268126): 808}
rule: {(1201023876, 1201116375, 1201268126): ((1200977055,), 0.9865196078431373), (1201116375, 1201268126, 1201562424): ((1200977055,), 0.9817739975698664), (1200977055, 1201023876, 1201268126, 1201562424): ((1201116375,), 0.8994413407821229), (1201532784,): ((1201352808,), 0.30749185667752443), (1201023876, 1201562424): ((1200977055, 1201268126), 0.6932610379550735), (1200977055, 1201268126): ((1201116375,), 0.83991683991684), (1201268126, 1201562424): ((1200977055, 1201023876), 0.6811263318112634), (1200977055, 1201023876, 1201268126): ((1201562424,), 0.9988839285714286), (1200977055, 1201116375, 1201268126, 1201562424): ((1201023876,), 0.9962871287128713), (1201268126,): ((1200977055, 1201116375), 0.6121212121212121), (1200977055, 1201023876, 1201116375): ((1201268126,), 0.9913793103448276), (1200977055, 1201562424): ((1201023876, 1201268126), 0.8826429980276134), (1201613420,): ((1201503232,), 0.26983471074380166), (1201562424,): ((1200977055, 1201023876, 1201268126), 0.29411764705882354), (1201116375, 1201562424): ((1200977055, 1201268126), 0.957345971563981), (1201273966,): ((1201164589,), 0.3746724890829694), (1201023876, 1201116375, 1201268126, 1201562424): ((1200977055,), 0.9865196078431373), (1200977055, 1201023876, 1201116375, 1201268126): ((1201562424,), 1.0), (1201273973,): ((1201164589,), 0.3748191027496382), (1201023876, 1201116375, 1201562424): ((1201268126,), 0.983132530120482), (1200977055, 1201023876, 1201562424): ((1201268126,), 0.9572192513368984), (1200977055, 1201268126, 1201562424): ((1201023876,), 0.9332638164754953), (1201116375, 1201268126): ((1200977055,), 0.9817739975698664), (1201503232,): ((1201613420,), 0.5143757384797164), (1201457994,): ((1201352808,), 0.2689326177698271), (1200977055, 1201116375, 1201562424): ((1201268126,), 0.9914110429447853), (1201023876, 1201116375): ((1200977055,), 0.9783132530120482), (1200977055, 1201023876, 1201116375, 1201562424): ((1201268126,), 0.9913793103448276), (1201023876, 1201268126, 1201562424): ((1200977055,), 0.733005733005733), (1200977055, 1201116375, 1201268126): ((1201562424,), 1.0), (1201023876, 1201268126): ((1200977055, 1201562424), 0.7324058919803601)}
in doc, you write patterns = pyfpgrowth.find_frequent_patterns(transactions, 2)
, 2 is minium support right ? in my knowledge, FP-growth has minimum support and minimum confidence in % , my question is, what scale of minimum support ? thx
OS: Ubuntu 16.04 64bit
fpgrowth version 1.0, installed by pip install pyfpgrowth in virtualenv python3.5
I want use pyfpgrowth to mining trajectory latitude data, such as 41.804068
41.802275
41.802275
41.802383
41.802383
41.802383
41.802756
41.802756
41.802756
41.802756
41.802756
41.802756
41.802756
41.80275
41.80275
41.80275
41.80275
41.80275
41.80275
input:8 pieces of gps location latitude data, with length 52, 44,39,72, 41, 40, 164, 189.
program is
patterns = pyfpgrowth.find_frequent_patterns(transactions, 2)
rules = pyfpgrowth.generate_association_rules(patterns, 0.7)
It used out 32GB memory, run in single thread, lasting 16 hours and still can not output , then I kill the process. Could anyone tell me how to do? I would appreciate for your valuable suggestion.
Hi All,
In working on this issue for my own work I found a problem with the generate_association rules module specifically with the line below
if confidence >= confidence_threshold:
rules[antecedent] = (consequent, confidence)
This only allows effectively one rule per item(antecedent) and thus the higher order rules per item where they exist are essentially over riding the simple 1 -> 1 items.
I addressed this by changing the code so that instead of setting the rules dict value each time - if a rule already exists for an item(antecendent)- I then append the new rule as a list to the dict value for that item as follows
if confidence >= confidence_threshold:
rule1=(consequent, confidence)
rule1=list(rule1)
if antecedent in rules:
rules[antecedent].append(rule1)
else:
rules[antecedent]=rule1
The rules of course then have to be unravelled in a slightly more complex manner but this worked well .
First rule is value[0] -> value[1] . Subsequent rules are stored as list in value[n] so the unravelling is as follows where prel is the antecedent list ,postl is the consequent list and confl is the confidence list
for key,value in rules.iteritems():
if len(value)> 2: #If item has more than 1 rule
for i in range(2,len(value)):# For rule 2 and subsequent rules
prel.append(key)
postl.append(value[i][0])
confl.append(value[i][1])
prel.append(key)
postl.append(value[0])
confl.append(value[1])
I'm sure there's a neater way of resolving this issue but it is an important constraint on the scope of the algorithm . The algorithm is super fast and excellent otherwise and a pity if not being used for this reason . Comments welcome - would be great if the code could be updated.
I get different rules, each time I run the code.
transaction = [['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14',\
'15', '16', '7', '17', '18', '19', '20', '21', '7', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '29', '30'], ['32', '33'],\
['34', '34', '34', '34', '34', '34', '34', '34', '35', '35', '36', '36', '34', '34', '37', '37', '37', '37',\
'37', '37', '34', '34', '29', '30', '36', '36', '36', '36', '36', '36']]
import pyfpgrowth as fg
patterns = fg.find_frequent_patterns(transaction, 3)
rules = fg.generate_association_rules(patterns, 0.1)
Can anyone see it ?
totally!
I am having the following MemoryError problem. I am running on Python 3.6. Is there any solution to this?
MemoryError Traceback (most recent call last)
in
----> 1 patterns = pyfpgrowth.find_frequent_patterns(tweets_all_day_fp, 3)
2
3 print("Patterns :", patterns)
~\PycharmProjects\Birlestirme_New_2\pyfpgrowth.py in find_frequent_patterns(transactions, support_threshold)
251 """
252 tree = FPTree(transactions, support_threshold, None, None)
--> 253 return tree.mine_patterns(support_threshold)
254
255
~\PycharmProjects\Birlestirme_New_2\pyfpgrowth.py in mine_patterns(self, threshold)
151 """
152 if self.tree_has_single_path(self.root):
--> 153 return self.generate_pattern_list()
154 else:
155 return self.zip_patterns(self.mine_sub_trees(threshold))
~\PycharmProjects\Birlestirme_New_2\pyfpgrowth.py in generate_pattern_list(self)
191 pattern = tuple(sorted(list(subset) + suffix_value))
192 patterns[pattern] =
--> 193 min([self.frequent[x] for x in subset])
194
195 return patterns
MemoryError:
When using find_frequent_patterns(), certain itemsets are missing in the patterns although they have frequence above the threshold support provided, hence also causing certain rules to be missing.
For example, consider the following dataset:
Transaction ID | Items_bought |
---|---|
T100 | {M, O, N, K, E, Y} |
T200 | {D, O, N, K, E, Y} |
T300 | {M, A, K, E} |
T400 | {M, U, C, K, Y} |
T500 | {C, O, O, K, I, E} |
I used the following code to find frequent patterns:
import pyfpgrowth as fp
items_bought = [['M','O','N','K','E','Y'], ['D','O','N','K','E','Y'], ['M','A','K','E'],
['M','U','C','K','Y'], ['C','O','O','K','I','E']]
min_support_count = 3
min_conf = 0.8
freq_patterns = fp.find_frequent_patterns(items_bought, min_support_count)
print(freq_patterns)
I get the following output:
{('M',): 3, ('K', 'M'): 3, ('Y',): 3, ('K', 'Y'): 3, ('O',): 4, ('K', 'O'): 4, ('E', 'O'): 4, ('E', 'K'): 4, ('E', 'K', 'O'): 4, ('K',): 5}
This output is missing the itemset ('E',) with frequency of 4 (> threshold of 3).
Due to this, two rules are also missing when generating association rules.
Please look into the issue!!
In your library, you just show confidence value each rules, but i thing it's usefull when you show support too
This function can trivially be rewritten to avoid recursion:
fp-growth/pyfpgrowth/pyfpgrowth.py
Lines 109 to 133 in 6bf4503
Rather than recurse, loop over the items:
def insert_tree(self, items, node, headers):
"""Grow FP tree iteratively"""
for item in items:
child = node.get_child(item)
if child is not None:
child.count += 1
else:
# Add new child.
child = node.add_child(item)
# Link it to header structure.
if headers[item] is None:
headers[item] = child
else:
current = headers[item]
while current.link is not None:
current = current.link
current.link = child
# continue inserting from the child node
node = child
All that the recursive call does is to use child
as the node
value for the next item in the items
sequence. Just set node
to child
at the end of a loop instead.
I have a list of 30 transactions with up to 50 items per transaction. I am having the following MemoryError problem. I am running on Python 3.6, my PC has 4GB of RAM.
Is there any solution to this? To make it work with this number of items per transactions?
Traceback (most recent call last): File "C:/Users/Edoniti/PycharmProjects/Project/fp-growth/fp-growth1.py", line 15, in <module> patterns = pyfpgrowth.find_frequent_patterns(transactions, 2) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 253, in find_frequent_patterns return tree.mine_patterns(support_threshold) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 155, in mine_patterns return self.zip_patterns(self.mine_sub_trees(threshold)) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 235, in mine_sub_trees subtree_patterns = subtree.mine_patterns(threshold) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 155, in mine_patterns return self.zip_patterns(self.mine_sub_trees(threshold)) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 235, in mine_sub_trees subtree_patterns = subtree.mine_patterns(threshold) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 155, in mine_patterns return self.zip_patterns(self.mine_sub_trees(threshold)) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 235, in mine_sub_trees subtree_patterns = subtree.mine_patterns(threshold) File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 153, in mine_patterns return self.generate_pattern_list() File "C:\Users\Edoniti\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pyfpgrowth\pyfpgrowth.py", line 193, in generate_pattern_list min([self.frequent[x] for x in subset]) MemoryError
missing itemsets in patterns.
Also plz add LIft
to filter rules.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.