Comments (15)
Can you provide more details on how pickling/unpickling results in max recursion depth errors (maybe you have a .cxt file that you can share)?
In case this is meant for long-term storage, one could consider adding export/import using e.g. a json- or csv-based format.
from concepts.
From the pickle docs:

> Trying to pickle a highly recursive data structure may exceed the maximum recursion depth, a `RecursionError` will be raised in this case. You can carefully raise this limit with `sys.setrecursionlimit()`.

Couldn't pickle use a stack-based implementation to (un)pickle deeply linked object graphs?
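As a minimal illustration of the limitation quoted above (not using concepts itself): pickling a linked chain of plain Python objects descends one recursion level per link, so a chain deeper than `sys.getrecursionlimit()` fails:

```python
import pickle
import sys


class Node:
    """One link of a singly linked chain; pickling follows the references."""

    def __init__(self, nxt=None):
        self.nxt = nxt


# Build a chain noticeably deeper than the current recursion limit.
head = None
for _ in range(sys.getrecursionlimit() * 2):
    head = Node(head)

try:
    pickle.dumps(head)
    hit_limit = False
except RecursionError:
    hit_limit = True

print('RecursionError raised:', hit_limit)
```

Raising the limit with `sys.setrecursionlimit()` can make such a dump succeed, but for graphs hundreds of thousands of objects deep the interpreter's C stack may overflow first, which is presumably why a flat dict-based format is more robust.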
Prototype for long-term serialization and deserialization:

```python
from concepts import Context, EXAMPLE


def to_dict(lattice):
    return {
        'objects': lattice._context.objects,
        'properties': lattice._context.properties,
        'context': lattice._context._intents.index_sets(),
        'concepts': [(tuple(c._extent.iter_set()),
                      tuple(c._intent.iter_set()),
                      tuple(u.index for u in c.upper_neighbors),
                      tuple(l.index for l in c.lower_neighbors),
                      ) for c in lattice]
    }


def from_dict(d):
    from concepts.lattices import Lattice, Concept, Infimum, Supremum, Atom
    ctx = Context(d['objects'],
                  d['properties'],
                  [tuple(i in e for i in range(len(d['properties'])))
                   for e in map(set, d['context'])])
    l = object.__new__(Lattice)
    con = [Concept(l,
                   ctx._Extent.fromint(sum(1 << e for e in ex)),
                   ctx._Intent.fromint(sum(1 << i for i in in_)),
                   up, lo)
           for ex, in_, up, lo in d['concepts']]
    atoms = [con[i] for i in d['concepts'][0][2]]
    for index, cc in enumerate(con):
        cc.index = index
        cc.upper_neighbors = tuple(con[i] for i in cc.upper_neighbors)
        cc.lower_neighbors = tuple(con[i] for i in cc.lower_neighbors)
        cc.atoms = tuple(a for a in atoms
                         if cc._extent | a._extent == cc._extent)
    for dindex, cc in enumerate(sorted(con, key=lambda c: c._extent.longlex())):
        cc.dindex = dindex
    mapping = {c._extent: c for c in con}
    Lattice._annotate(ctx, mapping)
    l._context = ctx
    l._concepts = con
    l._map = mapping
    l.infimum = l._concepts[0]
    l.infimum.__class__ = Infimum
    l.supremum = l._concepts[-1]
    l.supremum.__class__ = Supremum
    for a in l.infimum.upper_neighbors:
        a.__class__ = Atom
    return l


l = Context.fromstring(EXAMPLE).lattice
d = to_dict(l)
ll = from_dict(d)
```
Format example (could use `json`, `pprint`+`ast.literal_eval`, `yaml`, etc. for transport/storage):

```python
{
    'objects': (
        '1sg', '1pl', '2sg', '2pl', '3sg', '3pl'
    ),
    'properties': (
        '+1', '-1', '+2', '-2', '+3', '-3', '+sg', '+pl', '-sg', '-pl'
    ),
    'context': [
        (0, 3, 5, 6, 9),
        (0, 3, 5, 7, 8),
        (1, 2, 5, 6, 9),
        (1, 2, 5, 7, 8),
        (1, 3, 4, 6, 9),
        (1, 3, 4, 7, 8)
    ],
    'concepts': [
        ((), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (1, 2, 3, 4, 5, 6), ()),
        ((0,), (0, 3, 5, 6, 9), (7, 8, 9), (0,)),
        ((1,), (0, 3, 5, 7, 8), (7, 10, 11), (0,)),
        ((2,), (1, 2, 5, 6, 9), (8, 12, 13), (0,)),
        ((3,), (1, 2, 5, 7, 8), (10, 12, 14), (0,)),
        ((4,), (1, 3, 4, 6, 9), (9, 13, 15), (0,)),
        ((5,), (1, 3, 4, 7, 8), (11, 14, 15), (0,)),
        ((0, 1), (0, 3, 5), (18, 19), (1, 2)),
        ((0, 2), (5, 6, 9), (16, 18), (1, 3)),
        ((0, 4), (3, 6, 9), (16, 19), (1, 5)),
        ((1, 3), (5, 7, 8), (17, 18), (2, 4)),
        ((1, 5), (3, 7, 8), (17, 19), (2, 6)),
        ((2, 3), (1, 2, 5), (18, 20), (3, 4)),
        ((2, 4), (1, 6, 9), (16, 20), (3, 5)),
        ((3, 5), (1, 7, 8), (17, 20), (4, 6)),
        ((4, 5), (1, 3, 4), (19, 20), (5, 6)),
        ((0, 2, 4), (6, 9), (21,), (8, 9, 13)),
        ((1, 3, 5), (7, 8), (21,), (10, 11, 14)),
        ((0, 1, 2, 3), (5,), (21,), (7, 8, 10, 12)),
        ((0, 1, 4, 5), (3,), (21,), (7, 9, 11, 15)),
        ((2, 3, 4, 5), (1,), (21,), (12, 13, 14, 15)),
        ((0, 1, 2, 3, 4, 5), (), (), (18, 19, 20, 16, 17))
    ]
}
```
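One practical caveat when using json for transport: json has no tuple type, so a round trip through `json.dumps`/`json.loads` turns every tuple in this format into a list. A small sketch of normalizing the shapes back, using a truncated toy instance of the format and a hypothetical `retuple()` helper (not part of the library):

```python
import json

# Truncated toy instance of the format above (two objects, two concepts).
d = {
    'objects': ('1sg', '1pl'),
    'properties': ('+1', '-1'),
    'context': [(0,), (1,)],
    'concepts': [((), (0, 1), (1,), ()),
                 ((0, 1), (), (), (0,))],
}

# After a json round trip, every tuple comes back as a list.
loaded = json.loads(json.dumps(d))
assert isinstance(loaded['objects'], list)


def retuple(d):
    """Restore the tuple shapes a loader like from_dict() above expects."""
    return {
        'objects': tuple(d['objects']),
        'properties': tuple(d['properties']),
        'context': [tuple(e) for e in d['context']],
        'concepts': [tuple(tuple(x) for x in c) for c in d['concepts']],
    }


restored = retuple(loaded)
print(restored == d)
```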
That response is more than I could hope for! Thanks so much.
I indeed found the same documentation on pickle. So far I have tried raising the recursion limit gradually up to 50,000, but without luck. Trying 100,000 now, but testing is very slow because I am testing with large concept lattices. FYI, I'm currently running FCA on 500 objects with 30 attributes, which results in a lattice with 9 atoms and 443,020 concepts. (But do note that pickling also fails with way smaller lattices.) I am also testing with the latest pickle protocol, because according to the documentation it is designed to deal with very large objects.
But the work you did above is definitely the way to go. I had already considered using the json format, but that solution required writing a custom format (like you elegantly did just now) and I didn't know how to proceed. I will try out your idea on a big lattice when I have time and report back to you!
Works like a charm! First I checked that lattices are equal to their reconstruction after deserialization. Then I tested performance on a big lattice: computing the lattice took me 4+ hours, while loading it afterwards from the json file took more or less 20 seconds.
I think this is a really cool feature to integrate into your package. For example, I'm writing a small "query by navigation" program for navigating lattices based on your package, and it is very inconvenient to have to recompute the lattice every time the program runs. Thanks for your effort!
Great, thanks. +1 that this will be a nice feature.
Will start to work on a branch to add this for the next version.
Considering adding `'ordered': True` to the dict, to rely/depend on the order of concepts and upper/lower neighbors (and otherwise to ensure the order via extra sorting).
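The idea of an `'ordered'` flag can be sketched as follows: if the producer guarantees canonical order, the loader trusts it and skips re-sorting. This is only an illustration of the dispatch; `iter_concepts` is a hypothetical name, and the sort key here is a stand-in (the library orders concepts via its own shortlex/longlex on bit sets):

```python
def iter_concepts(d):
    """Return concept tuples in canonical order, sorting only when needed."""
    concepts = d['concepts']
    if not d.get('ordered', False):
        # Stand-in for the canonical order: smaller extents first,
        # ties broken lexicographically by member indices.
        concepts = sorted(concepts, key=lambda c: (len(c[0]), c[0]))
    return concepts


unsorted = {'ordered': False,
            'concepts': [((0, 1), (), (), (0,)),
                         ((), (0, 1), (1,), ())]}
print(iter_concepts(unsorted)[0][0])  # the infimum's empty extent sorts first
```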
merged most of the implementation, see https://github.com/xflr6/concepts/compare/f0a7069153b487aea572bca133f984c5de111056..de2993a2df9d515a84883c19ab788cc274059d3b
Can you maybe check the (speed) difference between these two for a big lattice in `master`?

```python
c = concepts.Context(...)
assert c.lattice
c.tojson(f)

concepts.Context.fromjson(f)
# vs.
concepts.Context.fromjson(f, raw=True)
```
Nice that you merged it! I can test that, but I only have time next week to do so. Feel free to remind me if I have not responded after next week :-)
No rush, just curious :) I think I'm getting close: tell me if you have any concerns, especially about the API.
released concepts 0.9
Hi there, I ran a test on the biggest concept lattice I currently have available, with the following code:

```python
import timeit

setup = '''
from concepts import tools, contexts
import concepts
load = concepts.Context.fromjson
'''

print(min(timeit.Timer('load("lattice_sorted.json", raw=False)',
                       setup=setup).repeat(100, 1)))
```
I tested on a lattice with 143,906 concepts, once with `raw=False` and once with `raw=True`. This gave the following (fastest) run times:

- `raw=False`: 1.03071959 sec.
- `raw=True`: 1.03416534 sec.
However, I must make a note here. Your approach is to serialize the whole context plus lattice to json (which makes sense!), but in my previous work I used the code from here to serialize the lattice only.
I wanted to avoid recomputing my big lattice just in order to serialize it together with its context.
What I did instead was sort the json of the lattice on keys for the above testing.
I'm not sure if that is the only change you made, or whether the above results generalize to the code in `master`.
The specs of the pc on which I tested are:
- 8 GB RAM
- 3.10GHz i5-4440
Thanks. I should have noted that the released implementation differs from the prototype above in using `'lattice'` as the key for the concepts list instead of `'concepts'`, so to re-use the json file you created with the prototype you might need to rename `'concepts'` to `'lattice'` (judging from the numbers, you are probably only loading the context currently, hence there is no difference between the default `raw=False` and `raw=True`). To make this more explicit, you can add `require_lattice=True` to your `load()` call.
Btw, the order of the keys does not matter with either setting (see here and here for details).
Ah, that explains a lot. Here are the updated times, now with the best time over 10 repetitions rather than 100:

- `raw=False`: 28.72975 sec.
- `raw=True`: 354.92165 sec.
Hope that info is helpful :-)
Thanks, definitely helpful. That's consistent with my experiments.
I just wanted to confirm that relying on the order in the serialization significantly speeds things up (otherwise I would have considered making the `raw=True` behaviour the default and only code path). I think `raw=True` might be useful for calculating the lattice with other tools and converting it into the format here (but without the need to replicate the order used/expected by this library).