Giter VIP home page Giter VIP logo

Comments (15)

xflr6 avatar xflr6 commented on July 19, 2024

Can you provide more details how picking/unpickling results in max recursion depth errors (maybe you have a .cxt file that you can share)?

In case this is meant to be for long-time storage, one could consider adding ex/import using e.g. a json or csv-based format.

from concepts.

xflr6 avatar xflr6 commented on July 19, 2024

From pickle docs:

Trying to pickle a highly recursive data structure may exceed the maximum recursion depth, a RecursionError will be raised in this case. You can carefully raise this limit with sys.setrecursionlimit().

Couldn't pickle use a stack-based implementation to (un)pickle deeply linked object graphs?

from concepts.

xflr6 avatar xflr6 commented on July 19, 2024

Prototype for long-term serialization and deserialization:

from concepts import Context, EXAMPLE

def to_dict(lattice):
    return {
        'objects': lattice._context.objects,
        'properties': lattice._context.properties,
        'context': lattice._context._intents.index_sets(),
        'concepts': [(tuple(c._extent.iter_set()),
                      tuple(c._intent.iter_set()),
                      tuple(u.index for u in c.upper_neighbors),
                      tuple(l.index for l in c.lower_neighbors),
                     ) for c in lattice]
    }

def from_dict(d):
    from concepts.lattices import Lattice, Concept, Infimum, Supremum, Atom
    ctx = Context(d['objects'],
                  d['properties'],
                  [tuple(i in e for i in range(len(d['properties'])))
                   for e in map(set, d['context'])])
    l = object.__new__(Lattice)
    con = [Concept(l,
                   ctx._Extent.fromint(sum(1 << e for e in ex)),
                   ctx._Intent.fromint(sum(1 << i for i in in_)),
                   up, lo)
           for ex, in_, up, lo in d['concepts']]
    atoms = [con[i] for i in d['concepts'][0][2]]
    for index, cc in enumerate(con):
        cc.index = index
        cc.upper_neighbors = tuple(con[i] for i in cc.upper_neighbors)
        cc.lower_neighbors = tuple(con[i] for i in cc.lower_neighbors)
        cc.atoms = tuple(a for a in atoms
                         if cc._extent | a._extent == cc._extent)
    for dindex, cc in enumerate(sorted(con, key=lambda c: c._extent.longlex())):
        cc.dindex = dindex
    mapping = {c._extent: c for c in con}
    Lattice._annotate(ctx, mapping)
    l._context = ctx
    l._concepts = con
    l._map = mapping
    l.infimum = l._concepts[0]
    l.infimum.__class__ = Infimum
    l.supremum = l._concepts[-1]
    l.supremum.__class__ = Supremum
    for a in l.infimum.upper_neighbors:
        a.__class__ = Atom  
    return l
    
l = Context.fromstring(EXAMPLE).lattice
d = to_dict(l)
ll = from_dict(d)

Format example (could use json, pprint+ast.literal_eval, yaml, etc. for transport/storage):

{
    'objects': (
        '1sg', '1pl', '2sg', '2pl', '3sg', '3pl'
    ),
    'properties': (
        '+1', '-1', '+2', '-2', '+3', '-3', '+sg', '+pl', '-sg', '-pl'
    ),
    'context': [
        (0, 3, 5, 6, 9),
        (0, 3, 5, 7, 8),
        (1, 2, 5, 6, 9),
        (1, 2, 5, 7, 8),
        (1, 3, 4, 6, 9),
        (1, 3, 4, 7, 8)
    ],
    'concepts': [
        ((), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (1, 2, 3, 4, 5, 6), ()),
        ((0,), (0, 3, 5, 6, 9), (7, 8, 9), (0,)),
        ((1,), (0, 3, 5, 7, 8), (7, 10, 11), (0,)),
        ((2,), (1, 2, 5, 6, 9), (8, 12, 13), (0,)),
        ((3,), (1, 2, 5, 7, 8), (10, 12, 14), (0,)),
        ((4,), (1, 3, 4, 6, 9), (9, 13, 15), (0,)),
        ((5,), (1, 3, 4, 7, 8), (11, 14, 15), (0,)),
        ((0, 1), (0, 3, 5), (18, 19), (1, 2)),
        ((0, 2), (5, 6, 9), (16, 18), (1, 3)),
        ((0, 4), (3, 6, 9), (16, 19), (1, 5)),
        ((1, 3), (5, 7, 8), (17, 18), (2, 4)),
        ((1, 5), (3, 7, 8), (17, 19), (2, 6)),
        ((2, 3), (1, 2, 5), (18, 20), (3, 4)),
        ((2, 4), (1, 6, 9), (16, 20), (3, 5)),
        ((3, 5), (1, 7, 8), (17, 20), (4, 6)),
        ((4, 5), (1, 3, 4), (19, 20), (5, 6)),
        ((0, 2, 4), (6, 9), (21,), (8, 9, 13)),
        ((1, 3, 5), (7, 8), (21,), (10, 11, 14)),
        ((0, 1, 2, 3), (5,), (21,), (7, 8, 10, 12)),
        ((0, 1, 4, 5), (3,), (21,), (7, 9, 11, 15)),
        ((2, 3, 4, 5), (1,), (21,), (12, 13, 14, 15)),
        ((0, 1, 2, 3, 4, 5), (), (), (18, 19, 20, 16, 17))
    ]
}

from concepts.

EdwinWenink avatar EdwinWenink commented on July 19, 2024

That response is more than I could hope for! Thanks so much.

I indeed found the same documentation on pickle. So far I tried raising the recursion limit gradually up to 50.000, but without luck. Trying 100.000 now, but testing is very slow due to testing with large concept lattices. FYI I'm running FCA on 500 objects with 30 attributes currently, which results in a lattice with 9 atoms and 443020 concepts. (But do note that pickling also fails with way smaller lattices). I am also testing with the latest pickle protocol currently, because according to the documentation it is designed to deal with very large objects.

But the work you did above is definitely the way to go. I was already considered using json format, but that solution required writing a custom format (like you elegantly did just now) and I didn't know how to proceed. I will try out your idea on a big lattice when I have time and report back to you!

from concepts.

EdwinWenink avatar EdwinWenink commented on July 19, 2024

Works like a charm! First I checked that lattices are equal to their reconstruction after deserialization. Then I tested performance on a big lattice. Computing the lattice took me 4+ hours, loading it afterwards from the json file more or less 20 seconds.

I think this is a really cool feature to integrate in your package. I'm for example writing a small "query by navigation" program for navigating lattices based on your package, and then it is very inconvenient if you have to recompute the lattice every time you run the program. Thanks for your effort!

from concepts.

xflr6 avatar xflr6 commented on July 19, 2024

Great, thanks. +1 that this will be a nice feature.

Will start to work on a branch to add this for the next version.

Considering adding 'ordered': True to the dict to rely/depend on the order of concepts and upper/lower neighbors (to otherwise ensure the order via extra sorting)

from concepts.

xflr6 avatar xflr6 commented on July 19, 2024

merged most of the implementation, see https://github.com/xflr6/concepts/compare/f0a7069153b487aea572bca133f984c5de111056..de2993a2df9d515a84883c19ab788cc274059d3b

from concepts.

xflr6 avatar xflr6 commented on July 19, 2024

Can you maybe check the (speed) difference between these two for a big lattice in master?

c = concepts.Context(...)
assert c.lattice
c.tojson(f)

concepts.Context.fromjson(f)
# vs.
concepts.Context.fromjson(f, raw=True)

from concepts.

EdwinWenink avatar EdwinWenink commented on July 19, 2024

Nice that you merged it! I can test that, but I only have time next week to do so. Feel free to remind me if after next week I have not responded yet :-)

from concepts.

xflr6 avatar xflr6 commented on July 19, 2024

No rush, just curious.. :) I think I'm getting close: tell me if you have any concerns, especially about the api.

from concepts.

xflr6 avatar xflr6 commented on July 19, 2024

released concepts 0.9

from concepts.

EdwinWenink avatar EdwinWenink commented on July 19, 2024

Hi there, I ran a test on the biggest concept lattice I currently have available, with the following code:

import timeit

setup = '''
from concepts import tools, contexts
import concepts
load = concepts.Context.fromjson
'''

print(min(timeit.Timer('load("lattice_sorted.json", raw=False)', setup=setup).repeat(100, 1)))

I tested on a lattice with 143906 concepts, once with raw=False and once with raw=True.
This gave the following (fastest) run times:

For raw=False 1.03071959 sec.

For raw=True 1.03416534 sec.

However, I must make a note here. Your approach is to serialize the whole context + lattice to json (which makes sense!), but in my previous work I used the code from here to serialize the lattice only.
I wanted to avoid recomputing my big lattice, only in order to serialize it together with its context.
What I did instead is sorting the json of the lattice on keys for the above testing.
I'm not sure if that is the only change you made, and whether the above results generalize to the code in master.

The specs of the pc on which I tested are:

  • 8 GB RAM
  • 3.10GHz i5-4440

from concepts.

xflr6 avatar xflr6 commented on July 19, 2024

Thanks. I should have noted that the released implementation differs from the prototype above in using 'lattice' as key for the concepts list instead of 'concepts', so to re-use the json-file you have created with the prototype you might need to rename 'concepts' to 'lattice' (judging from the numbers, you are probably only loading the context currently, hence there is no difference between the default raw=False and raw=True). To make this more explicit you can add require_lattice=True to your load().

Btw, the order of the keys does not matter with either setting (see here and here for details).

from concepts.

EdwinWenink avatar EdwinWenink commented on July 19, 2024

Ah that explains a lot. Here are the updated times, but now with the best time for 10 repetitions rather than 100:

For raw=False 28.72975 sec.

For raw=True 354.92165 sec.

Hope that info is helpful :-)

from concepts.

xflr6 avatar xflr6 commented on July 19, 2024

Thanks, definitely helpful. That's consistent with my experiments.

I just wanted to confirm that relying on the order in the serialization significantly speeds things up (otherwise I would have considered to make the raw=True behaviour the default and only code path). I think raw=True might be useful for using other tools to calculate the lattice and convert it into the format here (but whithout the need to replicate the order used/expected by this library).

from concepts.

Related Issues (8)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.