kjetilk commented on May 10, 2024

Seems like both the libcds libraries are unmaintained now... :-( That seems like a problem for the long-term maintainability of the system. Are there any maintained libraries that could be used in their place?

from hdt-cpp.

jonassmedegaard commented on May 10, 2024

Hi, I am a Debian developer and work directly with @kjetilk on getting semantic web related projects into Debian.

In case there are any questions specifically about integration with Debian (and, by extension, with Ubuntu), I am now following this discussion and happy to help where I can.

kjetilk commented on May 10, 2024

Since Debian is my main motivation for pestering you about this, I'll just note that the next relevant deadline is Ubuntu's Debian Import Freeze on Feb 18th. If you'd like to have HDT in the next Ubuntu release, it has to go into Debian Sid ten days before that. To do that, it must be submitted by an established Debian developer into the NEW queue for approval. My own packages have gone through Debian's NEW queue in a couple of days, but packages are also known to get stuck there for a couple of months. Still, it seems there is a good chance of getting it into Ubuntu if it is in shape by early January. I have someone I can ask about packaging it.

kjetilk commented on May 10, 2024

Just to leave another heads-up: Debian has a release coming up, with a freeze for new packages at the very beginning of the new year.

It would be great for the future if the issues that are preventing packaging could be ironed out. Perhaps a label could be created for that?

donpellegrino commented on May 10, 2024

Attempting to replace libcds with sdsl-lite, I see that one area that needs work is the compressed string dictionaries implementation. Focusing on "libhdt/src/libdcs/fmindex/SSA.h", I see the set of headers mentioned in #19 (comment). However, it is not clear to me how everything might map to elements of the SDSL Lite library. I could use some advice on how libcds compares with SDSL Lite.

Headers to change (assuming I get the class changes right):

  • <SequenceBuilder.h> -> <sdsl/vectors.hpp>
  • <Sequence.h> -> <sdsl/vectors.hpp>
  • <BitSequenceBuilder.h> -> <sdsl/vectors.hpp>
  • <BitSequence.h> -> <sdsl/vectors.hpp>
  • <Mapper.h> -> <sdsl/vectors.hpp>

The typedef uchar can simply be replaced with the underlying type:

  • uchar -> unsigned char

Class changes:

  • SequenceBuilder -> int_vector<?>
  • BitSequenceBuilder -> bit_vector
  • Sequence -> int_vector<?>
  • BitSequence -> bit_vector

It seems that if the class changes can be figured out, then I can focus on the methods and aligning the code to use methods of the new classes.
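A note on the BitSequence -> bit_vector part of the mapping: an FM-index such as the one in SSA.h relies on rank/select-style queries, not only on element access, so a swap is only safe if the replacement answers those queries with the same conventions. As far as we know, sdsl-lite's rank support classes count ones in the exclusive prefix [0, i), whereas the sketch below assumes that libcds's rank1(i) is inclusive of position i and that select1(j) is 1-based; both conventions should be checked against the actual headers. The names NaiveBitSequence and check_against_reference are hypothetical: the idea is a small reference oracle that any candidate replacement can be compared against.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical reference ("oracle") bitmap for migration tests.
// Assumed convention, to be checked against libcds's BitSequence.h:
//   rank1(i)   = number of 1 bits in positions [0, i] (inclusive)
//   select1(j) = position of the j-th 1 bit, with j starting at 1
class NaiveBitSequence {
public:
    explicit NaiveBitSequence(std::vector<bool> bits) : bits_(std::move(bits)) {}

    std::size_t size() const { return bits_.size(); }

    bool access(std::size_t i) const { return bits_.at(i); }

    std::size_t rank1(std::size_t i) const {
        if (i >= bits_.size()) throw std::out_of_range("rank1: i out of range");
        std::size_t ones = 0;
        for (std::size_t k = 0; k <= i; ++k) ones += bits_[k] ? 1 : 0;
        return ones;
    }

    std::size_t select1(std::size_t j) const {
        if (j == 0) throw std::out_of_range("select1: j is 1-based");
        std::size_t seen = 0;
        for (std::size_t k = 0; k < bits_.size(); ++k) {
            // ++seen only runs when bits_[k] is set (short-circuit).
            if (bits_[k] && ++seen == j) return k;
        }
        throw std::out_of_range("select1: fewer than j ones");
    }

private:
    std::vector<bool> bits_;
};

// Compares any candidate type exposing size/access/rank1/select1 (for example a
// hypothetical adapter around an sdsl-lite bit_vector) with the oracle, and
// checks that select1(rank1(p)) == p for every set bit p.
template <class Candidate>
void check_against_reference(const Candidate& cand, const NaiveBitSequence& ref) {
    assert(cand.size() == ref.size());
    for (std::size_t p = 0; p < ref.size(); ++p) {
        assert(cand.access(p) == ref.access(p));
        assert(cand.rank1(p) == ref.rank1(p));
        if (ref.access(p)) assert(cand.select1(ref.rank1(p)) == p);
    }
}
```

A hypothetical adapter around an sdsl-lite bit_vector plus its rank/select support structures, exposing size, access, rank1 and select1, could then be passed to check_against_reference on random bitmaps before touching SSA.h.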

Is it reasonable to expect that these libcds classes could be swapped out one-for-one with SDSL Lite classes, or am I missing a more fundamental difference in how libcds and SDSL Lite approach sequences?
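On whether the classes swap one-for-one: as we understand it, a libcds Sequence is more than a packed array, because it also answers rank and select queries for arbitrary symbols. sdsl-lite's int_vector<> stores the values but only provides element access, so the closer sdsl-lite counterparts are probably its wavelet trees (for example wt_int or wt_huff), which add rank/select on top of storage, with their own argument order and prefix conventions that should be checked. The naive model below (the name NaiveSequence is hypothetical, and the inclusive rank and 1-based select conventions are assumptions) only spells out the operations a replacement must cover; it is not a proposed implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical naive model of a general-alphabet sequence, listing what a
// libcds-style Sequence is used for beyond plain storage:
//   access(i)    -> symbol at position i          (a packed array covers this)
//   rank(c, i)   -> occurrences of c in [0, i]    (inclusive: an assumption)
//   select(c, j) -> position of the j-th c        (1-based: an assumption)
// rank/select need an extra index, e.g. a wavelet tree, in a real library.
class NaiveSequence {
public:
    explicit NaiveSequence(std::vector<std::uint32_t> symbols)
        : symbols_(std::move(symbols)) {}

    std::size_t size() const { return symbols_.size(); }

    std::uint32_t access(std::size_t i) const { return symbols_.at(i); }

    std::size_t rank(std::uint32_t c, std::size_t i) const {
        if (i >= symbols_.size()) throw std::out_of_range("rank: i out of range");
        std::size_t count = 0;
        for (std::size_t k = 0; k <= i; ++k) count += (symbols_[k] == c);
        return count;
    }

    std::size_t select(std::uint32_t c, std::size_t j) const {
        if (j == 0) throw std::out_of_range("select: j is 1-based");
        std::size_t seen = 0;
        for (std::size_t k = 0; k < symbols_.size(); ++k) {
            if (symbols_[k] == c && ++seen == j) return k;
        }
        throw std::out_of_range("select: fewer than j occurrences");
    }

private:
    std::vector<std::uint32_t> symbols_;
};
```

The design choice here is deliberate: a plain int_vector<> would be enough for access(i), so any place in the HDT code that only calls access could map to it directly, while call sites using rank or select would need a richer structure.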

RubenVerborgh commented on May 10, 2024

Hi @kjetilk,

I'd definitely be in favor of migrating to libcds2; however, @joachimvh and @laurensdv have looked at it briefly in the past, and it seemed to entail rather major changes in the code. Guys, any more specific comments on that?

Thanks,

Ruben

laurensdv commented on May 10, 2024

Hi, yes, I remember we discussed it briefly there.

Switching this library is likely to have an impact on the entire hdt-cpp library.
I am not sure if there are any plans by the RDF-HDT team to still do so.

Since then nothing has changed as far as I know.

joachimvh commented on May 10, 2024

From a quick look at the second version of the library, the interfaces seem the same (except that all function names are now capitalized, for some reason), so using the new version might be viable, assuming of course that the interfaces still behave the same and the entire library gets implemented.

kjetilk commented on May 10, 2024

@laurensdv Are you sure you're not confusing the Compact Data Structure library with the Concurrent Data Structure library? The former is the one used in HDT-CPP; the latter is something else entirely, but they both call themselves libcds. :-)

laurensdv commented on May 10, 2024

Yes, in the beginning I did mix them up (see the confusion and the later fix here: https://github.com/laurensdv/hdt-cpp/tree/libcds2), but nonetheless:
https://github.com/rdfhdt/hdt-cpp/tree/master/libcds-v1.0.12
should be exactly the same as
https://github.com/fclaude/libcds
of which
https://github.com/fclaude/libcds2
is indeed the newer version.
Switching these libraries gave strange results during compilation, and as the original developers had announced a new version (back then), I dropped this approach.

:)

And to make the confusion complete, there is also libCSD, of which a new version is under development:
https://github.com/migumar2/libCSD
It was supposed to be integrated with hdt-cpp and should have an important impact on the library as well.
Unfortunately, its name was misspelled in the hdt-cpp source code as DCS:
https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (the source files inside are indeed CSD, which I think corresponds to the StringDictionary in the new version)

kjetilk commented on May 10, 2024

Uh-oh, OK! This is fairly messy. ;-) So there is every reason to clear it up, and to get into a distro, it would need to be done one way or the other.

I did a diff between the upstream libcds and the libcds in this repo, and the diff, after filtering out files that exist in only one or the other, is still quite substantial; see libcdsupstream.patch.txt.

I don't know what these differences are; they could be upstream changes. That's another option, I guess: bring it into sync with upstream libcds 1, so that it could be packaged separately. That's the bottom line, really: it must be possible to package different libraries from different authors separately.

laurensdv commented on May 10, 2024

Indeed, but note that libCSD, which is also a library that should be separated out, has its own copy of libcds (version 1?) embedded in the source code rather than as a separate package.

https://github.com/migumar2/libCSD/tree/master/libcds

I did not find a repository with the same sources as the libcds embedded in libCSD.

So most likely https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (libCSD) also depends on libcds, but as this dependency is already available in the source tree (or via the makefile), it probably wasn't necessary to include a copy within this library's subfolder.

MarioAriasGa commented on May 10, 2024

Hi guys,

First thanks for your contributions :-)

I think that libcds and libcds2 are similar in essence, but I'm not sure whether libcds2 implements everything we need, in particular the FM-Index that is used by Linked Data Fragments for text queries.

I'll ping Miguel Angel to see if he knows which one would be better to clean up.

Mario.

laurensdv commented on May 10, 2024

Hi Mario,

I think the FM-index is implemented by the libCSD library (not libcds):

new: https://github.com/migumar2/libCSD/blob/master/StringDictionaryFMINDEX.h
current: https://github.com/rdfhdt/hdt-cpp/blob/master/hdt-lib/src/libdcs/CSD_FMIndex.h

And, obviously, both of these depend on libcds 1:
new: https://github.com/migumar2/libCSD/tree/master/libcds
current: https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (libCSD) is in the same source folder as hdt-lib, so it is probably linked against the same libcds (this one: https://github.com/rdfhdt/hdt-cpp/tree/master/libcds-v1.0.12)

These have not changed much in essence:
https://github.com/migumar2/libCSD/blob/master/FMIndex/SSA.h
vs
https://github.com/rdfhdt/hdt-cpp/blob/master/hdt-lib/src/libdcs/fmindex/SSA.h

Both seem to use the following parts of the libcds (1) library:

#include <SequenceBuilder.h>
#include <Sequence.h>
#include <BitSequenceBuilder.h>
#include <BitSequence.h>

#include <Mapper.h>

In the new libcds 2, these are unchanged except for the capitalization:
https://github.com/fclaude/libcds2/tree/master/include/libcds2/immutable

but comparing e.g.
https://github.com/fclaude/libcds/blob/master/include/BitSequence.h
with
https://github.com/fclaude/libcds2/blob/master/include/libcds2/immutable/bitsequence.h

you can see that the entire underlying data structure and interface have changed, for example the switch from size_t to cds_word, among many other things.

So if libcds 1 and 2 are similar in essence, shouldn't the FMIndex files in both the new and current versions be convertible to libcds 2?
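To illustrate why the FM-index is tied to the Sequence/BitSequence dependencies: counting the occurrences of a pattern with an FM-index is usually done by backward search, which issues two rank queries over the BWT per pattern symbol (one for each end of the current range). The sketch below is a textbook toy (the name ToyFMIndex is hypothetical; it builds the suffix array by naive sorting and answers rank with a linear scan), not the code in SSA.h or StringDictionaryFMINDEX.h. In a real index, occ() is exactly the query that a succinct sequence such as a libcds Sequence, or an sdsl-lite wavelet tree, would answer.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Textbook FM-index count via backward search; a toy, NOT the SSA.h code.
// Assumes the text ends with a unique sentinel '$' that sorts before every
// other symbol. Construction is naive (sorting all suffixes) for clarity.
struct ToyFMIndex {
    std::string bwt;                   // Burrows-Wheeler transform of the text
    std::array<std::size_t, 256> C{};  // C[c] = number of text symbols smaller than c

    explicit ToyFMIndex(const std::string& text) {
        const std::size_t n = text.size();
        std::vector<std::size_t> sa(n);  // suffix array
        for (std::size_t i = 0; i < n; ++i) sa[i] = i;
        std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
            return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
        });
        bwt.resize(n);
        for (std::size_t i = 0; i < n; ++i) bwt[i] = text[(sa[i] + n - 1) % n];

        std::array<std::size_t, 256> freq{};
        for (unsigned char ch : text) ++freq[ch];
        std::size_t smaller = 0;
        for (std::size_t c = 0; c < 256; ++c) {
            C[c] = smaller;
            smaller += freq[c];
        }
    }

    // occ(c, i) = occurrences of c in bwt[0, i). Here it is a linear scan; in a
    // real index this is the rank query answered by a succinct sequence.
    std::size_t occ(unsigned char c, std::size_t i) const {
        return static_cast<std::size_t>(
            std::count(bwt.begin(), bwt.begin() + static_cast<std::ptrdiff_t>(i),
                       static_cast<char>(c)));
    }

    // Number of occurrences of `pattern` in the text.
    std::size_t count(const std::string& pattern) const {
        std::size_t lo = 0, hi = bwt.size();  // current suffix-array range [lo, hi)
        for (auto it = pattern.rbegin(); it != pattern.rend() && lo < hi; ++it) {
            const unsigned char c = static_cast<unsigned char>(*it);
            lo = C[c] + occ(c, lo);
            hi = C[c] + occ(c, hi);
        }
        return hi - lo;
    }
};
```

For example, ToyFMIndex("abracadabra$").count("abra") returns 2, and the BWT it builds is "ard$rcaaaabb". Swapping the library underneath mostly means giving occ() a fast implementation with the same semantics.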

kjetilk commented on May 10, 2024

I'll just say thanks for putting this on the radar! When it gets to these details, I don't have anything more useful to contribute, so I'll leave it to you to see if you can do something sensible about it as time allows. :-)

RubenVerborgh commented on May 10, 2024

Good suggestion; created the debian-package label for this purpose.

Perhaps we should make it a milestone as well.

akuckartz commented on May 10, 2024

@fclaude Do you have any suggestions regarding libcds / libcds2 ?

akuckartz commented on May 10, 2024

Maybe https://github.com/simongog/sdsl-lite/ is an alternative?

fclaude commented on May 10, 2024

kjetilk commented on May 10, 2024

Thanks a lot for that update, @fclaude ! I see sdsl-lite is already in Debian, I think this could be a very good thing for HDT. Perhaps change the title of this to "Migrate to sdsl-lite"?

kjetilk commented on May 10, 2024

BTW, on the topic of Debian, they seem to have a freeze in February, so it needs to be in well before that.

kjetilk commented on May 10, 2024

I changed the title to reflect what seems to be the actual step forward. :-)

donpellegrino commented on May 10, 2024

One thing to note here is that switching to SDSL Lite seems to require changing the license for HDT-CPP. Currently, HDT-CPP is using the LGPL license per https://github.com/rdfhdt/hdt-cpp#license. SDSL Lite uses GPLv3 per https://github.com/simongog/sdsl-lite#licensing. Per https://www.gnu.org/licenses/gpl-faq.en.html#IfLibraryIsGPL linking HDT-CPP to SDSL Lite would require that HDT-CPP be licensed under the more restrictive GPL instead of LGPL.

libcds is LGPL per https://github.com/fclaude/libcds/blob/master/COPYING
