Comments (23)
Seems like both the libcds libraries are unmaintained now... :-( That seems like a problem for the long-term maintainability of the system. Are there any maintained libraries that could be used in their place?
from hdt-cpp.
Hi, I am a Debian developer and work directly with @kjetilk on getting semantic web related projects into Debian.
In case there are any questions specifically about integration with Debian (and, by extension, with Ubuntu) I am now following this discussion and happy to help where I can here.
from hdt-cpp.
Since Debian is my main motivation for pestering you about this, I'll just note that the next relevant deadline is the Debian Import Freeze of Ubuntu on Feb 18th. If you'd like to have HDT in the next Ubuntu release, it has to go into Debian Sid ten days before that. To do that, it must be submitted by an established Debian developer into the NEW queue for approval. My own packages have gone through the NEW queue of Debian in a couple of days, but packages are also known to get stuck there for a couple of months. Still, it seems there is a good chance of getting it into Ubuntu if it is in shape by early January. I have someone I can ask about packaging it.
from hdt-cpp.
Just to leave another heads-up, Debian has a release coming up with a freeze for new packages just in the beginning of the new year.
It would be great for the future if the issues that are preventing packaging could be ironed out. Perhaps a label for that could be created?
from hdt-cpp.
Attempting to replace libcds with sdsl-lite, I see that one area that needs work is the compressed string dictionaries implementation. Focusing on "libhdt/src/libdcs/fmindex/SSA.h", I see the set of headers mentioned in #19 (comment). However, it is not clear to me how everything might map to the elements of the SDSL Lite library. I could use some advice on how libcds compares with SDSL Lite.
Headers to change (assuming I get the class changes right):
- <SequenceBuilder.h> -> <sdsl/vectors.hpp>
- <Sequence.h> -> <sdsl/vectors.hpp>
- <BitSequenceBuilder.h> -> <sdsl/vectors.hpp>
- <BitSequence.h> -> <sdsl/vectors.hpp>
- <Mapper.h> -> <sdsl/vectors.hpp>
The typedef uchar can just be replaced with explicit code:
- uchar -> unsigned char
Class changes:
- SequenceBuilder -> int_vector<?>
- BitSequenceBuilder -> bit_vector
- Sequence -> int_vector<?>
- BitSequence -> bit_vector
It seems that if the class changes can be figured out, then I can focus on the methods and aligning the code to use methods of the new classes.
Is it reasonable to expect that these libcds classes could be swapped out one-for-one with SDSL Lite classes, or am I missing a more fundamental difference in how libcds and SDSL Lite approach sequences?
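One difference that may matter for a one-for-one swap: as far as I understand it, a libcds BitSequence bundles the bits together with rank/select queries, while SDSL Lite keeps the plain bit_vector separate from its rank and select support structures, and the two libraries may not agree on whether rank includes position i. The sketch below is a dependency-free toy, not code from either library: ToyBits, ToyRankSupport and ToyBitSequence are hypothetical names, and both the inclusive rank1 convention and the half-open rank convention are assumptions to check against the real headers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a plain bit container (the role a bit_vector would play).
using ToyBits = std::vector<bool>;

// Toy rank support kept separate from the bits. Assumed half-open
// convention: rank(i) = number of 1-bits in positions [0, i).
class ToyRankSupport {
public:
    explicit ToyRankSupport(const ToyBits& bits) : prefix_(bits.size() + 1, 0) {
        for (std::size_t i = 0; i < bits.size(); ++i) {
            prefix_[i + 1] = prefix_[i] + (bits[i] ? 1 : 0);
        }
    }
    std::size_t rank(std::size_t i) const {
        assert(i < prefix_.size());  // i may equal the number of bits
        return prefix_[i];
    }
private:
    std::vector<std::size_t> prefix_;  // prefix_[i] = ones in [0, i)
};

// Adapter exposing a bundled interface with hypothetical lower-case names.
// Assumed inclusive convention: rank1(i) counts 1-bits in positions [0, i].
class ToyBitSequence {
public:
    explicit ToyBitSequence(ToyBits bits)
        : bits_(std::move(bits)), rank_(bits_) {}
    bool access(std::size_t i) const { return bits_.at(i); }
    std::size_t getLength() const { return bits_.size(); }
    // Shift by one to translate the inclusive query into the half-open one.
    std::size_t rank1(std::size_t i) const { return rank_.rank(i + 1); }
    std::size_t rank0(std::size_t i) const { return (i + 1) - rank1(i); }
private:
    ToyBits bits_;         // declared first so it is initialized before rank_
    ToyRankSupport rank_;
};
```

If the conventions really differ like this, the port is less a matter of renaming classes and more a matter of writing a small adapter per class and checking every index passed to rank and select.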
from hdt-cpp.
Hi @kjetilk,
I'd definitely be in favor of migrating to libcds2; however, @joachimvh and @laurensdv have looked at it briefly in the past, and it seemed to entail rather major changes in the code. Guys, any more specific comments on that?
Thanks,
Ruben
from hdt-cpp.
Hi yes, I remember we discussed it briefly there.
Switching this library is likely to have an impact on the entire hdt-cpp library.
I am not sure if there are any plans by the RDF-HDT team to still do so.
Since then nothing has changed as far as I know.
from hdt-cpp.
From a quick look at the second version of the library, the interfaces seem the same (except they now capitalize all function names for some reason), so using the new version might be viable. That assumes the interfaces still do the same thing, of course, and that the entire library gets implemented.
from hdt-cpp.
@laurensdv Are you sure you're not confusing the Compact Data Structure library, with the Concurrent Data Structure Library? The former is the one used in HDT-CPP, the latter is something else entirely, but they both call themselves libcds. :-)
from hdt-cpp.
Yes, in the beginning I did mix them up (check this confusion and later fix here: https://github.com/laurensdv/hdt-cpp/tree/libcds2), but nonetheless:
https://github.com/rdfhdt/hdt-cpp/tree/master/libcds-v1.0.12
should be exactly the same as
https://github.com/fclaude/libcds
of which indeed
https://github.com/fclaude/libcds2
is the newer version
switching these libraries gave strange results during compilation and as the original developers announced a new version (back then) I stopped this track.
:)
and to make the confusion complete, there is also libCSD, where there is a new version under development:
https://github.com/migumar2/libCSD
which was supposed to be integrated with hdt-cpp, and should have an important impact on the library as well.
Which was then unfortunately misspelled in the HDT-cpp source code as DCS:
https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (the source files inside are indeed CSD -> which is I think the StringDictionary in the new version)
from hdt-cpp.
Uh-oh, OK! This is fairly messy. ;-) So, every reason to clear it up, then, and to get it into a distro, it would need to be done one way or the other.
I did a diff between the upstream libcds and the libcds in this repo, and the diff, filtered for files that are only in one or the other, is still quite substantial, see libcdsupstream.patch.txt.
I don't know what these differences are; they could be upstream changes. That's another option, I guess: bring it into sync with upstream libcds 1, so that it could be packaged separately. That's the bottom line, really: it must be possible to package different libraries from different authors separately.
from hdt-cpp.
Indeed,
but note that libCSD, which is also a library that should be separated, also has a copy of libcds (version 1?) embedded in its source code rather than as a separate package.
https://github.com/migumar2/libCSD/tree/master/libcds
I did not find a repo that has the same sources as the embedded libCSD's
so most likely https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (libCSD) also depends on libcds, but as this dependency is embedded in the source code (or via the makefile) it probably wasn't necessary to include it within the subfolder of this library.
from hdt-cpp.
Hi guys,
First thanks for your contributions :-)
I think that the libcds and libcds2 are similar in essence but I'm not sure whether the libcds2 implements everything we need, in particular the FM-Index that is used from the linkeddatafragments to do text queries.
I'll ping Miguel Angel to see if he knows which one is better to clean.
Mario.
from hdt-cpp.
Hi Mario,
I think the FM-index is implemented by the libCSD library (not libcds):
new: https://github.com/migumar2/libCSD/blob/master/StringDictionaryFMINDEX.h
current: https://github.com/rdfhdt/hdt-cpp/blob/master/hdt-lib/src/libdcs/CSD_FMIndex.h
and obviously both of these depend on libcds 1
new: https://github.com/migumar2/libCSD/tree/master/libcds
current: https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (libCSD) is in the same source folder as hdt-lib so probably linked against the same libcds (this one: https://github.com/rdfhdt/hdt-cpp/tree/master/libcds-v1.0.12)
which have in essence not much changed:
https://github.com/migumar2/libCSD/blob/master/FMIndex/SSA.h
vs
https://github.com/rdfhdt/hdt-cpp/blob/master/hdt-lib/src/libdcs/fmindex/SSA.h
both seem to be using following parts of the libcds (1) library:
#include <SequenceBuilder.h>
#include <Sequence.h>
#include <BitSequenceBuilder.h>
#include <BitSequence.h>
#include <Mapper.h>
in the new libcds 2 these are unchanged except for the capitalization:
https://github.com/fclaude/libcds2/tree/master/include/libcds2/immutable
but comparing e.g.
https://github.com/fclaude/libcds/blob/master/include/BitSequence.h
with
https://github.com/fclaude/libcds2/blob/master/include/libcds2/immutable/bitsequence.h
you can see that the entire underlying data structure and interfacing has changed
for example change from size_t to cds_word and many other things.
so if libcds 1 and 2 are similar in essence, the FMIndex files in both the new and current version should be convertible to the new version, no?
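To make the question concrete, here is a rough sketch of what a thin compatibility layer could look like if the differences really were limited to capitalized method names and a different word type. Everything here is a hypothetical stand-in: NewStyleSequence and LegacySequenceAdapter are invented for illustration, cds_word is assumed to be a 64-bit unsigned type, and none of the signatures are copied from the real libcds or libcds2 headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumption: a 64-bit unsigned word type, standing in for cds_word.
using cds_word = std::uint64_t;

// Hypothetical "new style" class with capitalized methods and cds_word
// everywhere. NOT the real libcds2 declaration.
class NewStyleSequence {
public:
    explicit NewStyleSequence(std::vector<cds_word> v) : data_(std::move(v)) {}
    cds_word Access(cds_word i) const { return data_.at(i); }
    cds_word GetLength() const { return data_.size(); }
    // Number of occurrences of symbol in positions [0, i], naive scan.
    cds_word Rank(cds_word symbol, cds_word i) const {
        cds_word count = 0;
        for (cds_word j = 0; j <= i && j < data_.size(); ++j) {
            if (data_[j] == symbol) ++count;
        }
        return count;
    }
private:
    std::vector<cds_word> data_;
};

// Adapter that lets old call sites keep lower-case names and size_t.
class LegacySequenceAdapter {
public:
    explicit LegacySequenceAdapter(NewStyleSequence s) : seq_(std::move(s)) {}
    unsigned int access(std::size_t i) const {
        return static_cast<unsigned int>(seq_.Access(i));
    }
    std::size_t getLength() const {
        return static_cast<std::size_t>(seq_.GetLength());
    }
    std::size_t rank(unsigned int c, std::size_t i) const {
        return static_cast<std::size_t>(seq_.Rank(c, i));
    }
private:
    NewStyleSequence seq_;
};
```

An adapter like this only works if the semantics behind the names are unchanged. If the underlying data structures and serialization changed as well, as the comparison of the two BitSequence headers suggests, the FM-index code would need more than a renaming pass.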
from hdt-cpp.
I'll just say thanks for putting this on the radar! When it gets to these details, I don't have anything more useful to contribute, so I'll leave it to you to see if you can do something sensible about it as time allows. :-)
from hdt-cpp.
Good suggestion; created the debian-package label for this purpose.
Perhaps we should make it a milestone as well.
from hdt-cpp.
@fclaude Do you have any suggestions regarding libcds / libcds2 ?
from hdt-cpp.
Maybe https://github.com/simongog/sdsl-lite/ is an alternative?
from hdt-cpp.
from hdt-cpp.
Thanks a lot for that update, @fclaude ! I see sdsl-lite is already in Debian, I think this could be a very good thing for HDT. Perhaps change the title of this to "Migrate to sdsl-lite"?
from hdt-cpp.
BTW, on the topic of Debian, they seem to have a freeze in February, so it needs to be in well before that.
from hdt-cpp.
I changed the title to reflect what seems to be the actual step forward. :-)
from hdt-cpp.
One thing to note here is that switching to SDSL Lite seems to require changing the license for HDT-CPP. Currently, HDT-CPP is using the LGPL license per https://github.com/rdfhdt/hdt-cpp#license. SDSL Lite uses GPLv3 per https://github.com/simongog/sdsl-lite#licensing. Per https://www.gnu.org/licenses/gpl-faq.en.html#IfLibraryIsGPL linking HDT-CPP to SDSL Lite would require that HDT-CPP be licensed under the more restrictive GPL instead of LGPL.
libcds is LGPL per https://github.com/fclaude/libcds/blob/master/COPYING
from hdt-cpp.