itkach / slob Goto Github PK


Data store for Aard 2

License: GNU General Public License v3.0

Python 100.00%

slob's Issues

Magnet link missing for dewiki-20181104-vol-0?.slob

Magnet link is missing for dewiki-20181104-vol-0?.slob, at least I cannot see it.

Tamil Wiktionary download link is not working.

I got this error when I tried to download the file below:

Not Found
The requested URL was not found on this server.

Tamil
file: tawiktionary-20200104.slob
size: 0.54 GiB
sha1: in tawiktionary-20200104.slob.sha
url: http://ftp.halifax.rwth-aachen.de/aarddict/tawiki/
note: Dictionary does support LZMA2 compression, external picture download, language links, geo links.
note: Maintainer: AardFeed at web.de
note: Questions: https://groups.google.com/forum/#!forum/aarddict
note: Thanks to RWTH Aachen University https://www.rwth-aachen.de/ for high speed mirroring

Please fix the problem. Kindly upload the file to Google Drive. Thank you.

Is it possible to convert slob files to stardict?

Hello folks,
Is it possible to convert slob files to StarDict using Python or another tool?

[Bugs / Feature suggestion] Outdated and additional links to existing slob files

Hello @itkach, @MHBraun and @francwalter,

many of the slob-files linked to on the GitHub-webpage https://github.com/itkach/slob/wiki/Dictionaries
have newer versions available at http://ftp.halifax.rwth-aachen.de/aarddict/.
See list below for some examples; note, that this list is not exhaustive (i.e. incomplete).

Furthermore there are slob-files available at ftp.halifax/aarddict, which are not listed at all on your dictionaries list for Aard2, e.g. the alswiki.

Then there are a few outdated slob-files at ftp.halifax/aarddict, which have newer versions available on the dictionaries list. Unfortunately I am not able to contact Markus Braun (@MarkusHBraun) via his preferred communication channel (aarddict Googlegroup) in order to notify him, as I do not use Googlegroups. Maybe it makes sense for you all to communicate in order to keep both locations in sync and up-to-date.

Also note that some of the links to slob-files hosted on https://mega.co.nz/ (or https://mega.nz/) lead to unavailable files on Mega, e.g.
en-m-wikivoyage-org-20141125.lzma2.slob https://mega.co.nz/#!rdAUXJxZ!-HqEzprdigPpSplR-9AWjxrdkVKe6_OoRgRJ7PdZ0_0,
de-m-wikivoyage-org-20141125.lzma2.slob https://mega.nz/#!KN5g0AxL!U8UitqlxFGV9h09W_lSTgNHU4rSeIxK21QZYIJbK9pY and
pt-m-wikivoyage-org-20141124.lzma2.slob https://mega.nz/#!bAQ3DLTC!291ojqdjuADmbnbZvjQMeUjM2W2HSmeNc0jynxO60EM.

A few links also use a URL shortener (bit.ly or goo.gl), which is an unnecessary indirection that obscures the original URL. Please expand them to their full URLs.
Examples are all <??wiktionary-20160526.lzma2.slob> and all <??-m-wiktionary-org-2015012?.lzma2.slob> files.
Special cases with extra caveats are:

dewikivoyage-20160212.slob on https://github.com/itkach/slob/wiki/Dictionaries and "German Wikipedia by MHBraun" on http://aarddict.org/1/ (http://bit.ly/MegaDewiki --> https://mega.co.nz/#F!8ltTTK7I!7zkGX83fWZbAm_BYSmxsHw) makes Bitly warn, but presents an empty folder on Mega, anyway!
"English Wikipedia by MHBraun" on http://aarddict.org/1/ (http://bit.ly/MegaEnwiki --> https://mega.co.nz/#F!Vx9mXbIS!OF_GG7WS5b7RAAFblnJdVQ) makes Bitly also warn, but is unavailable on Mega, anyway!

As it is crucial for the Aard2 "ecosystem" to make as many recent slob-files as possible easily accessible to Aard2 users, keeping https://github.com/itkach/slob/wiki/Dictionaries (which fulfills this task well) up-to-date is IMO important.

Kudos for your excellent work on Aard2 (and its slobs), the best dictionary software I am aware of.

P.S.: Some examples of slob-files, which have an older version listed on https://github.com/itkach/slob/wiki/Dictionaries and a newer one available at http://ftp.halifax.rwth-aachen.de/aarddict/.
dewikibooks-20160704.slob
dewikibooks-20161118.slob
dewikivoyage-20160212.slob
dewikivoyage-20160705.slob
dewiktionary-20160708.slob
dewiktionary-20161224.slob
enwikiquote-20150214.slob
enwikiquote-20160504.slob
enwikivoyage-20150822.slob
enwikivoyage-20160215.slob
simplewiki-20150303.slob
simplewiki-20170118.slob

Unable to use eswiktionary-20170130.slob with aard2 v 0.35

Hi, I have a Huawei P85 with Android 5.01 and Aard2, and I cannot use eswiktionary-20170130.slob. I have tried downloading the file several times, but Aard doesn't detect it. I have no problem with other .slob files, just this one so far.
Sorry if this isn't the right place to describe this issue.

BUG: Alias cannot point to several blobs

Description
According to your answer in another issue:

@LRN: Does SLOB allow aliases to point to multiple entries?

@itkach: Sort of. Each key points to one blob, but you can add the same key multiple times (each time pointing to different content). Some data sources may have multiple different articles for the same word, e.g. different meanings of "A". When converting to slob, you can add each one separately with the same key "A" (of course you can also choose to combine the articles into one). Others, like Wikipedia, usually have "disambiguation" pages, so a generic term that can have a lot of different meanings is associated with a single page that has a collection of links to more specific terms.

I tried to create a Slob dictionary with a lot of aliases (that can collide for different words). For example: "клея" should point to "клеить" (v.) and to "клей" (n.). But when I add the same alias via add_alias() for several different target keys, only the first one works.

To Reproduce

	import slob

	filepath = "test.slob"  # placeholder output path; not defined in the original snippet

	with slob.create(filepath) as writer:
		# The same extra key is passed to add() for two different blobs.
		writer.add(
			"This is a definition for Word 1".encode("ascii"),
			"word1", "alias_that_points_to_several_blobs",
			content_type=slob.MIME_TEXT,
		)
		writer.add(
			"This is a definition for Word 2".encode("ascii"),
			"word2", "alias_that_points_to_several_blobs",
			content_type=slob.MIME_TEXT,
		)

		# The same alias is registered for two different target keys.
		writer.add_alias("this_alias_cannot_point_to_several_blobs", "word1")
		writer.add_alias("this_alias_cannot_point_to_several_blobs", "word2")

(CORRECT) When user enters alias_that_points_to_several_blobs both entries are shown:

(INCORRECT) When user enters this_alias_cannot_point_to_several_blobs, both aliases should be shown in the list:

Expected behavior
When user enters this_alias_cannot_point_to_several_blobs, both aliases should be shown in the list.

Environment:

OS: macOS
Python version: 3.9

cyclic reference for dict. download

go to frontpage (http://aarddict.org/) -> click 'Dictionaries' -> click 'These are dictionaries for Aard 2 .' and it sends you back to the frontpage.

Adding a title to my new dictionary

Hi,

Sorry, I don't really understand from the description how I am supposed to add a title to my dictionary.

When I create it in the way described, it appears with the title "???" in Aard for Android.

When I try to add it with the suggested command line as follows:
slob tag -n label -v "My new dictionary" dict.slob

I get the following in return:

No such tag

Please explain how to do it (and possibly extend the documentation with it).

Thank you,
Binyomin

Feature request: in-app dictionary downloads

Greetings, I'm a new user of Aard 2.

For reasons I don't care to explain here, I have disabled access to all web browsers on my phone. As such, the only way for me to download dictionaries for Aard is connect my phone to a desktop and transfer the files from there. This is not a huge inconvenience, though it is still an inconvenience. And so I ask: has it been considered to allow the Aard 2 app to download dictionary files from inside the app? Perhaps it could draw upon the list from the wiki page.

I understand that my use case is unusual, so I apologize if this request comes across as self-serving. However, I could see it being a (small) benefit to other users as well, since it would simplify the setup process by not requiring any external applications as it does now.

Anyway, I got the idea from another app I use (And Bible), which operates similar to the method I'm describing. That got me thinking that perhaps it's not so far-fetched after all.

Links are dead in Arabic wiki

Hey, I tried to download the Arabic Wikipedia with images (using the torrent) and there are no peers, so I think it's dead. Also, I don't know why, but it looks like the whole FTP server is empty?

French Wiktionary link points to wrong/damaged file

In French Wiktionary section, the link points to a file with wrong SHA1 and size (285,147,136 instead of 405,635,161 bytes). The file looks damaged (GoldenDict does not detect it)

Update slob download links

Please update the following dictionaries.

The listed WordNet version is 3.1, which was last updated in 2011 and is now unmaintained. Open English WordNet is an actively maintained fork of Princeton WordNet, which released its 2022 Edition on 31 December 2022.
The Simple English Wiktionary link is invalid, and the file was last updated in 2020.
Arabic Wiktionary was last updated in 2020.
The Bengali Wiktionary link is invalid, and the file was last updated in 2016.
Korean and Japanese Wiktionary were last updated in 2015.
Wikispecies was last updated in 2020, and its link is also invalid.
The listed Collaborative International Dictionary of English (GCIDE) version is 0.52, while the latest version is 0.53.
CC-CEDICT was last updated in 2021; the newer version has 2,425 more entries.

Links to dictionaries

Hi!
Could you, please, upload Russian Wiki to some cloud? Because it is impossible to download it via BT.
No peers, no hope they will appear...

Docker Image?

Is a docker image available?

Converting dictCC csv fails

I'm trying to create a slob file containing dictCC dictionary data, my "converter" looks like this:

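# -*- coding: utf-8 -*-
import csv
import sys

import slob

# OUTPUT_FILE was not defined in the original snippet; this name is a placeholder.
OUTPUT_FILE = 'dictcc.slob'

with slob.create(OUTPUT_FILE) as w:
  # Read the input as UTF-8 regardless of the platform default encoding.
  with open(sys.argv[1], newline='', encoding='utf-8') as csvfile:
    fieldnames = ['key', 'value']
    # Skip comment lines; startswith() is also safe for empty lines.
    rows = (line for line in csvfile if not line.startswith('#'))
    dictreader = csv.DictReader(rows, delimiter='\t', quotechar='"', fieldnames=fieldnames, restkey='restkey')

    for row in dictreader:
      if row['key']:
        # Extra columns are collected under 'restkey'; the key is absent when there are none.
        type_ = ', '.join(row.get('restkey') or [])
        # slob.MIME_TEXT replaces the undefined PLAIN_TEXT name.
        w.add((str(row['value']) + type_).encode('utf-8'),
              row['key'], content_type=slob.MIME_TEXT)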

Works just fine until I try to add a line with a German Umlaut in the 'value' column, slob.py raises a ValueError exception then.

Maybe you could add more complex examples to the documentation, to enable Python beginners to use slob.add? :-)

Any ideas on how to solve the problem?

Add new dictionary

Please add FOLDOC computing dictionary.
Dictionary source:
https://foldoc.org/source.html

A suggestion about enwikionary20200615

Thank you for the amazing work you are doing.

I just want to see definitions or translations in English only (the slob file would be smaller, and the layout would be simpler and easier to use).
But with this enwiktionary I get the definition of a word in many languages besides English! (Why? It's enwiktionary!)

I saw an English-only enwiktionary in another forum, but it is older than yours (2019) and has fewer entries.

Could you rebuild it with English-only entries?

Kochwiki with images

Is it possible to create a dictionary with images included?

Feature Request: Babylon .bgl to slob

On desktop, I'm happily using GoldenDict with the free Babylon dictionaries for translation, and I'd like to use these with Aard2 on mobile too.
AIUI there is no converter yet from bgl to slob, correct?
Maybe the easiest way to write one would be based on the BGL/Babylon_BGL parsing code of goldendict.

File Not Found

Hello,
I need this dictionary file:
pt-m-wiktionary-org-20141127.lzma2.slob
The download link doesn't work.
Thank you

Converting DSL to SLOB

I'm playing around with some code that would convert DSL dictionaries (such as the ones that Lingvo and GoldenDict can use) to SLOB for Aard2 to be used on my phone.

This presents some... interesting problems, and i've decided to look for answers here.

SLOB documentation claims that blobs are sorted using Unicode Collation Algorithm. That doesn't tell me much about the way Aard2 does lookups (normally i think of folding when i need to compare unicode strings; collation is just for ordering strings correctly; are SLOB keys collated and folded? Or am i mixing up completely unrelated things?).
- DSL can have multiple headwords (dictionary keys) associated with the same card (dictionary blob), and this feature is often used to fill the dictionary with different spellings (with/without accents, with/without to or the prefix, with/without part of the headword being a subscript/superscript, etc) of the same word or phrase. Would UCA eliminate the need to have some of these variants? Or do i need to map each one of them to an alias?
DSL has metadata about the language the headwords are in and about the language the cards are in. Should i put it into SLOB tags, maybe? If so, which tag names should i use (language.from, language.to?)? I've noticed that Aard2 doesn't have a concept of translation dictionaries (mapping things from one language to another). Would anyone be interested in introducing that?
DSL dictionaries come with a small image that a dictionary viewer can show in the UI to identify the dictionary (instead of spelling its full name), this helps when there are lots of dictionaries, as they all can be shown as small buttons instead of long labels. What should i do with these images? It should be possible to stuff such an image into a blob (named how?) and put the key to that blob into a tag (which one? dictionary.image?). Would anyone be interested in adding support for such scheme to Aard2?
DSL has a notion of "abbreviations" - a separate small dictionary that accompanies the main dictionary. In the main dictionary some parts of the card are marked as abbreviations, and their text is very short (can be as short as v). When user hovers a mouse over these abbreviations, a tooltip with the abbreviation card appears (for v that would be verb).
- AFAIU, Aard2 uses HTML in blobs, so i thought of using <div title=...> for this, but eventually decided not to, given that there are (usually) no "hovering" on touch devices. Instead i grabbed a JS for collapsing/expanding a piece of HTML on click, and i'm planning to use that to make these abbreviations expand/collapse when touched.
  - The question is: is it possible to avoid including the expanded abbreviation text verbatim into the blob, referencing it instead? Aard2 can, apparently, understand references to other blobs for CSS and JS, but these are normally used in HTML as well. What about inserting arbitrary piece of HTML from a blob into another HTML? Without iframes.
  - And, since we've touched this topic, what about images, sounds and video? DSL can include media files, and i think it should be easy to show videos and images in HTML, as long as Aard2 can slurp them from other blobs via references (<img src="~/image/foobar.jpg">). What about sounds though? GoldenDict shows an icon that plays a sound when clicked. Do i need to do that in HTML for Aard2? Would it work? If yes, then maybe i should do that for all media (images and videos), instead of showing them inline.
DSL has references to other cards of the same dictionary. I assume that Aard2 can do that as well, with HTML blobs containing links to other HTML blobs ("I assume", because i can't quite remember seeing examples of such links). But DSL also has references to other cards in other dictionaries (identified by name; each DSL has a machine-readable name string for this purpose). Does Aard2 [plan to] support that? Is it worth keeping this metadata in SLOB blobs?
Speaking of names, each DSL has description (a user-readable name, often multi-line, plus a body of text, also multiline), sometimes in more than one language. Should it be put into tags and/or blobs? If so, how should they be named?
DSL has the concept of "full translation mode". When such mode is disabled, some specially-marked parts of a card (usually examples and other things that take up a lot of space) are hidden. Is there an Aard2 equivalent, or do i need to somehow code that in HTML/JS as well?
DSL has support for full-text search, including searching through card contents. For that purpose some parts of a card can be marked as trn (translates the headword), com (comments; context for the translated words; various metadata) and !trs (excludes the text from indexing). I take it that Aard2 does not support full-text search. Is there any point in preserving this information (how? <span class="trn">?) in case Aard2 suddenly develops a full-text search feature in the future?
AFAIU, Aard2 always searches for the user-specified text exactly (UCA aside). Would it be useful to generate extra aliases to emulate mid-word searching (i.e. generate good and afternoon aliases for the good afternoon entry, so that a search for afternoon would produce both the afternoon entry and the good afternoon entry)? Does SLOB allow aliases to point to multiple entries?
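
The mid-word alias idea from the last point can be sketched as a small helper. This is only an illustration: `word_aliases` is a hypothetical name, splitting on whitespace is an assumed tokenization, and whether several aliases with the same name can each resolve to a different entry is exactly the open question asked above.

```python
def word_aliases(key):
    """Return (alias, target) pairs, one per distinct word of a multi-word key.

    Hypothetical helper: tokenization by whitespace is an assumption.
    """
    words = key.split()
    if len(words) < 2:
        return []  # a single-word key needs no extra aliases
    seen = set()
    pairs = []
    for word in words:
        if word not in seen:
            seen.add(word)
            pairs.append((word, key))
    return pairs


# Aliases that would let "afternoon" also find the "good afternoon" entry.
pairs = word_aliases("good afternoon")
```

Each pair could then be handed to a writer call such as the `add_alias(alias, target)` method that appears in another issue on this page.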

[request] script to remove images.

Something like

slob trim_img in.slob out.slob

would be useful for making smaller files for Aard 2.
Or a more generic regex filter to accomplish the same thing

slob trim --rewite_txt 's/<img[^<>]+src="[^"]+\.(jpg|png|gif)"\/?>/[image placeholder]/g' --exclude_files '.*\.(jpg|png|gif)' in.slob out.slob

or whatever expressions best suit the format it's stored in.
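
The text-rewriting half of this request can be sketched in Python for a single HTML string. The `trim` and `trim_img` commands above are the requester's proposal, not existing slob commands, and `strip_images` is a hypothetical name. The pattern is a slightly loosened form of the one in the request, so that attributes may also follow `src`. Dropping the image blobs themselves is not covered here.

```python
import re

# Matches <img ...> tags whose src ends in .jpg, .png or .gif.
IMG_TAG = re.compile(
    r'<img\b[^<>]*\bsrc="[^"]+\.(?:jpg|png|gif)"[^<>]*>',
    re.IGNORECASE,
)


def strip_images(html, placeholder="[image placeholder]"):
    """Replace matching <img> tags in one HTML string with a text placeholder."""
    return IMG_TAG.sub(placeholder, html)


cleaned = strip_images('<p>See <img src="~/image/foobar.jpg"> here</p>')
```

A regular expression is adequate for simple, well-formed tags like these; an HTML parser would be the safer choice for arbitrary markup.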

Sorting in storage instead of RAM

Hello @itkach

Could you allow sorting to happen in storage rather than in RAM (in slob.py), if that is possible?

I use the PyGlossary tool (which uses slob.py) to convert many file types to slob, so I can use the Aard2 app as my default multi-dictionary viewer.

But when converting large files such as Wikipedias or Wiktionaries, which have a huge number of words, to slob, sorting fails due to low memory (in spite of the 6 GB of RAM in my device).

I am in love with the Aard2 app and with slob files, which I can work with freely, so if sorting in storage is possible it would be an amazing breakthrough.

Thanks in advance.
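
One general technique for sorting on disk is an external merge sort: sort small chunks in memory, write each as a sorted run to a file, then merge the runs lazily. The sketch below is not how slob.py works. It uses plain code point order, whereas slob keys are ordered by Unicode collation, and `external_sort`, `chunk_size` and `workdir` are illustrative names.

```python
import heapq
import os
import tempfile


def external_sort(items, chunk_size, workdir=None):
    """Yield strings (without newlines) in sorted order using sorted runs on disk."""
    tmpdir = tempfile.mkdtemp(dir=workdir)
    run_paths = []
    chunk = []

    def flush():
        # Sort the in-memory chunk and write it out as one run file.
        if not chunk:
            return
        chunk.sort()
        fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".run")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.writelines(item + "\n" for item in chunk)
        run_paths.append(path)
        chunk.clear()

    try:
        for item in items:
            chunk.append(item)
            if len(chunk) >= chunk_size:
                flush()
        flush()
        files = [open(p, encoding="utf-8") for p in run_paths]
        try:
            # heapq.merge reads the runs lazily, one line per run at a time.
            for line in heapq.merge(*files, key=lambda s: s.rstrip("\n")):
                yield line.rstrip("\n")
        finally:
            for f in files:
                f.close()
    finally:
        for p in run_paths:
            os.remove(p)
        os.rmdir(tmpdir)


words = ["pear", "apple", "fig", "banana", "cherry"]
ordered = list(external_sort(words, chunk_size=2))
```

Only `chunk_size` items are held in memory at once during the first phase, at the cost of writing and re-reading every item once.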

Suggestion: Tagging older versions

Did you consider tagging older versions?

There was an increase in political editing and censorship of Wikipedia in recent years. It would be useful to have older versions easily available for download as well.

How to generate Slob File

Hello
I want to generate a slob file from wiktionary.org.

thank you

Adding dictionaries to slob wiki

@itkach
Thanks for making this awesome tool. I have been using Aard for quite some time, telling others about it and sharing the dictionaries. :)

I have compiled a dictionary slob file for an Indian language, Kannada, from the Wiktionary XML dump, which has more than 250 thousand articles/pages. I would like to add this to the slob wiki here so that it's easily accessible to all users of the new Aard.

To do this, should I fork the wiki and send a pull request, or does it have to be done by one of the maintainers? Please help. I would love to add other Indian languages soon as well. :)

Thanks. :)

Publish code onto PyPi

I suggest publishing the 'slob' code on PyPI. That would make it easier to reuse and hence to improve. The 'slob' code is used, for example, in 'mwscrape2slob'; however, direct installation can be frustrating because slob is not on PyPI and is therefore not installed by the usual setup.py.

Spanish Wiktionary

Thanks a lot for your dicts. I've downloaded the Spanish Wiktionary, but I see it is outdated, so how can I make a new, updated one?

Temporary directory

I suggest either using pwd or target directory also for temporary files or adding a parameter for changing the temporary directory.

Let's assume that people know best where there is enough space available on their device
and they probably point there with the output file(s) parameter.

As we are often working with huge files here (e.g. Wikipedia), there might not be enough space in the system /tmp directory.

Now, when you issue a command using slob it automatically creates a temporary directory, which is not located in the current work directory or next to the slob file you are working with but in the system /tmp.
That could lead to wasted hours when /tmp runs out of memory and slob aborts the current task with an OSError: [Errno 28] No space left on device.
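
The first suggestion, placing temporary files next to the output file, can be sketched with Python's standard `tempfile` module. This is a generic illustration rather than slob's actual code; `make_workdir` and its `prefix` are hypothetical names.

```python
import os
import tempfile


def make_workdir(output_path, prefix="slob-work-"):
    """Create a temporary directory in the same directory as output_path."""
    target_dir = os.path.dirname(os.path.abspath(output_path))
    return tempfile.mkdtemp(prefix=prefix, dir=target_dir)


# Demonstration inside a scratch directory so nothing is left behind.
base = tempfile.mkdtemp()
workdir = make_workdir(os.path.join(base, "out.slob"))
same_place = os.path.dirname(workdir) == os.path.abspath(base)
os.rmdir(workdir)
os.rmdir(base)
```

For programs that rely on `tempfile` defaults, Python also honors the `TMPDIR` environment variable, so setting it before running the command is another way to move temporary files off `/tmp`.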

wikipedia20150211

forum bug

The forum (http://aarddict.org/forum) does not work correctly in some browsers. I only see a title, with no topics. Please check the forum.

offline using "de.wiktionary.org/wiki/gehen" and ".../wiki/Flexion:gehen"

Hello,

I wanted to use this wiki:
https://de.wiktionary.org/wiki/gehen
together with GoldenDict.

So I downloaded this file:
http://ftp.halifax.rwth-aachen.de/aarddict/dewiki/dewiktionary-20160708.slob
I have installed it in GoldenDict. It works well. But if I click on the page https://de.wiktionary.org/wiki/gehen (in the offline version) on the link: "Alle weiteren Formen: Flexion:gehen" it calls for the browser. I would expect, that the content of this link is also part of the *.slob - File. But it seems not to be the case. That's a pity.

So I tried to use the file
http://ftp.halifax.rwth-aachen.de/aarddict/dewiki/dewiktionary-20160708.aar
instead of
http://ftp.halifax.rwth-aachen.de/aarddict/dewiki/dewiktionary-20160708.slob

Result: Now the link to "Alle weiteren Formen: Flexion:gehen" seems to work (offline), which means the content behind this link is part of the *.aar file.
But unfortunately GoldenDict does not seem to render the *.aar file properly, so I cannot use it.

Question: Can you also include the content of pages like https://de.wiktionary.org/wiki/Flexion:gehen in the *.slob file? That would be great.
Maybe you can also include the pictures from Wiktionary in the *.slob file?

Would appreciate your reply. Thank you.

Slob File format error in WIKI

Hi,
I checked the WIKI for Slob File format and it says

Element	Type	Description
content types	char-sized sequence of content types	MIME content types. Content items refer to content types by id. Content type id is 0-based position of content type in this sequence.

However, when I checked a sample file I saw that size of content types is not char sized but short sized.

Example freedict-eng-tur-0.3.slob

00000d8c  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000d9c  00 00 00 00 00 00 00 00  00 00 03 00 08 74 65 78  |.............tex|
00000dac  74 2f 63 73 73 00 16 61  70 70 6c 69 63 61 74 69  |t/css..applicati|
00000dbc  6f 6e 2f 6a 61 76 61 73  63 72 69 70 74 00 17 74  |on/javascript..t|
00000dcc  65 78 74 2f 68 74 6d 6c  3b 63 68 61 72 73 65 74  |ext/html;charset|
00000ddc  3d 75 74 66 2d 38 00 00  8e f0 00 00 00 00 00 0d  |=utf-8..........|

There are 3 content types. However, the size of the first content type is stored not as 08 but as 00 08, and the same holds for the others. Is there a typo in the WIKI? I checked slob.py and it also says

 def read_text(self):
        return self._read_text(U_SHORT)

 def read_content_types():
        content_types = []
        count = f.read_byte()
        for _ in range(count):
            content_type = f.read_text()
            content_types.append(content_type)
        return tuple(content_types)
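
The bytes in the dump can be decoded by hand to check both readings. The sketch below follows the quoted slob.py excerpt: a one-byte count, then a two-byte length before each string. Big-endian byte order is inferred from the `00 08` in the dump, and UTF-8 is an assumed encoding. `parse_content_types` is an illustrative name, not a slob function.

```python
import struct


def parse_content_types(data, offset=0, encoding="utf-8"):
    """Read a 1-byte count, then that many strings with 2-byte length prefixes."""
    (count,) = struct.unpack_from(">B", data, offset)
    offset += 1
    types = []
    for _ in range(count):
        (length,) = struct.unpack_from(">H", data, offset)
        offset += 2
        types.append(data[offset:offset + length].decode(encoding))
        offset += length
    return tuple(types), offset


# The bytes from the hex dump above, starting at the "03" count byte.
sample = (
    bytes.fromhex("03")
    + bytes.fromhex("0008") + b"text/css"
    + bytes.fromhex("0016") + b"application/javascript"
    + bytes.fromhex("0017") + b"text/html;charset=utf-8"
)
content_types, end = parse_content_types(sample)
```

Read this way, the dump is consistent with a char-sized count of items where each item carries its own short-sized length, which may be what the wiki table's "char-sized sequence" refers to.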

Left side cut off in country entries

Hello itkach!

I use the "dewiki-20220601-NS0-ENTERPRISE-HTML.slob" (https://ftp.halifax.rwth-aachen.de/aarddict/dewiki/dewiki20220601-slob/dewiki-20220601-NS0-ENTERPRISE-HTML.slob) from mhbraun with Aard 2 v0.53.

In country entries of that dictionary, the left side is cut off in portrait orientation (see attached screenshot). In landscape everything is fine.

I'm not sure whether this is an issue with the dictionary or with Aard.

Regards

Shudushi

After icu update to 75.1 slob load failure

Termux Variables:
TERMUX_APK_RELEASE=F_DROID
TERMUX_APP_PACKAGE_MANAGER=apt
TERMUX_APP_PID=29562
TERMUX_IS_DEBUGGABLE_BUILD=0
TERMUX_MAIN_PACKAGE_FORMAT=debian
TERMUX_VERSION=0.118.0
TERMUX__USER_ID=0
Packages CPU architecture:
aarch64
Subscribed repositories:

sources.list

deb https://packages.termux.dev/apt/termux-main stable main

x11-repo (sources.list.d/x11.list)

deb https://packages.termux.dev/apt/termux-x11 x11 main

tur-repo (sources.list.d/tur.list)

deb https://tur.kcubeterm.com tur-packages tur tur-on-device tur-continuous
Updatable packages:
All packages up to date
termux-tools version:
1.42.0
Android version:
14
Kernel build information:
Linux localhost 4.19.191-28086179-abT225XXU6DWL9 #1 SMP PREEMPT Wed Dec 20 16:55:14 +07 2023 aarch64 Android
Device manufacturer:
samsung
Device model:
SM-T225
LD Variables:
LD_LIBRARY_PATH=
LD_PRELOAD=/data/data/com.termux/files/usr/lib/libtermux-exec.so
Installed termux plugins:
com.termux.styling versionCode:31

Is it possible to export aard/slob database

Dear all,
Is there a simple way to export the aard/slob database into an HTML formatted text file?

Convert dz to slob

I have an old dictionary in dz format, can I convert that to slob?

Please Update Wiktionary For Hindi and Malayalam

The slob file is almost 5 years old. It would be great if you could publish an update. Thanks

Migrate to GitHub discussion

Please, migrate aarddict forum from Google Groups to GitHub discussion.
Not all people are familiar with mailing list style. Finding info is also time consuming there.
So please consider this proposal.

Add updated dictionary

I updated two dictionaries, which can be found here

CC-CEDICT 2023.03.11
JMdict (Japanese-Multilingual Dictionary) with examples 2023.03.11

I also wanted to update GCIDE (GNU Collaborative International Dictionary of English) to latest version. Here I found an XML format of the dictionary, but don't know how to work with split XML. I would appreciate some help.

Use OmegaWiki dictionary data

OmegaWiki is a collaborative project to produce a free, multilingual dictionary in every language, with lexicological, terminological and thesaurus information.

The software is opensource and the data is free.

The key idea of OmegaWiki is to be based around concepts. This is what makes it truly multilingual.
So, by building a French-English and German-English dictionary, we are also building a German-French dictionary. If we add an Italian contributor, we build 3 more bilingual dictionaries... this is exponential.

http://www.omegawiki.org/

Not sure where this wish belongs, so I also posted it to itkach/aard2-android#24

Is it possible to customize slob shown in search?

For example, I have a verb "gotować" (готовить, to cook) and it has a form "gotowałeś" (you prepared).

I need this form to be searchable by "gotowales" (without diacritics), but at the same time I need it to be shown in the search list as "gotowałeś".

Is it possible in Slob format?

Library not loaded: libicui18n.54.dylib

Thanks for the wonderful work with aard2. I was interested in using slob.py directly to look through a downloaded enwiki.slob file, but have not been able to get it running despite setting up numerous python virtualenv with the latest versions of icu and pyicu. The error I get is:

Python 3.5.2 |Anaconda 4.1.1 (x86_64)| (default, Jul 2 2016, 17:52:12)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> import slob
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/.../anaconda/lib/python3.5/site-packages/slob.py", line 29, in <module>
    import icu
  File "/Users/.../anaconda/lib/python3.5/site-packages/icu.py", line 37, in <module>
    from docs import *
  File "/Users/.../anaconda/lib/python3.5/site-packages/docs.py", line 23, in <module>
    from _icu import *
ImportError: dlopen(/Users/.../anaconda/lib/python3.5/site-packages/_icu.cpython-35m-darwin.so, 2): Library not loaded: libicui18n.54.dylib
  Referenced from: /Users/.../anaconda/lib/python3.5/site-packages/_icu.cpython-35m-darwin.so
  Reason: image not found

Not sure if I set up the environment correctly. Here is what I have installed:

$ conda list

packages in environment at /Users/brstream/anaconda/envs/env-slob:

Using Anaconda Cloud api site https://api.anaconda.org
icu 56.1 4 conda-forge
openssl 1.0.2h 2
pip 8.1.2 py35_0
PyICU 1.9.3
python 3.5.2 0
readline 6.2 2
setuptools 25.1.6 py35_0
Slob 1.0
sqlite 3.13.0 0
tk 8.5.18 0
wheel 0.29.0 py35_0
xz 5.2.2 0
zlib 1.2.8 3
(env-slob)

Thanks for your help.

Phap Luu

Direct download links to GDrive.

Google Drive links posted here can be converted into direct download links by identifying the file id part and then building a URL of this form:

<base_url>/uc?id=<id>&export=download

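The conversion can be sketched as a small function. The two share-link shapes it recognizes (`/file/d/<id>/...` and `?id=<id>`) and the default base URL `https://drive.google.com` are assumptions about common Google Drive links. `gdrive_direct_link` is a hypothetical name, and the file id used below is made up.

```python
import re
from urllib.parse import parse_qs, urlparse


def gdrive_direct_link(share_url, base_url="https://drive.google.com"):
    """Build <base_url>/uc?id=<id>&export=download from a share link."""
    parsed = urlparse(share_url)
    match = re.search(r"/file/d/([^/]+)", parsed.path)
    if match:
        file_id = match.group(1)
    else:
        # Fall back to links that carry the id as a query parameter.
        ids = parse_qs(parsed.query).get("id")
        if not ids:
            raise ValueError("no file id found in: " + share_url)
        file_id = ids[0]
    return f"{base_url}/uc?id={file_id}&export=download"


direct = gdrive_direct_link(
    "https://drive.google.com/file/d/EXAMPLE_FILE_ID/view?usp=sharing"
)
```
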
Wikimedia data dumps

The README.md section "Create from MediaWiki sites" does not mention https://meta.wikimedia.org/wiki/Data_dumps. Wikimedia publishes database dumps for all of its wikis, including Wikipedia and Wiktionary, updated monthly or twice a month. Importing the dumps is faster and lighter on resources than crawling, and crawlers seem to be rate-limited.

Adding a label does not work

I'm trying to add a label to an already created slob file:

slob tag -n label -v "A Fine Dictionary" my.slob

All I get is a "No such tag" error.

slob info my.slob
TAGS
----
    created.at: 2015-03-31T10:47:43.771907+00:00
   version.icu: 54.1
version.python: 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)]
 version.pyicu: 1.8

How do I ADD new tags?

Without the "label" tags my slobs are displayed as "???" in the android client :-(
