itkach / slob
Data store for Aard 2
License: GNU General Public License v3.0
The magnet link for dewiki-20181104-vol-0?.slob is missing, or at least I cannot see it.
(Not Found
The requested URL was not found on this server.)
I got this error when I tried to download this file:
Tamil
file: tawiktionary-20200104.slob
size: 0.54 GiB
sha1: in tawiktionary-20200104.slob.sha
url: http://ftp.halifax.rwth-aachen.de/aarddict/tawiki/
note: Dictionary does support LZMA2 compression, external picture download, language links, geo links.
note: Maintainer: AardFeed at web.de
note: Questions: https://groups.google.com/forum/#!forum/aarddict
note: Thanks to RWTH Aachen University https://www.rwth-aachen.de/ for high speed mirroring
Please fix the problem. Kindly upload the file to Google Drive. Thank you.
Hello folks,
Is it possible to convert slob files to StarDict using Python, or with another tool?
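The reading side depends on slob.py (or on PyGlossary, mentioned further down, which may already cover this conversion). The StarDict side is simple enough to sketch with the standard library. A minimal sketch, assuming you already have (headword, HTML) string pairs extracted from the .slob file; `write_stardict` is a hypothetical name, and the file layout follows the StarDict 2.4.2 format as I understand it (NUL-terminated word plus 32-bit big-endian offset and size in the .idx file).

```python
import struct

def _ascii_casefold(word: bytes) -> bytes:
    # Lowercase ASCII A-Z only, approximating g_ascii_strcasecmp.
    return bytes(b + 32 if 65 <= b <= 90 else b for b in word)

def write_stardict(entries, basename, bookname):
    """Write (headword, html) string pairs as basename.ifo/.idx/.dict.

    Returns the number of entries written.
    """
    encoded = [(k.encode("utf-8"), v.encode("utf-8")) for k, v in entries]
    # Index order: ASCII-case-insensitive first, raw bytes as tie-breaker.
    encoded.sort(key=lambda kv: (_ascii_casefold(kv[0]), kv[0]))

    idx = bytearray()
    offset = 0
    with open(basename + ".dict", "wb") as dict_file:
        for word, data in encoded:
            dict_file.write(data)
            # .idx record: NUL-terminated word, then 32-bit big-endian
            # offset and size of the article in the .dict file.
            idx += word + b"\0" + struct.pack(">II", offset, len(data))
            offset += len(data)
    with open(basename + ".idx", "wb") as idx_file:
        idx_file.write(idx)

    with open(basename + ".ifo", "w", encoding="utf-8", newline="\n") as ifo:
        ifo.write(
            "StarDict's dict ifo file\n"
            "version=2.4.2\n"
            f"wordcount={len(encoded)}\n"
            f"idxfilesize={len(idx)}\n"
            f"bookname={bookname}\n"
            "sametypesequence=h\n"  # every article is HTML
        )
    return len(encoded)
```

The uncompressed .dict is kept for simplicity; StarDict also accepts a dictzip-compressed .dict.dz.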
Hello @itkach, @MHBraun and @francwalter,
many of the slob files linked from https://github.com/itkach/slob/wiki/Dictionaries
have newer versions available at http://ftp.halifax.rwth-aachen.de/aarddict/.
See the list below for some examples; note that this list is not exhaustive.
Furthermore, there are slob files available at ftp.halifax/aarddict that are not listed at all on your Aard2 dictionaries list, e.g. alswiki.
Then there are a few outdated slob files at ftp.halifax/aarddict that have newer versions available on the dictionaries list. Unfortunately, I am not able to contact Markus Braun (@MarkusHBraun) via his preferred communication channel (the aarddict Google Group) to notify him, as I do not use Google Groups. Maybe it makes sense for you all to coordinate in order to keep both locations in sync and up to date.
Also note that some of the links to slob-files hosted on https://mega.co.nz/ (or https://mega.nz/) lead to unavailable files on Mega, e.g.
en-m-wikivoyage-org-20141125.lzma2.slob https://mega.co.nz/#!rdAUXJxZ!-HqEzprdigPpSplR-9AWjxrdkVKe6_OoRgRJ7PdZ0_0,
de-m-wikivoyage-org-20141125.lzma2.slob https://mega.nz/#!KN5g0AxL!U8UitqlxFGV9h09W_lSTgNHU4rSeIxK21QZYIJbK9pY and
pt-m-wikivoyage-org-20141124.lzma2.slob https://mega.nz/#!bAQ3DLTC!291ojqdjuADmbnbZvjQMeUjM2W2HSmeNc0jynxO60EM.
And a few links use a URL shortener (bit.ly or goo.gl), an unnecessary indirection that obscures the original URL; please expand them to their full URLs.
Examples are all <??wiktionary-20160526.lzma2.slob> and all <??-m-wiktionary-org-2015012?.lzma2.slob> files.
Special cases with extra caveats are:
As it is crucial for the Aard2 "ecosystem" to make as many recent slob files as possible easily accessible to Aard2 users, keeping https://github.com/itkach/slob/wiki/Dictionaries (which fulfills this task well) up to date is, IMO, important.
Kudos for your excellent work on Aard2 (and its slobs), the best dictionary software I am aware of.
P.S.: Some examples of slob files that have an older version listed on https://github.com/itkach/slob/wiki/Dictionaries and a newer one available at http://ftp.halifax.rwth-aachen.de/aarddict/ (older → newer):
dewikibooks-20160704.slob → dewikibooks-20161118.slob
dewikivoyage-20160212.slob → dewikivoyage-20160705.slob
dewiktionary-20160708.slob → dewiktionary-20161224.slob
enwikiquote-20150214.slob → enwikiquote-20160504.slob
enwikivoyage-20150822.slob → enwikivoyage-20160215.slob
simplewiki-20150303.slob → simplewiki-20170118.slob
Hi, I have a Huawei P85 with Android 5.01 and Aard2, and I cannot use eswiktionary-20170130.slob. I have tried downloading the file several times, but Aard2 doesn't detect it. I have no problem with other .slob files, only with this one so far.
Sorry if this isn't the right place to describe this issue.
Description
According to your answer in another issue:
@LRN: Does SLOB allow aliases to point to multiple entries?
@itkach: Sort of. Each key points to one blob, but you can add the same key multiple times (each time pointing to different content). Some data sources may have multiple different articles for the same word, e.g. different meanings of "A". When converting to slob, you can add each one separately with the same key "A" (of course, you can also choose to combine the articles into one). Others, like Wikipedia, usually have "disambiguation" pages, so a generic term that can have a lot of different meanings is associated with a single page that has a collection of links to more specific terms.
I tried to create a slob dictionary with a lot of aliases (which can collide across different words). For example, "клея" should point to both "клеить" (v.) and "клей" (n.). But whenever I add the same alias via add_alias()
for different target keys, only the first one works.
To Reproduce
import slob

filepath = "aliases-test.slob"  # hypothetical output path

with slob.create(filepath) as writer:
    # The same blob is added under two keys; the second key is shared with word2.
    writer.add(
        "This is a definition for Word 1".encode("ascii"),
        "word1", "alias_that_points_to_several_blobs",
        content_type=slob.MIME_TEXT,
    )
    writer.add(
        "This is a definition for Word 2".encode("ascii"),
        "word2", "alias_that_points_to_several_blobs",
        content_type=slob.MIME_TEXT,
    )
    # The same alias registered twice with different targets via add_alias().
    writer.add_alias("this_alias_cannot_point_to_several_blobs", "word1")
    writer.add_alias("this_alias_cannot_point_to_several_blobs", "word2")
(CORRECT) When the user enters alias_that_points_to_several_blobs, both entries are shown.
(INCORRECT) When the user enters this_alias_cannot_point_to_several_blobs, only the entry from the first add_alias() call is shown.
Expected behavior
When the user enters this_alias_cannot_point_to_several_blobs, both entries should be shown in the list.
Environment:
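Until add_alias() handles one alias with several targets, a workaround that follows the report's own working case is to pass each alias as an extra key to add(). A minimal sketch with a hypothetical helper, `group_keys`, that turns (alias, target) pairs into per-headword key lists, assuming every target headword has its own content.

```python
def group_keys(headwords, aliases):
    """Map each headword to [headword, alias1, alias2, ...].

    headwords: headwords that have their own content.
    aliases: (alias, target_headword) pairs; the same alias may appear
    with several targets, the case that add_alias() did not handle above.
    """
    keys = {hw: [hw] for hw in headwords}
    for alias, target in aliases:
        if target not in keys:
            raise KeyError(f"alias {alias!r} points to unknown headword {target!r}")
        if alias not in keys[target]:
            keys[target].append(alias)
    return keys
```

Each headword would then be written with `writer.add(content, *keys[headword], content_type=slob.MIME_TEXT)`, the form the report describes as working (both entries are listed).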
Go to the front page (http://aarddict.org/), click 'Dictionaries', then click 'These are dictionaries for Aard 2.', and it sends you back to the front page.
Hi,
Sorry, I don't really understand from the description how I am supposed to add a title to my dictionary.
When I create it in the way described, it appears with the title "???" in Aard for Android.
When I try to add it with the suggested command line as follows:
slob tag -n label -v "My new dictionary" dict.slob
I get the following in return:
No such tag
Please explain how to do it (and possibly extend the documentation with it).
Thank you,
Binyomin
Greetings, I'm a new user of Aard 2.
For reasons I don't care to explain here, I have disabled access to all web browsers on my phone. As such, the only way for me to download dictionaries for Aard is to connect my phone to a desktop and transfer the files from there. This is not a huge inconvenience, though it is still an inconvenience. And so I ask: has it been considered to let the Aard 2 app download dictionary files from inside the app? Perhaps it could draw upon the list from the wiki page.
I understand that my use case is unusual, so I apologize if this request comes across as self-serving. However, I could see it being a (small) benefit to other users as well, since it would simplify the setup process by not requiring any external applications as it does now.
Anyway, I got the idea from another app I use (And Bible), which works the way I'm describing. That got me thinking that perhaps it's not so far-fetched after all.
In the French Wiktionary section, the link points to a file with the wrong SHA1 and size (285,147,136 bytes instead of 405,635,161). The file looks damaged (GoldenDict does not detect it).
Please update the following dictionaries.
Is a docker image available?
I'm trying to create a slob file containing dictCC dictionary data, my "converter" looks like this:
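# -*- coding: utf-8 -*-
import csv
import sys

import slob

OUTPUT_FILE = 'dictcc.slob'  # output path; was not defined in the original snippet

with slob.create(OUTPUT_FILE) as w:
    # Open the dict.cc export explicitly as UTF-8 instead of relying on the
    # platform's default encoding.
    with open(sys.argv[1], newline='', encoding='utf-8') as csvfile:
        fieldnames = ['key', 'value']
        dictreader = csv.DictReader(
            (line for line in csvfile if not line.startswith('#')),
            delimiter='\t', quotechar='"',
            fieldnames=fieldnames, restkey='restkey')
        for row in dictreader:
            if row['key']:
                # Columns beyond the first two are collected under 'restkey'.
                extra = ', '.join(row.get('restkey', []))
                w.add(((row['value'] or '') + extra).encode('utf-8'),
                      row['key'], content_type=slob.MIME_TEXT)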
Works just fine until I try to add a line with a German Umlaut in the 'value' column, slob.py raises a ValueError exception then.
Maybe you could add more complex examples to the documentation, so that Python beginners can use slob's add()? :-)
Any ideas on how to solve the problem?
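One plausible cause, not confirmed by the report, is that open() without an `encoding` argument uses the platform's default encoding, so umlauts can be misread; note that UnicodeDecodeError is a subclass of ValueError. To test that hypothesis without slob installed, a minimal sketch (hypothetical function `read_dictcc`) that parses the export as UTF-8, assuming '#' lines are comments, column 1 is the key, column 2 the translation, and further columns are extra info:

```python
import csv

def read_dictcc(path):
    """Yield (key, text) pairs from a tab-separated dict.cc export (UTF-8 assumed)."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = csv.reader(
            (line for line in f if not line.startswith("#")),
            delimiter="\t", quotechar='"')
        for row in rows:
            if len(row) < 2 or not row[0]:
                continue  # skip blank or malformed lines
            key, value, *extra = row
            text = value + (" " + ", ".join(extra) if extra else "")
            yield key, text
```

If this reads the file cleanly, each pair can be passed on as `w.add(text.encode("utf-8"), key, content_type=slob.MIME_TEXT)`.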
Please add FOLDOC computing dictionary.
Dictionary source:
https://foldoc.org/source.html
Thank you for the amazing work you are doing.
I just want to see definitions or translations in English only (the slob file would be smaller, and the layout simpler and easier to use).
But with this enwiktionary I get the definitions of a word in many languages besides English! (Why? It's enwiktionary!)
I saw an English-only enwiktionary on another forum, but it's older than yours (2019) and has fewer entries than yours.
Could you rebuild it with English-only entries?
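English Wiktionary pages group content under level-2 language headings such as ==English==, ==French==, which is why one page covers words from many languages. If a build works from wikitext (e.g. from a dump), one possible approach is to keep only the English section. A minimal sketch, assuming that heading convention; `language_section` is a hypothetical name.

```python
import re

# Level-2 headings only: "==English==" matches, "===Noun===" does not.
_LEVEL2 = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)

def language_section(wikitext, language="English"):
    """Return the body of the ==<language>== section of a page, or None."""
    headings = list(_LEVEL2.finditer(wikitext))
    for i, m in enumerate(headings):
        if m.group(1).strip() == language:
            end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
            return wikitext[m.end():end].strip()
    return None
```

Pages whose result is None would simply be skipped when building an English-only dictionary.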
Is it possible to create a dictionary with images included?
On desktop, I'm happily using GoldenDict with the free Babylon dictionaries for translation, and I'd like to use them with Aard2 on mobile too.
AIUI there is no converter yet from bgl to slob, correct?
Maybe the easiest way to write one would be based on the BGL/Babylon_BGL parsing code of goldendict.
Hello,
I need this dictionary file:
pt-m-wiktionary-org-20141127.lzma2.slob
The download link doesn't work.
Thank you
I'm playing around with some code that would convert DSL dictionaries (such as the ones that Lingvo and GoldenDict can use) to SLOB, for use with Aard2 on my phone.
This presents some... interesting problems, and I've decided to look for answers here.
… "to" or "the" prefix, with/without part of the headword being a subscript/superscript, etc.) of the same word or phrase. Would UCA eliminate the need to have some of these variants? Or do I need to map each one of them to an alias?
… "language.from", "language.to"?)? I've noticed that Aard2 doesn't have a concept of translation dictionaries (mapping things from one language to another). Would anyone be interested in introducing that?
… "dictionary.image"?). Would anyone be interested in adding support for such a scheme to Aard2?
… "v"). When the user hovers the mouse over these abbreviations, a tooltip with the abbreviation card appears (for "v" that would be "verb").
… <div title=...> for this, but eventually decided not to, given that there is (usually) no "hovering" on touch devices. Instead, I grabbed a JS snippet for collapsing/expanding a piece of HTML on click, and I'm planning to use that to make these abbreviations expand/collapse when touched.
… <img src="~/image/foobar.jpg">). What about sounds, though? GoldenDict shows an icon that plays a sound when clicked. Do I need to do that in HTML for Aard2? Would it work? If yes, then maybe I should do that for all media (images and videos), instead of showing them inline.
… "trn" (translates the headword), "com" (comments; context for the translated words; various metadata) and "!trs" (excludes the text from indexing). I take it that Aard2 does not support full-text search. Is there any point in preserving this information (how? <span class="trn">?) in case Aard2 suddenly develops a full-text search feature in the future?
… "good" and "afternoon" aliases for the "good afternoon" entry, so that a search for "afternoon" would produce both the "afternoon" entry and the "good afternoon" entry)? Does SLOB allow aliases to point to multiple entries?
Something like
slob trim_img in.slob out.slob
would be useful for making smaller files for Aard 2.
Or a more generic regex filter to accomplish the same thing
slob trim --rewrite_txt 's/<img[^<>]+src="[^"]+\.(jpg|png|gif)"\/?>/[image placeholder]/g' --exclude_files '.*\.(jpg|png|gif)' in.slob out.slob
or whatever expressions best suit the format it's stored in.
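The core of such a filter is a per-entry rewrite. A minimal sketch with a hypothetical function `trim_entry`, assuming HTML blobs are UTF-8 with a content type starting with "text/html" (as in "text/html;charset=utf-8") and images are stored as separate blobs whose keys end in .jpg/.png/.gif; reading the input and writing the output .slob is left out. The tag regex is a little broader than the one proposed above, which requires the tag to end right after the src attribute.

```python
import re

# <img ...src="x.jpg|png|gif"...>, allowing attributes before and after src.
IMG_TAG = re.compile(r'<img\b[^>]*\bsrc="[^"]+\.(?:jpg|png|gif)"[^>]*>', re.IGNORECASE)
IMG_KEY = re.compile(r".*\.(?:jpg|png|gif)", re.IGNORECASE)

def trim_entry(key, content_type, content):
    """Return None to drop an image blob, else the (possibly rewritten) content."""
    if IMG_KEY.fullmatch(key) or content_type.startswith("image/"):
        return None
    if content_type.startswith("text/html"):
        text = content.decode("utf-8")
        return IMG_TAG.sub("[image placeholder]", text).encode("utf-8")
    return content  # other content types pass through unchanged
```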
Hello @itkach
Could you make slob.py sort on storage (disk) rather than in RAM, if that's possible?
I use the PyGlossary tool (which uses slob.py) to convert many file types to slob, so that I can use the Aard2 app as my default multi-dictionary viewer.
But when converting large sources such as Wikipedias or Wiktionaries, which have a huge number of words, sorting fails due to low memory (despite my device having 6 GB of RAM).
I am in love with the Aard2 app and with slob files, which I can work with freely, so if sorting on storage is possible, it will be an amazing breakthrough.
Thanks in advance.
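The general technique being asked for is an external merge sort: sort RAM-sized chunks, spill each one to disk, then stream a k-way merge of the sorted runs. A minimal, generic sketch (not how slob.py currently works; all names are hypothetical). Items must be picklable and mutually comparable, e.g. (sort_key, value) tuples; reproducing slob's own key order is not covered.

```python
import heapq
import os
import pickle
import tempfile

def _spill(chunk, tmp_dir):
    # Sort one RAM-sized chunk and write it to its own temporary file.
    chunk.sort()
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for item in chunk:
            pickle.dump(item, f)
    return path

def _read_run(path):
    # Stream items back from one sorted run.
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

def external_sort(items, chunk_size=100_000, tmp_dir=None):
    """Yield items in sorted order while holding at most chunk_size in a chunk."""
    with tempfile.TemporaryDirectory(dir=tmp_dir) as work:
        runs, chunk = [], []
        for item in items:
            chunk.append(item)
            if len(chunk) >= chunk_size:
                runs.append(_spill(chunk, work))
                chunk = []
        if chunk:
            runs.append(_spill(chunk, work))
        # k-way merge: only one pending item per run is held in memory.
        yield from heapq.merge(*(_read_run(p) for p in runs))
```

`tmp_dir` lets the scratch files live on a disk with enough free space. Each run keeps one file open during the merge, so very small chunk sizes on huge inputs can hit the open-file limit.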
Did you consider tagging older versions?
There has been an increase in political editing and censorship of Wikipedia in recent years. It would be useful to have older versions easily available for download as well.
Hello
I want to generate a slob file from wiktionary.org.
Thank you
@itkach
Thanks for making this awesome tool. I have been using Aard for quite some time, telling others about it and sharing the dictionaries. :)
I have compiled a dictionary slob file for an Indian language, Kannada, from a Wiktionary XML dump; it has more than 250 thousand articles/pages. I would like to add it to the slob wiki here so that it's easily accessible to all users of the new Aard.
To do this, should I fork the wiki and send a pull request, or does it have to be done by one of the maintainers? Please help. I would soon love to add other Indian languages as well. :)
Thanks. :)
I suggest publishing the 'slob' code on PyPI. That would make it easier to reuse, and hence to improve. The 'slob' code is used, for example, in 'mwscrape2slob'; however, installation can be frustrating because slob is not on PyPI and hence is not installed by the usual setup.py.
Thanks a lot for your dictionaries. I've downloaded the Spanish Wiktionary, but I see it is outdated. How can I make a new, updated one?
I suggest either using the current working directory (pwd) or the target directory for temporary files as well, or adding a parameter for changing the temporary directory.
Let's assume that people know best where there is enough space available on their device, and they probably point there with the output file parameter.
As we are often working with huge files here (e.g. Wikipedia), there might not be enough space in the system /tmp directory.
Currently, when you issue a command using slob, it automatically creates a temporary directory, which is located not in the current working directory or next to the slob file you are working with, but in the system /tmp.
That can lead to wasted hours when /tmp runs out of space and slob aborts the current task with "OSError: [Errno 28] No space left on device".
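Python's tempfile module picks its default directory from the TMPDIR, TEMP or TMP environment variables, so if slob creates its scratch directory through tempfile (an assumption, not checked against slob.py), setting TMPDIR may already work around this. The suggested "next to the output file" behaviour could look like this minimal sketch; `scratch_dir_near` is a hypothetical helper, not part of slob.

```python
import os
import tempfile

def scratch_dir_near(output_path):
    """Create a temporary directory next to output_path instead of in /tmp.

    Returns a TemporaryDirectory; use it as a context manager so the
    scratch files are removed when the work is done.
    """
    parent = os.path.dirname(os.path.abspath(output_path))
    return tempfile.TemporaryDirectory(prefix="slob-", dir=parent)
```

For the environment-variable route, a command such as `TMPDIR=/mnt/big/tmp slob ...` (hypothetical path) is worth trying first; it is untested here.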
The forum (http://aarddict.org/forum) does not work correctly in some browsers: I see only a title, with no topics. Please check the forum.
Hello,
I wanted to use this wiki page:
https://de.wiktionary.org/wiki/gehen
together with GoldenDict.
So I downloaded this file:
http://ftp.halifax.rwth-aachen.de/aarddict/dewiki/dewiktionary-20160708.slob
I installed it in GoldenDict, and it works well. But if I click the link "Alle weiteren Formen: Flexion:gehen" ("all further forms: inflection of gehen") on the page https://de.wiktionary.org/wiki/gehen (in the offline version), it opens the browser. I would expect the content of this link to be part of the *.slob file as well, but it seems not to be. That's a pity.
So I tried to use the file
http://ftp.halifax.rwth-aachen.de/aarddict/dewiki/dewiktionary-20160708.aar
instead of
http://ftp.halifax.rwth-aachen.de/aarddict/dewiki/dewiktionary-20160708.slob
Result: now the link to "Alle weiteren Formen: Flexion:gehen" seems to work offline, which means the content behind this link is part of the *.aar file.
But unfortunately GoldenDict does not display the *.aar file properly, so I cannot use it.
Would appreciate your reply. Thank you.
Hi,
I checked the wiki page on the slob file format, and it says:
Element | Type | Description
---|---|---
content types | char-sized sequence of content types | MIME content types. Content items refer to content types by id. Content type id is 0-based position of content type in this sequence.
However, when I checked a sample file, I saw that the size of content types is not char-sized but short-sized.
Example: freedict-eng-tur-0.3.slob
00000d8c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000d9c 00 00 00 00 00 00 00 00 00 00 03 00 08 74 65 78 |.............tex|
00000dac 74 2f 63 73 73 00 16 61 70 70 6c 69 63 61 74 69 |t/css..applicati|
00000dbc 6f 6e 2f 6a 61 76 61 73 63 72 69 70 74 00 17 74 |on/javascript..t|
00000dcc 65 78 74 2f 68 74 6d 6c 3b 63 68 61 72 73 65 74 |ext/html;charset|
00000ddc 3d 75 74 66 2d 38 00 00 8e f0 00 00 00 00 00 0d |=utf-8..........|
There are 3 content types. However, the length of each content type is not 08 but 00 08 (two bytes), and the same holds for the others. Is there a typo in the wiki? I checked slob.py, and it also says:
def read_text(self):
    return self._read_text(U_SHORT)

def read_content_types():
    content_types = []
    count = f.read_byte()
    for _ in range(count):
        content_type = f.read_text()
        content_types.append(content_type)
    return tuple(content_types)
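Read together, the two quoted functions describe a 1-byte count (the "char-sized" part) followed by strings that each carry a 2-byte length prefix (U_SHORT). The dump shows the lengths are big-endian: 00 08 is 8, the length of "text/css". A minimal standalone sketch of that layout, assuming UTF-8 text; the function mirrors the quoted slob.py name but is a reimplementation, not slob's code.

```python
import struct

def read_content_types(data: bytes, pos: int = 0):
    """Parse a content-type list: 1-byte count, then U_SHORT-prefixed strings.

    Returns (tuple_of_types, position_after_list).
    """
    count = data[pos]  # char-sized count
    pos += 1
    types = []
    for _ in range(count):
        (length,) = struct.unpack_from(">H", data, pos)  # big-endian U_SHORT
        pos += 2
        types.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    return tuple(types), pos
```

Fed the bytes from the dump (starting at 03 00 08 74 65 78 ...), it yields ("text/css", "application/javascript", "text/html;charset=utf-8").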
Hello itkach!
I use the "dewiki-20220601-NS0-ENTERPRISE-HTML.slob" (https://ftp.halifax.rwth-aachen.de/aarddict/dewiki/dewiki20220601-slob/dewiki-20220601-NS0-ENTERPRISE-HTML.slob) from mhbraun with Aard 2 v0.53.
On country entries in that dictionary, the left side is cut off in portrait mode (see attached screenshot). In landscape mode everything is fine.
I'm not sure whether this is an issue with the dictionary or with Aard.
Regards
Shudushi
Termux Variables:
TERMUX_APK_RELEASE=F_DROID
TERMUX_APP_PACKAGE_MANAGER=apt
TERMUX_APP_PID=29562
TERMUX_IS_DEBUGGABLE_BUILD=0
TERMUX_MAIN_PACKAGE_FORMAT=debian
TERMUX_VERSION=0.118.0
TERMUX__USER_ID=0
Packages CPU architecture:
aarch64
Subscribed repositories:
deb https://packages.termux.dev/apt/termux-main stable main
deb https://packages.termux.dev/apt/termux-x11 x11 main
deb https://tur.kcubeterm.com tur-packages tur tur-on-device tur-continuous
Updatable packages:
All packages up to date
termux-tools version:
1.42.0
Android version:
14
Kernel build information:
Linux localhost 4.19.191-28086179-abT225XXU6DWL9 #1 SMP PREEMPT Wed Dec 20 16:55:14 +07 2023 aarch64 Android
Device manufacturer:
samsung
Device model:
SM-T225
LD Variables:
LD_LIBRARY_PATH=
LD_PRELOAD=/data/data/com.termux/files/usr/lib/libtermux-exec.so
Installed termux plugins:
com.termux.styling versionCode:31
Dear all,
Is there a simple way to export the aard/slob database into an HTML formatted text file?
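Once the entries have been read out of the .slob file (the reading step depends on slob.py and is not shown), writing them into one HTML file is straightforward. A minimal sketch, assuming each entry is a (headword, HTML fragment) pair; `entries_to_html` is a hypothetical name. Relative links and images inside articles will not resolve in the exported file.

```python
import html

def entries_to_html(entries, title="Dictionary export"):
    """Render (headword, html_fragment) pairs as one standalone HTML page."""
    parts = [
        "<!DOCTYPE html>",
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head><body>',
    ]
    for key, fragment in entries:
        parts.append(f"<h2>{html.escape(key)}</h2>")
        # Article bodies are assumed to be HTML already, so they are not escaped.
        parts.append(f"<div>{fragment}</div>")
    parts.append("</body></html>")
    return "\n".join(parts)
```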
I have an old dictionary in dz format, can I convert that to slob?
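Assuming "dz format" means a dictd dictionary (a .index file plus a dictzip-compressed .dict.dz), its entries can be read with the standard library, since dictzip output is gzip-compatible and the .index stores offsets and lengths as base-64 numbers. A minimal sketch with hypothetical names, assuming UTF-8 text; the resulting pairs would then go to slob's writer.

```python
import gzip

_B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_to_int(s):
    """Decode a dictd base-64 number (offsets and lengths in .index files)."""
    n = 0
    for ch in s:
        n = n * 64 + _B64.index(ch)
    return n

def read_dictd(index_path, dict_dz_path):
    """Yield (headword, article) pairs from a dictd .index + .dict.dz pair."""
    # The whole .dict is loaded into memory, which is fine for a sketch.
    with gzip.open(dict_dz_path, "rb") as f:
        data = f.read()
    with open(index_path, encoding="utf-8") as idx:
        for line in idx:
            if not line.strip():
                continue
            # .index line: headword <TAB> offset <TAB> length (base 64)
            word, offset, length = line.rstrip("\n").split("\t")[:3]
            start = b64_to_int(offset)
            yield word, data[start:start + b64_to_int(length)].decode("utf-8")
```

dictd databases usually also contain metadata headwords (such as 00-database-info) that you may want to skip or turn into slob tags.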
The slob file is almost 5 years old. It would be great if you could publish an update. Thanks
Please migrate the aarddict forum from Google Groups to GitHub Discussions.
Not everyone is familiar with the mailing-list style, and finding information there is time-consuming.
So please consider this proposal.
I updated two dictionaries, which can be found here
I also wanted to update GCIDE (GNU Collaborative International Dictionary of English) to the latest version. I found the dictionary in XML format, but I don't know how to work with XML that is split into several files. I would appreciate some help.
OmegaWiki is a collaborative project to produce a free, multilingual dictionary in every language, with lexicological, terminological and thesaurus information.
The software is opensource and the data is free.
The key idea of OmegaWiki is to be based around concepts. This is what makes it truly multilingual.
So, by building a French-English and a German-English dictionary, we are also building a German-French dictionary. If we add an Italian contributor, we build 3 more bilingual dictionaries; the number of language pairs grows quadratically with the number of languages.
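Counting bilingual pairs makes the growth concrete: n languages give P(n) pairs, so going from three languages (French, English, German) to four (adding Italian) adds exactly three.

```latex
P(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad P(3) = 3, \quad P(4) = 6
```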
Not sure where this wish belongs, so I also posted it to itkach/aard2-android#24
For example, I have a verb "gotować" (готовить, to cook) and it has a form "gotowałeś" (you prepared).
I need this form to be searchable by "gotowales" (without diacritics), but at the same time I need it to be shown in the search list as "gotowałeś".
Is it possible in Slob format?
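Whether Aard2's lookup already treats "ł" and "l" as equal depends on its collation settings, which this issue doesn't settle. If you end up generating a diacritic-free search variant yourself, note that Unicode NFD decomposition alone does not turn "ł" into "l", because U+0142 has no decomposition. A minimal sketch with a hypothetical helper, `fold_diacritics`; the extra mapping table is an assumption covering Polish only. Adding the folded form as an extra key would make it searchable but would not by itself control which form the list displays.

```python
import unicodedata

# Letters whose "diacritic" is part of the base letter in Unicode, so NFD
# leaves them intact; extend for other languages as needed.
_EXTRA = str.maketrans({"ł": "l", "Ł": "L"})

def fold_diacritics(word):
    """Return word without diacritics, e.g. for an additional search key."""
    decomposed = unicodedata.normalize("NFD", word.translate(_EXTRA))
    # Drop combining marks (category Mn) left over after decomposition.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

For example, `fold_diacritics("gotowałeś")` gives "gotowales".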
Thanks for the wonderful work on Aard2. I was interested in using slob.py directly to look through a downloaded enwiki.slob file, but have not been able to get it running despite setting up numerous Python virtualenvs with the latest versions of ICU and PyICU. The error I get is:
Python 3.5.2 |Anaconda 4.1.1 (x86_64)| (default, Jul 2 2016, 17:52:12)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import slob
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/.../anaconda/lib/python3.5/site-packages/slob.py", line 29, in <module>
    import icu
  File "/Users/.../anaconda/lib/python3.5/site-packages/icu.py", line 37, in <module>
    from docs import *
  File "/Users/.../anaconda/lib/python3.5/site-packages/docs.py", line 23, in <module>
    from _icu import *
ImportError: dlopen(/Users/.../anaconda/lib/python3.5/site-packages/_icu.cpython-35m-darwin.so, 2): Library not loaded: libicui18n.54.dylib
  Referenced from: /Users/.../anaconda/lib/python3.5/site-packages/_icu.cpython-35m-darwin.so
  Reason: image not found
Not sure if I set up the environment correctly. Here is what I have installed:
$ conda list
packages in environment at /Users/brstream/anaconda/envs/env-slob:
Using Anaconda Cloud api site https://api.anaconda.org
icu 56.1 4 conda-forge
openssl 1.0.2h 2
pip 8.1.2 py35_0
PyICU 1.9.3
python 3.5.2 0
readline 6.2 2
setuptools 25.1.6 py35_0
Slob 1.0
sqlite 3.13.0 0
tk 8.5.18 0
wheel 0.29.0 py35_0
xz 5.2.2 0
zlib 1.2.8 3
(env-slob)
Thanks for your help.
Phap Luu
Google Drive links posted here can be converted into direct download links by extracting the file ID and using:
<base_url>/uc?id=<id>&export=download
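A minimal sketch of that conversion, assuming `base_url` is https://drive.google.com and that share links come in either the /file/d/<id>/... or the ?id=<id> form; `drive_direct_link` is a hypothetical name. For large files Google may still show a confirmation page instead of starting the download.

```python
import re
from urllib.parse import parse_qs, urlparse

def drive_direct_link(url, base_url="https://drive.google.com"):
    """Turn a Google Drive share link into <base_url>/uc?id=<id>&export=download."""
    parsed = urlparse(url)
    match = re.search(r"/file/d/([^/]+)", parsed.path)
    if match:
        file_id = match.group(1)
    else:
        ids = parse_qs(parsed.query).get("id")
        if not ids:
            raise ValueError(f"no file id found in {url!r}")
        file_id = ids[0]
    return f"{base_url}/uc?id={file_id}&export=download"
```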
The README.md section "Create from MediaWiki sites"
does not mention https://meta.wikimedia.org/wiki/Data_dumps, although Wikimedia publishes database dumps for all wikis, including Wikipedia and Wiktionary, updated monthly or twice a month. Importing the dumps is faster and lighter on resources than crawling, and crawlers seem to be rate-limited.
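Reading a dump is mostly a matter of streaming XML. A minimal sketch using only the standard library (function names hypothetical), assuming the usual MediaWiki export layout of <page> elements with <title>, <ns> and <revision><text>. Turning the wikitext into the HTML that slob entries usually contain is the hard part and is not covered here.

```python
import xml.etree.ElementTree as ET

def _local(tag):
    # Strip the "{namespace}" prefix that MediaWiki export files use.
    return tag.rsplit("}", 1)[-1]

def iter_pages(source, namespace="0"):
    """Yield (title, wikitext) for pages in a MediaWiki XML dump.

    source: a path or a binary file object (e.g. bz2.open(path) for a
    *-pages-articles.xml.bz2 file). Only pages in `namespace` are yielded.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        fields = {}
        for child in elem.iter():
            name = _local(child.tag)
            if name in ("title", "ns", "text") and name not in fields:
                fields[name] = child.text or ""
        if fields.get("ns") == namespace:
            yield fields.get("title", ""), fields.get("text", "")
        elem.clear()  # free the finished page subtree
```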
I'm trying to add a label to an already created slob file:
slob tag -n label -v "A Fine Dictionary" my.slob
All I get is a "No such tag" error.
slob info my.slob
TAGS
----
created.at: 2015-03-31T10:47:43.771907+00:00
version.icu: 54.1
version.python: 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)]
version.pyicu: 1.8
How do I ADD new tags?
Without the "label" tags my slobs are displayed as "???" in the android client :-(