Comments (8)
- Question: Can you also include the content of pages like https://de.wiktionary.org/wiki/Flexion:gehen in the *.slob file?
It can be done. By default mwscrape only downloads articles (MediaWiki namespace 0), but it can be directed to download any namespace with the --namespace command line option. Here the namespace is specified by its id (ideally it should also take a name, of course, but somehow I never got around to implementing that). For example, Flexion happens to have id 108, so it can be downloaded with
mwscrape de.m.wiktionary.org --namespace 108
To find out a namespace id, one can look at the namespace key in the siteinfo database: browse it via the CouchDB admin interface at http://localhost:5984/_utils/ or grep it with curl, something like
curl http://localhost:5984/siteinfo/de-m-wiktionary-org | python -m json.tool | grep Flexion -B 1
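The same lookup can be sketched in Python instead of grep. This is only a rough sketch: it assumes the siteinfo document mirrors the MediaWiki API siteinfo layout, that is, a "namespaces" map keyed by id where each entry carries "id" and the local name under "*" (and sometimes "canonical"). The function names are made up for illustration and are not part of mwscrape.

```python
import json
import urllib.request


def load_siteinfo(url="http://localhost:5984/siteinfo/de-m-wiktionary-org"):
    """Fetch the siteinfo document from a local CouchDB (same URL as the curl call)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def find_namespace_ids(siteinfo, name):
    """Return the ids of all namespaces whose local or canonical name equals `name`.

    Assumes the MediaWiki siteinfo layout: {"namespaces": {"108": {"id": 108, "*": "Flexion"}}}.
    """
    matches = []
    for key, info in siteinfo.get("namespaces", {}).items():
        if name in (info.get("*"), info.get("canonical")):
            # Fall back to the map key if the entry has no explicit "id".
            matches.append(int(info.get("id", key)))
    return sorted(matches)


# Illustrative data only, not copied from a real siteinfo document.
sample = {
    "namespaces": {
        "0": {"id": 0, "*": ""},
        "108": {"id": 108, "*": "Flexion", "canonical": "Flexion"},
    }
}
print(find_namespace_ids(sample, "Flexion"))  # prints [108]
```

Against a running CouchDB, `find_namespace_ids(load_siteinfo(), "Flexion")` would do the real lookup, provided the document really has the assumed layout.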
Once documents from the namespace are downloaded, the dictionary should be compiled using mwscrape2slob with the --article-namespace command line option, which specifies the namespaces to be treated as article namespaces (by id, 108 in this case, or by name; both should work). By default, links pointing to documents from non-article namespaces will be replaced with links to the online version, because typically they are not included in the dictionary.
There may be some other namespaces that should be treated as regular articles; you may want to look for more of those.
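The two steps above can be put side by side in a small Python helper that only builds the command lines. It is a sketch under assumptions: the flags --namespace and --article-namespace are the ones named above, but passing the CouchDB database URL as the positional argument of mwscrape2slob, and deriving the database name by replacing dots with dashes (as in the siteinfo URL), are assumptions that should be checked against the tools' --help output. The helper name is hypothetical.

```python
import shlex


def build_commands(site_host, namespace_id, couch_server="http://localhost:5984"):
    """Return (scrape_argv, compile_argv) for one extra article namespace."""
    # Assumption: the database is named after the host, with dots replaced by dashes.
    db_name = site_host.replace(".", "-")
    scrape_argv = ["mwscrape", site_host, "--namespace", str(namespace_id)]
    # Assumption: mwscrape2slob takes the CouchDB database URL as its positional argument.
    compile_argv = [
        "mwscrape2slob",
        f"{couch_server}/{db_name}",
        "--article-namespace",
        str(namespace_id),
    ]
    return scrape_argv, compile_argv


scrape_argv, compile_argv = build_commands("de.m.wiktionary.org", 108)
print(shlex.join(scrape_argv))
print(shlex.join(compile_argv))
```

The argument lists can be passed to `subprocess.run(argv, check=True)` once the exact invocation has been confirmed.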
@MHBraun, do you think you can give it a try?
It is best to post dictionary/content questions and requests at the forum (http://aarddict.org/forum). Pictures were also discussed at the forum. Short answer: perhaps one day, but currently there's no usable built-in support for this in the dictionary making tools (mwscrape and mwscrape2slob).
from slob.
I will look into it.
Need to code some stuff.
However, I do not have the time to test. Is anybody out there willing to test the result?
And yes Igor, you are right, we should move this topic to
http://aarddict.org/forum
-------- Original message --------
From: itkach [email protected]
Date:13/09/2016 20:06 (GMT+01:00)
To: itkach/slob [email protected]
Cc: MHBraun [email protected], Mention [email protected]
Subject: Re: [itkach/slob] offline using "de.wiktionary.org/wiki/gehen" and
".../wiki/Flexion:gehen" (#13)
Thank you, MHBraun.
Of course I will test the resulting *.slob file.
@Golddouble I have just a user question here: what is the benefit of using slob files in GoldenDict instead of Aard2? I have both and find Aard2 more comfortable; GoldenDict is very basic and, I find, has much less functionality. Thanks for the info. frank
@francwalter
I did not know Aard2 until today.
This is how I came across *.slob offline wikis:
I installed GoldenDict a long time ago and used it for translating words from one language into another. While using GoldenDict I discovered that it can also query wikis online. Then I found this interesting wiki: https://de.wiktionary.org/wiki/gehen
So I thought it would be great to use "https://de.wiktionary.org/wiki/gehen" offline, and I asked in a forum whether there is a way to use this wiki offline with GoldenDict. That is how I came across http://ftp.halifax.rwth-aachen.de/aarddict/dewiki/dewiktionary-20160708.slob
As I do not miss any functions in GoldenDict, there is no reason for me to change to Aard2 now. I do not like installing more software on my computer than necessary; I prefer to use one program for as many purposes as I can.
Testing mwscrape with --namespace 108 only shows Flexion updates. Is a second scrape without --namespace 108 required to update the articles?
Thanks. Update is in the works now.