
Comments (8)

itkach commented on September 16, 2024
  1. Question: Can you also include the content of pages like https://de.wiktionary.org/wiki/Flexion:gehen in the *.slob file?

It can be done. By default mwscrape only downloads articles (MediaWiki namespace 0), but it can be directed to download any namespace with the --namespace command-line option. The namespace is specified by its id (ideally it should also accept a name, of course, but somehow I never got around to implementing that). For example, Flexion happens to have id 108, so it can be downloaded with

mwscrape de.m.wiktionary.org --namespace 108
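If you want to script this, here is a rough sketch of a hypothetical Python helper (not part of mwscrape). It builds one mwscrape invocation per namespace id. Two things are assumed: that each run takes a single --namespace id, and that a run without --namespace fetches articles (namespace 0), as described above. The only flag it uses is --namespace.

```python
import subprocess
from typing import Iterable, List


def mwscrape_commands(site: str, namespaces: Iterable[int]) -> List[List[str]]:
    """Build one mwscrape command line per namespace id (hypothetical helper).

    Assumption: each run is given at most one --namespace id, and a run
    without --namespace downloads only articles (namespace 0).
    """
    commands = []
    for ns in namespaces:
        cmd = ["mwscrape", site]
        if ns != 0:
            # Non-article namespaces must be requested explicitly by id.
            cmd += ["--namespace", str(ns)]
        commands.append(cmd)
    return commands


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if mwscrape exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Articles (namespace 0) plus the Flexion namespace (id 108).
    run_all(mwscrape_commands("de.m.wiktionary.org", [0, 108]))
```

With dry_run left at True the script only prints the commands, so you can check them before passing dry_run=False to actually invoke mwscrape.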

To find a namespace's id, look at the namespace key in the siteinfo database: browse it in the CouchDB admin interface at http://localhost:5984/_utils/, or fetch it with curl and grep it, something like

curl http://localhost:5984/siteinfo/de-m-wiktionary-org | python -m json.tool | grep Flexion -B 1
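The same lookup can be sketched in Python instead of grep. This assumes that the siteinfo document stores MediaWiki's `namespaces` mapping at its top level, keyed by id, with each entry holding an `id`, the local name under `*` and, for non-main namespaces, a `canonical` name. If mwscrape lays the document out differently, adjust the key names accordingly.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def fetch_siteinfo(url: str) -> Dict[str, Any]:
    """Fetch a CouchDB document, e.g. http://localhost:5984/siteinfo/de-m-wiktionary-org."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_namespace_id(siteinfo: Dict[str, Any], name: str) -> Optional[int]:
    """Return the id of the namespace called `name` (case-insensitive), or None.

    Assumption: siteinfo["namespaces"] maps id strings to entries shaped like
    {"id": 108, "*": "Flexion", "canonical": "Flexion"}.
    """
    wanted = name.casefold()
    for entry in siteinfo.get("namespaces", {}).values():
        # Compare against both the local and the canonical name, skipping blanks.
        names = {entry.get("*", ""), entry.get("canonical", "")}
        if wanted in {n.casefold() for n in names if n}:
            return int(entry["id"])
    return None
```

Usage, with CouchDB running locally: `find_namespace_id(fetch_siteinfo("http://localhost:5984/siteinfo/de-m-wiktionary-org"), "Flexion")` should return 108 if the assumed layout holds.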

Once the documents from the namespace are downloaded, the dictionary should be compiled using mwscrape2slob with the --article-namespace command-line option, specifying the namespaces to be treated as article namespaces (by id, 108 in this case, or by name; both should work). By default, links pointing to documents in non-article namespaces are replaced with links to the online version, because such documents are typically not included in the dictionary.
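The link rule described above can be illustrated with a simplified, hypothetical sketch. This is not mwscrape2slob's actual code. A link whose namespace is among the article namespaces stays internal, and any other link becomes a URL on the online wiki. The prefix-to-id map, the online base URL and the treatment of unprefixed titles as namespace 0 are all assumptions made for the illustration.

```python
from typing import Dict, Set
from urllib.parse import quote


def rewrite_link(title: str, prefixes: Dict[str, int],
                 article_namespaces: Set[int], online_base: str) -> str:
    """Keep links into article namespaces; point all other links online.

    `prefixes` maps namespace names (e.g. "Flexion") to ids. Titles without
    a known prefix are treated as namespace 0 (regular articles).
    """
    ns = 0
    prefix, sep, _rest = title.partition(":")
    if sep and prefix in prefixes:
        ns = prefixes[prefix]
    if ns in article_namespaces:
        return title  # target is expected to be inside the .slob dictionary
    # Build the online URL, keeping ":" and "/" readable.
    return online_base + quote(title.replace(" ", "_"), safe="/:")


if __name__ == "__main__":
    prefixes = {"Flexion": 108, "Hilfe": 12}
    base = "https://de.wiktionary.org/wiki/"
    print(rewrite_link("Flexion:gehen", prefixes, {0}, base))       # link goes online
    print(rewrite_link("Flexion:gehen", prefixes, {0, 108}, base))  # link stays internal
```

Adding 108 to the article namespaces is the sketch's counterpart of passing --article-namespace 108: Flexion pages stop being sent to the online site because they are now part of the dictionary.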

There may be other namespaces that should be treated as regular articles; you may want to look for those too.

@MHBraun, do you think you can give it a try?

It is best to post dictionary/content questions and requests on the forum (http://aarddict.org/forum). Pictures have also been discussed there. Short answer: perhaps one day, but currently there is no usable built-in support for this in the dictionary-making tools (mwscrape and mwscrape2slob).


MHBraun commented on September 16, 2024

I will look into it. I need to code some stuff.
However, I do not have the time to test. Is somebody out there able to test the result?

And yes, Igor, you are right: we should move this topic to http://aarddict.org/forum


-------- Original message --------
From: itkach
Date: 13/09/2016 20:06 (GMT+01:00)
To: itkach/slob
Cc: MHBraun, Mention
Subject: Re: [itkach/slob] offline using "de.wiktionary.org/wiki/gehen" and ".../wiki/Flexion:gehen" (#13)

Golddouble commented on September 16, 2024

Thank you, MHBraun.

Of course I will test the resulting *.slob file.


francwalter commented on September 16, 2024

@Golddouble I just have a user question here: what is the advantage of using the slob in GoldenDict instead of Aard2? I have both and find Aard2 more comfortable; GoldenDict seems very basic to me and has much less functionality. Thanks for the info. frank


Golddouble commented on September 16, 2024

@francwalter
I did not know Aard2 until today.

This is how I came across *.slob offline wikis:
I installed GoldenDict a long time ago and used it to translate words from one language into another. While using GoldenDict I discovered that it can use wikis online. Then I found this interesting wiki: https://de.wiktionary.org/wiki/gehen
So I thought it would be great to use "https://de.wiktionary.org/wiki/gehen" offline, and I asked in a forum whether there was a way to use this wiki offline with GoldenDict. That is how I came across http://ftp.halifax.rwth-aachen.de/aarddict/dewiki/dewiktionary-20160708.slob

As I do not miss any features in GoldenDict, I have no reason to switch to Aard2 for now. I am not a fan of installing lots of software on my computer; I prefer to use one program for as many purposes as possible.


MHBraun commented on September 16, 2024

Testing mwscrape with --namespace 108 only shows Flexion updates. Is a second scrape without --namespace 108 required to update the articles?


itkach commented on September 16, 2024


MHBraun commented on September 16, 2024

Thanks. Update is in the works now.
