py-3rdparty-mediawiki's Issues

add wikiedit functionality

Allow looping over a set of pages and modifying each page according to a lambda function.

Typical lambdas would be:

  • replace regex
  • edit WikiSon attribute
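The idea can be sketched roughly as follows. This is only an illustration, assuming the pages are already available as a mapping from title to wikitext; `edit_pages` and `replace_regex` are hypothetical names, not functions of the library.

```python
import re
from typing import Callable, Dict


def edit_pages(pages: Dict[str, str], modify: Callable[[str], str]) -> Dict[str, str]:
    """Apply modify to every page text and return only the pages that changed."""
    changed = {}
    for title, text in pages.items():
        new_text = modify(text)
        if new_text != text:
            changed[title] = new_text
    return changed


# Hypothetical "replace regex" lambda: rename a category on every page.
replace_regex = lambda text: re.sub(
    r"\[\[Category:Conference\]\]", "[[Category:Event]]", text
)

pages = {
    "ESWC 2020": "Intro [[Category:Conference]]",
    "Main Page": "Welcome",
}
changed = edit_pages(pages, replace_regex)
```

Returning only the changed pages would let a caller preview the edits before saving anything to the wiki.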

add html page retrieval

Add a duck-typed version of HTML page retrieval so that either pywikibot or mwclient may be used (mwclient might not have this API support at this time).

add wikiundo

Mark edits, pushes and the like with a tag (signature) to allow undo.

Empty result list of ask query leads to error

AttributeError: 'list' object has no attribute 'keys'

is given e.g. on query:

{{#ask: [[Acronym::+]]
|mainlabel=Event
| ?Acronym = acronym
| ?_CDAT=creation date
| ?_MDAT=modification date
| limit=200 
}}

on openresearch.org
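A possible guard, sketched under the assumption that the error arises because the `results` part of the API response arrives as an empty list instead of a dict when nothing matches. `normalize_results` is a hypothetical helper name.

```python
from typing import Any, Dict


def normalize_results(results: Any) -> Dict[str, Any]:
    """Return ask results as a dict, mapping an empty list to an empty dict.

    Assumption: an empty result set is delivered as [] rather than {}.
    """
    if isinstance(results, list):
        if results:
            raise ValueError("expected a dict or an empty list of results")
        return {}
    return results


empty = normalize_results([])
filled = normalize_results({"ESWC 2020": {"printouts": {}}})
```

With such a guard, calling `.keys()` on the normalized value works for both the empty and the non-empty case.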

Overcome upper limit of SMW ask queries

Ask results are limited, e.g. to 1,000 or 10,000 results, depending on the server configuration and the rights of the account being used.

When doing a query like

[[modification date::+]]

the result size is the number of pages in the wiki, which might well be far more than 10,000.

There should be an approach to work around the limitation, e.g. by batching or by splitting a query into multiple queries. For example, the above query can be modified to

[[modification date::<2020]][[modification date::>2018]]

to select only a range of dates. By then going range by range, the query for the backup can be split to the point where it works, e.g. by trying it out with a count result first.
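The range-by-range idea could look roughly like this. It is a sketch, not the library's implementation: `split_query_by_year` and `merge_results` are hypothetical names, the split is fixed to one year per query, and results are assumed to be dicts keyed by page title.

```python
from typing import Dict, List


def split_query_by_year(prop: str, first_year: int, last_year: int) -> List[str]:
    """Create one ask condition per year, using the same syntax as the example above."""
    return [
        f"[[{prop}::>{year}]][[{prop}::<{year + 1}]]"
        for year in range(first_year, last_year + 1)
    ]


def merge_results(batches: List[Dict[str, dict]]) -> Dict[str, dict]:
    """Merge per-range results by page title, so a page matched twice appears once."""
    merged: Dict[str, dict] = {}
    for batch in batches:
        merged.update(batch)
    return merged


queries = split_query_by_year("modification date", 2018, 2019)
```

Merging by page title is a cautious choice here: whether the range boundaries are inclusive may depend on the wiki's comparator settings, so adjacent ranges could return the same page twice.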

Escaping of the query leads to different queries

During query escaping, a blank is replaced with an underscore, see:

part = re.sub(" ", "_",part);

For example, the query "[[isA::Event series]]" is converted to "[[isA::Event_series]]".
On www.openresearch.org the two queries return distinct results, but on my own wiki they do not.
A quick test showed that removing the line mentioned above seems to resolve this issue (the query is then encoded by the requests library, " " to "+").

Should this issue be fixed or should it stay dependent on the wiki configuration?
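The two encodings can be compared side by side. In this sketch `urllib.parse.quote_plus` from the standard library stands in for the form encoding that requests is described as applying; it is not the library's own code path.

```python
import re
from urllib.parse import quote_plus

query = "[[isA::Event series]]"

# Current behavior: the blank becomes an underscore before the query is sent.
underscored = re.sub(" ", "_", query)

# Without that line the blank survives and is form-encoded as "+".
plus_encoded = quote_plus(query)

print(underscored)   # [[isA::Event_series]]
print(plus_encoded)  # %5B%5BisA%3A%3AEvent+series%5D%5D
```

The first variant changes the property value itself, while the second only changes its transport encoding, which would explain why a wiki that distinguishes "Event series" from "Event_series" returns different results.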

add --withImages / -wi option

Copying images should not be the default but should have to be activated explicitly, since some pages may contain many images and this holds up the initial copy.
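An opt-in flag of this kind could be declared with argparse as sketched below. Only the option names come from this issue; the parser, its description and the help text are assumptions made for illustration.

```python
import argparse

# Hypothetical parser; only "-wi" / "--withImages" are taken from the issue title.
parser = argparse.ArgumentParser(description="copy pages between wikis (sketch)")
parser.add_argument(
    "-wi",
    "--withImages",
    action="store_true",  # defaults to False, so images are skipped unless requested
    help="also copy the images of the selected pages",
)

default_args = parser.parse_args([])
image_args = parser.parse_args(["-wi"])
```

`store_true` keeps the default at `False`, which matches the requirement that image copying must be activated explicitly.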

Moving .wiki files from a local machine to a live wiki

Say one has an x.wiki file and would like to upload it to their wiki.

There is a method to copy files from a source wiki to a target wiki, but I don't need a source wiki in my case, as the file is on my own physical machine.

Support more datatypes

The following query from https://www.semantic-mediawiki.org/wiki/Demo:JSON_-_datatype_export_examples

{{#ask:
 [[Has_annotation_uri::+]]
 |?Has_annotation_uri=anu
 |?Has_boolean=boo
 |?Has_code=cod
 |?Has_date=dat
 |?Has email address=ema
 |?Has Wikidata item ID=eid
 |?Has coordinates=geo
 |?Has number=num
 |?Has mlt=mlt
 |?Has example=wpg
 |?Telephone number=tel
 |?Has temperatureExample=tem
 |?Area=qty
 |?SomeProperty=txt
 |?Has Wikidata Reference=ref_rec
 |?Soccer result=rec
 |?Has_URL=uri
 |format=json
 |limit=1
}}

should work

Add helper for MediaWiki tables

Allow converting different data representations to a MediaWiki table,
e.g. with a list of dicts as input:

 listofDicts=[
            {'name': 'Elizabeth Alexandra Mary Windsor', 'born': self.dob('1926-04-21'), 'numberInLine': 0, 'wikidataurl': 'https://www.wikidata.org/wiki/Q9682' },
            {'name': 'Charles, Prince of Wales',         'born': self.dob('1948-11-14'), 'numberInLine': 1, 'wikidataurl': 'https://www.wikidata.org/wiki/Q43274' },
            {'name': 'George of Cambridge',              'born': self.dob('2013-07-22'), 'numberInLine': 3, 'wikidataurl': 'https://www.wikidata.org/wiki/Q1359041'},
            {'name': 'Harry Duke of Sussex',             'born': self.dob('1984-09-15'), 'numberInLine': 6, 'wikidataurl': 'https://www.wikidata.org/wiki/Q152316'}
        ]
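Such a helper might look like the following sketch. `to_wiki_table` is a hypothetical name, the column order is taken from the first record, and cell values are assumed not to contain pipe characters.

```python
from typing import Any, Dict, List


def to_wiki_table(records: List[Dict[str, Any]]) -> str:
    """Render a list of dicts as MediaWiki table markup."""
    if not records:
        return ""
    headers = list(records[0].keys())
    lines = ['{| class="wikitable"', "! " + " !! ".join(headers)]
    for record in records:
        lines.append("|-")  # start a new row
        cells = (str(record.get(header, "")) for header in headers)
        lines.append("| " + " || ".join(cells))
    lines.append("|}")
    return "\n".join(lines)


table = to_wiki_table(
    [
        {"name": "George of Cambridge", "numberInLine": 3},
        {"name": "Harry Duke of Sussex", "numberInLine": 6},
    ]
)
print(table)
```

Values are converted with `str`, so dates and numbers are rendered in their default Python representation; a real helper would probably need per-type formatting and escaping.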

handle ask case where mainlabel is set

{{#ask: [[Category:City]] [[Located in::Germany]] |mainlabel=City |?Population |?Area#km² = Size in km² }}

should be valid and return a list of city records

add wikibackup

wikibackup shall copy pages and images to a backup directory and optionally use git version control for the result. Basically it is a download functionality.

Read API error with read rights given.

The user has read rights but the error message shows that there is a read rights issue.

copying 1 pages from media to wiki
1/1 ( 100%): copying Template:1toN ...:x::('readapidenied', 'You need read permission to use this module.', 'See http://media.bitplan.com/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.')

fix handling of exists

exists messages should be marked with ✅ in ignore mode. They might still be shown (even if somewhat verbose)

copying image File:Begriff Systemkontext.png ...✅:('fileexists-no-change', 'The upload is an exact duplicate of the current version of [[:File:Begriff Systemkontext.png]].', 'See http://swa.bitplan.com/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.')

regression: wikibackup fails with TypeError

wikibackup -s wiki -q "[[modification date::>2021]]"
Start: 2021-01-02 09:15:18, End: 2021-02-09 03:47:47
wikibackup: TypeError("unsupported operand type(s) for /: 'datetime.timedelta' and 'NoneType'")
for help use --help

Empty lists should be returned as None

To make handling of simple types (e.g. an empty string) easier, the already implemented "delisting", which returns the single item for lists of length one, shall be extended to lists of length zero.
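The extended rule can be sketched as a small function. `delist` is a hypothetical name and the existing implementation may be structured differently.

```python
from typing import Any


def delist(value: Any) -> Any:
    """Unwrap trivial lists: [] becomes None, [x] becomes x, anything else is unchanged."""
    if isinstance(value, list):
        if len(value) == 0:
            return None
        if len(value) == 1:
            return value[0]
    return value


print(delist([]))          # None
print(delist(["ESWC"]))    # ESWC
print(delist(["a", "b"]))  # ['a', 'b']
```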

limit handling

In query mode the limit is ignored and a full query is done. The performance is bad in these cases.

add wikiquery option

The wikiquery command line should allow getting results in a given format:

  • csv
  • json

The base functionality for this should be taken from pyLodStorage.
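For illustration, the two formats can be produced from a list of dicts with the standard library alone. This sketch deliberately does not use pyLodStorage, whose API is not shown here; `format_records` is a hypothetical name and the sample record is made up.

```python
import csv
import io
import json
from typing import Any, Dict, List


def format_records(records: List[Dict[str, Any]], fmt: str) -> str:
    """Serialize query result records as csv or json text."""
    if fmt == "json":
        return json.dumps(records, indent=2, default=str)
    if fmt == "csv":
        buffer = io.StringIO()
        # Column order follows the first record; keys missing there are ignored.
        fieldnames = list(records[0].keys()) if records else []
        writer = csv.DictWriter(
            buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


records = [{"title": "ESWC 2018", "ordinal": 15}]
csv_text = format_records(records, "csv")
json_text = format_records(records, "json")
```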

Acceptance criterion:

Situation
The query {{#ask: [[IsA::Event]][[Acronym::~ES*]][[start date::>2018]][[start date::<2019]] | ?Title = title | ?Event in series = series | ?ordinal=ordinal | ?Homepage = homepage |format=table }}
is given, see https://www.openresearch.org/wiki/Workdocumentation_2021-02-16
Action
put the query in a file "query1.ask"

  1. wikiquery -s or --queryFile query1.ask --format csv
  2. wikiquery -s or --queryFile query1.ask --format json

Expected Result
