choco31415 / JavaMediawikiBot
A Java bot framework for everything Mediawiki.
License: GNU General Public License v3.0
pagesNeeded has an incorrect equation.
Right now, getWikiPage() requires that the page exist, so getting a page can take two API calls: one to check that it exists, then another to fetch it. It would be nice if getWikiPage() and the other similar methods returned null, or some other indication that the page doesn't exist. doesPageExist() would then be kept only for interface-design convenience.
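A minimal sketch of the single-call idea, assuming a hypothetical Optional-returning lookup (the class and method names here are illustrative, not JMB's actual API; a Map stands in for the MW API query):

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: fetch a page in one call and signal absence with
// Optional instead of a separate doesPageExist() round trip.
public class PageFetchSketch {
    private final Map<String, String> store; // stand-in for the wiki

    public PageFetchSketch(Map<String, String> store) {
        this.store = store;
    }

    // Returns the page text, or Optional.empty() if the page is missing,
    // so callers need only a single lookup.
    public Optional<String> getWikiPage(String title) {
        return Optional.ofNullable(store.get(title));
    }

    public static void main(String[] args) {
        PageFetchSketch bot = new PageFetchSketch(Map.of("Main Page", "Welcome!"));
        System.out.println(bot.getWikiPage("Main Page").orElse("(missing)"));
        System.out.println(bot.getWikiPage("No Such Page").orElse("(missing)"));
    }
}
```

Callers then branch on presence in one step instead of calling doesPageExist() first.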
I forget why I put it in... When parsing for newlines, why are you excluding those inside HTML comments?
Just so that I don't forget...
It appears Mediawiki has a Search API. It seems to replicate the PrefixSearch API. Look into this.
Something weird is going on. JavaMediawikiBot works on the HTTPS-protected en wiki, but not the HTTPS-protected indo wiki. For the indo wiki, I get this error:
"Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target"
This error generally means the JVM cannot chain the server's certificate to a root in its default truststore, commonly because the server omits an intermediate certificate or uses a CA the JVM doesn't ship with.
Here are a few ideas regarding wiki families:
Remember that GenericBot, MediawikiDataManager and FamilyGenerator will all need updating.
Replace this line:
return !serverOutput.has("missing");
With:
return serverOutput.findValue("missing") == null;
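The difference matters because Jackson's has() checks only a node's direct children, while findValue() searches the whole subtree, and the MW API nests page data under query/pages/&lt;id&gt;. A stdlib sketch of that difference, with nested Maps standing in for JsonNodes:

```java
import java.util.Map;

// Sketch of why a direct-child check misses "missing" in nested MW API
// output while a recursive findValue-style search finds it. Nested Maps
// stand in for Jackson JsonNodes; keys mirror query/pages/<id>/missing.
public class FindValueSketch {
    // Recursive lookup: returns the first value stored under `key`
    // anywhere in the tree, or null (Jackson's findValue is similar).
    static Object findValue(Map<?, ?> node, String key) {
        if (node.containsKey(key)) return node.get(key);
        for (Object child : node.values()) {
            if (child instanceof Map) {
                Object found = findValue((Map<?, ?>) child, key);
                if (found != null) return found;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> serverOutput =
            Map.of("query", Map.of("pages", Map.of("-1", Map.of("missing", ""))));
        System.out.println(serverOutput.containsKey("missing"));          // false: not a direct child
        System.out.println(findValue(serverOutput, "missing") != null);   // true: found in the subtree
    }
}
```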
This command handles page names incorrectly. One such page name is: Fr:Scratch Wiki Accueil/l'éditeur
As such, make the command escape HTML.
Maybe it would be nice to make all commands default to escaping HTML.
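A minimal escaping sketch for page names like the one above (the class and method names are illustrative; JMB could run each command's page-name parameter through something like this):

```java
// Minimal HTML escaping sketch for page names such as
// "Fr:Scratch Wiki Accueil/l'éditeur". Escapes the five characters
// with special meaning in HTML; everything else passes through.
public class HtmlEscapeSketch {
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Fr:Scratch Wiki Accueil/l&#39;éditeur
        System.out.println(escapeHtml("Fr:Scratch Wiki Accueil/l'éditeur"));
    }
}
```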
For example, when a connection can't be established, output what it couldn't connect to.
It would be really nice if, when running FamilyGenerator.java, the user could specify a path where the family file should be stored.
When I originally started this project, I did not know that JSON and XML libraries existed, so I hacked together my own methods to read XML and JSON files. That worked well for a long while. Currently, however, I am facing an issue: some parameters that you can get from the user or image MW APIs return XML/JSON objects instead of the usual String or int. My hacked-together methods cannot handle this, at least not without mimicking an XML or JSON library.
As such, I would rather go with something trustworthy, namely proper XML and JSON libraries, than my hacked-together methods.
To add and implement:
I think I will start with JSON because it will be the least used, and hence will be the easiest to implement.
The comments in GenericBot don't always describe what is happening. Plus, a few more comments describing what is happening would be nice.
I really dislike the current system of burying numbers inside the methods themselves. Maybe a map would work. Hm
When I tried to get a non-existent page on the Test wiki, no errors or exceptions were thrown, leading the user to go forward with false assumptions about page existence.
While triple closing brackets ]]] may cause link text to include an extra bracket, this is not the case for interwiki links.
Title says it all. Example: Pages
Some UTF-8 streams start with a BOM character, messing up any attempts to read otherwise-fine output as JSON.
Code: serverOutput.findValue("invalid") == null
In Mediawiki, two page titles are equal if they match exactly except possibly in the capitalization of the first letter. Fix equals() in PageLocation, and maybe PageTitle too.
Right now, getAllPages returns a multiple of APIlimit, which could be higher than what you requested. This is not good design! Rewrite it, and any other methods with a similar philosophy.
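A sketch of one fix, assuming hypothetical names (fetchBatch stands in for one MW API call): cap the final request at the remaining count so the caller gets exactly what was asked for.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: fetch in API-limit-sized batches, but size the last request
// to the remaining count instead of returning a multiple of the limit.
public class BatchTrimSketch {
    static final int API_LIMIT = 500;

    // Stand-in for one API call: returns `limit` page names from `offset`.
    static List<String> fetchBatch(int offset, int limit) {
        List<String> batch = new ArrayList<>();
        for (int i = offset; i < offset + limit; i++) batch.add("Page" + i);
        return batch;
    }

    static List<String> getAllPages(int requested) {
        List<String> pages = new ArrayList<>();
        while (pages.size() < requested) {
            int want = Math.min(API_LIMIT, requested - pages.size());
            pages.addAll(fetchBatch(pages.size(), want));
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(getAllPages(750).size()); // 750, not 1000
    }
}
```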
One of the big problems with bot work as-is is that every time a bot runs, any pages or information it queries must be downloaded again. This can be server-heavy.
One workaround is for a user to write their own txt files.
Is it possible to add a built-in easy way to cache?
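One possible shape for a built-in cache, as a sketch with hypothetical names: memoize each fetch for the life of the run; a persistent variant could serialize the map to disk between runs instead of users writing their own txt files.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a built-in page cache: each title is fetched at most once
// per run; repeat queries are served from memory.
public class PageCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> fetcher; // stand-in for a network call
    int networkCalls = 0;

    PageCacheSketch(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    String getPage(String title) {
        return cache.computeIfAbsent(title, t -> {
            networkCalls++; // only incremented on a cache miss
            return fetcher.apply(t);
        });
    }

    public static void main(String[] args) {
        PageCacheSketch bot = new PageCacheSketch(t -> "text of " + t);
        bot.getPage("Main Page");
        bot.getPage("Main Page"); // served from cache
        System.out.println(bot.networkCalls); // 1
    }
}
```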
Right now, BotPanel and GenericBot have different throttle variables for controlling networking throttling. This isn't needed.
Honestly, it'd be so much easier to maintain and update everything
This project was originally written when I was only somewhat familiar with Java, i.e., I only knew about ArrayList. It would be nice if, in the future, this project used Set, Queue, Map, etc. more often, where beneficial of course.
Currently, I use home-brew methods to parse XML output. It would be nice to move to a professional parser. Considering that the Jackson JSON parser is so nice, please move all XML querying to JSON querying and use Jackson for parsing.
It would be super useful if other API commands would be supported by JMB. These include, but are not limited to:
Read:
Write:
Sometimes when making server requests, you have to get a lot of data. In some cases, JMB breaks this up into multiple adjustable API calls. At other times, however, JMB simply makes one big API call. It would be nice if, in all situations, JMB were aware of what it's doing and broke huge API calls into multiple smaller ones. Take this for example:
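The splitting itself is simple; a generic sketch (names are illustrative) that every multi-item query path could share:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of splitting one oversized request into API-sized chunks,
// so a huge query becomes several adjustable smaller calls.
public class ChunkSketch {
    static <T> List<List<T>> chunk(List<T> items, int maxPerCall) {
        List<List<T>> calls = new ArrayList<>();
        for (int i = 0; i < items.size(); i += maxPerCall) {
            // subList is a view; each entry is one API call's worth of items
            calls.add(items.subList(i, Math.min(i + maxPerCall, items.size())));
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> titles = List.of("A", "B", "C", "D", "E");
        System.out.println(chunk(titles, 2)); // [[A, B], [C, D], [E]]
    }
}
```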
Master does not currently parse for all HTML comment types. A patch is waiting in local InterwikiBot right now.
Tables are often used to display tabular data, and are often great resources when looking through a page. Supporting table parsing would be awesome.
Reading and writing files are common operations. If a project extends GenericBot or BotPanel, the extending class cannot define its own file read/write methods due to conflicts with the existing file R/W methods in NetworkingBase; at least one method signature would be forbidden.
This is currently a problem in DwarfBot; hence they cannot use the snapshot.
It might be nice to allow reading and writing files outside of jar.
After pulling into InterwikiBot, I've so far found:
It seems like log-in is failing because I flipped two parameters in a User() object.
I have fixes available for the first two. I'll make sure the third is fixed before issuing a hot fix.
Currently, JMB does not support getting image metadata. I do not think JMB will crash if someone does attempt to get image metadata.
Getting image metadata would be useful for several purposes, including getting the license, artist, copyright, etc.
Check out the do loop of APIcommand. If there is a network error, it becomes infinite. Please fix soon.
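A sketch of the same loop with a retry cap, using hypothetical names (attempt() stands in for the real network request in APIcommand):

```java
// Sketch: bound the retry loop so a persistent network error fails
// loudly instead of looping forever.
public class RetrySketch {
    static final int MAX_ATTEMPTS = 3;

    interface Request { String attempt() throws Exception; }

    static String runWithRetries(Request request) {
        Exception last = null;
        for (int attempts = 0; attempts < MAX_ATTEMPTS; attempts++) {
            try {
                return request.attempt();
            } catch (Exception e) {
                last = e; // log and retry until the cap is hit
            }
        }
        // Give up after MAX_ATTEMPTS instead of spinning forever.
        throw new RuntimeException("request failed after " + MAX_ATTEMPTS + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = runWithRetries(() -> {
            if (++calls[0] < 3) throw new Exception("network error");
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```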
Currently, PageLocation and PageTitle are difficult to use in HashMaps because, while two objects may equal each other, they can have different hashes, breaking the very premise of HashMaps. Hence, it would be nice to provide consistent hashes such that two equal objects also have the same hash.
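A sketch of an equals/hashCode pair for a PageTitle-like class (the class here is illustrative, and first-letter capitalization behavior is wiki-configurable): normalizing the first letter in both methods keeps equal objects hashing alike, which HashMap requires.

```java
import java.util.Objects;

// Sketch: equals/hashCode for a title where, as in default MediaWiki
// config, the first letter is case-insensitive. Both methods use the
// same normalized form, so equal objects always share a hash.
public class TitleKeySketch {
    final String title;

    TitleKeySketch(String title) { this.title = title; }

    private String normalized() {
        return title.isEmpty()
            ? title
            : Character.toUpperCase(title.charAt(0)) + title.substring(1);
    }

    @Override public boolean equals(Object o) {
        return o instanceof TitleKeySketch
            && normalized().equals(((TitleKeySketch) o).normalized());
    }

    @Override public int hashCode() {
        return Objects.hash(normalized());
    }

    public static void main(String[] args) {
        TitleKeySketch a = new TitleKeySketch("main Page");
        TitleKeySketch b = new TitleKeySketch("Main Page");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    }
}
```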
This is a new property in MW 1.28. I'm still not sure what it does though.
Now that I know MW has an API for how it parses a page, should I use it? It gives a lot of helpful information, but does it give the same features as what JMB has? For example, will you still be able to see a page item's position or a link's display text?
Right now, PageTitle checks that incoming data is of type PageLocation. This means all PageLocation/PageTitle comparisons are broken. Oof.
When searching for PageObjects, it would be nice if these objects matched with trimmed whitespace.
Currently, for users to read certain properties from these classes, they have to know the MW API property names. A beginner developer likely won't. As such, I should add methods to these classes that hide the raw API property names behind descriptive getters.