choco31415 / JavaMediawikiBot
A Java bot framework for everything Mediawiki.
License: GNU General Public License v3.0
pagesNeeded has an incorrect equation.
Right now, getWikiPage() requires that the page exist, so getting a page can take two API calls: one to check that it exists, then another to fetch it. It would be nice if getWikiPage() and the other similar methods returned null, or some other indication that the page doesn't exist. doesPageExist() would then be kept only for interface-design convenience.
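A minimal sketch of the single-call idea, assuming a hypothetical Optional-returning lookup (the class and method names here are illustrative, not JMB's actual API; a Map stands in for the MW API query):

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: fetch a page in one call and signal absence with
// Optional instead of a separate doesPageExist() round trip.
public class PageFetchSketch {
    private final Map<String, String> store; // stand-in for the wiki

    public PageFetchSketch(Map<String, String> store) {
        this.store = store;
    }

    // Returns the page text, or Optional.empty() if the page is missing,
    // so callers need only a single lookup.
    public Optional<String> getWikiPage(String title) {
        return Optional.ofNullable(store.get(title));
    }

    public static void main(String[] args) {
        PageFetchSketch bot = new PageFetchSketch(Map.of("Main Page", "Welcome!"));
        System.out.println(bot.getWikiPage("Main Page").orElse("(missing)"));
        System.out.println(bot.getWikiPage("No Such Page").orElse("(missing)"));
    }
}
```

Callers then branch on presence in one step instead of calling doesPageExist() first.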
I forget why I put it in... When parsing for newlines, why are you excluding those inside HTML comments?
Just so that I don't forget...
It appears Mediawiki has a Search API. It seems to replicate the PrefixSearch API. Look into this.
Something weird is going on. JavaMediawikiBot works on the HTTPS-protected en wiki, but not the HTTPS-protected indo wiki. For the indo wiki, I get this error:
"Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target"
This error generally means the JVM cannot chain the server's certificate to a root in its default truststore, commonly because the server omits an intermediate certificate or uses a CA the JVM doesn't ship with.
Here are a few ideas regarding wiki families:
Remember that GenericBot, MediawikiDataManager and FamilyGenerator will all need updating.
Replace this line:
return !serverOutput.has("missing");
With:
return serverOutput.findValue("missing") == null;
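The difference matters because Jackson's has() checks only a node's direct children, while findValue() searches the whole subtree, and the MW API nests page data under query/pages/&lt;id&gt;. A stdlib sketch of that difference, with nested Maps standing in for JsonNodes:

```java
import java.util.Map;

// Sketch of why a direct-child check misses "missing" in nested MW API
// output while a recursive findValue-style search finds it. Nested Maps
// stand in for Jackson JsonNodes; keys mirror query/pages/<id>/missing.
public class FindValueSketch {
    // Recursive lookup: returns the first value stored under `key`
    // anywhere in the tree, or null (Jackson's findValue is similar).
    static Object findValue(Map<?, ?> node, String key) {
        if (node.containsKey(key)) return node.get(key);
        for (Object child : node.values()) {
            if (child instanceof Map) {
                Object found = findValue((Map<?, ?>) child, key);
                if (found != null) return found;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> serverOutput =
            Map.of("query", Map.of("pages", Map.of("-1", Map.of("missing", ""))));
        System.out.println(serverOutput.containsKey("missing"));          // false: not a direct child
        System.out.println(findValue(serverOutput, "missing") != null);   // true: found in the subtree
    }
}
```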
This command handles page names incorrectly. One such page name is: Fr:Scratch Wiki Accueil/l'éditeur
As such, make the command escape HTML.
Maybe it would be nice to make all commands default to escaping HTML.
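A minimal escaping sketch for page names like the one above (the class and method names are illustrative; JMB could run each command's page-name parameter through something like this):

```java
// Minimal HTML escaping sketch for page names such as
// "Fr:Scratch Wiki Accueil/l'éditeur". Escapes the five characters
// with special meaning in HTML; everything else passes through.
public class HtmlEscapeSketch {
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Fr:Scratch Wiki Accueil/l&#39;éditeur
        System.out.println(escapeHtml("Fr:Scratch Wiki Accueil/l'éditeur"));
    }
}
```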
For example, when a connection can't be established, output what it couldn't connect to.
It would be really nice if, when running FamilyGenerator.java, the user could specify a path where the family file should be stored.
When I originally started this project, I did not know that JSON and XML libraries existed, so I hacked together my own methods to read XML and JSON files. That worked well for a long while. Currently, however, I am facing an issue: some parameters that you can get from the user or image MW APIs return XML/JSON objects instead of the usual String or int. My hacked-together methods cannot handle this, at least not without mimicking an XML or JSON library.
As such, I would rather go with something trustworthy, namely proper XML and JSON libraries, than my hacked-together methods.
To add and implement:
I think I will start with JSON because it will be the least used, and hence will be the easiest to implement.
The comments in GenericBot don't always describe what is happening. Plus, a few more comments describing what is happening would be nice.
I really dislike the current system of burying numbers inside the methods themselves. Maybe a map would work. Hm
When I tried to get a non-existent page on the Test wiki, no errors or exceptions were thrown, leading the user to go forward with false assumptions about page existence.
While triple closing brackets ]]] may cause link text to include an extra bracket, this is not the case for interwiki links.
Title says it all. Example: Pages
Some UTF-8 streams start with a BOM character, messing up any attempts to read otherwise-fine output as JSON.
Code: serverOutput.findValue("invalid") == null
In Mediawiki, two page titles are equal if they match exactly except possibly in the capitalization of the first letter. Fix equals() in PageLocation, and maybe PageTitle too.
Right now, getAllPages returns a multiple of APIlimit, which could be higher than what you requested. This is not good design! Rewrite it, and any other methods with a similar philosophy.
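A sketch of one fix, assuming hypothetical names (fetchBatch stands in for one MW API call): cap the final request at the remaining count so the caller gets exactly what was asked for.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: fetch in API-limit-sized batches, but size the last request
// to the remaining count instead of returning a multiple of the limit.
public class BatchTrimSketch {
    static final int API_LIMIT = 500;

    // Stand-in for one API call: returns `limit` page names from `offset`.
    static List<String> fetchBatch(int offset, int limit) {
        List<String> batch = new ArrayList<>();
        for (int i = offset; i < offset + limit; i++) batch.add("Page" + i);
        return batch;
    }

    static List<String> getAllPages(int requested) {
        List<String> pages = new ArrayList<>();
        while (pages.size() < requested) {
            int want = Math.min(API_LIMIT, requested - pages.size());
            pages.addAll(fetchBatch(pages.size(), want));
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(getAllPages(750).size()); // 750, not 1000
    }
}
```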
One of the big problems with bot work as-is is that every time a bot runs, any pages or information it queries must be downloaded again. This can be server-heavy.
One workaround is for a user to write their own txt files.
Is it possible to add a built-in easy way to cache?
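One possible shape for a built-in cache, as a sketch with hypothetical names: memoize each fetch for the life of the run; a persistent variant could serialize the map to disk between runs instead of users writing their own txt files.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a built-in page cache: each title is fetched at most once
// per run; repeat queries are served from memory.
public class PageCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> fetcher; // stand-in for a network call
    int networkCalls = 0;

    PageCacheSketch(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    String getPage(String title) {
        return cache.computeIfAbsent(title, t -> {
            networkCalls++; // only incremented on a cache miss
            return fetcher.apply(t);
        });
    }

    public static void main(String[] args) {
        PageCacheSketch bot = new PageCacheSketch(t -> "text of " + t);
        bot.getPage("Main Page");
        bot.getPage("Main Page"); // served from cache
        System.out.println(bot.networkCalls); // 1
    }
}
```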
Right now, BotPanel and GenericBot have different throttle variables for controlling networking throttling. This isn't needed.
Honestly, it'd be so much easier to maintain and update everything
This project was originally written when I was only somewhat familiar with Java, i.e., I only knew about ArrayList. It would be nice if, in the future, this project used Set, Queue, Map, etc. more often, where beneficial of course.
Currently, I use home-brew methods to parse XML output. It would be nice to move to a professional parser. Considering that the Jackson JSON parser is so nice, please move all XML querying to JSON querying and use Jackson for parsing.
It would be super useful if other API commands would be supported by JMB. These include, but are not limited to:
Read:
Write:
Sometimes when making server requests, you have to get a lot of data. In some cases, JMB breaks this up into multiple adjustable API calls. At other times, however, JMB simply makes one big API call. It would be nice if, in all situations, JMB were aware of what it's doing and broke huge API calls into multiple smaller ones. Take this for example:
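The splitting itself is simple; a generic sketch (names are illustrative) that every multi-item query path could share:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of splitting one oversized request into API-sized chunks,
// so a huge query becomes several adjustable smaller calls.
public class ChunkSketch {
    static <T> List<List<T>> chunk(List<T> items, int maxPerCall) {
        List<List<T>> calls = new ArrayList<>();
        for (int i = 0; i < items.size(); i += maxPerCall) {
            // subList is a view; each entry is one API call's worth of items
            calls.add(items.subList(i, Math.min(i + maxPerCall, items.size())));
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> titles = List.of("A", "B", "C", "D", "E");
        System.out.println(chunk(titles, 2)); // [[A, B], [C, D], [E]]
    }
}
```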
Master does not currently parse for all HTML comment types. A patch is waiting in local InterwikiBot right now.
Tables are often used to display tabular data, and are often great resources when looking through a page. Supporting table parsing would be awesome.
Reading and writing files are common operations. If a project extends GenericBot or BotPanel, the extending class cannot define its own file read/write methods due to conflicts with the existing file R/W methods in NetworkingBase; at least one method signature would be forbidden.
This is currently a problem in DwarfBot; hence they cannot use the snapshot.
It might be nice to allow reading and writing files outside of jar.
After pulling into InterwikiBot, I've so far found:
It seems like log-in is failing because I flipped two parameters in a User() object.
I have fixes available for the first two. I'll make sure the third is fixed before issuing a hot fix.
Currently, JMB does not support getting image metadata. I do not think JMB will crash if someone does attempt to get image metadata.
Getting image metadata would be useful for several purposes, including getting the license, artist, copyright, etc.
Check out the do loop of APIcommand. If there is a network error, it becomes infinite. Please fix soon.
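A sketch of the same loop with a retry cap, using hypothetical names (attempt() stands in for the real network request in APIcommand):

```java
// Sketch: bound the retry loop so a persistent network error fails
// loudly instead of looping forever.
public class RetrySketch {
    static final int MAX_ATTEMPTS = 3;

    interface Request { String attempt() throws Exception; }

    static String runWithRetries(Request request) {
        Exception last = null;
        for (int attempts = 0; attempts < MAX_ATTEMPTS; attempts++) {
            try {
                return request.attempt();
            } catch (Exception e) {
                last = e; // log and retry until the cap is hit
            }
        }
        // Give up after MAX_ATTEMPTS instead of spinning forever.
        throw new RuntimeException("request failed after " + MAX_ATTEMPTS + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = runWithRetries(() -> {
            if (++calls[0] < 3) throw new Exception("network error");
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```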
Currently, PageLocation and PageTitle are difficult to use in HashMaps because, while two objects may equal each other, they can have different hashes, breaking the very premise of HashMaps. Hence, it would be nice to provide consistent hashes such that two equal objects also have the same hash.
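A sketch of an equals/hashCode pair for a PageTitle-like class (the class here is illustrative, and first-letter capitalization behavior is wiki-configurable): normalizing the first letter in both methods keeps equal objects hashing alike, which HashMap requires.

```java
import java.util.Objects;

// Sketch: equals/hashCode for a title where, as in default MediaWiki
// config, the first letter is case-insensitive. Both methods use the
// same normalized form, so equal objects always share a hash.
public class TitleKeySketch {
    final String title;

    TitleKeySketch(String title) { this.title = title; }

    private String normalized() {
        return title.isEmpty()
            ? title
            : Character.toUpperCase(title.charAt(0)) + title.substring(1);
    }

    @Override public boolean equals(Object o) {
        return o instanceof TitleKeySketch
            && normalized().equals(((TitleKeySketch) o).normalized());
    }

    @Override public int hashCode() {
        return Objects.hash(normalized());
    }

    public static void main(String[] args) {
        TitleKeySketch a = new TitleKeySketch("main Page");
        TitleKeySketch b = new TitleKeySketch("Main Page");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    }
}
```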
This is a new property in MW 1.28. I'm still not sure what it does though.
Now that I know MW has an API for how it parses a page, should I use it? It gives a lot of helpful information, but does it give the same features as what JMB has? For example, will you still be able to see a page item's position or a link's display text?
Right now, PageTitle checks that incoming data is of type PageLocation. This means all PageLocation/PageTitle comparisons are broken. Oof.
When searching for PageObjects, it would be nice if these objects matched with trimmed whitespace.
Currently, for users to read certain properties from these classes, they have to know the MW API property names. A beginner developer likely won't. As such, I should add methods to these classes that hide the raw API property names behind descriptive getters.