javamediawikibot's People

Contributors

choco31415, enzanki-ars

javamediawikibot's Issues

Soften getWikiPage()

Right now, getWikiPage() requires that the page exist. Hence, getting a page can take two API calls: one to check that it exists, then another to fetch it. It would be nice if getWikiPage() and the other similar methods returned null, or some other indication, when the page doesn't exist. doesPageExist() would then be kept only as an interface convenience.
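
A minimal sketch of the softened lookup, assuming a hypothetical SoftPageLookup class in which an in-memory Map stands in for the wiki. None of these names come from JavaMediawikiBot, and Optional is just one possible "other indication" next to returning null.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for the bot; a Map plays the role of the wiki. */
final class SoftPageLookup {
    private final Map<String, String> wiki;

    SoftPageLookup(Map<String, String> wiki) {
        this.wiki = wiki;
    }

    /** Returns the page text, or an empty Optional when the page does not exist. */
    Optional<String> getWikiPage(String title) {
        // A single lookup answers both "does it exist?" and "what does it contain?".
        return Optional.ofNullable(wiki.get(title));
    }

    /** Kept only as a convenience wrapper around getWikiPage(). */
    boolean doesPageExist(String title) {
        return getWikiPage(title).isPresent();
    }

    public static void main(String[] args) {
        SoftPageLookup bot = new SoftPageLookup(Map.of("Main Page", "Welcome!"));
        System.out.println(bot.getWikiPage("Main Page").orElse("(missing)"));
        System.out.println(bot.getWikiPage("No Such Page").orElse("(missing)"));
    }
}
```

With this shape, a caller that wants the page makes one call and handles the empty case, instead of calling doesPageExist() first.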

New Line Parsing

I forget why I put it in... When parsing for new lines, why are you excluding those in HTML comments?

Search API

Just so that I don't forget...

It appears Mediawiki has a Search API. It seems to replicate the PrefixSearch API. Look into this.

HTTPS Somewhat Supported

Something weird is going on. JavaMediawikiBot works on the HTTPS-protected en wiki, but not on the HTTPS-protected indo wiki. For the indo wiki, I get this error:

"Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target"

Use Path to Designate Wiki Family File

Here are a few ideas regarding wiki families:

  1. Instead of assuming all wiki families can be found in /src/main/resource/Families, let the user specify a Path object. This would make JavaMediawikiBot jar-friendly.

Remember that GenericBot, MediawikiDataManager and FamilyGenerator will all need updating.

  2. Update family files to use JSON instead of my hacked txt encoding.
  • Path class for families
  • JSON encoding

doesPageExist broken

Replace this line:

return !serverOutput.has("missing");

With:

return serverOutput.findValue("missing") == null;
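
The difference between the two lines is where they look: in Jackson, has() only checks the fields directly on a node, while findValue() also searches its descendants. A dependency-free sketch of that idea follows, assuming nested Maps stand in for the JSON tree and that the "missing" flag sits under query/pages/<id>. MissingFlagSearch and its methods are hypothetical names, not project code.

```java
import java.util.Map;

/** Hypothetical illustration: nested Maps stand in for a parsed JSON response. */
final class MissingFlagSearch {

    /** Depth-first search for a key anywhere in a tree of nested Maps. */
    static boolean containsKeyDeep(Map<?, ?> node, String key) {
        if (node.containsKey(key)) {
            return true;
        }
        for (Object child : node.values()) {
            if (child instanceof Map && containsKeyDeep((Map<?, ?>) child, key)) {
                return true;
            }
        }
        return false;
    }

    /** The page exists when no "missing" flag appears anywhere in the response. */
    static boolean doesPageExist(Map<?, ?> serverOutput) {
        return !containsKeyDeep(serverOutput, "missing");
    }

    public static void main(String[] args) {
        Map<?, ?> response = Map.of("query",
                Map.of("pages", Map.of("-1", Map.of("title", "X", "missing", ""))));
        // A top-level check does not see the nested flag; the deep search does.
        System.out.println(response.containsKey("missing"));
        System.out.println(doesPageExist(response));
    }
}
```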

QueryPrefix

This command handles page names incorrectly. One such page name is: Fr:Scratch Wiki Accueil/l'éditeur

As such, make the command escape HTML.

Maybe it would be nice to make all commands default to escaping HTML.
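
A small sketch of one way to do this, assuming "escaping" here means percent-encoding the page name before it is placed in the request URL. TitleEncoder is a hypothetical helper, and the Charset overload of URLEncoder.encode needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that makes a page name safe to use as a URL query value. */
final class TitleEncoder {

    static String encodeTitle(String title) {
        // URLEncoder applies form encoding: spaces become '+', and reserved or
        // non-ASCII characters become %XX sequences of their UTF-8 bytes.
        return URLEncoder.encode(title, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e9 is the accented e from the page name in this issue.
        System.out.println(encodeTitle("Fr:Scratch Wiki Accueil/l'\u00e9diteur"));
        // prints Fr%3AScratch+Wiki+Accueil%2Fl%27%C3%A9diteur
    }
}
```

Applying this in one shared place where request parameters are assembled would give every command the same default.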

Improve Debug Output

For example, when a connection can't be established, output what it couldn't connect to.

Update XML and JSON Parsing

When I originally started this project, I did not know JSON or XML libraries existed, so I hacked together my own methods to read XML and JSON files. They worked well for a long while. However, I am now facing an issue. Some parameters that you can get from the users or image MW APIs return XML/JSON objects instead of the normal String or int. My hacked-together methods cannot handle this, at least not without mimicking an XML or JSON library.

As such, I would rather go with something trustworthy, proper XML and JSON libraries, than my hacked-together methods.

To add and implement:

  • XML library
  • JSON library

I think I will start with JSON because it will be the least used, and hence will be the easiest to implement.

Improve GenericBot comments

The comments in GenericBot don't always describe what is happening. Plus, a few more comments to describe what is happening would be nice.

Parsing Interwiki Links

While triple closing brackets ]]] may cause link text to include an extra bracket, this is not the case for interwiki links.

Ignore BOM

Some UTF-8 streams start with a BOM character, messing up any attempts to read otherwise-fine output as JSON.
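
A minimal sketch of the fix, assuming the response has already been decoded into a String. BomStripper is a hypothetical helper name. Once decoded, a UTF-8 BOM shows up as the single character U+FEFF at index 0.

```java
/** Hypothetical helper that removes a leading byte order mark before JSON parsing. */
final class BomStripper {
    // The BOM decodes to this one character, whatever the original byte encoding was.
    private static final char BOM = '\uFEFF';

    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        String raw = BOM + "{\"query\":{}}";
        System.out.println(stripBom(raw));
    }
}
```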

PageLocation Equality

In Mediawiki, two page titles are equal if they match exactly, except that the case of the first letter is ignored. Fix equals() in PageLocation and maybe PageTitle too.

Rewrite getAllPages Et Al

Right now, getAllPages returns a multiple of APIlimit, which could be higher than what you requested. This is not good design! Rewrite it, and any other methods that have a similar philosophy.
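
One possible shape for the rewrite, sketched under assumptions: LimitedPager and the fetchBatch callback are hypothetical, and an integer offset stands in for the continuation value that the real API hands back. The point is only that each call asks for no more than what is still needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical pager that never returns more items than were requested. */
final class LimitedPager {

    /**
     * fetchBatch.apply(offset, size) stands in for one API call that returns
     * up to 'size' items starting at 'offset'.
     */
    static <T> List<T> fetchUpTo(int requested, int apiLimit,
                                 BiFunction<Integer, Integer, List<T>> fetchBatch) {
        List<T> result = new ArrayList<>();
        while (result.size() < requested) {
            // Ask only for what is still missing, capped at the per-call limit.
            int want = Math.min(apiLimit, requested - result.size());
            List<T> batch = fetchBatch.apply(result.size(), want);
            result.addAll(batch);
            if (batch.size() < want) {
                break; // the server has no more items
            }
        }
        // Guard against a callback that returns more than it was asked for.
        return result.size() > requested
                ? new ArrayList<>(result.subList(0, requested))
                : result;
    }

    public static void main(String[] args) {
        List<Integer> pages = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            pages.add(i);
        }
        List<Integer> seven = fetchUpTo(7, 5,
                (offset, size) -> pages.subList(offset, Math.min(pages.size(), offset + size)));
        System.out.println(seven.size());
    }
}
```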

Support Caching

One of the big problems with bot work as it stands is that every time a bot runs, any pages or information it queries must be downloaded again. This can be heavy on the server.

One workaround is for a user to write their own txt files.

Is it possible to add a built-in easy way to cache?
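
A rough sketch of what a built-in cache could look like, assuming one text file per page inside a user-chosen directory. DiskPageCache and the downloader callback are hypothetical names. There is no expiry, so stale pages would need separate handling, and Files.readString/writeString require Java 11 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

/** Hypothetical cache that keeps downloaded page text on disk between runs. */
final class DiskPageCache {
    private final Path cacheDir;
    private final Function<String, String> downloader;

    DiskPageCache(Path cacheDir, Function<String, String> downloader) {
        this.cacheDir = cacheDir;
        this.downloader = downloader;
    }

    String get(String title) {
        // Percent-encode the title so characters such as '/' and ':' do not
        // end up in the file name.
        Path file = cacheDir.resolve(URLEncoder.encode(title, StandardCharsets.UTF_8) + ".txt");
        try {
            if (Files.exists(file)) {
                return Files.readString(file); // cache hit: no download
            }
            String text = downloader.apply(title);
            Files.createDirectories(cacheDir);
            Files.writeString(file, text);
            return text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Path.of(System.getProperty("java.io.tmpdir"), "jmb-cache-demo");
        DiskPageCache cache = new DiskPageCache(dir, title -> "text of " + title);
        System.out.println(cache.get("Main Page"));
    }
}
```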

Upgrade Data Types

This project was originally written when I was only somewhat familiar with Java. In other words, I only knew about ArrayList. It would be nice if in the future this project could use Set, Queue, Map, etc. more often, where beneficial of course.

Shift Querying XML to Querying JSON, and use Jackson

Currently, I use home-brew methods to parse XML output. It would be nice to move to a professional parser. Considering that the Jackson JSON parser is so nice, please move all XML querying to JSON querying and use Jackson for parsing.

Add Other MW API Commands

It would be super useful if other API commands would be supported by JMB. These include, but are not limited to:

Read:

  • Compare pages
  • Get user information
  • Get user contributions
  • Search
  • Site info (aka. metadata)

Write:

  • Upload local files
  • Restore deleted revisions
  • Protect/unprotect pages
  • Block/unblock users
  • Send email
  • Patrol changes
  • Change user group membership
  • Create accounts
  • Watch pages

URI Too Long

Sometimes a server request has to fetch a lot of data. In some cases, JMB breaks this up into multiple API calls of adjustable size. At other times, however, JMB simply tries to make one big API call. It would be nice if JMB were always aware of what it's doing and broke huge API calls into multiple smaller calls. Take this for example:

[Screenshot from 2016-10-07, 3:49 pm; image not included]
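
A sketch of the splitting step, assuming the oversized request is a long list of page titles joined with "|", which is how the MW API separates multiple values. TitleBatcher and maxPerCall are hypothetical names. The right batch size depends on the wiki's limits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a long title list into API-sized batches. */
final class TitleBatcher {

    /** Returns one "A|B|C" parameter value per batch of at most maxPerCall titles. */
    static List<String> batch(List<String> titles, int maxPerCall) {
        if (maxPerCall < 1) {
            throw new IllegalArgumentException("maxPerCall must be at least 1");
        }
        List<String> batches = new ArrayList<>();
        for (int i = 0; i < titles.size(); i += maxPerCall) {
            int end = Math.min(titles.size(), i + maxPerCall);
            batches.add(String.join("|", titles.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // prints [A|B, C|D, E]
        System.out.println(batch(List.of("A", "B", "C", "D", "E"), 2));
    }
}
```

Each returned string would then go into its own API call, which keeps every request URL bounded.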

HTML Comment Parsing

Master does not currently parse for all HTML comment types. A patch is waiting in local InterwikiBot right now.

Page Parsing for Tables

Tables are often used to display tabular data, and are often great resources when looking through a page. Supporting table parsing would be awesome.
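
To give a feel for the scope, here is a deliberately small sketch that reads the simplest wikitext table form: one line per row of cells, "||" or "!!" between cells, and "|-" between rows. SimpleTableParser is a hypothetical name. Cell attributes, multi-line cells and nested tables are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical parser for the simplest single-line-row wikitext tables. */
final class SimpleTableParser {

    static List<List<String>> parse(String wikitext) {
        List<List<String>> rows = new ArrayList<>();
        List<String> current = null;
        for (String raw : wikitext.split("\n")) {
            String line = raw.trim();
            if (line.startsWith("{|") || line.startsWith("|}") || line.startsWith("|+")) {
                continue; // table start, table end, caption
            }
            if (line.startsWith("|-")) {
                current = null; // the next cell line opens a fresh row
                continue;
            }
            String separator;
            if (line.startsWith("!")) {
                separator = "!!"; // header cells
            } else if (line.startsWith("|")) {
                separator = "||"; // data cells
            } else {
                continue; // not a cell line
            }
            if (current == null) {
                current = new ArrayList<>();
                rows.add(current);
            }
            for (String cell : line.substring(1).split(Pattern.quote(separator))) {
                current.add(cell.trim());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String table = "{|\n! Name !! Age\n|-\n| Alice || 30\n|-\n| Bob || 25\n|}";
        // prints [[Name, Age], [Alice, 30], [Bob, 25]]
        System.out.println(parse(table));
    }
}
```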

Move File R/W Methods

Reading and writing files are common operations. If a project were to extend GenericBot or BotPanel, the extending class would not be allowed to define its own file R/W methods due to conflicts with the existing file R/W methods in NetworkingBase. At least one method signature would be forbidden.

This is currently a problem in DwarfBot. Hence they cannot use snapshot.

It might be nice to allow reading and writing files outside of the jar.

Lots of Weird Bugs with GUI

After pulling into InterwikiBot, I've so far found:

  • Log-in is failing.
  • Caret positions aren't being set properly in the console.
  • Messages are being printed erratically to the console.

It seems like log-in is failing because I flipped two parameters in a User() object.

I have fixes available for the first two. I'll make sure the third is fixed before issuing a hot fix.

Support Getting Image Metadata

Currently, JMB does not support getting image metadata. I do not think JMB will crash if someone does attempt to get image metadata.

Getting image metadata would be useful for several purposes, including getting the license, artist, copyright, etc.

Override HashCode in PageLocation and PageTitle

Currently, PageLocation and PageTitle are difficult to use in HashMaps because two objects may equal each other yet have different hashes, which breaks the very premise of HashMaps. Hence, it would be nice to provide consistent hashes, such that two equal objects always have the same hash.
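
A sketch of a consistent equals()/hashCode() pair, using a hypothetical WikiPageKey class with assumed language and title fields rather than the real PageLocation. It normalizes the first letter of the title once, so equality and hashing both work from the same canonical fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical key class; the fields are assumptions, not PageLocation's actual ones. */
final class WikiPageKey {
    private final String language;
    private final String title;

    WikiPageKey(String language, String title) {
        this.language = Objects.requireNonNull(language);
        this.title = canonical(Objects.requireNonNull(title));
    }

    /** Upper-cases the first character so "cat" and "Cat" share one canonical form. */
    private static String canonical(String title) {
        if (title.isEmpty()) {
            return title;
        }
        return Character.toUpperCase(title.charAt(0)) + title.substring(1);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof WikiPageKey)) {
            return false; // compare against this class, not a sibling type
        }
        WikiPageKey that = (WikiPageKey) other;
        return language.equals(that.language) && title.equals(that.title);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields equals() compares.
        return Objects.hash(language, title);
    }

    public static void main(String[] args) {
        Map<WikiPageKey, String> notes = new HashMap<>();
        notes.put(new WikiPageKey("en", "scratch cat"), "found");
        System.out.println(notes.get(new WikiPageKey("en", "Scratch cat")));
    }
}
```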

Use MW API Page Parsing?

Now that I know MW has an API for how it parses a page, should I use it? It gives a lot of helpful information, but does it offer the same features as JMB? For example, will you still be able to see a page item's position or a link's display text?

PageTitle Incorrect Equal/Hash

Right now, PageTitle checks that incoming data is of type PageLocation. This means all PageLocation/PageTitle comparisons are broken. Oof.

ImageInfo/UserInfo User Interface

Currently, for users to read certain properties from these classes, they have to know the MW API property names. I doubt a beginner developer will know these. As such, I should add methods to these classes that hide the API property names.

  • UserInfo
  • ImageInfo
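
A sketch of what such accessor methods might look like, assuming the class keeps the raw API properties in a Map. UserInfoView is a hypothetical name, and the keys "editcount" and "registration" are assumed to match what the API returns for users.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical wrapper that hides raw API property names behind named getters. */
final class UserInfoView {
    private final Map<String, String> properties;

    UserInfoView(Map<String, String> properties) {
        this.properties = Map.copyOf(properties);
    }

    /** Number of edits, if the property was requested and returned. */
    Optional<Integer> getEditCount() {
        return Optional.ofNullable(properties.get("editcount")).map(Integer::valueOf);
    }

    /** Registration timestamp as returned by the server, if present. */
    Optional<String> getRegistrationDate() {
        return Optional.ofNullable(properties.get("registration"));
    }

    public static void main(String[] args) {
        UserInfoView user = new UserInfoView(Map.of("editcount", "42"));
        System.out.println(user.getEditCount().orElse(0));
        System.out.println(user.getRegistrationDate().orElse("(not requested)"));
    }
}
```

Callers then work with getEditCount() and never need to know the underlying key.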
