evllabs / jgaap
The Java Graphical Authorship Attribution Program
Home Page: http://www.evllabs.com
Replace use of _34LetterWordEventDriver with MNLetterWordEventDriver set parameter M to 3 and N to 4
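A minimal sketch of the filtering an M-to-N letter-word driver performs, assuming (as the driver names suggest) that it keeps only words whose length lies between M and N inclusive. `MNLetterWordSketch` and `createEvents` are hypothetical names, not JGAAP's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MNLetterWordEventDriver (not JGAAP's real class).
// Assumption: an event is any word whose length is between m and n inclusive.
class MNLetterWordSketch {
    static List<String> createEvents(List<String> words, int m, int n) {
        List<String> events = new ArrayList<>();
        for (String word : words) {
            int length = word.length();
            if (length >= m && length <= n) {
                events.add(word);
            }
        }
        return events;
    }
}
```

With M = 3 and N = 4 this reproduces what the name `_34LetterWordEventDriver` implies, which is why one parameterized driver can replace the fixed-length variants.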
Replace use of HapaxLegomenaEventDriver with RareWordsEventDriver with Parameter M set to 1 and Parameter N set to 1
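The parameter settings in these issues (M = N = 1 for hapax legomena, M = N = 2 for dis legomena, M = 1 and N = 2 for HD legomena) suggest that a rare-words driver keeps words whose frequency in the document falls between M and N. A sketch under that assumption; the class and method names are hypothetical, and whether the real driver emits every occurrence or each word only once is not stated, so this version keeps every occurrence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for RareWordsEventDriver (not JGAAP's real class).
// Assumption: a word is "rare" if it occurs between m and n times (inclusive) in the document.
class RareWordsSketch {
    static List<String> createEvents(List<String> words, int m, int n) {
        // First pass: count how often each word occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        // Second pass: keep every occurrence of a word whose count is in [m, n], in document order.
        List<String> events = new ArrayList<>();
        for (String word : words) {
            int count = counts.get(word);
            if (count >= m && count <= n) {
                events.add(word);
            }
        }
        return events;
    }
}
```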
Update this event driver to use the sentence event driver to break up sentences
Write the new Java FX pane for cullers
NullPointerException when saving CSVs in the GUI
Replace use of WordBiGramEventDriver with WordNGramEventDriver set parameter N to 2
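A sketch of word n-gram extraction, assuming an n-gram event is N consecutive words joined by a single space; the real driver's separator and class names may differ, and `WordNGramSketch` is hypothetical. With N = 2, 3, and 4 it covers the bigram, trigram, and tetragram drivers being retired.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WordNGramEventDriver (not JGAAP's real class).
class WordNGramSketch {
    static List<String> createEvents(List<String> words, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> events = new ArrayList<>();
        // Slide a window of n words across the document; a document shorter than n yields no events.
        for (int i = 0; i + n <= words.size(); i++) {
            events.add(String.join(" ", words.subList(i, i + n)));
        }
        return events;
    }
}
```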
Write the new canonicizer GUI
Replace use of POSBiGramEventDriver with POSNGramEventDriver set parameter N to 2
Replace use of HDLegomenaEventDriver with RareWordsEventDriver with Parameter M set to 1 and Parameter N set to 2
There is a list of abbreviations located at /com/jgaap/resources/abbreviation.list
More abbreviations need to be added
Write the new Java FX pane for selecting Analysis Drivers and Distance Functions
Write the new pane that checks that a user has selected everything they need to run an experiment
This pane also lets people run their experiment
Update this event driver to use the new Sentence EventDriver
Replace use of V34LetterWordEventDriver with VowelMNLetterWordEventDriver set parameter M to 3 and N to 4
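Judging only by the names, the vowel variants (V23, V24, V34) appear to differ from the plain M-to-N letter-word driver by also requiring the word to begin with a vowel. A sketch under that assumption; the vowel set (here a, e, i, o, u, without "y") and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for VowelMNLetterWordEventDriver (not JGAAP's real class).
// Assumption: an event is a word of length m..n (inclusive) that starts with a, e, i, o, or u.
class VowelMNLetterWordSketch {
    private static final String VOWELS = "aeiouAEIOU";

    static List<String> createEvents(List<String> words, int m, int n) {
        List<String> events = new ArrayList<>();
        for (String word : words) {
            int length = word.length();
            boolean startsWithVowel = length > 0 && VOWELS.indexOf(word.charAt(0)) >= 0;
            if (startsWithVowel && length >= m && length <= n) {
                events.add(word);
            }
        }
        return events;
    }
}
```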
The probabilities generated by the WEKA Naive Bayes classifier only match my calculated probabilities to a 99.5% confidence. I'm pretty sure this is because my calculations differ slightly from how WEKA calculates its probabilities. See my comments in WEKANaiveBayesTest.java for how I calculated the probabilities.
Write the new Document Pane for the GUI
Implement Burrows's Delta Analysis Method
N.B. This will be a "Distance"
Add a centroid to Burrows's Delta
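For reference, Burrows's Delta is usually defined as the mean absolute difference between two documents' z-scored feature frequencies, where each feature's mean and standard deviation are taken over the comparison corpus; an author centroid can be taken as the mean z-score vector of that author's known documents. The sketch below follows that common definition with hypothetical names; it is not JGAAP's distance API, and it uses the sample standard deviation (some implementations use the population one).

```java
// Hypothetical sketch of Burrows's Delta with an author centroid (not JGAAP's real API).
class BurrowsDeltaSketch {
    // freqs[d][i] = relative frequency of feature i in document d of the comparison corpus.
    // Returns z[d][i] = (freqs[d][i] - mean_i) / sd_i, using the sample standard deviation.
    static double[][] zScores(double[][] freqs) {
        int docs = freqs.length;
        if (docs < 2) {
            throw new IllegalArgumentException("need at least two documents");
        }
        int features = freqs[0].length;
        double[][] z = new double[docs][features];
        for (int i = 0; i < features; i++) {
            double mean = 0;
            for (double[] doc : freqs) {
                mean += doc[i];
            }
            mean /= docs;
            double sumSq = 0;
            for (double[] doc : freqs) {
                sumSq += (doc[i] - mean) * (doc[i] - mean);
            }
            double sd = Math.sqrt(sumSq / (docs - 1));
            for (int d = 0; d < docs; d++) {
                // A feature with no variation carries no information; give it a z-score of 0.
                z[d][i] = sd == 0 ? 0 : (freqs[d][i] - mean) / sd;
            }
        }
        return z;
    }

    // Delta = mean absolute difference of z-scores.
    static double delta(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum / a.length;
    }

    // Centroid = component-wise mean of one author's z-score vectors.
    static double[] centroid(double[][] authorDocs) {
        double[] c = new double[authorDocs[0].length];
        for (double[] doc : authorDocs) {
            for (int i = 0; i < c.length; i++) {
                c[i] += doc[i] / authorDocs.length;
            }
        }
        return c;
    }
}
```

Under this sketch, an unknown document would be z-scored together with the known documents and attributed to the author whose centroid gives the smallest Delta.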
Replace use of WordTetraGramEventDriver with WordNGramEventDriver set parameter N to 4
Replace use of _23LetterWordEventDriver with MNLetterWordEventDriver set parameter M to 2 and N to 3
Write the new Java FX event pane
WEKA currently uses "getNormalizedFrequency" for feature generation. Is this correct?
Replace use of WordTriGramEventDriver with WordNGramEventDriver set parameter N to 3
Add better indication of ties.
Suggested implementation:
*1. Author 0.0
*1. Author 0.0
3. Author 0.5
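The suggested format matches standard competition ranking: tied entries share the best rank, the next entry skips the positions the tie used up, and tied lines are starred. A sketch of that formatting, assuming results arrive sorted by ascending score (lower is better) and that a tie means exactly equal scores; `TieRankingSketch` and `Result` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of tie-aware result display (not JGAAP's real classes).
class TieRankingSketch {
    static final class Result {
        final String author;
        final double score;

        Result(String author, double score) {
            this.author = author;
            this.score = score;
        }
    }

    // Assumes results are sorted by ascending score; ties are exact score equality.
    static List<String> format(List<Result> results) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < results.size(); i++) {
            double score = results.get(i).score;
            int rank = 1;          // 1 + number of strictly better results
            boolean tied = false;  // true if another result has the same score
            for (int j = 0; j < results.size(); j++) {
                double other = results.get(j).score;
                if (other < score) {
                    rank++;
                } else if (j != i && other == score) {
                    tied = true;
                }
            }
            lines.add((tied ? "*" : "") + rank + ". " + results.get(i).author + " " + score);
        }
        return lines;
    }
}
```

If scores come from floating-point arithmetic, comparing within a small tolerance instead of with `==` may be preferable.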
The WEKA Classifier Adaptor class was written in a way that encourages misuse: the specific classifier is chosen in the constructor, so there is no way to enforce that an actual classifier is supplied.
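One common way to enforce that a classifier is supplied is to make the choice of classifier an abstract method, so every concrete adaptor must provide one at compile time and a null can be rejected at run time. A self-contained sketch of that pattern; `SketchClassifier`, `ClassifierAdaptorSketch`, and `NaiveBayesAdaptorSketch` are hypothetical stand-ins, not WEKA or JGAAP types.

```java
import java.util.Objects;

// Hypothetical stand-in for a WEKA classifier; a real adaptor would wrap a WEKA classifier object.
interface SketchClassifier {
    String name();
}

// Base adaptor: a subclass does not compile unless it says which classifier it wraps.
abstract class ClassifierAdaptorSketch {
    protected abstract SketchClassifier createClassifier();

    // Called when the analysis runs (not from the constructor), and rejects a null classifier.
    final String describe() {
        SketchClassifier classifier =
                Objects.requireNonNull(createClassifier(), "createClassifier() returned null");
        return "using " + classifier.name();
    }
}

class NaiveBayesAdaptorSketch extends ClassifierAdaptorSketch {
    @Override
    protected SketchClassifier createClassifier() {
        return () -> "NaiveBayes";
    }
}
```

Asking for the classifier lazily, rather than inside the base-class constructor, avoids calling an overridable method before the subclass is fully constructed.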
WEKA analysis method always results in ties.
Replace use of _24LetterWordEventDriver with MNLetterWordEventDriver set parameter M to 2 and N to 4
Create a k-fold and leave-one-out cross validation system for JGAAP.
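A sketch of how documents could be assigned to folds, assuming documents are identified by index: shuffle once with a fixed seed for reproducibility, deal indices round-robin into k folds, and treat leave-one-out as the special case k = n. Each fold serves once as the test set while the remaining folds form the training set. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of k-fold / leave-one-out splitting (not JGAAP's real API).
class CrossValidationSketch {
    // Returns k folds; fold f holds the indices of the documents tested in round f.
    static List<List<Integer>> kFold(int numDocs, int k, long seed) {
        if (k < 2 || k > numDocs) {
            throw new IllegalArgumentException("need 2 <= k <= numDocs");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numDocs; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Deal shuffled indices round-robin so fold sizes differ by at most one.
        for (int i = 0; i < indices.size(); i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    // Leave-one-out is k-fold with one document per fold.
    static List<List<Integer>> leaveOneOut(int numDocs) {
        return kFold(numDocs, numDocs, 0L);
    }

    // Training set for a round = every document not in the test fold.
    static List<Integer> trainingIndices(List<List<Integer>> folds, int testFold) {
        List<Integer> training = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) {
                training.addAll(folds.get(f));
            }
        }
        return training;
    }
}
```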
Replace use of DisLegomenaEventDriver with RareWordsEventDriver with Parameter M set to 2 and Parameter N set to 2
(2*(f(x)-g(x))/(f(x)+g(x)))^2
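The expression above reads as a per-event term comparing two frequency functions f and g. If it is meant to be summed over every event x that appears in either document, the resulting distance could be sketched as follows; the summation, the treatment of an event missing from one document as frequency 0, and the names are all assumptions.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: sums (2(f(x) - g(x)) / (f(x) + g(x)))^2 over all events x.
// Assumptions: the term is summed over the union of both documents' events,
// and an event missing from one document has frequency 0 there.
class SquaredRelativeDifferenceSketch {
    static double distance(Map<String, Double> f, Map<String, Double> g) {
        Set<String> events = new HashSet<>(f.keySet());
        events.addAll(g.keySet());
        double sum = 0.0;
        for (String x : events) {
            double fx = f.getOrDefault(x, 0.0);
            double gx = g.getOrDefault(x, 0.0);
            double denominator = fx + gx;
            if (denominator == 0.0) {
                continue; // both frequencies are zero: the term is undefined, so skip it
            }
            double term = 2.0 * (fx - gx) / denominator;
            sum += term * term;
        }
        return sum;
    }
}
```

With non-negative frequencies, an event present in only one document contributes the maximum term value of 4.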
Add log4j to JGAAP and move all existing debug/error messages to the log4j framework.
It has come to light that the current PoS tagger in JGAAP was designed to work on sentences, not full documents.
Modify the PoS EventDriver to pass words into the PoS tagger.
Move the aaac documents into the /com/jgaap/resources package
This will allow JGAAP to once again be distributed as a single standalone jar.
Replace use of CharacterTetraGramEventDriver with CharacterNGramEventDriver set parameter N to 4
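Character n-grams are the character-level analogue of word n-grams: every run of N consecutive characters becomes an event. A sketch, assuming the driver works on the (canonicized) text as one string and counts Java `char`s (UTF-16 units) rather than Unicode code points; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CharacterNGramEventDriver (not JGAAP's real class).
class CharacterNGramSketch {
    static List<String> createEvents(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> events = new ArrayList<>();
        // Each window of n consecutive chars is one event; text shorter than n yields none.
        for (int i = 0; i + n <= text.length(); i++) {
            events.add(text.substring(i, i + n));
        }
        return events;
    }
}
```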
Rewrite the CLI so that it uses the Apache CLI Library
This will make the help output easier to use and future updates easier to implement.
Replace use of V24LetterWordEventDriver with VowelMNLetterWordEventDriver set parameter M to 2 and N to 4
Replace use of V23LetterWordEventDriver with VowelMNLetterWordEventDriver set parameter M to 2 and N to 3
The JGAAP display should account for ties, as follows:
Replace use of CharacterBiGramEventDriver with CharacterNGramEventDriver set parameter N to 2
Replace use of CharacterTriGramEventDriver with CharacterNGramEventDriver set parameter N to 3
There are a number of flat files floating around JGAAP, many of which have not been touched in years.
These should be reviewed to decide whether each one needs to be kept, updated, or discarded.
EventHistogram could probably benefit from a constructor that takes an EventSet and generates the histogram. It seems like I am doing this almost every time I write a new analysis method.
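A sketch of the proposed convenience constructor, using hypothetical minimal stand-ins for EventSet and EventHistogram (the real classes have more methods, and their exact signatures are assumed here). `getNormalizedFrequency` reuses the method name mentioned in the WEKA issue; treating it as an event's count divided by the total number of events is an assumption.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical minimal EventSet: an ordered, immutable list of event strings.
class EventSetSketch implements Iterable<String> {
    private final List<String> events;

    EventSetSketch(List<String> events) {
        this.events = List.copyOf(events);
    }

    @Override
    public Iterator<String> iterator() {
        return events.iterator();
    }
}

// Hypothetical minimal EventHistogram with the proposed convenience constructor.
final class EventHistogramSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total;

    EventHistogramSketch() {
    }

    // Proposed convenience constructor: count every event in the set.
    EventHistogramSketch(EventSetSketch eventSet) {
        for (String event : eventSet) {
            add(event);
        }
    }

    void add(String event) {
        counts.merge(event, 1, Integer::sum);
        total++;
    }

    int getAbsoluteFrequency(String event) {
        return counts.getOrDefault(event, 0);
    }

    // Assumption: normalized frequency = count / total number of events.
    double getNormalizedFrequency(String event) {
        return total == 0 ? 0.0 : (double) getAbsoluteFrequency(event) / total;
    }
}
```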
Canonicizers no longer require the colour tag.
Create a new sentence EventDriver.
It should break documents at the space after a period while accounting for abbreviations.
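A sketch of period-based splitting with an abbreviation check: a whitespace-separated token ending in a period closes the current sentence unless it appears in the abbreviation list. It assumes abbreviations are stored lowercase with their trailing period (for example "dr."); the actual format of /com/jgaap/resources/abbreviation.list is not described here, and the class name is hypothetical. Note that this sketch collapses runs of whitespace into single spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of sentence splitting (not JGAAP's real EventDriver).
// Assumption: abbreviations are stored lowercase with their trailing period, e.g. "dr.".
class SentenceSplitterSketch {
    static List<String> split(String text, Set<String> abbreviations) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // only happens for empty input
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(token);
            // A token ending in a period closes the sentence unless it is a known abbreviation.
            if (token.endsWith(".") && !abbreviations.contains(token.toLowerCase(Locale.ROOT))) {
                sentences.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            sentences.add(current.toString()); // trailing text without a final period
        }
        return sentences;
    }
}
```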
Split text on new lines
Being added at the request of Halim Sayoud
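A sketch of splitting text on new lines so that each line becomes one event. It assumes blank lines are dropped and that both Unix and Windows line endings should be accepted; neither is stated in the request, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a line-based event driver (not JGAAP's real class).
// Assumption: each non-empty line is one event; blank lines are dropped.
class LineEventSketch {
    static List<String> createEvents(String text) {
        List<String> events = new ArrayList<>();
        // Accept both Unix and Windows line endings.
        for (String line : text.split("\\r?\\n")) {
            if (!line.isEmpty()) {
                events.add(line);
            }
        }
        return events;
    }
}
```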
There are a number of unit tests that produce warnings about using deprecated methods.
All of these unit tests should be modified to use the replacements for these methods; in general, use the parameterized versions.
This will let us be sure no functionality was broken or lost in the transition.
Find a new PoS tagger to use as an EventDriver.
Investigate possible cross-platform solutions for JGAAP launchers.
Proposals:
Executable jar - Current distribution method. We will probably continue to make this available no matter what.
Platform-dependent executables - More work at the development end, and probably more confusing to end users, who will need to download the right executable. This will almost certainly end up just being an executable jar for Linux, but we could make nice Windows and Mac installers.
Java Web Start - Should be completely cross-platform. Requires some form of Java to be installed to work (note this is true for executable jars, too, but it is possible to make platform-dependent executables that install Java themselves), but can then update automatically to the latest stable JGAAP build and the appropriate JRE. Gives us desktop/start menu shortcuts without worrying about platform-specific issues.