Comments (10)
It is supported in the Experiment Engine.
from jgaap.
Useful starting point for researching how to implement this: https://github.com/evllabs/JGAAP/blob/master/src/com/jgaap/backend/ExperimentEngine.java#L183
As you can see here, you can include canonicizers and cullers on individual events. Canonicizers are prefixed with @ and cullers are prefixed with #. There was documentation outlining all of this, but it must have been lost with the wiki.
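To make the prefix convention concrete, here is a minimal Python sketch of how such a per-event specification could be split apart. Only the two prefixes (@ for canonicizers, # for cullers) come from the comment above. The overall layout (event driver name first, followed by any number of @ or # entries) and the function name `parse_event_spec` are assumptions made for illustration; the linked ExperimentEngine.java is the authority on the real syntax.

```python
import re


def parse_event_spec(spec):
    """Split one event specification into its parts.

    Assumed (hypothetical) layout: '<driver>@<canonicizer>...#<culler>...'
    Only the '@' and '#' prefixes are taken from the discussion.
    """
    # Everything before the first '@' or '#' is treated as the event driver name.
    driver = re.match(r"[^@#]*", spec).group().strip()
    canonicizers, cullers = [], []
    # Each remaining entry starts with a marker and runs to the next marker.
    for marker, name in re.findall(r"([@#])([^@#]*)", spec):
        target = canonicizers if marker == "@" else cullers
        target.append(name.strip())
    return {"driver": driver, "canonicizers": canonicizers, "cullers": cullers}


# Module names here are only illustrative.
parsed = parse_event_spec("Words@Normalize Whitespace@Unify Case#Most Common Events")
```

The point of the sketch is that each event carries its own canonicizers and cullers, rather than one set being applied to every event.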
I started a branch that uses much clearer and expensive JSON for experiments if you’re looking for something fun to build on.
Here is an example experiment in that framework. It also adds validation as a first class setting.
https://github.com/evllabs/JGAAP/blob/643f677f5fc5173a99099ba82b29381922c181e3/src/test/resources/experiment.json
As you can see here, you can include canonicizers and cullers on individual events. Canonicizers are prefixed with @ and cullers are prefixed with #. There was documentation outlining all of this, but it must have been lost with the wiki.
Oh wow! I did not know about this. Even looking at an archived page of the wiki, it looks like that information is not available: http://web.archive.org/web/20100810065633/http://server8.mathcomp.duq.edu/jgaap/w/index.php/Command_line
Thank you for pointing this out so I didn't have to go hunting myself. This is a better implementation than what I was going to do, actually. I was going to have the canonicizers work the same way as they do in the UI, where they are applied to all events.
I started a branch that uses much clearer and expensive JSON for experiments if you’re looking for something fun to build on.
Here is an example experiment in that framework. It also adds validation as a first class setting.
https://github.com/evllabs/JGAAP/blob/643f677f5fc5173a99099ba82b29381922c181e3/src/test/resources/experiment.json
Ah I did not know about this either. How significant is the performance hit? I think that we should consider replacing the CSV approach with this or at least making it available in addition to the CSV format. It's far cleaner and more user-friendly.
https://web.archive.org/web/20160527010715/http://evllabs.com/jgaap/w/index.php/Experiment_Engine
Thanks for sharing! I did not realize that there was a more up-to-date version available out there. I'm going to scrape it and put recreating it and updating it where necessary on our list of things to do at some point.
Also, what do you think of my suggestion?
How significant is the performance hit? I think that we should consider replacing the CSV approach with this or at least making it available in addition to the CSV format. It's far cleaner and more user-friendly.
Yes, it takes a bit of a performance hit, but it’s not too bad. In my opinion it is the way to go, because it enforces best practices better and has more readable results.
If you start here, you can see it’s part of an effort to rebuild around Spring: https://github.com/evllabs/JGAAP/blob/643f677f5fc5173a99099ba82b29381922c181e3/src/main/java/com/jgaap/rest/JGAAPApplication.java
My thought is that if you have this REST-based Experiment Engine, you can spin up a bunch of JGAAP instances and have them work like a cluster of microservices.
Nice! I like the idea of a Spring-based JGAAP. Being able to do it like this would be much nicer than the hacky methods that I have used to do large-scale experimenting in the past. (Populate a database table with experiment configurations and use some wrapper code to pull experiment configs from the table.)
It's not something that I can play with right now (too many other, higher priorities), but I definitely would be interested in exploring this further if you do not move forward with it yourself.
Here you can see we build a confusion matrix that’s way more informative than the hard to parse text files.
Cool! Even if the plan is to phase out the hard-to-parse text files, I think it would still be worth making the results available in a JSON format. Whenever I have a large set of results from an experiment, I use a set of Python scripts to read them. Those scripts work, but they rely on awkward string manipulation and sometimes need slight modifications depending on the scenario. With JSON output, these kinds of tools could still be used without any of that string handling.
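As a rough illustration of that point, the sketch below shows what a reader script might reduce to if results were emitted as JSON. The schema is hypothetical: a top-level `confusion_matrix` object mapping each true author to the counts of predicted authors. The author names and counts are made-up placeholder data, and nothing here reflects an actual JGAAP output format.

```python
import json


def summarize(results_json):
    """Compute overall accuracy and per-author recall from a JSON result.

    Assumes a hypothetical schema:
    {"confusion_matrix": {true_author: {predicted_author: count}}}
    """
    matrix = json.loads(results_json)["confusion_matrix"]
    total = sum(sum(row.values()) for row in matrix.values())
    # Correct attributions sit on the diagonal (true author == predicted author).
    correct = sum(row.get(author, 0) for author, row in matrix.items())
    recall = {}
    for author, row in matrix.items():
        row_total = sum(row.values())
        recall[author] = row.get(author, 0) / row_total if row_total else 0.0
    return {"accuracy": correct / total if total else 0.0, "recall": recall}


# Placeholder data for illustration only, not real experiment output.
example = json.dumps({
    "confusion_matrix": {
        "AuthorA": {"AuthorA": 8, "AuthorB": 2},
        "AuthorB": {"AuthorA": 1, "AuthorB": 9},
    }
})
summary = summarize(example)
```

No string slicing or regular expressions are needed; the structure of the data does the work that the parsing code does today.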