Comments (6)
Does "mvn package" work for you? Does it succeed in building the final jar file?
If you want to use/try/evaluate the system, what's stopping you? These file/directory structuring things would be nice to have, but are they actually stopping you from getting work done?
We don't know much about the right way to structure java projects, so help will be appreciated.
Why are you providing the jargs jar? Did you change something in it so you cannot use the standard version that is accessible through maven?
That was before my time, I don't know
The same goes for the gnu trove jar that you provide. Any changes made to the library?
I doubt it
Why are you separating the actual src files into the separate src folder in the root of the project while maintaining the resources in the ark-tweet-nlp folder?
It seemed easier. I hate the way maven nests the src folder really deep by default, but figured that for resources we might as well use maven's default.
Are metaphone-map2.txt and ptb_ordered_metaphone.txt that are contained in the lib directory external resources or are they created by you? If so, why are they in the lib directory?
Created by us. They should be in resources/, that would be better.
Where is the posBerkeley.jar from? Is it available to the public (e.g. from here)?
It was sent to us via email, I believe, but that was before my time. (See the licensing file.) I don't like using it for this reason, because it's not directly available online to the public anywhere I know of -- though many parts of it are included in various Berkeley NLP software on that page.
from ark-tweet-nlp.
You're right: the jar builds with mvn package. I can use it now since I trained the model. So everything is fine. The reason I started digging around in the project itself was that the help of the tagger says that it uses an internal model. As I read it, it could be either a file or a resource that comes within the jar. But the default model that is hardcoded in the RunTagger class is not included in the jar.
Thanks!
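For readers hitting the same ambiguity, here is a minimal sketch of a "resource or file" lookup in Java. It is not the RunTagger code; the class name ModelLocator and its method are hypothetical, and the only assumption is that a model name may refer either to something bundled on the classpath (for example inside the jar) or to a path on disk.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of ark-tweet-nlp.
final class ModelLocator {
    private ModelLocator() {}

    /**
     * Opens a model by name: first as a classpath resource (e.g. bundled in the jar),
     * then as a path on the file system.
     */
    static InputStream open(String name) throws IOException {
        // Class.getResourceAsStream returns null when no such resource exists.
        String resourceName = name.startsWith("/") ? name : "/" + name;
        InputStream in = ModelLocator.class.getResourceAsStream(resourceName);
        if (in != null) {
            return in;
        }
        // Fall back to the file system; FileInputStream throws
        // FileNotFoundException if the file is missing.
        return new FileInputStream(name);
    }
}
```

With a lookup like this, a missing default model surfaces as a FileNotFoundException naming the path that was tried, which makes it easier to tell that the model was simply not bundled.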
from ark-tweet-nlp.
Yeah, the model can be downloaded from the website. It's the only resource that's not checked in.
On Mon, Oct 22, 2012 at 10:44 AM, Norman Rosner wrote:
You're right: the jar is building with mvn packaging. I can use it now
since I trained the model. So everything is fine. The reason I started
digging around in the project itself was that the help of the tagger says
that it uses an internal model. As I read it, it could be either a file or
a resource that comes within the jar. But the default model that is
hardcoded in the RunTagger class is not included in the jar. Thanks!
from ark-tweet-nlp.
Do you have any suggestions how to make the process less painful? I added a note about the model in particular to docs/hacking.txt.
from ark-tweet-nlp.
First, I recommend using the standard maven project structure, even though you don't like the deep nesting. Everyone who uses maven is used to that structure. It should also simplify the pom.xml.
Second, I believe that the jargs dependency is not used at all in the project, so it could be removed. The RunTagger and the Train class parse the args manually and thus this dependency is not needed. Also, the gnu trove dependency could be fetched from a repository. As it turns out there's only one class (OWLQN) that is using gnu trove's THashSet.
Third, I would ignore the build artifacts in the repo itself. Instead I would use maven to upload the arktweetnlp artifact to the repo's Downloads section. That way it stays out of the repo but is still accessible if people have problems building it.
Fourth, the shell scripts to run the tagger or the tokenizer could be removed or edited so they don't confuse people when they can't be run successfully. Also, I don't understand the java.sh in the scripts directory. I believe that you guys use it for setting up your dev environment like IDE and stuff?
What do you think? I could work on a PR if you need help. Let me know.
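To illustrate the second point, here is a rough sketch of what manual argument parsing without jargs can look like. This is not the code from RunTagger or Train; the class ManualArgs is hypothetical, and it assumes, purely for illustration, that every option starting with "--" takes exactly one value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the project's actual parser.
final class ManualArgs {
    final Map<String, String> options = new HashMap<String, String>();
    final List<String> positional = new ArrayList<String>();

    static ManualArgs parse(String[] args) {
        ManualArgs parsed = new ManualArgs();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("--")) {
                // Assumption: each "--name" option is followed by one value.
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("Missing value for " + arg);
                }
                parsed.options.put(arg.substring(2), args[++i]);
            } else {
                // Anything else is treated as a positional argument.
                parsed.positional.add(arg);
            }
        }
        return parsed;
    }
}
```

A loop like this needs nothing beyond the JDK, which is why a separate option-parsing jar adds nothing when the classes already parse their arguments by hand.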
from ark-tweet-nlp.
Thanks for looking into this.
FYI, java.sh is as I described in hacking.txt -- it just makes it easy to run the tagger on the commandline when developing in an IDE, by using the version of the .class files that (e.g.) Eclipse is auto-compiling. This is very helpful for quick development -- this is how we can do things like fix #14 so fast :)
on trove and owlqn -- so it's only a training-time dependency.
I don't understand the proposal to edit the runTagger and twokenize scripts -- are we talking about comments in them, or something?
from ark-tweet-nlp.