
Comments (6)

brendano avatar brendano commented on July 21, 2024

Does "mvn package" work for you? Does it succeed in building the final jar file?

If you want to use/try/evaluate the system, what's stopping you? These file/directory structuring things would be nice to have, but are they actually stopping you from getting work done?

We don't know much about the right way to structure java projects, so help will be appreciated.

Why are you providing the jargs jar? Did you change something in it so you cannot use the standard version that is accessible through maven?

That was before my time; I don't know.

The same goes for the gnu trove jar that you provide. Any changes made to the library?

I doubt it

Why are you separating the actual src files into the separate src folder in the root of the project while maintaining the resources in the ark-tweet-nlp folder?

It seemed easier. I hate the way maven nests the src folder really deep by default, but figured that for resources we might as well use maven's default.

Are metaphone-map2.txt and ptb_ordered_metaphone.txt that are contained in the lib directory external resources or are they created by you? If so, why are they in the lib directory?

Created by us. They should be in resources/, that would be better.

Where is the posBerkeley.jar from? Is it available to the public (e.g. from here)?

It was sent to us via email, I believe, but that was before my time. (See the licensing file.) I don't like using it for this reason: it's not directly available online to the public anywhere I know of, though many parts of it are included in various Berkeley NLP software on that page.

from ark-tweet-nlp.

rosner avatar rosner commented on July 21, 2024

You're right: the jar builds with mvn package. I can use it now that I have trained the model, so everything is fine. The reason I started digging around in the project was that the tagger's help text says it uses an internal model. As I read it, that could be either a file or a resource bundled in the jar. But the default model that is hardcoded in the RunTagger class is not included in the jar.
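To make the "file or resource within the jar" ambiguity concrete, here is a minimal sketch of a lookup that tries the filesystem first and then falls back to the classpath. The class name ModelLocator and its method are hypothetical and only illustrate the idea; this is not how RunTagger actually resolves its model.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of ark-tweet-nlp.
final class ModelLocator {

    private ModelLocator() {}

    /**
     * Opens a model given a location string.
     * 1. If the location names an existing regular file, read it from disk.
     * 2. Otherwise treat it as a classpath resource (e.g. bundled in the jar).
     * 3. If neither exists, fail with a message that names both attempts.
     */
    static InputStream open(String location) throws IOException {
        Path path = Paths.get(location);
        if (Files.isRegularFile(path)) {
            return Files.newInputStream(path);
        }

        // Class.getResourceAsStream needs a leading slash for absolute lookups.
        String resource = location.startsWith("/") ? location : "/" + location;
        InputStream in = ModelLocator.class.getResourceAsStream(resource);
        if (in == null) {
            throw new FileNotFoundException(
                "Model not found as file '" + location
                + "' or as classpath resource '" + resource + "'");
        }
        return in;
    }
}
```

With a lookup like this, a default model that is neither on disk nor packaged in the jar fails with an error that says where it was searched for, which is the piece of information that was missing here.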

Thanks!


brendano avatar brendano commented on July 21, 2024

Yeah, the model can be downloaded from the website. It's the only resource that's not checked in.

(In reply to Norman Rosner's comment of Mon, Oct 22, 2012 at 10:44 AM.)


brendano avatar brendano commented on July 21, 2024

Do you have any suggestions how to make the process less painful? I added a note about the model in particular to docs/hacking.txt.


rosner avatar rosner commented on July 21, 2024

First, I recommend using the standard Maven project structure, even though you don't like the deep nesting. Everyone who uses Maven is used to that structure. It should also simplify the pom.xml.
Second, I believe the jargs dependency is not used anywhere in the project, so it could be removed. The RunTagger and Train classes parse their args manually, so the dependency is not needed. The GNU Trove dependency could also be fetched from a repository. As it turns out, only one class (OWLQN) uses GNU Trove's THashSet.
Third, I would keep the build artifacts out of the repo itself. Instead, I would use Maven to upload the arktweetnlp artifact to the repo's Downloads section. That way it stays out of the repo but is still accessible if people have problems building it.
Fourth, the shell scripts that run the tagger or the tokenizer could be removed or edited so they don't confuse people when they can't be run successfully. Also, I don't understand the java.sh in the scripts directory. I assume you use it for setting up your dev environment, like the IDE?
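For reference, this is roughly what I mean by parsing the args manually, as a minimal sketch. The class name ArgsSketch and the option names are made up for illustration and are not the actual RunTagger or Train options; the point is only that a plain loop over the array covers this kind of parsing without jargs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the option names are illustrative only.
final class ArgsSketch {
    String modelPath = "default.model";   // placeholder default, an assumption
    boolean verbose = false;
    final List<String> inputs = new ArrayList<>();

    /** Walks the argument array once, consuming options and their values. */
    static ArgsSketch parse(String[] args) {
        ArgsSketch parsed = new ArgsSketch();
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            if (arg.equals("--model")) {
                // Option with a value: the value is the next element.
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("--model needs a value");
                }
                parsed.modelPath = args[i + 1];
                i += 2;
            } else if (arg.equals("--verbose")) {
                // Boolean flag: no value to consume.
                parsed.verbose = true;
                i += 1;
            } else if (arg.startsWith("--")) {
                throw new IllegalArgumentException("Unknown option: " + arg);
            } else {
                // Anything else is treated as a positional input.
                parsed.inputs.add(arg);
                i += 1;
            }
        }
        return parsed;
    }
}
```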

What do you think? I could work on a PR if you need help. Let me know.


brendano avatar brendano commented on July 21, 2024

Thanks for looking into this.

FYI, java.sh is as I described in hacking.txt -- it just makes it easy to run the tagger on the commandline when developing in an IDE, by using the version of the .class files that (e.g.) Eclipse is auto-compiling. This is very helpful for quick development -- this is how we can do things like fix #14 so fast :)

On Trove and OWLQN -- so it's only a training-time dependency.

I don't understand the proposal to edit the runTagger and twokenize scripts -- are we talking about comments in them, or something?

