nlp

A simple tool for NLP

The primary purpose of this tool is to eliminate the tedious data management involved in using Mahout and Hadoop. It wraps Mahout and Hadoop in simple command-line interfaces and also provides some utilities.

Requirements

Maven, JDK 1.8 (other JDK versions cause failures), hadoop-2.6.0-cdh5.4.4, mahout-0.9-cdh5.4.4

Build

$ mvn package

Run

Edit conf.json and the run script to match your environment:

$ vi conf.json
$ vi run

Then switch to the Hadoop user and start the tool:

$ su {hadoop user}
$ ./run

Running ./run without arguments lists the available commands.

Develop with Eclipse

$ mvn eclipse:eclipse

Note: you may see jdk.tools warnings in pom.xml if you convert the project to a Maven project in Eclipse.

License

MIT

TODO

  • DeleteJob

    • Deletes job results on HDFS
    • Hides HDFS from users further
  • Result decorator for Hive queries

    • Allows users to promptly analyze the data with Mahout
    • Needs VectorWritable parser for Hive
  • Better logging

  • Moving away from the Maven directory layout

    • Moves target/ and Eclipse settings out of the source tree to keep it Git-friendly
    • CMake?
  • Migration to Spark

    • Potentially speeds up everything
    • But needs to account for high memory pressure
    • Parameter Server?
  • Collection of job history and statistics

    • e.g., Hadoop job configurations and task counters (.xml and .jhist files)
    • May be useful in the future
  • Add other data analytics

    • Machine learning, graph, etc.

Author

Takeshi Yoshimura (https://github.com/takeshi-yoshimura)
