This project forked from joshua-decoder/joshua

Joshua Statistical Machine Translation Toolkit

Home Page: http://joshua-decoder.org/

Welcome to Joshua
-----------------

Joshua is a statistical machine translation toolkit for both
phrase-based (new in version 6.0) and syntax-based decoding. It can be
run with pre-built language packs available for download, and can also
be used to build models for new language pairs. Among the many features of
Joshua are:

- Support for both phrase-based and syntax-based decoding models
- Translation of weighted input lattices
- [Thrax](http://joshua-decoder.org/6.0/thrax.html): a Hadoop-based, scalable
  grammar extractor
- A [sparse feature architecture](http://cs.jhu.edu/~post/joshua-docs/md_sparse_features.html)
  supporting an arbitrary number of features

The latest release of Joshua is 6.0, released in January of 2014.

New in 6.0
----------

Joshua 6.0 includes the following new features:

- A fast phrase-based decoder with the ability to read [Moses](http://statmt.org/moses) 
  phrase tables
- Large speed improvements compared to the previous syntax-based decoder
- Special input handling
- A host of bugfixes and stability improvements

Working with "language packs"
-----------------------------

Joshua includes a number of "language packs", which are pre-built models that
allow you to use the translation system as a black box, without worrying too
much about how machine translation works. You can browse the models available
for download on the [Joshua
website](http://joshua-decoder.org/language-packs/).

Building new models
-------------------

Joshua includes a pipeline script that allows you to build new models, provided
you have training data.  This pipeline can be run (more or less) by invoking a
single command, which handles data preparation, alignment, phrase-table or
grammar construction, and tuning of the model parameters. See [the
documentation](http://joshua-decoder.org/pipeline.html)
for a walkthrough and more information about the many available options.

Quick start
-----------

Running the decoder in any form requires a few basic environment
variables to be set: `$JAVA_HOME`, `$JOSHUA`, and potentially `$MOSES`.

    export JAVA_HOME=/path/to/java  # maybe /usr/java/home
    export JOSHUA=/path/to/joshua

You might also find it helpful to set these:

    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8

Then, compile Joshua by typing:

    cd $JOSHUA
    ant

The basic method for invoking the decoder looks like this:

    cat SOURCE | $JOSHUA/bin/joshua-decoder -m MEM -c CONFIG OPTIONS > OUTPUT

Some example usage scenarios and scripts can be found in the `examples/`
directory.
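
Because the decoder depends on the environment variables described above, it can help to confirm they are set before compiling or decoding. The following is a minimal sketch, not part of Joshua: `missing_vars` is a hypothetical helper name, and the choice to check only `JAVA_HOME` and `JOSHUA` is an assumption (add `MOSES` if your setup needs it).

```shell
# Hypothetical helper (not shipped with Joshua): print the name of every
# given environment variable that is unset or empty, one per line.
missing_vars() {
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Warn if anything the quick start relies on is missing.
missing=$(missing_vars JAVA_HOME JOSHUA)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names print on a single line.
    echo "Please set these variables first:" $missing >&2
fi
```

The function only reports names and never exits, so it can be sourced from a shell profile or called at the top of a wrapper script.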
