THUMT: An Open Source Toolkit for Neural Machine Translation

Introduction

THUMT is a data-driven machine translation system developed by the Natural Language Processing Group at Tsinghua University.

Machine translation is a natural language processing task that aims to translate between natural languages automatically using computers. Recent years have witnessed the rapid development of end-to-end neural machine translation (NMT), which has become the new mainstream method in practical MT systems.

Built on top of Theano, THUMT is an open-source toolkit for neural machine translation with the following features:

  • Attention-based translation model. THUMT implements the standard attention-based encoder-decoder framework for NMT.
  • Minimum risk training. Besides standard maximum likelihood estimation (MLE), THUMT also supports minimum risk training (MRT), which aims to find model parameters that minimize the expected loss on the training data, where the loss is calculated using evaluation metrics such as BLEU.
  • Exploiting monolingual data. THUMT provides semi-supervised training (SST) for NMT that is capable of exploiting abundant monolingual corpora to improve the learning of both source-to-target and target-to-source NMT models.
  • Visualization. To better understand the internal workings of NMT, THUMT features a visualization tool that shows the relevance between each intermediate state and its contextual words.
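
To make the minimum risk training objective more concrete, the following is a minimal sketch of the expected loss for a single source sentence, computed over a small set of candidate translations. It is not THUMT's code: the function name `expected_risk`, the sharpness hyperparameter `alpha`, and the choice of cost (for example, 1 minus sentence-level BLEU) are assumptions made for illustration only.

```python
import math


def expected_risk(log_probs, costs, alpha=1.0):
    """Expected loss over a candidate set for one source sentence.

    log_probs: model log-probabilities of each candidate translation.
    costs:     loss of each candidate against the reference
               (hypothetical choice: 1 - sentence-level BLEU).
    alpha:     hypothetical hyperparameter controlling how sharply the
               distribution over candidates is peaked.
    """
    if not log_probs or len(log_probs) != len(costs):
        raise ValueError("log_probs and costs must be non-empty and equal in length")

    # Renormalize the model distribution over the candidate set only.
    # Subtracting the maximum keeps exp() numerically stable.
    scaled = [alpha * lp for lp in log_probs]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)

    # The risk is the probability-weighted average of the candidate costs.
    return sum(w / total * c for w, c in zip(weights, costs))


if __name__ == "__main__":
    # Two candidates: the likelier one has zero cost, the other has cost 1.
    risk = expected_risk([math.log(0.75), math.log(0.25)], [0.0, 1.0])
    print(round(risk, 4))
```

Training under this objective would adjust the model so that probability mass moves toward low-cost candidates, which lowers the value returned here. MLE, by contrast, only raises the probability of the reference translation.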

Website

http://thumt.thunlp.org

User Manual

This user manual describes how to install and use THUMT.

Documentation

This documentation provides detailed information about the functions in THUMT.

License

The source code is dual-licensed. The open-source license is BSD-3-Clause, which allows free use for research purposes. For commercial licensing, please email [email protected].

Citation

Please cite the following paper:

Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu. 2017. THUMT: An Open Source Toolkit for Neural Machine Translation. arXiv:1706.06415.

Development Team

Project leaders: Maosong Sun, Yang Liu, Huanbo Luan

Project members: Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng

Contact

If you have questions, suggestions, or bug reports, please email [email protected].
