Giter VIP home page Giter VIP logo

apertium-eu-fr's Introduction

Basque and French

These are the linguistic data for the Apertium Basque--French machine translator.
It translates only in the direction Basque--French. 
You need apertium-3.0 and lttoolbox-3.0 to use this translator.


To compile the linguistical data simply do:

$ ./configure

to generate a Makefile file and then

$ make

inside of this directory.


TAGGER 

To use this language-pair package with apertium YOU DO NOT NEED TO
RETRAIN THE TAGGER. Probabilities and auxiliary data are provided for
both the en-ca and the ca-en translation directions which should be
acceptable for most applications, and should work even if you change
the dictionaries in a reasonably way.

If for some reason you need to retrain the tagger (for example, you
have made really extensive changes to the dictionaries such as
creating new lexical categories), you have three alternatives:

* To perform a supervised training:

  To this end tagged corpora is provided, but tagged corpora
  (eu-tagger-data/eu.tagged ) could be
  obsolete for some words. If this is the case, the tagger training 
  program  will show you where the problems are and you will need 
  to solve them by hand. Be sure to solve the problems by modifying 
  ONLY the .tagged file, NEVER the .untagged file that is 
  automatically generated.

  The supervised training is done by typing: 

  make -f eu-es-supervised.make (for the Basque part-of-speech tagger)
* To perform an unsupervised training:

  For this purpose you will need to assemble a large (hundreds of
  thousand of words) plain-text corpus for each language (for example,
  using a robot to harvest text from online newspapers) and put them in
  the proper place, for instance eu-tagger-data/eu.crp.txt. This type
of training does not need human
  intervention but, as expected, results will be less adequate than
  those obtained with the supervised training.

  The unsupervised training is done through the iterative Baum-Welch
  algorithm. By default the number of iterations is set to 8, but you
  can change this value by editing the Makefile and changing the
  value of TAGGER_UNSUPERVISED_ITERATIONS.

  The unsupervised training is done by typing:

  make -f eu-es-unsupervised.make (for the Basque part-of-speech tagger)

  This is the training method followed to train the basque tagger.


* To perform an unsupervised training by using target-language
  information and the rest of the modules of the Apertium MT engine:

  To do so you need large plain-text corpora on both languages. Please
  download the apertium-tagger-training-tools package and follow the
  instructions provided there.

apertium-eu-fr's People

Contributors

jimregan avatar unhammer avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.