simplex-pb

This is the code for the highest-performing lexical simplification system featured in the paper:
"SIMPLEX-PB: A Lexical Simplification Database and Benchmark for Portuguese"

It contains three files:
- lib.py: A library with the classes and functions necessary to perform simplification.
- simplifier.py: A simple script that tests the simplifier.
- dataset_propor2018.txt: The test set used for the experiments featured in the paper.

To test the simplifier, run the following command, replacing <test_corpus> with dataset_propor2018.txt to use the test set from the paper:

python simplifier.py <test_corpus> <embeddings_model> <language_model> <how_many_to_generate>

The parameters are:
- <test_corpus>: The first argument: a lexical simplification corpus in the VICTOR format, which is the format of the "dataset_propor2018.txt" file. Each line contains a sentence, a target complex word, its index in the sentence, and a series of gold substitutions, each accompanied by its simplicity rank. To learn more about the VICTOR format, see the LEXenstein manual (https://github.com/ghpaetzold/LEXenstein).
- <embeddings_model>: A word embeddings model in the binary format produced by word2vec (https://radimrehurek.com/gensim/models/word2vec.html).
- <language_model>: A language model in the binary format produced by the KenLM toolkit (https://kheafield.com/code/kenlm).
- <how_many_to_generate>: The number of candidate substitutions that the model will generate for each target complex word.
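
To make the corpus format described above more concrete, here is a minimal sketch of how one line could be parsed. It is not taken from lib.py. It assumes that fields are tab-separated, that each gold substitution is written as rank:candidate, and that the index counts tokens from zero; check these assumptions against the LEXenstein manual. The names VictorInstance and parse_victor_line, and the example sentence, are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class VictorInstance(NamedTuple):
    """One corpus line (hypothetical container, not part of lib.py)."""
    sentence: str
    target: str
    index: int
    substitutions: List[Tuple[int, str]]  # (simplicity rank, candidate)


def parse_victor_line(line: str) -> VictorInstance:
    # Assumption: fields are separated by tabs.
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least sentence, target and index")
    sentence, target, index = fields[0], fields[1], int(fields[2])

    substitutions = []
    for item in fields[3:]:
        if not item:
            continue  # tolerate trailing tabs
        # Assumption: each gold substitution is written as "rank:candidate".
        rank, candidate = item.split(":", 1)
        substitutions.append((int(rank), candidate))
    return VictorInstance(sentence, target, index, substitutions)


# Made-up example line, for illustration only.
example = "o menino adquiriu um livro\tadquiriu\t2\t1:comprou\t2:obteve"
instance = parse_victor_line(example)
print(instance.target, instance.index, instance.substitutions)
```

Keeping the rank as an integer makes it easy to sort the gold substitutions by simplicity when comparing them against the candidates the simplifier generates.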

Contributors: danillolino, nathanshartmann