nlp-kreyol-resources

A list of useful resources (dictionaries, corpora, papers, models, etc.) for working on NLP for French-based Caribbean creoles and similar creole languages.

Datasets 📂

Ready-to-use

Open source annotated corpora.

Other corpora resources

> Written

  • Tatoeba [link]

    2,123 sentences in Guadeloupean creole, some with English or French translations.

  • Kréyolad [link]

    All of Jude Duranty's contributions to the newspaper Antilla from 2004 to 2023.
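
For the Tatoeba entry above, a minimal sketch of pulling one language's sentences out of a tab-separated sentence export. It assumes each row has three columns (sentence id, language code, text) and that Guadeloupean creole is tagged `gcf` (its ISO 639-3 code); both should be checked against the actual download. The function name `filter_sentences` and the sample rows are hypothetical and only serve to illustrate the idea.

```python
import csv
import io


def filter_sentences(tsv_file, lang_code="gcf"):
    """Yield (sentence_id, text) for rows whose language column equals lang_code.

    Assumes a tab-separated layout of: id, language code, sentence text.
    """
    # QUOTE_NONE: treat quote characters inside sentences as ordinary text.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip malformed or empty lines
        sentence_id, lang, text = row[0], row[1], row[2]
        if lang == lang_code:
            yield int(sentence_id), text


# Made-up sample rows standing in for a real export file.
sample = io.StringIO(
    "1\tfra\tBonjour.\n"
    "2\tgcf\tBonjou.\n"
    "3\teng\tHello.\n"
)
sentences = list(filter_sentences(sample))
print(sentences)
```

With a real export, the `io.StringIO` sample would be replaced by `open(path, encoding="utf-8", newline="")`, where `path` points at the downloaded file.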

> Oral

  • CREOLORAL [link] (Not open source)

    Anne Zribi-Hertz, Emmanuel Schang, Herby Glaude

    2012

    3 hours of oral data: spontaneously spoken Martinican (MQ) and Guadeloupean (GP) creole, with transcriptions and French translations. Not open source, but partially (?) available here, possibly without annotations [link, 19 audio records]

  • Pawolotek [link]

    Simone Lagrand, "Titak - Panorama sonore du parler martiniquais"

    2022

    45 short audio files in Martinican creole (a few seconds/minutes long) recorded in Martinique.

> Unreleased

Tracking unreleased corpora.

  • Data for Noun Phrases in mixed Martinican Creole and French: Evidence for an Underspecified Language Model

    Christelle Lengrai, Juliette Moustin, Pascal Vaillant

    Data recorded on radio broadcasts in Martinique in 2005-2006. Transcribed and annotated.

    2016

  • Data for How to Parse a Creole: When Martinican Creole Meets French

    Martinican creole treebank: 240 fully annotated sentences. Not publicly available.

    2022

Models 📈

Classification

  • Guadeloupean Creole Language Identification Tool [link]

    2020, William Soto

Translation

  • CreoleM2M [link]

    2023, Raj Dabre

    Multilingual translation model built with Hugging Face. Supports 26 creoles, including Saint Lucian, Seychellois, Mauritian, and Haitian creoles. Online playground available here.

Speech Recognition & Query

  • ASR + Query-by-Example [link]

    2022, Cécile Macaire et al.

    Guadeloupean and Mauritian creole. Goal: design linguistic tools for language documentation.

Papers 📃

Building Datasets & Corpora

French-based

  • Case Study on Data Collection of Kreol Morisien, a Low-Resourced Creole Language [link]

    David Joshen Bastien, Vijay Prakash Chumroo, Johan Patrice Bastien

    2022, IST-Africa Conference

  • MorisienMT: A Dataset for Mauritian Creole Machine Translation [paper, dataset]

    Raj Dabre, Aneerav Sukhoo

    2022

  • Krik: First Steps into Crowdsourcing POS tags for Kréyòl Gwadloupéyen [link]

    Alice Millour, Karën Fort

    2018

Others

  • JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset [link]

    Ruth-Ann Armstrong, John Hewitt, Christopher Manning

    2022

Classification

French-based

  • Language Identification of Guadeloupean Creole [paper, code]

    William Soto

    2020

POS Tagging

  • Krik: First Steps into Crowdsourcing POS tags for Kréyòl Gwadloupéyen [link]

    Alice Millour, Karën Fort

    2018

  • How to Parse a Creole: When Martinican Creole Meets French [link]

    Ludovic Mompelat, Daniel Dakota, Sandra Kübler

    2022

Translation

French-based

  • Kreol Morisien to English and English to Kreol Morisien Translation System using Attention and Transformer Model [link]

    Zaheenah Boodeea, Sameerchand Pudaruth

    2020

Speech Recognition

  • Automatic Speech Recognition and Query By Example for Creole Languages Documentation [link, code]

    Cécile Macaire, Didier Schwab, Benjamin Lecouteux, Emmanuel Schang

    2022

    Guadeloupean & Mauritian Creoles

Others

General

  • On Language Models for Creoles [link]

    Heather Lent, Emanuele Bugliarello, Miryam de Lhoneux, Chen Qiu, Anders Søgaard

    2021, Conference on Computational Natural Language Learning

  • What a Creole Wants, What a Creole Needs [link]

    Heather Lent, Kelechi Ogueji, Miryam de Lhoneux, Orevaoghene Ahia, Anders Søgaard

    2022

  • Ancestor-to-Creole Transfer is Not a Walk in the Park [link]

    Heather Lent, Emanuele Bugliarello, Anders Søgaard

    2022

  • African Substrates Rather Than European Lexifiers to Augment African-diaspora Creole Translation [link]

    Nathaniel Romney Robinson, Matthew Dean Stutzman, Stephen D. Richardson, David R Mortensen

    2023

Other

  • Une grammaire formelle du créole martiniquais pour la génération automatique [link]

    Pascal Vaillant

    2003

Contributors

cibeah