A list of useful resources (dictionaries, corpora, papers, models, etc.) for NLP work on French-based Caribbean creoles and similar creole languages.
Open source annotated corpora.
-
Tatoeba [link]
2,123 sentences in Guadeloupean creole, some with English or French translations.
-
Kréyolad [link]
All of Jude Duranty's contributions to the newspaper Antilla from 2004 to 2023.
-
CREOLORAL [link] (Not open source)
Anne Zribi-Hertz, Emmanuel Schang, Herby Glaude
2012
3 hours of spontaneous spoken Martinican (MQ) and Guadeloupean (GP) creole, with transcriptions and French translations. Not open source, but (partially?) available here (without annotations?) [link, 19 audio recordings]
-
Pawolotek [link]
Simone Lagrand, "Titak - Panorama sonore du parler martiniquais"
2022
45 short audio files in Martinican creole (from a few seconds to a few minutes long), recorded in Martinique.
Tracking unreleased corpora.
-
Data for Noun Phrases in mixed Martinican Creole and French: Evidence for an Underspecified Language Model
Christelle Lengrai, Juliette Moustin, Pascal Vaillant
2016
Data recorded from radio broadcasts in Martinique in 2005-2006. Transcribed and annotated.
-
Data for How to Parse a Creole: When Martinican Creole Meets French
2022
Martinican creole treebank: 240 fully annotated sentences. Not publicly available.
-
Guadeloupean Creole Language Identification Tool [link]
2020, William Soto
-
CreoleM2M [link]
2023, Raj Dabre
Multilingual translation model built with HuggingFace. Supports 26 creoles, including Saint Lucian, Seychellois, Mauritian, and Haitian creoles. Online playground available here.
-
ASR + Query-by-Example [link]
2022, Cécile Macaire et al.
Guadeloupean and Mauritian creoles. Goal: design linguistic tools for language documentation.
-
Case Study on Data Collection of Kreol Morisien, a Low-Resourced Creole Language [link]
David Joshen Bastien, Vijay Prakash Chumroo, Johan Patrice Bastien
2022, IST-Africa Conference
-
MorisienMT: A Dataset for Mauritian Creole Machine Translation [paper, dataset]
Raj Dabre, Aneerav Sukhoo
2022
-
Krik: First Steps into Crowdsourcing POS tags for Kréyòl Gwadloupéyen [link]
Alice Millour, Karën Fort
2018
-
JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset [link]
Ruth-Ann Armstrong, John Hewitt, Christopher Manning
2022
-
How to Parse a Creole: When Martinican Creole Meets French [link]
Ludovic Mompelat, Daniel Dakota, Sandra Kübler
2022
-
Kreol Morisien to English and English to Kreol Morisien Translation System using Attention and Transformer Model [link]
Zaheenah Boodeea, Sameerchand Pudaruth
2020
-
Automatic Speech Recognition and Query By Example for Creole Languages Documentation [link, code]
Cécile Macaire, Didier Schwab, Benjamin Lecouteux, Emmanuel Schang
2022
Guadeloupean & Mauritian Creoles
-
On Language Models for Creoles [link]
Heather Lent, Emanuele Bugliarello, Miryam de Lhoneux, Chen Qiu, Anders Søgaard
2021, Conference on Computational Natural Language Learning
-
What a Creole Wants, What a Creole Needs [link]
Heather Lent, Kelechi Ogueji, Miryam de Lhoneux, Orevaoghene Ahia, Anders Søgaard
2022
-
Ancestor-to-Creole Transfer is Not a Walk in the Park [link]
Heather Lent, Emanuele Bugliarello, Anders Søgaard
2022
-
African Substrates Rather Than European Lexifiers to Augment African-diaspora Creole Translation [link]
Nathaniel Romney Robinson, Matthew Dean Stutzman, Stephen D. Richardson, David R. Mortensen
2023
-
Une grammaire formelle du créole martiniquais pour la génération automatique [link]
Pascal Vaillant
2003