Comments (7)
Reading up on the Tlingit orthography on Wikipedia (https://en.wikipedia.org/wiki/Tlingit_alphabet), it appears ʼ (\u02BC), the ejective stop character (https://en.wikipedia.org/wiki/Dental_and_alveolar_ejective_stops), would be more appropriate.
Panphon does not know it, though, so I'm guessing that in tli-ipa -> eng-ipa we should manually map \u02BC to the full glottal stop, so that we get HH in eng-arpabet.
from g2p.
Hi @joanise, thanks for this. Yes, you're right, it marks ejectives; frustratingly, I believe periods mark glottal stops. Ejectives should get mapped to their plain unvoiced stop equivalents, so k\u02BC should go to k. Panphon does recognize the character, but will just remove it in the generate-mapping step:
>>> import panphon.distance
>>> dst = panphon.distance.Distance()
>>> dst.weighted_feature_edit_distance("kʼ", "k")
0.125
I suppose the thing to do would be either to replace all ' with \u02BC, or to add individual rules for each ejective (if we thought ' might be used legitimately in punctuation). I think the individual rules option might be the best.
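To make the two options concrete, here is a minimal sketch in plain Python. It does not use the g2p API; the function names are hypothetical, and the list of consonant spellings is an illustrative, incomplete assumption rather than the real Tlingit inventory.

```python
ASCII_APOSTROPHE = "'"
EJECTIVE_MARK = "\u02bc"  # MODIFIER LETTER APOSTROPHE

# Illustrative, incomplete list of consonant spellings that can be ejective.
# This is an assumption for the sketch, not the actual tli mapping.
EJECTIVE_BASES = ["ts", "ch", "tl", "t", "k"]


def replace_all(text):
    """Option 1: blanket replacement of every ASCII apostrophe."""
    return text.replace(ASCII_APOSTROPHE, EJECTIVE_MARK)


def replace_per_ejective(text):
    """Option 2: one rule per ejective; other apostrophes are left alone."""
    for base in EJECTIVE_BASES:
        text = text.replace(base + ASCII_APOSTROPHE, base + EJECTIVE_MARK)
    return text
```

With the per-ejective rules, an apostrophe used as a quotation mark around a vowel is untouched, whereas the blanket replacement converts it as well. A quote that happens to follow one of the listed consonants would still be converted, so this only reduces the ambiguity.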
Ah, I see! Thanks for the explanations. We need to have ' somewhere in the g2p, because otherwise the tokenizer will not consider it a word character. g2p generate-mapping does not like having \u02BC in a rule on its own, though. I think a reasonable solution is to have a rule for each ejective plosive, as you suggest, in tli-ipa -> eng-ipa, and I would do it manually rather than through generate-mapping, for simplicity's sake.
As for the tokenizer, maybe the simplest solution is to have the no-op rule ',' in tli_equiv.csv mapping ' to itself. Then the tokenizer would include the ASCII ' in the alphabet it's looking for.
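A rough sketch of why the no-op rule helps, assuming the tokenizer treats as word characters anything alphanumeric plus any character that appears in a rule input. This is a stand-in written with the standard library, not g2p's actual tokenizer, and make_tokenizer is a hypothetical name.

```python
import re


def make_tokenizer(rule_inputs):
    """Build a tokenizer whose word characters include all rule-input characters."""
    # Non-alphanumeric characters seen in rule inputs become word characters.
    extra = {ch for inp in rule_inputs for ch in inp if not ch.isalnum()}
    char_class = r"\w" + "".join(re.escape(ch) for ch in sorted(extra))
    # Alternate between runs of word characters and runs of everything else.
    pattern = re.compile("[" + char_class + "]+|[^" + char_class + "]+")

    def tokenize(text):
        return pattern.findall(text)

    return tokenize
```

Without any rule containing ', the string k'a is split into three tokens; once ' appears as a rule input (even in a rule mapping it to itself), k'a stays together as one word.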
I take it back, g2p generate-mapping was quite happy to generate this rule:
{
    "in": "\u02bc",
    "out": "",
    "context_before": "",
    "context_after": ""
}
and maybe that's just the simple solution we should implement, and add the rule(s) mapping ' to \u02BC as appropriate in tli_to_ipa.csv.
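Chaining the two steps, as a sketch only: the first rule list stands in for an entry in tli_to_ipa.csv (the k' entry is an illustrative assumption), the second is the generated rule quoted above, and apply_rules is a hypothetical helper doing plain string replacement rather than real g2p rule application.

```python
# Step 1 (illustrative stand-in for tli_to_ipa.csv): ASCII ' becomes \u02bc.
TLI_TO_IPA = [{"in": "k'", "out": "k\u02bc"}]

# Step 2 (the generated rule quoted above): \u02bc is deleted.
TLI_IPA_TO_ENG_IPA = [
    {"in": "\u02bc", "out": "", "context_before": "", "context_after": ""}
]


def apply_rules(text, rules):
    """Apply each rule in order as a plain substring replacement."""
    for rule in rules:
        text = text.replace(rule["in"], rule["out"])
    return text


def tli_to_eng_ipa(text):
    """Run both steps: orthography to IPA, then IPA to English IPA."""
    return apply_rules(apply_rules(text, TLI_TO_IPA), TLI_IPA_TO_ENG_IPA)
```

The net effect is that the ejective ends up as its plain unvoiced stop, which matches the k\u02BC to k behaviour discussed earlier in the thread.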
About the dot mapping to the glottal stop, how does the language mark its end-of-sentence punctuation? Is there ambiguity?
Next question: can the glottal stop occur at the end of a word? I could disambiguate by mapping the . only when it's not word-final.
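What I have in mind is roughly the following, sketched with a regular expression rather than actual g2p rule contexts. The names are hypothetical and the test strings are made up, not real Tlingit words.

```python
import re

GLOTTAL_STOP = "\u0294"  # IPA glottal stop

# A period is treated as a glottal stop only when a word character follows it,
# i.e. when it is not word-final.
NON_FINAL_PERIOD = re.compile(r"\.(?=\w)")


def map_periods(text):
    """Replace non-word-final periods with the glottal stop."""
    return NON_FINAL_PERIOD.sub(GLOTTAL_STOP, text)
```

For example, map_periods("a.a ta.") leaves the final period alone and converts only the one between the two vowels. If glottal stops can in fact be word-final, this heuristic would miss them, hence the question.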
Fixed by PR #82