Comments (7)

joanise commented on September 15, 2024

Reading up on the Tlingit orthography on Wikipedia (https://en.wikipedia.org/wiki/Tlingit_alphabet), it appears that ʼ (\u02BC), the character used to mark ejective stops (https://en.wikipedia.org/wiki/Dental_and_alveolar_ejective_stops), would be more appropriate.
Panphon does not know it, though, so I'm guessing that in tli-ipa -> eng-ipa we should manually map \u02BC to the full glottal stop, so that we get HH in eng-arpabet.

from g2p.

roedoejet commented on September 15, 2024

Hi @joanise, thanks for this. Yes, you're right, it marks ejectives. Frustratingly, I believe periods mark glottal stops. Ejectives should get mapped to their plain unvoiced stop equivalents, so k\u02BC should go to k. Panphon does recognize the character, but will just remove it in the generate-mapping step:

>>> import panphon.distance
>>> dst = panphon.distance.Distance()
>>> dst.weighted_feature_edit_distance("kʼ", "k")
0.125

I suppose the thing to do would be either to replace all ' with \u02BC, or to add individual rules for each ejective (if we thought ' might be used legitimately in punctuation). I think the individual rules option might be the best.
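
To make the individual-rules option concrete, here is a minimal sketch in plain Python. It does not use the g2p API; `EJECTIVE_RULES` and `apply_rules` are hypothetical names, and the two ejectives listed are an illustrative subset, not the full Tlingit inventory. Because each rule names a specific consonant plus \u02BC, an ASCII ' used as punctuation is left alone.

```python
# Hypothetical rule table: each ejective maps to its plain unvoiced stop.
# Illustrative subset only, not the full Tlingit ejective inventory.
EJECTIVE_RULES = [
    {"in": "t\u02bc", "out": "t"},
    {"in": "k\u02bc", "out": "k"},
]


def apply_rules(text, rules):
    """Apply each rule as a literal replacement, longest input first."""
    for rule in sorted(rules, key=lambda r: len(r["in"]), reverse=True):
        text = text.replace(rule["in"], rule["out"])
    return text


# Made-up strings for illustration, not real Tlingit words.
print(apply_rules("k\u02bcat\u02bc", EJECTIVE_RULES))
print(apply_rules("it's", EJECTIVE_RULES))
```

The second call shows that a plain ASCII apostrophe passes through untouched, which is the case the individual rules are meant to protect.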

joanise commented on September 15, 2024

Ah, I see! Thanks for the explanations. We need to have ' somewhere in the g2p, because otherwise the tokenizer will not consider it a word character. g2p generate-mapping does not like having \u02BC in a rule on its own, though. I think a reasonable solution is to have a rule for each ejective plosive, as you suggest, in tli-ipa -> eng-ipa, and I would do it manually rather than through generate-mapping, for simplicity's sake.

As for the tokenizer, maybe the simplest solution is to have the no-op rule ',' in tli_equiv.csv mapping ' to itself. Then the tokenizer would include the ASCII ' in the alphabet it's looking for.

joanise commented on September 15, 2024

I take it back, g2p generate-mapping was quite happy to generate this rule:

    {
        "in": "\u02bc",
        "out": "",
        "context_before": "",
        "context_after": ""
    }

and maybe that's just the simple solution we should implement, and add the rule(s) mapping ' to \u02BC as appropriate in tli_to_ipa.csv.
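
Roughly, the two steps would chain like this. This is only a sketch in plain Python, not g2p code: `orthography_to_ipa` and `drop_ejective_mark` are hypothetical stand-ins for the apostrophe rule(s) in tli_to_ipa.csv and for the generated rule shown above, and it assumes every ASCII ' in the input is an ejective mark.

```python
ASCII_APOSTROPHE = "'"
MODIFIER_APOSTROPHE = "\u02bc"


def orthography_to_ipa(text):
    """Stand-in for the tli_to_ipa.csv rule(s): ASCII ' becomes \u02BC."""
    return text.replace(ASCII_APOSTROPHE, MODIFIER_APOSTROPHE)


def drop_ejective_mark(text):
    """Stand-in for the generated rule: \u02BC maps to the empty string."""
    return text.replace(MODIFIER_APOSTROPHE, "")


# Made-up string for illustration, not a real Tlingit word.
ipa = orthography_to_ipa("k'a")
print(ipa)
print(drop_ejective_mark(ipa))
```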

joanise commented on September 15, 2024

About the dot mapping to the glottal stop, how does the language mark its end-of-sentence punctuation? Is there ambiguity?

joanise commented on September 15, 2024

Next question: can the glottal stop occur at the end of a word? I could disambiguate by mapping the . only when it is not word-final.
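
As a sketch of that disambiguation, assuming the glottal stop never occurs word-finally (which is exactly the open question here), a lookahead can restrict the mapping to dots that are followed by a word character. `map_dots` and `NON_FINAL_DOT` are hypothetical names, and this is plain Python rather than a g2p rule.

```python
import re

GLOTTAL_STOP = "\u0294"

# A "." counts as a glottal stop only when a word character follows it.
# A "." at the end of a word (before a space or end of string) is left as is.
NON_FINAL_DOT = re.compile(r"\.(?=\w)")


def map_dots(text):
    """Replace non-word-final dots with the IPA glottal stop."""
    return NON_FINAL_DOT.sub(GLOTTAL_STOP, text)


# Made-up strings for illustration, not real Tlingit words.
print(map_dots("a.b c."))
```

Note that this would also convert the dot in something like "3.5", since digits count as word characters, so numerals would need separate handling.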

joanise commented on September 15, 2024

Fixed by PR #82
