
Comments (5)

LinguList commented on July 27, 2024

Yes, but a two-vowel analysis would require them to do so in their orthography profile; that is where it has to happen. CLTS is just saying: if you have already placed them into one sound slot, please do not do so. Otherwise it would be a diphthong, like from_i to_i, which is also not good, right?

from clts.

cormacanderson commented on July 27, 2024

I agree that the orthography profile is the best place for these things to be determined. A problem with having this as an alias is diagnosis: when I was checking through the list of characters in PHOIBLE, or this time around in IE-CoR, I didn't notice it: it's very difficult when you are eyeballing an orthography profile of over 1000 lines and it isn't flagged. It was only later, looking through the inventories in https://digling.org/phonobank/, that I noticed it (this shows, btw, how useful the browser is as a tool).

I think it should be a general principle that when a character (or combination) is typically used with more than one meaning, we should not normalise it or have it as an alias. We should force people to specify a value in the orthography profile. This will stop errors like the ones we find from occurring. Here, e.g., ee can be either eː or e.e and we don't know which one it is, so we shouldn't automatically parse it as one rather than the other. We had this recently also with ł, which is similarly ambiguous, as sometimes people use it for ɫ (i.e. ), sometimes for ɬ.
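A minimal sketch of how such ambiguous graphemes could be flagged when checking a large orthography profile, rather than being eyeballed. This is not part of CLTS: the names `VOWELS`, `AMBIGUOUS`, `is_ambiguous` and `flag_ambiguous` are hypothetical, the vowel set is deliberately incomplete, and the rule (two identical vowel letters, or a hand-maintained list such as ł) is only an assumption about what "ambiguous" should mean here.

```python
import unicodedata

# Assumption: a small, incomplete set of vowel letters, for illustration only.
VOWELS = set("aeiouyæøœɛɔəɪʊɑɒʌɨʉɯ")

# Assumption: a hand-maintained list of graphemes known to be used for
# more than one sound. "\u0142" is ł (ɫ for some authors, ɬ for others).
AMBIGUOUS = {"\u0142"}


def is_ambiguous(grapheme):
    """Return True if the grapheme should be resolved by hand in the profile."""
    g = unicodedata.normalize("NFC", grapheme)
    if g in AMBIGUOUS:
        return True
    # Two identical vowel letters, e.g. "ee": a long vowel or two vowels?
    return len(g) == 2 and g[0] == g[1] and g[0] in VOWELS


def flag_ambiguous(graphemes):
    """Return the graphemes that need an explicit value, in input order."""
    return [g for g in graphemes if is_ambiguous(g)]
```

Run over the grapheme column of a profile, this would report ee and ł but leave uu̯ alone, since the non-syllabic diacritic already disambiguates it.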

In one of the examples I give above, the original was e.g. uu̯, which is indeed a kind of diphthong, although I agree it's weird because it's homorganic. I think I would be in favour of allowing things like this though: especially for the purposes of an aligned dataset, I think it makes a lot of sense. It's a bit weird, but in the Arabic case I gave in the example above, we can be pretty sure that this is what the original author intended.

It is certainly better than the u̯ː which we get and which I actually think should not be allowed at all. This is semantically pretty much identical to , which feels to me to be quite a long way from uu̯. Is there some way we can stop "long" and the "non-syllabic" diacritic occurring together?


LinguList commented on July 27, 2024


cormacanderson commented on July 27, 2024

I think this is a different case than the voiced aspirates, because there is a genuine ambiguity here that makes the semantics of VV unclear, rather than just a convention that uses a non-principled character (e.g. ) to stand for something slightly different.

My preference would be to remove the aliases and I will go ahead and remove the double vowel aliases now from vowels.tsv.

As for allowing combinations of two identical vowels to occur as diphthongs, I think probably that this is the principled approach, particularly if one of them is clearly marked as non-syllabic, e.g. uu̯. I am very happy to help out with any checks needed in coding if you let me know how I can contribute there.

As a corollary to what I propose above, I will happily set up a condition that explicitly blocks a long non-syllabic vowel, e.g. u̯ː. I think w is more principled for something like this.
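A rough sketch of what such a condition might look like, assuming the check is run on a single sound string. It does not use the CLTS API: `is_long_non_syllabic`, `NON_SYLLABIC` and `LONG` are hypothetical names, and the rule is simply "the same base letter carries both the non-syllabic diacritic and the length mark".

```python
import unicodedata

NON_SYLLABIC = "\u032f"  # combining inverted breve below, as in u̯
LONG = "\u02d0"          # modifier letter triangular colon, ː


def is_long_non_syllabic(sound):
    """Return True if one base letter carries both ̯ and ː (e.g. u̯ː)."""
    marks = set()
    for ch in unicodedata.normalize("NFD", sound):
        cat = unicodedata.category(ch)
        if cat.startswith("L") and cat != "Lm":
            # A new base letter starts a new cluster of marks.
            marks = set()
        else:
            # Combining marks (Mn) and modifier letters (Lm, such as ː)
            # attach to the preceding base letter.
            marks.add(ch)
        if NON_SYLLABIC in marks and LONG in marks:
            return True
    return False
```

Tracking marks per base letter means a sequence such as aːu̯ is not blocked, since the length mark and the non-syllabic diacritic sit on different vowels.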


LinguList commented on July 27, 2024

