Comments (5)
Yes, but a two-vowel analysis would require them to do so in their orthography profile. This is where it has to happen. CLTS is just saying: if you already place them into one sound slot, please do not do so. I mean, otherwise it would be a diphthong, like from_i to_i, which is also not good, right?
from clts.
I agree that the orthography profile is the best place for these things to be determined. A problem with having this as an alias is diagnosis: when I was checking through the list of characters in PHOIBLE, or this time around in IE-CoR, I didn't notice it: it's very difficult when you are eyeballing an orthography profile of over 1000 lines and it isn't flagged. It was only later, looking through the inventories in https://digling.org/phonobank/, that I noticed it (this shows, btw, how useful the browser is as a tool).
I think that it should be a general principle that when a character (or character combination) is typically used for more than one meaning, we should not normalise it or have it as an alias. We should force people to specify a value in the orthography profile. This will stop errors like the ones we find from occurring. Here, e.g., ee can be either eː or e.e and we don't know which one it is, so we shouldn't automatically parse it as one rather than the other. We had this recently also with ł, which is similarly ambiguous, as sometimes people use it for ɫ (i.e. lˠ), sometimes for ɬ.
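To make the diagnosis problem concrete, here is a minimal sketch of how such values could be flagged automatically rather than by eyeballing a long profile. It assumes a tab-separated orthography profile with Grapheme and IPA columns; the AMBIGUOUS table and the flag_ambiguous helper are hypothetical names built from the examples in this thread and are not part of CLTS.

```python
import csv
import io

# Hypothetical table of values with more than one common reading,
# taken from the examples in this thread; not part of CLTS itself.
AMBIGUOUS = {
    "ee": ("eː", "e.e"),
    "ł": ("ɫ", "ɬ"),
}


def flag_ambiguous(profile_text, column="IPA"):
    """Return (line_number, value, readings) for rows whose target value is ambiguous."""
    reader = csv.DictReader(io.StringIO(profile_text), delimiter="\t")
    hits = []
    for line_number, row in enumerate(reader, start=2):  # the header is line 1
        value = (row.get(column) or "").strip()
        if value in AMBIGUOUS:
            hits.append((line_number, value, AMBIGUOUS[value]))
    return hits


# Toy profile: the first row passes "ee" through unchanged, the second resolves "ł".
profile = "Grapheme\tIPA\nee\tee\nł\tɬ\na\ta\n"
for hit in flag_ambiguous(profile):
    print(hit)
```

Only the unresolved ee row is reported here; the ł row is left alone because the profile author has already chosen a reading for it.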
In one of the examples I give above, the original was e.g. uu̯, which is indeed a kind of diphthong, although I agree it's weird because it's homorganic. I think I would be in favour of allowing things like this though: especially for the purposes of an aligned dataset, I think it makes a lot of sense. It's a bit weird, but in the Arabic case I gave in the example above, we can be pretty sure that this is what the original author intended.
It is certainly better than the u̯ː which we get and which I actually think should not be allowed at all. This is semantically pretty much identical to wː, which feels to me to be quite a long way from uu̯. Is there some way we can stop "long" and the "non-syllabic" diacritic occurring together?
I think this is a different case than the voiced aspirates, because there is a genuine ambiguity here that makes the semantics of VV unclear, rather than just a convention that uses a non-principled character (e.g. bʰ) to stand for something slightly different.
My preference would be to remove the aliases and I will go ahead and remove the double vowel aliases now from vowels.tsv.
As for allowing combinations of two identical vowels to occur as diphthongs, I think probably that this is the principled approach, particularly if one of them is clearly marked as non-syllabic, e.g. uu̯. I am very happy to help out with any checks needed in coding if you let me know how I can contribute there.
As a corollary to what I propose above, I will happily set up a condition that explicitly blocks a long non-syllabic vowel, e.g. u̯ː. I think w is more principled for something like this.
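A rough sketch of what such a blocking condition might look like, outside of CLTS itself: it works on already-segmented sound strings and simply tests whether the length mark and the non-syllabic diacritic co-occur. The function names are hypothetical, and how the check would be wired into the CLTS parser is left open.

```python
import unicodedata

LONG = "\u02d0"          # MODIFIER LETTER TRIANGULAR COLON (length mark)
NON_SYLLABIC = "\u032f"  # COMBINING INVERTED BREVE BELOW (non-syllabic diacritic)


def is_long_non_syllabic(sound):
    """True if a single sound carries both the length mark and the non-syllabic diacritic."""
    decomposed = unicodedata.normalize("NFD", sound)
    return LONG in decomposed and NON_SYLLABIC in decomposed


def blocked_sounds(sounds):
    """Return the sounds that the proposed condition would reject."""
    return [s for s in sounds if is_long_non_syllabic(s)]


# The three cases discussed above: long non-syllabic u, the uu diphthong, long w.
candidates = ["u" + NON_SYLLABIC + LONG, "uu" + NON_SYLLABIC, "w" + LONG]
print(blocked_sounds(candidates))
```

Only the first candidate is rejected; the diphthong with a non-syllabic second element and the long glide both pass.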