Comments (10)

dmort27 commented on July 17, 2024

These clearly are bugs—bugs which should have been caught by my tests. Let me look into this.

from epitran.

dmort27 commented on July 17, 2024

These are both due, to some degree, to the same problem (a rule introduced by a PR I should have vetted more carefully). I think I have it fixed, but I have to do more testing.

dmort27 commented on July 17, 2024

I have uploaded a new version of Epitran to PyPI. I have fixed the bugs you mentioned. However, I believe that there are other bugs in the German modules (dealing, for example, with vowel length). If you are willing to check this out, I will try to fix them.

trenslow commented on July 17, 2024

I just checked through a decent number of examples and it looks like the /s/ is fixed for the environments I mentioned above.

I went through some examples with vowel length, and it's OK when there's an h in the orthography. But there is a bug when it comes to the letter ß. Here are a couple of examples (epi2 was instantiated with 'deu-Latn-nar'):

In [21]: epi2.transliterate('Busse')
Out[21]: 'busə'
In [22]: epi2.transliterate('Buße')
Out[22]: 'busə'

In [13]: epi2.transliterate('Massen')
Out[13]: 'masən'
In [14]: epi2.transliterate('Maßen')
Out[14]: 'masən'

Here in both pairs, the second should have a long vowel.

Here are a couple of examples in the other direction:

In [25]: epi2.transliterate('so')
Out[25]: 'zoː'
In [26]: epi2.transliterate('also')
Out[26]: 'alsoː'
In [30]: epi2.transliterate('nanu')
Out[30]: 'naːnuː'

In the first two, I'd expect the final vowel to be short and in the last one I would expect the first vowel to be short. I'd also expect the /s/ to be transcribed as [z] in the second example.

Thanks for all your help so far. Let me know if you need some more examples and I can go digging!

trenslow commented on July 17, 2024

A couple more examples that might be helpful:

In [33]: epi2.transliterate('kreativ')
Out[33]: 'kʁeaːtif'
In [34]: epi2.transliterate('sozial')
Out[34]: 'zoːt͡sial'
In [37]: epi2.transliterate('Lokomotive')
Out[37]: 'loːkomoːtifə'

In these examples, the last vowel is the one that should be long, and the others short.

In [35]: epi2.transliterate('platzen')
Out[35]: 'plaːt͡sən'
In [36]: epi2.transliterate('knöpfen')
Out[36]: 'knøːp͡fən'
In [38]: epi2.transliterate('Knochen')
Out[38]: 'knoːxən'
In [40]: epi2.transliterate('stricken')
Out[40]: 'ʃtʁiːkən'

In these examples, all the initial vowels should be short, as they are followed by a consonant cluster. This follows the same rule that vowels before double written consonants are always short, but German doesn't double /z/, /k/, /ch/, /pf/ in orthography, instead opting for /tz/, /ck/, /ch/ and /pf/.
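
The orthographic rule described above can be sketched as a small check. This is only an illustration of the rule as stated in this comment, not code from Epitran: `SHORTENING_CONTEXT` and `short_vowel_positions` are hypothetical names, and the rule itself has exceptions in real German (for example, some vowels before "ch" are long).

```python
import re
import unicodedata
from typing import List

# A vowel letter followed by a doubled consonant letter, or by one of the
# spellings tz, ck, ch, pf that stand in for a doubled consonant.
# (Assumption: this only encodes the rule as stated in the comment above.)
SHORTENING_CONTEXT = re.compile(
    r"[aeiouäöü](?=([bcdfghjklmnpqrstvwxz])\1|tz|ck|ch|pf)",
    re.IGNORECASE,
)


def short_vowel_positions(word: str) -> List[int]:
    """Return indices of vowel letters that the rule predicts to be short."""
    # Normalize so that umlauts are single code points and indices are stable.
    word = unicodedata.normalize("NFC", word)
    return [m.start() for m in SHORTENING_CONTEXT.finditer(word)]


for w in ["platzen", "knöpfen", "Knochen", "stricken", "Busse", "Buße"]:
    print(w, short_vowel_positions(w))
```

Under this sketch the vowel in "Busse" is flagged as short while the one in "Buße" is not, which matches the contrast reported earlier in the thread.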

dmort27 commented on July 17, 2024

Thank you. This is helpful. Is vowel length in German better stated as lengthening or shortening?

dmort27 commented on July 17, 2024

The most helpful way of sharing this information would be in terms of tests: <input, correct_output> pairs. From what I gather, the right pairs for the examples you provided would be as follows:

Busse → busə
Buße → buːsə
Massen → masən
Maßen → maːsən
so → zo # I'm confused about so and also; the vowel is in an open syllable, so shouldn't it be long?
also → alzo
nanu → nanuː
kreativ → kʁeatiːf # These, too, are a little confusing. Can you give me a rule?
sozial → zot͡siaːl
Lokomotive → lokomotiːfə
platzen → plat͡sən # The following all make sense to me
knöpfen → knøp͡fən
Knochen → knoxən
stricken → ʃtʁikən
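
The pairs above could be kept as a small regression table. The following is a minimal sketch, not Epitran's own test suite: `EXPECTED` and `find_mismatches` are hypothetical names, and "so" is left out because its expected vowel length is still in question here. The checker accepts any callable, so where Epitran is installed the `transliterate` method of an `epitran.Epitran('deu-Latn-nar')` instance can be passed in.

```python
from typing import Callable, Dict, List, Tuple

# <input, correct_output> pairs copied from the list above
# ('so' omitted: its expected output is still under discussion).
EXPECTED: Dict[str, str] = {
    "Busse": "busə",
    "Buße": "buːsə",
    "Massen": "masən",
    "Maßen": "maːsən",
    "also": "alzo",
    "nanu": "nanuː",
    "kreativ": "kʁeatiːf",
    "sozial": "zot͡siaːl",
    "Lokomotive": "lokomotiːfə",
    "platzen": "plat͡sən",
    "knöpfen": "knøp͡fən",
    "Knochen": "knoxən",
    "stricken": "ʃtʁikən",
}


def find_mismatches(
    transliterate: Callable[[str], str],
    expected: Dict[str, str] = EXPECTED,
) -> List[Tuple[str, str, str]]:
    """Return (word, expected, actual) for every pair that does not match."""
    mismatches = []
    for word, want in expected.items():
        got = transliterate(word)
        if got != want:
            mismatches.append((word, want, got))
    return mismatches


# Example with a stand-in transliterator that always returns 'busə';
# with Epitran installed, pass epi.transliterate instead.
for word, want, got in find_mismatches(lambda w: "busə"):
    print(f"{word}: expected {want}, got {got}")
```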

Any more examples you could provide would be good, as well as rules that describe why vowels are long or short in a particular context. Thanks!

trenslow commented on July 17, 2024

> Thank you. This is helpful. Is vowel length in German better stated as lengthening or shortening?

In the literature they talk more often about a tense/lax distinction, which is conflated with the long/short distinction, as short, tense vowels occur so infrequently. You'd then have tense (long) vowels as the default, with a 'laxing' (shortening) process triggered by the different orthographic contexts.

> so → zo # I'm confused about so and also; the vowel is in an open syllable, so shouldn't it be long?

You're right that the vowel in so should be long. My mistake there. But the /o/ should definitely be short in also. This makes the rule generalization a little harder. The more I think about it, the more exceptions to the rules there seem to be. My gut feeling tells me that since the stress falls on the /a/, the /o/ is not 'allowed' to be long.

This same rule could then apply to kreativ, sozial and Lokomotive, since the stress falls on the last syllable. This seems to align with what this document is saying.
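
The stress hypothesis above can be illustrated with a toy function. This is a sketch under strong assumptions, not Epitran behavior: `shorten_unstressed` is a hypothetical name, the syllable boundaries and the stressed-syllable index are supplied by hand, and the input is assumed to mark every tense vowel as long by default.

```python
from typing import List

LENGTH_MARK = "ː"


def shorten_unstressed(syllables: List[str], stressed: int) -> str:
    """Keep the length mark only in the stressed syllable.

    syllables: hand-syllabified transcription with tense vowels marked long.
    stressed:  index of the stressed syllable (an assumption of the caller).
    """
    return "".join(
        syl if i == stressed else syl.replace(LENGTH_MARK, "")
        for i, syl in enumerate(syllables)
    )


# Hand-made inputs; the syllabification and stress are assumptions.
print(shorten_unstressed(["al", "zoː"], 0))                        # also
print(shorten_unstressed(["naː", "nuː"], 1))                       # nanu
print(shorten_unstressed(["kʁeː", "aː", "tiːf"], 2))               # kreativ
print(shorten_unstressed(["loː", "koː", "moː", "tiː", "fə"], 3))   # Lokomotive
```

With these hand-supplied inputs the results agree with the expected pairs listed earlier for also, nanu, kreativ and Lokomotive, but the function says nothing about where stress falls or how words are syllabified.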

Now that I think about it, a lot of exceptions to the rules could be explained by the frequency of the word's occurrence in daily speech, but I guess that transliteration logic is out of the scope of Epitran.

As I continue with my research and come across more interesting cases, I'll report back asap.

dmort27 commented on July 17, 2024

Sorry to have dropped this. It seems as if the situation in German is not unlike that of English (there are tense and lax vowels; the tense vowels are long and the lax vowels are short), except that the correlation is imperfect in German. Is this correct? I was working with sources that described the German distinction in terms of length rather than vowel quality, but I'd be willing to change how this works in Epitran if you can point me to the literature I should follow.

trenslow commented on July 17, 2024

Hi @dmort27, sorry for the late response. I got caught up with other topics and am only now making the rounds back to German transliteration.

According to sections 1.3 and 1.4 of the document in the last comment I made, the tense/lax distinction is often conflated with the long/short distinction. This differs from English, where you can have a long, lax vowel and a short, tense vowel (if my memory isn't failing me). According to the document, German can have short, tense vowels but no long, lax vowels (section 1.4). What's interesting to me is that all the examples of short, tense vowels they give are words of foreign origin, but I can't seem to find a rule anywhere saying this is the case across the board.

I also stumbled across an interesting post here which would explain the cases where the short-vowel-before-double-consonant rule doesn't apply. You can see it in the second comment in the link. Even though I'm not sure how he came to the conclusion that the 'd' in Mond is a suffix, the idea that if a syllable's coda is long, then the nucleus is short and vice-versa is a better rule than simply relying on the orthography.
