beidasinitic's People

Contributors

chrzyki, fredericblum, lingulist, macyl, simongreenhill, wu-urbanek, xrotwang

beidasinitic's Issues

Orthography profile: is "i/j i/j" always an error?

Today, while working on the sound comparison study in lexitools, we encountered an error with the form "i/j i/j u ⁴²" (Meixian), which caused alignment problems. To solve it, you updated the orthography profile by adding the rule "^jiu" → "j iu".

While experimenting with the threshold, I encountered a similar case that causes the same error: "i/j i/j ɛ ŋ ²¹". I suspect it is another instance of the same problem. It comes from the following entry:

Yangjiang-771_itchy-1,,Yangjiang,771_itchy,jiɛŋ²¹,jiɛŋ²¹,i/j i/j ɛ ŋ ²¹,,Cihui,,,,,癢

This was originally row 29518 of words.tsv. The string jiɛ occurs 9 times in that file, two of them word-initially. "i/j i/j" occurs only twice in cldf/forms.csv, so it looks like only word-initial jiɛ causes problems. I would guess that the solution is to add a rule "^jiɛ" → "j iɛ" to the profile, but I could be wrong.

What do you think?
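To make the proposed fix concrete, here is a minimal sketch of how a greedy longest-match orthography profile could produce "i/j i/j ɛ ŋ ²¹" and how a word-initial rule changes the result. This is a simplified stand-in written for illustration, not the tooling the dataset actually uses, and the profile entries below are assumptions rather than copies of the real profile.

```python
def segment(form, profile):
    """Greedy longest-match segmentation.

    Keys starting with '^' can only match at the start of the word,
    because '^' is prepended to the form before matching.
    """
    text = "^" + form
    keys = sorted(profile, key=len, reverse=True)  # try longest keys first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(profile[key])
                i += len(key)
                break
        else:
            out.append("<?>")  # grapheme not covered by the profile
            i += 1
    return " ".join(s for s in out if s)


# Hypothetical profile entries (illustration only).
base = {"^": "", "j": "i/j", "i": "i/j", "ɛ": "ɛ", "ŋ": "ŋ", "²¹": "²¹"}
fixed = dict(base)
fixed["^jiɛ"] = "j iɛ"  # the rule proposed above

print(segment("jiɛŋ²¹", base))   # i/j i/j ɛ ŋ ²¹
print(segment("jiɛŋ²¹", fixed))  # j iɛ ŋ ²¹
```

Because the new key starts with "^", it only applies word-initially, so the seven non-initial occurrences of jiɛ would be segmented as before.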

[SEALLABLE] errors

Length Errors

Number | Segments | Structure | Examples
1 | s ŋ̍ ⁵¹/¹¹ | i t | 1
2 | m ŋ̍ ³³ | i t | 1
3 | h ŋ ³³ | i t | 1
4 | ʈʂ n ²¹³ | i t | 1
5 | tsʰ ŋ̍ ²⁴ | i t | 1
6 | k ŋ̍ ⁵⁵ | i t | 3
7 | tsʰ ŋ̍ ⁵⁵ | i t | 1
8 | m ŋ̍ ³³/²⁴ | i t | 2
9 | kʰ ŋ̍ ⁵⁵ | i t | 1
10 | p ŋ̍ ³³ | i t | 2
11 | ts ŋ̍ ⁵¹/¹¹ | i t | 1
12 | n ŋ̍ ³³ | i t | 1
13 | h ŋ̍ ⁵⁵ | i t | 2
14 | m ŋ̍ ²⁴ | i t | 2
15 | ts ŋ̍ ⁵⁵ | i t | 1
16 | ts ŋ̍ ⁵⁵/⁵¹ | i t | 2
17 | ts ŋ̍ ³³/⁵⁵ | i t | 1
18 | s ŋ̍ ⁵⁵ | i t | 4
19 | s u/w æ̃ | i ? | 1
20 | kʰ ŋ̍ ¹¹ | i t | 3
21 | tʰ ŋ̍ ¹¹ | i t | 2
22 | s ŋ̍ ³³/⁵⁵ | i t | 2
23 | n n̩ ⁴² | i t | 1
24 | t ŋ̍ ²⁴ | i t | 3
25 | s n ²¹/⁴⁴ | i t | 1
26 | m m̩ ³⁴ | i t | 1
27 | t ŋ̍ ⁵¹/¹¹ | i t | 1
28 | ŋ ŋ̍ ⁵¹ | i t | 1
29 | ŋ ŋ̍ ²¹/⁵³ | i t | 1
30 | ŋ ŋ̍ ⁵² | i t | 1
31 | n ŋ̍ ⁵¹ | i t | 1
32 | tʰ ŋ̍ ⁵⁵ | i t | 1
33 | tʰ ŋ̍ ³³/⁵⁵ | i t | 1
34 | k ŋ̍ ¹¹ | i t | 1
35 | tʰ ŋ̍ ²⁴ | i t | 1
36 | h ŋ̍ ³³ | i t | 1
37 | k j ã | i ? | 1
38 | ŋ ŋ̍ ⁰ | i t | 1
39 | s ŋ̍ ⁵¹ | i t | 1
40 | k ŋ̍ ⁵¹ | i t | 1
41 | n n̩ ²²/³¹ | i t | 1
42 | n n̩ ²¹³ | i t | 3

Missing Values

Number | Segments | Structure | Examples
1 | ŋ̍ ²⁴ | ? t | 11
2 | i/j | ? | 15
3 | m̩ ³³/²⁴ | ? t | 2
4 | ŋ̍ ³¹ | ? t | 13
5 | m̩ ²²/²⁴ | ? t | 2
6 | m̩ ⁵¹ | ? t | 2
7 | m̩ ⁵³ | ? t | 2
8 | ŋ̍ ⁰ | ? t | 54
9 | m̩ ¹² | ? t | 6
10 | m̩ ²¹ | ? t | 9
11 | m̩ ¹¹/³³ | ? t | 4
12 | m̩ ¹²/¹¹ | ? t | 3
13 | ŋ̍ ⁵²/⁴⁴ | ? t | 2
14 | s n ɔ ŋ ⁵²/⁴⁴ | i ? n c t | 1
15 | tʰ h õ ⁴¹ | i ? ? t | 1
16 | ŋ̍ ²¹/³¹ | ? t | 7
17 | ŋ̍ ³³/²⁴ | ? t | 2
18 | ŋ̍ ¹² | ? t | 5
19 | ŋ̍ ⁵⁵ | ? t | 1
20 | ŋ̍ ³³/³¹ | ? t | 1
21 | m a | ? ? | 1
22 | ts ə | ? ? | 2
23 | x n ə ⁵⁵ | i ? ? t | 1
24 | ts ɿ | ? ? | 2
25 | m̩ ⁴⁴ | ? t | 1
26 | m̩ ⁴² | ? t | 1
27 | m̩ ³³ | ? t | 2
28 | m̩ ³⁵ | ? t | 1
29 | ŋ̍ ⁴⁴ | ? t | 3
30 | z ɔ | ? ? | 1
31 | ts ɛ | ? ? | 1
32 | β ei ŋ | i n ? | 1
33 | tʃʰ ɐu | ? ? | 1
34 | p a | ? ? | 1
35 | ŋ̍ ²²/³¹ | ? t | 1
36 | ŋ̍ ²³ | ? t | 2
37 | ȵ i | ? ? | 1
38 | x y/ɥ ɔ ŋ | i m n ? | 1
39 | m a n | i n ? | 1
40 | ŋ̍ ¹¹ | ? t | 1
41 | x ẽ | ? ? | 1
42 | ŋ̍ ²¹³/⁵⁵ | ? t | 1
43 | n̩ ⁴¹ | ? t | 1
44 | tʃ ɐi | ? ? | 1

Syllable Errors

Number | Segments | Structure | Examples
1 | u | n | 1
2 | ɛ | n | 1

Value/form issues

I noticed a couple of small issues for the value/form output in CLDF:

  • there are a number of " characters in values and forms (also in the source material)
  • there are a number of split values separated by ， (i.e. the full-width comma, Unicode U+FF0C, not the ASCII comma, U+002C) where value == form in the CLDF output (e.g. uə²¹³,ə²¹³, tʰiəp⁴,tʰaʔ⁴, etc.)
  • Unicode U+2777 (❷) appears in the source material and thus also in the CLDF values/forms
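A check along the following lines could flag these cases before the CLDF conversion. It is only a sketch: the function names are made up for illustration, and it assumes the raw values are available as plain Python strings.

```python
# Characters reported above as problematic in values/forms.
SUSPECT = {'"', "\uFF0C", "\u2777"}  # quotation mark, full-width comma, ❷


def find_suspect_chars(value):
    """Return the code points (as 'U+XXXX') of suspect characters in value."""
    return [f"U+{ord(ch):04X}" for ch in value if ch in SUSPECT]


def split_variants(value):
    """Split a value on both the full-width and the ASCII comma."""
    normalized = value.replace("\uFF0C", ",")
    return [part.strip() for part in normalized.split(",") if part.strip()]


print(find_suspect_chars("uə²¹³\uFF0Cə²¹³"))  # ['U+FF0C']
print(split_variants("uə²¹³\uFF0Cə²¹³"))      # ['uə²¹³', 'ə²¹³']
```

Splitting on the full-width comma as well as the ASCII one would give each variant its own form instead of leaving value == form.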

title in metadata.json

I updated the title but have not yet re-run the script.
Should the contributors also be updated to the latest format?

Diphthongs not accepted by CLTS

Many diphthongs found in this dataset are not accepted by CLTS:

Dataset | Sound | Token | ID
beidasinitic | ae | 'u/w ae ²¹²' | Hefei-719_bend-1
beidasinitic | ai | 's a ⁴⁴ + z ai ¹² + ȵ i/j ai ¹² + ɦ a ³¹ + ɕ y ⁴⁴';'ŋ ai ¹²' | Wenzhou-48_chinesenewyearseve-1;Meixian-806_ifirstpersonsingular-1
beidasinitic | au | 's a n ³³ + s ɿ ²⁴ + u/w a n ⁴¹ + s au ²¹';'l i ³³ + l i/j au ⁵⁵/⁵¹ + m ĩ ²⁴' | Changsha-48_chinesenewyearseve-1;Xiamen-48_chinesenewyearseve-1
beidasinitic |  | 'm a ŋ ³¹/⁵² + m aɔ ⁴⁴' | Fuzhou-43_evening-1
beidasinitic |  | 't aə n ³³ + u ⁴¹';'tsʰ ɿ ²¹² + t aə ⁰' | Changsha-40_noon-1;Hefei-330_stairsladder-1
beidasinitic | ei | 's a ŋ ⁴⁴ + n ei ʔ ⁴ + m a ŋ ⁴⁴/⁵² + m u/w ɔ ⁴⁴';'ʂ ɔ ²¹³ + u/w ei ⁵⁵' | Fuzhou-48_chinesenewyearseve-1;Jinan-882_alittle-1
beidasinitic | eu | 'n eu ⁴⁴';'t eu ⁵²' | Fuzhou-511_lickwithtongue-2;Fuzhou-607_mix-1
beidasinitic | eɿ | 'tʰ i ⁴² + t eɿ ⁰';'l u/w ə ⁴² + t eɿ ⁰' | Jinan-126_hoof-1;Jinan-90_mule-1
beidasinitic | iu | 'j iu ⁴²';'j iu ⁴²' | Meixian-887_again-1;Meixian-699_fine-1
beidasinitic |  | 't ɑ ⁵⁵ + l iĩ/ĩ ³⁴ + iɪ ⁵⁵';'ȵ iɪ ²²/²⁴ + s ᴇ/ɛ ⁴⁴ + s ɤ ʔ ²¹' | Yangzhou-48_chinesenewyearseve-2;Suzhou-48_chinesenewyearseve-1
beidasinitic | oi | 'k oi ʔ ⁴/²¹ + h j ũ ⁵⁵';'i/j ə m ²³/³³ + k oi ³³' | Chaozhou-223_jiaaocostume-1;Chaozhou-94_capon-1
beidasinitic | ou | 's ou ŋ ⁴⁴/²¹³ + m u/w a ŋ ⁵²';'x ou ⁵¹ + p a n ⁰ + ʂ */a ŋ ²¹⁴' | Fuzhou-312_abacus-1;Beijing-41_afternoon-1
beidasinitic | ãi | 'h w ãi ³³/²⁴ + t i t ³²/⁵';'pʰ ãi ⁵¹' | Xiamen-890_anyhow-1;Xiamen-751_bad-1
beidasinitic | ãu | 'n j ãu ⁵⁵';'n j ãu ⁵⁵/⁵¹ + ã ⁵¹' | Xiamen-92_cat-1;Xiamen-127_claw-1
beidasinitic | õi | 'm u ŋ ²¹³/⁵⁵ + t õi ³⁵';'m̩ ¹²/¹¹ + õi ⁵⁵' | Chaozhou-253_buckettrap-1;Chaozhou-773_busy-1
beidasinitic | õu | 'ŋ e ²¹/³⁵ + h õu ⁵³' | Chaozhou-892_certainly-1
beidasinitic | øi | 'k ɔ ʔ ³¹/²³ + n øi ŋ ⁵²';'k øi ŋ ²⁴²' | Fuzhou-813_alleverybody-2;Fuzhou-904_and-1
beidasinitic | ũi | 'm ũi ²⁴';'tsʰ ɯ ŋ ²¹³/⁵⁵ + k ũi ¹¹' | Xiamen-56_coal-1;Chaozhou-265_desk-2
beidasinitic | ɐi | 'n a p ⁴⁵⁴ + tʃ ɐi ²¹';'h ɐi ⁴⁵⁴ + n ɐ ŋ ²¹ + i/j + i/j ɛ ŋ ⁴⁵⁴' | Yangjiang-223_jiaaocostume-1;Yangjiang-882_alittle-1
beidasinitic | ɐu | 'h a ²² + tʃ ɐu ³³';'i/j ɐu ²²' | Guangzhou-41_afternoon-1;Guangzhou-887_again-1
beidasinitic | ɑo | 'i/j ɑo ²⁴ + y ²⁴' | XiAn-152_potato-1
beidasinitic | ɑu | 'tɕ i/j a ²¹ + ŋ ɑu ⁵³';'s ɑu ⁵³ + v ei ²⁴' | XiAn-223_jiaaocostume-1;XiAn-882_alittle-1
beidasinitic | ɔi | 'tʃ ɔi ³³';'p ɔi ⁴² + h ɛu ⁴⁴' | Guangzhou-888_again-1;Meixian-468_backbackside-1
beidasinitic | ɔu | 'k ɔu ⁴² + tɕʰ i n ³¹';'l ɔ ²¹ + l ɔu ²¹ + pʰ ɔ ⁴⁴³' | Nanchang-740_clean-1;Yangjiang-637_marryofaman-1
beidasinitic | əi | 's ɔ ³¹ + u/w əi ³⁴';'pʰ əi ³⁴' | Yangzhou-882_alittle-1;Yangzhou-627_accompany-1
beidasinitic | əu | 'd əu ²²/³¹ + ȵ iɪ ³³/²⁴ + i/j ɒ ²¹/⁵¹³';'ŋ əu ³¹' | Suzhou-48_chinesenewyearseve-2;Suzhou-806_ifirstpersonsingular-1
beidasinitic | ɛa | 'k i m ⁴⁴ + ȵ i/j ɛa ¹²' | Meixian-23_thisyear-1
beidasinitic | ɛu | 'h a ⁴⁴ + ts u ⁴² + tʰ ɛu ¹²';'k i/j ɛu ⁴⁵⁴ + n i/j ɛ ŋ ⁴⁴³' | Meixian-41_afternoon-1;Yangjiang-437_auntmotherssister-1
beidasinitic | ɤɯ | 'i/j ɤɯ ⁵⁵';'tɕ i/j ɤɯ ⁴²' | Yangzhou-887_again-1;Yangzhou-213_alcoholicbeverage-1
beidasinitic | ɵy | 'x ɵy ŋ ²¹³';'s i ⁵²/²¹³ + k ɵy ʔ ²³' | Fuzhou-358_alleylane-1;Fuzhou-708_angularhavingtheformofasquare-2
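A listing like the one above could be rebuilt with a short script along these lines. This is a sketch under stated assumptions: the accepted inventory is a hypothetical stand-in for whatever CLTS actually recognizes, "+" is treated as a morpheme boundary, and "source/target" segments are assumed to resolve to the part after the slash.

```python
from collections import defaultdict


def target_sound(segment):
    """For 'source/target' segments, keep the target (assumed convention)."""
    return segment.split("/")[-1]


def collect_unknown(forms, accepted):
    """Map each sound missing from `accepted` to the form IDs containing it."""
    unknown = defaultdict(list)
    for form_id, segments in forms.items():
        for seg in segments.split():
            if seg == "+":  # morpheme boundary, not a sound
                continue
            sound = target_sound(seg)
            if sound not in accepted and form_id not in unknown[sound]:
                unknown[sound].append(form_id)
    return dict(unknown)


# Three forms from the table; the accepted set is hypothetical.
forms = {
    "Hefei-719_bend-1": "u/w ae ²¹²",
    "Fuzhou-511_lickwithtongue-2": "n eu ⁴⁴",
    "Fuzhou-607_mix-1": "t eu ⁵²",
}
accepted = {"w", "n", "t", "²¹²", "⁴⁴", "⁵²"}

for sound, ids in collect_unknown(forms, accepted).items():
    print(sound, ";".join(ids))
```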
