universaldependencies/ud_xibe-xdt's Issues
Encoding of 0092
Some sentences in the .conllu file contain strange Unicode characters, for example:
# sent_id = grammarbook_sjo_p1_102
# sent = ᠰᡠᠸᡝ ᡠᠮᡝ ᡩᡠᠷᡤᡝᠷᡝ , ᠰᡞᠨᡞ ᠶᡝᠶᡝ ᠠᠮᡤᠠᡣᡞᠨᡞ 。
# sent[phon] =
# sent[eng] = Don�t make noise, let your grandpa sleep well
1 ᠰᡠᠸᡝ _ _ _ _ _ _ _ _
2 ᡠᠮᡝ _ _ _ _ _ _ _ _
3 ᡩᡠᠷᡤᡝᠷᡝ _ _ _ _ _ _ _ _
4 , _ _ _ _ _ _ _ _
5 ᠰᡞᠨᡞ _ _ _ _ _ _ _ _
6 ᠶᡝᠶᡝ _ _ _ _ _ _ _ _
7 ᠠᠮᡤᠠᡣᡞᠨᡞ _ _ _ _ _ _ _ _
8 。 _ _ _ _ _ _ _ _
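Between the n and the t of Don�t is the C1 control character U+0092 (PRIVATE USE TWO), which is not a printable character. It most likely comes from a Windows-1252 right single quotation mark (byte 0x92) that was decoded incorrectly. It should probably be replaced by the apostrophe, '.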
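One way to locate such characters across the whole file is to scan for code points in the C1 control range (U+0080 to U+009F). The following is a minimal sketch, not part of the treebank tooling: the function names are hypothetical, and replacing U+0092 with a plain apostrophe simply follows the suggestion above.

```python
def find_c1_controls(text):
    """Return (line_number, column, code_point) for each C1 control character.

    Lines are split on "\n" only, because str.splitlines() would also split
    on U+0085, which is itself a C1 control character.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if 0x80 <= ord(ch) <= 0x9F:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


def replace_u0092(text):
    """Replace U+0092 with an ASCII apostrophe (assumed to be the intended character)."""
    return text.replace("\u0092", "'")


if __name__ == "__main__":
    # Hypothetical one-line sample modelled on the comment line quoted above.
    sample = "# sent[eng] = Don\u0092t make noise, let your grandpa sleep well"
    for line_no, col, code_point in find_c1_controls(sample):
        print(f"line {line_no}, column {col}: {code_point}")
    print(replace_u0092(sample))
```

To run this on the real data, read the file with `open(path, encoding="utf-8")` and review the reported positions before writing any replacement back.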
Part of speech of 'ᠪᡝ'
In the following sentence, the word ᠪᡝ is tagged with the PART tag. I think that, absent any indication to the contrary, it would be better to tag it with ADP, given its syntactic function. The number of tokens tagged with PART should be very limited and explicit.
# sent_id = grammarbook_sjo_p1_1
# sent = ᡧᡠᠨ ᡨᡠᠴᡞᡫᡞ , ᡫᠠᠷᡥᡡᠨ ᡩᠣᠪᠣᠷᡞ ᠪᡝ ᠪᠣᡧᠣᡥᠣ 。
# sent[phon] = šun tucifi , farhvn dobori be bošoho.
# sent[eng] = The sun rises, drives the dark away.
1 ᡧᡠᠨ ᡧᡠᠨ NOUN _ Case=Nom 2 nsubj _ Translit=šun
2 ᡨᡠᠴᡞᡫᡞ ᡨᡠᠴᡞᠮᠪᡞ VERB _ Tense=Pres|VerbForm=Conv 0 root _ Translit=tucifi
3 , , PUNCT _ _ _ 2 punct Translit=,
4 ᡫᠠᠷᡥᡡᠨ ᡫᠠᠷᡥᡡᠨ ADJ _ _ 5 amod _ Translit=farhvn
5 ᡩᠣᠪᠣᠷᡞ ᡩᠣᠪᠣᠷᡞ NOUN _ _ 7 dobj _ Translit=dobori
6 ᠪᡝ ᠪᡝ PART _ Case=Acc 5 case _ Translit=be
7 ᠪᠣᡧᠣᡥᠣ ᠪᠣᡧᠣᠮᠪᡞ VERB _ Tense=Past _ 2 conj Translit=bošoho
8 。 。 PUNCT _ _ _ 7 punct Translit=.
@jonorthwash what do you think?
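To see how many tokens this would affect, one could list every token that is tagged PART but attached with the `case` relation. The sketch below assumes a well-formed, tab-separated CoNLL-U file with ten columns per token line. The function name and the sample rows are hypothetical, and the sample only mimics the shape of the sentence above.

```python
def part_case_tokens(conllu_text):
    """Return (sent_id, token_id, form) for tokens with UPOS=PART and DEPREL=case."""
    found = []
    sent_id = None
    prefix = "# sent_id = "
    for line in conllu_text.split("\n"):
        if line.startswith(prefix):
            sent_id = line[len(prefix):].strip()
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) != 10:
                continue  # skip rows that do not have the ten CoNLL-U columns
            # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
            if cols[3] == "PART" and cols[7] == "case":
                found.append((sent_id, cols[0], cols[1]))
    return found


if __name__ == "__main__":
    # Hypothetical two-token sample using transliterations as forms.
    rows = [
        ["5", "dobori", "dobori", "NOUN", "_", "_", "7", "obj", "_", "_"],
        ["6", "be", "be", "PART", "_", "Case=Acc", "5", "case", "_", "_"],
    ]
    sample = "# sent_id = grammarbook_sjo_p1_1\n"
    sample += "\n".join("\t".join(r) for r in rows) + "\n\n"
    for sent, tok_id, form in part_case_tokens(sample):
        print(sent, tok_id, form)
```

The resulting list gives the candidates for retagging as ADP, which can then be reviewed by hand.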
determine whether Verb+Verb sequences are Conv+Fin or Inf+Aux
A sequence of two verbs in a row could be either a converb followed by a finite verb or an infinitive followed by an auxiliary. Tests need to be established to distinguish the two in Sibe.
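As a starting point for collecting examples to test against, adjacent pairs of tokens tagged VERB could be extracted together with their features. This is only a sketch under the assumption of a well-formed, tab-separated CoNLL-U file. The function name and the sample forms are hypothetical, and the code merely finds candidates; it does not decide between the two analyses.

```python
def verb_verb_pairs(conllu_text):
    """Return ((form1, feats1), (form2, feats2)) for adjacent VERB VERB tokens."""
    pairs = []
    prev = None  # previous word token in the current sentence
    for line in conllu_text.split("\n"):
        if not line.strip():
            prev = None  # a blank line ends the sentence
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed rows, multiword ranges and empty nodes
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        if prev is not None and prev[3] == "VERB" and cols[3] == "VERB":
            pairs.append(((prev[1], prev[5]), (cols[1], cols[5])))
        prev = cols
    return pairs


if __name__ == "__main__":
    # Hypothetical sentence with placeholder forms, not real Sibe data.
    rows = [
        ["1", "verb_a", "verb_a", "VERB", "_", "VerbForm=Conv", "2", "advcl", "_", "_"],
        ["2", "verb_b", "verb_b", "VERB", "_", "Tense=Past", "0", "root", "_", "_"],
        ["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
    ]
    sample = "\n".join("\t".join(r) for r in rows) + "\n\n"
    for first, second in verb_verb_pairs(sample):
        print(first, second)
```

Printing the FEATS column of both verbs makes it easy to see which pairs are already annotated as a converb plus a finite verb and which still need a decision.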