phylostar / udtelugu
Universal Dependency Tagging for Telugu
License: Apache License 2.0
My understanding of the "case" relation is that we should use it for postpositions and the like in Telugu. The UD documentation seems to say the same: "The case relation is used for any case-marking element which is treated as a separate syntactic word (including prepositions, postpositions, and clitic case markers). Case-marking elements are treated as dependents of the noun or clause they attach to or introduce."
In some examples, e.g., 9.1,
meem iNTiki weLLEEm
iNTiki is actually the object of weLLEEm, and if "ki" were a separate word it would attach to "iNTi" with a "case" relation. However, since "iNTiki" is a single word here, I think the relation is not case but obj. There are some such examples in 9.1.
Take sentence 9: here, Sitaku has an obj relation with the head, not case. I think this is the right annotation.
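To make the two analyses concrete, here is a minimal sketch. The row layout (id, form, head, deprel) and the function name `analyse` are assumptions for illustration; the split variant only follows the reading described above, it is not an agreed annotation.

```python
# Hypothetical helper: dependency rows for "meem iNTiki weLLEEm"
# under the two tokenizations discussed above.
# Each row is (id, form, head, deprel); head 0 marks the root.

def analyse(split_postposition: bool):
    if split_postposition:
        # "iNTi" and "ki" are separate syntactic words, so "ki"
        # attaches to the noun with the case relation.
        return [
            (1, "meem", 4, "nsubj"),
            (2, "iNTi", 4, "obj"),
            (3, "ki", 2, "case"),
            (4, "weLLEEm", 0, "root"),
        ]
    # "iNTiki" is a single word: there is no separate element to
    # carry case, and the whole word attaches to the verb as obj.
    return [
        (1, "meem", 3, "nsubj"),
        (2, "iNTiki", 3, "obj"),
        (3, "weLLEEm", 0, "root"),
    ]


for row in analyse(split_postposition=False):
    print(*row, sep="\t")
```

With the text as printed in the grammar book (one word), only the second branch applies, which is why case never shows up in these sentences.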
waccinawaaru maa annagaaru
Here, waccina is the non-finite form of the verb and waaru is a 3rd person honorific suffix that causes the word in focus to act as a clause. I suppose we have to annotate the relation between waccinawaaru and annagaaru as some clausal relation.
idi deeni muuta
I analyzed deeni as the root, which I think is wrong. What does @nishkalavallabhi think?
chapter 26 POS tagging finished.
There are a few issues to discuss, though, such as how to tag reflexive pronominal expressions like తనలో తాను. There is also a little inconsistency in how these are tagged at the moment.
11.1 sentence 8
In chapter 12.13, BhK calls abstract nouns such as poDugu, tiipi as nouns.
Is caala a quantifier for poDugu? If so, then mark poDugu as NOUN. Then it is just an NP+NP construction, which can be analyzed with atanu as an nsubj dependent of poDugu.
@nishkalavallabhi What do you think?
In sentences such as:
Kamala potti, adi chouka, etc.: these are basically "qualities" (i.e., adjectives), but they are nouns because they are not "qualifying" any noun or pronoun.
chapter 29 to go....
Should we consistently tag enduku as DET?
If there is a sentence like "adi cowka" (It is cheap), should we consider it as a noun? or adj?
I am somewhat uncomfortable with the current state of words such as మద్రాసునుంచి.
I think they should be cast as మద్రాసు నుంచి. I suppose we need to make a list of sentences where adpositions can occur as independent morphemes but are typed in the grammar book as a single word with NP as head. We can hand edit the sentence at the end of treebanking process. @nishkalavallabhi What do you say?
I am starting one here:
26.6.1
26.7.3
26.8.11
26.9.42
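A list like this could be extended semi-automatically. The following is a minimal sketch, assuming the tokens are available as plain strings; the list `POSTPOSITIONS` and the function name are hypothetical, and only నుంచి comes from the discussion above. Every hit would still need a manual check.

```python
# Hypothetical list of postpositions that may be typed fused to the noun.
# Only "నుంచి" is taken from the discussion; extend as needed.
POSTPOSITIONS = ["నుంచి"]


def fused_candidates(tokens):
    """Return (token, stem, postposition) for every token that ends in a
    listed postposition and is longer than the postposition itself."""
    hits = []
    for tok in tokens:
        for post in POSTPOSITIONS:
            if tok.endswith(post) and len(tok) > len(post):
                hits.append((tok, tok[: -len(post)], post))
    return hits


# A free-standing "నుంచి" is not reported, only the fused form.
print(fused_candidates(["మద్రాసునుంచి", "నుంచి", "రోజులు"]))
```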
As of now, chapter 13 has words such as manciwaaDu tagged as PRON. These have to be changed to NOUN, following the discussion with @nishkalavallabhi: NOUN is an open class whereas PRON is closed, so tagging them as PRON can be problematic. Also, words like manciwaaDu can inflect for genitive and dative case, so NOUN is a suitable tag.
I am tagging verbal adjectives as VERB. What is the relation between the verbal adjective and the nominal that follows?
ఇంటికి రాని అబ్బాయి
Now, a day since the last discussion, I am wondering if we should split the words if they are not split in the text.
e.g., waaLLu maMciwaaLLu . - waaLLu is tagged as a Pronoun and maMciwaaLLu is tagged as a noun, and it is a single word. If we see a sentence: waaLLu maMci waaLLu - perhaps we need to tag it as: Pronoun Adjective Pronoun. What do you think?
endaru --> how many
enta mandi --> how many
I am analyzing mandi with the "clf" tag, but which tag should endaru get?
ekkuwa, takkuwa
I am tagging them as adjectives.
Some negations are tagged as verbs in the current annotations.
But are they? Since the negation words are a closed class, maybe we should choose AUX or PART?
In English, they seem to consider "not" as PART (Predicate negation: not, n’t, nt), but the rest seem to go into ADV (no examples given). In Tamil, "illai", the counterpart of Telugu "ledu", is tagged as AUX.
Additionally, I think negation should have a neg (negation modifier) relationship with the noun it is negating. (http://universaldependencies.org/docs/u/dep/neg.html) However, I don't find it in the list of relationships in the annotation interface. So, I tagged it as nmod for now.
Finished POS tagging for coordination chapter.
I did not realize there is a "compound" dependency.
I think phrases such as "kooragayala dukaanam" and "pustakala beeruva" should be linked by a "compound" dependency relation, not by an nmod relation. What do you think?
BhK has a chapter on clitics, which are mainly emphatic. I think we should mark these as emphatic in the morphology while following the same rules for POS tags.
Just a clarification on NUM tag usage again.
మూడు రోజులు - mudu is NUM.
మూడో రోజు - mudo is ADJ.
ఒక ఊళ్ళో - oka is NUM?
నూటికి పదిమార్కులు - nootiki, padimarkulu are both NOUNs.
ఏడువందల ఆరవై ఏడు - all the three words are numbers?
ఏడు వందల ఆరవై ఏడు - vandala is a NOUN or NUM?
Numerals are generally nouns, but act as adjectives when they appear before a noun, e.g., iddaru abbayilu.
Numerals are marked using the NUM tag @nishkalavallabhi, and they attach to the nominal they modify with the nummod relation.
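A consistency check for this convention could look like the sketch below. The token layout (dicts with id, form, upos, head, deprel) and the function name are assumptions; the project's actual export format may differ.

```python
def nummod_mismatches(sentence):
    """sentence: list of dicts with keys id, form, upos, head, deprel.
    Return the forms of NUM tokens that are headed by a NOUN but are
    not labelled nummod."""
    by_id = {tok["id"]: tok for tok in sentence}
    bad = []
    for tok in sentence:
        head = by_id.get(tok["head"])  # None for the root (head 0)
        if tok["upos"] == "NUM" and head is not None and head["upos"] == "NOUN":
            if tok["deprel"] != "nummod":
                bad.append(tok["form"])
    return bad


# "మూడు రోజులు" with a deliberately wrong relation on the numeral.
example = [
    {"id": 1, "form": "మూడు", "upos": "NUM", "head": 2, "deprel": "nmod"},
    {"id": 2, "form": "రోజులు", "upos": "NOUN", "head": 0, "deprel": "root"},
]
print(nummod_mismatches(example))
```

Ordinals such as మూడో, tagged ADJ above, are deliberately not covered by this check.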
Create markdown sheets for pos, dep, and features. I tried to fork and do things, but it seems to be a lot of manual work. Have to figure out another way.
Add nmod:poss to annotation.
In the usual cases where phrases like daani peru (Pron NN or PRON PRON or NN) have appeared so far, they looked like this:
idi aayana kalam: (aayana kalam) and (idi) are the chunks, and we had the dependencies:
nsubj(kalam, aayana), nmod(aayana, idi).
However, in this example, I think the relations are:
nsubj(kamala, peru), nmod(peru, daani), and kamala and daani are not related (unlike the previous example). What do you say?
tag nmod:poss wherever required.
change acl to acl:relcl for relative clauses.
tag nmod:tmod wherever important for temporal relations.
Change the POS tag for participle (verbal adjective) from VERB to ADJ.
Change the POS tag for participle (verbal noun) from VERB to NOUN.
Check for consistency of abstract nouns vs. adjectives tags.
Change dative subject relations to nsubj:nc
@nishkalavallabhi add more things that should be done.
1. BhK calls pronominals such as naadi, maadi, waaridi, which can take up predicate position, adjectives. We tag these as ADJ.
2. In contrast, kottawi, nallawi, errawi are adjectives that are pronominalized. We tag these as PRON.
3. Words such as biidawaaLLu are also analyzed by BhK as pronominals that function as adjectives. I think these kinds of words are no different from the examples in 2, since biida is an adjective that, when combined with the third person plural marker, behaves as PRON, because waaLLu is the head of biidawaaLLu.
@nishkalavallabhi What do you think?
నువ్వు తప్ప వేరెవరూ లేరు, ఒక్క ప్రాణం తప్ప. - What is tappa in these sentences? BhK calls them adverbial particle. I tagged them as PART. All instances are in 30.19
@nishkalavallabhi @Viswanathaa @coltekin
I added all sentences from BhK's grammar book. The statistics are as follows.
I suppose it is a decent treebank.
We need to add compound:redup in deprel.
http://coltekin.net/brat/#/telugu/chapter_11.9?focus=sent~1
How should this be analyzed?
I am having trouble understanding the difference between the obj, obl and iobj dependency relations. Are there any Telugu example sentences with explanations? Or should I read anything specific from the BhK book?
Pronominal adjectives such as manciwaaDu, mancidi are tagged as DET in UD.
Sentences like:
adi caalaa mancidi.
maa uuru peddadi.
How to analyze them?
Is it a particle?
Like in "emoo ceepaaDu".
How to mark Emiti? Is it DET or NOUN?
I mark it as Determiner.
UniversalDependencies/docs#401
Examples are kharcu peTTu, penDLi ceyyi. Here, it is a compound of NOUN + VERB. @viswanath pointed out that it is better to have a compound dependency with VERB as head. This is what is known as a light verb construction in Turkish, Persian, and Japanese.
Words like ewaru (who), ekkada (where) - shouldn't they be determiners?
They are Pronouns in English, and I saw that such words (yār for who) were tagged PRON in Tamil UD too.
iwweela diipaawali. How to analyze this sentence?
@nishkalavallabhi
ikkaDi niiLLu elaaga unnayi ?
How to tag elaaga? It modifies niiLLu but also modifies unnayi as a manner adverbial. How should this be handled?
Change obl_tmod to nmod_tmod.
Sentences like
ninna ratri inTiki waccEEnu.
ninna ratri would be the nmod_tmod modifier dependent of waccEEnu
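If the annotations are stored in brat standoff files (an assumption, based on the brat interface used for this treebank), a bulk rename could be sketched as follows. The function name, the sample offsets, and the POS labels in the sample are illustrative only.

```python
def rename_relation(ann_text, old="obl_tmod", new="nmod_tmod"):
    """Rename one relation label in the text of a brat .ann file.
    Only relation lines (IDs starting with "R") are touched."""
    out = []
    for line in ann_text.splitlines():
        ident, tab, rest = line.partition("\t")
        if tab and ident.startswith("R"):
            label, space, args = rest.partition(" ")
            if label == old:
                line = f"{ident}\t{new}{space}{args}"
        out.append(line)
    return "\n".join(out)


# Sample for "ninna ratri inTiki waccEEnu." (labels are assumptions).
sample = (
    "T1\tNOUN 6 11\tratri\n"
    "T2\tVERB 19 27\twaccEEnu\n"
    "R1\tobl_tmod Arg1:T2 Arg2:T1"
)
print(rename_relation(sample))
```

Matching on the whole label, not on a substring, keeps other relations that merely contain "obl" untouched.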
Marking dative subject in Telugu. Example:
naaku oka ruupaayi kaawaali
naaku is first person singular + dative. Is it an nsubj or an obj?
@viswanath says it is a nsubj whereas Hindi Treebank marks it as obj.
What should we tag as PART?
Words like: kadA, gadA, kAdA, kadU - BhK calls them question particles in Chapter 24 on Clitics.
However, they are not restricted to questions. They can be used as interjections as well (e.g., అవును కదా!!)
Question particles are a part of PART in UD, but what about when they are used as INTJ? Should we just have them as PART and have a closed list of INTJ words?
@nishkalavallabhi refers to iwi as PRON. But I think it is DET. Have to resolve...
What is the relation between the elements in a sentence such as:
ii puwwulu kottawi
aa gaayakuDu manciwaaDu
maa uuru peddadi
UD has this statement while talking about Determiners
http://universaldependencies.org/u/pos/DET.html
http://universaldependencies.org/u/overview/morphology.html#pronominal-words
"Ideally, language-specific documentation should list pronominal words and their category. These are all closed classes so it should not be difficult."
I think we should come up with some consensus on which "Pronoun" should be called a pronoun and which should be called a "determiner" with some "pronType". They are closed class words, so we should be able to form these guidelines relatively easily. Should we start a doc in the guidelines about this?
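Once such a list exists, it could double as a consistency check. The sketch below is only a starting point: the table `AGREED` and the function `check` are hypothetical, and the entries record positions taken in these discussions (ewaru as PRON, Emiti as DET, iwi unresolved), not final decisions.

```python
# Hypothetical table: form -> agreed UPOS, or None while undecided.
AGREED = {
    "ewaru": "PRON",  # "who"; PRON in English and Tamil UD per the thread
    "Emiti": "DET",   # currently marked as determiner
    "iwi": None,      # PRON vs. DET still to be resolved
}


def check(tagged):
    """tagged: iterable of (form, upos) pairs.
    Return (mismatches, unresolved): tokens whose tag differs from the
    table, and tokens whose category has not been decided yet."""
    mismatches, unresolved = [], []
    for form, upos in tagged:
        if form not in AGREED:
            continue  # not a listed pronominal word
        expected = AGREED[form]
        if expected is None:
            unresolved.append(form)
        elif upos != expected:
            mismatches.append((form, upos, expected))
    return mismatches, unresolved


print(check([("ewaru", "DET"), ("Emiti", "DET"), ("iwi", "PRON")]))
```

Because these words form a closed class, the table stays small, and a PronType value could be added per entry once the UPOS choices are settled.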
Check sentences with iobj.