
udtelugu's Issues

use of "case" relation

My understanding of the "case" relation is that we should use it for postpositions and the like in Telugu. The UD documentation also seems to say the same: "The case relation is used for any case-marking element which is treated as a separate syntactic word (including prepositions, postpositions, and clitic case markers). Case-marking elements are treated as dependents of the noun or clause they attach to or introduce."

In some examples, e.g., 9.1,
meem iNTiki weLLEEm

iNTiki is actually the object of weLLEEm. If "ki" were a separate word, it would be connected to "iNTi" with a "case" relation. However, since "iNTiki" is a single word here, I think the relation is not case but obj. There are some such examples in 9.1.

Take sentence 9: here, Sitaku has an object relation with the head, not case. I think this is the right annotation.

pronominalized verbal adjective

waccinawaaru maa annagaaru

Here, waccina is the non-finite form of the verb and waaru is a 3rd person honorific suffix that causes the word in focus to act as a clause. I suppose we have to annotate the relation between waccinawaaru and annagaaru as some clausal relation.

Chapter 30 done.

There are a few issues to discuss, though, such as how to tag reflexive pronominal expressions like తనలో తాను. The current annotation of these is a little inconsistent.

analyzing abstract nouns that work as adjectives

In chapter 12.13, BhK classifies abstract words such as poDugu, tiipi as nouns.

  • atanu caala poDugu

Is caala a quantifier for poDugu? If so, we should mark poDugu as NOUN. Then it's just an NP+NP construction, which can be analyzed with atanu as an nsubj dependent of poDugu.
@nishkalavallabhi What do you think?

adjectives functioning as nouns

In sentences such as:
Kamala potti, adi chouka, etc., the second words are basically "qualities" (i.e., adjectives), but they are tagged as nouns because they are not "qualifying" any noun or pronoun.

  • is this correct? I think we need to write this up in the documentation because, in the following 3 sentences:
    కమల పొట్టి
    కమల పొట్టిది
    కమల పొట్టి అమ్మాయి
  • The first and third sentences have the same uninflected form of పొట్టి, but in the first one it is a NOUN and in the last one it is an ADJ.

enduku tag

Should we consistently tag enduku as DET?

Splitting words

I am somewhat uncomfortable with the current state of words such as మద్రాసునుంచి.

I think they should be written as మద్రాసు నుంచి. I suppose we need to make a list of sentences where adpositions can occur as independent morphemes but are typed in the grammar book as a single word with the NP as head. We can hand-edit those sentences at the end of the treebanking process. @nishkalavallabhi What do you say?

I am starting one here:

26.6.1
26.7.3
26.8.11
26.9.42
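To help build this list, here is a rough sketch of a search for words that end in a postposition but are typed without a space. The postposition list, the sentence ids "a" and "b", and the function name are only placeholders for illustration; the real list of postpositions would have to come from the grammar book.

```python
# Hypothetical list of postpositions that may be typed attached to the noun.
POSTPOSITIONS = ["నుంచి"]

def split_candidates(sentences, postpositions=POSTPOSITIONS):
    """Return (sentence_id, word, stem, postposition) for every fused word."""
    hits = []
    for sent_id, text in sentences:
        for word in text.split():
            word = word.strip(".,?!")
            for post in postpositions:
                # Skip words that are the bare postposition: those are
                # already written as separate words.
                if word.endswith(post) and len(word) > len(post):
                    hits.append((sent_id, word, word[: -len(post)], post))
    return hits

# Placeholder sentence ids, not real chapter numbers.
sentences = [
    ("a", "మద్రాసునుంచి"),
    ("b", "మద్రాసు నుంచి"),
]
for hit in split_candidates(sentences):
    print(hit)
```

Only sentence "a" is reported, since in "b" the postposition is already a separate word. The matches are just candidates for hand-editing; a plain suffix match can also hit words that merely happen to end in the same letters.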

Change adjectives with pronominal suffixes to NOUN tag

As of now, chapter 13 has words such as manciwaaDu tagged as PRON. We have to change these to NOUN, following the discussion with @nishkalavallabhi: NOUN is an open class whereas PRON is closed, so adding such words to PRON can be problematic. Also, words like manciwaaDu can inflect for genitive and dative case, so tagging them as NOUN is suitable.

Tagging verbal adjectives

I am tagging verbal adjectives as VERB. What is the relation between the verbal adjective and the nominal that follows?

ఇంటికి రాని అబ్బాయి

About splitting and not splitting

A day after the last discussion, I am wondering whether we should split words that are not split in the text.

e.g., waaLLu maMciwaaLLu . - waaLLu is tagged as a Pronoun and maMciwaaLLu is tagged as a noun, and it is a single word. If we see a sentence: waaLLu maMci waaLLu - perhaps we need to tag it as: Pronoun Adjective Pronoun. What do you think?

quantifiers

endaru --> how many
enta mandi --> how many

I am analyzing mandi as "clf". Which tag should endaru get?

How should we treat a negation?

Some negations are tagged as verbs in the current annotations.
But are they? Since the negation words are a closed class, maybe we should choose AUX or PART?
In English, they seem to treat "not" as PART (Predicate negation: not, n’t, nt), while the rest seem to go into ADV (no examples given). In Tamil, "illai", the counterpart of Telugu "ledu", is tagged as AUX.

Additionally, I think negation should have a neg (negation modifier) relationship with the noun it is negating. (http://universaldependencies.org/docs/u/dep/neg.html) However, I don't find it in the list of relationships in the annotation interface. So, I tagged it as nmod for now.

compound nouns

I did not realize there is a "compound" dependency.
I think these "kooragayala dukaanam", "pustakala beeruva" etc should be tied by a "compound" dependency relation, not tied by a nmod relation. What do you think?

Clitics

BhK has a chapter on clitics, which are mainly emphatic. I think we should mark these as emphatic in the morphology while following the same rules for POS tags.

clarification about numbers

Just a clarification on NUM tag usage again.

మూడు రోజులు - mudu is NUM.
మూడో రోజు - mudo is ADJ.
ఒక ఊళ్ళో - oka is NUM?
నూటికి పదిమార్కులు - nootiki, padimarkulu are both NOUNs.
ఏడువందల ఆరవై ఏడు - all the three words are numbers?
ఏడు వందల ఆరవై ఏడు - vandala is a NOUN or NUM?

Note on numerals

Numerals are generally nouns, but they act as adjectives when they appear before a noun, e.g., iddaru abbayilu.

Numerals are marked with the NUM tag, @nishkalavallabhi, and they attach to the nominal they modify with the nummod relation.

Creating markdown sheets

Create markdown sheets for pos, dep, and features. I tried to fork and do things, but it seems to be a lot of manual work. We have to figure out another way.

daani peru kamala - 8.11, sentence 8

In the normal cases where phrases like daani peru (Pron NN or PRON PRON or NN) have appeared so far, they looked like this:
idi aayana kalam: (aayana kalam) and (idi) are the chunks, and we had the dependencies:
nsubj(kalam, aayana), nmod(aayana, idi).

However, in this example, I think the relations are:
nsubj(kamala, peru), nmod(peru, daani), and kamala and daani are not related (unlike the previous example). What do you say?

Additional Modifications to dependency relations

  • tag nmod:poss wherever required.

  • change acl to acl:relcl for relative clauses.

  • tag nmod:tmod wherever important for temporal relations.

  • Change the POS tag for participle (verbal adjective) from VERB to ADJ.

  • Change the POS tag for participle (verbal noun) from VERB to NOUN.

  • Check for consistency of abstract nouns vs. adjectives tags.

  • Change dative subject relations to nsubj:nc

@nishkalavallabhi add more things that should be done.
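For the consistency check in the list above, a minimal sketch could look like the following. It assumes the treebank is stored as CoNLL-U text (10 tab-separated columns, FORM in column 2 and UPOS in column 4). The function name and the sample rows are made up for illustration, and the tags in the sample are placeholders, not a proposal.

```python
from collections import defaultdict

def forms_with_mixed_tags(conllu_text):
    """Map each word form that carries more than one UPOS tag to its tags."""
    tags = defaultdict(set)
    for line in conllu_text.split("\n"):
        cols = line.split("\t")
        # Token lines have exactly 10 fields; comments and blank lines do not.
        if len(cols) == 10 and not line.startswith("#"):
            tags[cols[1]].add(cols[3])
    return {form: sorted(t) for form, t in tags.items() if len(t) > 1}

# Illustrative rows only; unspecified fields are left as "_".
rows = [
    ["1", "atanu", "_", "PRON", "_", "_", "_", "_", "_", "_"],
    ["2", "poDugu", "_", "NOUN", "_", "_", "_", "_", "_", "_"],
    [],
    ["1", "poDugu", "_", "ADJ", "_", "_", "_", "_", "_", "_"],
]
sample = "\n".join("\t".join(r) for r in rows)
print(forms_with_mixed_tags(sample))
```

A form showing up here is not necessarily an error, since the same uninflected form can be a NOUN in one sentence and an ADJ in another. The output is only a list of places to look at by hand.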

tagging adjectives from 13.4

  1. BhK calls pronominals such as naadi, maadi, waaridi which can take up predicate position as adjectives. We tag these as ADJ.

  2. In contrast, kottawi, nallawi, errawi are adjectives that are pronominalized. We tag these as PRON.

Words such as biidawaaLLu are also analyzed by BhK as pronominals that function as adjectives. I think that these kinds of words are not different from the examples in 2, since biida is an adjective that, when combined with the third person plural marker, behaves as PRON because waaLLu is the head of biidawaaLLu.

@nishkalavallabhi What do you think?

tappa - PART or ADV

నువ్వు తప్ప వేరెవరూ లేరు, ఒక్క ప్రాణం తప్ప. - What is tappa in these sentences? BhK calls it an adverbial particle. I tagged it as PART. All instances are in 30.19.

obj vs obl vs iobj

I am having trouble understanding the difference between the obj, obl and iobj dependency relations. Are there any Telugu example sentences with explanations? Or should I read anything specific from the BhK book?

pronominal adjectives

Pronominal adjectives such as manciwaaDu, mancidi are tagged as DET in UD.
Sentences like:

  • adi caalaa mancidi.

  • maa uuru peddadi.

How to analyze them?

eemoo

Is it a particle?

Like in "eemoo ceepaaDu".

ewaru - pronoun, instead of determiner

Words like ewaru (who), ekkada (where): shouldn't they be pronouns rather than determiners?
They are Pronouns in English, and I saw that such words (yār for who) were tagged PRON in Tamil UD too.

Tagging elaaga

ikkaDi niiLLu elaaga unnayi ?

How to tag elaaga? It does modify niiLLu, but it also modifies unnayi as a manner adverbial. How should we attach it?

change obl_tmod to nmod_tmod

Change obl_tmod to nmod_tmod.

Sentences like
ninna ratri inTiki waccEEnu.

ninna ratri would be the nmod_tmod modifier dependent of waccEEnu
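A bulk rename like this could be scripted instead of edited by hand. Below is a small sketch, assuming the annotations are exported as CoNLL-U text with the relation label in the 8th column. The function name is made up, and the sample rows only fill in what is stated above (ratri attached to waccEEnu), with waccEEnu assumed to be the root and everything else left as "_".

```python
def rename_deprel(conllu_text, old, new):
    """Replace the DEPREL value `old` with `new`; other lines stay untouched."""
    out = []
    for line in conllu_text.split("\n"):
        cols = line.split("\t")
        # Token lines have exactly 10 tab-separated fields; DEPREL is the 8th.
        if len(cols) == 10 and not line.startswith("#") and cols[7] == old:
            cols[7] = new
            line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)

# Illustrative rows for: ninna ratri inTiki waccEEnu
rows = [
    ["1", "ninna", "_", "_", "_", "_", "_", "_", "_", "_"],
    ["2", "ratri", "_", "_", "_", "_", "4", "obl_tmod", "_", "_"],
    ["3", "inTiki", "_", "_", "_", "_", "_", "_", "_", "_"],
    ["4", "waccEEnu", "_", "_", "_", "_", "0", "root", "_", "_"],
]
sample = "\n".join("\t".join(r) for r in rows)
print(rename_deprel(sample, "obl_tmod", "nmod_tmod"))
```

Matching on the whole column rather than doing a plain text replace keeps the rename from touching word forms or comments that happen to contain the same string.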

Dative subject

Marking dative subject in Telugu. Example:
naaku oka ruupaayi kaawaali

naaku is first person singular + dative. Is it an nsubj or an obj?

@viswanath says it is a nsubj whereas Hindi Treebank marks it as obj.

INTJ vs PART

What should we tag as PART?

Words like: kadA, gadA, kAdA, kadU - BhK calls them question particles in Chapter 24 on Clitics.
However, they are not restricted to questions. They can be used as interjections as well (e.g., అవును కదా!!)

Question particles are a part of PART in UD, but what about when they are used as INTJ? Should we just have them as PART and have a closed list of INTJ words?

Listing pronominal words, and the PronType

UD has this statement while talking about Determiners
http://universaldependencies.org/u/pos/DET.html
http://universaldependencies.org/u/overview/morphology.html#pronominal-words
"Ideally, language-specific documentation should list pronominal words and their category. These are all closed classes so it should not be difficult."

I think we should come up with some consensus on which "Pronoun" should be called a pronoun and which should be called a "determiner" with some "pronType". They are closed class words, so we should be able to form these guidelines relatively easily. Should we start a doc in the guidelines about this?
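As a starting point for such a doc, the words currently tagged PRON or DET could be pulled out of the treebank and grouped by tag and PronType. The sketch below assumes CoNLL-U text with UPOS in column 4 and features in column 6. The function name is made up, and the tags and PronType values in the sample are placeholders for illustration, not a decision on how these words should be tagged.

```python
from collections import defaultdict

def pronominal_inventory(conllu_text, tags=("PRON", "DET")):
    """Group word forms by (UPOS, PronType) for the given closed-class tags."""
    inventory = defaultdict(set)
    for line in conllu_text.split("\n"):
        cols = line.split("\t")
        if len(cols) != 10 or line.startswith("#") or cols[3] not in tags:
            continue
        # FEATS looks like "PronType=Int|Number=Sing", or "_" when empty.
        feats = dict(f.split("=", 1) for f in cols[5].split("|") if "=" in f)
        inventory[(cols[3], feats.get("PronType", "_"))].add(cols[1])
    return {key: sorted(forms) for key, forms in inventory.items()}

# Illustrative rows only; the annotations are placeholders.
rows = [
    ["1", "ewaru", "_", "PRON", "_", "PronType=Int", "_", "_", "_", "_"],
    ["1", "ekkada", "_", "PRON", "_", "PronType=Int", "_", "_", "_", "_"],
    ["1", "adi", "_", "PRON", "_", "_", "_", "_", "_", "_"],
    ["1", "weLLEEm", "_", "VERB", "_", "_", "_", "_", "_", "_"],
]
sample = "\n".join("\t".join(r) for r in rows)
for key, forms in pronominal_inventory(sample).items():
    print(key, forms)
```

Forms that land under the "_" key have no PronType yet, so they are the ones the guidelines would still need to cover.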
