Comments (12)

reckart avatar reckart commented on June 5, 2024

Hi @aggarwalpiush :)

The issue is that cassis is meant to be a generic, type-system-agnostic library, i.e. it should support any UIMA type system. In fact, we have users who use e.g. the cTAKES type system and may not work with DKPro Core at all. So we would need some way of

@jcklie and I have been throwing around a number of ideas, e.g.

  • passing a strategy to the CAS constructor which would monkey-patch the CAS instance and add convenience methods: cas = CAS(DKPro_Core); cas.get_tokens() - but there would be no IDE auto-completion support
  • using some kind of generic typing, e.g. cas = CAS(); cas.$(DKPro_Core).get_tokens() - where $ would be a method returning the type passed to it as an argument; but apparently Python doesn't support this kind of trick (Java does) and there would be no auto-completion in the IDE
  • an extension mechanism like the one Pandas has; but again no auto-complete support
  • simply using static functions: from dkpro_core.accessors import get_tokens; get_tokens(cas) - at least some IDE auto-complete support, but not necessarily a nice API
  • subclassing the CAS: cas = DKProCoreCAS() - has IDE auto-complete support, but honestly I don't like it because IMHO it doesn't separate concerns sufficiently. E.g. what if you want to use a CAS object with different type systems, e.g. DKPro Core plus your own type system? Nah...
  • wrapping the CAS with an accessor which implements the same interface as the CAS: cas = DKPro_Core(CAS()); cas.get_tokens() - has IDE auto-completion support and also you could wrap the same CAS object with different accessors if you wanted to work with multiple type systems

... so the wrapper approach seems to us the most promising one for the moment. Also, cassis doesn't need to be extended to support it.
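
The wrapper idea can be sketched roughly as follows. This is only an illustration under stated assumptions: FakeCas and Annotation are stand-ins written for this example and are not the cassis classes, and DKProCoreAccessor and get_tokens are hypothetical names. Only the token type name is taken from DKPro Core.

```python
from dataclasses import dataclass
from typing import Iterator, List

# DKPro Core token type name, used here purely as a string constant.
TOKEN = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"


@dataclass
class Annotation:
    """Stand-in for a feature structure (assumption, not the cassis class)."""
    type_name: str
    begin: int
    end: int


class FakeCas:
    """Stand-in for a type-system-agnostic CAS (assumption, not cassis)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._annotations: List[Annotation] = []

    def add(self, annotation: Annotation) -> None:
        self._annotations.append(annotation)

    def select(self, type_name: str) -> Iterator[Annotation]:
        # Lazily yield the annotations of the requested type.
        return (a for a in self._annotations if a.type_name == type_name)


class DKProCoreAccessor:
    """Hypothetical wrapper adding DKPro-Core-specific convenience methods."""

    def __init__(self, cas: FakeCas) -> None:
        self._cas = cas

    def get_tokens(self) -> List[Annotation]:
        # Declared explicitly, so an IDE can offer it for auto-completion.
        return list(self._cas.select(TOKEN))

    def __getattr__(self, name: str):
        # Only called when normal lookup fails: forward everything else
        # (select, add, text, ...) to the wrapped CAS.
        return getattr(self._cas, name)


raw = FakeCas("Hello world")
raw.add(Annotation(TOKEN, 0, 5))
raw.add(Annotation(TOKEN, 6, 11))

cas = DKProCoreAccessor(raw)
tokens = cas.get_tokens()
token_texts = [cas.text[t.begin:t.end] for t in tokens]
```

Forwarding through __getattr__ keeps the sketch short, but an IDE cannot see the forwarded methods; to get completion for the full CAS interface, the wrapper would have to spell those methods out. The same raw object could be wrapped by a second accessor for another type system without changing the CAS itself.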

That said ...

cas.select(TOKEN).as_text()

This is something which I think would be really nice to have.

from dkpro-cassis.

zesch avatar zesch commented on June 5, 2024

Wouldn't that also be type system specific?

cas.select(TOKEN).as_text() # token.getCoveredText()
cas.select(LEMMA).as_text() # lemma.getValue()

reckart avatar reckart commented on June 5, 2024

If we imagine TOKEN and LEMMA to be type name string constants - no.

jcklie avatar jcklie commented on June 5, 2024

How would cassis know what feature to use for as_text()?

jcklie avatar jcklie commented on June 5, 2024

In Python, one would normally just use a list comprehension for that, e.g.

values = [x.value for x in cas.select(LEMMA)]

reckart avatar reckart commented on June 5, 2024

For as_text(), we would use get_covered_text(), not a feature value.

zesch avatar zesch commented on June 5, 2024

This would somewhat diminish the usefulness, as many types beyond token would not return useful results. If we use an accessor, couldn't it decide to return different feature values depending on the type?

reckart avatar reckart commented on June 5, 2024

It probably could, but it could be confusing. E.g. if as_text() returns the covered text for tokens but, say, the entity type for entities, I would find that confusing. How would I get the covered text of an entity? If you wanted to introduce a convenience accessor for "the most commonly used feature value", I would find it sensible for it to have a different name, e.g. as_value() - this could e.g. return the "value" feature for named entities (instead of the "identifier" feature) or the "PosValue" feature for POS tags (instead of the "CoarseValue" feature).
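
The distinction between as_text() and as_value() might look like the sketch below. Everything in it is assumed for illustration: Selection, Annotation and the DEFAULT_VALUE_FEATURE mapping are hypothetical and not part of cassis, and the short type names stand in for full type names.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical mapping from a type name to its "most commonly used" feature.
DEFAULT_VALUE_FEATURE: Dict[str, str] = {
    "NamedEntity": "value",
    "POS": "PosValue",
    "Lemma": "value",
}


class Annotation:
    """Stand-in feature structure with arbitrary features (not a cassis class)."""

    def __init__(self, type_name: str, begin: int, end: int, **features: Any) -> None:
        self.type_name = type_name
        self.begin = begin
        self.end = end
        for name, value in features.items():
            setattr(self, name, value)


class Selection:
    """Hypothetical result of a select call over a document text."""

    def __init__(self, text: str, annotations: Iterable[Annotation]) -> None:
        self._text = text
        self._annotations = list(annotations)

    def as_text(self) -> List[str]:
        # Always the covered text, whatever the annotation type is.
        return [self._text[a.begin:a.end] for a in self._annotations]

    def as_value(self) -> List[Any]:
        # The default feature of each type; raises KeyError for unmapped types.
        return [getattr(a, DEFAULT_VALUE_FEATURE[a.type_name]) for a in self._annotations]


text = "Berlin rocks"
entities = Selection(text, [
    Annotation("NamedEntity", 0, 6, value="LOC", identifier="Q64"),
])
tags = Selection(text, [
    Annotation("POS", 0, 6, PosValue="NNP", CoarseValue="PROPN"),
    Annotation("POS", 7, 12, PosValue="VBZ", CoarseValue="VERB"),
])
```

With this split, entities.as_text() still gives the covered text of the entity, while entities.as_value() gives its "value" feature, so neither method changes meaning depending on the type.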

zesch avatar zesch commented on June 5, 2024

  1. There should be a way to access feature values of annotations.
  2. I would find it confusing if cas.select(TOKEN).as_text() and cas.select(POS).as_text() would return the same values (as they would do now, right?)

reckart avatar reckart commented on June 5, 2024

There is a way to access feature values, e.g. as @jcklie illustrated:

values = [x.value for x in cas.select(LEMMA)]

x.value reads the feature value on the feature structure x. You can also write to the feature: x.value = "value".

Right now, as_text() does not exist. cas.select(XXX) returns a "Generator", i.e. not a list - so evaluation is lazy. That is why we currently cannot easily add methods to it, nor can we easily figure out whether the result is non-empty. We have been looking e.g. at https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.peekable and have considered returning a list ... no final decision for the time being. I think it would be good if cas.select(xxx) returned something we can define methods on - maybe some kind of lazily evaluated iterable that would eventually allow mirroring the UIMAv3 select API, or at least a Pythonic version of it.
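
One possible shape for such a lazily evaluated iterable is sketched below. It is an assumption for illustration, not what cassis does: LazySelection, is_empty and as_list are hypothetical names, and the emptiness check uses itertools.chain from the standard library instead of more_itertools.peekable.

```python
from itertools import chain
from typing import Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class LazySelection(Generic[T]):
    """Hypothetical lazily evaluated select result that can carry methods."""

    def __init__(self, source: Iterable[T]) -> None:
        # Nothing is consumed here; the source is only wrapped.
        self._iterator: Iterator[T] = iter(source)

    def __iter__(self) -> Iterator[T]:
        return self._iterator

    def is_empty(self) -> bool:
        # Pull one item to check, then put it back in front of the rest.
        sentinel = object()
        first = next(self._iterator, sentinel)
        if first is sentinel:
            return True
        self._iterator = chain([first], self._iterator)
        return False

    def as_list(self) -> List[T]:
        # Consumes whatever has not been consumed yet.
        return list(self._iterator)


produced: List[int] = []


def numbers() -> Iterator[int]:
    # Records each item when it is actually generated.
    for n in (1, 2, 3):
        produced.append(n)
        yield n


selection = LazySelection(numbers())
before_peek = list(produced)   # snapshot before anything was requested
empty = selection.is_empty()   # generates only the first item
after_peek = list(produced)    # snapshot after the emptiness check
items = selection.as_list()    # generates the remaining items
```

Like a generator, this object can only be iterated once; a version that can be restarted would need to keep a way to re-run the underlying query.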

jcklie avatar jcklie commented on June 5, 2024

I will track the extension mechanism in #83 and the extension methods you want here so that we do not mix up the issues.
