Comments (37)

jcklie commented on May 29, 2024

Do you have a type system XML that I can use for that?

from dkpro-cassis.

zesch commented on May 29, 2024

You mean for testing purposes?

I think it would be the job of the other library to provide the initializer. Although for DKPro, Cassis could also provide it directly :)

reckart commented on May 29, 2024

The idea with the initializer is a bit more than just loading type system X.

The idea is that the initializer patches the CAS instance with additional methods, e.g.

cas.get_tokens()
cas.get_tokens_as_text()
cas.get_sentences()
cas.get_sentences_as_text()
cas.get_pos_tags()
cas.get_named_entities()
...

... and that we could e.g. have one for DKPro Core and another one for, say, cTAKES, and both would patch the CAS with the same convenience methods but internally resort to different select statements.

The initializer would work like a visitor, e.g.

Cas(DKProCoreTypeSystem()) triggers a call to DKProCoreTypeSystem.apply(cas).

@jcklie such a thing works with Python, right?
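A minimal sketch of such a visitor-style initializer in plain Python, assuming a simplified stand-in Cas class (not the cassis class) and a hypothetical DKProCoreTypeSystem initializer: the constructor calls apply(cas), and apply binds the convenience methods to that one instance with types.MethodType.

```python
import types
from typing import Dict, List, Optional

TOKEN = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"


class Cas:
    """Simplified stand-in for a CAS: a generic store of typed annotations."""

    def __init__(self, initializer: Optional["DKProCoreTypeSystem"] = None) -> None:
        self._annotations: List[Dict[str, str]] = []
        if initializer is not None:
            # Visitor-style hook: the initializer decides what to patch in.
            initializer.apply(self)

    def add(self, type_name: str, text: str) -> None:
        self._annotations.append({"type": type_name, "text": text})

    def select(self, type_name: str) -> List[Dict[str, str]]:
        return [a for a in self._annotations if a["type"] == type_name]


class DKProCoreTypeSystem:
    """Hypothetical initializer mapping convenience methods to DKPro Core types."""

    def apply(self, cas: Cas) -> None:
        def get_tokens(self: Cas) -> List[Dict[str, str]]:
            return self.select(TOKEN)

        def get_tokens_as_text(self: Cas) -> List[str]:
            return [t["text"] for t in self.select(TOKEN)]

        # Bind the functions to this particular instance only.
        cas.get_tokens = types.MethodType(get_tokens, cas)
        cas.get_tokens_as_text = types.MethodType(get_tokens_as_text, cas)


cas = Cas(DKProCoreTypeSystem())
cas.add(TOKEN, "Hello")
cas.add(TOKEN, "world")
print(cas.get_tokens_as_text())  # ['Hello', 'world']
```

A cTAKES initializer would bind the same method names to different select() calls. The drawback is that static analysis in an IDE cannot see methods that are only added at run time.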

jcklie commented on May 29, 2024

We can provide these methods; I am not sure about the implementation though. My question was: is there some official DKPro type system XML that I can use, or can you provide me with some Java code to generate it, to keep it in sync with DKPro?

reckart commented on May 29, 2024

The "best" solution for this would probably be to use DKPro Meta :)

reckart commented on May 29, 2024

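Well, basically what you do is create a Maven project that depends on all dkpro-core-api-* modules and then call:

        import java.io.FileOutputStream;
        import org.apache.uima.fit.factory.TypeSystemDescriptionFactory;
        import org.apache.uima.resource.metadata.TypeSystemDescription;
        // uimaFIT auto-detects the type systems registered by all modules on the classpath
        TypeSystemDescription dkproCoreTS = TypeSystemDescriptionFactory
                .createTypeSystemDescription();
        try (FileOutputStream out = new FileOutputStream("target/dkpro-core-aggregated-ts.xml")) {
            dkproCoreTS.toXML(out);
        }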

jcklie commented on May 29, 2024

I would implement it by extending Cas; the constructor then loads the DKPro type system. Simple and not so magic.
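For comparison, a sketch of the subclassing approach, assuming a simplified stand-in Cas base class and a hypothetical load_dkpro_core_typesystem() helper that here only returns a set of type names. DKProCas loads the type system in its constructor and remains a Cas for every other purpose.

```python
from typing import Dict, List, Optional, Set

TOKEN = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"


def load_dkpro_core_typesystem() -> Set[str]:
    # Hypothetical helper: a real version would parse the DKPro Core type system XML.
    return {TOKEN}


class Cas:
    """Simplified stand-in for a CAS with an optional type system."""

    def __init__(self, typesystem: Optional[Set[str]] = None) -> None:
        self.typesystem = typesystem or set()
        self._annotations: List[Dict[str, str]] = []

    def add(self, type_name: str, text: str) -> None:
        if type_name not in self.typesystem:
            raise ValueError(f"Unknown type: {type_name}")
        self._annotations.append({"type": type_name, "text": text})

    def select(self, type_name: str) -> List[Dict[str, str]]:
        return [a for a in self._annotations if a["type"] == type_name]


class DKProCas(Cas):
    """A Cas whose constructor loads the DKPro Core type system."""

    def __init__(self) -> None:
        super().__init__(typesystem=load_dkpro_core_typesystem())

    def get_tokens_as_text(self) -> List[str]:
        return [t["text"] for t in self.select(TOKEN)]


cas = DKProCas()
cas.add(TOKEN, "Hello")
print(isinstance(cas, Cas), cas.get_tokens_as_text())  # True ['Hello']
```

Because the methods are declared on the class, IDEs can autocomplete them; the cost is that the CAS becomes a framework-specific type.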

reckart commented on May 29, 2024

But then we'd end up having to import CAS from different libraries...

jcklie commented on May 29, 2024

I would add it to cassis, so the import would be from cassis import DKProCas. To me, DKPro is an important enough part of the UIMA world to add it to cassis itself.

reckart commented on May 29, 2024

For the moment, I don't feel very comfortable with this. I don't like the idea of the CAS becoming something new just because it contains certain types. The idea of the CAS is that it is a generic data structure. If we subclass it for a particular framework, I feel it goes against this idea.

Actually, the strategy you showed me OTR for the pandas accessors looked nice. It makes very clear that there is one generic data structure and, separately, different ways of accessing it.
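In pandas, this mechanism is pandas.api.extensions.register_dataframe_accessor: a class registered under a name is instantiated with the DataFrame on first access and cached. A minimal sketch of the same pattern for a CAS, assuming a simplified stand-in Cas class; register_cas_accessor and the "dkpro" accessor name are hypothetical, not cassis API.

```python
from typing import Any, Callable, Dict, List, Optional

TOKEN = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"


class Cas:
    """Simplified stand-in for the generic CAS data structure."""

    def __init__(self) -> None:
        self._annotations: List[Dict[str, str]] = []

    def add(self, type_name: str, text: str) -> None:
        self._annotations.append({"type": type_name, "text": text})

    def select(self, type_name: str) -> List[Dict[str, str]]:
        return [a for a in self._annotations if a["type"] == type_name]


class _CachedAccessor:
    """Descriptor that builds the accessor on first access and caches it per instance."""

    def __init__(self, name: str, accessor: type) -> None:
        self._name = name
        self._accessor = accessor

    def __get__(self, obj: Optional[Cas], owner: type) -> Any:
        if obj is None:
            return self._accessor  # accessed on the class itself
        accessor_obj = self._accessor(obj)
        # The instance attribute now shadows this non-data descriptor.
        setattr(obj, self._name, accessor_obj)
        return accessor_obj


def register_cas_accessor(name: str) -> Callable[[type], type]:
    def decorator(accessor: type) -> type:
        setattr(Cas, name, _CachedAccessor(name, accessor))
        return accessor

    return decorator


@register_cas_accessor("dkpro")
class DKProAccessor:
    def __init__(self, cas: Cas) -> None:
        self._cas = cas

    def get_tokens_as_text(self) -> List[str]:
        return [t["text"] for t in self._cas.select(TOKEN)]


cas = Cas()
cas.add(TOKEN, "Hello")
print(cas.dkpro.get_tokens_as_text())  # ['Hello']
```

The Cas class itself stays generic; a cTAKES package could register its own accessor under another name without touching it.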

jcklie commented on May 29, 2024

Is it ok if we implement this in cassis or should it be part of pydkpro?

reckart commented on May 29, 2024

I understand that IDEs may not support auto-complete for such extensions. But I wonder whether IDEs like PyCharm really only do static code analysis or also consider whether a method has actually been called somewhere before. E.g. if I call x.foo() once and later type y.f... (where y is of the same type as x), it would be reasonable to offer foo() in the auto-complete (at least without documentation). I wonder if there are hints one can provide to IDEs to fine-tune the auto-complete, e.g. for scenarios like the extension methods suggested here.

jcklie commented on May 29, 2024

PyCharm offers some auto-completion based on what was called before (the type information is limited then), and there are stub files (https://mypy.readthedocs.io/en/latest/stubs.html) where you can maybe add more information. But it does not know about an extension that is added at run time (unless I just add it as a field to cassis and throw an error if the CAS is not compatible).

reckart commented on May 29, 2024

The idea of involving cassis came to me because I thought we should/could pass the type system "strategy" to the constructor, i.e. cassis would somehow have to understand the strategy and react to it. If we use a completely different mechanism that does not require cassis to be aware of it, it could be done elsewhere.

A compromise between subtyping and adding methods dynamically might be a generic type (if such a thing is possible), e.g.

cas = CAS[DKPro_Core]()
cas.access <= must return an instance of the generic type, e.g. DKPro_Core
cas.access.XXX <= IDE could theoretically know which methods the generic type provides
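Python's typing.Generic allows something close to this. A minimal sketch, assuming a simplified stand-in Cas and a hypothetical DKProCore accessor: instead of writing CAS[DKPro_Core]() explicitly, the accessor class is passed to the constructor, so a type checker can infer Cas[DKProCore] and therefore the type of cas.access. The runtime behaviour does not depend on the annotations.

```python
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")
TOKEN = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"


class Cas(Generic[T]):
    """Simplified stand-in for a CAS, parameterized by its accessor type T."""

    def __init__(self, accessor: Callable[["Cas[T]"], T]) -> None:
        self._annotations: List[Dict[str, str]] = []
        # A type checker infers T from what the accessor factory returns.
        self._access = accessor(self)

    @property
    def access(self) -> T:
        # Statically typed as T, so an IDE can complete cas.access.<method>.
        return self._access

    def add(self, type_name: str, text: str) -> None:
        self._annotations.append({"type": type_name, "text": text})

    def select(self, type_name: str) -> List[Dict[str, str]]:
        return [a for a in self._annotations if a["type"] == type_name]


class DKProCore:
    """Hypothetical accessor with DKPro Core convenience methods."""

    def __init__(self, cas: "Cas[DKProCore]") -> None:
        self._cas = cas

    def get_tokens_as_text(self) -> List[str]:
        return [t["text"] for t in self._cas.select(TOKEN)]


cas = Cas(DKProCore)  # a type checker infers Cas[DKProCore]
cas.add(TOKEN, "Hello")
cas.add(TOKEN, "world")
print(cas.access.get_tokens_as_text())  # ['Hello', 'world']
```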

jcklie commented on May 29, 2024

I think we need features from Python 3.8 for that and even then I am unsure. So what we have now is:

  1. Use the pandas extension style without type hints and let pydkpro implement this. Other people can add their own CAS extensions.
  2. Hardcode DKPro, cTAKES and more as CAS extensions in cassis so that we have type support. Throw an error if the CAS does not conform when using them.
  3. Why not both?
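Option 2 could look roughly like the following sketch. The stand-in Cas stores its type system as a set of type names, and the dkpro property and REQUIRED_TYPES constant are hypothetical. Because the property is declared on the class, IDEs can complete cas.dkpro methods, and a CAS whose type system lacks the DKPro types raises an error when the accessor is used.

```python
from typing import Dict, List, Set

TOKEN = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"
SENTENCE = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence"


class DKProAccessor:
    """Hypothetical DKPro Core accessor that would ship with the library."""

    REQUIRED_TYPES = {TOKEN, SENTENCE}

    def __init__(self, cas: "Cas") -> None:
        missing = self.REQUIRED_TYPES - cas.typesystem
        if missing:
            # The CAS does not conform to the DKPro Core type system.
            raise TypeError(f"CAS lacks DKPro Core types: {sorted(missing)}")
        self._cas = cas

    def get_tokens_as_text(self) -> List[str]:
        return [t["text"] for t in self._cas.select(TOKEN)]


class Cas:
    """Simplified stand-in; the type system is just a set of type names here."""

    def __init__(self, typesystem: Set[str]) -> None:
        self.typesystem = typesystem
        self._annotations: List[Dict[str, str]] = []

    @property
    def dkpro(self) -> DKProAccessor:
        # Declared on the class, so IDEs know cas.dkpro returns a DKProAccessor.
        return DKProAccessor(self)

    def add(self, type_name: str, text: str) -> None:
        self._annotations.append({"type": type_name, "text": text})

    def select(self, type_name: str) -> List[Dict[str, str]]:
        return [a for a in self._annotations if a["type"] == type_name]


cas = Cas({TOKEN, SENTENCE})
cas.add(TOKEN, "Hello")
print(cas.dkpro.get_tokens_as_text())  # ['Hello']

try:
    Cas({TOKEN}).dkpro
except TypeError as err:
    print("not a DKPro Core CAS:", err)
```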

jcklie commented on May 29, 2024

I think this issue covers two things: the DKPro type system and the extensions. I will track the type system part in #9.

jcklie commented on May 29, 2024

I wrote a quick-and-dirty script to convert a type system XML into Python classes for the DKPro Core type system. With it, one gets type hints for the wrapped CAS and for the accessor, and does not need to redefine all CAS methods.

The code is basically:

from typing import Iterator, Union

from cassis import Cas

# Token and NamedEntity (generated classes) and load_dkpro_core_typesystem()
# are defined elsewhere and not shown here.


class DKProAccessor:

    def __init__(self, cas: Cas):
        self._cas = cas

    def __getattr__(self, name: str):
        """ If the method is not found on the accessor, then we just delegate to the cas. """
        return getattr(self._cas, name)

    def get_tokens(self) -> Iterator[Token]:
        return self._cas.select("de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token")

    def get_named_entities(self) -> Iterator[NamedEntity]:
        return self._cas.select("de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity")


def build_dkpro_cas() -> Union[Cas, DKProAccessor]:
    cas = Cas(typesystem=load_dkpro_core_typesystem())
    dkpro = DKProAccessor(cas)
    return dkpro

I can write a decorator for init and __getattr__ so that these are added automatically to extensions.
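One way such a decorator could look, as a sketch: a hypothetical cas_extension class decorator that injects the __init__ storing the wrapped CAS and the __getattr__ delegation from DKProAccessor above, so each extension only declares its own methods. The Cas class here is a simplified stand-in, not the cassis class.

```python
from typing import Any, Dict, List

TOKEN = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token"


class Cas:
    """Simplified stand-in for a CAS."""

    def __init__(self) -> None:
        self._annotations: List[Dict[str, str]] = []

    def add(self, type_name: str, text: str) -> None:
        self._annotations.append({"type": type_name, "text": text})

    def select(self, type_name: str) -> List[Dict[str, str]]:
        return [a for a in self._annotations if a["type"] == type_name]


def cas_extension(cls: type) -> type:
    """Hypothetical class decorator adding CAS wrapping and delegation to an extension."""

    def __init__(self: Any, cas: Cas) -> None:
        self._cas = cas

    def __getattr__(self: Any, name: str) -> Any:
        # Only called when normal lookup fails, so the extension's own methods win.
        if name == "_cas":
            # Avoid infinite recursion if _cas has not been set yet.
            raise AttributeError(name)
        return getattr(self._cas, name)

    cls.__init__ = __init__
    cls.__getattr__ = __getattr__
    return cls


@cas_extension
class DKProAccessor:
    def get_tokens_as_text(self) -> List[str]:
        return [t["text"] for t in self._cas.select(TOKEN)]


dkpro = DKProAccessor(Cas())
dkpro.add(TOKEN, "Hello")  # not defined on the accessor, so delegated to the Cas
print(dkpro.get_tokens_as_text())  # ['Hello']
```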

jcklie commented on May 29, 2024

I do not know whether I want to keep the type hints for the extension methods, but I like the way extensions are defined.

reckart commented on May 29, 2024

So the IDE dynamically evaluates the DKProAccessor to discover the fields?

jcklie commented on May 29, 2024

We tell the IDE that build_dkpro_cas can either return a cas or an accessor.

reckart commented on May 29, 2024

How does the IDE know that e.g. Token has the field form? I don't see anything in your code that would do that?

jcklie commented on May 29, 2024

I generate Python code for the type descriptions from the XML. If you have a fixed type system, you can do that and check the generated Python code into source control. I will push the code for that later; this issue should maybe focus on the extension only.
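What such a generator might do, as a minimal sketch using only the standard library: it reads a UIMA type system descriptor and emits one Python class per type, with one annotation per feature. The XML is a trimmed, illustrative excerpt rather than the real DKPro Core descriptor; generate_classes() and the primitive-type mapping are assumptions, and supertypes, arrays and the remaining UIMA primitives are ignored.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"u": "http://uima.apache.org/resourceSpecifier"}
PRIMITIVES = {
    "uima.cas.String": "str",
    "uima.cas.Integer": "int",
    "uima.cas.Float": "float",
    "uima.cas.Boolean": "bool",
}

TYPESYSTEM_XML = """\
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <types>
    <typeDescription>
      <name>de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token</name>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>form</name>
          <rangeTypeName>de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.TokenForm</rangeTypeName>
        </featureDescription>
        <featureDescription>
          <name>id</name>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>
"""


def short_name(type_name: str) -> str:
    return type_name.rsplit(".", 1)[-1]


def generate_classes(typesystem_xml: str) -> str:
    root = ET.fromstring(typesystem_xml)
    lines: List[str] = []
    for td in root.iterfind("u:types/u:typeDescription", NS):
        name = td.findtext("u:name", namespaces=NS)
        lines.append(f"class {short_name(name)}:")
        lines.append(f'    """{name}"""')
        for fd in td.findall("u:features/u:featureDescription", NS):
            feature = fd.findtext("u:name", namespaces=NS)
            range_type = fd.findtext("u:rangeTypeName", namespaces=NS)
            # Primitives map to builtins; other types become forward references.
            hint = PRIMITIVES.get(range_type, f'"{short_name(range_type)}"')
            lines.append(f"    {feature}: {hint}")
        lines.append("")
    return "\n".join(lines)


source = generate_classes(TYPESYSTEM_XML)
print(source)
```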

reckart commented on May 29, 2024

Generating classes from the type system description - so a "pycasgen" - an equivalent of the "jcasgen" we have in Java which generates Java classes from the type system. Why not? :)

I think such a "pycasgen" script could be part of cassis and projects like DKPro Core or cTAKES could pre-generate the classes and push them to pypi as separate packages. WDYT?

jcklie commented on May 29, 2024

We can do that. My question right now is where to put the extensions. I would like to have them in cassis itself, as they are related to CAS/XMI stuff. Also, I sometimes need them for my own code and don't want to install pydkpro just for the extensions and types.

reckart commented on May 29, 2024

If by extensions you mean e.g. the generated types, I think these should be released separately and with the same version numbers as the corresponding DKPro Core / cTAKES / etc. versions. They do not follow the same release cycle as cassis.

jcklie commented on May 29, 2024

I mean the dkpro/ctakes accessor and util functions that were requested.

reckart commented on May 29, 2024

@zesch @aggarwalpiush WDYT? Type-system-specific accessors and Python classes generated from type systems should probably be kept together and have a release cycle that mirrors that of the type system they are generated from. Should we make them a separate project under DKPro right away (which I think would be nice, since we could already use them in INCEpTION)? Or keep them with your pipelining code later?

zesch commented on May 29, 2024

Not sure I really understand the implications. Whatever works best on your side.

jcklie commented on May 29, 2024

I would create a new repository and Python package dkpro-typeshed where we add the extension methods and generated types to get a nice API. It would depend only on cassis. pydkpro can then use it to make its API nicer. We use a separate package in order to track the DKPro version and its new or changed types.

zesch commented on May 29, 2024

Sounds good

reckart commented on May 29, 2024

We have various DKPro projects and they all have different release cycles. I think the type system is generated for a particular version of a particular project. Thus having a single repo where all generated types are located doesn't seem sensible to me. We would always have to release all types at the same time and it would be impossible for users to choose a version combination they would care for. I think having a type companion repo for each DKPro project would make sense, e.g. dkpro-core-python-api and dkpro-keyphrases-python-api etc.

jcklie commented on May 29, 2024

This sounds like a lot of work and a maintenance nightmare. Right now it also works without them (type-unsafe, in the same way that the raw Java CAS interface has no type safety or type information). So I would just add the accessor, which returns the right FeatureStructures but gives no IDE support, i.e. change

def get_tokens(self) -> Iterator[Token]:
    return self._cas.select("de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token")

to

def get_tokens(self) -> Iterator[FeatureStructure]:
    return self._cas.select("de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token")

as a first step.

reckart commented on May 29, 2024

This sounds like a lot of work and maintenance nightmare

What's a maintenance nightmare?

jcklie commented on May 29, 2024

Having a repo for each would mean to set up many repositories and pypi packages. I would rather not do that right now.

reckart commented on May 29, 2024

We only need to set up one for DKPro Core. I even thought about putting the generated Python classes directly into the "dkpro-core" repository along with all the Java stuff. But considering that the Python stuff is still "young", we might want to refine and release it more often than the Java stuff, so it might have a faster release cycle (e.g. "2.0.0, then 2.0.0.1 because we fix a bug in the code generator, then 2.0.0.2 because we fix another bug, etc.").

reckart commented on May 29, 2024

I have added a repo here and you should all have proper access to it: https://github.com/dkpro/dkpro-core-python-api

We can still rename it / move around things later if we decide to change anything. For now, we'll only create types for DKPro Core anyway.

jcklie commented on May 29, 2024

I will come back to this after the ACL deadline.
