
Comments (6)

bmcfee commented on August 22, 2024
  1. The two counterarguments I see are: I understand from looking at the source how a mutable list makes this so much easier to implement; and, if this isn't a standard case, it's easy enough to lambda-map the results into a hashable type after and then do this.

I'm leaning toward leaving it as is. Tuples, being immutable, can be a little unwieldy for a lot of the things we want to use values outputs for (eg slicing down to a fixed vocab).
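For anyone who does need a hashable type, the lambda-map step mentioned above might look like the following minimal sketch. The `values` list is made-up data standing in for sampled output (one list per sample time); it is an assumption for illustration, not something produced by jams here.

```python
# Hypothetical stand-in for sampled values: one list of labels per sample time.
values = [["C:maj"], [], ["C:maj", "G:maj"]]

# Map each mutable list to an immutable, hashable tuple.
hashable = [tuple(v) for v in values]

# Tuples can be used as set members or dict keys,
# e.g. to collect the distinct label combinations.
vocab = set(hashable)
```

The lists stay mutable up to this point, so slicing or filtering can still happen before the conversion.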

  2. Thoughts on adding a fill_value field to the method's interface? In my case, only positive intervals are labeled, and so I get back empty lists where there is no range. It'd be great to backfill the null class at sample time, and at first blush this seems like an easy feature ... the only issue I see is, what default parameter would give the current result?

I like this in theory, but as you say, the API for it seems awkward, especially when you consider that it should be consistent across all namespaces. You could do it in two steps by having a flag to control backfill, and a separate fill_value parameter to handle the data itself.

from jams.

ejhumphrey commented on August 22, 2024

yea, flag + fill_value seems a little unwieldy? it's not terrible to do these things on the user side .. i'm happy to punt for now, and if this ends up becoming a more common use case / pattern, we can figure it out then.


bmcfee commented on August 22, 2024

No, but you raise a valid point about the semantics of annotation sampling.

It's presently written from the perspective of positive-only annotations, and null/empty labels are only generated by sampling if there's an observation to that effect. This is the most conservative form of sampling, and it's not incorrect per se, but it's also not exactly what you want when integrating with sklearn (or whatever) where every input should have an output.


bmcfee commented on August 22, 2024

Resurfacing this one to see if anyone's perspective has changed. Should we try to implement a fill parameter? Or leave it as is?

It might be possible to check the fill value against the namespace schema at runtime, but that might get ugly if we later unify all the namespace schemas into one master schema.

Alternately, we could just not validate fill values.


urinieto commented on August 22, 2024

My 2 cents: given that the current implementation returns a list of lists, instead of fill_value, we could have empty_values as an argument. This would represent whatever you want to do with empty values, with [] as default. It'd be the same thing, but it potentially makes more sense semantically.

And then, I would simply not validate these custom empty values, let the user take care of it if needed.


bmcfee commented on August 22, 2024

And then, I would simply not validate these custom empty values, let the user take care of it if needed.

I guess that's valid. If a user supplies a bad fill value, that's on them.

So to recap, here's the current logic:

  1. Generate an array of sample positions
  2. Initialize a value for each sample position as an empty list. (Repeat for confidences.)
  3. For each observation, get its value, and append it to each list corresponding to a sample time that lands within the observed interval. (Repeat for confidences.)

The reason for all the list hackery is that observations can overlap, so the to_samples definition for a value at time t is the union of value fields. Any sample that's outside of any labeled interval retains the empty list as its values.
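A minimal sketch of those three steps, for illustration only: this is not the jams source. The function name `sample_intervals`, the `(start, duration, value, confidence)` tuple layout, and the closed-interval boundary test are all assumptions.

```python
def sample_intervals(times, observations):
    """Sample interval observations at the given times (illustrative only).

    `observations` is assumed to be an iterable of
    (start, duration, value, confidence) tuples.
    """
    # Step 2: one empty list per sample position, for values and confidences.
    values = [[] for _ in times]
    confidences = [[] for _ in times]

    # Step 3: append each observation to every sample it covers.
    # Treating the interval as closed on both ends is an assumption.
    for start, duration, value, confidence in observations:
        for i, t in enumerate(times):
            if start <= t <= start + duration:
                values[i].append(value)
                confidences[i].append(confidence)

    return values, confidences


# Step 1: sample positions (made-up data).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
# Two overlapping observations: "a" covers [0.5, 2.5], "b" covers [1.5, 2.5].
observations = [(0.5, 2.0, "a", 1.0), (1.5, 1.0, "b", 0.5)]

values, confidences = sample_intervals(times, observations)
# values == [[], ["a"], ["a", "b"], [], []]
```

The sample at t = 2.0 falls inside both intervals and collects both values, while samples outside every interval keep their empty lists.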

The proposed change would allow a user to change this by providing a list of default values that it initializes with instead of the empty list. In writing this up, I see two problems with this idea that had escaped my attention before:

  1. Are fill values retained for non-empty samples? Or do we only fill when the output would be otherwise empty? The former is easier programmatically, but the latter might be more what the user would expect. I really don't know here.
  2. What do we do with confidences? Another parameter with the same kind of logic?
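To make the first question concrete, here is a rough sketch of the two possible semantics applied to already-sampled output. The helper names `fill_always` and `fill_if_empty` are hypothetical and the data is made up; neither is part of jams.

```python
def fill_always(values, default_values):
    """Defaults are retained for every sample, as if each list had been
    initialized with them before observations were appended."""
    return [list(default_values) + list(v) for v in values]


def fill_if_empty(values, default_values):
    """Defaults appear only where the sample would otherwise be empty."""
    return [list(v) if v else list(default_values) for v in values]


# Made-up sampled output: one unlabeled sample, two labeled ones.
sampled = [[], ["a"], ["a", "b"]]
default_values = ["none"]

retained = fill_always(sampled, default_values)
# retained == [["none"], ["none", "a"], ["none", "a", "b"]]

only_empty = fill_if_empty(sampled, default_values)
# only_empty == [["none"], ["a"], ["a", "b"]]
```

The first variant corresponds to simply changing the initial value, which is why it is easier to implement; the second needs a pass over the results after sampling.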

I'm beginning to think this is not worth implementing. It's easy enough for a user to post-process the values array as follows:

values = ann.to_samples(...)
for v in values:
    if not v:
        v.extend(default_values)
# and repeat for confidences

and then get on with their life. I think I prefer this solution over trying to implement something general-purpose that leads to awkward and confusing API decisions.

