Giter VIP home page Giter VIP logo

Comments (14)

grst avatar grst commented on September 27, 2024 1

I met with @jpivarski and @agoose77 yesterday.

We came to the conclusion that we might not need any copy-on-write behavior for views of awkward arrays, because a slice of an awkward array is always a (shallow) copy anyway. There's therefore no risk of modifying the original awkward array when setting a record on an awkward array within an AnnDataView. We could therefore get rid of all the custom code around awkward array views which is, ultimately, a hack that abuses custom behaviors for something they were not designed for.

Only downside is that AnnDataView behaves differently for awkward arrays than for other backends. Updating the awkward array will not trigger an init_as_actual.

In any case I'll first come up with a bunch of test cases to make sure everything works as expected.

from anndata.

agoose77 avatar agoose77 commented on September 27, 2024

I took another look at

try:
from ..compat import awkward as ak
import weakref
# Registry to store weak references from AwkwardArrayViews to their parent AnnData container
_registry = weakref.WeakValueDictionary()
_PARAM_NAME = "_view_args"
class AwkwardArrayView(_ViewMixin, AwkArray):
@property
def _view_args(self):
"""Override _view_args to retrieve the values from awkward arrays parameters.
Awkward arrays cannot be subclassed like other python objects. Instead subclasses need
to be attached as "behavior". These "behaviors" cannot take any additional parameters (as we do
for other data types to store `_view_args`). Therefore, we need to store `_view_args` using awkward's
parameter mechanism. These parameters need to be json-serializable, which is why we can't store
ElementRef directly, but need to replace the reference to the parent AnnDataView container with a weak
reference.
"""
parent_key, attrname, keys = self.layout.parameter(_PARAM_NAME)
parent = _registry[parent_key]
return ElementRef(parent, attrname, keys)
def __copy__(self) -> AwkArray:
"""
Turn the AwkwardArrayView into an actual AwkwardArray with no special behavior.
Need to override __copy__ instead of `.copy()` as awkward arrays don't implement `.copy()`
and are copied using python's standard copy mechanism in `aligned_mapping.py`.
"""
array = self
# makes a shallow copy and removes the reference to the original AnnData object
array = ak.with_parameter(self, _PARAM_NAME, None)
array = ak.with_parameter(array, "__array__", None)
return array
@as_view.register(AwkArray)
def as_view_awkarray(array, view_args):
parent, attrname, keys = view_args
parent_key = f"target-{id(parent)}"
_registry[parent_key] = parent
# TODO: See https://github.com/scverse/anndata/pull/647#discussion_r963494798_ for more details and
# possible strategies to stack behaviors.
# A better solution might be based on xarray-style "attrs", once this is implemented
# https://github.com/scikit-hep/awkward/issues/1391#issuecomment-1412297114
if type(array).__name__ != "Array":
raise NotImplementedError(
"Cannot create a view of an awkward array with __array__ parameter. "
"Please open an issue in the AnnData repo and describe your use-case."
)
array = ak.with_parameter(array, _PARAM_NAME, (parent_key, attrname, keys))
array = ak.with_parameter(array, "__array__", "AwkwardArrayView")
return array
ak.behavior["AwkwardArrayView"] = AwkwardArrayView

It seems like the point of this array class is to intercept __setitem__ for copy-on-write semantics. Could you clarify whether my understanding is correct. If so, then we don't need the array class at all? Awkward Arrays are (mostly) immutable, as the structure of the array is stored in a separate object to the high-level interface. I'm thinking that we just need to ensure that any view produces a new outer ak.Array via ak.Array(array).

from anndata.

agoose77 avatar agoose77 commented on September 27, 2024

Tagging @ivirshup for visibility :)

from anndata.

ivirshup avatar ivirshup commented on September 27, 2024

Hey, sorry about the delayed response.

It seems like the point of this array class is to intercept setitem for copy-on-write semantics.

Yes.

Awkward Arrays are (mostly) immutable, as the structure of the array is stored in a separate object to the high-level interface. I'm thinking that we just need to ensure that any view produces a new outer ak.Array via ak.Array(array).

Yes, but... It's not just copy-on-write for the ak.Array, it's also copy on write for the parent AnnData object.

cc: @grst

from anndata.

ivirshup avatar ivirshup commented on September 27, 2024

Also @grst, for some context I believe the change mentioned here happened in awkward 2.3 and is causing test failures. I think the fix is super easy, but would appreciate your eyes on it (#1040).


@agoose77 could you also take a look at the linked PR?

from anndata.

agoose77 avatar agoose77 commented on September 27, 2024

Yes, but... It's not just copy-on-write for the ak.Array, it's also copy on write for the parent AnnData object.

@ivirshup Is this required? Would it break your invariants if accessing an ak.Array always returned a shallow copy?

from anndata.

ivirshup avatar ivirshup commented on September 27, 2024

The overall idea is that subsetting an anndata does not actually subset all of its contents until you need them. E.g. subsetting is lazy. However, we track this at the AnnData object level, not at each element level.

Since we don't want to write back updates, we just want it to act like you took a subset, we have copy on write behavior for each element of the AnnData. Since this is just tracked at the AnnData level, making one element "actual" means we have to make all elements actual. So we need to have a way for the parent anndata to know a modification has been made to a child element – hence our view classes.

Is this required? Would it break your invariants if accessing an ak.Array always returned a shallow copy?

How would you set any values on an awkward array? Or do you mean only when it's an anndata style view?

from anndata.

agoose77 avatar agoose77 commented on September 27, 2024

I can see that this is confusing! I'll reply exclusively on #1040 from now on.

from anndata.

ivirshup avatar ivirshup commented on September 27, 2024

Going to keep this open until we are happy with a solution. Current status summarized here: #1040 (comment)

from anndata.

github-actions avatar github-actions commented on September 27, 2024

This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

from anndata.

grst avatar grst commented on September 27, 2024

This is still an issue. The proposed fix is in #1070, but pending some more fundamental decision on how to cache views.

from anndata.

agoose77 avatar agoose77 commented on September 27, 2024

@grst I'm a little lost with where things are! My understanding from #1035 is that we don't need to create array classes, and instead can simply shallow copy arrays where appropriate. Is that no longer the case?

from anndata.

grst avatar grst commented on September 27, 2024

@agoose77, it is still the case and this is what is proposed in #1070.

However, @ivirshup pointed out in #1070 (comment) that when creating the view only when accessing the data, it is impossible to assign any data to a record, e.g. using

v.obsm["awk"]["b"] = [5, 6]

because a new shallow copy will be created every time v.obsm["awk"] is accessed.

This is why in the most recent version I suggested to create the shallow copy already during creating of the AnnDataView, i.e. on

v = a[:2]

in the same example, which Isaac referred to as "caching" the view.

But apparently this might conflict with the proposed fix for an issue with garbage collection (#1119).

In any case, I don't think we need anything from the awkward side, but just need to find a way how to reconcile cached views with garbage collection.

from anndata.

github-actions avatar github-actions commented on September 27, 2024

This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

from anndata.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.