This is not an existing bug, but I'm raising it now so that we're on the same page.</p

I met with <a class="user-mention notranslate" data-hovercard-type="user" data-hoverca

I took another look at <div class="Box-hea

Tagging <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-

Also <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url

I can see that this is confusing! I'll reply exclusively on <a class="issue-link js-is

Future changes to Awkward Array behavior class resolution about anndata HOT 14 OPEN

agoose77 commented on September 27, 2024

Future changes to Awkward Array behavior class resolution

from anndata.

Comments (14)

grst commented on September 27, 2024 1

I met with @jpivarski and @agoose77 yesterday.

We came to the conclusion that we might not need any copy-on-write behavior for views of awkward arrays, because a slice of an awkward array is always a (shallow) copy anyway. There's therefore no risk of modifying the original awkward array when setting a record on an awkward array within an AnnDataView. We could therefore get rid of all the custom code around awkward array views which is, ultimately, a hack that abuses custom behaviors for something they were not designed for.

Only downside is that AnnDataView behaves differently for awkward arrays than for other backends. Updating the awkward array will not trigger an init_as_actual.

In any case I'll first come up with a bunch of test cases to make sure everything works as expected.

from anndata.

agoose77 commented on September 27, 2024

I took another look at

anndata/anndata/_core/views.py

Lines 202 to 257 in b2965fc

 try: 

 from ..compat import awkward as ak 

 import weakref 

 # Registry to store weak references from AwkwardArrayViews to their parent AnnData container 

 _registry = weakref.WeakValueDictionary() 

 _PARAM_NAME = "_view_args" 

 class AwkwardArrayView(_ViewMixin, AwkArray): 

 @property 

 def _view_args(self): 

 """Override _view_args to retrieve the values from awkward arrays parameters. 

  Awkward arrays cannot be subclassed like other python objects. Instead subclasses need 

  to be attached as "behavior". These "behaviors" cannot take any additional parameters (as we do 

  for other data types to store `_view_args`). Therefore, we need to store `_view_args` using awkward's 

  parameter mechanism. These parameters need to be json-serializable, which is why we can't store 

  ElementRef directly, but need to replace the reference to the parent AnnDataView container with a weak 

  reference. 

  """ 

 parent_key, attrname, keys = self.layout.parameter(_PARAM_NAME) 

 parent = _registry[parent_key] 

 return ElementRef(parent, attrname, keys) 

 def __copy__(self) -> AwkArray: 

 """ 

  Turn the AwkwardArrayView into an actual AwkwardArray with no special behavior. 

  Need to override __copy__ instead of `.copy()` as awkward arrays don't implement `.copy()` 

  and are copied using python's standard copy mechanism in `aligned_mapping.py`. 

  """ 

 array = self 

 # makes a shallow copy and removes the reference to the original AnnData object 

 array = ak.with_parameter(self, _PARAM_NAME, None) 

 array = ak.with_parameter(array, "__array__", None) 

 return array 

 @as_view.register(AwkArray) 

 def as_view_awkarray(array, view_args): 

 parent, attrname, keys = view_args 

 parent_key = f"target-{id(parent)}" 

 _registry[parent_key] = parent 

 # TODO: See https://github.com/scverse/anndata/pull/647#discussion_r963494798_ for more details and 

 # possible strategies to stack behaviors. 

 # A better solution might be based on xarray-style "attrs", once this is implemented 

 # https://github.com/scikit-hep/awkward/issues/1391#issuecomment-1412297114 

 if type(array).__name__ != "Array": 

 raise NotImplementedError( 

 "Cannot create a view of an awkward array with __array__ parameter. " 

 "Please open an issue in the AnnData repo and describe your use-case." 

 ) 

 array = ak.with_parameter(array, _PARAM_NAME, (parent_key, attrname, keys)) 

 array = ak.with_parameter(array, "__array__", "AwkwardArrayView") 

 return array 

 ak.behavior["AwkwardArrayView"] = AwkwardArrayView

It seems like the point of this array class is to intercept __setitem__ for copy-on-write semantics. Could you clarify whether my understanding is correct. If so, then we don't need the array class at all? Awkward Arrays are (mostly) immutable, as the structure of the array is stored in a separate object to the high-level interface. I'm thinking that we just need to ensure that any view produces a new outer ak.Array via ak.Array(array).

from anndata.

agoose77 commented on September 27, 2024

Tagging @ivirshup for visibility :)

from anndata.

ivirshup commented on September 27, 2024

Hey, sorry about the delayed response.

It seems like the point of this array class is to intercept setitem for copy-on-write semantics.

Yes.

Awkward Arrays are (mostly) immutable, as the structure of the array is stored in a separate object to the high-level interface. I'm thinking that we just need to ensure that any view produces a new outer ak.Array via ak.Array(array).

Yes, but... It's not just copy-on-write for the ak.Array, it's also copy on write for the parent AnnData object.

cc: @grst

from anndata.

ivirshup commented on September 27, 2024

Also @grst, for some context I believe the change mentioned here happened in awkward 2.3 and is causing test failures. I think the fix is super easy, but would appreciate your eyes on it (#1040).

@agoose77 could you also take a look at the linked PR?

from anndata.

agoose77 commented on September 27, 2024

Yes, but... It's not just copy-on-write for the ak.Array, it's also copy on write for the parent AnnData object.

@ivirshup Is this required? Would it break your invariants if accessing an ak.Array always returned a shallow copy?

from anndata.

ivirshup commented on September 27, 2024

The overall idea is that subsetting an anndata does not actually subset all of its contents until you need them. E.g. subsetting is lazy. However, we track this at the AnnData object level, not at each element level.

Since we don't want to write back updates, we just want it to act like you took a subset, we have copy on write behavior for each element of the AnnData. Since this is just tracked at the AnnData level, making one element "actual" means we have to make all elements actual. So we need to have a way for the parent anndata to know a modification has been made to a child element – hence our view classes.

Is this required? Would it break your invariants if accessing an ak.Array always returned a shallow copy?

How would you set any values on an awkward array? Or do you mean only when it's an anndata style view?

from anndata.

agoose77 commented on September 27, 2024

I can see that this is confusing! I'll reply exclusively on #1040 from now on.

from anndata.

ivirshup commented on September 27, 2024

Going to keep this open until we are happy with a solution. Current status summarized here: #1040 (comment)

from anndata.

github-actions commented on September 27, 2024

This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

from anndata.

grst commented on September 27, 2024

This is still an issue. The proposed fix is in #1070, but pending some more fundamental decision on how to cache views.

from anndata.

agoose77 commented on September 27, 2024

@grst I'm a little lost with where things are! My understanding from #1035 is that we don't need to create array classes, and instead can simply shallow copy arrays where appropriate. Is that no longer the case?

from anndata.

grst commented on September 27, 2024

@agoose77, it is still the case and this is what is proposed in #1070.

However, @ivirshup pointed out in #1070 (comment) that when creating the view only when accessing the data, it is impossible to assign any data to a record, e.g. using

v.obsm["awk"]["b"] = [5, 6]

because a new shallow copy will be created every time v.obsm["awk"] is accessed.

This is why in the most recent version I suggested to create the shallow copy already during creating of the AnnDataView, i.e. on

v = a[:2]

in the same example, which Isaac referred to as "caching" the view.

But apparently this might conflict with the proposed fix for an issue with garbage collection (#1119).

In any case, I don't think we need anything from the awkward side, but just need to find a way how to reconcile cached views with garbage collection.

from anndata.

github-actions commented on September 27, 2024

This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

from anndata.

Future changes to Awkward Array behavior class resolution about anndata HOT 14 OPEN

Comments (14)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	try:
	from ..compat import awkward as ak
	import weakref

	# Registry to store weak references from AwkwardArrayViews to their parent AnnData container
	_registry = weakref.WeakValueDictionary()
	_PARAM_NAME = "_view_args"

	class AwkwardArrayView(_ViewMixin, AwkArray):
	@property
	def _view_args(self):
	"""Override _view_args to retrieve the values from awkward arrays parameters.

	Awkward arrays cannot be subclassed like other python objects. Instead subclasses need
	to be attached as "behavior". These "behaviors" cannot take any additional parameters (as we do
	for other data types to store `_view_args`). Therefore, we need to store `_view_args` using awkward's
	parameter mechanism. These parameters need to be json-serializable, which is why we can't store
	ElementRef directly, but need to replace the reference to the parent AnnDataView container with a weak
	reference.
	"""
	parent_key, attrname, keys = self.layout.parameter(_PARAM_NAME)
	parent = _registry[parent_key]
	return ElementRef(parent, attrname, keys)

	def __copy__(self) -> AwkArray:
	"""
	Turn the AwkwardArrayView into an actual AwkwardArray with no special behavior.

	Need to override __copy__ instead of `.copy()` as awkward arrays don't implement `.copy()`
	and are copied using python's standard copy mechanism in `aligned_mapping.py`.
	"""
	array = self
	# makes a shallow copy and removes the reference to the original AnnData object
	array = ak.with_parameter(self, _PARAM_NAME, None)
	array = ak.with_parameter(array, "__array__", None)
	return array

	@as_view.register(AwkArray)
	def as_view_awkarray(array, view_args):
	parent, attrname, keys = view_args
	parent_key = f"target-{id(parent)}"
	_registry[parent_key] = parent
	# TODO: See https://github.com/scverse/anndata/pull/647#discussion_r963494798_ for more details and
	# possible strategies to stack behaviors.
	# A better solution might be based on xarray-style "attrs", once this is implemented
	# https://github.com/scikit-hep/awkward/issues/1391#issuecomment-1412297114
	if type(array).__name__ != "Array":
	raise NotImplementedError(
	"Cannot create a view of an awkward array with __array__ parameter. "
	"Please open an issue in the AnnData repo and describe your use-case."
	)
	array = ak.with_parameter(array, _PARAM_NAME, (parent_key, attrname, keys))
	array = ak.with_parameter(array, "__array__", "AwkwardArrayView")
	return array

	ak.behavior["AwkwardArrayView"] = AwkwardArrayView