Comments (14)
I met with @jpivarski and @agoose77 yesterday.
We came to the conclusion that we might not need any copy-on-write behavior for views of awkward arrays, because a slice of an awkward array is always a (shallow) copy anyway. There's therefore no risk of modifying the original awkward array when setting a record on an awkward array within an AnnDataView. We could therefore get rid of all the custom code around awkward array views which is, ultimately, a hack that abuses custom behaviors for something they were not designed for.
The only downside is that AnnDataView behaves differently for awkward arrays than for other backends: updating the awkward array will not trigger an init_as_actual.
In any case I'll first come up with a bunch of test cases to make sure everything works as expected.
from anndata.
I took another look at anndata/anndata/_core/views.py (lines 202 to 257 in b2965fc).
It seems like the point of this array class is to intercept __setitem__ for copy-on-write semantics. Could you clarify whether my understanding is correct? If so, then we don't need the array class at all. Awkward Arrays are (mostly) immutable, as the structure of the array is stored in a separate object to the high-level interface. I'm thinking that we just need to ensure that any view produces a new outer ak.Array via ak.Array(array).
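The "new outer object, shared structure" idea can be illustrated with a pure-Python toy. This is only a sketch of the reasoning, not how awkward is implemented: `ToyArray` and its `_layout` attribute are hypothetical stand-ins for the high-level `ak.Array` and the structure object it wraps.

```python
class ToyArray:
    """Hypothetical stand-in for a high-level array wrapping a layout.

    The layout (field name -> column) is never mutated in place; setting a
    field builds a new layout and rebinds it on this wrapper only.
    """

    def __init__(self, source):
        # Wrap another ToyArray's layout, or a plain dict of columns.
        self._layout = source._layout if isinstance(source, ToyArray) else dict(source)

    @property
    def fields(self):
        return list(self._layout)

    def __getitem__(self, field):
        return self._layout[field]

    def __setitem__(self, field, values):
        # Rebind to a new layout; the previous layout object stays untouched.
        self._layout = {**self._layout, field: list(values)}


original = ToyArray({"a": [1, 2]})
view = ToyArray(original)  # new outer object, same layout underneath
view["b"] = [5, 6]         # only `view` is rebound to a new layout

print(original.fields)             # ['a']
print(view.fields)                 # ['a', 'b']
print(view["a"] is original["a"])  # True: the untouched column is still shared
```

Under this model, handing out a fresh outer wrapper is enough to protect the original from field assignment, which is the property the comment relies on.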
from anndata.
Tagging @ivirshup for visibility :)
from anndata.
Hey, sorry about the delayed response.
> It seems like the point of this array class is to intercept __setitem__ for copy-on-write semantics.
Yes.
> Awkward Arrays are (mostly) immutable, as the structure of the array is stored in a separate object to the high-level interface. I'm thinking that we just need to ensure that any view produces a new outer ak.Array via ak.Array(array).
Yes, but... It's not just copy-on-write for the ak.Array, it's also copy-on-write for the parent AnnData object.
cc: @grst
from anndata.
Also @grst, for some context I believe the change mentioned here happened in awkward 2.3 and is causing test failures. I think the fix is super easy, but would appreciate your eyes on it (#1040).
@agoose77 could you also take a look at the linked PR?
from anndata.
> Yes, but... It's not just copy-on-write for the ak.Array, it's also copy on write for the parent AnnData object.
@ivirshup Is this required? Would it break your invariants if accessing an ak.Array always returned a shallow copy?
from anndata.
The overall idea is that subsetting an anndata does not actually subset all of its contents until you need them, i.e. subsetting is lazy. However, we track this at the AnnData object level, not at each element level.
Since we don't want to write back updates and just want the view to act as if you took a subset, we have copy-on-write behavior for each element of the AnnData. Since this is only tracked at the AnnData level, making one element "actual" means we have to make all elements actual. So we need a way for the parent anndata to know that a modification has been made to a child element, hence our view classes.
> Is this required? Would it break your invariants if accessing an ak.Array always returned a shallow copy?
How would you set any values on an awkward array? Or do you mean only when it's an anndata style view?
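A minimal sketch of the mechanism described above, assuming a drastically simplified container: `ToyAnnData`, `ViewList`, `get` and `names` are hypothetical names for illustration, and only `_init_as_actual` echoes a name used in this thread. The real anndata classes are considerably more involved.

```python
class ViewList(list):
    """Hypothetical view element that reports writes to its owning view."""

    def __init__(self, values, owner, name):
        super().__init__(values)
        self._owner = owner
        self._name = name

    def __setitem__(self, key, value):
        # Copy-on-write: make the owning view "actual" first, then apply
        # the write to its own, now independent, element.
        if self._owner.is_view:
            self._owner._init_as_actual()
        self._owner.get(self._name)[key] = value


class ToyAnnData:
    """Hypothetical miniature of an AnnData-like container."""

    def __init__(self, elements, parent=None, index=None):
        self._elements = elements  # name -> list; empty while this is a view
        self._parent = parent      # set only while this object is a view
        self._index = index        # row slice into the parent

    @property
    def is_view(self):
        return self._parent is not None

    def names(self):
        return self._parent.names() if self.is_view else list(self._elements)

    def __getitem__(self, index):
        # Subsetting is lazy: nothing is copied here.
        return ToyAnnData({}, parent=self, index=index)

    def get(self, name):
        if not self.is_view:
            return self._elements[name]
        return ViewList(self._parent.get(name)[self._index], owner=self, name=name)

    def _init_as_actual(self):
        # View-ness is tracked per object, so every element is materialised.
        self._elements = {n: list(self.get(n)) for n in self.names()}
        self._parent = self._index = None


adata = ToyAnnData({"x": [1, 2, 3], "y": [10, 20, 30]})
view = adata[:2]
view.get("x")[0] = 99  # intercepted write triggers materialisation

print(view.is_view)    # False
print(view.get("x"))   # [99, 2]
print(view.get("y"))   # [10, 20]
print(adata.get("x"))  # [1, 2, 3]
```

The point of the sketch is the last step: a write to one element turns the whole view into an independent object, and nothing is written back to the parent.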
from anndata.
I can see that this is confusing! I'll reply exclusively on #1040 from now on.
from anndata.
Going to keep this open until we are happy with a solution. Current status summarized here: #1040 (comment)
from anndata.
This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!
from anndata.
This is still an issue. The proposed fix is in #1070, but pending some more fundamental decision on how to cache views.
from anndata.
@grst I'm a little lost with where things are! My understanding from #1035 is that we don't need to create array classes, and instead can simply shallow copy arrays where appropriate. Is that no longer the case?
from anndata.
@agoose77, it is still the case and this is what is proposed in #1070.
However, @ivirshup pointed out in #1070 (comment) that if the view is created only when the data is accessed, it is impossible to assign any data to a record, e.g. using
v.obsm["awk"]["b"] = [5, 6]
because a new shallow copy is created every time v.obsm["awk"] is accessed.
This is why, in the most recent version, I suggested creating the shallow copy already when the AnnDataView is created, i.e. on
v = a[:2]
in the same example, which Isaac referred to as "caching" the view.
But apparently this might conflict with the proposed fix for an issue with garbage collection (#1119).
In any case, I don't think we need anything from the awkward side; we just need to find a way to reconcile cached views with garbage collection.
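The difference between copying on every access and copying once at view creation can be sketched with plain dictionaries. Everything here is a hypothetical toy: `FreshCopyView`, `CachedCopyView` and `subset_copy` are illustrative names, and a record array is modelled as a dict of columns rather than an awkward array.

```python
def subset_copy(obsm, n):
    """Build new outer containers holding the first n rows of every column."""
    return {
        key: {field: column[:n] for field, column in record.items()}
        for key, record in obsm.items()
    }


class FreshCopyView:
    """Hypothetical view that builds a new copy on every access."""

    def __init__(self, parent_obsm, n):
        self._parent_obsm = parent_obsm
        self._n = n

    @property
    def obsm(self):
        return subset_copy(self._parent_obsm, self._n)


class CachedCopyView:
    """Hypothetical view that builds the copy once, when the view is created."""

    def __init__(self, parent_obsm, n):
        self.obsm = subset_copy(parent_obsm, n)


parent = {"awk": {"a": [1, 2, 3]}}

fresh = FreshCopyView(parent, 2)
fresh.obsm["awk"]["b"] = [5, 6]    # written to a temporary that is discarded
print("b" in fresh.obsm["awk"])    # False

cached = CachedCopyView(parent, 2)
cached.obsm["awk"]["b"] = [5, 6]   # written to the copy the view keeps
print("b" in cached.obsm["awk"])   # True
print("b" in parent["awk"])        # False
```

In the cached variant the view holds a reference to the copy for its whole lifetime, which is the kind of retained state that has to be weighed against the garbage-collection concern mentioned above.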
from anndata.
This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!
from anndata.