scverse / squidpy

Spatial Single Cell Analysis in Python

Home Page: https://squidpy.readthedocs.io/en/stable/

License: BSD 3-Clause "New" or "Revised" License

Python 99.92% Shell 0.08%

single-cell-rna-seq single-cell-genomics spatial-transcriptomics spatial-analysis image-analysis data-visualization squidpy

squidpy's Introduction

Squidpy - Spatial Single Cell Analysis in Python

Squidpy is a tool for the analysis and visualization of spatial molecular data. It builds on top of scanpy and anndata, from which it inherits modularity and scalability. It provides analysis tools that leverage the spatial coordinates of the data, as well as tissue images if available.

Visit our documentation for installation, tutorials, examples and more.

Squidpy is part of the scverse project (website, governance) and is fiscally sponsored by NumFOCUS. Please consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.

Manuscript

Please see our manuscript Palla, Spitzer et al. (2022) in Nature Methods to learn more.

Squidpy's key applications

Build and analyze the neighborhood graph from spatial coordinates.
Compute spatial statistics for cell-types and genes.
Efficiently store, analyze and visualize large tissue images, leveraging skimage.
Interactively explore anndata and large tissue images in napari.

Installation

Install Squidpy via PyPI by running:

pip install squidpy
# or with napari included
pip install 'squidpy[interactive]'

or via Conda as:

conda install -c conda-forge squidpy

Contributing to Squidpy

We are happy about any contributions! Before you start, check out our contributing guide.

squidpy's Issues

unit test and cleaning spatial_tools/graph/nhood.py

tutorial on insights of spatial graph

notebook on developed functionalities, explaining how to use them and analysis highlights

feature_evaluation grid search

Description

The cropping functions in manipulate.py and get_image_feature.py now take **kwargs, so multiple cropping parameters can be passed easily when performing a grid search. What remains is the evaluation script in hs_feature_evaluation.ipynb.

Add function to compute cluster quality score for grid search comparisons
finish the evaluation loop in hs_feature_evaluation.ipynb computing features, clusters and cluster quality score for each crop setting. See notebook for more detail.

NOTE: multithreading will most likely be needed to make computing all features feasible.

Counting by edges inflating z-scores for high number of n_rings.

The function permtest_leiden_pairs works well for nodes so far. However, it reports an inflated number of observed edges between leiden pairs when n_rings is 2 or higher. This can be inspected visually in Visium cases, where the top selected leiden pairs are actually separated by many nodes.

spatial_connectivity(adata, n_rings=2)
permtest_leiden_pairs(adata, count_option='edges',
                      print_log_each=25, n_permutations=n_permutations)

I can attach images for this issue. However, it needs to be tackled before the edges mode can be accepted.
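
To make the suspected effect concrete, here is a minimal sketch on a toy chain of five spots. It is not the package's spatial_connectivity or permtest_leiden_pairs implementation; the helpers expand_rings and count_label_edges are hypothetical names introduced only for illustration. With two rings, spots separated by an intermediate spot count as connected, so two labels that never touch directly still receive an edge.

```python
import numpy as np

def expand_rings(adj, n_rings):
    """Return a boolean adjacency linking nodes within n_rings hops (no self loops)."""
    adj = np.asarray(adj, dtype=bool)
    reach = adj.copy()
    frontier = adj.copy()
    for _ in range(n_rings - 1):
        # One more hop: neighbours of the current frontier.
        frontier = (frontier.astype(int) @ adj.astype(int)) > 0
        reach |= frontier
    np.fill_diagonal(reach, False)
    return reach

def count_label_edges(adj, labels, a, b):
    """Count undirected edges between nodes labelled a and nodes labelled b (a != b)."""
    labels = np.asarray(labels)
    mask = np.outer(labels == a, labels == b)
    return int((adj & mask).sum())

# Toy chain 0-1-2-3-4 with labels A, A, C, B, B: A and B never touch directly.
chain = np.zeros((5, 5), dtype=bool)
for i in range(4):
    chain[i, i + 1] = chain[i + 1, i] = True
labels = ["A", "A", "C", "B", "B"]

edges_1 = count_label_edges(expand_rings(chain, 1), labels, "A", "B")
edges_2 = count_label_edges(expand_rings(chain, 2), labels, "A", "B")
```

In this toy case edges_1 is 0 and edges_2 is 1: the extra edge links spots 1 and 3, which are separated by spot 2.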

unit test and cleaning spatial_tools/image/tools.py

define setup file

would like to know which packages are optional and should go into functions and which are not

unit test and cleaning spatial_tools/graph/clustering.py

calculate features from image crop

choose sensible features to calculate e.g. from https://scikit-image.org/docs/dev/api/skimage.feature.html
Interesting features to start with could be these ones:
- https://scikit-image.org/docs/0.7.0/api/skimage.feature.texture.html
- Various statistics on channel images (mean, std, percentiles etc.)
implement functions calculating features from image crop

Is normalization (e.g. histogram normalization) needed?
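
As a starting point for the channel statistics mentioned above, the following sketch computes per-channel mean, standard deviation and percentiles for one crop. The function name summary_features and the feature key naming are assumptions for illustration; texture features from skimage are not covered here.

```python
import numpy as np

def summary_features(crop, percentiles=(10, 50, 90)):
    """Per-channel mean, std and percentiles for an (H, W, C) image crop."""
    crop = np.asarray(crop, dtype=float)
    features = {}
    for c in range(crop.shape[-1]):
        channel = crop[..., c]
        features[f"ch{c}_mean"] = float(channel.mean())
        features[f"ch{c}_std"] = float(channel.std())
        for p in percentiles:
            features[f"ch{c}_p{p}"] = float(np.percentile(channel, p))
    return features

# Synthetic 4x4 RGB crop: channel 1 is constant at 2, the others are zero.
crop = np.zeros((4, 4, 3))
crop[..., 1] = 2.0
feats = summary_features(crop)
```

Returning a flat dict per crop makes it straightforward to stack the results for many spots into one table later.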

Multi threading of feature calculation

Implement a multi-threaded/processed version of the feature calculation code.

Description:

The current function "get_features_abt.py" in ./spatial_tools/image/tool.py accumulates feature results from multiple spot ids into one data frame by looping through the spot_ids and adding them sequentially.
For certain features, e.g. HOG, this takes too long. The notebook hs_extract_image_features_table-multi-threading.ipynb currently contains a multi-threaded version, but it does not improve speed much.

Possible reasons for not working:

currently all threads/processes are accessing the same image to extract crop and features in parallel. This might cause queuing times. A possible solution would be to pre-compute the crops and then apply parallelization.
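
The proposed fix (pre-compute the crops, then parallelize) could look roughly like the sketch below. All function names are hypothetical and the feature is a placeholder mean intensity; this is not the code from the notebook.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def extract_crop(image, center, half_size):
    """Copy a square crop around (row, col), clipped to the image bounds."""
    r, c = center
    r0, r1 = max(r - half_size, 0), min(r + half_size + 1, image.shape[0])
    c0, c1 = max(c - half_size, 0), min(c + half_size + 1, image.shape[1])
    return image[r0:r1, c0:c1].copy()

def crop_feature(crop):
    """Placeholder feature: mean intensity of the crop."""
    return float(crop.mean())

def features_for_spots(image, centers, half_size=2, max_workers=4):
    # Step 1: pre-compute all crops sequentially, so workers never share the image.
    crops = [extract_crop(image, ctr, half_size) for ctr in centers]
    # Step 2: compute features on the independent crops in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(crop_feature, crops))

image = np.arange(100, dtype=float).reshape(10, 10)
centers = [(2, 2), (5, 5), (9, 9)]
values = features_for_spots(image, centers)
```

Threads are used here only to keep the sketch simple. For CPU-bound features written in pure Python, swapping in ProcessPoolExecutor is likely needed to see a real speed-up, at the cost of pickling each crop.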

clustering accounting for spatial coordinates

Not very clear idea, but something along these lines: https://www.biorxiv.org/content/10.1101/2020.09.04.283812v1
Maybe a way to achieve similar results without explicit modelling and inference. It's essentially a smoothing of cluster assignments on spatial coordinates.
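
One simple reading of "smoothing of cluster assignments on spatial coordinates" is a majority vote over each spot's nearest spatial neighbours. The sketch below shows that idea only; it is an assumption about what such smoothing could mean, not the method of the linked preprint, and smooth_labels is a hypothetical name.

```python
import numpy as np

def smooth_labels(coords, labels, k=4):
    """Replace each label by the majority label among the spot and its k nearest neighbours."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances (fine for small n; use a KD-tree for large data).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Indices of the k + 1 closest spots; the spot itself is at distance 0.
    nearest = np.argsort(dist, axis=1)[:, : k + 1]
    smoothed = np.empty_like(labels)
    for i, idx in enumerate(nearest):
        values, counts = np.unique(labels[idx], return_counts=True)
        # Ties are resolved in favour of the smallest label value.
        smoothed[i] = values[np.argmax(counts)]
    return smoothed

# Seven spots on a line; the isolated label 1 in the middle gets smoothed away.
coords = [(x, 0.0) for x in range(7)]
labels = np.array([0, 0, 0, 1, 0, 0, 0])
smoothed = smooth_labels(coords, labels, k=2)
```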

tutorial / analysis of image features

write a clean tutorial on how to use the implemented features
analysis: combine genes + features / compare clusters, overlap, etc. Look if we can find interesting things in the data with the new image features!

unit test and cleaning spatial_tools/image/_utils.py

calculate features for all spots in one image

desired output: .obsm or layer in adata object.
consider efficiency / parallelisation?

relate image features to discrete annotations (e.g. clusters) or continuous annotations (genes)

@LuckyMD point during GM: provide ways to relate image features to discrete annotation, such as clusters, found in either gene expression space or image feature space, or continuous annotations, such as genes. Ways that this could be done:

correlate e.g. marker genes with image features @LouisK92
regression framework: Y = AX + B, where X are image features and Y are discrete or continuous annotations.
others
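
For the regression option, a minimal least-squares sketch on synthetic data is shown below. It uses the row-per-spot convention Y ≈ X @ A + B, and fit_linear is a hypothetical helper. It covers continuous annotations only; discrete annotations would need one-hot targets or a classifier instead.

```python
import numpy as np

def fit_linear(X, Y):
    """Least-squares fit of Y ≈ X @ A + B; returns (A, B)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Append a column of ones so the intercept B is estimated jointly with A.
    design = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[:-1], coef[-1]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # 50 spots, 3 image features (synthetic)
true_A = np.array([[1.0], [-2.0], [0.5]])
Y = X @ true_A + 3.0                    # one continuous annotation, e.g. a gene
A, B = fit_linear(X, Y)
```

Because the synthetic Y is noise-free, the fit recovers true_A and the intercept 3.0 up to floating-point error.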

unit test and cleaning spatial_tools/graph/build.py

Assortativity measures and others

Measures of assortativity, also called homophily, evaluate the similarity of connections based on node attributes; see the networkx docs.
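
For reference, the attribute assortativity coefficient can be computed directly from the normalized mixing matrix of an edge list. The sketch below does this with numpy only, so it does not depend on networkx; attribute_assortativity is a hypothetical helper name.

```python
import numpy as np

def attribute_assortativity(edges, labels):
    """Newman's assortativity coefficient for a categorical node attribute.

    edges: iterable of undirected (i, j) pairs; labels: label per node.
    Undefined (division by zero) if all edge endpoints share one category.
    """
    categories = sorted(set(labels))
    index = {cat: k for k, cat in enumerate(categories)}
    mixing = np.zeros((len(categories), len(categories)))
    for i, j in edges:
        # Count each undirected edge in both directions so the matrix is symmetric.
        mixing[index[labels[i]], index[labels[j]]] += 1
        mixing[index[labels[j]], index[labels[i]]] += 1
    mixing /= mixing.sum()
    a = mixing.sum(axis=1)
    b = mixing.sum(axis=0)
    expected = float(a @ b)
    return (np.trace(mixing) - expected) / (1.0 - expected)

labels = ["A", "A", "B", "B"]
r_same = attribute_assortativity([(0, 1), (2, 3)], labels)   # only like-with-like edges
r_mixed = attribute_assortativity([(0, 2), (1, 3)], labels)  # only A-B edges
```

The coefficient is 1 when edges only join nodes of the same type and -1 in this fully mixed two-type example.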

evaluation of different features

Some ideas:

Without ground truth: look at how good a clustering based on the features is (eg. silhouette score?)
With ground truth: Use gene expression space as GT (e.g. different cell types in brain have different morphology), and compare feature clustering to cell type clusters
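
For the option without ground truth, a small numpy sketch of the mean silhouette coefficient is given below. The function name mean_silhouette is an assumption, and the quadratic distance matrix is only suitable for small inputs; in practice a library implementation would be the more likely choice.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distances (needs >= 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: silhouette is taken as 0
        a = dist[i, same].mean()  # mean distance within the own cluster
        b = min(dist[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

# Two tight, well-separated groups of feature vectors.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = mean_silhouette(X, [0, 0, 1, 1])   # clustering matches the groups
bad = mean_silhouette(X, [0, 1, 0, 1])    # clustering cuts across the groups
```

A clustering that matches the structure of the feature space scores close to 1, while one that cuts across it scores below 0.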

Account for cavities in crops and spatial graph

Basically, if you extract features on a crop larger than the spot size, how do you account for parts of the crop that no longer contain tissue?
The same applies to the spatial graph: sometimes the cavity may be a blood vessel and you want to keep the spots close to each other.

unit test and cleaning spatial_tools/graph/ppatterns.py

implement methods for segmenting cells / nuclei from histopathological tissue images / fluorescence images

Benefit: knowledge of the cell count in each spot is very important for simulation for deconvolution, and for analysing data in general. On top of cell segmentations, we could calculate shape / size statistics as additional features.

look into different methods for cell segmentation:
- StarDist (GitHub: mpicbg-csbd/stardist), object detection with star-convex shapes
- watershed https://scikit-image.org/docs/dev/auto_examples/segmentation/plot_watershed.html
calculate features based on cell segmentation

apply feature extraction functions to seqfish data

make sure that functions implemented for visium make sense for other modalities as well
evaluate image features for seqfish

extract image crop centered around spot

allow crop to have variable size
allow crop to have variable scale
optional masking of crop with spot circle
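
The three requirements above can be sketched in one function. This is a minimal illustration under stated assumptions: crop_spot and its parameters are hypothetical, out-of-image pixels are zero-padded, and rescaling is plain nearest-neighbour sampling rather than a proper interpolation.

```python
import numpy as np

def crop_spot(image, center, size, scale=1.0, mask_circle=False):
    """Square crop of side `size` px around (row, col), rescaled by `scale`.

    Pixels outside the image are zero-padded; rescaling is nearest-neighbour.
    """
    image = np.asarray(image)
    row, col = center
    half = size // 2
    out = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    # Source window clipped to the image, and where it lands in the output.
    r0, c0 = row - half, col - half
    rs, re = max(r0, 0), min(r0 + size, image.shape[0])
    cs, ce = max(c0, 0), min(c0 + size, image.shape[1])
    out[rs - r0:re - r0, cs - c0:ce - c0] = image[rs:re, cs:ce]
    if mask_circle:
        # Zero everything outside the circle inscribed in the crop.
        yy, xx = np.ogrid[:size, :size]
        centre = (size - 1) / 2
        outside = (yy - centre) ** 2 + (xx - centre) ** 2 > (size / 2) ** 2
        out[outside] = 0
    if scale != 1.0:
        new = max(int(round(size * scale)), 1)
        idx = np.minimum((np.arange(new) / scale).astype(int), size - 1)
        out = out[np.ix_(idx, idx)]
    return out

image = np.arange(100).reshape(10, 10)
plain = crop_spot(image, center=(5, 5), size=3)
edge = crop_spot(image, center=(0, 0), size=3)          # partly outside the image
half_res = crop_spot(image, center=(5, 5), size=4, scale=0.5)
masked = crop_spot(np.ones((10, 10)), center=(5, 5), size=5, mask_circle=True)
```

Here the circle is inscribed in the crop; masking with the actual spot diameter would need the radius as an extra parameter.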

test_get_image_features is failing

The error occurs in this line: spot_diameter = adata.uns['spatial'][dataset_name]['scalefactors']['spot_diameter_fullres']
It's failing because the dummy adata does not have `.uns['spatial']` containing the spot diameters.

Function to build graph from spatial coordinates

Core function to compute the adjacency matrix from a spatial coordinates array.

It should account for the type of spatial data (hex Visium spots or general spatial coordinates like FISH/IMC).
It should also allow flexibility in selecting the number of neighbors (e.g. for Visium, the number of rings of spots surrounding each spot; for others, the total number of neighbors).

You can look at an initial prototype from Isaac here: scverse/scanpy@c117508

However, we might want to incorporate a notion of radius (in pixel coordinates) that would express better distances in space.
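
A rough sketch of such a core function, supporting either a radius or a fixed number of neighbours, is shown below. It is not the prototype referenced above; spatial_adjacency is a hypothetical name, and the dense distance matrix would need replacing by a KD-tree or sparse structure for real datasets.

```python
import numpy as np

def spatial_adjacency(coords, radius=None, n_neighbors=None):
    """Boolean adjacency from coordinates, by radius or by k nearest neighbours."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-connections
    if radius is not None:
        return dist <= radius
    if n_neighbors is None:
        raise ValueError("provide either radius or n_neighbors")
    adj = np.zeros(dist.shape, dtype=bool)
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    adj[np.arange(len(coords))[:, None], nearest] = True
    return adj | adj.T  # symmetrise the k-NN graph

# 5x5 hexagonal grid with unit spacing, mimicking the Visium spot layout.
hex_coords = [(col + 0.5 * (row % 2), row * np.sqrt(3) / 2)
              for row in range(5) for col in range(5)]
adj = spatial_adjacency(hex_coords, radius=1.01)
knn = spatial_adjacency(hex_coords, n_neighbors=3)
```

With a radius slightly above the spot spacing, the interior spot at index 12 gets exactly its six hexagonal neighbours, which corresponds to one ring.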

improvement graphs

Agenda 1-10

Points to be discussed

images:
- features?
spatial graph:
- Integrate cellphoneDb/omnipath to neighborhood enrichment to provide ligand receptor pair in a straightforward way (Giovanni)
- Ripley function relies on Astropy: include it as a requirement or re-implement from scratch?
- Clustering with spatial coordinate: remove or adapt?
plotting:
- move outside of both graph and image?
General:
- Include other people?

Permutation based test

Permutation based test as in histoCAT to compute neighborhood enrichment based on clusters in gene expression space. For reference, see paper
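
The general shape of such a permutation test can be sketched as follows: count edges between every pair of clusters, repeat after shuffling the cluster labels, and report z-scores. This is an illustration in the spirit of the test described above, not a reimplementation of histoCAT; nhood_enrichment here is a hypothetical standalone function.

```python
import numpy as np

def nhood_enrichment(adj, labels, n_permutations=200, seed=0):
    """z-scores of observed cluster-pair edge counts against label permutations."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)

    def pair_counts(lab):
        # One-hot encode labels; onehot.T @ adj @ onehot sums edges per cluster pair.
        onehot = (lab[:, None] == clusters[None, :]).astype(float)
        return onehot.T @ adj @ onehot

    observed = pair_counts(labels)
    rng = np.random.default_rng(seed)
    permuted = np.stack([pair_counts(rng.permutation(labels))
                         for _ in range(n_permutations)])
    std = permuted.std(axis=0)
    std[std == 0] = np.nan  # avoid division by zero for constant counts
    return (observed - permuted.mean(axis=0)) / std

# Chain of 20 spots: the first ten belong to cluster 0, the last ten to cluster 1.
n = 20
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1
labels = np.array([0] * 10 + [1] * 10)
z = nhood_enrichment(adj, labels)
```

In this spatially segregated toy example the within-cluster entries of z are positive (enriched) and the between-cluster entries are negative (depleted).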

Ripley measure

see #10

Other measures of nhood enrichment analysis

Find and implement other measures of nhood enrichment (e.g. Ripley's K value, from here )
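
For orientation, a naive estimate of Ripley's K fits in a few lines of numpy. The sketch below deliberately omits edge correction, so it underestimates K near the boundary of the study area; ripley_k is a hypothetical name and this is not the Astropy implementation.

```python
import numpy as np

def ripley_k(coords, radii, area):
    """Naive Ripley's K estimate (no edge correction) for 2D points."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    # K(r) = area / (n * (n - 1)) * number of ordered pairs within distance r
    return np.array([area * (dist <= r).sum() / (n * (n - 1)) for r in radii])

# Four points on the corners of the unit square.
corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
k = ripley_k(corners, radii=[0.5, 1.0, 1.5], area=1.0)
```

Under complete spatial randomness K(r) is approximately pi * r**2, so values above that curve suggest clustering and values below suggest regular spacing.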

Also ideas from previous @LuckyMD works are very interesting: https://academic.oup.com/bioinformatics/article/34/6/994/4590025

Additional features for pl.spatial function

Additional function parameters / changed functionality / changed defaults?
New analysis tool: A simple analysis tool you have been using and are missing in sc.tools?
New plotting function: A kind of plot you would like to see in sc.pl?
External tools: Do you know an existing package that should go into sc.external.*?
Other?

Here are some features that we should add after the scatterplot module is refactored:

Add scalebar on the tissue image as suggested on Twitter.
Change default to “plot hires if available, else lowres if available, else nothing”.
Add alpha smoothing for continuous features as in Seurat.

I'll modify this issue if other ideas pop up.

unit test and cleaning spatial_tools/utils.py

Logging

Shall we have a specific logging class like scanpy, or should standard logging do the job?
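
For comparison, the standard-library option needs very little code. The sketch below assumes the logger is named after the package (spatial_tools, as in the file paths in these issues); get_logger is a hypothetical helper.

```python
import logging

def get_logger(name="spatial_tools", level=logging.INFO):
    """Return a package-level logger with a single stream handler."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s [%(name)s] %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger

log = get_logger()
log.info("computing spatial graph")
```

Modules inside the package can then call logging.getLogger("spatial_tools.graph") and inherit the handler and level through the logger hierarchy.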

package API

should we define an API of operations that can be run on adata objects? at least in images there is mostly array manipulating functions right now that could be wrapped for this or which could have their own API?

efficient extraction of crops for all spots in one image

Questions:

which data structure to use?
How to pass data between this step and the feature calculation / the cell segmentation?

Napari for visualizing images

Look at Napari for visualizing images: https://github.com/napari/napari

dask for crop extraction on disk

figure out how to use xarray on image on disk so that dask can load individual crops into memory

Agenda 8-10

roadmap

we now have a roadmap, check it out in projects

development practices:

is it good time for typing?
remember to format with black and use pylint
how does counting contributions in the repo work? It seems that whoever squashes and merges is counted as the contributor, not whoever actually wrote the code. Does that make sense? Shall we always have the author of the PR merge?
need to discuss push rights
need to discuss a common dataset to be stored in the repo for testing and CI

next steps

discuss next tasks for everyone

centralise plotting function for both graph and image

usage of plotting functionalities will look like:
spatial-tools.pl.plotting_function(...) / st.pl.plotting_functions(...)

scaling of image features

Problem: Distributions of different features extracted from images might have very different value magnitudes. This will introduce biases etc. in downstream analyses (e.g. clustering).

Ideally we can scale all features between 0 and 1. This might be tricky: if a feature has a natural and reasonable min and max, it's easily done. For other features we could set 0 and 1 according to the observed min and max values (not ideal imo). Happy to discuss further.
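
Both variants can be combined in one helper: use natural bounds where they are known and fall back to the observed range otherwise. The sketch below is one possible shape for this; minmax_scale and the bounds argument are assumptions for illustration.

```python
import numpy as np

def minmax_scale(features, bounds=None):
    """Scale each feature column to [0, 1].

    bounds: optional {column_index: (lo, hi)} with natural limits; other
    columns fall back to their observed min and max.
    """
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    for col, (b_lo, b_hi) in (bounds or {}).items():
        lo[col], hi[col] = b_lo, b_hi
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns map to 0
    return np.clip((features - lo) / span, 0.0, 1.0)

raw = np.array([[0.0, 100.0], [128.0, 300.0], [64.0, 200.0]])
scaled = minmax_scale(raw, bounds={0: (0.0, 255.0)})  # column 0: 8-bit intensity
```

Scaling by the observed range makes the result depend on the dataset, so values from different images are only comparable for columns with fixed bounds.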

kernel based nuclei detector for segmentation-free segment detection

try scikit learn blob extractor
use this for cell counting
use this for labelling

change sample data for test

https://github.com/theislab/spatial-tools/blob/bcd6e7f3940b7fa9acf42ca3e1a3dbe3297de009/spatial_tools/tests/test_graph_nhood.py#L7

@AnnaChristina just a note, but it would be best not to use visium_sge data but some dummy data (e.g. generated with numpy.random with random seeds) for the test. Could you modify that if possible? No rush.
I went with this, but it may not be optimal for you: def get_dummy_data():

unit test and cleaning spatial_tools/image/manipulate.py

statistic on node type connections in graph

when looking further into graph measures and metrics, I saw this networkx function:
https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.assortativity.node_attribute_xy.html#networkx.algorithms.assortativity.node_attribute_xy

would be quite easy to extract the clusters, i.e. node types which are often connected in the graph to maybe conclude something on cell-cell-interactions

e.g. you can use this information to also check if specific genes show higher expressions if those cell types are close to each other

image plotting functions

which plotting functions do we need for the images?

plot feature (from adata.obsm) on image
- for downscaled pngs this already works with scanpy
plot overlays on image (e.g. segmentation)
plot image crops (with features / overlays)
- useful to speed up plotting or view details

for interactive plotting we could use napari, but I think that some static plotting functions would be useful as well. Then this would behave in a very similar way to scanpy, making it easy for the users to adopt.

Agenda 24-09

Points to be discussed today with the group @hspitzer :

package organisation
- new package api - @davidsebfischer comment in the API-Issue #39
- cleanup / merging of scripts (_utils.py vs tools.py) + notebook organisation (naming convention + separate folders for devel and finished tutorials)
programming best practices
- settle on one way of doing docstrings
- write tests
- develop in feature branches + create PRs.
make sure everybody knows what they can/should be working on.

Immediate next steps (for everyone):

adapt to new api, cleanup files
document + test

functional API cropping

I think cropping should be possible on the adata object, i.e. creating new img entries, so that cropping can be varied in functional API workflows, e.g. load->crop->segment->uncrop->plot. I already have an uncrop function on the segmentation branch; we just need to wrap these in functions that operate on adata.

spatial-tools 0.0.1 requires scikit-image>=0.17.1, but you'll have scikit-image 0.15.0 which is incompatible.

probably not ideal