
squidpy's Introduction


Squidpy - Spatial Single Cell Analysis in Python

Squidpy is a tool for the analysis and visualization of spatial molecular data. It builds on top of scanpy and anndata, from which it inherits modularity and scalability. It provides analysis tools that leverage the spatial coordinates of the data, as well as tissue images if available.

Visit our documentation for installation, tutorials, examples and more.

Squidpy is part of the scverse project (website, governance) and is fiscally sponsored by NumFOCUS. Please consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.


Manuscript

Please see our manuscript Palla, Spitzer et al. (2022) in Nature Methods to learn more.

Squidpy's key applications

  • Build and analyze the neighborhood graph from spatial coordinates.
  • Compute spatial statistics for cell-types and genes.
  • Efficiently store, analyze and visualize large tissue images, leveraging skimage.
  • Interactively explore anndata and large tissue images in napari.

Installation

Install Squidpy via PyPI by running:

pip install squidpy
# or with napari included
pip install 'squidpy[interactive]'

or via Conda as:

conda install -c conda-forge squidpy

Contributing to Squidpy

We are happy about any contributions! Before you start, check out our contributing guide.


squidpy's Issues

feature_evaluation grid search

Description

The cropping functions in manipulate.py and get_image_feature.py now take **kwargs, which makes it easy to pass multiple cropping parameters when performing a grid search. What remains is the evaluation script in hs_feature_evaluation.ipynb.

  1. Add a function to compute a cluster quality score for grid search comparisons.
  2. Finish the evaluation loop in hs_feature_evaluation.ipynb, computing features, clusters and the cluster quality score for each crop setting. See the notebook for more detail.

NOTE: most likely one needs multithreading for this to be feasible across all features.

Counting by edges inflating z-scores for high number of n_rings.

The function permtest_leiden_pairs works well for nodes so far. However, it reports an inflated number of observed edges between leiden pairs when n_rings is 2 or higher. This can be visually inspected in Visium cases, where the top selected leiden pairs are actually separated by many nodes.

spatial_connectivity(adata, n_rings=2)
permtest_leiden_pairs(adata, count_option='edges',
                              print_log_each=25, n_permutations=n_permutations)

I can attach images for this issue. However, it needs to be tackled before the edges mode can be accepted.

define setup file

I would like to know which packages are optional, and should therefore be imported inside the functions that use them, and which are not.

Multi threading of feature calculation

Implement a multi-threaded/processed version of the feature calculation code.

Description:

  • The current function "get_features_abt.py" in ./spatial_tools/image/tool.py accumulates feature results from multiple spot ids into one data frame by looping through the spot_ids and adding them sequentially.

  • For certain features, e.g. HOG, this takes too long. The notebook hs_extract_image_features_table-multi-threading.ipynb currently contains a multi-threaded version, but it does not improve speed much.

Possible reasons for not working:

  1. Currently, all threads/processes access the same image to extract crops and features in parallel. This might cause queuing. A possible solution would be to pre-compute the crops and then apply parallelization.
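
The pre-computation idea in item 1 could look roughly like the sketch below. It is not the code from the notebook: `precompute_crops`, `mean_intensity` and `features_in_parallel` are hypothetical names, and the mean intensity is only a cheap stand-in for an expensive feature such as HOG.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def precompute_crops(image, centers, size):
    """Cut one square crop per spot up front, so workers never touch the full image."""
    half = size // 2
    crops = []
    for row, col in centers:
        r0, c0 = max(row - half, 0), max(col - half, 0)
        # .copy() detaches the crop from the shared image array.
        crops.append(image[r0:row + half, c0:col + half].copy())
    return crops


def mean_intensity(crop):
    """Hypothetical stand-in for an expensive per-crop feature."""
    return float(crop.mean())


def features_in_parallel(crops, feature_fn, max_workers=4):
    """Apply feature_fn to every crop in a worker pool, preserving crop order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(feature_fn, crops))


image = np.arange(100, dtype=float).reshape(10, 10)
centers = [(2, 2), (5, 5), (8, 8)]
crops = precompute_crops(image, centers, size=4)
features = features_in_parallel(crops, mean_intensity)
```

Threads only help when the feature function releases the GIL (as many NumPy operations do); for pure-Python feature code, `ProcessPoolExecutor` from the same module would be the closer fit.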

tutorial / analysis of image features

  • write a clean tutorial on how to use the implemented features
  • analysis: combine genes + features / compare clusters, overlap, etc. Look if we can find interesting things in the data with the new image features!

relate image features to discrete annotations (e.g. clusters) or continuous annotations (genes)

@LuckyMD's point during GM: provide ways to relate image features to discrete annotations, such as clusters found in either gene expression space or image feature space, or to continuous annotations, such as genes. Ways that this could be done:

  • correlate e.g. marker genes with image features @LouisK92
  • regression framework: Y = AX + B, where X are image features and Y are discrete or continuous annotations.
  • others
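
As an illustration of the regression idea for a continuous annotation, here is a minimal least-squares sketch on synthetic data. The data, the coefficient values and all variable names are made up for the example; with spots as rows, the model Y = AX + B is written as `X @ A + B`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spots, n_features = 50, 3

# Synthetic image features (rows are spots) and a noiseless continuous annotation.
X = rng.normal(size=(n_spots, n_features))
true_A = np.array([2.0, -1.0, 0.5])
true_B = 3.0
y = X @ true_A + true_B

# Append a column of ones so the intercept B is fitted together with A.
X_design = np.column_stack([X, np.ones(n_spots)])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
A_hat, B_hat = coef[:-1], coef[-1]
```

For discrete annotations such as clusters, a classifier (e.g. logistic regression) would replace the least-squares fit.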

evaluation of different features

Some ideas:

  • Without ground truth: look at how good a clustering based on the features is (e.g. silhouette score?)
  • With ground truth: Use gene expression space as GT (e.g. different cell types in brain have different morphology), and compare feature clustering to cell type clusters
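
For the case without ground truth, the silhouette score can be sketched directly in NumPy as below. This is a plain reference version for small inputs, assuming Euclidean distances and at least two clusters; in practice a library implementation would probably be used instead.

```python
import numpy as np


def silhouette_score(features, labels):
    """Mean silhouette over all points (Euclidean distance, at least two clusters)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise distance matrix; fine for small examples, O(n^2) memory.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    unique = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters keep a score of 0
        # a: mean distance to the other members of the point's own cluster.
        a = dists[i, same].sum() / (n_same - 1)
        # b: mean distance to the nearest other cluster.
        b = min(dists[i, labels == other].mean() for other in unique if other != labels[i])
        if max(a, b) > 0:
            scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_score(points, [0, 0, 1, 1])  # clusters match the geometry
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters cut across it
```

A score close to 1 indicates compact, well-separated clusters; values near or below 0 indicate that the clustering does not follow the structure of the feature space.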

Account for cavities in crops and spatial graph

Basically, if you extract features on a crop larger than the spot size, how do you account for parts of the crop that no longer contain tissue?
The same applies to the spatial graph: sometimes the cavity is a blood vessel, and you may still want to treat the spots on either side as close to each other.
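
One possible way to handle the crop case, assuming a boolean tissue mask is available, is to measure how much of each crop is tissue and to down-weight or skip crops below a threshold. `tissue_fraction` is a hypothetical helper written for this sketch.

```python
import numpy as np


def tissue_fraction(tissue_mask, center, size):
    """Fraction of pixels inside a square crop that the mask marks as tissue."""
    half = size // 2
    row, col = center
    crop = tissue_mask[max(row - half, 0):row + half, max(col - half, 0):col + half]
    return float(crop.mean()) if crop.size else 0.0


# Toy mask: the left half of the image is tissue, the right half is a cavity.
mask = np.zeros((8, 8), dtype=bool)
mask[:, :4] = True

inside = tissue_fraction(mask, (4, 2), size=4)
border = tissue_fraction(mask, (4, 4), size=4)
cavity = tissue_fraction(mask, (4, 6), size=4)
```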

implement methods for segmenting cells / nuclei from histopathological tissue images / fluorescence images

Benefit: knowledge of the cell count in each spot is very important for simulation for deconvolution, and for analysing data in general. On top of the cell segmentations, one could calculate shape / size statistics as additional features.

test_get_image_features is failing

The error occurs in this line: spot_diameter = adata.uns['spatial'][dataset_name]['scalefactors']['spot_diameter_fullres']
It's failing because the dummy adata does not have `.uns['spatial']` containing the spot diameters.
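
A fix would presumably give the dummy object the nested structure that the failing line expects. The sketch below builds that structure as a plain dict; `make_dummy_uns`, the dataset name and the diameter value are placeholders, and in the real test the result would be assigned to the dummy adata's `.uns`.

```python
def make_dummy_uns(dataset_name="dummy", spot_diameter=10.0):
    """Build the nested mapping read by the failing line (values are placeholders)."""
    return {
        "spatial": {
            dataset_name: {
                "scalefactors": {"spot_diameter_fullres": spot_diameter},
            }
        }
    }


uns = make_dummy_uns()
# Same lookup as in the failing line, against the plain dict.
spot_diameter = uns["spatial"]["dummy"]["scalefactors"]["spot_diameter_fullres"]
```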

Function to build graph from spatial coordinates

Core function to compute the adjacency matrix from a spatial coordinates array.

  • It should account for the type of spatial data (hex Visium spots or general spatial coordinates like FISH/IMC).
  • It should also allow flexibility in selecting the neighborhood size (e.g. for Visium, the number of rings of spots surrounding a spot; for others, the total number of neighbors).

You can look at an initial prototype from Isaac here: scverse/scanpy@c117508

However, we might want to incorporate a notion of radius (in pixel coordinates) that would express better distances in space.
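
A radius-based variant could be sketched as follows. This is not the prototype linked above; `radius_adjacency` is a hypothetical name, and the dense distance matrix is only suitable for small inputs (a KD-tree would scale better).

```python
import numpy as np


def radius_adjacency(coords, radius):
    """Binary adjacency matrix linking points whose distance is at most `radius`."""
    coords = np.asarray(coords, dtype=float)
    # Dense pairwise Euclidean distances, in the same units as the coordinates.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adjacency = dists <= radius
    np.fill_diagonal(adjacency, False)  # no self-loops
    return adjacency.astype(int)


coords = [(0, 0), (1, 0), (2, 0), (5, 5)]
adj = radius_adjacency(coords, radius=1.5)
```

In this toy example the first three points form a chain and the fourth stays isolated, because it lies further than the radius from every other point.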

Agenda 1-10

Points to be discussed

  • images:
    • features?
  • spatial graph:
    • Integrate cellphoneDb/omnipath into neighborhood enrichment to provide ligand-receptor pairs in a straightforward way (Giovanni)
    • Ripley function relies on Astropy: include it as a requirement or re-implement from scratch?
    • Clustering with spatial coordinates: remove or adapt?
  • plotting:
    • move outside of both graph and image?
  • General:
    • Include other people?

Permutation based test

Permutation based test as in histoCAT to compute neighborhood enrichment based on clusters in gene expression space. For reference, see paper
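
The general shape of such a test can be sketched as below: count the edges between two cluster labels, shuffle the labels over the fixed graph to build a null distribution, and report a z-score. This is an illustration of the principle only, not the histoCAT implementation; the function names and the toy graph are assumptions.

```python
import numpy as np


def count_pair_edges(adjacency, labels, a, b):
    """Number of undirected edges joining a node labelled a to a node labelled b."""
    count = adjacency[np.ix_(labels == a, labels == b)].sum()
    # Within one label, the symmetric matrix lists every edge twice.
    return count / 2 if a == b else count


def enrichment_zscore(adjacency, labels, a, b, n_permutations=200, seed=0):
    """Z-score of the observed edge count against label permutations."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = count_pair_edges(adjacency, labels, a, b)
    null = np.array([
        count_pair_edges(adjacency, rng.permutation(labels), a, b)
        for _ in range(n_permutations)
    ])
    std = null.std()
    return float((observed - null.mean()) / std) if std > 0 else 0.0


# Toy chain graph of 8 nodes: the first four are type 0, the last four type 1.
n = 8
adjacency = np.zeros((n, n), dtype=int)
for i in range(n - 1):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

z_same = enrichment_zscore(adjacency, labels, 0, 0)
z_cross = enrichment_zscore(adjacency, labels, 0, 1)
```

Here the same-type pair is enriched (positive z-score) and the cross-type pair is depleted (negative z-score), since the two types occupy separate halves of the chain.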

Additional features for pl.spatial function

  • Additional function parameters / changed functionality / changed defaults?
  • New analysis tool: A simple analysis tool you have been using and are missing in sc.tools?
  • New plotting function: A kind of plot you would like to see in sc.pl?
  • External tools: Do you know an existing package that should go into sc.external.*?
  • Other?

Here are some features that we should add after the scatterplot module is refactored:

  • Add a scalebar on the tissue image, as suggested on Twitter.
  • Change the default to “plot hires if available, else lowres if available, else nothing”.
  • Add alpha smoothing for continuous features, as in Seurat.
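
The proposed default in the second bullet amounts to a simple fallback lookup. A minimal sketch, assuming the images are held in a mapping keyed by "hires" and "lowres" (`pick_image` is a hypothetical helper):

```python
def pick_image(images):
    """Return (key, image) for the best available resolution, or None if there is none."""
    for key in ("hires", "lowres"):
        if key in images:
            return key, images[key]
    return None


both = pick_image({"lowres": "low.png", "hires": "high.png"})
only_low = pick_image({"lowres": "low.png"})
nothing = pick_image({})
```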

I'll modify this issue if other ideas pop up.

Logging

Shall we have a specific logging class like scanpy, or would standard logging do the job?
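
For comparison, the standard-library route could be as small as the sketch below. The logger name `spatial_tools` is taken from the package path mentioned in another issue; `enable_verbose` is a hypothetical convenience function.

```python
import logging

# Library-level logger; the NullHandler keeps the package silent unless the user opts in.
logger = logging.getLogger("spatial_tools")
logger.addHandler(logging.NullHandler())


def enable_verbose(level=logging.INFO):
    """Attach a stream handler so that messages from the package become visible."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)


enable_verbose()
logger.info("computing image features")
```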

package API

Should we define an API of operations that can be run on adata objects? At least for images, there are mostly array-manipulating functions right now, which could be wrapped for this or could have their own API.

Agenda 8-10

roadmap

  • we now have a roadmap, check it out in projects

development practices:

  • is it a good time for typing?
  • remember to format with black and use pylint
  • how does counting contributions in the repo work? It seems that whoever squashes and merges is counted as the contributor, not whoever actually wrote the code. Does that make sense? Shall we always have the author of the PR do the merge?
  • need to discuss push rights
  • need to discuss a common dataset to be stored in the repo for testing and CI

next steps

  • discuss next tasks for everyone

scaling of image features

Problem: Distributions of different features extracted from images might have very different value magnitudes. This will introduce biases etc. in downstream analyses (e.g. clustering).

Ideally we can scale all features between 0 and 1. This might be tricky: if a feature has a natural and reasonable min and max, it's easily done. For other features we could set 0 and 1 according to the observed min and max values (not ideal imo). Happy to discuss further.
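
Both options can be covered by one small function: scale with known natural bounds when they exist, and fall back to the observed minimum and maximum otherwise. A sketch, with `minmax_scale` and its `bounds` argument as assumed names:

```python
import numpy as np


def minmax_scale(features, bounds=None):
    """Scale each feature column to [0, 1].

    bounds: optional list of (min, max) per column; if omitted, the observed
    column minimum and maximum are used.
    """
    features = np.asarray(features, dtype=float)
    if bounds is None:
        lo, hi = features.min(axis=0), features.max(axis=0)
    else:
        lo, hi = (np.asarray(b, dtype=float) for b in zip(*bounds))
    # Constant columns get a span of 1 to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return np.clip((features - lo) / span, 0.0, 1.0)


raw = [[0, 100], [5, 200], [10, 300]]
observed = minmax_scale(raw)
natural = minmax_scale(raw, bounds=[(0, 20), (0, 400)])
```

With observed bounds the result depends on the dataset at hand, so values are not comparable across datasets; fixed natural bounds avoid that.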

statistic on node type connections in graph

when looking further into graph measures and metrics, I saw this networkx function:
https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.assortativity.node_attribute_xy.html#networkx.algorithms.assortativity.node_attribute_xy

It would be quite easy to extract the clusters, i.e. node types, that are often connected in the graph, and maybe conclude something about cell-cell interactions.

E.g. you could also use this information to check whether specific genes show higher expression when those cell types are close to each other.
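
The underlying count is simple enough to sketch without networkx: for every edge, look up the types of its two end nodes and tally the unordered pair. `count_type_pairs` and the toy graph are made up for illustration.

```python
from collections import Counter


def count_type_pairs(edges, node_types):
    """Tally how often each unordered pair of node types is joined by an edge."""
    counts = Counter()
    for u, v in edges:
        # Sort so that ("A", "B") and ("B", "A") fall into the same bin.
        pair = tuple(sorted((node_types[u], node_types[v])))
        counts[pair] += 1
    return counts


edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
node_types = {0: "A", 1: "A", 2: "B", 3: "B"}
pair_counts = count_type_pairs(edges, node_types)
```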

image plotting functions

which plotting functions do we need for the images?

  • plot feature (from adata.obsm) on image
    • for downscaled pngs this already works with scanpy
  • plot overlays on image (e.g. segmentation)
  • plot image crops (with features / overlays)
    • useful to speed up plotting or view details

for interactive plotting we could use napari, but I think that some static plotting functions would be useful as well. Then this would behave in a very similar way to scanpy, making it easy for the users to adopt.

Agenda 24-09

Points to be discussed today with the group @hspitzer :

  • package organisation
    • new package api - @davidsebfischer comment in the API-Issue #39
    • cleanup / merging of scripts (_utils.py vs tools.py) + notebook organisation (naming convention + separate folders for devel and finished tutorials)
  • programming best practices
    • settle on one way of doing docstrings
    • write tests
    • develop in feature branches + create PRs.
  • make sure everybody knows what they can/should be working on.

Immediate next steps (for everyone):

  • adapt to new api, cleanup files
  • document + test

functional API cropping

I think cropping should be possible on the adata object, i.e. creating new img entries, so that cropping can be varied in functional API workflows, e.g. load->crop->segment->uncrop->plot. I already have an uncrop function on the segmentation branch; we just need to wrap these in functions that sit on adata.
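
At the array level, a crop/uncrop pair might look like the sketch below. It is not the function from the segmentation branch; the signatures are assumptions, and the adata wrappers discussed above are left out.

```python
import numpy as np


def crop(image, top, left, height, width):
    """Return a copy of the requested window together with its offset."""
    return image[top:top + height, left:left + width].copy(), (top, left)


def uncrop(cropped, offset, full_shape, fill=0):
    """Place a crop back into an array of the original shape, padding with `fill`."""
    top, left = offset
    out = np.full(full_shape, fill, dtype=cropped.dtype)
    out[top:top + cropped.shape[0], left:left + cropped.shape[1]] = cropped
    return out


image = np.arange(36).reshape(6, 6)
window, offset = crop(image, top=1, left=2, height=3, width=2)
restored = uncrop(window, offset, image.shape)
```

Keeping the offset next to the crop is what makes the uncrop step possible after intermediate operations such as segmentation.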

Typing

Yeah, we should probably do it.
