diffusion_curvature_old's Introduction

Diffusion Curvature

Note

The newest code for Diffusion Curvature may be found here. This repository is not maintained, and is preserved only as a monument to the past. There have since been significant enhancements to the diffusion curvature algorithm.

Diffusion curvature is a pointwise extension of Ollivier-Ricci curvature, designed specifically for the often messy world of pointcloud data. Its advantages include:

  1. Unaffected by density fluctuations in data: it inherits the diffusion operator’s denoising properties.
  2. Fast, and scalable to millions of points: it depends only on matrix powering - no optimal transport required.
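
To illustrate what "only matrix powering" means in practice, here is a minimal NumPy sketch. It is not this package's implementation: the helper name `diffusion_matrix`, the fixed-bandwidth Gaussian kernel, and the value of `sigma` are assumptions made for the example.

```python
import numpy as np

def diffusion_matrix(X, sigma=0.5):
    """Row-stochastic diffusion operator built from a Gaussian kernel (illustrative)."""
    # Pairwise squared Euclidean distances between all points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinities; sigma is an assumed fixed bandwidth.
    K = np.exp(-sq_dists / (2 * sigma**2))
    # Normalize each row so it is a probability distribution (one random-walk step).
    return K / K.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

P = diffusion_matrix(X)
# Row i of P_t is the distribution of a random walk started at point i after t steps.
t = 8
P_t = np.linalg.matrix_power(P, t)
```

Each row of `P_t` is the t-step diffusion from one point; the curvature estimates described in this README are computed from those rows.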

Install

To install with pip (or better yet, poetry),

pip install diffusion-curvature

or

poetry add diffusion-curvature

Conda releases are pending.

Usage

To compute diffusion curvature, first create a graphtools graph with your data. Graphtools offers extensive support for different kernel types (if creating from a pointcloud), and can also work with graphs in the PyGSP format. We recommend using anisotropy=1 and verifying that the supplied knn value encompasses a reasonable portion of the graph.

from diffusion_curvature.datasets import torus
import graphtools
X_torus, torus_gaussian_curvature = torus(n=5000)
G_torus = graphtools.Graph(X_torus, anisotropy=1, knn=30)
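
For intuition about what the anisotropy setting does, the sketch below applies the standard diffusion-maps density normalization to a dense Gaussian kernel: each affinity is divided by the estimated densities at both endpoints before row normalization. This is an assumption-laden illustration in plain NumPy, not graphtools' code; the function name `anisotropic_diffusion_matrix` and the bandwidth `sigma` are hypothetical.

```python
import numpy as np

def anisotropic_diffusion_matrix(X, sigma=0.5, alpha=1.0):
    """Diffusion operator with density normalization (alpha=1 removes density effects)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * sigma**2))
    # Row sums of the kernel act as a density estimate at each point.
    d = K.sum(axis=1)
    # Divide each affinity K[i, j] by (d[i] * d[j]) ** alpha.
    K_alpha = K / np.outer(d, d) ** alpha
    # Row-normalize to obtain a Markov (diffusion) matrix.
    return K_alpha / K_alpha.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Unevenly sampled data: a dense cluster next to a sparse one.
X = np.vstack([rng.normal(0.0, 0.3, size=(150, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])

P_plain = anisotropic_diffusion_matrix(X, alpha=0.0)  # no density correction
P_aniso = anisotropic_diffusion_matrix(X, alpha=1.0)  # full density correction
```

With `alpha=0` the function reduces to plain row normalization, so comparing `P_plain` and `P_aniso` shows how much the sampling density alone shapes the random walk.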

Graphtools offers many additional options. For large graphs, you can speed up the powering of the diffusion matrix with landmarking: simply pass n_landmark=1000 (for example) when creating the graphtools graph. If you enable landmarking, diffusion-curvature will automatically use it.

Next, instantiate a DiffusionCurvature operator.

from diffusion_curvature.graphtools import DiffusionCurvature
DC = DiffusionCurvature(t=12)

DiffusionCurvature

 DiffusionCurvature (t:int, distance_type='PHATE', dimest=None,
                     use_entropy:bool=False, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

| Parameter | Type | Default | Details |
|---|---|---|---|
| t | int | | Number of diffusion steps to use when measuring curvature. TODO: Heuristics |
| distance_type | str | PHATE | |
| dimest | NoneType | None | Dimensionality estimator to use. If None, defaults to KNN with default params |
| use_entropy | bool | False | If true, uses KL Divergence instead of Wasserstein Distances. Faster, seems empirically as good, but less proven. |
| kwargs | | | |

And, finally, pass your graph through it. The DiffusionCurvature operator will store everything it computes – the powered diffusion matrix, the estimated manifold distances, and the curvatures – as attributes of your graph. To get the curvatures, you can run G.ks.

G_torus = DC.curvature(G_torus, dimension=2) # note: this is the intrinsic dimension of the data
plot_3d(X_torus, G_torus.ks, colorbar=True, title="Diffusion Curvature on the torus")

Using on a predefined graph

If you have an adjacency matrix but no pointcloud, diffusion curvature may still be useful. The caveat, currently, is that our intrinsic dimension estimation doesn’t yet support graphs, so you’ll have to compute & provide the dimension yourself – if you want a signed curvature value.

If you’re only comparing relative magnitudes of curvature, you can skip this step.

For predefined graphs, we use our own ManifoldGraph class. You can create one straight from an adjacency matrix:

from diffusion_curvature.manifold_graph import ManifoldGraph, diffusion_curvature, diffusion_entropy_curvature, entropy_of_diffusion, wasserstein_spread_of_diffusion, power_diffusion_matrix, phate_distances
from diffusion_curvature.kernels import gaussian_kernel
import numpy as np
# if you want (or have) to compute your own A
A = gaussian_kernel(X_torus, kernel_type="adaptive", k = 20, anisotropic_density_normalization=1)
np.fill_diagonal(A,0)
# initialize the manifold graph; input your computed dimension along with the adjacency matrix
G_pure = ManifoldGraph(A = A, dimension=2, anisotropic_density_normalization=1)
G_pure = diffusion_curvature(G_pure, t=8)
plot_3d(X_torus, G_pure.ks, title = "Diffusion Curvature on Graph - without pointcloud")

Alternatively, to compute just the relative magnitudes of the pointwise curvatures (without signs), we can directly use either the wasserstein_spread_of_diffusion function (which computes the $W_1$ distance from a Dirac to its t-step diffusion) or the entropy_of_diffusion function (which computes the entropy of each t-step diffusion). The latter is nice when the manifold’s geodesic distances are hard to estimate – it corresponds to replacing the Wasserstein distance with the KL divergence.

Both of these estimate an “inverse laziness” value that is inversely proportional to curvature. To use magnitude estimations in which the higher the curvature, the higher the value, we can simply take the reciprocal of the output.

# for the wasserstein version, we need manifold distances
G_pure = power_diffusion_matrix(G_pure,t=8)
G_pure = phate_distances(G_pure)
ks_wasserstein = wasserstein_spread_of_diffusion(G_pure)
# for the entropic version, we need only power the diffusion operator
G_pure = power_diffusion_matrix(G_pure, t=8)
ks_entropy = entropy_of_diffusion(G_pure)
plot_3d(X_torus, 1/ks_entropy, title="Diffusion Entropy on Torus")
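
For intuition, both quantities are simple reductions over the rows of the powered diffusion matrix. The sketch below is plain NumPy, not the library's implementation: the helper names `entropy_of_rows` and `wasserstein_spread` are hypothetical, the kernel bandwidth is arbitrary, and Euclidean distances stand in for the PHATE manifold distances used above.

```python
import numpy as np

def entropy_of_rows(P_t, eps=1e-12):
    """Shannon entropy of each row, i.e. of each point's t-step diffusion."""
    return -np.sum(P_t * np.log(P_t + eps), axis=1)

def wasserstein_spread(P_t, D):
    """W1 distance from a Dirac at point i to row i of P_t.

    When one measure is a Dirac, the only transport plan moves all mass
    from point i, so the cost is sum_j P_t[i, j] * D[i, j].
    """
    return np.sum(P_t * D, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))

# Assumption: Euclidean distances stand in for manifold (geodesic) distances.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
K = np.exp(-D**2 / 0.5)
P = K / K.sum(axis=1, keepdims=True)
P_t = np.linalg.matrix_power(P, 8)

spread = wasserstein_spread(P_t, D)
entropy = entropy_of_rows(P_t)

# Take reciprocals so that larger values correspond to higher curvature.
magnitude_from_w1 = 1 / spread
magnitude_from_entropy = 1 / entropy
```

A diffusion that stays concentrated near its starting point has low entropy and low spread, so its reciprocal is large.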

diffusion_curvature_old's People

Contributors

professorwug

diffusion_curvature_old's Issues

Intelligently select size of comparison space to match t

Presently, the comparison spaces are constructed using the same number of points as the dataset. If the diffusion time is reasonable, only a fraction of these points will be used, so why not estimate the number of points needed ahead of time? It could save a significant amount of computation when powering the matrix.

Diffusion-based dimensionality estimation, to support graphs

We currently use the kNN dimensionality estimator from skdim - but this has problems:

  • It needs an array of points
  • It laboriously reconstructs the graph from those points (duplicating our own graph)
  • It uses shortest path distances, when we have better geodesic estimates

There are two ways of resolving this:

  1. In the short term, use the same algorithm, but use the precomputed geodesic distances. This would solve all of the above issues.
  2. In the long term, find something more robust than kNN dimensionality estimation.
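
As a rough sketch of what estimating dimension directly from precomputed distances could look like, the example below uses the Levina-Bickel maximum-likelihood estimator. Note the substitution: this is a different estimator from the skdim kNN method mentioned above, chosen here only because it needs nothing but a pairwise distance matrix. The function name `mle_dimension` is hypothetical, and Euclidean distances stand in for the geodesic estimates.

```python
import numpy as np

def mle_dimension(D, k=20):
    """Levina-Bickel MLE of intrinsic dimension from a pairwise distance matrix D."""
    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn_dists = np.sort(D, axis=1)[:, 1:k + 1]                   # shape (n, k)
    # Log ratio of the k-th neighbor distance to each closer neighbor.
    log_ratios = np.log(knn_dists[:, -1:] / knn_dists[:, :-1])   # shape (n, k - 1)
    # Local estimate at each point, then average over the dataset.
    local_dims = (k - 1) / log_ratios.sum(axis=1)
    return local_dims.mean()

rng = np.random.default_rng(0)
# A flat 2-dimensional sheet embedded in 3 dimensions.
plane = np.column_stack([rng.uniform(size=(1000, 2)), np.zeros(1000)])

# Assumption: in the setting above, D would be the precomputed geodesic estimates.
D = np.linalg.norm(plane[:, None, :] - plane[None, :, :], axis=-1)
estimated_dim = mle_dimension(D, k=20)
```

Because the function takes only a distance matrix, it needs neither the original point array nor a second graph construction.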
