
geomstats's Introduction


Geomstats is a Python package for computations, statistics, machine learning and deep learning on manifolds.

The package is organized into two main modules: geometry and learning. The module geometry implements differential geometry: manifolds, Lie groups, fiber bundles, shape spaces, information manifolds, Riemannian metrics, and more. The module learning implements statistics and learning algorithms for data on manifolds. Users can choose between backends: NumPy, Autograd or PyTorch.


Keep in touch with the community by joining us on our Slack workspace!


Citing Geomstats

If you find geomstats useful, please kindly cite:

@article{JMLR:v21:19-027,
  author  = {Nina Miolane and Nicolas Guigui and Alice Le Brigant and Johan Mathe and Benjamin Hou and Yann Thanwerdas and Stefan Heyder and Olivier Peltre and Niklas Koep and Hadi Zaatiti and Hatem Hajri and Yann Cabanes and Thomas Gerald and Paul Chauchat and Christian Shewmake and Daniel Brooks and Bernhard Kainz and Claire Donnat and Susan Holmes and Xavier Pennec},
  title   = {Geomstats:  A Python Package for Riemannian Geometry in Machine Learning},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {223},
  pages   = {1-9},
  url     = {http://jmlr.org/papers/v21/19-027.html}
}

We would sincerely appreciate citations to both the original research paper and the software version. Citing both acknowledges the authors who started the codebase and made the library possible. It also acknowledges the crucial work of all contributors, who continuously implement pivotal new geometries and important learning algorithms and who refactor, test and document the code. Their work helps democratize geometric statistics and (deep) learning and fosters reproducible research in this field.

Install geomstats via pip3

From a terminal (OS X & Linux), you can install geomstats and its requirements with pip3 as follows:

pip3 install geomstats

This method installs the latest version of geomstats uploaded to PyPI. Note that geomstats is only available for Python 3.

Install geomstats via conda

From a terminal (OS X & Linux) or an Anaconda prompt (Windows), you can install geomstats and its requirements with conda as follows:

conda install -c conda-forge geomstats

This method installs the latest version of geomstats uploaded to conda-forge. Note that geomstats is only available for Python 3.

Install geomstats via Git

From a terminal (OS X & Linux), you can install geomstats and its requirements via git as follows:

git clone https://github.com/geomstats/geomstats.git
cd geomstats
pip3 install .

This method installs the latest GitHub version of geomstats.

Note that this only installs the minimum requirements. To add the optional, development, continuous integration and documentation requirements, refer to the file pyproject.toml.

Install geomstats: Developers

Developers should clone the main branch of this repository with git. They should also install the development requirements, together with the optional requirements that enable the autograd and pytorch backends:

pip3 install geomstats[dev,opt]

Additionally, we recommend installing our pre-commit hook, to ensure that your code follows our Python style guidelines:

pre-commit install

Choose the backend

Geomstats can run seamlessly with numpy, autograd or pytorch. Note that the autograd and pytorch requirements are optional, because geomstats can be used with numpy only. The numpy backend is used by default. The visualizations are only available with this backend.

To get the autograd and pytorch versions compatible with geomstats, install the optional requirements:

pip3 install geomstats[opt]

To install only the requirements for a given backend, run:

pip3 install geomstats[<backend_name>]

You can choose your backend by setting the environment variable GEOMSTATS_BACKEND to numpy, autograd or pytorch, and importing the backend module. From the command line:

export GEOMSTATS_BACKEND=<backend_name>

and in your Python 3 code:

import geomstats.backend as gs

Getting started

To use geomstats for learning algorithms on Riemannian manifolds, you need to follow three steps:

  • instantiate the manifold of interest,
  • instantiate the learning algorithm of interest,
  • run the algorithm.

The data should be represented by a gs.array. This structure represents numpy arrays, autograd or pytorch tensors, depending on the choice of backend.

The following code snippet shows the use of tangent Principal Component Analysis on simulated data on the space of 3D rotations.

from geomstats.geometry.special_orthogonal import SpecialOrthogonal
from geomstats.learning.pca import TangentPCA

so3 = SpecialOrthogonal(n=3, point_type="vector")

data = so3.random_uniform(n_samples=10)

tpca = TangentPCA(space=so3, n_components=2)
tpca = tpca.fit(data)
tangent_projected_data = tpca.transform(data)
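Tangent PCA maps the data into a single tangent space with the Riemannian log map and then runs ordinary PCA there. The sketch below illustrates that idea with plain NumPy. It is not geomstats' implementation, and it makes several simplifying assumptions:

- It works on the unit sphere S2 instead of SO(3), because the sphere's log map has a short closed form.
- It takes the base point as given, rather than estimating a Fréchet mean.
- The function names are hypothetical.

```python
import numpy as np


def sphere_log(base_point, points):
    """Log map on the unit sphere: tangent vectors at base_point pointing to points."""
    cos_theta = np.clip(points @ base_point, -1.0, 1.0)
    theta = np.arccos(cos_theta)  # geodesic distances to base_point
    # Component of each point orthogonal to base_point; its norm is sin(theta).
    ortho = points - cos_theta[:, None] * base_point
    norms = np.linalg.norm(ortho, axis=1)
    # Rescale to length theta; points equal to base_point map to the zero vector.
    scale = np.where(norms > 1e-12, theta / np.maximum(norms, 1e-12), 1.0)
    return scale[:, None] * ortho


def tangent_pca(base_point, points, n_components=2):
    """Hypothetical helper: PCA of the log-mapped data in the tangent space at base_point."""
    tangent_vecs = sphere_log(base_point, points)
    centered = tangent_vecs - tangent_vecs.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, components


rng = np.random.default_rng(0)
base_point = np.array([0.0, 0.0, 1.0])
data = base_point + 0.3 * rng.normal(size=(20, 3))
data /= np.linalg.norm(data, axis=1, keepdims=True)  # project the samples onto S2
projected, components = tangent_pca(base_point, data, n_components=2)
print(projected.shape)  # (20, 2)
```

Because every log-mapped vector lies in the tangent plane at the base point, the principal directions found here are orthogonal to that base point.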

All geometric computations are performed behind the scenes. The user only needs a high-level understanding of Riemannian geometry. Each algorithm can be used with any of the manifolds and metrics implemented in the package.

To see additional examples, go to the examples or notebooks directories.

Contributing

See our contributing guidelines!

Interested? Contact us and join the next hackathons. Previous Geomstats events include:

  • January 2020: hackathon at Inria Sophia-Antipolis, Nice, France
  • April 2020: remote online hackathon
  • March - April 2021: hackathon, hybrid at Inria Sophia-Antipolis / remotely with contributors from around the world
  • July 2021: hackathon at the Geometric Science of Information (GSI) conference, Paris, France
  • August 2021: international Coding Challenge at the International Conference on Learning Representations (ICLR), remotely
  • December 2021: fixit hackathon at the Sorbonne Center for Artificial Intelligence, Paris, France
  • February 2022: hackathon, hybrid at Inria Sophia-Antipolis / remotely with contributors from around the world
  • April 2022: in-person hackathon at the Villa Cynthia, Saint Raphael, France
  • April 2022: international Coding Challenge at the International Conference on Learning Representations (ICLR), remotely
  • June 2022: hackathon at the University of Washington (UW)
  • October 17-21, 2022: hackathon during the trimester Geometry and Statistics in Data Sciences, in Paris

Acknowledgements

This work is supported by:

  • the National Science Foundation (grant 2313150),
  • the National Science Foundation (NSF CAREER award, grant 2240158),
  • the Inria-Stanford associated team GeomStats,
  • the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement G-Statistics No. 786854),
  • the French society for applied and industrial mathematics (SMAI),
  • the National Science Foundation (grant NSF DMS RTG 1501767).

geomstats's People

Contributors

abdellaoui-souhail, adele-myers, alebrigant, ambellan, cshewmake2, elodiemaignant, florent-michel, hzaatiti, johmathe, johnharveymath, jules-deschamps, luisfpereira, mariusguerard, maya95assal, mortenapedersen, nguigs, ninamiolane, nkoep, opeltre, pchauchat, qbarthelemy, saitejautpala, shubhamtalbar96, stefanheyder, tgeral68, tramy1258, xpennec, yanncabanes, ymontmarin, ythanwerdas


geomstats's Issues

Vectorisation of regularize() function in special_orthogonal_group.py

If the input array passed to regularize() in the SpecialOrthogonalGroup class has shape Nx3, where N is the number of SO(3) rotation vectors to regularise, I don't think the regularisation is correct for N > 1. The reason is that np.linalg.norm(rot_vec) computes the norm of the entire Nx3 matrix instead of one norm per row.

Should the code be as follows?

rot_vec = vectorization.expand_dims(rot_vec, to_ndim=2)
assert self.belongs(rot_vec)

angle = np.linalg.norm(rot_vec,axis=1)
regularized_rot_vec = rot_vec

k = np.floor(angle / (2 * np.pi) + .5)
fact = (1. - 2. * np.pi * k / angle)

fact[angle == 0] = 1

regularized_rot_vec = (rot_vec.T * fact).T

assert regularized_rot_vec.ndim == 2
return regularized_rot_vec
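Below is a self-contained NumPy sketch of the same row-wise idea. It is not the geomstats code, and the function name is hypothetical. Each rotation vector's angle is shifted by a multiple of 2*pi into [-pi, pi). Rows with angle 0 are also protected before the division, so no divide-by-zero warning occurs. In the proposal above, the factor is computed first and only overwritten afterwards, so the division still happens.

```python
import numpy as np


def regularize_rotation_vectors(rot_vecs):
    """Hypothetical helper: bring each row of an (N, 3) array to an angle of at most pi."""
    rot_vecs = np.atleast_2d(np.asarray(rot_vecs, dtype=float))
    angles = np.linalg.norm(rot_vecs, axis=1)  # one norm per rotation vector
    k = np.floor(angles / (2 * np.pi) + 0.5)  # number of full turns to remove
    # Rows with angle 0 are left unchanged; safe_angles avoids dividing by zero.
    safe_angles = np.where(angles == 0, 1.0, angles)
    factors = np.where(angles == 0, 1.0, 1.0 - 2 * np.pi * k / safe_angles)
    return rot_vecs * factors[:, np.newaxis]


rot_vecs = np.array([[0.0, 0.0, 1.5 * np.pi], [0.1, 0.0, 0.0], [0.0, 0.0, 0.0]])
regularized = regularize_rotation_vectors(rot_vecs)
```

Here the first row, a rotation by 3*pi/2 about z, becomes the equivalent rotation by -pi/2 about z. The other two rows are unchanged.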

example of gradient_descent_s2 does not work

The first example keeps failing:

python3 ./examples/gradient_descent_s2.py
Using numpy backend
x: [0.86604857 0.49993232 0.00524429]
reached precision 1e-05
iterations: 85
Traceback (most recent call last):
  File "./examples/gradient_descent_s2.py", line 124, in <module>
    main()
  File "./examples/gradient_descent_s2.py", line 120, in main
    np.testing.assert_almost_equal(loss(x), np.min(eig), decimal=2)
  File "/usr/local/lib/python3.6/site-packages/numpy/testing/_private/utils.py", line 584, in assert_almost_equal
    raise AssertionError(_build_err_msg())
AssertionError: 
Arrays are not almost equal to 2 decimals
 ACTUAL: 0.7828534176680455
 DESIRED: 0.08684823961199317

Installed packages (pip):

autograd (1.2)
codecov (2.0.15)
coverage (4.5.1)
matplotlib (2.2.3)
nose2 (0.8.0)
numpy (1.15.1)
scipy (1.1.0)
tensorflow (1.10.1)
torch (0.4.0)
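The failing assertion compares the final loss with the smallest eigenvalue of the matrix. Presumably the loss is a quadratic form x^T A x with A symmetric. Restricted to the unit sphere, such a form is minimised at a unit eigenvector for the smallest eigenvalue, and its minimum value is that eigenvalue. So a loss of 0.78 against an expected 0.087 means the descent stopped away from the minimiser.

For comparison, here is a minimal NumPy sketch of Riemannian gradient descent on S2 under these assumptions. It is not the example's code. The step is retracted onto the sphere by normalisation rather than with the exponential map, and the step size, tolerance and function name are hypothetical.

```python
import numpy as np


def riemannian_gradient_descent_s2(matrix, x0, step_size=0.1, max_iter=1000, tol=1e-8):
    """Hypothetical helper: minimise x^T A x over the unit sphere S2 (A symmetric)."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        euclidean_grad = 2.0 * matrix @ x
        # Riemannian gradient: project the Euclidean gradient onto the tangent plane at x.
        riemannian_grad = euclidean_grad - (euclidean_grad @ x) * x
        if np.linalg.norm(riemannian_grad) < tol:
            break
        # Step in the tangent plane, then retract back onto the sphere by normalising.
        x = x - step_size * riemannian_grad
        x = x / np.linalg.norm(x)
    return x


matrix = np.diag([1.0, 2.0, 3.0])
x = riemannian_gradient_descent_s2(matrix, np.array([1.0, 1.0, 1.0]))
loss = x @ matrix @ x
print(np.round(loss, 6), np.min(np.linalg.eigvalsh(matrix)))
```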

'module' object has no attribute 'int32'

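The file geomstats/backend/numpy.py shadows the real numpy package. As the traceback below shows, "import numpy as np" in geomstats/backend/common.py picks up this local file, which then fails on np.int32. Renaming the file (e.g. to numpy_base.py) would resolve the issue.

The line that imports this module would need to be changed accordingly to: from .numpy_base import *  # NOQA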

(pytorch) bh1511@kythnos:/vol/medic01/users/bh1511/_build/DeepPose/pytorch$ python example.py 
Traceback (most recent call last):
  File "example.py", line 18, in <module>
    from se3_geodesic_loss import SE3GeodesicLoss
  File "/vol/medic01/users/bh1511/_build/DeepPose/pytorch/se3_geodesic_loss.py", line 29, in <module>
    from geomstats.invariant_metric import InvariantMetric
  File "/vol/medic01/users/bh1511/_build/geomstats-master/geomstats/__init__.py", line 2, in <module>
    import geomstats.euclidean_space
  File "/vol/medic01/users/bh1511/_build/geomstats-master/geomstats/euclidean_space.py", line 6, in <module>
    from geomstats.riemannian_metric import RiemannianMetric
  File "/vol/medic01/users/bh1511/_build/geomstats-master/geomstats/riemannian_metric.py", line 7, in <module>
    import geomstats.backend as gs
  File "/vol/medic01/users/bh1511/_build/geomstats-master/geomstats/backend/__init__.py", line 13, in <module>
    from .common import *  # NOQA
  File "/vol/medic01/users/bh1511/_build/geomstats-master/geomstats/backend/common.py", line 1, in <module>
    import numpy as np
  File "/vol/medic01/users/bh1511/_build/geomstats-master/geomstats/backend/numpy.py", line 5, in <module>
    int32 = np.int32
AttributeError: 'module' object has no attribute 'int32'
(pytorch) bh1511@kythnos:/vol/medic01/users/bh1511/_build/DeepPose/pytorch$ 
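The traceback ends in "'module' object has no attribute 'int32'", a Python 2 style message. Under Python 2, implicit relative imports (see PEP 328) let a sibling module named numpy.py win over the installed package inside geomstats/backend/.

The sketch below reproduces the same shadowing effect in Python 3. It places a file named numpy.py in the working directory of a fresh interpreter, where it plays the role of geomstats/backend/numpy.py. The helper name is hypothetical.

```python
import pathlib
import subprocess
import sys
import tempfile


def import_numpy_from(directory):
    """Hypothetical helper: run `import numpy` in a fresh interpreter started in `directory`."""
    script = "import numpy; print(hasattr(numpy, 'int32'), numpy.__file__)"
    # With -c, the current directory comes first on sys.path, so a local numpy.py wins.
    result = subprocess.run(
        [sys.executable, "-c", script],
        cwd=directory, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    # A local module named numpy.py that lacks int32, like the backend file in the report.
    pathlib.Path(tmp, "numpy.py").write_text("SHADOW = True\n")
    output = import_numpy_from(tmp)

print(output)  # prints False followed by the path of the local numpy.py
```

Giving the backend module a name that does not collide with a top-level package, as suggested above, removes this ambiguity.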

angles close to pi in special_orthogonal_group.py rotation_vector_from_matrix() function

I think there are issues with corner cases in this function.

Test Script:

import numpy as np
import geomstats.losses as losses

from geomstats.special_euclidean_group import SpecialEuclideanGroup

DIMENSION = 3
SE3_GROUP = SpecialEuclideanGroup(n=DIMENSION)

# load up variable 'arr' here

SE3_GROUP.rotations.rotation_vector_from_matrix(arr)

For cases close to 0 the output sign is correct, but not for cases close to pi:

# Close to 0

arr = np.array([[1,0,0],[0,1,-1e-9],[0,1e-9,1]])    # input rot_vec: [1e-9 0 0]          output rot_vec: [1e-9 0 0]
arr = np.array([[1,0,1e-9],[0,1,0],[-1e-9,0,1]])    # input rot_vec: [0 1e-9 0]          output rot_vec: [0 1e-9 0]
arr = np.array([[1,-1e-9,0],[1e-9,1,0],[0,0,1]])    # input rot_vec: [0 0 1e-9]          output rot_vec: [0 0 1e-9]


arr = np.array([[1,0,0],[0,1,1e-9],[0,-1e-9,1]])    # input rot_vec: [-1e-9 0 0]         output rot_vec: [-1e-9 0 0]
arr = np.array([[1,0,-1e-9],[0,1,0],[1e-9,0,1]])    # input rot_vec: [0 -1e-9 0]         output rot_vec: [0 -1e-9 0]
arr = np.array([[1,1e-9,0],[-1e-9,1,0],[0,0,1]])    # input rot_vec: [0 0 -1e-9]         output rot_vec: [0 0 -1e-9]

# Close to pi

arr = np.array([[-1,-1e-9,0],[1e-9,-1,0],[0,0,1]])  # input rot_vec: [0 0 pi-1e-9]       output rot_vec: [-0 -0 -pi]
arr = np.array([[-1,0,1e-9],[0,1,0],[-1e-9,0,-1]])  # input rot_vec: [0 pi-1e-9 0]       output rot_vec: [-0 -pi -0]
arr = np.array([[1,0,0],[0,-1,-1e-9],[0,1e-9,-1]])  # input rot_vec: [pi-1e-9 0 0]       output rot_vec: [-pi -0 -0]

arr = np.array([[-1,1e-9,0],[-1e-9,-1,0],[0,0,1]])  # input rot_vec: [0 0 -(pi-1e-9)]    output rot_vec: [-0 -0 -pi]
arr = np.array([[-1,0,-1e-9],[0,1,0],[1e-9,0,-1]])  # input rot_vec: [0 -(pi-1e-9) 0]    output rot_vec: [-0 -pi -0]
arr = np.array([[1,0,0],[0,-1,1e-9],[0,-1e-9,-1]])  # input rot_vec: [-(pi-1e-9) 0 0]    output rot_vec: [-pi -0 -0]
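The sign problem can be handled as follows. Near pi, (trace - 1) / 2 rounds to -1, so the angle alone cannot tell the two cases apart. The skew-symmetric part of R, however, equals sin(angle) times the axis and still carries the sign. The symmetric part gives the axis only up to sign, through cos(angle) I + (1 - cos(angle)) axis axis^T. Rotations by pi and -pi about the same axis coincide, so at exactly pi the sign is only a convention.

This NumPy sketch is not geomstats' implementation. The function name and the eps threshold are assumptions.

```python
import numpy as np


def rotation_vector_from_matrix(rot_mat, eps=1e-6):
    """Hypothetical helper: rotation vector of a single 3x3 rotation matrix."""
    cos_angle = np.clip((np.trace(rot_mat) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    # vee of the skew-symmetric part (R - R^T) / 2, which equals sin(angle) * axis.
    skew_vec = 0.5 * np.array([
        rot_mat[2, 1] - rot_mat[1, 2],
        rot_mat[0, 2] - rot_mat[2, 0],
        rot_mat[1, 0] - rot_mat[0, 1],
    ])
    if angle < eps:
        # Small angle: sin(angle) ~ angle, so the skew part is already the rotation vector.
        return skew_vec
    if np.pi - angle > eps:
        # Generic case: divide out sin(angle).
        return angle / np.sin(angle) * skew_vec
    # Near pi: recover the axis (up to sign) from the symmetric part.
    outer = (0.5 * (rot_mat + rot_mat.T) - cos_angle * np.eye(3)) / (1.0 - cos_angle)
    i = np.argmax(np.diag(outer))  # best-conditioned column of axis axis^T
    axis = outer[:, i] / np.sqrt(outer[i, i])
    # The skew part is tiny but still carries the sign when angle is just below pi.
    if skew_vec @ axis < 0:
        axis = -axis
    return angle * axis


near_pi = np.array([[-1.0, -1e-9, 0.0], [1e-9, -1.0, 0.0], [0.0, 0.0, 1.0]])
near_minus_pi = np.array([[-1.0, 1e-9, 0.0], [-1e-9, -1.0, 0.0], [0.0, 0.0, 1.0]])
print(rotation_vector_from_matrix(near_pi), rotation_vector_from_matrix(near_minus_pi))
```

With this approach, the two "close to pi" cases about z from the report come out as +pi and -pi about z, instead of both as -pi.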

Impossible to go more than 2.xx

For some reason the framework doesn't let me go further than 2.xxx in the intrinsic coordinate system of H2. See the square example to reproduce.

Tensorflow calculates NAN gradients

To test the correctness of gradients, I modified and ran loss_and_gradient_se3.py. The following modification has been made in main:

    y_pred = SE3.random_uniform(10)
    y_true = SE3.random_uniform(10)

    with tf.GradientTape() as g:
        g.watch(y_pred)
        loss_rot_vec = loss(y_pred, y_true)
        grad_rot_vec = grad(y_pred, y_true)
        dy_dx = g.gradient(loss_rot_vec, y_pred)
        print('The loss between the rotation vectors is: {}'.format(
            loss_rot_vec))
        print('The riemannian gradient is: {}'.format(
            grad_rot_vec))
        print('Tensorflow calculated gradient: {}'.format(
            dy_dx))

This is the output I receive:

/vol/medic01/users/bh1511/_venv/geomstats-py3/bin/python3.6 /Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py --cmd-line --multiproc --qt-support=auto --client 127.0.0.1 --port 57095 --file /vol/medic01/users/bh1511/_build/geomstats-farrell/examples/loss_and_gradient_se3.py
pydev debugger: process 5100 is connecting

Connected to pydev debugger (build 182.4505.26)
/vol/medic01/users/bh1511/_venv/geomstats-py3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
Using tensorflow backend
2018-11-15 10:02:23.670933: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
The loss between the rotation vectors is: [[7.0383186]
 [3.8544152]
 [3.9183326]
 [3.0725262]
 [2.9405794]
 [7.415542 ]
 [3.8067136]
 [4.2343645]
 [1.4367774]
 [2.0562656]]
The riemannian gradient is: [[ 3.1090195  -1.3207141   2.7089927   1.1367302   1.115068   -1.3076192 ]
 [-1.6554008   0.7796661  -1.7344122  -1.9494474   2.2295787  -0.0385334 ]
 [ 0.45250762 -3.1852932   0.6125859  -0.2580671  -0.10605839 -2.07305   ]
 [-0.12945078  2.7257762  -0.27599958 -1.5244842   1.6424325   0.31258366]
 [-0.02796454  1.8669416  -2.3057005  -1.0833997   1.3495882  -0.2439363 ]
 [-0.71234685 -3.1717374   2.6985168   1.6107618   1.4890841  -1.9891183 ]
 [-0.7424381   2.2465038   1.9941958  -1.8383911   0.7192763  -0.49961993]
 [ 0.6298961   0.58670014 -0.8475082  -3.727872   -0.97431326 -0.5386404 ]
 [-0.06864594  0.507145   -1.2685865  -0.06768421  0.7303784  -1.8040254 ]
 [ 1.1138968  -1.6037818  -0.6482454   0.81384236  1.617016    0.65715736]]
Tensorflow calculated gradient: [[        nan         nan         nan  1.1367303   1.1150677  -1.307619  ]
 [        nan         nan         nan -1.9494474   2.2295775  -0.03853333]
 [        nan         nan         nan -0.2580667  -0.10605845 -2.07305   ]
 [        nan         nan         nan -1.5244837   1.6424323   0.31258366]
 [        nan         nan         nan -1.0834      1.3495883  -0.24393654]
 [        nan         nan         nan  1.6107624   1.489084   -1.9891189 ]
 [        nan         nan         nan -1.8383911   0.7192764  -0.49961996]
 [        nan         nan         nan -3.7278717  -0.974313   -0.53864086]
 [        nan         nan         nan -0.06768411  0.7303788  -1.8040257 ]
 [        nan         nan         nan  0.8138422   1.6170154   0.6571572 ]]

With TensorFlow's automatic differentiation, the gradients for the rotation components are NaNs. I assume this comes from the checks for angles close to pi or 0 in these functions. Somewhere along that path the gradient explodes, likely because of a division by zero. This will need debugging...
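One common way a division by zero turns into NaN gradients is the following. During backpropagation, the chain rule multiplies the upstream gradient by a local derivative. If the local derivative is infinite, for example d/dx arccos(x) at x = 1, the product is NaN even when the upstream gradient is 0. For the same reason, masking only the output of a where/select does not help: the gradient of the unused branch is still multiplied by zero. The usual workaround is a "double where", which also replaces the unsafe inputs.

The NumPy sketch below simulates one backward step by hand. It illustrates the general mechanism only and is not a diagnosis of the geomstats code. The eps value and helper name are assumptions.

```python
import numpy as np


def d_arccos(x):
    """Local derivative of arccos, -1 / sqrt(1 - x**2); infinite at x = +/-1."""
    with np.errstate(divide="ignore"):
        return -1.0 / np.sqrt(1.0 - np.square(x))


x = np.array([0.5, 1.0])  # cosines of angles; 1.0 means angle 0
upstream = np.array([1.0, 0.0])  # gradient flowing back from the loss

# Naive backward step: 0 * (-inf) gives nan in the second entry.
with np.errstate(invalid="ignore"):
    naive = upstream * d_arccos(x)

# "Double where": replace unsafe inputs first, then mask the result.
eps = 1e-7
safe_mask = np.abs(x) < 1.0 - eps
safe_x = np.where(safe_mask, x, 0.0)
safe = np.where(safe_mask, upstream * d_arccos(safe_x), 0.0)
print(naive, safe)
```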

Write Installation guide

Currently there are some problems installing geomstats under different Python versions, in particular Python 3.7 and 3.8 cause problems with tensorflow and pytorch.

To make installation more consistent, a working installation method should be documented. It should include the required tools (pyenv / pipenv (?)) and the steps needed to install all dependencies without errors.

Split out dependencies into extras

A project that wants to use geomstats as a dependency is going to pull in several further dependencies that are not strictly necessary to use the package:

  • For testing: codecov, coverage, nose2
  • Extra backends: tensorflow, torch

matplotlib is mostly used for examples and a self-contained visualization module, so that could go either way, depending on how integral you think the visualizations are to the functionality of the package.

The way to handle these is to use the extras_require keyword of setup() in setup.py, e.g.:

from setuptools import find_packages, setup

install_requires = [
    'autograd',
    'numpy>=1.14.1',
    'scipy',
    'matplotlib',
    ]

# Optional dependency groups; setuptools expects the keyword extras_require.
extras_require = {
    'test': ['nose2', 'coverage', 'codecov'],
    'tensorflow': ['tensorflow>=1.8'],
    'torch': ['torch==0.4.0'],
    }
# 'all' is the union of every optional group.
extras_require['all'] = sum(extras_require.values(), [])

setup(name='geomstats',
      version='1.11',
      install_requires=install_requires,
      extras_require=extras_require,
      description='Geometric statistics on manifolds',
      url='http://github.com/geomstats/geomstats',
      author='Nina Miolane',
      author_email='[email protected]',
      license='MIT',
      packages=find_packages(),
      zip_safe=False)

Then you can install with pip install "geomstats[tensorflow]" to get tensorflow capabilities, or pip install "geomstats[all]" if you want full functionality.

This will help with nipy/nipype#2607, which is at least partially held up because tensorflow does not yet support Python 3.7. It will also help keep geomstats lightweight for people who are only using it for a function or two, and do not need tensorflow or pytorch.

compose() in special_euclidean_group.py

There is a compatibility issue with Tensorflow in this function, at line 213: n_compositions is of type EagerTensor and is not iterable with a for loop.

I see that the function composes one vector with many, many vectors with one, and many vectors with many, provided that both inputs contain the same number of vectors. I would suggest that instead of

            for i in range(n_compositions):
                translation_1_i = (translation_1[0] if n_points_1 == 1
                                   else translation_1[i])
                rot_mat_1_i = (rot_mat_1[0] if n_points_1 == 1
                               else rot_mat_1[i])
                translation_2_i = (translation_2[0] if n_points_2 == 1
                                   else translation_2[i])
                composition_translation[i] = (gs.dot(translation_2_i,
                                                     gs.transpose(rot_mat_1_i))
                                              + translation_1_i)

which keeps the index of the single vector at 0, we tile/repeat the single vector N times. That way, the for loop can be replaced by einsum, which also saves 3 if-statement checks per iteration.

Would this be a good solution? I'll start code and test this.
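The proposal can be sketched as follows: broadcast the single inputs to N copies, then replace the loop with one einsum. This NumPy version is a hypothetical helper and not the geomstats function. It only computes the translation part that the loop above computes. The index pattern "nij,nj->ni" gives rot_mat_1[n] @ translation_2[n], which equals gs.dot(translation_2_i, gs.transpose(rot_mat_1_i)).

```python
import numpy as np


def compose_translations(rot_mat_1, translation_1, translation_2):
    """Hypothetical helper: translation part of composing (R1, t1) with t2, vectorized."""
    rot_mat_1 = np.reshape(rot_mat_1, (-1, 3, 3))
    translation_1 = np.reshape(translation_1, (-1, 3))
    translation_2 = np.reshape(translation_2, (-1, 3))
    n = max(len(rot_mat_1), len(translation_1), len(translation_2))
    # Repeat single inputs n times (raises if a size is neither 1 nor n).
    rot_mat_1 = np.broadcast_to(rot_mat_1, (n, 3, 3))
    translation_1 = np.broadcast_to(translation_1, (n, 3))
    translation_2 = np.broadcast_to(translation_2, (n, 3))
    # Row i: rot_mat_1[i] @ translation_2[i] + translation_1[i], with no Python loop.
    return np.einsum("nij,nj->ni", rot_mat_1, translation_2) + translation_1


rng = np.random.default_rng(0)
rot_mat_1 = rng.normal(size=(1, 3, 3))  # one point composed with many
translation_1 = rng.normal(size=(1, 3))
translation_2 = rng.normal(size=(5, 3))
composed = compose_translations(rot_mat_1, translation_1, translation_2)
print(composed.shape)  # (5, 3)
```

TensorFlow and PyTorch also provide einsum and broadcasting, so the same structure should carry over to the other backends.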

special_orthogonal_group.py unit tests

I have tried running the unit test for special_orthogonal_group.py with the tensorflow backend by enabling:

import os
os.environ['GEOMSTATS_BACKEND'] = 'tensorflow'  # NOQA

import tensorflow as tf
tf.enable_eager_execution()

However, 100% of the tests fail. I have started to fix some, and some now pass, but there still seems to be a lot to do. Most failures are backend issues such as incorrect use of syntax, e.g. addressing and assigning tensor elements (someArray[someIndex] = someValue), which is not available in tensorflow. This may be allowed in PyTorch, but I think it copies the memory from GPU to CPU, performs the operation and then copies it back to the GPU. If so, the overhead would make it very inefficient.

I have also noticed that the code is written to generalise to nD, but there are special implementations for the SO(2), SO(3), SE(2) and SE(3) cases. For example, in skew_matrix_from_vector() we could have an if self.n == 2: branch, an if self.n == 3: branch and an else: branch.

        if self.n == 2: # SO(2)
            id_skew = gs.array([[[0., 1.], [-1., 0.]]] * n_vecs)
            skew_mat = gs.einsum('nij,ni->nij', gs.cast(id_skew, gs.float32), vec)

test_regularize() fails here on the SO(3) case. The machine precision seems to differ between CPU and GPU: on numpy the 0 value is of the order of 1e-8, whereas on the GPU it is of the order of 1e-7.

Let me know your thoughts on how we should go about fixing these :)
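Item assignment can be avoided when building the SO(3) skew matrices. The rows can be assembled with stack operations instead, which NumPy, TensorFlow (tf.stack) and PyTorch (torch.stack) all provide. The NumPy sketch below is a hypothetical stand-in for skew_matrix_from_vector() in the n == 3 case, not the geomstats code. It uses the convention skew(v) @ w == cross(v, w).

```python
import numpy as np


def skew_matrix_from_vector(vec):
    """Hypothetical helper: (N, 3) vectors -> (N, 3, 3) skew-symmetric matrices."""
    vec = np.reshape(vec, (-1, 3))
    x, y, z = vec[:, 0], vec[:, 1], vec[:, 2]
    zeros = np.zeros_like(x)
    # Build each row by stacking components instead of assigning entries in place.
    rows = [
        np.stack([zeros, -z, y], axis=-1),
        np.stack([z, zeros, -x], axis=-1),
        np.stack([-y, x, zeros], axis=-1),
    ]
    return np.stack(rows, axis=1)


vec = np.array([1.0, 2.0, 3.0])
other = np.array([-1.0, 0.5, 2.0])
skew = skew_matrix_from_vector(vec)
print(skew[0] @ other, np.cross(vec, other))
```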

invariant metric with point_type=matrix

The log seems to expect vectors even when using matrices to represent points

SO3_GROUP = SpecialOrthogonalGroup(n=3, point_type='matrix')
METRIC = SO3_GROUP.bi_invariant_metric

N_SAMPLES = 10

data = SO3_GROUP.random_uniform(n_samples=N_SAMPLES)
mean = METRIC.mean(data)

This raises a ValueError, due to

n_points, _ = point.shape

which can be removed, since n_points is not used later. But then there is also

assert gs.ndim(log) == 2

Is it necessary?

Always use geomstats.backend

In order to properly abstract the different backends, we should review the use of numpy. Currently, numpy is used directly in

geomstats/geometry/discretized_curves_space.py
geomstats/visualization.py
geomstats/learning/_template.py
geomstats/learning/pca.py
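Auditing the remaining direct numpy imports could be partly automated with Python's ast module. The helper below is hypothetical and not part of geomstats. It reports the line numbers of absolute imports of numpy or its submodules, and it ignores relative imports such as "from .numpy import *".

```python
import ast


def direct_numpy_imports(source):
    """Hypothetical helper: line numbers of statements importing numpy directly."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. "import numpy as np" or "import os, numpy.random"
            if any(a.name == "numpy" or a.name.startswith("numpy.") for a in node.names):
                lines.append(node.lineno)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # e.g. "from numpy.linalg import norm"; relative imports have level > 0
            if node.module == "numpy" or node.module.startswith("numpy."):
                lines.append(node.lineno)
    return sorted(lines)


example_source = (
    "import numpy as np\n"
    "import geomstats.backend as gs\n"
    "from numpy.linalg import norm\n"
    "import os, numpy.random\n"
    "from .numpy import *\n"
)
print(direct_numpy_imports(example_source))
```

Running it over each file listed above (for instance, reading each one with pathlib) would flag the lines to migrate to geomstats.backend.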

Always stick to one backend in unit tests

The unit tests should respect the selected backend. Currently, tests/test_backend_{tensorflow,numpy}.py temporarily switch the backend, which involves reloading geomstats.backend after modifying the GEOMSTATS_BACKEND environment variable. This leads, for example, to cryptic tensorflow deprecation warnings when running the unit tests on, say, the numpy backend.

Tensorflow: support for unknown compile-time size of first dimension of Tensor

Does geomstats support dynamic tensor size? That is, can I for example write:

SO3 = SpecialOrthogonalGroup(n=3)
tensor_of_matrices = tf.reshape(inputs, [-1, 3, 3])
SO3.rotation_vector_from_matrix(tensor_of_matrices)

I'm getting an assertion error, namely:

....special_orthogonal_group.py", line 429, in rotation_vector_from_matrix
    assert trace.shape == (n_rot_mats, 1), trace.shape
AssertionError: (?, 1)

This seems to suggest that n_rot_mats needs to be known at compile time. Is that correct, or am I misinterpreting this assertion statement?

Examples and applications need care

Here is my experience trying to run the Keras MNIST example

conda create -n geomstats_test
conda activate geomstats_test
conda install tensorflow-gpu=1.8
pip install geomstats
export GEOMSTATS_BACKEND=numpy
python gradient_decent_s2.py

Throws the error:

Using numpy backend
Traceback (most recent call last):
  File "gradient_decent_s2.py", line 23, in <module>
    from geomstats.geometry.hypersphere import Hypersphere
ModuleNotFoundError: No module named 'geomstats.geometry'

After fixing the import statements to:

from geomstats import ...

It gives output

x: [ 0.85848249  0.44807358 -0.24947517]
reached precision 1e-05
iterations: 46

It also throws the error:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/matplotlib/animation.py", line 161, in __getitem__
    return self.avail[name]
KeyError: 'ffmpeg'

So then I installed ffmpeg

conda install -c conda-forge ffmpeg

Then we get

x: [ 0.85848249  0.44807358 -0.24947517]
reached precision 1e-05
iterations: 46

Using numpy backend
Traceback (most recent call last):
  File "gradient_decent_s2.py", line 122, in <module>
    main()
  File "gradient_decent_s2.py", line 116, in main
    plot_and_save_video(geodesics, loss, out=output_file)
  File "gradient_decent_s2.py", line 67, in plot_and_save_video
    ax = fig.add_subplot(111, projection='3d', aspect='equal')
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/matplotlib/figure.py", line 1414, in add_subplot
    a = subplot_class_factory(projection_class)(self, *args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/matplotlib/axes/_subplots.py", line 69, in __init__
    self._axes_class.__init__(self, fig, self.figbox, **kwargs)
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/mpl_toolkits/mplot3d/axes3d.py", line 101, in __init__
    super().__init__(fig, rect, frameon=True, *args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 509, in __init__
    self.update(kwargs)
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/matplotlib/artist.py", line 974, in update
    ret = [_update_property(self, k, v) for k, v in props.items()]
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/matplotlib/artist.py", line 974, in <listcomp>
    ret = [_update_property(self, k, v) for k, v in props.items()]
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/matplotlib/artist.py", line 971, in _update_property
    return func(v)
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 1281, in set_aspect
    'It is not currently possible to manually set the aspect '
NotImplementedError: It is not currently possible to manually set the aspect on 3D axes

This was good enough, and I moved on to:

python tangent_pca_s2.py
pip install sklearn

I forgot to fix the import again:

Using numpy backend
Traceback (most recent call last):
  File "tangent_pca_s2.py", line 12, in <module>
    from geomstats.geometry.hypersphere import Hypersphere
ModuleNotFoundError: No module named 'geomstats.geometry'

Here we go:

Coordinates of the Log of the first 5 data points at the mean, projected on the principal components:
[[ 0.12490868 -0.05401332]
 [-0.51283952 -0.42375442]
 [-0.22850627  0.40307911]
 [-0.04655643  0.2621807 ]
 [ 0.13915178 -0.08722839]]

Alright let's go with the keras MNIST example finally:

git clone https://github.com/geomstats/applications.git
cd applications/deep_learning/
cp /home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/tensorflow/python/ops/variables.py /home/ubuntu/
pip install keras/
./run.sh

Okay what now:

Traceback (most recent call last):
  File "mnist_hypersphere.py", line 54, in <module>
    kernel_manifold=Hypersphere(hypersphere_dimension)))
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/keras/layers/convolutional.py", line 465, in __init__
    **kwargs)
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/keras/layers/convolutional.py", line 104, in __init__
    super(_Conv, self).__init__(**kwargs)
  File "/home/ubuntu/anaconda3/envs/geomstats_test/lib/python3.6/site-packages/keras/engine/base_layer.py", line 122, in __init__
    raise TypeError('Keyword argument not understood:', kwarg)
TypeError: ('Keyword argument not understood:', 'kernel_manifold')

Let's check what's up:

vim keras/keras/engine/base_layer.py

No result for the search "manifold", so that's not good.

pip uninstall keras
git clone https://github.com/geomstats/keras.git
pip install keras/

Let's try again:

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
...

Train on 60000 samples, validate on 10000 samples
Test accuracy: 0.9732

Woohoo!

Using the loss of hyperbolic space in Keras

The paper mentions a convenience function for the hyperbolic loss. I tried passing it to Keras with the appropriate space:

def poincare_loss(metric):
    def loss(y_true, y_pred):
        L = metric.dist(y_pred, y_true)
        return L
    return loss

but got

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 1536 values, but the requested shape has 1792
         [[node loss/semantic_output_loss/einsum_3/Reshape_1 (defined at /anaconda3/lib/python3.7/site-packages/geomstats/backend/tensorflow.py:247) ]]
         [[bidirectional_1/TensorArrayUnstack_1/range/_227]]
  (1) Invalid argument: Input to reshape is a tensor with 1536 values, but the requested shape has 1792
         [[node loss/semantic_output_loss/einsum_3/Reshape_1 (defined at /anaconda3/lib/python3.7/site-packages/geomstats/backend/tensorflow.py:247) ]]
0 successful operations.
0 derived errors ignored.   

This is with the TF backend of geomstats.

I also wasn't able to find any diff of what was changed to Keras in the applications folder or documentation on implementing a version of gradient descent that stays on the Poincaré ball.

Note that my concrete problem is something like https://datascience.stackexchange.com/questions/56889/hyperbolic-coordinates-poincar%c3%a9-embeddings-as-the-output-of-a-neural-network
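Separately from that error, a Keras loss function is expected to return one value per sample. The NumPy sketch below spells out the expected shapes using the standard closed-form distance of the Poincaré ball:

d(u, v) = arccosh(1 + 2 |u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))

It is not geomstats' metric.dist. The function name and the eps clipping near the boundary are assumptions, and a real Keras loss would need the same operations written with backend tensor functions.

```python
import numpy as np


def poincare_distance(u, v, eps=1e-7):
    """Hypothetical helper: row-wise distance between points of the open unit ball."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2, axis=-1)  # one value per sample
    # Clip the factors 1 - |x|^2 away from 0 so points near the boundary stay finite.
    denom_u = np.clip(1.0 - np.sum(u ** 2, axis=-1), eps, None)
    denom_v = np.clip(1.0 - np.sum(v ** 2, axis=-1), eps, None)
    return np.arccosh(1.0 + 2.0 * sq_diff / (denom_u * denom_v))


y_true = np.array([[0.0, 0.0], [0.1, 0.2]])
y_pred = np.array([[0.5, 0.0], [0.1, 0.2]])
distances = poincare_distance(y_pred, y_true)
print(distances.shape)  # (2,)
```

As a check, the distance from the origin to a point at Euclidean radius r is 2 artanh(r).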

issue in riemannian_metric.py

When trying the example:

johmathe@london ~/geomstats (master)$ python3 examples/plot_geodesics_se3.py
Traceback (most recent call last):
  File "examples/plot_geodesics_se3.py", line 36, in <module>
    main()
  File "examples/plot_geodesics_se3.py", line 25, in main
    points = geodesic(t)
  File "/Users/johmathe/geomstats/geomstats/riemannian_metric.py", line 213, in point_on_geodesic
    tangent_vecs[i] = t[i] * new_initial_tangent_vec
ValueError: setting an array element with a sequence.

  1. Write a test to repro
  2. fix the issue ;)
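This error is what NumPy raises when a whole vector is assigned into a single scalar slot. One plausible cause is that tangent_vecs was allocated with one entry per time instead of one row per time. That is an assumption; the allocation is not shown in the traceback.

A minimal NumPy sketch reproduces the error and then computes all the tangent vectors at once as an outer product, with no per-element assignment. The helper name is hypothetical.

```python
import numpy as np


def tangent_vecs_along_geodesic(t, initial_tangent_vec):
    """Hypothetical helper: array whose row i is t[i] * initial_tangent_vec."""
    t = np.asarray(t, dtype=float)
    initial_tangent_vec = np.asarray(initial_tangent_vec, dtype=float)
    # Outer product with shape (n_times, dim); no element-wise assignment needed.
    return np.einsum("i,j->ij", t, initial_tangent_vec)


# Reproduce the reported error: a buffer with one scalar per time cannot hold a vector.
buffer = np.zeros(4)
try:
    buffer[0] = 0.5 * np.ones(6)
except ValueError as err:
    reported_error = err

times = np.linspace(0.0, 1.0, 4)
tangent_vecs = tangent_vecs_along_geodesic(times, np.arange(6.0))
print(tangent_vecs.shape)  # (4, 6)
```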

geomstats tf backend result != numpy backend result on loss function

I have managed to run the geodesic loss function using geomstats with tensorflow backend after a few fixes, however the result is not the same as the numpy backend.

In special_euclidean_group.py, line 330 (in jacobian_translation()) checks jacobian.ndim == 3, which raises AttributeError: 'Tensor' object has no attribute 'ndim'. This could be fixed with a backend function, e.g. get_tensor_rank(), implemented separately for numpy/tensorflow/pytorch.

With this check commented out (as a temporary solution), the loss function only runs for N=1. N>1 causes the einsum problem again (#153) in multiple places.

The scripts I'm running are as follows:

Numpy:

import os
os.environ['GEOMSTATS_BACKEND'] = 'numpy'  # NOQA

import numpy as np
import geomstats.lie_group as lie_group
from geomstats.special_euclidean_group import SpecialEuclideanGroup

# Reproducible results
np.random.seed(0)

n_vecs = 1

SE3_GROUP = SpecialEuclideanGroup(3)
metric=SE3_GROUP.left_canonical_metric

y_pred = np.random.rand(n_vecs, 6)
y_true = np.random.rand(n_vecs, 6)

loss = lie_group.loss(y_pred, y_true, SE3_GROUP, metric)

print('y_pred:',y_pred)
print('y_true:',y_true)
print('loss:', loss)

'''
Output:
Using numpy backend
y_pred: [[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548  0.64589411]]
y_true: [[0.43758721 0.891773   0.96366276 0.38344152 0.79172504 0.52889492]]
loss: [[0.33912556]]
'''

Tensorflow:

import os
os.environ['GEOMSTATS_BACKEND'] = 'tensorflow'  # NOQA

import tensorflow as tf

import numpy as np
import geomstats.lie_group as lie_group
from geomstats.special_euclidean_group import SpecialEuclideanGroup

# Reproducible results
np.random.seed(0)

n_vecs = 1

SE3_GROUP = SpecialEuclideanGroup(3)
metric=SE3_GROUP.left_canonical_metric

y_pred = tf.placeholder(tf.float32, shape=[n_vecs, 6])
y_true = tf.placeholder(tf.float32, shape=[n_vecs, 6])

loss = lie_group.loss(y_pred, y_true, SE3_GROUP, metric)

sess = tf.InteractiveSession()

_y_pred = np.random.rand(n_vecs, 6)
_y_true = np.random.rand(n_vecs, 6)

_loss = sess.run(loss,feed_dict={
    y_pred: _y_pred,
    y_true: _y_true
})

print('_y_pred:',_y_pred)
print('_y_true:',_y_true)
print('loss:',_loss)

'''
Output:
Using tensorflow backend
_y_pred: [[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548  0.64589411]]
_y_true: [[0.43758721 0.891773   0.96366276 0.38344152 0.79172504 0.52889492]]
loss: [[3.6569684]]
'''
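A minimal sketch of the get_tensor_rank() function proposed above could rely on the `shape` attribute. NumPy arrays, PyTorch tensors and TensorFlow tensors all expose it. This assumes the rank is known statically: for TensorFlow graph tensors of unknown rank, `len()` on the shape fails. Only the NumPy case is exercised here, and the function is hypothetical, not part of geomstats.

```python
import numpy as np


def get_tensor_rank(x):
    """Number of dimensions of an array or tensor (hypothetical backend helper).

    len(x.shape) works for numpy.ndarray, torch.Tensor (whose shape is a
    torch.Size tuple) and tf.Tensor with a statically known rank. This avoids
    relying on x.ndim, which raised AttributeError with the TensorFlow
    version used in this report.
    """
    return len(x.shape)


jacobian = np.zeros((1, 6, 6))
print(get_tensor_rank(jacobian) == 3)  # True
```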

Function vectorization

Ideally, every function in every geomstats backend that accepts array-like inputs and returns array-like outputs should be vectorized (also called batched). Inputs and outputs then stack vectors or arrays along axis 0.

For example, the shape of an array input containing 1 vector of dimension 3 will be (1, 3), the shape of an array input containing 10 matrices of dimension 3x2 will be (10, 3, 2).

This lets us write numpy-style code without loops. Vectorized code can be parallelized by lower-level routines across multiple CPU/GPU cores, whereas Python loops run sequentially.

Note that native backend functions are, in general, inconsistent in this respect. For example, the native qr function is currently vectorized in tensorflow but not in numpy or pytorch. Our backend wrappers should address this inconsistency.
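The sketch below illustrates the batching convention described above. It is not existing geomstats code. It applies 10 matrices of shape 3x2 to 10 vectors of dimension 2 in a single `einsum` call and checks the result against an explicit Python loop. `to_batch` is a hypothetical helper that adds the leading axis when a single point is passed.

```python
import numpy as np


def to_batch(x, point_ndim):
    """Add a leading batch axis if x holds a single point (hypothetical helper)."""
    x = np.asarray(x)
    return x[None] if x.ndim == point_ndim else x


def batched_mat_vec(mats, vecs):
    """Vectorized: (n, d1, d2) x (n, d2) -> (n, d1), no Python loop."""
    mats = to_batch(mats, point_ndim=2)
    vecs = to_batch(vecs, point_ndim=1)
    return np.einsum('nij,nj->ni', mats, vecs)


def looped_mat_vec(mats, vecs):
    """Reference implementation with an explicit, sequential loop."""
    return np.stack([mat @ vec for mat, vec in zip(mats, vecs)])


rng = np.random.default_rng(0)
mats = rng.standard_normal((10, 3, 2))
vecs = rng.standard_normal((10, 2))
print(np.allclose(batched_mat_vec(mats, vecs), looped_mat_vec(mats, vecs)))  # True
print(batched_mat_vec(mats[0], vecs[0]).shape)  # (1, 3)
```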

tangent PCA bug

Hello,

I'm trying to apply tangent PCA on SPD matrices, something along the lines of: https://github.com/geomstats/geomstats/blob/master/examples/tangent_pca_so3.py

The following code doesn't work:

manifold = SPDMatricesSpace(10)
X = manifold.random_uniform(n_samples=140)
# X = manifold.vector_from_symmetric_matrix(X)   # this doesn't work either
mean = manifold.metric.mean(X[None, :])
tpca = TangentPCA(metric=manifold.metric)
tpca = tpca.fit(X)
tangent_projected_data = tpca.transform(X)

Passing the SPD matrices as vectors, I get the following error:

  File "/Users/nicolas/anaconda3/lib/python3.7/site-packages/geomstats/learning/pca.py", line 128, in fit
    self._fit(X, base_point, point_type)
  File "/Users/nicolas/anaconda3/lib/python3.7/site-packages/geomstats/learning/pca.py", line 164, in _fit
    tangent_vecs = self.metric.log(X, base_point=base_point)
  File "/Users/nicolas/anaconda3/lib/python3.7/site-packages/geomstats/spd_matrices_space.py", line 233, in log
    sqrt_base_point = gs.linalg.sqrtm(base_point)
  File "/Users/nicolas/anaconda3/lib/python3.7/site-packages/geomstats/backend/numpy_linalg.py", line 19, in sqrtm
    scipy.linalg.sqrtm, signature='(n,m)->(n,m)')(x)
  File "/Users/nicolas/anaconda3/lib/python3.7/site-packages/scipy/linalg/_matfuncs_sqrtm.py", line 170, in sqrtm
    T, Z = schur(A)
  File "/Users/nicolas/anaconda3/lib/python3.7/site-packages/scipy/linalg/decomp_schur.py", line 126, in schur
    raise ValueError('expected square matrix')
ValueError: expected square matrix

When passing a matrix, I get a different error in riemannian_metric:

  File "/Users/nicolas/anaconda3/lib/python3.7/site-packages/geomstats/riemannian_metric.py", line 389, in <lambda>
    lambda i, m, v, sq: while_loop_body(i, m, v, sq),
  File "/Users/nicolas/anaconda3/lib/python3.7/site-packages/geomstats/riemannian_metric.py", line 340, in while_loop_body
    tangent_mean += gs.einsum('nk,nj->j', weights, logs)
  File "/Users/nicolas/anaconda3/lib/python3.7/site-packages/geomstats/backend/numpy.py", line 256, in einsum
    return np.einsum(*args, **kwargs)
  File "<__array_function__ internals>", line 6, in einsum
  File "/Users/nicolas/anaconda3/lib/python3.7/site-packages/numpy/core/einsumfunc.py", line 1356, in einsum
    return c_einsum(*operands, **kwargs)
ValueError: operand has more dimensions than subscripts given in einstein sum, but no '...' ellipsis provided to broadcast the extra dimensions.

From what I can tell, TangentPCA.fit() only supports vectors, whereas SPDMatricesSpace.log() requires square matrices.

Is there any way around this?
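One possible workaround bypasses TangentPCA and uses plain NumPy. All function names here are hypothetical. For brevity, the base point is the arithmetic mean, not the Fréchet mean. The idea is to map each SPD matrix to the tangent space with the affine-invariant logarithm, turn the symmetric tangent matrices into vectors, and run ordinary PCA on those vectors. Two steps make Euclidean inner products between the vectors equal the affine-invariant inner product at the base point: whitening by P^{-1/2}, and scaling off-diagonal entries by sqrt(2).

```python
import numpy as np


def spd_apply(mats, func):
    """Apply func to the eigenvalues of (batched) symmetric matrices."""
    eigvals, eigvecs = np.linalg.eigh(mats)
    return (eigvecs * func(eigvals)[..., None, :]) @ np.swapaxes(eigvecs, -1, -2)


def whitened_logs(points, base_point):
    """P^{-1/2} Log_P(Q) P^{-1/2} = logm(P^{-1/2} Q P^{-1/2}) (affine-invariant metric)."""
    inv_sqrt = spd_apply(base_point, lambda w: 1.0 / np.sqrt(w))
    return spd_apply(inv_sqrt @ points @ inv_sqrt, np.log)


def sym_to_vec(mats):
    """Upper triangle as a vector, off-diagonal entries scaled by sqrt(2)
    so that Euclidean inner products equal Frobenius inner products."""
    n = mats.shape[-1]
    rows, cols = np.triu_indices(n)
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return mats[..., rows, cols] * weights


def tangent_pca(points, base_point, n_components):
    """PCA on tangent vectors at base_point; returns scores and explained variances."""
    vecs = sym_to_vec(whitened_logs(points, base_point))
    centered = vecs - vecs.mean(axis=0)
    _, sing_vals, components = np.linalg.svd(centered, full_matrices=False)
    variances = sing_vals ** 2 / (len(points) - 1)
    return centered @ components[:n_components].T, variances[:n_components]


rng = np.random.default_rng(0)
a = rng.standard_normal((140, 10, 10))
points = a @ np.swapaxes(a, -1, -2) + 10.0 * np.eye(10)  # 140 random 10x10 SPD matrices
base_point = points.mean(axis=0)  # simplification: arithmetic mean, not the Frechet mean
scores, variances = tangent_pca(points, base_point, n_components=2)
print(scores.shape)  # (140, 2)
```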

SO3 squared distance is not symmetric in edge cases

The original problem is that the canonical Riemannian squared distance on SO(3) is not symmetric in some edge cases:

SO3 = SpecialOrthogonalGroup(n=3)
metric = SO3.bi_invariant_metric
point_1 =  (np.pi - 1e-9) / np.sqrt(2) * np.array([0., 1., -1])  
# = [ 0.        ,  2.22144147, -2.22144147]
point_2 =  (np.pi + 0.3) / np.sqrt(5) * np.array([-2., 1., 0])  
# = [-3.07825405,  1.53912702,  0.        ]

sq_dist_1_2 = metric.squared_dist(point_1, point_2)
sq_dist_2_1 = metric.squared_dist(point_2, point_1)

gives

squared distance from 1 to 2: 2.3231752211574856
squared distance from 2 to 1: 3.4963809233705936

This problem probably originates in, or is linked to, the fact that group_log and group_exp, as well as metric.log and metric.exp, are not inverses of each other in some edge cases (observed experimentally).

This, in turn, probably originates in how the function regularize is inherited from the class LieGroup (where regularize is just the identity) by the class SpecialOrthogonalGroup (where it is the usual regularization of the rotation vector).

Also, there might be more than one problem.
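The NumPy sketch below illustrates the regularization mentioned above. It is not the geomstats implementation, and the function names are hypothetical. Regularization maps a rotation vector to an equivalent one whose angle lies in [0, pi]. It relies on the fact that a rotation by an angle theta about an axis u equals a rotation by 2*pi - theta about -u. The sketch also computes the distance from rotation matrices, as the angle of R1^T R2. That distance is symmetric by construction, because tr(R1^T R2) = tr(R2^T R1). This assumes a bi-invariant metric normalized so that the squared distance is the squared rotation angle, which may differ from the normalization used in geomstats.

```python
import numpy as np


def regularize(rot_vec):
    """Return an equivalent rotation vector whose angle lies in [0, pi]."""
    rot_vec = np.asarray(rot_vec, dtype=float)
    angle = np.linalg.norm(rot_vec)
    if angle == 0.0:
        return rot_vec
    axis = rot_vec / angle
    angle = np.mod(angle, 2.0 * np.pi)
    if angle > np.pi:
        # Rotation by angle about axis == rotation by 2*pi - angle about -axis.
        angle, axis = 2.0 * np.pi - angle, -axis
    return angle * axis


def matrix_from_rotation_vector(rot_vec):
    """Rotation matrix from a rotation vector (Rodrigues' formula)."""
    angle = np.linalg.norm(rot_vec)
    if angle < 1e-12:
        return np.eye(3)
    k = np.asarray(rot_vec, dtype=float) / angle
    skew = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * skew + (1.0 - np.cos(angle)) * skew @ skew


def squared_angle_dist(rot_vec_1, rot_vec_2):
    """Squared rotation angle of R1^T R2; symmetric since tr(R1^T R2) = tr(R2^T R1)."""
    r1 = matrix_from_rotation_vector(rot_vec_1)
    r2 = matrix_from_rotation_vector(rot_vec_2)
    cos_angle = (np.trace(r1.T @ r2) - 1.0) / 2.0
    return np.arccos(np.clip(cos_angle, -1.0, 1.0)) ** 2


point_1 = (np.pi - 1e-9) / np.sqrt(2) * np.array([0.0, 1.0, -1.0])
point_2 = (np.pi + 0.3) / np.sqrt(5) * np.array([-2.0, 1.0, 0.0])
print(np.linalg.norm(regularize(point_2)))  # pi - 0.3, about 2.8416
print(np.isclose(squared_angle_dist(point_1, point_2),
                 squared_angle_dist(point_2, point_1)))  # True
```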

bug in special_orthogonal_group.py rotation_vector_from_matrix() with multiple matrices

For multiple rotation matrices with a rotation angle of pi, all converted results are determined by the first matrix. For example:

import numpy as np
from geomstats.special_orthogonal_group import SpecialOrthogonalGroup

SO3 = SpecialOrthogonalGroup(n=3)
rot_mat = np.array([[[-1, 0, 0], [0, -1, 0], [0, 0, 1]], [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]], dtype=np.float32)
vecs = SO3.rotation_vector_from_matrix(rot_mat)

vecs:

[[0.        0.        3.1415927]
 [0.        0.        3.1415927]]

which should be

[[0.        0.        3.1415927]
 [0.        3.1415927         0]]
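The vectorized conversion below treats each matrix separately. It is a hypothetical function in plain NumPy, not the geomstats code. The angle comes from the trace. In the generic case, the axis comes from the skew-symmetric part, since R - R^T = 2 sin(theta) [u]_x. Near theta = pi that part vanishes, so the axis is read from (R + I) / 2, which is approximately u u^T. For each matrix independently, the code uses the column with the largest diagonal entry.

```python
import numpy as np


def rotation_vector_from_matrix(rot_mats, atol=1e-6):
    """Convert a batch of 3x3 rotation matrices to rotation vectors, matrix by matrix."""
    rot_mats = np.asarray(rot_mats, dtype=np.float64)
    cos_angle = (np.trace(rot_mats, axis1=-2, axis2=-1) - 1.0) / 2.0
    angles = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    # R - R^T = 2 sin(angle) [axis]_x, read off as a vector.
    skew = np.stack([rot_mats[:, 2, 1] - rot_mats[:, 1, 2],
                     rot_mats[:, 0, 2] - rot_mats[:, 2, 0],
                     rot_mats[:, 1, 0] - rot_mats[:, 0, 1]], axis=-1)
    sin_angle = np.sin(angles)
    rot_vecs = np.zeros_like(skew)

    # Generic case: axis * angle = skew * angle / (2 sin(angle)).
    generic = sin_angle > atol
    rot_vecs[generic] = (angles[generic] / (2.0 * sin_angle[generic]))[:, None] * skew[generic]

    # Angle close to pi: (R + I) / 2 is close to axis axis^T, so each matrix
    # uses its own column with the largest diagonal entry.
    near_pi = ~generic & (angles > np.pi / 2)
    outer = (rot_mats[near_pi] + np.eye(3)) / 2.0
    best = np.argmax(np.diagonal(outer, axis1=-2, axis2=-1), axis=-1)
    axes = outer[np.arange(len(outer)), :, best]
    axes /= np.linalg.norm(axes, axis=-1, keepdims=True)
    # Pick the sign consistent with the (tiny) skew part; irrelevant at exactly pi.
    signs = np.where(np.sum(axes * skew[near_pi], axis=-1) < 0, -1.0, 1.0)
    rot_vecs[near_pi] = (signs * angles[near_pi])[:, None] * axes

    # Angle close to 0: first-order approximation, the vector is skew / 2.
    small = ~generic & ~near_pi
    rot_vecs[small] = skew[small] / 2.0
    return rot_vecs


rot_mat = np.array([[[-1, 0, 0], [0, -1, 0], [0, 0, 1]],
                    [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]], dtype=np.float32)
print(rotation_vector_from_matrix(rot_mat))  # expected: [0, 0, pi] and [0, pi, 0]
```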

Implement Procrustes metric on SPD matrices

For now, the Riemannian metric implemented on SPD matrices in geomstats is the so-called affine-invariant metric, or Fisher-Rao metric. The Procrustes metric is another canonical Riemannian metric on SPD matrices. It also arises in the theory of optimal transport, where it is called the Bures-Wasserstein metric.

Implementing the Procrustes metric is a first step towards defining families of Riemannian metrics on SPD matrices such as the alpha-Procrustes metrics, the power-affine metrics, etc.
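For reference, the squared Bures-Wasserstein (Procrustes) distance between SPD matrices A and B has a closed form: d^2(A, B) = tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2}). The NumPy sketch below evaluates it with square roots computed by eigendecomposition. It is a standalone illustration with hypothetical function names, not a proposed geomstats API. It covers only the distance, not the exponential and logarithm maps that a full metric class would need.

```python
import numpy as np


def spd_sqrt(mat):
    """Square root of a symmetric positive (semi-)definite matrix via eigh."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T


def bures_wasserstein_squared_dist(a, b):
    """tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2})."""
    sqrt_a = spd_sqrt(a)
    cross = spd_sqrt(sqrt_a @ b @ sqrt_a)
    return np.trace(a) + np.trace(b) - 2.0 * np.trace(cross)


# Commuting case: the distance reduces to |A^{1/2} - B^{1/2}|_F^2 = (1-2)^2 + (2-3)^2.
a = np.diag([1.0, 4.0])
b = np.diag([4.0, 9.0])
print(bures_wasserstein_squared_dist(a, b))  # approximately 2.0
```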

Unify backend interface

Some functions do not behave the same in different backends. One example is gs.diag:

If numpy is selected as the backend, the following call results in a 3-dimensional array: gs.diag(gs.array([1., 1., 0.])) has shape (1, 3, 3). Calling the same code with pytorch as the backend results in a tensor of shape (3, 3).

To resolve this, we should specify (and unify) the output shapes of backend arrays.
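The NumPy-only sketch below pins down one possible contract. The name `diag` and the shape convention are assumptions for illustration, not agreed geomstats behavior. A 1-D input of length d gives a single d x d matrix. A batch of shape (n, d) gives n diagonal matrices, with output shape (n, d, d). Each backend wrapper could then be tested against the same shapes.

```python
import numpy as np


def diag(x):
    """Hypothetical diag contract: (d,) -> (d, d) and (n, d) -> (n, d, d)."""
    x = np.asarray(x)
    if x.ndim == 1:
        return np.diag(x)
    # Batched case: multiply each row by the identity to build one diagonal matrix per row.
    return x[..., :, None] * np.eye(x.shape[-1])


print(diag(np.array([1., 1., 0.])).shape)        # (3, 3)
print(diag(np.array([[1., 1., 0.]] * 4)).shape)  # (4, 3, 3)
```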

Refactor Common Matrix Operations

Vectorized matrix operations such as multiplication, commutator and comparison with a tolerance should be added to the MatrixSpace class and refactored out of old code.
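Such vectorized helpers might look like the sketch below. It uses plain NumPy functions rather than MatrixSpace methods, and the names and signatures are hypothetical. Each function accepts stacks of matrices of shape (..., n, n) and broadcasts over the leading axes.

```python
import functools

import numpy as np


def mul(*mats):
    """Batched product of several matrices, left to right."""
    return functools.reduce(np.matmul, mats)


def commutator(a, b):
    """Batched commutator [a, b] = ab - ba."""
    return np.matmul(a, b) - np.matmul(b, a)


def equal(a, b, atol=1e-8):
    """Per-matrix comparison with a tolerance: one boolean per matrix in the stack."""
    return np.all(np.isclose(a, b, atol=atol), axis=(-2, -1))


rng = np.random.default_rng(0)
a = rng.standard_normal((5, 3, 3))
b = rng.standard_normal((5, 3, 3))
print(equal(commutator(a, a), np.zeros((5, 3, 3))))  # [ True  True  True  True  True]
```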

can't find arctan2 in backend

Running the example loss_and_gradient_se3.py gives the following output:

(py35) $ python example_loss_and_gradient_se3.py
Using numpy backend
The loss between the rotation vectors is: 3.7191479601637325
The riemannian gradient is: [ 1.33950627 -0.55349447 -0.29312899 2. 2. 2. ]
Traceback (most recent call last):
  File "example_loss_and_gradient_se3.py", line 135, in <module>
    main()
  File "example_loss_and_gradient_se3.py", line 127, in main
    representation='quaternion')
  File "example_loss_and_gradient_se3.py", line 73, in grad
    quat_arctan2 = gs.arctan2(quat_vec_norm, quat_scalar)
AttributeError: module 'geomstats.backend' has no attribute 'arctan2'
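For context, the variable names suggest that the failing line computes half the rotation angle of a unit quaternion (w, v). The full angle is theta = 2 * arctan2(|v|, w). For that line to run, each backend wrapper needs to expose arctan2. NumPy provides `np.arctan2`, PyTorch provides `torch.atan2` and TensorFlow provides `tf.math.atan2`. The NumPy-only sketch below shows the conversion to a rotation vector. The function name is hypothetical.

```python
import numpy as np


def rotation_vector_from_quaternion(quaternion):
    """Unit quaternion (w, x, y, z) -> rotation vector (hypothetical helper)."""
    quaternion = np.asarray(quaternion, dtype=float)
    w, vec = quaternion[0], quaternion[1:]
    vec_norm = np.linalg.norm(vec)
    if vec_norm < 1e-12:
        return np.zeros(3)
    half_angle = np.arctan2(vec_norm, w)  # presumably the quantity on the failing line
    return 2.0 * half_angle * vec / vec_norm


q = np.array([np.cos(0.25), 0.0, 0.0, np.sin(0.25)])  # rotation by 0.5 rad about z
print(rotation_vector_from_quaternion(q))  # approximately [0, 0, 0.5]
```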

Error in executing geomstats samples

@ninamiolane

I am trying to use geomstats. To install it, I set up Ubuntu 18.04 and installed exactly the tools listed in the geomstats readme files. I then tried to run: 1) nose2 and run.sh (from the geomstats-master folder), and 2) mnist_hypersphere.py (from the deep_learning folder).
For the first run I get this error:

KeyError: 'GEOMSTATS_BACKEND'

Ran 27 tests in 0.435s
FAILED (errors=19)

and for the second execution I get this:

Using TensorFlow backend.
Using tensorflow backend
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Traceback (most recent call last):
  File "mnist_hypersphere.py", line 54, in <module>
    kernel_manifold=Hypersphere(hypersphere_dimension)))
  File "/usr/local/lib/python3.6/dist-packages/Keras-2.1.6-py3.6.egg/keras/legacy/interfaces.py", line 91, in wrapper
  File "/usr/local/lib/python3.6/dist-packages/Keras-2.1.6-py3.6.egg/keras/layers/convolutional.py", line 465, in __init__
  File "/usr/local/lib/python3.6/dist-packages/Keras-2.1.6-py3.6.egg/keras/layers/convolutional.py", line 104, in __init__
  File "/usr/local/lib/python3.6/dist-packages/Keras-2.1.6-py3.6.egg/keras/engine/base_layer.py", line 122, in __init__
TypeError: ('Keyword argument not understood:', 'kernel_manifold')

My settings are:

Ubuntu 18.04
tensorflow==1.8
python==3.6.7
keras==2.1.6 (installed from the keras folder of applications-master, using tf_patch)
autograd
codecov
coverage
h5py==2.8.0
matplotlib
nose2
numpy==1.16.3
scipy
torch==0.4.0

Please help me install it correctly.
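The first run's `KeyError: 'GEOMSTATS_BACKEND'` suggests the tests read that environment variable without a default. It would then have to be set before geomstats is imported, either in Python as in the scripts earlier on this page (`os.environ['GEOMSTATS_BACKEND'] = 'numpy'`) or in the shell (`export GEOMSTATS_BACKEND=numpy`). Below is a sketch of a more forgiving lookup. The function name and the set of backend names are assumptions.

```python
import os

SUPPORTED_BACKENDS = ('numpy', 'autograd', 'pytorch', 'tensorflow')  # assumed names


def get_backend_name(default='numpy'):
    """Read GEOMSTATS_BACKEND, falling back to a default instead of raising KeyError."""
    name = os.environ.get('GEOMSTATS_BACKEND', default)
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f'Unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}')
    return name
```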

Add documentation on requirements for new algorithms

New algorithms should come with at least the following:

  • A working implementation of the algorithm
  • Unit tests verifying the correctness of the algorithm
  • An example detailing the use of the algorithm, ideally using real-world data

In pytorch backend a lot of functions are actually wrappers to numpy

Currently, many functions in the pytorch backend, and all but one in pytorch.linalg, are just wrappers around numpy, which is suboptimal. This seems to be because geomstats development started before pytorch 0.4. Before that release, advanced linear algebra functions were virtually non-existent in pytorch.

Solution: update the backend by replacing the numpy wrappers with native pytorch functions, if and when they become available.

Add unit tests for backend functions

The backend functions are currently not tested in isolation. As a result, bugs in the backend abstractions are only caught if some of the more involved tests happen to use a particular backend function. Adding such tests would also resolve #303.
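Here is a sketch of a backend function tested in isolation. It assumes a hypothetical batched `sqrtm` wrapper built on a symmetric eigendecomposition. Unlike the general `scipy.linalg.sqrtm`, that approach is valid only for symmetric positive semi-definite input. The test exercises the function on its own, for both a single matrix and a stack, without going through any manifold class.

```python
import unittest

import numpy as np


def sqrtm(x):
    """Hypothetical backend wrapper: batched square root of SPD matrices via eigh."""
    eigvals, eigvecs = np.linalg.eigh(x)
    return (eigvecs * np.sqrt(eigvals)[..., None, :]) @ np.swapaxes(eigvecs, -1, -2)


class TestBackendSqrtm(unittest.TestCase):
    def setUp(self):
        rng = np.random.default_rng(0)  # seeded, so the test is deterministic
        a = rng.standard_normal((4, 3, 3))
        self.spd = a @ np.swapaxes(a, -1, -2) + np.eye(3)

    def test_single_matrix(self):
        root = sqrtm(self.spd[0])
        self.assertEqual(root.shape, (3, 3))
        self.assertTrue(np.allclose(root @ root, self.spd[0]))

    def test_batch_of_matrices(self):
        roots = sqrtm(self.spd)
        self.assertEqual(roots.shape, (4, 3, 3))
        self.assertTrue(np.allclose(roots @ roots, self.spd))
```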

Seed all tests

Running tests without seeding the random number generator is risky, because it can produce flaky tests. The generator needs to be seeded.
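One way to enforce seeding is a shared base class whose setUp reseeds before every test. The sketch below is hypothetical and not the geomstats test suite. It seeds both NumPy's legacy global generator (used by calls such as `np.random.rand`) and an explicit `np.random.Generator`. Run it with `python -m unittest` or pytest. Because setUp reseeds, a failure reproduces identically on every run.

```python
import unittest

import numpy as np


class SeededTestCase(unittest.TestCase):
    """Hypothetical base class: every test starts from the same random state."""

    seed = 0

    def setUp(self):
        np.random.seed(self.seed)                    # legacy global RNG (np.random.rand, ...)
        self.rng = np.random.default_rng(self.seed)  # explicit Generator for new code


class TestRandomSPD(SeededTestCase):
    def test_sample_is_spd(self):
        a = self.rng.standard_normal((5, 3, 3))
        spd = a @ np.swapaxes(a, -1, -2) + 1e-3 * np.eye(3)
        self.assertTrue(np.allclose(spd, np.swapaxes(spd, -1, -2)))
        self.assertTrue(np.all(np.linalg.eigvalsh(spd) > 0))
```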
