Giter VIP home page Giter VIP logo

pylspack's Introduction

pylspack

A lightweight, multithreaded Python package for Leverage Scores computations of tall-and-thin, dense or sparse matrices of arbitrary rank. Includes algorithms for:

  • statistical leverage scores
  • column subset selection
  • sketching (CountSketch and Gaussian embeddings)
  • squared row-norms computation
  • basic linear algebra tools like rank-k update, diagonal scaling and more.

This package aims to provide quick access to multithreaded algorithms with minimal memory overhead in Python, which are compatible with standard numpy.ndarray and scipy.sparse.csr_matrix data structures. The individual kernels can be used as a building block for various problems like low rank approximations, least squares regression, preconditioning, feature selection, clustering, to name a few. It is left to the user to unlock the package's full potential.

The basic matrix algorithms of this package are developed in C++, using OpenMP for multithreading and SIMD instructions. As already noted, the implementation is designed for existing data structures of SciPy and Numpy and therefore the entire codebase is only tested and should only be used via the python wrappers. The C++ API can be used as a standalone package, but it has not been tested.

Citation

If you use this software in academic work, please consider citing the corresponding publications:

@article{sobczyk2021estimating,
  title={Estimating leverage scores via rank revealing methods and randomization},
  author={Sobczyk, Aleksandros and Gallopoulos, Efstratios},
  journal={SIAM Journal on Matrix Analysis and Applications},
  volume={42},
  number={3},
  pages={1199--1228},
  year={2021},
  doi={10.1137/20m1314471},
  url={https://doi.org/10.1137/20m1314471},
  publisher={SIAM}
}
@article{sobczyk2022pylspack,
  title={pylspack: Parallel Algorithms and Data Structures for Sketching, Column Subset Selection, Regression, and Leverage Scores},
  author={Sobczyk, Aleksandros and Gallopoulos, Efstratios},
  journal={ACM Transactions on Mathematical Software},
  volume={48},
  number={4},
  pages={1--27},
  year={2022},
  publisher={ACM New York, NY}
}

Usage

A simple usage example to compute the leverage scores of a sparse csr_matrix with the ls_via_inv_gram method.

from scipy.sparse import random
from pylspack.leverage_scores import ls_via_inv_gram
A = random(10000, 20, density=0.3, format='csr')
q = ls_via_inv_gram(A)

Python API - Quick reference

Basic linear algebra kernels: pylspack.linalg_kernels

Fuction Description
csrcgs(A, m, r) Given a csr_matrix A, compute the product GSA where G is a matrix with standard normal random elements and S is a CountSketch transform (G can be omitted, in which case only SA is returned).
csrjlt(A, m) Given a csr_matrix A, compute the product GA where G is as above.
csrrk(alpha, A, beta, C) Given a csr_matrix A and a matrix C, computes C <- alpha * A' * A + beta * C (C is updated inplace).
csrsqn(alpha, A, B, beta, x) Given a csr_matrix A and a matrix B, computes the squared Euclidean norms of the rows of the matrix C = A * B, without explicitly forming C, and store the result in the vector x (x is updated inplace).
rmcgs(A, m, r) Same as csrcgs, but in this case A is a dense matrix in Row-Major (C_CONTIGUOUS) format.
rmdsc(B, D) Given matrix B and a diagonal matrix D stored as a vector, computes the product B <- B * D (B is updated inplace).
rmsqn(alpha, A, B, beta, x) Same as csrsqn, but in this case A is a dense matrix in Row-Major (C_CONTIGUOUS) format.
gemm(alpha, A, B, beta, C) Simplified version of standard BLAS dgemm. NOTE: this method was implemented only to be used internally by the higher level methods, to avoid external dependencies in the C++ API. It is highly recommended to use the scipy.linalg.blas.dgemm instead, which is faster and more flexible!
scale(B, alpha) Compute B <- B * alpha for some matrix B and some scalar alpha (B is updated inplace).
set_randn(B) Fill the elements of a matrix B with random elements from the standard normal distribution (B is updated inplace).
set_value(B, value) Fill the elements of a matrix B with fixed given float value (B is updated inplace).

High level algorithms: pylspack.leverage_scores

Function Description
sample_columns(A, rcond, m, r) Select a subset of columns of A, such that the dominant-k subspace of the subset will be close to the dominant-k subspace of A. m and r define the parameters for the underlying call to csrcgs/rmcgs, and rcond is used as a threshold for the small singular values to determine the numerical rank k and the subspace dimension.
ls_via_inv_gram(A, rcond) Compute the leverage scores of the best rank-k approximation of A, which is determined based on rcond.
ls_via_sketched_svd(A, rcond, m, r1, r2) Compute the leverage scores of the best rank-k approximation of A approximately using sketching (rmcgs/csrcgs). m, r1, r2 define the sketching dimensions.
ls_hrn_exact(A, rcond, m, r) Compute the leverage scores of A using sample_columns as a first step and then calling ls_via_inv_gram on the selected column subset.
ls_hrn_approx(A, rcond, m, r, m_ls, r1_ls, r2_ls) Compute the leverage scores of A approximately using sample_columns as a first step and then calling ls_via_sketched_svd on the selected column subset.

Install

Requirements:

  • C++11 capable compiler
  • CMake >= 3.11
  • OpenMP >= 4.0 (simd directives)
  • Numpy and SciPy (ideally with OpenMP support)

The installation will build the C/C++ code to generate the shared library that is used by the python wrappers. NOTE: In order to keep the code architecture-independent the compiler optimization flags are kept as generic as possible. In order to apply additional optimization flags, simply add them in the ${PYLSPACK_ADDITIONAL_CMAKE_CXX_FLAGS} environment variable prior to executing pip install:

python3 -m venv venv
source venv/bin/activate
# Optional step to add more optimization flags:
# export PYLSPACK_ADDITIONAL_CMAKE_CXX_FLAGS="-march=native "
pip install git+https://github.com/IBM/pylspack

Testing

To run the tests:

python3 -m pip install -r test_requirements.txt
cd test
python3 -m pytest -svvv .
# If you get an error about liblinalg_kernels.so, do the following:
# PYLSPACK_LOCATION="$(pip show pylspack | grep Location: | awk '{print $2}')/pylspack/"
# export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${PYLSPACK_LOCATION}

Contributing

See CONTRIBUTING.md.

Tips

It is recommended to always set OMP_NUM_THREADS, OMP_PLACES and OMP_PROC_BIND appropriately for better performance. For more details see the official documentation of OpenMP.

pylspack's People

Contributors

asobczyk avatar stevemar avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Forkers

ghas-results

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.