Giter VIP home page Giter VIP logo

rapidsai / cuvs Goto Github PK

View Code? Open in Web Editor NEW
87.0 10.0 29.0 2.92 MB

cuVS - a library for vector search and clustering on the GPU

Home Page: https://rapids.ai

License: Apache License 2.0

Dockerfile 0.03% Shell 1.86% Python 4.43% CMake 1.78% C++ 12.48% Cuda 9.23% HTML 0.12% C 3.88% Jupyter Notebook 61.61% Cython 2.16% Rust 2.43%
anns clustering cuda distance gpu information-retrieval llm machine-learning nearest-neighbors neighborhood-methods

cuvs's Introduction

 cuVS: Vector Search and Clustering on the GPU

Note

cuVS is a new library mostly derived from the approximate nearest neighbors and clustering algorithms in the RAPIDS RAFT library of data mining primitives. RAPIDS RAFT currently contains the most fully-featured versions of the approximate nearest neighbors and clustering algorithms in cuVS. We are in the process of migrating the algorithms from RAFT to cuVS, but if you are unsure of which to use, please consider the following:

  1. RAFT contains C++ and Python APIs for all of the approximate nearest neighbors and clustering algorithms.
  2. cuVS contains a growing support for different languages, including C, C++, Python, and Rust. We will be adding more language support to cuVS in the future but will not be improving the language support for RAFT.
  3. Once all of RAFT's approximate nearest neighbors and clustering algorithms are moved to cuVS, the RAFT APIs will be deprecated and eventually removed altogether. Once removed, RAFT will become a lightweight header-only library. In the meantime, there's no harm in using RAFT if support for additional languages is not needed.

Contents

  1. Useful Resources
  2. What is cuVS?
  3. Installing cuVS
  4. Getting Started
  5. Contributing
  6. References

Useful Resources

What is cuVS?

cuVS contains state-of-the-art implementations of several algorithms for running approximate nearest neighbors and clustering on the GPU. It can be used directly or through the various databases and other libraries that have integrated it. The primary goal of cuVS is to simplify the use of GPUs for vector similarity search and clustering.

Installing cuVS

cuVS comes with pre-built packages that can be installed through conda. Different packages are available for the different languages supported by cuVS:

Python C++ C
pycuvs libcuvs libcuvs_c

Stable release

It is recommended to use mamba to install the desired packages. The following command will install the Python package. You can substitute pycuvs for any of the packages in the table above:

mamba install -c conda-forge -c nvidia -c rapidsai pycuvs

Nightlies

If installing a version that has not yet been released, the rapidsai channel can be replaced with rapidsai-nightly:

mamba install -c conda-forge -c nvidia -c rapidsai-nightly pycuvs=24.06

Please see the Build and Install Guide for more information on installing cuVS and building from source.

Getting Started

The following code snippets train an approximate nearest neighbors index for the CAGRA algorithm.

Python API

from cuvs.neighbors import cagra

dataset = load_data()
index_params = cagra.IndexParams()

index = cagra.build_index(build_params, dataset)

C++ API

#include <cuvs/neighbors/cagra.hpp>

using namespace cuvs::neighbors;

raft::device_matrix_view<float> dataset = load_dataset();
raft::device_resources res;

cagra::index_params index_params;

auto index = cagra::build(res, index_params, dataset);

For more example of the C++ APIs, refer to cpp/examples directory in the codebase.

C API

#include <cuvs/neighbors/cagra.h>

cuvsResources_t res;
cuvsCagraIndexParams_t index_params;
cuvsCagraIndex_t index;

DLManagedTensor *dataset;
load_dataset(dataset);

cuvsResourcesCreate(&res);
cuvsCagraIndexParamsCreate(&index_params);
cuvsCagraIndexCreate(&index);

cuvsCagraBuild(res, index_params, dataset, index);

cuvsCagraIndexDestroy(index);
cuvsCagraIndexParamsDestroy(index_params);
cuvsResourcesDestroy(res);

Rust API

use cuvs::cagra::{Index, IndexParams, SearchParams};
use cuvs::{ManagedTensor, Resources, Result};

use ndarray::s;
use ndarray_rand::rand_distr::Uniform;
use ndarray_rand::RandomExt;

/// Example showing how to index and search data with CAGRA
fn cagra_example() -> Result<()> {
    let res = Resources::new()?;

    // Create a new random dataset to index
    let n_datapoints = 65536;
    let n_features = 512;
    let dataset =
        ndarray::Array::<f32, _>::random((n_datapoints, n_features), Uniform::new(0., 1.0));

    // build the cagra index
    let build_params = IndexParams::new()?;
    let index = Index::build(&res, &build_params, &dataset)?;
    println!(
        "Indexed {}x{} datapoints into cagra index",
        n_datapoints, n_features
    );

    // use the first 4 points from the dataset as queries : will test that we get them back
    // as their own nearest neighbor
    let n_queries = 4;
    let queries = dataset.slice(s![0..n_queries, ..]);

    let k = 10;

    // CAGRA search API requires queries and outputs to be on device memory
    // copy query data over, and allocate new device memory for the distances/ neighbors
    // outputs
    let queries = ManagedTensor::from(&queries).to_device(&res)?;
    let mut neighbors_host = ndarray::Array::<u32, _>::zeros((n_queries, k));
    let neighbors = ManagedTensor::from(&neighbors_host).to_device(&res)?;

    let mut distances_host = ndarray::Array::<f32, _>::zeros((n_queries, k));
    let distances = ManagedTensor::from(&distances_host).to_device(&res)?;

    let search_params = SearchParams::new()?;

    index.search(&res, &search_params, &queries, &neighbors, &distances)?;

    // Copy back to host memory
    distances.to_host(&res, &mut distances_host)?;
    neighbors.to_host(&res, &mut neighbors_host)?;

    // nearest neighbors should be themselves, since queries are from the
    // dataset
    println!("Neighbors {:?}", neighbors_host);
    println!("Distances {:?}", distances_host);
    Ok(())
}

Contributing

If you are interested in contributing to the cuVS library, please read our Contributing guidelines. Refer to the Developer Guide for details on the developer guidelines, workflows, and principles.

References

When citing cuVS generally, please consider referencing this Github repository.

@misc{rapidsai,
  title={Rapidsai/cuVS: Vector Search and Clustering on the GPU.},
  url={https://github.com/rapidsai/cuvs},
  journal={GitHub},
  publisher={Nvidia RAPIDS},
  author={Rapidsai},
  year={2024}
}

If citing CAGRA, please consider the following bibtex:

@misc{ootomo2023cagra,
      title={CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs},
      author={Hiroyuki Ootomo and Akira Naruse and Corey Nolet and Ray Wang and Tamas Feher and Yong Wang},
      year={2023},
      eprint={2308.15136},
      archivePrefix={arXiv},
      primaryClass={cs.DS}
}

If citing the k-selection routines, please consider the following bibtex:

@proceedings{10.1145/3581784,
    title = {SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
    year = {2023},
    isbn = {9798400701092},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    abstract = {Started in 1988, the SC Conference has become the annual nexus for researchers and practitioners from academia, industry and government to share information and foster collaborations to advance the state of the art in High Performance Computing (HPC), Networking, Storage, and Analysis.},
    location = {, Denver, CO, USA, }
}

If citing the nearest neighbors descent API, please consider the following bibtex:

@inproceedings{10.1145/3459637.3482344,
    author = {Wang, Hui and Zhao, Wan-Lei and Zeng, Xiangxiang and Yang, Jianye},
    title = {Fast K-NN Graph Construction by GPU Based NN-Descent},
    year = {2021},
    isbn = {9781450384469},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3459637.3482344},
    doi = {10.1145/3459637.3482344},
    abstract = {NN-Descent is a classic k-NN graph construction approach. It is still widely employed in machine learning, computer vision, and information retrieval tasks due to its efficiency and genericness. However, the current design only works well on CPU. In this paper, NN-Descent has been redesigned to adapt to the GPU architecture. A new graph update strategy called selective update is proposed. It reduces the data exchange between GPU cores and GPU global memory significantly, which is the processing bottleneck under GPU computation architecture. This redesign leads to full exploitation of the parallelism of the GPU hardware. In the meantime, the genericness, as well as the simplicity of NN-Descent, are well-preserved. Moreover, a procedure that allows to k-NN graph to be merged efficiently on GPU is proposed. It makes the construction of high-quality k-NN graphs for out-of-GPU-memory datasets tractable. Our approach is 100-250\texttimes{} faster than the single-thread NN-Descent and is 2.5-5\texttimes{} faster than the existing GPU-based approaches as we tested on million as well as billion scale datasets.},
    booktitle = {Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
    pages = {1929–1938},
    numpages = {10},
    keywords = {high-dimensional, nn-descent, gpu, k-nearest neighbor graph},
    location = {Virtual Event, Queensland, Australia},
    series = {CIKM '21}
}

cuvs's People

Contributors

achirkin avatar ayodeawe avatar bdice avatar benfred avatar cjnolet avatar dantegd avatar divyegala avatar galipremsagar avatar hurutoriya avatar kylefromnvidia avatar raydouglass avatar robertmaynard avatar trxcllnt avatar viclafargue avatar yinze00 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

cuvs's Issues

[FEA] Investigate NvRTC / Jitify for compiling / instantiating expensive kernels at runtime.

Two of the largest complaints about RAFT's vector search layer is that libraft is 1) huge, and 2) takes forever to compile. There's been a lot of great progress to reduce the compile time and binary size but some of that has been simply from reducing the amount of supported types.

We should investigate JIT compiling some of these things. Especially for index builds, which usually take some time anyways, the user could be willing to suffer a one-time penalty to compile the necessary kernels so that we can shrink the compile time and footprint of the binary as much as possible.

The other benefit to this more subtle but extremely powerful, as the user would only need to JIT compile for their local architecture, all the while cuVS would still be portable across compute architectures. This means for these code paths, we can avoid the 6+ architectures needing to be pre-compiled, and so the binary and compile times shrink by roughly that factor.

I propose we explore JIT compiling entire code paths from end-to-end (like CAGRA index build vs search for a specific set of types). Of course, it's important that these instantiations be available for the tests, but as most users are not downloading the test packages, it gives us some leeway to transfer compile times and binary footprint to the tests while still keeping the main conda package very lightweight.

Another option we can explore is providing 2 conda packages-

  1. one that is very lightweight and uses primarily JIT
  2. Another that has a lot of the expensive bits pre-compiled but leaves some of the custom type specializations to JIT

This would further allow users to determine which is more important to them.

A few folks have brought this up in the past and I wanted to capture this in an issue to start collecting thoughts from the team and the community.

The GraphBLAS has used this pattern successfully, and has adopted the pattern to fully JIT all of the kernels.

Tagging some of the folks who I've discussed this with in the past:

@jeaton32 @tfeher @divyegala @benfred @lowener @achirkin @robertmaynard @jrhemstad @leofang @vyasr @bdice @dantegd @trxcllnt @

[DOC] README Update

Report incorrect documentation

Location of incorrect documentation

cuvs/README.md

Lines 41 to 52 in 26377ef

### Stable release
It is recommended to use [mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html) to install the desired packages. The following command will install the Python package. You can substitute `pycuvs` for any of the packages in the table above:
```bash
mamba install -c conda-forge -c nvidia -c rapidsai pycuvs
```
### Nightlies
If installing a version that has not yet been released, the `rapidsai` channel can be replaced with `rapidsai-nightly`:
```bash
mamba install -c conda-forge -c nvidia -c rapidsai-nightly pycuvs=24.06
```

Describe the problems or issues found in the documentation
The installation guide incorrectly has users install pycuvs

Suggested fix for documentation
Update to cuvs.

While making this update, consider adding sample pip installation commands.

[FEA] Python API

This should rely on pylibraft and the c-api as much as possible. Since cuvs is not going to have a header-only component, we will always be dependent upon libcuvs (and libcuvs++ at this layer.

[FEA] Pairwise distances to support int64_t internally (and downcast as necessary).

While migrating the pairwise distance API from RAFT, we realized that it's still accepting mdspans w/ indexing type of uint32_t. In order to keep our public facing APIs consistent in the meantime, we're accepting mdspan with int64_t index type on the outer API layer for now and then constructing new mdspans with the uint32_t indexing type for the call to the pairwise distances underneath.

This is a short-term fix and we should revisit this sometime in the near(ish) future to support int64_t efficiently in the pairwise distances layer itself.

[FEA] Rust API

Rust API has been highly requested by users. Ideally this would be a wrapper around the C-API

[FEA] Java API

This would unlock a lot of capability for our users. There are quite a bit of important libraries (such as Lucene, for example) that would benefit from a direct Java interface.

[FEA] Explore potential for incremental/online PCA to improve performance of layer0 (often brute-force) index

This could use online PCA or a set of centroids could be trained up front to make online pq possible. The idea here is to reduce the time between vectors being received in a vector database and being fully searchable.

In most cases, we should not have to wait for enough vectors to train a full ANN index and brute-force should be able to satisfy searching in the meantime. As brute-force is exhaustive, we have 2 basic options for being able to improve the performance:

  1. We prune distances using an IVF method. Random ball cover could also be a great drop-in replacement for bfknn.

  2. We reduce the number of dimensions. Incremental/online PCA could help reduce the concept drift over time while still providing a very fast mechanism for reducing the dimensionality significantly. Since these newly inserted vectors are often shoved in smaller segment files until merged together into a larger index, we could investigate something like random projections or rp-trees, which are great for smaller chunks of data (usually on the order of 1M-10M) and can maintain distance error within a provable epsilon (which is based only on the number of vectors).

The primary goal here is to reduce the time between a vector being sent to a vector db and it being fully searchable. We also need to do this at scale.

[EXAMPLE] Hybrid search with cuVS + cuDF

Hybried search is an important capability that arises in end-to-end vector databases. We have been asked now by several users whether or not metadata can be stored alongside the vectors and the usual answer is "it can, but you'd want to use soemthing like cuDF/Pandas for that". Databases like LanceDB do soemthing similar, for example, and similar to cuDF, they offer filtering / reductions / groupbys but not a first-class SQL dialect.

cuVS allows for pre-filtered search, which is often a requirement for hybrid search. Post-filtered search can also be preferred under certain circumstances, though this is often something users do themselves downstream.

cuDF offers a really powerful metadata management API and we should explore using it to implement hybrid search capabilities. At the very least, we can provide an example notebook and maybe compare cuVS+cuDF side-by-side with HNSWlib+Pandas for a direct GPU/CPU comparison. If this capability proves useful enough, we could even implement a first-class API within cuVS itself to perform hybrid search.

[FEA] API to merge indexes

As we expand cuVS to different storage and database libraries, we are consistently seeing a need for efficient merges of one index (and potential one index type) into another.

FAISS has an API for efficiently merging indexes. We should have an API for this also and we should eventually back the GPU portion of the FAISS index merge API with RAFT/cuVS.

[FEA] cuVS C-API

Epic-level tracking issue for cuVS C-API

  • handle
  • CAGRA
  • IVF-Flat
  • IVF-PQ
  • Brute-force
  • RBC

(Please add / update the above list as needed)

[FEA] Port CAGRA Runtime API from RAFT

Initially we should use the existing raft::runtime APIs so that existing RAFT users can easily migrate to cuVS. We will be revisiting the c++ APIs over the spring and summer to provide a more refined experience and eventually remove libraft altogether.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.