
ngt-rs's Introduction


ngt-rs's People

Contributors

cjrh, lerouxrgd, sufflope


ngt-rs's Issues

fatal error: 'NGT/Capi.h' file not found

Hi @lerouxrgd

I tried building ngt-rs today and it failed. This is the error I see:

  /home/caleb/tmp/ngt-rs/target/debug/build/ngt-sys-8ce361414492c0ac/out/include/NGT/NGTQ/Capi.h:107:10: fatal error: 'NGT/Capi.h' file not found

I checked, and the file NGT/Capi.h is present:

~/tmp/ngt-rs/target/debug/build/ngt-sys-8ce361414492c0ac/out/include/NGT
$ ls -lah
Permissions Size User  Date Modified Name
.rw-r--r--  6.2k caleb  3 Sep 21:44  SharedMemoryAllocator.h
.rw-r--r--   13k caleb  3 Sep 21:44  Tree.h
.rw-r--r--  1.4k caleb  3 Sep 21:44  Version.h
.rw-r--r--   14k caleb  3 Sep 21:44  ObjectRepository.h
.rw-r--r--   18k caleb  3 Sep 21:44  Node.h
.rw-r--r--  7.7k caleb  3 Sep 21:44  Thread.h
.rw-r--r--  5.8k caleb  3 Sep 21:44  ArrayFile.h
.rw-r--r--   20k caleb  3 Sep 21:44  MmapManagerImpl.hpp
.rw-r--r--  2.6k caleb  3 Sep 21:44  MmapManager.h
.rw-r--r--   17k caleb  3 Sep 21:44  ObjectSpace.h
.rw-r--r--   54k caleb  3 Sep 21:44  Optimizer.h
.rw-r--r--   860 caleb  3 Sep 21:44  MmapManagerException.h
.rw-r--r--  2.4k caleb  3 Sep 21:44  MmapManagerDefs.h
.rw-r--r--   29k caleb  3 Sep 21:44  Clustering.h
.rw-r--r--  7.9k caleb  3 Sep 21:44  Capi.h
.rw-r--r--   56k caleb  3 Sep 21:44  Common.h
.rw-r--r--   30k caleb  3 Sep 21:44  GraphReconstructor.h
.rw-r--r--  149k caleb  3 Sep 21:44  half.hpp
.rw-r--r--   22k caleb  3 Sep 21:44  GraphOptimizer.h
.rw-r--r--  4.1k caleb  3 Sep 21:44  Command.h
.rw-r--r--   40k caleb  3 Sep 21:44  Graph.h
.rw-r--r--   254 caleb  3 Sep 21:44  version_defs.h
.rw-r--r--   36k caleb  3 Sep 21:44  PrimitiveComparator.h
.rw-r--r--   31k caleb  3 Sep 21:44  ObjectSpaceRepository.h
.rw-r--r--   59k caleb  3 Sep 21:44  Index.h
.rw-r--r--  2.6k caleb  3 Sep 21:44  HashBasedBooleanSet.h
.rw-r--r--  1.7k caleb  3 Sep 21:44  defines.h
drwxrwxr-x     - caleb  3 Sep 21:45  NGTQ

It seems to me to be some kind of path problem. Do you have any suggestions for what I can try to fix this, or is it something that must be changed in ngt-rs?

For completeness, I've attached the full build output.

err.log
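The failing line 107 of NGTQ/Capi.h includes NGT/Capi.h, and that path only resolves if the directory that *contains* NGT/ (here out/include) is on the include search path used for that compile. Assuming the bindings are generated with bindgen against the headers in OUT_DIR/include (the clang-style message hints at libclang, but that is a guess), one fix would be an extra `-I` argument, passed for example through bindgen's `Builder::clang_arg`. A std-only sketch of computing that argument; the helper name `ngt_include_args` is hypothetical:

```rust
use std::path::Path;

/// Hypothetical helper: clang arguments that put `<out_dir>/include` on the
/// include path, so a header under include/NGT/NGTQ/ can find "NGT/Capi.h".
fn ngt_include_args(out_dir: &Path) -> Vec<String> {
    vec![format!("-I{}", out_dir.join("include").display())]
}

fn main() {
    // In a real build script Cargo sets OUT_DIR; fall back for demonstration.
    let out_dir = std::env::var("OUT_DIR").unwrap_or_else(|_| "out".to_string());
    for arg in ngt_include_args(Path::new(&out_dir)) {
        // With bindgen, each of these would be passed via `.clang_arg(arg)`.
        println!("{arg}");
    }
}
```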

Can cargo run, can't cargo build

Hi @lerouxrgd,

When using cargo run and cargo run --release, I can use the ngt-rs crate without any issues. Search works and it's really fast. However, when I use cargo build --release and run the binary, I get the following error:

error while loading shared libraries: libngt.so.1: cannot open shared object file: No such file or directory

I'm not sure if this is an issue with ngt-rs, an issue with the underlying NGT, or an issue with my setup. Do you have any thoughts on this?

Ubuntu 18.04, ngt = "0.4.0"
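A likely explanation for the difference: when Cargo launches a program with `cargo run` or `cargo test`, it adds native library search paths that live inside the target directory (such as the `cargo:rustc-link-search` path printed by ngt-sys) to the dynamic library path, which is LD_LIBRARY_PATH on Linux. A binary started directly gets no such entry, so the loader cannot find libngt.so.1. Besides exporting LD_LIBRARY_PATH or installing the library into a standard location, the binary can carry an rpath. A minimal sketch for the *application's own* build.rs, assuming a hypothetical `NGT_LIB_DIR` variable pointing at a directory that holds libngt.so.1; as far as I know `cargo:rustc-link-arg` only applies to the binaries of the package whose build script emits it, so this would not help if placed in ngt-sys:

```rust
// build.rs of the application crate (hypothetical sketch).
use std::env;
use std::path::Path;

/// Cargo directive that embeds `lib_dir` as an rpath in the final binary,
/// so the dynamic loader also searches it at run time.
fn rpath_directive(lib_dir: &Path) -> String {
    format!("cargo:rustc-link-arg=-Wl,-rpath,{}", lib_dir.display())
}

fn main() {
    // NGT_LIB_DIR is an assumption for this sketch; ngt-rs does not define it.
    println!("cargo:rerun-if-env-changed=NGT_LIB_DIR");
    if let Ok(dir) = env::var("NGT_LIB_DIR") {
        println!("{}", rpath_directive(Path::new(&dir)));
    }
}
```

A quick way to confirm the diagnosis before changing anything is `LD_LIBRARY_PATH=/path/to/dir ./your-binary`.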

Radius parameter for search is not exposed

Hi @lerouxrgd

I was interested in using the radius parameter for search, but I see that it is only exposed in QGQuery and not for regular search.

Was there a specific reason for that, or is it just waiting for someone to make a PR to expose that?

Statically link ngt?

The idea

Looking in the out/lib directory after a build, I see libngt.a:

$ fd --hidden --no-ignore --glob libngt*
...
target/debug/build/ngt-sys-a9e71d1b7f68537e/out/lib/libngt.a
target/debug/build/ngt-sys-a9e71d1b7f68537e/out/lib/libngt.so
target/debug/build/ngt-sys-a9e71d1b7f68537e/out/lib/libngt.so.1
target/debug/build/ngt-sys-a9e71d1b7f68537e/out/lib/libngt.so.1.14.7

So I was wondering whether it would be possible to link ngt statically. This would remove the need to put libngt.so in a place where executables can find it.

Naively, I changed a line in build.rs from this:

    println!("cargo:rustc-link-lib=dylib=ngt");

to this:

    println!("cargo:rustc-link-lib=static=ngt");

This fails

After making the change to build.rs, this error occurs when running cargo build:

   <snip>
   Compiling proc-macro-crate v1.2.1
   Compiling num_enum_derive v0.5.7
   Compiling ngt-sys v1.14.8 (/home/caleb/tmp/ngt-rs/ngt-sys)
   Compiling num_enum v0.5.7
   Compiling ngt v0.4.4 (/home/caleb/tmp/ngt-rs)
error[E0425]: cannot find function `ngt_get_number_of_objects` in crate `sys`
   --> src/index.rs:326:23
    |
326 |         unsafe { sys::ngt_get_number_of_objects(self.index, self.ebuf) }
    |                       ^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `sys`

error[E0425]: cannot find function `ngt_get_number_of_indexed_objects` in crate `sys`
   --> src/index.rs:331:23
    |
331 |         unsafe { sys::ngt_get_number_of_indexed_objects(self.index, self.ebuf) }
    |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in `sys`

For more information about this error, try `rustc --explain E0425`.
error: could not compile `ngt` due to 2 previous errors
warning: build failed, waiting for other jobs to finish...

Further investigation

Checking the file sizes of the ngt build artifacts:

$ ls -lah target/debug/build/ngt-sys-a9e71d1b7f68537e/out/lib/
Permissions Size User  Date Modified Name
.rw-r--r--   56M caleb  4 Sep 19:17  libngt.a
.rw-r--r--   22M caleb  4 Sep 19:18  libngt.so.1.14.7
lrwxrwxrwx    16 caleb  4 Sep 19:18  libngt.so.1 -> libngt.so.1.14.7
lrwxrwxrwx    11 caleb  4 Sep 19:18  libngt.so -> libngt.so.1

libngt.a is about 56 MB. Checking the rlibs produced by Rust:

$ ll target/debug/deps/ | rg ngt
.rw-rw-r--   720 caleb  4 Sep 19:18  ngt_sys-88de9e97cd094267.d
.rw-rw-r--  206k caleb  4 Sep 19:18  libngt_sys-88de9e97cd094267.rmeta
.rw-rw-r--   485 caleb  4 Sep 19:18  ngt-1421cbd3da3a38b1.d
.rw-rw-r--   57M caleb  4 Sep 19:18  libngt_sys-88de9e97cd094267.rlib

libngt_sys-88de9e97cd094267.rlib is about 57 MB, which may mean that libngt.a has been bundled into it.

That's as far as I can go for now, but hopefully we can figure out a way to link ngt statically?
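A sketch of how a build script could make the link kind switchable, assuming a hypothetical `static` Cargo feature (Cargo exposes enabled features to build scripts as `CARGO_FEATURE_<NAME>` environment variables). Unlike a shared library, a static archive does not record its own dependencies, so the C++ standard library and, since NGT as far as I know uses OpenMP, an OpenMP runtime generally have to be named explicitly. The names `stdc++` and `gomp` assume a GNU toolchain on Linux (on macOS they would typically be `c++` and `omp`). Note that this does not by itself explain the E0425 errors above, which are Rust name-resolution errors rather than linker errors.

```rust
use std::path::Path;

/// Hypothetical helper: cargo directives to link NGT statically or dynamically.
/// Runtime library names are assumptions for a GNU/Linux toolchain.
fn link_directives(lib_dir: &Path, static_link: bool) -> Vec<String> {
    let mut d = vec![format!("cargo:rustc-link-search=native={}", lib_dir.display())];
    if static_link {
        d.push("cargo:rustc-link-lib=static=ngt".to_string());
        // A static C++ archive does not pull in its runtime dependencies.
        d.push("cargo:rustc-link-lib=dylib=stdc++".to_string());
        d.push("cargo:rustc-link-lib=dylib=gomp".to_string());
    } else {
        d.push("cargo:rustc-link-lib=dylib=ngt".to_string());
    }
    d
}

fn main() {
    // Cargo sets OUT_DIR for build scripts; fall back for demonstration.
    let out_dir = std::env::var("OUT_DIR").unwrap_or_else(|_| "out".to_string());
    let lib_dir = Path::new(&out_dir).join("lib");
    // Set by Cargo only if a feature named `static` (hypothetical) is enabled.
    let static_link = std::env::var("CARGO_FEATURE_STATIC").is_ok();
    for line in link_directives(&lib_dir, static_link) {
        println!("{line}");
    }
}
```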

file descriptor leak on `index.build`

Calling index.build repeatedly seems to cause a file descriptor leak.

When insert and build are called many times (~120), some file is never closed, and NGT eventually crashes. Below is the output of a small test that inserts and builds.

inserted vector with id 107
inserted vector with id 108
inserted vector with id 109
inserted vector with id 110
inserted vector with id 111
inserted vector with id 112
inserted vector with id 113
Error: Error("Capi : ngt_save_index() : Error: /Users/drbh/Projects/ngt-rs/ngt-sys/NGT/lib/NGT/ObjectRepository.h:47: NGT::ObjectSpace: Cannot open the specified file /var/folders/0n/s24tgvhd60xghdtn1wz4z5wm0000gn/T/.tmp87SDPW/obj.")

---- index::tests::test_multithreaded stdout ----
Error: Custom { kind: Uncategorized, error: PathError { path: "/var/folders/0n/s24tgvhd60xghdtn1wz4z5wm0000gn/T/.tmpdwhCs3", err: Os { code: 24, kind: Uncategorized, message: "Too many open files" } } }


failures:
    index::tests::test_incremental_insert_and_build
    index::tests::test_multithreaded

A test that reproduces this leak is available here: #12

TL;DR of the test:

for _ in 0..120 {
    let vec = vec![1.0, 2.0, 3.0];
    let id = index.insert(vec.clone())?;
    println!("inserted vector with id {}", id);

    // Build and persist the index
    index.build(1)?; // <------------------- LEAKS HERE
    index.persist()?;
}

I believe this is a file descriptor issue because

  1. this causes the multithread test to fail with "Too many open files"
  2. running lsof on a binary that calls insert and build shows a growing number of /dev/null and /dev/ttys006 files

I'm not sure where this issue originates. Other NGT bindings do not appear to explicitly close files after calling build (create_index), but I may be missing something simple that would close these files and avoid the crash.

Please let me know if I can provide any more information!
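The lsof observation can be turned into an assertion by counting the process's open descriptors before and after the loop. A minimal sketch, assuming a Unix-like system where /dev/fd lists the calling process's descriptors (on Linux it links to /proc/self/fd; macOS also provides it); the helper names are hypothetical, and the closures stand in for the insert/build/persist sequence from the test above, whose error type would need adapting.

```rust
use std::fs;
use std::io;

/// Number of open file descriptors, read from /dev/fd. The handle used to
/// list the directory is itself counted, but on every call, so the
/// difference between two calls is unaffected.
fn open_fd_count() -> io::Result<usize> {
    Ok(fs::read_dir("/dev/fd")?.count())
}

/// Run `step` `iterations` times and return how many extra descriptors
/// remain open afterwards (a positive value indicates a leak).
fn fd_growth<F>(iterations: usize, mut step: F) -> io::Result<i64>
where
    F: FnMut() -> io::Result<()>,
{
    let before = open_fd_count()? as i64;
    for _ in 0..iterations {
        step()?;
    }
    let after = open_fd_count()? as i64;
    Ok(after - before)
}

fn main() -> io::Result<()> {
    // Well-behaved step: the file is closed when `_f` is dropped.
    let clean = fd_growth(10, || {
        let _f = fs::File::open("/dev/null")?;
        Ok(())
    })?;
    // Leaky step: mem::forget skips Drop, so the descriptor stays open.
    let leaky = fd_growth(10, || {
        std::mem::forget(fs::File::open("/dev/null")?);
        Ok(())
    })?;
    println!("clean growth: {clean}, leaky growth: {leaky}");
    Ok(())
}
```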

macOS cannot compile

Hello Team,

I get the following error on macOS:

error: failed to run custom build command for ngt-sys v1.14.8-static (/Users/jianshuzhao/Github/ngt-rs/ngt-sys)

Caused by:
process didn't exit successfully: /Users/jianshuzhao/Github/ngt-rs/target/release/build/ngt-sys-01413d2ad24e15fd/build-script-build (signal: 6, SIGABRT: process abort signal)
--- stdout
CMAKE_TOOLCHAIN_FILE_aarch64-apple-darwin = None
CMAKE_TOOLCHAIN_FILE_aarch64_apple_darwin = None
HOST_CMAKE_TOOLCHAIN_FILE = None
CMAKE_TOOLCHAIN_FILE = None
CMAKE_GENERATOR_aarch64-apple-darwin = None
CMAKE_GENERATOR_aarch64_apple_darwin = None
HOST_CMAKE_GENERATOR = None
CMAKE_GENERATOR = None
CMAKE_PREFIX_PATH_aarch64-apple-darwin = None
CMAKE_PREFIX_PATH_aarch64_apple_darwin = None
HOST_CMAKE_PREFIX_PATH = None
CMAKE_PREFIX_PATH = None
CMAKE_aarch64-apple-darwin = None
CMAKE_aarch64_apple_darwin = None
HOST_CMAKE = None
CMAKE = None
running: cd "/Users/jianshuzhao/Github/ngt-rs/target/release/build/ngt-sys-cbc11c3c769b2631/out/build" && CMAKE_PREFIX_PATH="" "cmake" "/Users/jianshuzhao/Github/ngt-rs/ngt-sys/NGT" "-DCMAKE_INSTALL_PREFIX=/Users/jianshuzhao/Github/ngt-rs/target/release/build/ngt-sys-cbc11c3c769b2631/out" "-DCMAKE_C_FLAGS= -ffunction-sections -fdata-sections -fPIC -arch arm64" "-DCMAKE_C_COMPILER=/usr/local/bin/cc" "-DCMAKE_CXX_FLAGS= -ffunction-sections -fdata-sections -fPIC -arch arm64" "-DCMAKE_CXX_COMPILER=/usr/local/bin/c++" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -arch arm64" "-DCMAKE_ASM_COMPILER=/usr/local/bin/cc" "-DCMAKE_BUILD_TYPE=Release"
-- VERSION: 1.14.7
-- CMAKE_BUILD_TYPE: Release
-- CMAKE_BUILD_TYPE_LOWER: release
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/jianshuzhao/Github/ngt-rs/ngt-sys/NGT
running: cd "/Users/jianshuzhao/Github/ngt-rs/target/release/build/ngt-sys-cbc11c3c769b2631/out/build" && "cmake" "--build" "." "--target" "install" "--config" "Release" "--parallel" "10"

--- stderr
CMake Warning (dev):
Policy CMP0068 is not set: RPATH settings on macOS do not affect
install_name. Run "cmake --help-policy CMP0068" for policy details. Use
the cmake_policy command to set the policy and suppress this warning.

For compatibility with older versions of CMake, the install_name fields for
the following targets are still affected by RPATH settings:

 ngt

This warning is for project developers. Use -Wno-dev to suppress it.

Error: could not load cache
thread 'main' panicked at '
command did not execute successfully, got: exit status: 1

build script failed, must exit now', /Users/jianshuzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/cmake-0.1.48/src/lib.rs:975:5
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5

Thanks,

Jianshu
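One plausible reading of this log: the configure step reports "Build files have been written to: .../ngt-sys/NGT", which is the source directory rather than out/build, and the following `cmake --build .` in out/build then fails with "could not load cache". CMake treats a directory that contains CMakeCache.txt as an existing build tree, so a leftover CMakeCache.txt in ngt-sys/NGT (for instance from an earlier manual in-source build) would produce this pattern. The sketch below only checks for such a file; it is a diagnostic under that assumption, not a confirmed fix, and the helper name is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Returns the path of a CMakeCache.txt found in the CMake *source*
/// directory, if any. Such a file makes CMake treat the source directory
/// as the build tree, redirecting configuration away from out/build.
fn stale_cmake_cache(src_dir: &Path) -> Option<PathBuf> {
    let cache = src_dir.join("CMakeCache.txt");
    cache.is_file().then_some(cache)
}

fn main() {
    // Assumed layout: NGT sources vendored at ngt-sys/NGT, as in the log above.
    let src = Path::new("ngt-sys/NGT");
    match stale_cmake_cache(src) {
        Some(cache) => println!("stale cache found; consider deleting {}", cache.display()),
        None => println!("no in-source CMakeCache.txt in {}", src.display()),
    }
}
```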

Question: how to think about search `radius` when using `NormalizedCosine` distance type

Hi @lerouxrgd!

I hope this is an easy one to answer.

I have simple code that does the following:

  • create an index with distance type NormalizedCosine, dimension 3
  • add two vectors
  • perform a search with the radius setting.

I am trying to understand how the radius numerically affects the search results. I would also like to know whether normalization is fully handled for me, or whether I need to normalize the search vector myself, for example.

Basic code, for discussion, looks something like this:

        // Create a new index
        let prop = NgtProperties::<f32>::dimension(3)?
            .creation_edge_size(10)?
            .search_edge_size(40)?
            .distance_type(NgtDistance::NormalizedCosine)?;

        let temp_dir_p = std::env::temp_dir();
        let temp_dir = temp_dir_p.to_string_lossy();
        let index_path = format!("{temp_dir}/ngttest");
        std::fs::remove_dir_all(&index_path).unwrap_or_else(|e| {
            println!("Got error removing dir: {}", e);
        });

        let _index = NgtIndex::create(&index_path, prop)?;

        // Open an existing index
        let mut index = NgtIndex::open(&index_path)?;

        // Insert two vectors and get their id
        let vec1 = vec![1.0, 2.0, 3.0];
        let vec2 = vec![4.0, 5.0, 6.0];
        let id1 = index.insert(vec1)?;
        let id2 = index.insert(vec2)?;

        // Actually build the index (not yet persisted on disk)
        // This is required in order to be able to search vectors
        index.build(2)?;

        // Perform a search with a specific radius
        use ngt::NgtQuery;
        let query = NgtQuery::new(&[1.1, 2.1, 3.1])
            .size(10)
            .radius(0.004);                                          // <--------------- How to set this?
        let res = index.search_query(query)?;
        println!("radius res {:?}", &res);
        assert_eq!(res.len(), 1);

These are my questions:

  • The vectors I am adding to the index are not normalized. Is this correct? I saw a comment on the NGT repo that said that normalization is automatic, but the comment was referring to a python code example and I'm not sure whether the python wrapper, or ngt-rs, is performing an additional normalization or not.
  • My search vector in the code above is not normalized. Is that correct? I did a few tests to replace that vector [1.1, 2.1, 3.1] with its normalized version [0.5280169 , 0.57601843, 0.62401997] and I got very different behaviour in the search results using the same radius values, which leads me to think I don't understand how the normalization works.
  • I know that the behaviour of radius is defined by the upstream NGT library so it isn't really a question for you, but since I'm here anyway: Is there a simple way I can reason about the quantitative value of the radius parameter for the NormalizedCosine distance type?
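To reason about the radius, it can help to compute the distances directly. As far as I know, NGT's Cosine distance is 1 − cos θ, which ranges from 0 (same direction) to 2 (opposite directions). Assuming NormalizedCosine reports the same quantity for unit-length vectors, and that the radius acts as an upper bound on the returned distance, a radius r keeps results whose angle to the query satisfies cos θ ≥ 1 − r; for unit vectors this is also tied to Euclidean distance by ‖a − b‖² = 2(1 − cos θ). Because cos θ does not change when a vector is scaled, normalizing on your side should make no difference if the library normalizes both the stored objects and the query, so a visible difference would suggest one side is not normalized as you expect. A sketch computing the distances for the vectors in the example:

```rust
/// Cosine distance 1 - cos(theta) between two vectors of equal length.
fn cosine_distance(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

/// Scale `v` to unit length.
fn normalize(v: &[f64]) -> Vec<f64> {
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

fn main() {
    let query = [1.1, 2.1, 3.1];
    let vec1 = [1.0, 2.0, 3.0];
    let vec2 = [4.0, 5.0, 6.0];

    let d1 = cosine_distance(&query, &vec1);
    let d2 = cosine_distance(&query, &vec2);
    // Normalizing the query first leaves the cosine distance unchanged.
    let d1_unit = cosine_distance(&normalize(&query), &vec1);

    println!("d(query, vec1) = {d1:.6}");
    println!("d(query, vec2) = {d2:.6}");
    println!("d(unit query, vec1) = {d1_unit:.6}");

    // Which vectors fall inside the radius used in the question?
    let radius = 0.004;
    println!("inside radius {radius}: vec1={}, vec2={}", d1 <= radius, d2 <= radius);
}
```

With these vectors the query is about 0.00014 from vec1 and about 0.022 from vec2, so a radius of 0.004 would admit only vec1, which is consistent with the `assert_eq!(res.len(), 1)` in the snippet if the library computes the same quantity.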

How to reconstruct ANNG?

I guess ngt::optim::refine_anng is the right function, but I don't know how to use it.
I tried passing ngt::optim::AnngRefineParams::default() as the second argument, but it crashed.

Error in ngt_remove_index() only with normalized distance types

Summary

I've come across a problem when calling index.remove(id: VecId), but it only happens with certain distance types:

  • NormalizedAngle
  • NormalizedCosine
  • NormalizedL2

The common theme seems to be that these are normalized?

This is the error that is produced:

Error: Capi : ngt_remove_index() : Error: /.../ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

Reproducer

I've made a basic project with test cases:

ngtbug.zip

These are the versions in use:

[dependencies]
anyhow = "1.0.75"
ngt = "0.6.1"
uuid = { version = "1.5.0", features = ["v4"] }

and in the lock file, ngt-sys is at 2.1.3.

This is test case, to explain what is happening:

#[cfg(test)]
mod tests {
    use anyhow::Result;
    use ngt::{NgtIndex, NgtDistance, NgtProperties, EPSILON};

    fn run_test(dist: NgtDistance) -> Result<()> {
        // Create a new index
        let prop = NgtProperties::<f32>::dimension(3)?
            .distance_type(dist)?;

        // Temp dir for tests
        let temp_dir_p = std::env::temp_dir();
        let temp_dir = temp_dir_p.to_string_lossy();
        let u = uuid::Uuid::new_v4();
        let u = u.as_simple();
        let index_path = format!("{temp_dir}/{u}");
        std::fs::remove_dir_all(&index_path).ok();

        let mut index = NgtIndex::create(&index_path, prop)?;

        // Insert two vectors and get their id
        let vec1 = vec![1.0, 2.0, 3.0];
        let vec2 = vec![4.0, 5.0, 6.0];
        let id1 = index.insert(vec1)?;
        let id2 = index.insert(vec2)?;
        println!("id1: {}", id1);
        println!("id2: {}", id2);

        // Actually build the index (not yet persisted on disk)
        // This is required in order to be able to search vectors
        index.build(2)?;
        index.persist()?;

        // Perform a vector search (with 1 result)
        let res = index.search(&[1.1, 2.1, 3.1], 1, EPSILON)?;
        assert_eq!(res[0].id, id1);

        // PROBLEM HERE
        index.remove(id1)?;

        // Cleanup - remove the index path
        std::fs::remove_dir_all(&index_path)?;

        Ok(())
    }

    #[test]
    fn test01_L1() -> Result<()> {
        run_test(NgtDistance::L1)
    }

    #[test]
    fn test02_L2() -> Result<()> {
        run_test(NgtDistance::L2)
    }

    #[test]
    fn test03_Angle() -> Result<()> {
        run_test(NgtDistance::Angle)
    }

    #[test]
    fn test04_Hamming() -> Result<()> {
        run_test(NgtDistance::Hamming)
    }

    #[test]
    fn test05_Cosine() -> Result<()> {
        run_test(NgtDistance::Cosine)
    }

    #[test]
    fn test06_NormalizedAngle() -> Result<()> {
        run_test(NgtDistance::NormalizedAngle)
    }

    #[test]
    fn test07_NormalizedCosine() -> Result<()> {
        run_test(NgtDistance::NormalizedCosine)
    }

    #[test]
    fn test08_Jaccard() -> Result<()> {
        run_test(NgtDistance::Jaccard)
    }

    #[test]
    fn test09_SparseJaccard() -> Result<()> {
        run_test(NgtDistance::SparseJaccard)
    }

    #[test]
    fn test10_NormalizedL2() -> Result<()> {
        run_test(NgtDistance::NormalizedL2)
    }

    #[test]
    fn test11_Poincare() -> Result<()> {
        run_test(NgtDistance::Poincare)
    }

    #[test]
    fn test12_Lorentz() -> Result<()> {
        run_test(NgtDistance::Lorentz)
    }
}

This is the output from the test run:

$ cargo test -- --test-threads=1
   Compiling ngtbug v0.1.0 (/home/caleb/tmp/ngtbug)

running 12 tests
test tests::test01_L1 ... ok
test tests::test02_L2 ... ok
test tests::test03_Angle ... ok
test tests::test04_Hamming ... ok
test tests::test05_Cosine ... ok
test tests::test06_NormalizedAngle ... FAILED
test tests::test07_NormalizedCosine ... FAILED
test tests::test08_Jaccard ... ok
test tests::test09_SparseJaccard ... FAILED
test tests::test10_NormalizedL2 ... FAILED
test tests::test11_Poincare ... ok
test tests::test12_Lorentz ... FAILED

failures:

---- tests::test06_NormalizedAngle stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

---- tests::test07_NormalizedCosine stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

---- tests::test09_SparseJaccard stdout ----
Error: Capi : ngt_insert_index_as_float() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/ObjectRepository.h:allocatePersistentObject:345: ObjectSpace::allocatePersistentObject: Fatal error! The dimensionality is invalid. The specified dimensionality=3. The specified object=2.

---- tests::test10_NormalizedL2 stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

---- tests::test12_Lorentz stdout ----
id1: 1
id2: 2
thread 'tests::test12_Lorentz' panicked at src/main.rs:40:23:
index out of bounds: the len is 0 but the index is 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::test06_NormalizedAngle
    tests::test07_NormalizedCosine
    tests::test09_SparseJaccard
    tests::test10_NormalizedL2
    tests::test12_Lorentz

test result: FAILED. 7 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

The failures in SparseJaccard and Lorentz are interesting but unrelated to my issue. I need to use NormalizedCosine for my application.

Comments

It is possible this is an issue with the upstream NGT library. I am not sure, but I decided to ask here first. It is also possible that I have missed some detail about how these distance types are supposed to be used.
