Summary
I've come across a problem when calling index.remove(id: VecId)
, but it only happens with certain distance types:
NormalizedAngle
NormalizedCosine
NormalizedL2
The common theme seems to be that these are normalized?
This is the error that is produced:
Error: Capi : ngt_remove_index() : Error: /.../ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id
Reproducer
I've made a basic project with test cases:
ngtbug.zip
These are the versions in use:
[dependencies]
anyhow = "1.0.75"
ngt = "0.6.1"
uuid = { version = "1.5.0", features = ["v4"] }
and in the lock file, ngt-sys
is at 2.1.3.
This is test case, to explain what is happening:
#[cfg(test)]
mod tests {
use anyhow::Result;
use ngt::{NgtIndex, NgtDistance, NgtProperties, EPSILON};
fn run_test(dist: NgtDistance) -> Result<()> {
// Create a new index
let prop = NgtProperties::<f32>::dimension(3)?
.distance_type(dist)?;
// Temp dir for tests
let temp_dir_p = std::env::temp_dir();
let temp_dir = temp_dir_p.to_string_lossy();
let u = uuid::Uuid::new_v4();
let u = u.as_simple();
let index_path = format!("{temp_dir}/{u}");
std::fs::remove_dir_all(&index_path).ok();
let mut index = NgtIndex::create(&index_path, prop)?;
// Insert two vectors and get their id
let vec1 = vec![1.0, 2.0, 3.0];
let vec2 = vec![4.0, 5.0, 6.0];
let id1 = index.insert(vec1)?;
let id2 = index.insert(vec2)?;
println!("id1: {}", id1);
println!("id2: {}", id2);
// Actually build the index (not yet persisted on disk)
// This is required in order to be able to search vectors
index.build(2)?;
index.persist()?;
// Perform a vector search (with 1 result)
let res = index.search(&[1.1, 2.1, 3.1], 1, EPSILON)?;
assert_eq!(res[0].id, id1);
// PROBLEM HERE
index.remove(id1)?;
// Cleanup - remove the index path
std::fs::remove_dir_all(&index_path)?;
Ok(())
}
#[test]
fn test01_L1() -> Result<()> {
run_test(NgtDistance::L1)
}
#[test]
fn test02_L2() -> Result<()> {
run_test(NgtDistance::L2)
}
#[test]
fn test03_Angle() -> Result<()> {
run_test(NgtDistance::Angle)
}
#[test]
fn test04_Hamming() -> Result<()> {
run_test(NgtDistance::Hamming)
}
#[test]
fn test05_Cosine() -> Result<()> {
run_test(NgtDistance::Cosine)
}
#[test]
fn test06_NormalizedAngle() -> Result<()> {
run_test(NgtDistance::NormalizedAngle)
}
#[test]
fn test07_NormalizedCosine() -> Result<()> {
run_test(NgtDistance::NormalizedCosine)
}
#[test]
fn test08_Jaccard() -> Result<()> {
run_test(NgtDistance::Jaccard)
}
#[test]
fn test09_SparseJaccard() -> Result<()> {
run_test(NgtDistance::SparseJaccard)
}
#[test]
fn test10_NormalizedL2() -> Result<()> {
run_test(NgtDistance::NormalizedL2)
}
#[test]
fn test11_Poincare() -> Result<()> {
run_test(NgtDistance::Poincare)
}
#[test]
fn test12_Lorentz() -> Result<()> {
run_test(NgtDistance::Lorentz)
}
}
This is the output from the test run:
$ cargo test -- --test-threads=1
Compiling ngtbug v0.1.0 (/home/caleb/tmp/ngtbug)
running 12 tests
test tests::test01_L1 ... ok
test tests::test02_L2 ... ok
test tests::test03_Angle ... ok
test tests::test04_Hamming ... ok
test tests::test05_Cosine ... ok
test tests::test06_NormalizedAngle ... FAILED
test tests::test07_NormalizedCosine ... FAILED
test tests::test08_Jaccard ... ok
test tests::test09_SparseJaccard ... FAILED
test tests::test10_NormalizedL2 ... FAILED
test tests::test11_Poincare ... ok
test tests::test12_Lorentz ... FAILED
failures:
---- tests::test06_NormalizedAngle stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id
---- tests::test07_NormalizedCosine stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id
---- tests::test09_SparseJaccard stdout ----
Error: Capi : ngt_insert_index_as_float() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/ObjectRepository.h:allocatePersistentObject:345: ObjectSpace::allocatePersistentObject: Fatal error! The dimensionality is invalid. The specified dimensionality=3. The specified object=2.
---- tests::test10_NormalizedL2 stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id
---- tests::test12_Lorentz stdout ----
id1: 1
id2: 2
thread 'tests::test12_Lorentz' panicked at src/main.rs:40:23:
index out of bounds: the len is 0 but the index is 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
tests::test06_NormalizedAngle
tests::test07_NormalizedCosine
tests::test09_SparseJaccard
tests::test10_NormalizedL2
tests::test12_Lorentz
test result: FAILED. 7 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
The failures in SparseJaccard
and Lorentz
are interesting but unrelated to my issue. I need to use NormalizedCosine
for my application.
Comments
It is possible this is an issue with the upstream NGT library. I am not sure but I decided to ask here first. It is also possible that I have missed some detail about these distance types are supposed to be used.