Comments (6)
You can store the nodeId <-> docId mapping in an external system (a database, the filesystem, an in-memory hash table, etc.).
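As a minimal sketch of the in-memory option, the mapping can be kept in two hash maps so that lookups work in both directions. The class name `NodeDocIdMapping` and its methods are hypothetical and not part of JVector; this only illustrates the idea under the assumption that nodeIds are `int` and docIds are `long`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a JVector class): nodeId <-> docId kept in memory. */
final class NodeDocIdMapping {
    private final Map<Integer, Long> nodeToDoc = new HashMap<>();
    private final Map<Long, Integer> docToNode = new HashMap<>();

    /** Records that the vector stored under nodeId belongs to document docId. */
    void put(int nodeId, long docId) {
        // Drop any stale reverse entry if this nodeId was mapped before.
        Long previousDoc = nodeToDoc.put(nodeId, docId);
        if (previousDoc != null) {
            docToNode.remove(previousDoc);
        }
        // Drop any stale forward entry if this docId was mapped before.
        Integer previousNode = docToNode.put(docId, nodeId);
        if (previousNode != null && previousNode != nodeId) {
            nodeToDoc.remove(previousNode);
        }
    }

    /** Returns the docId for a nodeId, or null if the nodeId is unknown. */
    Long docIdOf(int nodeId) {
        return nodeToDoc.get(nodeId);
    }

    /** Returns the nodeId for a docId, or null if the docId is unknown. */
    Integer nodeIdOf(long docId) {
        return docToNode.get(docId);
    }

    int size() {
        return nodeToDoc.size();
    }
}
```

After a search, each returned node id would be passed to `docIdOf` to recover the document it came from. A database or file-backed store could expose the same two lookups if the mapping has to survive restarts.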
from jvector.
There isn't a separate place to store it in the index, so this seems to be the only way for now. My understanding is that the node order is only determined once the entire index has been built. How exactly should that be handled? I looked through the test cases and didn't see any corresponding code.
from jvector.
nodeId is the ordinal of the vector value provided by RandomAccessVectorValues. It's already known upfront. You already provide an instance of RandomAccessVectorValues to JVector. You should just keep a mapping from every ordinal to its unique id.
But note that if you delete some values, the ordinals can change and you need to update the mapping too.
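A minimal sketch of updating the mapping after deletions follows. This thread does not say how or when ordinals are renumbered, so the sketch assumes the caller already has an old-ordinal to new-ordinal map (the `oldToNew` parameter is hypothetical), and treats any ordinal missing from it as deleted. `OrdinalRemapper` is an illustrative name, not a JVector class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a JVector class) for rebuilding an ordinal -> docId map. */
final class OrdinalRemapper {
    private OrdinalRemapper() {}

    /**
     * Builds a new ordinal -> docId map from the old one.
     *
     * @param ordinalToDocId the mapping that was valid before the deletions
     * @param oldToNew       assumed input: old ordinal -> new ordinal for every
     *                       surviving vector; ordinals absent here count as deleted
     */
    static Map<Integer, Long> remap(Map<Integer, Long> ordinalToDocId,
                                    Map<Integer, Integer> oldToNew) {
        Map<Integer, Long> updated = new HashMap<>();
        for (Map.Entry<Integer, Long> entry : ordinalToDocId.entrySet()) {
            Integer newOrdinal = oldToNew.get(entry.getKey());
            if (newOrdinal != null) {
                // The vector survived: carry its docId over to the new ordinal.
                updated.put(newOrdinal, entry.getValue());
            }
        }
        return updated;
    }
}
```

For example, if ordinal 1 is deleted from four vectors and the survivors are compacted, `oldToNew` would be `{0=0, 2=1, 3=2}`, and the docIds previously stored under ordinals 2 and 3 move to ordinals 1 and 2.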
from jvector.
Following your suggestion, I wrote a test case with the code below. Could you please check whether it is correct?
// Note: vectorTypeSupport is assumed to be a field of the enclosing test class.
@Test
public void testNodeDocIdMapping() {
    var sourceData = List.of(new TestVectorItem(new float[]{-1, -1}, 11L),
                             new TestVectorItem(new float[]{1.5f, 1.4f}, 13L),
                             new TestVectorItem(new float[]{0.9f, 0.9f}, 14L),
                             new TestVectorItem(new float[]{1, 1}, 12L));

    // The list index i is the ordinal (nodeId) of each vector, so record i -> docId.
    Map<Integer, Long> nodeMap = new HashMap<>();
    List<VectorFloat<?>> rawVectors = new ArrayList<>();
    for (int i = 0; i < sourceData.size(); i++) {
        rawVectors.add(vectorTypeSupport.createFloatVector(sourceData.get(i).getVectors()));
        nodeMap.put(i, sourceData.get(i).getDocId());
    }

    var vectors = new ListRandomAccessVectorValues(rawVectors, 2);
    var builder = new GraphIndexBuilder(vectors, VectorSimilarityFunction.EUCLIDEAN, 2, 2, 1.0f, 1.0f);
    try (var graph = builder.build()) {
        var qv = vectorTypeSupport.createFloatVector(new float[]{0.5f, 0.5f});
        var results = GraphSearcher.search(qv, 4, vectors, VectorSimilarityFunction.EUCLIDEAN, graph, Bits.ALL);
        SearchResult.NodeScore[] nodes = results.getNodes();
        for (SearchResult.NodeScore nodeScore : nodes) {
            // Translate each returned node id back to its docId through the map.
            int node = nodeScore.node;
            float[] ff = (float[]) rawVectors.get(node).get();
            float score = nodeScore.score;
            System.out.println("float[0]=" + ff[0] + ", float[1]=" + ff[1] + ", node=" + node + ", score=" + score + ", docId=" + nodeMap.get(node));
        }
    }
}

// Pairs a raw vector with the id of the document it belongs to.
public static class TestVectorItem {
    public float[] vectors;
    public long docId;

    public TestVectorItem(float[] vectors, long docId) {
        this.vectors = vectors;
        this.docId = docId;
    }

    public float[] getVectors() {
        return vectors;
    }

    public long getDocId() {
        return docId;
    }
}
from jvector.
That looks reasonable to me.
from jvector.
Thanks.
from jvector.