Comments (3)
At least as far back as 0401011 when we first merged Bench into JVector
from jvector.
It looks like the problem is with enforceMaxConnLimit. I took a shortcut and applied only the max-alpha diversity check, which means that often every edge in the list passes and we end up just removing the farthest-away edge.
If we instead increase the diversity incrementally, as we do in insertDiverse, we do a better job of prioritizing the removal of nodes that are okay wrt max-alpha but not with alpha=1.0.
hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
Average degree is 31.9973529909132
Index M=16 ef=100: top 100/1 recall 0.7327, build 38.62s, query 11.71s. 219005090 nodes visited
hdf5/glove-100-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 100
Average degree is 31.99960879212244
Index M=16 ef=100: top 100/1 recall 0.7116, build 128.20s, query 8.44s. 262719410 nodes visited
hdf5/glove-200-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 200
Average degree is 31.989657072075193
Index M=16 ef=100: top 100/1 recall 0.6451, build 208.60s, query 13.42s. 281149900 nodes visited
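The difference between the single max-alpha check and the incremental approach described above can be illustrated with a small sketch. This is not jvector's code: the function name prune_neighbors, its parameters, the use of Euclidean distance instead of similarity scores, and the alpha step of 0.2 are all assumptions made for illustration. A candidate c counts as diverse when alpha * dist(s, c) > dist(base, c) for every already-kept neighbor s.

```python
import math

def prune_neighbors(base, candidates, vectors, max_degree, max_alpha,
                    start_alpha=1.0, step=0.2):
    """Keep at most max_degree neighbors of `base` (hypothetical helper).

    Starts with the strictest diversity check (start_alpha) and relaxes it
    toward max_alpha only while there is still room in the neighbor list.
    """
    def dist(a, b):
        return math.dist(vectors[a], vectors[b])

    # Nearest candidates first, so closer edges are considered first at each alpha.
    ordered = sorted(candidates, key=lambda c: dist(base, c))
    selected = []
    alpha = start_alpha
    while True:
        for c in ordered:
            if len(selected) >= max_degree:
                break
            if c in selected:
                continue
            # c is diverse if no kept neighbor is "much closer" to c than base is.
            if all(alpha * dist(s, c) > dist(base, c) for s in selected):
                selected.append(c)
        if len(selected) >= max_degree or alpha >= max_alpha:
            break
        alpha = min(alpha + step, max_alpha)
    return selected

# Node 0 is the base; node 2 sits near node 1, node 3 is farthest but in a new direction.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.9, 1.0), (-1.5, 0.0)]

# Single pass at max-alpha: every edge passes, so the farthest edge (3) is dropped.
print(prune_neighbors(0, [1, 2, 3], vectors, 2, 1.4, start_alpha=1.4))  # [1, 2]

# Incremental: node 2 fails at alpha=1.0, so it is the one removed instead.
print(prune_neighbors(0, [1, 2, 3], vectors, 2, 1.4))  # [1, 3]
```

In the toy data, node 2 passes the max-alpha check but not the alpha=1.0 check, which is exactly the kind of edge the incremental version removes first.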
Also tested neighborOverflow. 1.2 is the sweet spot for build speed, though recall keeps creeping up past that. Could be interesting to allow a significantly larger M overall during construction (and then trim it back when complete), not just a small amount of overflow. Here's the raw data:
hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
Index M=16 ef=100 ov=1.0: top 100/1 recall 0.7248, build 41.33s, query 10.30s. 216100630 nodes visited
Index M=16 ef=100 ov=1.1: top 100/1 recall 0.7253, build 34.46s, query 10.22s. 216682080 nodes visited
Index M=16 ef=100 ov=1.2: top 100/1 recall 0.7273, build 34.09s, query 10.19s. 217711860 nodes visited
Index M=16 ef=100 ov=1.3: top 100/1 recall 0.7282, build 34.87s, query 10.00s. 217189300 nodes visited
Index M=16 ef=100 ov=1.4: top 100/1 recall 0.7300, build 34.40s, query 10.06s. 217574420 nodes visited
Index M=16 ef=100 ov=1.5: top 100/1 recall 0.7310, build 35.67s, query 9.92s. 216313450 nodes visited
Index M=16 ef=100 ov=1.6: top 100/1 recall 0.7318, build 36.42s, query 9.95s. 217324750 nodes visited
Index M=16 ef=100 ov=1.7: top 100/1 recall 0.7324, build 37.23s, query 10.00s. 216880550 nodes visited
Index M=16 ef=100 ov=1.8: top 100/1 recall 0.7331, build 38.25s, query 10.10s. 218359990 nodes visited
Index M=16 ef=100 ov=1.9: top 100/1 recall 0.7349, build 39.12s, query 9.98s. 217950400 nodes visited
Index M=16 ef=100 ov=2.0: top 100/1 recall 0.7361, build 39.77s, query 10.15s. 219628900 nodes visited
hdf5/glove-100-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 100
Index M=16 ef=100 ov=1.0: top 100/1 recall 0.7083, build 129.28s, query 7.96s. 262328150 nodes visited
Index M=16 ef=100 ov=1.1: top 100/1 recall 0.7099, build 112.93s, query 7.89s. 259868660 nodes visited
Index M=16 ef=100 ov=1.2: top 100/1 recall 0.7107, build 111.35s, query 7.95s. 262270020 nodes visited
Index M=16 ef=100 ov=1.3: top 100/1 recall 0.7115, build 114.18s, query 7.86s. 262288160 nodes visited
Index M=16 ef=100 ov=1.4: top 100/1 recall 0.7115, build 120.58s, query 8.23s. 262504090 nodes visited
Index M=16 ef=100 ov=1.5: top 100/1 recall 0.7112, build 121.68s, query 8.32s. 264554650 nodes visited
Index M=16 ef=100 ov=1.6: top 100/1 recall 0.7124, build 125.39s, query 7.89s. 261228060 nodes visited
Index M=16 ef=100 ov=1.7: top 100/1 recall 0.7130, build 120.96s, query 7.87s. 263518890 nodes visited
Index M=16 ef=100 ov=1.8: top 100/1 recall 0.7112, build 123.28s, query 7.92s. 266836720 nodes visited
Index M=16 ef=100 ov=1.9: top 100/1 recall 0.7137, build 128.29s, query 7.86s. 263501200 nodes visited
Index M=16 ef=100 ov=2.0: top 100/1 recall 0.7148, build 127.33s, query 7.93s. 263909890 nodes visited
hdf5/glove-200-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 200
Index M=16 ef=100 ov=1.0: top 100/1 recall 0.6385, build 214.84s, query 12.92s. 273614200 nodes visited
Index M=16 ef=100 ov=1.1: top 100/1 recall 0.6390, build 191.94s, query 12.99s. 276782920 nodes visited
Index M=16 ef=100 ov=1.2: top 100/1 recall 0.6376, build 192.05s, query 13.21s. 278794360 nodes visited
Index M=16 ef=100 ov=1.3: top 100/1 recall 0.6392, build 194.30s, query 13.36s. 283579970 nodes visited
Index M=16 ef=100 ov=1.4: top 100/1 recall 0.6422, build 199.53s, query 13.16s. 278914420 nodes visited
Index M=16 ef=100 ov=1.5: top 100/1 recall 0.6445, build 204.94s, query 13.13s. 278050930 nodes visited
Index M=16 ef=100 ov=1.6: top 100/1 recall 0.6445, build 207.79s, query 13.21s. 277796700 nodes visited
Index M=16 ef=100 ov=1.7: top 100/1 recall 0.6422, build 209.80s, query 13.06s. 277976660 nodes visited
Index M=16 ef=100 ov=1.8: top 100/1 recall 0.6435, build 216.14s, query 13.20s. 280327590 nodes visited
Index M=16 ef=100 ov=1.9: top 100/1 recall 0.6435, build 219.54s, query 13.03s. 277604230 nodes visited
Index M=16 ef=100 ov=2.0: top 100/1 recall 0.6458, build 225.36s, query 13.19s. 279667700 nodes visited
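To compare overflow settings without reading the log by eye, the lines above can be parsed into rows. The following is a minimal sketch that assumes the exact line format shown in this thread; the names parse_results and fastest_build are hypothetical helpers, not part of jvector or its Bench class.

```python
import re

# Dataset header, e.g. "hdf5/nytimes-256-angular.hdf5: 289761 base and ..."
HEADER = re.compile(r"^(\S+\.hdf5): ")
# Result line; the "ov=" field is optional so lines without it also parse.
LINE = re.compile(
    r"Index M=(?P<m>\d+) ef=(?P<ef>\d+)(?: ov=(?P<ov>[\d.]+))?: "
    r"top (?P<top>\S+) recall (?P<recall>[\d.]+), "
    r"build (?P<build>[\d.]+)s, query (?P<query>[\d.]+)s\. "
    r"(?P<visited>\d+) nodes visited"
)

def parse_results(text):
    """Group result rows by the dataset header that precedes them."""
    results = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            current = header.group(1)
            results.setdefault(current, [])
            continue
        m = LINE.search(line)
        if m and current is not None:
            results[current].append({
                "ov": float(m["ov"]) if m["ov"] else None,
                "recall": float(m["recall"]),
                "build_s": float(m["build"]),
                "query_s": float(m["query"]),
                "visited": int(m["visited"]),
            })
    return results

def fastest_build(rows):
    """Return the overflow value of the row with the shortest build time."""
    return min(rows, key=lambda r: r["build_s"])["ov"]

sample = """hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
Index M=16 ef=100 ov=1.0: top 100/1 recall 0.7248, build 41.33s, query 10.30s. 216100630 nodes visited
Index M=16 ef=100 ov=1.1: top 100/1 recall 0.7253, build 34.46s, query 10.22s. 216682080 nodes visited
Index M=16 ef=100 ov=1.2: top 100/1 recall 0.7273, build 34.09s, query 10.19s. 217711860 nodes visited
"""
rows = parse_results(sample)["hdf5/nytimes-256-angular.hdf5"]
print(fastest_build(rows))  # 1.2
```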