Giter VIP home page Giter VIP logo

Comments (3)

jbellis avatar jbellis commented on May 20, 2024

At least as far back as 0401011 when we first merged Bench into JVector

from jvector.

jbellis avatar jbellis commented on May 20, 2024

It looks like the problem is with enforceMaxConnLimit. I took a shortcut and just applied the max-alpha diversity check, which means that often all the edges in the list are fine and we just end up removing the farthest-away edge.

If instead we apply increase the diversity incrementally, like we do in insertDiverse, then we do a better job of prioritizing removal of nodes that are okay wrt max-alpha, but not with alpha=1.0.

hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
Average degree is 31.9973529909132
Index   M=16 ef=100: top 100/1 recall 0.7327, build 38.62s, query 11.71s. 219005090 nodes visited

hdf5/glove-100-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 100
Average degree is 31.99960879212244
Index   M=16 ef=100: top 100/1 recall 0.7116, build 128.20s, query 8.44s. 262719410 nodes visited

hdf5/glove-200-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 200
Average degree is 31.989657072075193
Index   M=16 ef=100: top 100/1 recall 0.6451, build 208.60s, query 13.42s. 281149900 nodes visited

from jvector.

jbellis avatar jbellis commented on May 20, 2024

Also tested neighborOverflow. 1.2 is the sweet spot for speed. Recall keeps creeping up past that though. Could be interesting to allow significantly larger M overall during construction (and then trim it back when complete), not just a small amount of overflow. Here's the raw data:

hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
Index   M=16 ef=100 ov=1.0: top 100/1 recall 0.7248, build 41.33s, query 10.30s. 216100630 nodes visited
Index   M=16 ef=100 ov=1.1: top 100/1 recall 0.7253, build 34.46s, query 10.22s. 216682080 nodes visited
Index   M=16 ef=100 ov=1.2: top 100/1 recall 0.7273, build 34.09s, query 10.19s. 217711860 nodes visited
Index   M=16 ef=100 ov=1.3: top 100/1 recall 0.7282, build 34.87s, query 10.00s. 217189300 nodes visited
Index   M=16 ef=100 ov=1.4: top 100/1 recall 0.7300, build 34.40s, query 10.06s. 217574420 nodes visited
Index   M=16 ef=100 ov=1.5: top 100/1 recall 0.7310, build 35.67s, query 9.92s. 216313450 nodes visited
Index   M=16 ef=100 ov=1.6: top 100/1 recall 0.7318, build 36.42s, query 9.95s. 217324750 nodes visited
Index   M=16 ef=100 ov=1.7: top 100/1 recall 0.7324, build 37.23s, query 10.00s. 216880550 nodes visited
Index   M=16 ef=100 ov=1.8: top 100/1 recall 0.7331, build 38.25s, query 10.10s. 218359990 nodes visited
Index   M=16 ef=100 ov=1.9: top 100/1 recall 0.7349, build 39.12s, query 9.98s. 217950400 nodes visited
Index   M=16 ef=100 ov=2.0: top 100/1 recall 0.7361, build 39.77s, query 10.15s. 219628900 nodes visited

hdf5/glove-100-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 100
Index   M=16 ef=100 ov=1.0: top 100/1 recall 0.7083, build 129.28s, query 7.96s. 262328150 nodes visited
Index   M=16 ef=100 ov=1.1: top 100/1 recall 0.7099, build 112.93s, query 7.89s. 259868660 nodes visited
Index   M=16 ef=100 ov=1.2: top 100/1 recall 0.7107, build 111.35s, query 7.95s. 262270020 nodes visited
Index   M=16 ef=100 ov=1.3: top 100/1 recall 0.7115, build 114.18s, query 7.86s. 262288160 nodes visited
Index   M=16 ef=100 ov=1.4: top 100/1 recall 0.7115, build 120.58s, query 8.23s. 262504090 nodes visited
Index   M=16 ef=100 ov=1.5: top 100/1 recall 0.7112, build 121.68s, query 8.32s. 264554650 nodes visited
Index   M=16 ef=100 ov=1.6: top 100/1 recall 0.7124, build 125.39s, query 7.89s. 261228060 nodes visited
Index   M=16 ef=100 ov=1.7: top 100/1 recall 0.7130, build 120.96s, query 7.87s. 263518890 nodes visited
Index   M=16 ef=100 ov=1.8: top 100/1 recall 0.7112, build 123.28s, query 7.92s. 266836720 nodes visited
Index   M=16 ef=100 ov=1.9: top 100/1 recall 0.7137, build 128.29s, query 7.86s. 263501200 nodes visited
Index   M=16 ef=100 ov=2.0: top 100/1 recall 0.7148, build 127.33s, query 7.93s. 263909890 nodes visited

hdf5/glove-200-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 200
Index   M=16 ef=100 ov=1.0: top 100/1 recall 0.6385, build 214.84s, query 12.92s. 273614200 nodes visited
Index   M=16 ef=100 ov=1.1: top 100/1 recall 0.6390, build 191.94s, query 12.99s. 276782920 nodes visited
Index   M=16 ef=100 ov=1.2: top 100/1 recall 0.6376, build 192.05s, query 13.21s. 278794360 nodes visited
Index   M=16 ef=100 ov=1.3: top 100/1 recall 0.6392, build 194.30s, query 13.36s. 283579970 nodes visited
Index   M=16 ef=100 ov=1.4: top 100/1 recall 0.6422, build 199.53s, query 13.16s. 278914420 nodes visited
Index   M=16 ef=100 ov=1.5: top 100/1 recall 0.6445, build 204.94s, query 13.13s. 278050930 nodes visited
Index   M=16 ef=100 ov=1.6: top 100/1 recall 0.6445, build 207.79s, query 13.21s. 277796700 nodes visited
Index   M=16 ef=100 ov=1.7: top 100/1 recall 0.6422, build 209.80s, query 13.06s. 277976660 nodes visited
Index   M=16 ef=100 ov=1.8: top 100/1 recall 0.6435, build 216.14s, query 13.20s. 280327590 nodes visited
Index   M=16 ef=100 ov=1.9: top 100/1 recall 0.6435, build 219.54s, query 13.03s. 277604230 nodes visited
Index   M=16 ef=100 ov=2.0: top 100/1 recall 0.6458, build 225.36s, query 13.19s. 279667700 nodes visited

from jvector.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.