Giter VIP home page Giter VIP logo

Comments (6)

jkni avatar jkni commented on May 20, 2024

Initial testing of a lazy thread-local float[][] cache in the context of each search as well as precomputation suggests there is mileage here. Lazy caching has a small overhead on small graphs and precomputation of all subvector comparisons at the start of a query has fixed overhead determined by PQ. As comparisons go up, we see benefit from both approaches, with a surprising improvement in the precalculation cases.

from jvector.

jkni avatar jkni commented on May 20, 2024
With Cache

hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
PQ@128 build 44.07s,
PQ encode 4.85s,
Build M=8 ef=120 in 28.23s with 0.91 short edges
  Query PQ=true top 100/1 recall 0.6335 in 16.26s after 127452470 nodes visited
  Query PQ=true top 100/2 recall 0.7082 in 24.29s after 218207850 nodes visited
Build M=12 ef=120 in 43.14s with 0.84 short edges
  Query PQ=true top 100/1 recall 0.6915 in 20.37s after 176538520 nodes visited
  Query PQ=true top 100/2 recall 0.7570 in 29.63s after 308661330 nodes visited
Build M=16 ef=120 in 54.68s with 0.78 short edges
  Query PQ=true top 100/1 recall 0.7205 in 22.67s after 220878340 nodes visited
  Query PQ=true top 100/2 recall 0.7841 in 34.89s after 392117050 nodes visited
Build M=24 ef=120 in 77.28s with 0.66 short edges
  Query PQ=true top 100/1 recall 0.7538 in 27.14s after 301028140 nodes visited
  Query PQ=true top 100/2 recall 0.8185 in 44.18s after 545185090 nodes visited

hdf5/glove-100-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 100
PQ@50 build 20.52s,
PQ encode 7.52s,
Build M=8 ef=120 in 91.37s with 0.86 short edges
  Query PQ=true top 100/1 recall 0.5632 in 9.24s after 152024670 nodes visited
  Query PQ=true top 100/2 recall 0.6824 in 15.13s after 255257650 nodes visited
Build M=12 ef=120 in 116.90s with 0.77 short edges
  Query PQ=true top 100/1 recall 0.6346 in 13.20s after 210274760 nodes visited
  Query PQ=true top 100/2 recall 0.7533 in 20.35s after 359428560 nodes visited
Build M=16 ef=120 in 156.19s with 0.70 short edges
  Query PQ=true top 100/1 recall 0.6726 in 15.70s after 265477430 nodes visited
  Query PQ=true top 100/2 recall 0.7908 in 23.62s after 456959390 nodes visited
Build M=24 ef=120 in 245.01s with 0.58 short edges
  Query PQ=true top 100/1 recall 0.7128 in 18.55s after 365712450 nodes visited
  Query PQ=true top 100/2 recall 0.8340 in 31.57s after 637646700 nodes visited

hdf5/glove-200-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 200
PQ@100 build 39.80s,
PQ encode 15.47s,
Build M=8 ef=120 in 146.79s with 0.81 short edges
  Query PQ=true top 100/1 recall 0.4833 in 16.07s after 151592250 nodes visited
  Query PQ=true top 100/2 recall 0.5987 in 24.01s after 261382020 nodes visited
Build M=12 ef=120 in 210.04s with 0.76 short edges
  Query PQ=true top 100/1 recall 0.5773 in 19.71s after 220615200 nodes visited
  Query PQ=true top 100/2 recall 0.6879 in 33.56s after 376501920 nodes visited
Build M=16 ef=120 in 267.74s with 0.70 short edges
  Query PQ=true top 100/1 recall 0.6217 in 22.95s after 277479120 nodes visited
  Query PQ=true top 100/2 recall 0.7303 in 38.70s after 477498100 nodes visited
Build M=24 ef=120 in 395.52s with 0.59 short edges
  Query PQ=true top 100/1 recall 0.6745 in 30.88s after 390517330 nodes visited
  Query PQ=true top 100/2 recall 0.7769 in 49.85s after 669481300 nodes visited

hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
PQ@128 build 49.99s,
PQ encode 5.30s,
Build M=8 ef=120 in 29.25s with 0.91 short edges
  Query PQ=true top 1/1 recall 0.1683 in 1.67s after 6023290 nodes visited
  Query PQ=true top 1/10 recall 0.6650 in 4.99s after 25667710 nodes visited
Build M=12 ef=120 in 40.47s with 0.84 short edges
  Query PQ=true top 1/1 recall 0.2578 in 1.81s after 9635620 nodes visited
  Query PQ=true top 1/10 recall 0.7901 in 6.41s after 36602620 nodes visited
Build M=16 ef=120 in 53.48s with 0.78 short edges
  Query PQ=true top 1/1 recall 0.3118 in 2.32s after 12960760 nodes visited
  Query PQ=true top 1/10 recall 0.8691 in 7.79s after 47000530 nodes visited
Build M=24 ef=120 in 77.23s with 0.66 short edges
  Query PQ=true top 1/1 recall 0.4206 in 3.43s after 20021310 nodes visited
  Query PQ=true top 1/10 recall 0.9267 in 9.72s after 64112850 nodes visited



Without Cache
hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
PQ@128 build 54.87s,
PQ encode 5.13s,
Build M=8 ef=120 in 28.97s with 0.91 short edges
  Query PQ=true top 100/1 recall 0.6358 in 16.19s after 127353430 nodes visited
  Query PQ=true top 100/2 recall 0.7082 in 27.35s after 217810190 nodes visited
Build M=12 ef=120 in 42.26s with 0.84 short edges
  Query PQ=true top 100/1 recall 0.6897 in 21.58s after 176787060 nodes visited
  Query PQ=true top 100/2 recall 0.7561 in 37.93s after 308727190 nodes visited
Build M=16 ef=120 in 53.38s with 0.78 short edges
  Query PQ=true top 100/1 recall 0.7201 in 26.53s after 221319640 nodes visited
  Query PQ=true top 100/2 recall 0.7833 in 46.94s after 392516800 nodes visited
Build M=24 ef=120 in 77.63s with 0.66 short edges
  Query PQ=true top 100/1 recall 0.7537 in 36.03s after 301606500 nodes visited
  Query PQ=true top 100/2 recall 0.8190 in 64.78s after 545210160 nodes visited

hdf5/glove-100-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 100
PQ@50 build 21.37s,
PQ encode 7.32s,
Build M=8 ef=120 in 93.70s with 0.86 short edges
  Query PQ=true top 100/1 recall 0.5651 in 10.00s after 152986490 nodes visited
  Query PQ=true top 100/2 recall 0.6842 in 18.30s after 256383300 nodes visited
Build M=12 ef=120 in 116.17s with 0.77 short edges
  Query PQ=true top 100/1 recall 0.6368 in 13.48s after 208624760 nodes visited
  Query PQ=true top 100/2 recall 0.7541 in 23.88s after 357208550 nodes visited
Build M=16 ef=120 in 166.23s with 0.70 short edges
  Query PQ=true top 100/1 recall 0.6728 in 16.28s after 264001190 nodes visited
  Query PQ=true top 100/2 recall 0.7908 in 29.63s after 455278300 nodes visited
Build M=24 ef=120 in 216.16s with 0.58 short edges
  Query PQ=true top 100/1 recall 0.7131 in 22.27s after 370713270 nodes visited
  Query PQ=true top 100/2 recall 0.8340 in 40.12s after 642336160 nodes visited

hdf5/glove-200-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 200
PQ@100 build 39.86s,
PQ encode 15.31s,
Build M=8 ef=120 in 161.08s with 0.81 short edges
  Query PQ=true top 100/1 recall 0.4815 in 16.30s after 152419420 nodes visited
  Query PQ=true top 100/2 recall 0.5974 in 29.75s after 262154640 nodes visited
Build M=12 ef=120 in 221.83s with 0.76 short edges
  Query PQ=true top 100/1 recall 0.5785 in 27.72s after 220724270 nodes visited
  Query PQ=true top 100/2 recall 0.6876 in 41.13s after 376284920 nodes visited
Build M=16 ef=120 in 282.27s with 0.70 short edges
  Query PQ=true top 100/1 recall 0.6231 in 30.13s after 281461690 nodes visited
  Query PQ=true top 100/2 recall 0.7307 in 50.03s after 480860130 nodes visited
Build M=24 ef=120 in 397.85s with 0.59 short edges
  Query PQ=true top 100/1 recall 0.6726 in 40.85s after 389610260 nodes visited
  Query PQ=true top 100/2 recall 0.7762 in 68.66s after 670488170 nodes visited

hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
PQ@128 build 49.49s,
PQ encode 5.31s,
Build M=8 ef=120 in 29.35s with 0.91 short edges
  Query PQ=true top 1/1 recall 0.1635 in 1.03s after 5838970 nodes visited
  Query PQ=true top 1/10 recall 0.6641 in 2.71s after 25420320 nodes visited
Build M=12 ef=120 in 41.56s with 0.84 short edges
  Query PQ=true top 1/1 recall 0.2373 in 0.88s after 9382700 nodes visited
  Query PQ=true top 1/10 recall 0.7933 in 3.81s after 36945570 nodes visited
Build M=16 ef=120 in 51.18s with 0.78 short edges
  Query PQ=true top 1/1 recall 0.3503 in 1.16s after 13310250 nodes visited
  Query PQ=true top 1/10 recall 0.8706 in 4.86s after 46286040 nodes visited
Build M=24 ef=120 in 76.39s with 0.66 short edges
  Query PQ=true top 1/1 recall 0.3898 in 1.69s after 19322610 nodes visited
  Query PQ=true top 1/10 recall 0.9117 in 7.08s after 64656430 nodes visited

hdf5/glove-100-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 100
PQ@50 build 21.18s,
PQ encode 7.68s,
Build M=8 ef=120 in 80.76s with 0.86 short edges
  Query PQ=true top 1/1 recall 0.1893 in 0.38s after 7234510 nodes visited
  Query PQ=true top 1/10 recall 0.7287 in 1.78s after 30940780 nodes visited
Build M=12 ef=120 in 112.27s with 0.77 short edges
  Query PQ=true top 1/1 recall 0.2427 in 0.56s after 11179110 nodes visited
  Query PQ=true top 1/10 recall 0.8389 in 2.56s after 45618860 nodes visited
Build M=16 ef=120 in 143.83s with 0.70 short edges
  Query PQ=true top 1/1 recall 0.3706 in 0.84s after 15474020 nodes visited
  Query PQ=true top 1/10 recall 0.9087 in 3.35s after 56824020 nodes visited
Build M=24 ef=120 in 211.80s with 0.58 short edges
  Query PQ=true top 1/1 recall 0.4262 in 1.14s after 23526640 nodes visited
  Query PQ=true top 1/10 recall 0.9528 in 4.86s after 81680670 nodes visited

hdf5/glove-200-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 200
PQ@100 build 39.73s,
PQ encode 15.28s,
Build M=8 ef=120 in 142.20s with 0.81 short edges
  Query PQ=true top 1/1 recall 0.1376 in 0.63s after 6484420 nodes visited
  Query PQ=true top 1/10 recall 0.6399 in 2.95s after 31069950 nodes visited
Build M=12 ef=120 in 206.60s with 0.76 short edges
  Query PQ=true top 1/1 recall 0.2288 in 0.78s after 10191030 nodes visited
  Query PQ=true top 1/10 recall 0.7777 in 4.73s after 44947200 nodes visited
Build M=16 ef=120 in 264.49s with 0.70 short edges
  Query PQ=true top 1/1 recall 0.2268 in 1.01s after 13493630 nodes visited
  Query PQ=true top 1/10 recall 0.8167 in 5.28s after 60192000 nodes visited
Build M=24 ef=120 in 391.26s with 0.59 short edges
  Query PQ=true top 1/1 recall 0.2934 in 1.57s after 21070640 nodes visited
  Query PQ=true top 1/10 recall 0.8659 in 8.57s after 86199800 nodes visited

from jvector.

jkni avatar jkni commented on May 20, 2024
Precomputed
hdf5/nytimes-256-angular.hdf5: 289761 base and 9991 query vectors loaded, dimensions 256
PQ@128 build 45.18s,
PQ encode 5.77s,
Build M=24 ef=120 in 77.00s with 0.66 short edges
  Query PQ=true top 100/1 recall 0.7543 in 14.80s after 301791400 nodes visited
  Query PQ=true top 100/2 recall 0.8188 in 25.43s after 546336860 nodes visited

hdf5/glove-100-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 100
PQ@50 build 21.27s,
PQ encode 7.76s,
Build M=24 ef=120 in 250.30s with 0.57 short edges
  Query PQ=true top 100/1 recall 0.7130 in 12.81s after 364761190 nodes visited
  Query PQ=true top 100/2 recall 0.8335 in 22.36s after 636527870 nodes visited
  
 hdf5/glove-200-angular.hdf5: 1183514 base and 10000 query vectors loaded, dimensions 200
PQ@100 build 39.55s,
PQ encode 16.30s,
Build M=24 ef=120 in 413.76s with 0.59 short edges
  Query PQ=true top 100/1 recall 0.6746 in 17.70s after 386508760 nodes visited
  Query PQ=true top 100/2 recall 0.7770 in 32.47s after 666304560 nodes visited

from jvector.

jkni avatar jkni commented on May 20, 2024

I'm tabling this for now and will retest after #80 is available. There's potential here, but it may need some heuristic to gate precalculation.

from jvector.

jkni avatar jkni commented on May 20, 2024

Hacked together branch (shoving the cache into dotProduct) is here https://github.com/jkni/jvector/tree/cache-pq-results. Most recent commit precalculates, previous commit is lazy cache.

from jvector.

jkni avatar jkni commented on May 20, 2024

@jbellis added this in #94. Closing out.

from jvector.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.