Hi, this package looks really cool and I'd love to use it for my use case.

Very slow performance on large sets about setsimilaritysearch HOT 1 OPEN

ekzhu commented on June 6, 2024

Very slow performance on large sets

from setsimilaritysearch.

Comments (1)

ekzhu commented on June 6, 2024

You are correct that this package is less suitable than datasketch when it comes to larger sets. The benchmark datasets used in the README have an average set size of around 20-30.
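For context on why a sketch-based library can cope better with large sets: datasketch provides MinHash, which summarizes each set as a fixed-length signature. The snippet below is a minimal pure-Python illustration of that idea, not datasketch's implementation or API. The function names, the `num_perm` value, and the use of `hashlib.blake2b` are assumptions made for the example.

```python
import hashlib


def minhash_signature(tokens, num_perm=64):
    """Return a fixed-length MinHash signature for a non-empty set of strings.

    Illustrative only: for each of `num_perm` salted hash functions, keep the
    minimum hash value over all tokens.
    """
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt per hash function
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        token.encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "big",
                )
                for token in tokens
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


small = {"a", "b", "c"}
large = {f"token{i}" for i in range(10_000)}

sig_small = minhash_signature(small)
sig_large = minhash_signature(large)

# Both signatures have the same length regardless of the set size, so
# comparing two signatures costs O(num_perm) rather than O(set size).
same_length = len(sig_small) == len(sig_large)
```

Building a signature still touches every token once, but the result is an estimate that can be compared cheaply afterwards, whereas this package computes exact similarities.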

For this package, the query time is directly proportional to the size of the query set (the number of tokens). It is also heavily influenced by the size of the indexed sets, because the exact set similarity is computed at query time for each candidate set.
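The two cost factors described above can be seen in a simplified sketch. This is not the package's actual code: it is a bare inverted index with exact Jaccard verification, and it leaves out any filtering optimizations the package may apply. All function names here are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Exact Jaccard similarity; the intersection cost grows with set size."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def build_index(sets):
    """Map each token to the ids of the indexed sets that contain it."""
    index = defaultdict(list)
    for set_id, tokens in enumerate(sets):
        for token in tokens:
            index[token].append(set_id)
    return index


def query(index, sets, query_set, threshold):
    # Candidate generation: one posting-list lookup per query token,
    # so this step scales with the number of tokens in the query.
    candidates = set()
    for token in query_set:
        candidates.update(index.get(token, ()))

    # Verification: an exact similarity computation per candidate,
    # so this step scales with the sizes of the candidate sets.
    results = []
    for set_id in sorted(candidates):
        sim = jaccard(query_set, sets[set_id])
        if sim >= threshold:
            results.append((set_id, sim))
    return results


indexed = [{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}]
idx = build_index(indexed)
matches = query(idx, indexed, {"a", "b", "c", "d"}, threshold=0.5)
print(matches)
```

With sets of 20-30 tokens both loops are cheap. With thousands of tokens per set, the number of lookups and the cost of each verification both grow accordingly.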

There are some algorithmic tricks that are designed to handle exact search over large sets. I made one: https://github.com/ekzhu/josie. I haven't had time to make it available here.
