Giter VIP home page Giter VIP logo

setsimilaritysearch's Issues

Very slow performance on large sets

Hi, this package looks really cool and I'd love to use it for my use case.

I have about 7,000 sets with about 1,000 elements each that I'm using as my index. I also have a set of about 1,000 queries with similar sizes, about 1,000 elements each, as queries. However, when I profile the times for queries for this package vs. datasketch's MinHashLSHEnsemble method, the results are pretty wildly off-base from the numbers presented in the readme.

In general, a single minhash LSH ensemble query in my case is taking about 10ms, and the SetSimilarity query is taking anywhere from 300ms to 500ms, even whole seconds in some cases. Are these numbers to be expected, and is SetSimilaritySearch simply not suitable for sets this large? My sets are exclusively integers, if that matters.

Any insight or help is appreciated.

Unexpected multiset behaviour

Firstly thanks for this package and the datasketch one—they're both great.

I noticed some unexpected behaviour when using the all_pairs function with input data that aren't set-like:

sets = [["a", "b"], ["a", "a"]]
list(all_pairs(sets, similarity_func_name="jaccard", similarity_threshold=0.1))
# [(1, 0, 1.0)]

sets = [["a", "a", "b"], ["a", "a"]]
list(all_pairs(sets, similarity_func_name="jaccard", similarity_threshold=0.1))
# [(1, 0, 1.5)]

I assume that this package doesn't support multisets, and that the outputs in such cases are undefined (setting the threshold to 0.75 in the second example leads to an empty result set, for instance), but if that's the case perhaps it would be a good idea to make this explicit in the documentation, and to mention that it's the user's responsibility to ensure that there are no duplicates in their input sets/lists.

In my case this simply means that I have to convert my lists to sets before passing them to all_pairs, but it did catch me off guard because that step wouldn't be necessary if I were applying MinHash LSH.

Allow 'sets' argument to be any iterable

Currently

https://github.com/ekzhu/SetSimilaritySearch/blob/master/SetSimilaritySearch/search.py#L27
https://github.com/ekzhu/SetSimilaritySearch/blob/master/SetSimilaritySearch/all_pairs.py#L28

state:

if not isinstance(sets, list) or len(sets) == 0:
        raise ValueError("Input parameter sets must be a non-empty list.")

I propose to change this to:

if not isinstance(sets, Iterable) or len(sets) == 0:
        raise ValueError("Input parameter sets must be a non-empty iterable.")

Which then allows inputs as tuple as well, as well as ordered key-sets. Was helpful in my use case, rather than having to create a copy of the data in list form.

Setting up a PR from my fork, let me know what you think.

ordering of x,y coordinates when converting to identity matrix

Hi - I noticed an unexpected ordering of x,y coordinates when converting the output of all_pairs to an identity matrix. Here's the behavior that I see with identical sets:

import numpy as np
from SetSimilaritySearch import all_pairs
import random

nsets = 10
population = list(range(100))

sets = [set(population) for i in range(nsets)]
coords = all_pairs(sets, similarity_threshold=0)

arr = np.nan * np.empty((nsets, nsets))
x, y, z = zip(*coords)
arr[x, y] = z
print(np.round(arr, 2))

The output is a nice lower-triangular matrix.

[[nan nan nan nan nan nan nan nan nan nan]
 [ 1. nan nan nan nan nan nan nan nan nan]
 [ 1.  1. nan nan nan nan nan nan nan nan]
 [ 1.  1.  1. nan nan nan nan nan nan nan]
 [ 1.  1.  1.  1. nan nan nan nan nan nan]
 [ 1.  1.  1.  1.  1. nan nan nan nan nan]
 [ 1.  1.  1.  1.  1.  1. nan nan nan nan]
 [ 1.  1.  1.  1.  1.  1.  1. nan nan nan]
 [ 1.  1.  1.  1.  1.  1.  1.  1. nan nan]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1. nan]]

But when the sets are not identical, the x and y indices seem to be ordered arbitrarily:

sets = [set(population) - set(random.choices(population, k=10)) for i in range(nsets)]
coords = all_pairs(sets, similarity_threshold=0)

arr = np.nan * np.empty((nsets, nsets))
x, y, z = zip(*coords)
arr[x, y] = z
print(np.round(arr, 2))
[[ nan  nan  nan  nan  nan  nan  nan  nan  nan  nan]
 [0.83  nan 0.83 0.81 0.81  nan 0.81  nan 0.87  nan]
 [0.84  nan  nan  nan  nan  nan  nan  nan  nan  nan]
 [0.8   nan 0.8   nan  nan  nan  nan  nan  nan  nan]
 [0.84  nan 0.8  0.82  nan  nan  nan  nan  nan  nan]
 [0.84 0.83 0.82 0.82 0.84  nan 0.84 0.83 0.82  nan]
 [0.84  nan 0.84 0.82 0.8   nan  nan  nan  nan  nan]
 [0.81 0.84 0.83 0.87 0.81  nan 0.87  nan 0.83  nan]
 [0.82  nan 0.82 0.82 0.82  nan 0.82  nan  nan  nan]
 [0.82 0.87 0.82 0.86 0.84 0.84 0.84 0.85 0.82  nan]]

I can restore the lower-triangular matrix by adding the following line before assigning to the array:

x, y = zip(*[sorted(pair) for pair in zip(x, y)])

So I can still accomplish what I want with minimal difficulty, but I thought I'd let you know because it seems like generating an identity matrix might be a common use case and the behavior is a bit surprising.

Thanks a lot for sharing this project!

all_pairs: get *all* pairwise indices, including 0?

I noticed that not all possible pairs are returned when I tried the all_pairs function and that even with threshold = 0.0, the lowest values are still above 0. Is there a possibility to get all pairwise similarities?

Using all_pairs() on a 14k dataset causes MemoryErrors

Hi,

I'm trying to use the all_pairs() function to find all the (near-)duplicates in a set of about 14000 text documents (after turning them into ngram shingles first). However, I'm running against MemoryErrors crashing my script, despite the virtual machine I work on having 16GB of RAM. I've checked, and indeed the entire RAM and swap get maxed out before the script stops working.

Do you have any advice on how to reduce RAM usage, or any indication of how much memory the algorithm uses? I don't run into any memory issues when I use LSH with datasketch, but I'd rather have an exact list of duplicates.

What does it mean that SearchIndex cannot be updated?

Hi,

Interesting work. I'd like to evaluate it for my use case. However, I am not getting what you mean by "Search index cannot be updated." Are you saying that it cannot be updated at all (maybe blow up the database and do over?) or that it has no update method? What if I update the sets and then create the index again? That would be creating a brand new index. To me, that's a method of updating.

Please clarify.

Thanks so much,

Incorrect output during Query

I have two files where some tokens are disjoint between the files (regardless of the set they belong to). The following line filters any tokens from the query set that are not present in the index file:

s1 = np.sort([self.order[token] for token in s if token in self.order])

This generates erroneous Jaccard calculations as the filtered size of the query set is being used.

I have included some data to reproduce this:
jaccard.csv
query.txt
index.txt

set_ID_x set_ID_y set_size_x set_size_y similarity
CP000438.1:1214715 PAGI-9|6|6 5 1 0.5

Here is a reduced dataset:
index.txt
query.txt

Debug Output
2022-07-12 17:16:30,568: Reading set tuples from /home/nolan/Projects/curated_GIs/gene_order.txt (reversed=False)...
2022-07-12 17:16:30,591: 24737 tuples read.
2022-07-12 17:16:30,591: Creating sets...
2022-07-12 17:16:30,600: 379 sets created.
2022-07-12 17:16:30,601: Reading set tuples from /home/nolan/Projects/curated_GIs/wild_gis/pseudomonas/genes_blast_sets.tabular (reversed=False)...
2022-07-12 17:16:30,672: 67023 tuples read.
2022-07-12 17:16:30,672: Creating sets...
2022-07-12 17:16:30,682: 3010 sets created.
2022-07-12 17:16:30,687: Building search index on 379 sets.
2022-07-12 17:16:30,687: Building SearchIndex on 379 sets.
2022-07-12 17:16:30,687: Start frequency transform.
2022-07-12 17:16:30,687: Applying frequency order transform on tokens...
2022-07-12 17:16:30,700: Done applying frequency order.
2022-07-12 17:16:30,700: Finish frequency transform, 14048 tokens in total.
2022-07-12 17:16:30,700: Start indexing sets.
2022-07-12 17:16:30,705: Finished indexing sets.
2022-07-12 17:16:30,705: Finished building search index.
2022-07-12 17:16:30,705: Find pairs with similarity >= 0.5.
2022-07-12 17:16:30,705: 29 original tokens and 10 tokens after applying frequency order.
2022-07-12 17:16:30,705: 1 candidates found.
2022-07-12 17:16:30,705: 0 verified sets found.
2022-07-12 17:16:30,705: 9 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,705: 0 candidates found.
2022-07-12 17:16:30,705: 0 verified sets found.
2022-07-12 17:16:30,705: 29 original tokens and 11 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 25 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 53 original tokens and 18 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 31 original tokens and 11 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 5 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 3 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 24 original tokens and 13 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 44 original tokens and 14 tokens after applying frequency order.
2022-07-12 17:16:30,706: 1 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 13 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 19 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 69 original tokens and 22 tokens after applying frequency order.
2022-07-12 17:16:30,706: 1 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 11 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 34 original tokens and 11 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 55 original tokens and 18 tokens after applying frequency order.
2022-07-12 17:16:30,707: 1 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 9 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 13 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 1 original tokens and 0 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 55 original tokens and 20 tokens after applying frequency order.
2022-07-12 17:16:30,707: 1 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 31 original tokens and 8 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 15 original tokens and 8 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 23 original tokens and 8 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,708: 5 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 3 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 5 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 17 original tokens and 8 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 5 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 17 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 5 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 24 original tokens and 8 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 9 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 11 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 21 original tokens and 13 tokens after applying frequency order.
2022-07-12 17:16:30,708: 1 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 11 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 32 original tokens and 12 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 23 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 11 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 40 original tokens and 20 tokens after applying frequency order.
2022-07-12 17:16:30,709: 3 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 25 original tokens and 10 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 19 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 17 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 44 original tokens and 13 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 9 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,710: 23 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 50 original tokens and 17 tokens after applying frequency order.
2022-07-12 17:16:30,710: 1 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 11 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 9 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 7 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 9 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 35 original tokens and 17 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 1 original tokens and 0 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 55 original tokens and 17 tokens after applying frequency order.
2022-07-12 17:16:30,710: 1 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,711: 17 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 15 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 30 original tokens and 14 tokens after applying frequency order.
2022-07-12 17:16:30,711: 1 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 23 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 11 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 39 original tokens and 15 tokens after applying frequency order.
2022-07-12 17:16:30,711: 1 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 18 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 13 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 17 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,712: 75 original tokens and 29 tokens after applying frequency order.
2022-07-12 17:16:30,712: 2 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 83 original tokens and 30 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 16 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 13 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 14 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 23 original tokens and 9 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 7 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 15 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 41 original tokens and 14 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 59 original tokens and 20 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 43 original tokens and 17 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 9 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 7 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 32 original tokens and 9 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 9 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 39 original tokens and 11 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 24 original tokens and 9 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 13 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 9 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 13 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 5 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 21 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 117 original tokens and 51 tokens after applying frequency order.
2022-07-12 17:16:30,714: 4 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 35 original tokens and 11 tokens after applying frequency order.
2022-07-12 17:16:30,714: 1 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,715: 52 original tokens and 16 tokens after applying frequency order.
2022-07-12 17:16:30,715: 1 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 9 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 15 original tokens and 9 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 23 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 13 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 17 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 39 original tokens and 12 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 11 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 11 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,716: 79 original tokens and 26 tokens after applying frequency order.
2022-07-12 17:16:30,716: 1 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 15 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 11 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 9 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 29 original tokens and 9 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 17 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
...
2022-07-12 17:16:30,944: 17 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,945: 0 candidates found.
2022-07-12 17:16:30,945: 0 verified sets found.
2022-07-12 17:16:30,945: 11 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,945: 0 candidates found.
2022-07-12 17:16:30,945: 0 verified sets found.
2022-07-12 17:16:30,945: Found 21 pairs.
2022-07-12 17:16:30,945: Average query time: 7.890807433777869e-05.
2022-07-12 17:16:30,945: Median query time: 7.890807433777869e-05.
2022-07-12 17:16:30,945: 90pct query time: 0.00011157989501953125.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.