ekzhu / SetSimilaritySearch
All-pair set similarity search on millions of sets in Python and on a laptop
License: Apache License 2.0
Hi, this package looks really cool and I'd love to use it for my use case.
I have about 7,000 sets with about 1,000 elements each that I'm using as my index. I also have about 1,000 queries of similar size, about 1,000 elements each. However, when I profile the query times for this package vs. datasketch's MinHashLSHEnsemble method, the results are pretty wildly off-base from the numbers presented in the readme.
In general, a single minhash LSH ensemble query in my case is taking about 10ms, and the SetSimilarity query is taking anywhere from 300ms to 500ms, even whole seconds in some cases. Are these numbers to be expected, and is SetSimilaritySearch simply not suitable for sets this large? My sets are exclusively integers, if that matters.
Any insight or help is appreciated.
Is there any possibility of integration using redis or cassandra as already Minhash LSH has?
Firstly thanks for this package and the datasketch one—they're both great.
I noticed some unexpected behaviour when using the all_pairs function with input data that aren't set-like:
sets = [["a", "b"], ["a", "a"]]
list(all_pairs(sets, similarity_func_name="jaccard", similarity_threshold=0.1))
# [(1, 0, 1.0)]
sets = [["a", "a", "b"], ["a", "a"]]
list(all_pairs(sets, similarity_func_name="jaccard", similarity_threshold=0.1))
# [(1, 0, 1.5)]
I assume that this package doesn't support multisets, and that the outputs in such cases are undefined (setting the threshold to 0.75 in the second example leads to an empty result set, for instance), but if that's the case perhaps it would be a good idea to make this explicit in the documentation, and to mention that it's the user's responsibility to ensure that there are no duplicates in their input sets/lists.
In my case this simply means that I have to convert my lists to sets before passing them to all_pairs, but it did catch me off guard because that step wouldn't be necessary if I were applying MinHash LSH.
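A minimal sketch of that conversion step, assuming tokens are hashable. The helpers `dedupe_sets` and `jaccard` are hypothetical names used for illustration and are not part of SetSimilaritySearch; the example only shows what the similarity should be once duplicates are removed.

```python
def dedupe_sets(sets):
    """Return a new list in which every input collection has its duplicates removed."""
    # dict.fromkeys keeps first-seen order, so the result is deterministic.
    return [list(dict.fromkeys(tokens)) for tokens in sets]


def jaccard(a, b):
    """Exact Jaccard similarity of two collections, treating them as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


raw = [["a", "a", "b"], ["a", "a"]]   # the multiset input from the report above
clean = dedupe_sets(raw)
print(clean)                          # [['a', 'b'], ['a']]
print(jaccard(clean[0], clean[1]))    # 0.5
```

The cleaned lists can then be passed on in place of the raw input, so a similarity above 1.0 can no longer arise from repeated tokens.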
Currently
https://github.com/ekzhu/SetSimilaritySearch/blob/master/SetSimilaritySearch/search.py#L27
https://github.com/ekzhu/SetSimilaritySearch/blob/master/SetSimilaritySearch/all_pairs.py#L28
state:
if not isinstance(sets, list) or len(sets) == 0:
    raise ValueError("Input parameter sets must be a non-empty list.")
I propose to change this to:
if not isinstance(sets, Iterable) or len(sets) == 0:
    raise ValueError("Input parameter sets must be a non-empty iterable.")
This then allows tuple inputs, as well as ordered key-sets. It was helpful in my use case, rather than having to create a copy of the data in list form.
Setting up a PR from my fork, let me know what you think.
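One caveat with checking for `Iterable` is that an iterable is not guaranteed to support `len()`; a generator, for example, would pass the type check and then fail on the length check. The sketch below uses `collections.abc.Collection` instead, which implies both iteration and a length. The function name `validate_sets` is hypothetical and this is not the library's actual check.

```python
from collections.abc import Collection


def validate_sets(sets):
    """Accept any sized, iterable container (list, tuple, dict keys view, ...)."""
    # Collection implies both __iter__ and __len__, so len() is safe here.
    # str and bytes are also Collections, so they are rejected explicitly.
    if (isinstance(sets, (str, bytes))
            or not isinstance(sets, Collection)
            or len(sets) == 0):
        raise ValueError("Input parameter sets must be a non-empty collection.")
    return sets


validate_sets([{1, 2}, {2, 3}])          # list: accepted
validate_sets(({1, 2}, {2, 3}))          # tuple: accepted
validate_sets({"x": 1, "y": 2}.keys())   # dict keys view: accepted
```

A generator or a bare iterator raises `ValueError` under this check, which may or may not be the desired behaviour depending on whether the input needs to be traversed more than once.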
Hi - I noticed an unexpected ordering of x,y coordinates when converting the output of all_pairs to an identity matrix. Here's the behavior that I see with identical sets:
import numpy as np
from SetSimilaritySearch import all_pairs
import random
nsets = 10
population = list(range(100))
sets = [set(population) for i in range(nsets)]
coords = all_pairs(sets, similarity_threshold=0)
arr = np.nan * np.empty((nsets, nsets))
x, y, z = zip(*coords)
arr[x, y] = z
print(np.round(arr, 2))
The output is a nice lower-triangular matrix.
[[nan nan nan nan nan nan nan nan nan nan]
[ 1. nan nan nan nan nan nan nan nan nan]
[ 1. 1. nan nan nan nan nan nan nan nan]
[ 1. 1. 1. nan nan nan nan nan nan nan]
[ 1. 1. 1. 1. nan nan nan nan nan nan]
[ 1. 1. 1. 1. 1. nan nan nan nan nan]
[ 1. 1. 1. 1. 1. 1. nan nan nan nan]
[ 1. 1. 1. 1. 1. 1. 1. nan nan nan]
[ 1. 1. 1. 1. 1. 1. 1. 1. nan nan]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. nan]]
But when the sets are not identical, the x and y indices seem to be ordered arbitrarily:
sets = [set(population) - set(random.choices(population, k=10)) for i in range(nsets)]
coords = all_pairs(sets, similarity_threshold=0)
arr = np.nan * np.empty((nsets, nsets))
x, y, z = zip(*coords)
arr[x, y] = z
print(np.round(arr, 2))
[[ nan nan nan nan nan nan nan nan nan nan]
[0.83 nan 0.83 0.81 0.81 nan 0.81 nan 0.87 nan]
[0.84 nan nan nan nan nan nan nan nan nan]
[0.8 nan 0.8 nan nan nan nan nan nan nan]
[0.84 nan 0.8 0.82 nan nan nan nan nan nan]
[0.84 0.83 0.82 0.82 0.84 nan 0.84 0.83 0.82 nan]
[0.84 nan 0.84 0.82 0.8 nan nan nan nan nan]
[0.81 0.84 0.83 0.87 0.81 nan 0.87 nan 0.83 nan]
[0.82 nan 0.82 0.82 0.82 nan 0.82 nan nan nan]
[0.82 0.87 0.82 0.86 0.84 0.84 0.84 0.85 0.82 nan]]
I can restore the lower-triangular matrix by adding the following line before assigning to the array:
x, y = zip(*[sorted(pair, reverse=True) for pair in zip(x, y)])
So I can still accomplish what I want with minimal difficulty, but I thought I'd let you know because it seems like generating an identity matrix might be a common use case and the behavior is a bit surprising.
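The same normalization can be sketched without NumPy, assuming the results are `(index, index, similarity)` triples as in the output above. The function `to_lower_triangle` is a hypothetical helper and the similarity values are made up for illustration.

```python
def to_lower_triangle(triples, n):
    """Build an n x n nested list with each similarity stored at [row][col], row > col."""
    matrix = [[None] * n for _ in range(n)]
    for x, y, sim in triples:
        # Put the larger index first so the entry lands below the diagonal,
        # regardless of the order in which the pair was reported.
        row, col = max(x, y), min(x, y)
        matrix[row][col] = sim
    return matrix


# Toy triples; note that (1, 2, ...) is reported with the smaller index first.
triples = [(1, 0, 0.83), (1, 2, 0.81), (2, 0, 0.84)]
for row in to_lower_triangle(triples, 3):
    print(row)
```

Because Jaccard similarity is symmetric, swapping the two indices does not change the value being stored.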
Thanks a lot for sharing this project!
I noticed that not all possible pairs are returned when I tried the all_pairs function and that even with threshold = 0.0, the lowest values are still above 0. Is there a possibility to get all pairwise similarities?
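A likely explanation, not confirmed in this thread, is that an index-based search only surfaces pairs that share at least one token, so completely disjoint pairs (similarity 0) never become candidates. If every pairwise similarity is needed, a brute-force pass is one option. The sketch below is plain Python with hypothetical helper names and costs O(n^2) comparisons, so it only suits small collections.

```python
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity of two collections, treating them as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def all_pairwise_jaccard(sets):
    """Yield (i, j, similarity) for every i < j, including pairs with similarity 0."""
    for (i, a), (j, b) in combinations(enumerate(sets), 2):
        yield i, j, jaccard(a, b)


sets = [{1, 2, 3}, {3, 4}, {5, 6}]
for triple in all_pairwise_jaccard(sets):
    print(triple)
```

Here the third set shares no token with the other two, and its pairs are still reported with similarity 0.0.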
Hi,
I'm trying to use the all_pairs() function to find all the (near-)duplicates in a set of about 14000 text documents (after turning them into ngram shingles first). However, I'm running into MemoryErrors that crash my script, despite the virtual machine I work on having 16GB of RAM. I've checked, and indeed the entire RAM and swap get maxed out before the script stops working.
Do you have any advice on how to reduce RAM usage, or any indication of how much memory the algorithm uses? I don't run into any memory issues when I use LSH with datasketch, but I'd rather have an exact list of duplicates.
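For reference, the shingling step described above might look like the sketch below, which also maps each distinct shingle to a small integer so that every shingle string is stored only once. The function names, the character-level n-grams, and the default `n=5` are assumptions for illustration; whether this reduces memory enough to avoid the MemoryError is not established here.

```python
def char_shingles(text, n=5):
    """Return the set of character n-grams of text (the whole text if no longer than n)."""
    if len(text) <= n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def encode_documents(docs, n=5):
    """Map each document to a sorted list of integer shingle ids."""
    vocab = {}      # shingle string -> integer id, shared across all documents
    encoded = []
    for text in docs:
        # setdefault assigns the next free id the first time a shingle is seen.
        ids = {vocab.setdefault(s, len(vocab)) for s in char_shingles(text, n)}
        encoded.append(sorted(ids))
    return encoded, vocab


docs = ["abcdef", "bcdefg"]
encoded, vocab = encode_documents(docs)
print(len(vocab))                          # number of distinct shingles
print(set(encoded[0]) & set(encoded[1]))   # ids of shingles shared by both documents
```

The integer lists can stand in for the original shingle sets, since set similarity depends only on which tokens coincide, not on what the tokens are.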
Hi,
Interesting work. I'd like to evaluate it for my use case. However, I am not getting what you mean by "Search index cannot be updated." Are you saying that it cannot be updated at all (maybe blow up the database and do over?) or that it has no update method? What if I update the sets and then create the index again? That would be creating a brand new index. To me, that's a method of updating.
Please clarify.
Thanks so much,
Hello ekzhu, may I join your program?
I have two files where some tokens are disjoint between the files (regardless of the set they belong to). The following line filters any tokens from the query set that are not present in the index file:
This generates erroneous Jaccard calculations as the filtered size of the query set is being used.
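The effect can be illustrated on toy data, independent of the attached files. In this sketch the token names and the `jaccard` helper are made up; it only shows how dropping query tokens that are unknown to the index shrinks the union and inflates the score.

```python
def jaccard(a, b):
    """Exact Jaccard similarity of two collections, treating them as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


index_set = {"g1", "g2"}
query_set = {"g1", "g2", "q3", "q4"}   # q3 and q4 never occur in the index

vocabulary = set(index_set)            # tokens known to the index
filtered_query = query_set & vocabulary

print(jaccard(query_set, index_set))       # 0.5, using the full query set
print(jaccard(filtered_query, index_set))  # 1.0, using the filtered query set
```

Keeping the original query size for the union, and using the filtered tokens only for candidate lookup, would give the first value rather than the second.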
I have included some data to reproduce this:
jaccard.csv
query.txt
index.txt
| set_ID_x | set_ID_y | set_size_x | set_size_y | similarity |
|---|---|---|---|---|
| CP000438.1:1214715 | PAGI-9\|6\|6 | 5 | 1 | 0.5 |
Here is a reduced dataset:
index.txt
query.txt
2022-07-12 17:16:30,568: Reading set tuples from /home/nolan/Projects/curated_GIs/gene_order.txt (reversed=False)...
2022-07-12 17:16:30,591: 24737 tuples read.
2022-07-12 17:16:30,591: Creating sets...
2022-07-12 17:16:30,600: 379 sets created.
2022-07-12 17:16:30,601: Reading set tuples from /home/nolan/Projects/curated_GIs/wild_gis/pseudomonas/genes_blast_sets.tabular (reversed=False)...
2022-07-12 17:16:30,672: 67023 tuples read.
2022-07-12 17:16:30,672: Creating sets...
2022-07-12 17:16:30,682: 3010 sets created.
2022-07-12 17:16:30,687: Building search index on 379 sets.
2022-07-12 17:16:30,687: Building SearchIndex on 379 sets.
2022-07-12 17:16:30,687: Start frequency transform.
2022-07-12 17:16:30,687: Applying frequency order transform on tokens...
2022-07-12 17:16:30,700: Done applying frequency order.
2022-07-12 17:16:30,700: Finish frequency transform, 14048 tokens in total.
2022-07-12 17:16:30,700: Start indexing sets.
2022-07-12 17:16:30,705: Finished indexing sets.
2022-07-12 17:16:30,705: Finished building search index.
2022-07-12 17:16:30,705: Find pairs with similarity >= 0.5.
2022-07-12 17:16:30,705: 29 original tokens and 10 tokens after applying frequency order.
2022-07-12 17:16:30,705: 1 candidates found.
2022-07-12 17:16:30,705: 0 verified sets found.
2022-07-12 17:16:30,705: 9 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,705: 0 candidates found.
2022-07-12 17:16:30,705: 0 verified sets found.
2022-07-12 17:16:30,705: 29 original tokens and 11 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 25 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 53 original tokens and 18 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 31 original tokens and 11 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 5 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 3 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 24 original tokens and 13 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 44 original tokens and 14 tokens after applying frequency order.
2022-07-12 17:16:30,706: 1 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 13 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 19 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,706: 0 candidates found.
2022-07-12 17:16:30,706: 0 verified sets found.
2022-07-12 17:16:30,706: 69 original tokens and 22 tokens after applying frequency order.
2022-07-12 17:16:30,706: 1 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 11 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 34 original tokens and 11 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 55 original tokens and 18 tokens after applying frequency order.
2022-07-12 17:16:30,707: 1 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 9 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 13 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 1 original tokens and 0 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 55 original tokens and 20 tokens after applying frequency order.
2022-07-12 17:16:30,707: 1 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 31 original tokens and 8 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 15 original tokens and 8 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,707: 23 original tokens and 8 tokens after applying frequency order.
2022-07-12 17:16:30,707: 0 candidates found.
2022-07-12 17:16:30,707: 0 verified sets found.
2022-07-12 17:16:30,708: 5 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 3 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 5 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 17 original tokens and 8 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 5 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 17 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 5 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 24 original tokens and 8 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 9 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 11 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 21 original tokens and 13 tokens after applying frequency order.
2022-07-12 17:16:30,708: 1 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 11 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,708: 0 candidates found.
2022-07-12 17:16:30,708: 0 verified sets found.
2022-07-12 17:16:30,708: 32 original tokens and 12 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 23 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 11 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 40 original tokens and 20 tokens after applying frequency order.
2022-07-12 17:16:30,709: 3 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 25 original tokens and 10 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 19 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 17 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 44 original tokens and 13 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 9 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,709: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,709: 0 candidates found.
2022-07-12 17:16:30,709: 0 verified sets found.
2022-07-12 17:16:30,710: 23 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 50 original tokens and 17 tokens after applying frequency order.
2022-07-12 17:16:30,710: 1 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 11 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 9 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 7 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 9 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 35 original tokens and 17 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 1 original tokens and 0 tokens after applying frequency order.
2022-07-12 17:16:30,710: 0 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,710: 55 original tokens and 17 tokens after applying frequency order.
2022-07-12 17:16:30,710: 1 candidates found.
2022-07-12 17:16:30,710: 0 verified sets found.
2022-07-12 17:16:30,711: 17 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 15 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 30 original tokens and 14 tokens after applying frequency order.
2022-07-12 17:16:30,711: 1 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 23 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 11 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 39 original tokens and 15 tokens after applying frequency order.
2022-07-12 17:16:30,711: 1 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 18 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 13 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,711: 17 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,711: 0 candidates found.
2022-07-12 17:16:30,711: 0 verified sets found.
2022-07-12 17:16:30,712: 75 original tokens and 29 tokens after applying frequency order.
2022-07-12 17:16:30,712: 2 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 83 original tokens and 30 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 16 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 13 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 14 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 23 original tokens and 9 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 7 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 15 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,712: 0 candidates found.
2022-07-12 17:16:30,712: 0 verified sets found.
2022-07-12 17:16:30,712: 41 original tokens and 14 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 59 original tokens and 20 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 43 original tokens and 17 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 9 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 7 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 32 original tokens and 9 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 9 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,713: 0 candidates found.
2022-07-12 17:16:30,713: 0 verified sets found.
2022-07-12 17:16:30,713: 39 original tokens and 11 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 24 original tokens and 9 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 13 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 9 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 13 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 5 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 21 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 117 original tokens and 51 tokens after applying frequency order.
2022-07-12 17:16:30,714: 4 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 35 original tokens and 11 tokens after applying frequency order.
2022-07-12 17:16:30,714: 1 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,714: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,714: 0 candidates found.
2022-07-12 17:16:30,714: 0 verified sets found.
2022-07-12 17:16:30,715: 52 original tokens and 16 tokens after applying frequency order.
2022-07-12 17:16:30,715: 1 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 9 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 15 original tokens and 9 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 23 original tokens and 7 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 1 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 13 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 17 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 39 original tokens and 12 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 11 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 11 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,715: 5 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,715: 0 candidates found.
2022-07-12 17:16:30,715: 0 verified sets found.
2022-07-12 17:16:30,716: 79 original tokens and 26 tokens after applying frequency order.
2022-07-12 17:16:30,716: 1 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 15 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 3 original tokens and 1 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 11 original tokens and 4 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 9 original tokens and 3 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 29 original tokens and 9 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 3 original tokens and 2 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
2022-07-12 17:16:30,716: 17 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,716: 0 candidates found.
2022-07-12 17:16:30,716: 0 verified sets found.
...
2022-07-12 17:16:30,944: 17 original tokens and 6 tokens after applying frequency order.
2022-07-12 17:16:30,945: 0 candidates found.
2022-07-12 17:16:30,945: 0 verified sets found.
2022-07-12 17:16:30,945: 11 original tokens and 5 tokens after applying frequency order.
2022-07-12 17:16:30,945: 0 candidates found.
2022-07-12 17:16:30,945: 0 verified sets found.
2022-07-12 17:16:30,945: Found 21 pairs.
2022-07-12 17:16:30,945: Average query time: 7.890807433777869e-05.
2022-07-12 17:16:30,945: Median query time: 7.890807433777869e-05.
2022-07-12 17:16:30,945: 90pct query time: 0.00011157989501953125.