mezis / fuzzily
Fast fuzzy string searching/matching for Rails
License: MIT License
allow_any_instance_of(User).to receive(:update_fuzzy_full_name!)
does not seem to work in what is a pretty typical spec configuration.
The project README suggests improving performance by changing the data type of owner_type and fuzzy_field from VARCHAR to ENUM. I researched this online and wasn't sure how to implement it. Do I create an enum with a few values, e.g. ENUM(1,2,3) or ENUM('one', 'two', 'three')? Sorry, I don't know the code well enough to know what I should change it to. Thank you!
Hello,
I always get an array of nil results after the fuzzy search.
Rails 4.0.1 + Ruby 2.0.0
Example:
results = Appellation.find_by_fuzzy_name('ote du roussillo', :limit => 5)
generates what looks like the right SQL:
Trigram Load (64.1ms) SELECT owner_id, owner_type, count(*) AS matches, MAX(score) AS score FROM trigrams WHERE trigrams.owner_type = 'Appellation' AND trigrams.fuzzy_field = 'name' AND trigrams.trigram IN ('**o', '*ot', 'ote', 'te*', 'e*d', '*du', 'du*', 'u*r', '*ro', 'rou', 'ous', 'uss', 'ssi', 'sil', 'ill', 'llo', 'lo*') GROUP BY owner_id, owner_type ORDER BY matches DESC, score ASC LIMIT 5 OFFSET 0
Appellation Load (1.5ms) SELECT appellations.* FROM appellations WHERE appellations.id IN (212, 213, 217, 214, 216)
but:
logger.info results
gives me:
[nil, nil, nil, nil, nil]
I ran the SQL in PMA, and the results are good.
I don't understand why I get an array of nils. Is this a bug?
Thanks
Alex
I've bundled the gem, ran the trigram migration, and my model looks like this:
class Color < ActiveRecord::Base
  include Fuzzily::Model
  fuzzily_searchable :name
end
When I try to create a new record I get the following stack trace:
NoMethodError: undefined method `trigram' for #<Color:0x007f8dce577700>
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activemodel-3.2.14/lib/active_model/attribute_methods.rb:407:in `method_missing'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activerecord-3.2.14/lib/active_record/attribute_methods.rb:149:in `method_missing'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activemodel-3.2.14/lib/active_model/validator.rb:151:in `block in validate'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activemodel-3.2.14/lib/active_model/validator.rb:150:in `each'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activemodel-3.2.14/lib/active_model/validator.rb:150:in `validate'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activesupport-3.2.14/lib/active_support/callbacks.rb:310:in `_callback_before_177'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activesupport-3.2.14/lib/active_support/callbacks.rb:418:in `_run__1728579470845015111__validate__2761590767353581502__callbacks'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activesupport-3.2.14/lib/active_support/callbacks.rb:405:in `__run_callback'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activesupport-3.2.14/lib/active_support/callbacks.rb:385:in `_run_validate_callbacks'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activesupport-3.2.14/lib/active_support/callbacks.rb:81:in `run_callbacks'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activemodel-3.2.14/lib/active_model/validations.rb:228:in `run_validations!'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activemodel-3.2.14/lib/active_model/validations/callbacks.rb:53:in `block in run_validations!'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activesupport-3.2.14/lib/active_support/callbacks.rb:403:in `_run__1728579470845015111__validation__2761590767353581502__callbacks'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activesupport-3.2.14/lib/active_support/callbacks.rb:405:in `__run_callback'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activesupport-3.2.14/lib/active_support/callbacks.rb:385:in `_run_validation_callbacks'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activesupport-3.2.14/lib/active_support/callbacks.rb:81:in `run_callbacks'
... 3 levels...
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activerecord-3.2.14/lib/active_record/validations.rb:77:in `perform_validations'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activerecord-3.2.14/lib/active_record/validations.rb:56:in `save!'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activerecord-3.2.14/lib/active_record/attribute_methods/dirty.rb:33:in `save!'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activerecord-3.2.14/lib/active_record/transactions.rb:264:in `block in save!'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activerecord-3.2.14/lib/active_record/transactions.rb:313:in `block in with_transaction_returning_status'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activerecord-3.2.14/lib/active_record/connection_adapters/abstract/database_statements.rb:192:in `transaction'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activerecord-3.2.14/lib/active_record/transactions.rb:208:in `transaction'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activerecord-3.2.14/lib/active_record/transactions.rb:311:in `with_transaction_returning_status'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activerecord-3.2.14/lib/active_record/transactions.rb:264:in `save!'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/activerecord-3.2.14/lib/active_record/validations.rb:41:in `create!'
from (irb):1
from ~/.rvm/gems/ruby-1.9.3-p194/gems/railties-3.2.14/lib/rails/commands/console.rb:47:in `start'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/railties-3.2.14/lib/rails/commands/console.rb:8:in `start'
from ~/.rvm/gems/ruby-1.9.3-p194/gems/railties-3.2.14/lib/rails/commands.rb:41:in `<top (required)>'
from script/rails:6:in `require'
Everything seemed to work as planned until I actually ran:
Business.find_by_fuzzy_name('Bay')
...and received:
Trigram Load (1.6ms) SELECT owner_id, owner_type, count(*) AS matches, score FROM "trigrams" WHERE "trigrams"."owner_type" = 'Business' AND "trigrams"."fuzzy_field" = 'name' AND "trigrams"."trigram" IN ('**b', '*ba', 'bay', 'ay*') GROUP BY "trigrams"."owner_id" ORDER BY matches DESC, score ASC LIMIT 10
ActiveRecord::StatementInvalid: PG::Error: ERROR: column "trigrams.owner_type" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT owner_id, owner_type, count(*) AS matches, score FRO...
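PostgreSQL requires every selected column to be either grouped or aggregated, and the queries logged in other reports on this page group by owner_id, owner_type and select MAX(score). As a rough illustration of what such a grouped query computes, here is a minimal in-memory sketch in plain Ruby. TrigramRow and rank_owners are hypothetical names used only for this example; they are not part of fuzzily.

```ruby
# Hypothetical stand-in for one row of the trigrams table.
TrigramRow = Struct.new(:owner_id, :owner_type, :trigram, :score)

# Mimics GROUP BY owner_id, owner_type with count(*) AS matches and
# MAX(score) AS score, ordered by matches DESC, score ASC.
def rank_owners(rows, wanted_trigrams, limit: 10)
  rows
    .select { |row| wanted_trigrams.include?(row.trigram) }
    .group_by { |row| [row.owner_id, row.owner_type] }
    .map do |(owner_id, owner_type), group|
      { owner_id: owner_id, owner_type: owner_type,
        matches: group.size, score: group.map(&:score).max }
    end
    .sort_by { |owner| [-owner[:matches], owner[:score]] }
    .first(limit)
end

rows = [
  TrigramRow.new(1, "Business", "**b", 4),
  TrigramRow.new(1, "Business", "*ba", 4),
  TrigramRow.new(2, "Business", "**b", 9),
]
ranked = rank_owners(rows, ["**b", "*ba", "bay", "ay*"])
puts ranked.map { |owner| owner[:owner_id] }.inspect
```

Because score is aggregated and both owner columns are grouped, no bare column is left over for PostgreSQL to reject.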
I'm trying to wire this up in a Rails 5 API-only application. I've got the trigram model and migration set up correctly, but when I add fuzzily_searchable :name to my AR model (yes, the attribute is actually called :name) I get an undefined method fuzzily_searchable error.
In the readme it is explained like this:
class AddTrigramsModel < ActiveRecord::Migration
  extend Fuzzily::Migration
  trigrams_table_name = :custom_trigrams
end
As far as I could see, the trigrams_table_name should be an instance var:
class AddTrigramsModel < ActiveRecord::Migration
  extend Fuzzily::Migration
  @trigrams_table_name = :custom_trigrams
end
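The difference comes down to how Ruby parses a bare assignment in a class body: name = value creates a local variable and never reaches a setter or an instance variable. The sketch below demonstrates this in plain Ruby. TableNameSketch is a hypothetical stand-in written for this example, not Fuzzily::Migration itself, so the real module may behave differently.

```ruby
# Hypothetical helper module: a class-level setting with a default.
module TableNameSketch
  attr_writer :trigrams_table_name

  def trigrams_table_name
    @trigrams_table_name || :trigrams
  end
end

class LocalAssignment
  extend TableNameSketch
  # Parsed as a local variable; the setting is never touched.
  trigrams_table_name = :custom_trigrams
end

class InstanceVariableAssignment
  extend TableNameSketch
  # Sets the class-level instance variable directly.
  @trigrams_table_name = :custom_trigrams
end

class SetterAssignment
  extend TableNameSketch
  # An explicit receiver forces a call to the setter method.
  self.trigrams_table_name = :custom_trigrams
end

puts LocalAssignment.trigrams_table_name            # trigrams
puts InstanceVariableAssignment.trigrams_table_name # custom_trigrams
puts SetterAssignment.trigrams_table_name           # custom_trigrams
```

If the module exposes a setter, the self. form is usually the safer choice because it does not depend on the name of the underlying instance variable.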
It is worth noting that the migration is incompatible with setups using UUIDs for the owner id.
Copying the migration code directly into the migration and changing t.integer :owner_id to t.uuid :owner_id fixes this.
I have classes like this:
class Person < ActiveRecord::Base
  fuzzily_searchable :name
end
class Employee < Person; end
class Freelancer < Person; end
The following returns 0 records:
Freelancer.find_by_fuzzy_name 'john'
This is because the polymorphic trigrams.owner_type is queried as 'Freelancer', not 'Person'.
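One way to picture a fix is to resolve a subclass to the root of its inheritance chain before building the owner_type condition. Active Record exposes base_class for this purpose; the plain-Ruby sketch below imitates the idea with a hypothetical RecordBase root class and a hypothetical owner_type_for helper, so it runs without Rails.

```ruby
# Hypothetical stand-in for ActiveRecord::Base.
class RecordBase; end

class Person < RecordBase; end
class Employee < Person; end
class Freelancer < Person; end

# Walks up the hierarchy until reaching the direct child of RecordBase,
# which is the class name an STI setup would store as owner_type.
def owner_type_for(klass)
  raise ArgumentError, "#{klass} is not a model" unless klass < RecordBase

  klass = klass.superclass until klass.superclass == RecordBase
  klass.name
end

puts owner_type_for(Freelancer) # Person
puts owner_type_for(Person)     # Person
```

Querying trigrams with the base name alone would return every Person, so the subclass filter still has to be applied when the matching records are loaded.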
Hello,
I get poor fuzzy results when I use a model with internationalized fields.
Model:
class Category < ActiveRecord::Base
  fuzzily_searchable :name_fr, :name_en, :name_de, :name_es, :name_it
end
Fill the trigrams:
Category.bulk_update_fuzzy_name_en
Category.bulk_update_fuzzy_name_fr
Category.bulk_update_fuzzy_name_de
Category.bulk_update_fuzzy_name_es
Category.bulk_update_fuzzy_name_it
Fuzzily search:
category = Category.find_by_fuzzy_name('rouge', :limit => 1).first
=> gives me "rose" as result
What I need to do:
category = Category.find_by_fuzzy_name_fr('rouge', :limit => 1).first
=> gives me "rouge" as result
So I need to specify the language in the search method to get the right result.
The problem is that I want the search to work without knowing the language.
Is this possible?
I have a really long article with a "body" field that is fuzzily searchable. At some point it starts taking a long time for fuzzily to re-index / update the search trigrams for the article when I click "update" on my web form and it does the article.save! in my rails application.
Eventually it crashes with this error!
Trigram Exists (0.7ms) SELECT 1 AS one FROM "trigrams" WHERE "trigrams"."trigram" = $1 AND "trigrams"."owner_type" = $2 AND "trigrams"."owner_id" = $3 AND "trigrams"."fuzzy_field" = $4 LIMIT $5 [["trigram", "**<"], ["owner_type", "Article"], ["owner_id", 1094], ["fuzzy_field", "body"], ["LIMIT", 1]]
App 11471 output: (0.5ms) ROLLBACK
App 11471 output: Completed 500 Internal Server Error in 517ms (ActiveRecord: 84.7ms)
App 11471 output:
App 11471 output: ActiveModel::RangeError (35693 is out of range for ActiveModel::Type::Integer with limit 2 bytes):
This is bad. I believe it's because the trigram "score" is really high and because of this line:
t.integer "score", limit: 2
Specifically I think it's because it's an article with a lot of html symbols and it seems to be searching for a trigram on "**<" which is a little odd, but ok.
That line comes from the trigrams migration, so I guess I need this limit to be higher. I removed the 2-byte limit with a migration, date_update_trigrams_score.rb:
class UpdateTrigramsScore < ActiveRecord::Migration[5.2]
  # allow higher fuzzily scores for long searchable article bodies...
  def change
    change_column :trigrams, :score, :integer
  end
end
and my problem is resolved. So, maybe 2 bytes is a little too low for normal usage on the trigrams score?
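The arithmetic behind the RangeError is easy to check: a signed 2-byte integer tops out at 32767, below the 35693 in the log. A small plain-Ruby sketch follows; signed_integer_range is a hypothetical helper written for this note.

```ruby
# Range of values representable by a signed integer of the given width.
def signed_integer_range(bytes)
  half = 2**(bytes * 8 - 1)
  (-half..half - 1)
end

score = 35_693 # value reported in the RangeError above

puts signed_integer_range(2).max           # 32767
puts signed_integer_range(2).cover?(score) # false
puts signed_integer_range(4).cover?(score) # true
```

A 4-byte column, which is what a plain :integer typically maps to, leaves ample headroom for scores of this size.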
I've been checking out the fuzzily gem and it helps greatly. It would be great if the returned suggestions came with a rank. I know that the best suggestion is the first result. If there were a way to give points to each suggestion (say 0 => exact match, 0.2 => deviates to some extent, 0.9 => deviates to a great extent), it would be really great.
Say I'm doing Product.find_by_fuzzy_name("Call of D"), is there any way I can get the scores as well?
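Until something like this is exposed, a score on that 0-to-1 scale can be approximated on the application side. The sketch below uses the Jaccard distance between trigram sets, which is 0.0 for identical strings and 1.0 when nothing is shared. This is a swapped-in technique, not how fuzzily ranks results, and sketch_trigrams and trigram_distance are hypothetical helpers whose normalization only imitates the trigrams visible in the SQL logs on this page.

```ruby
# Hypothetical normalizer: lowercase, turn anything but a-z into '*',
# then pad with '**' in front and '*' behind before slicing.
def sketch_trigrams(text)
  body = text.downcase.gsub(/[^a-z]+/, "*").gsub(/\A\*+|\*+\z/, "")
  padded = "**#{body}*"
  (0..padded.length - 3).map { |index| padded[index, 3] }.uniq
end

# Jaccard distance: 1 - |shared| / |combined|.
def trigram_distance(query, candidate)
  left = sketch_trigrams(query)
  right = sketch_trigrams(candidate)
  1.0 - (left & right).size.to_f / (left | right).size
end

query = "Call of D"
["Call of D", "Call of Duty", "Halo"].each do |name|
  puts format("%-14s %.2f", name, trigram_distance(query, name))
end
```

Applied to the records returned by find_by_fuzzy_name, this gives each suggestion a comparable number without changing the search itself.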
find_by_fuzzy can be applied to a class but not to an ActiveRecord::Relation (or a scope). A typical case when this would be very useful is when you want to make an exact match on one attribute followed by a fuzzy match on another attribute, like:
Person.where("country=?", theCountry).find_by_fuzzy_name(theName)
I'm hitting this each time I try to run rake db:migrate. Any thoughts as to why?
I am trying out fuzzily v0.3.0 on a small (~360) set of records, in Rails 3.0.x.
When I search for a string, many of the results in the default set of 10 are nil. Is this expected? It makes pagination rather difficult.
Also the matches are extremely fuzzy, i.e. results often don't contain all the letters of the search string. For example aceven returns Ace Ventura as expected, but also Agentur Nina Klein, which is not what I want. Am I doing something wrong or does fuzzily not require all the letters in the search string to be present in the results, in the same order?
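The queries logged elsewhere on this page filter with trigram IN (...), so any record sharing at least one trigram with the search string qualifies, and the match count only affects ordering. The plain-Ruby sketch below counts the overlap for the two names above. sketch_trigrams is a hypothetical helper that imitates the trigram shape seen in those logs; the gem's actual normalization may differ.

```ruby
# Hypothetical normalizer: lowercase, turn anything but a-z into '*',
# then pad with '**' in front and '*' behind before slicing.
def sketch_trigrams(text)
  body = text.downcase.gsub(/[^a-z]+/, "*").gsub(/\A\*+|\*+\z/, "")
  padded = "**#{body}*"
  (0..padded.length - 3).map { |index| padded[index, 3] }.uniq
end

def shared_trigrams(query, candidate)
  sketch_trigrams(query) & sketch_trigrams(candidate)
end

["Ace Ventura", "Agentur Nina Klein"].each do |name|
  shared = shared_trigrams("aceven", name)
  puts "#{name}: #{shared.size} shared #{shared.inspect}"
end
```

Under these assumptions Ace Ventura shares four trigrams with aceven and Agentur Nina Klein shares only the leading '**a', which is enough to be returned but ranks it lower. Dropping results below a minimum share of matching trigrams would be one way to tighten the search.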
Thanks in advance.
Is it possible to use redis as the trigram store instead of in the database? I'm trying to find ways to cut down the 200-300ms response time I'm getting right now for 1000 rows of data for fuzzy searching. I'm assuming this would change the build task and am not proficient enough to do it in the code. Thanks!
Is it possible to get distinct values from the database using fuzzily?
SearchCategory.distinct(:name).find_by_fuzzy_name(term, limit: limit)
gives
SELECT DISTINCT businesses.* FROM businesses WHERE businesses.deleted_at IS NULL AND businesses.id IN (1, 2, 3, 10, 9)
and I want
SELECT DISTINCT businesses.name FROM businesses WHERE businesses.deleted_at IS NULL AND businesses.id IN (1, 2, 3, 10, 9)
Please help.
When creating a new or editing an existing object whose class model has a fuzzy_searchable attribute I get a mass assignment violation error. This happens when I do an "update_attributes" with a hash, even if the hash does not contain any of the attributes (score, trigram, owner_type). If I change the "update_attributes", the error appears at save. I use Rails 3.2.12 with Ruby 1.9.3
Oh, and I tried adding attr_accessible for the three attributes after the call to fuzzily_searchable in my model, but it didn't change anything.
When I use fuzzily with Rails 4, there is a problem with the Model.find_by_fuzzy_field method. I indexed two models, User and Car. When I call Car.find_by_fuzzy_name("toyota"), User records are returned as well. The order of the results is also off: the car whose name is exactly "toyota" is not the first record in the search results.
The scoped method is deprecated in Rails 4, so there are a bunch of warnings as well.
Any plans to make it Rails 4 compatible? Thanks.
Running into an issue when running bulk updates on existing data.
Whenever a trigram create is attempted on a field with a nil value, I receive the following bomb:
** Execute environment
** Execute fuzzily:bulk_update
Running Router#bulk_update_fuzzy_name
Running Router#bulk_update_fuzzy_dns_name
rake aborted!
undefined method `force_encoding' for nil:Fuzzily::String
/Users/nickbender/.rvm/gems/ruby-1.9.3-p448/gems/activesupport-3.2.13/lib/active_support/multibyte/chars.rb:45:in `initialize'
../ruby-1.9.3-p448/gems/fuzzily-0.2.3/lib/fuzzily/trigram.rb:22:in `new'
../ruby-1.9.3-p448/gems/fuzzily-0.2.3/lib/fuzzily/trigram.rb:22:in `normalize'
../ruby-1.9.3-p448/gems/fuzzily-0.2.3/lib/fuzzily/trigram.rb:7:in `trigrams'
../ruby-1.9.3-p448/gems/fuzzily-0.2.3/lib/fuzzily/trigram.rb:13:in `scored_trigrams'
../ruby-1.9.3-p448/gems/fuzzily-0.2.3/lib/fuzzily/searchable.rb:57:in `block (3 levels) in make_field_fuzzily_searchable'
../ruby-1.9.3-p448/gems/fuzzily-0.2.3/lib/fuzzily/searchable.rb:55:in `each'
../ruby-1.9.3-p448/gems/fuzzily-0.2.3/lib/fuzzily/searchable.rb:55:in `block (2 levels) in make_field_fuzzily_searchable'
../ruby-1.9.3-p448/gems/activerecord-3.2.13/lib/active_record/relation/batches.rb:72:in `find_in_batches'
../ruby-1.9.3-p448/gems/fuzzily-0.2.3/lib/fuzzily/searchable.rb:53:in `block in make_field_fuzzily_searchable'
../services/lib/tasks/fuzzy.rake:21:in `block (4 levels) in <top (required)>'
../services/lib/tasks/fuzzy.rake:19:in `each'
../services/lib/tasks/fuzzy.rake:19:in `block (3 levels) in <top (required)>'
../services/lib/tasks/fuzzy.rake:13:in `each'
../services/lib/tasks/fuzzy.rake:13:in `block (2 levels) in <top (required)>'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/task.rb:236:in `call'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/task.rb:236:in `block in execute'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/task.rb:231:in `each'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/task.rb:231:in `execute'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/task.rb:175:in `block in invoke_with_call_chain'
../ruby-1.9.3-p448/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/task.rb:168:in `invoke_with_call_chain'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/task.rb:161:in `invoke'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/application.rb:149:in `invoke_task'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/application.rb:106:in `block (2 levels) in top_level'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/application.rb:106:in `each'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/application.rb:106:in `block in top_level'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/application.rb:115:in `run_with_threads'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/application.rb:100:in `top_level'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/application.rb:78:in `block in run'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/application.rb:165:in `standard_exception_handling'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/lib/rake/application.rb:75:in `run'
../ruby-1.9.3-p448@global/gems/rake-10.1.0/bin/rake:33:in `<top (required)>'
../ruby-1.9.3-p448@global/bin/rake:23:in `load'
../ruby-1.9.3-p448@global/bin/rake:23:in `<main>'
../ruby-1.9.3-p448/bin/ruby_noexec_wrapper:14:in `eval'
../ruby-1.9.3-p448/bin/ruby_noexec_wrapper:14:in `<main>'
Do all indexed fields have to have a value? If so, is there a way to override or skip this behavior?
The README says:
your searchable fields do not have to be stored, they can be dynamic methods too
I have a class like this:
class Employee < ActiveRecord::Base
  fuzzily_searchable :name
  def name
    "#{first_name} #{last_name}"
  end
end
where :first_name and :last_name are columns in my employees table.
Fuzzily happily indexes the names and searches them. However, any update to a record throws an error because fuzzily's after_save callback executes:
record.send("#{field}_changed?".to_sym)
and the name_changed? method doesn't exist (and isn't created by Active Record).
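A possible workaround is to define name_changed? by hand in terms of the columns the dynamic field is built from. The sketch below shows the idea in plain Ruby. EmployeeSketch is a hypothetical stand-in for the model: it fakes first_name_changed? and last_name_changed?, which Active Record's dirty tracking would normally supply for real columns.

```ruby
# Hypothetical plain-Ruby stand-in for the Employee model.
class EmployeeSketch
  attr_reader :first_name, :last_name

  def initialize(first_name, last_name)
    @first_name = first_name
    @last_name = last_name
    @changed = []
  end

  def first_name=(value)
    @changed << :first_name unless value == @first_name
    @first_name = value
  end

  def last_name=(value)
    @changed << :last_name unless value == @last_name
    @last_name = value
  end

  def first_name_changed?
    @changed.include?(:first_name)
  end

  def last_name_changed?
    @changed.include?(:last_name)
  end

  # The dynamic, unstored field that gets indexed.
  def name
    "#{first_name} #{last_name}"
  end

  # The predicate the callback asks for, derived from the real columns.
  def name_changed?
    first_name_changed? || last_name_changed?
  end
end

employee = EmployeeSketch.new("Ada", "Lovelace")
puts employee.name_changed? # false
employee.last_name = "King"
puts employee.name_changed? # true
```

Whether this is sufficient in a real model depends on when the callback runs relative to Active Record clearing its change information, so treat it as a starting point.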
We have been using this gem in applications and found it to be pretty nifty.
Would you be open to sharing the roadmap for upgrades and support for this gem?
Are you looking for new maintainers?
Do let me know
Thanks
I'm using the fuzzily gem and getting this error:
ActiveRecord::StatementInvalid: PG::UndefinedColumn: ERROR: column "owner_id" does not exist
LINE 1: SELECT owner_id, owner_type, count(*) AS matches, MAX(score...
^
: SELECT owner_id, owner_type, count(*) AS matches, MAX(score) AS score FROM "trigrams" WHERE "trigrams"."owner_type" = $1 AND "trigrams"."fuzzy_field" = $2 AND "trigrams"."trigram" IN ($3, $4, $5) GROUP BY owner_id, owner_type ORDER BY matches DESC, score ASC LIMIT $6 OFFSET $7
from /home/amit/.rvm/rubies/ruby-2.6.3/lib/ruby/gems/2.6.0/gems/activerecord-5.2.3/lib/active_record/connection_adapters/postgresql_adapter.rb:611:in `async_exec_params'
Caused by PG::UndefinedColumn: ERROR: column "owner_id" does not exist
LINE 1: SELECT owner_id, owner_type, count(*) AS matches, MAX(score...
Hi, thanks for such a useful gem.
It seems like search doesn't work for numbers?
The rest seems great.