Comments (26)
For now, I'm just setting it manually before I build the ActiveRecord query.
ActiveRecord::Base.connection.execute("SELECT set_limit(0.2);")
I imagine the easiest way to do it if you wanted to override the default would be to make a configuration option that would take effect globally.
from textacular.
I'm also having this issue, thanks @smooshy for the suggestion to run the set_limit query before the search.
One way might be to give fuzzy_search the limit like this: scope.fuzzy_search(query, 0.2). What do you think?
from textacular.
I came up with this solution for Ruby on Rails:
SIMILARITY_LIMIT = '0.1'
ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.set_callback :checkout, :after do
  # set_limit is a function, so it has to be invoked through SELECT
  raw_connection.exec("SELECT set_limit(#{SIMILARITY_LIMIT});")
end
from textacular.
I actually ended up wrapping it in an ActiveSupport.on_load(:active_record) do; end block and that worked. Thanks!
from textacular.
Can you propose an API? In general, Textacular is meant to be super simple. For very tricky stuff, the more complex pg_search gem might be a better fit.
from textacular.
Right now, Textacular doesn't really have configuration, so... Where would you specify this?
from textacular.
I have a service object that makes the call to search so I just call it there. Works for my situation, but not very flexible.
from textacular.
If I were to add a configuration, I might consider application.rb:
config.textacular.pgtrgm_limit = 0.2
from textacular.
Something in config.textacular sounds nice. Perhaps #threshold and #threshold= methods?
Documentation for these methods should state that this affects the global state of the Postgres database, so that all trigram searches will use this new threshold limit.
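A minimal sketch of what such a pair of methods could look like as a stand-alone object, with no Rails dependency. The module name FuzzyConfig and the range check are assumptions for illustration only; nothing here is part of Textacular. The default of 0.3 is pg_trgm's own default limit.

```ruby
# Hypothetical stand-alone configuration holder; not part of Textacular's API.
module FuzzyConfig
  DEFAULT_THRESHOLD = 0.3 # pg_trgm's default similarity limit

  class << self
    # Returns the configured threshold, or the pg_trgm default if none is set.
    def threshold
      @threshold || DEFAULT_THRESHOLD
    end

    # Accepts anything Float() can convert, as long as it lies within 0..1.
    def threshold=(value)
      value = Float(value)
      unless (0.0..1.0).cover?(value)
        raise ArgumentError, "threshold must be between 0 and 1"
      end
      @threshold = value
    end
  end
end

FuzzyConfig.threshold = 0.2
```

This only stores the value. Something would still have to send SELECT set_limit(...) on each connection that runs a search.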
from textacular.
Remember that Rails is not the only environment where we'd like Textacular to be useful. So, whatever solution we have should have a stand-alone interface. If there's sugar to be added when you're using Rails, then great. But we shouldn't forget about the other case.
from textacular.
A configuration option would be very useful. @ecin has already stated it well.
from textacular.
That suggestion would only work in Rails land. I'm interested in a solution that doesn't expand the dependencies of this gem. Is it worth introducing some kind of configuration object within Textacular?
Also, IIRC, this has to be called once per connection. Or am I mistaken and there's a way to set it globally? I guess we could iterate over all of ActiveRecord's connections to fix them all for the life of the process? Is that something AR exposes publicly?
from textacular.
The config var should live in Textacular.fuzzy_threshold (probably backed by https://github.com/steveklabnik/request_store or Thread.current). But I wonder whether we need to send this before every query. We could also send it as part of every query.
SELECT set_limit(0.1);
SELECT "posts".*, COALESCE(similarity("posts"."name", 'marcela'), 0) AS "rank64611571751258688"
FROM "posts"
WHERE (("posts"."name" % 'marcela'))
ORDER BY "rank64611571751258688" DESC
vs
SELECT "posts".*, set_limit(0.1) AS textacular_treshold, COALESCE(similarity("posts"."name", 'marcela'), 0) AS "rank64611571751258688"
FROM "posts"
WHERE (("posts"."name" % 'marcela'))
ORDER BY "rank64611571751258688" DESC
We could change it on a per-query basis or fall back to the default. But I'm not sure we need to send it with every query. Maybe it would be better to make this an option that has to be enabled first. But that looks too complicated for this gem :(
Any ideas?
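For the Thread.current idea, a rough sketch of a per-thread override that falls back to a default. The module name FuzzyThreshold, its key, and the with helper are hypothetical and only illustrate the storage side; they do not send anything to Postgres.

```ruby
# Hypothetical module; the names are illustrative, not Textacular's API.
module FuzzyThreshold
  KEY = :fuzzy_threshold_override
  DEFAULT = 0.3 # pg_trgm's default similarity limit

  # Threshold visible to the current thread, falling back to the default.
  # Note that Thread.current[] is fiber-local storage in Ruby.
  def self.current
    Thread.current[KEY] || DEFAULT
  end

  # Overrides the threshold while the block runs, then restores the old value.
  def self.with(value)
    previous = Thread.current[KEY]
    Thread.current[KEY] = value
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

FuzzyThreshold.with(0.1) do
  FuzzyThreshold.current # the override, visible only in this thread
end
```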
from textacular.
I'm wary of the latter, especially given that it seems newer versions of AR treat SELECT clauses pretty naively. If we make a complex one all the time, will we be asking for more problems like the ones we currently experience with count?
It's a pity this is per-connection...
Does anyone know if AR affords for the user to give a kind of initialization block that runs at the start of establishing a connection? If so, we could maybe solve this with a note in the documentation.
from textacular.
If I understand this correctly, it is not possible to set this property per request. That's the spirit. If we want to keep this gem simple, there's no way to handle this. I can try to contribute some options to pg_search instead.
from textacular.
I was thinking about this on my drive home. If we're not going to execute two queries every time, which I think could have a negative performance impact, or make our SELECT statements all the more complex, we'll have to figure out how to set the fuzziness limit on all the connections in the connection pool.
So I started investigating in the AR docs and playing around in Pry.
I was hoping we could do something like this:
ActiveRecord::Base.connection_pool.connections.each do |connection|
  connection.execute("SELECT set_limit(#{new_limit})")
end
And if that were the case, we could just make a method on Textacular that you could call to set this. Call it in an initializer or something, no configuration required.
However:
[1] pry(main)> ActiveRecord::Base.connection_pool.connections.size
=> 1
[2] pry(main)> threads = [0...10].map { Thread.new { Character.count } }
=> [#<Thread:0x007f1ccb9d0a78@(pry):2 sleep>]
[3] pry(main)> D, [2015-03-30T20:03:58.336277 #1104] DEBUG -- : (0.6ms) SELECT COUNT(*) FROM "characters"
[3] pry(main)>
[4] pry(main)> ActiveRecord::Base.connection_pool.connections.size
=> 2
What this shows is that AR doesn't create the connections in the pool until something needs them. I'll keep looking by spelunking the AR code to see if there's an inflection point when new connections are created that we might jump into, somehow.
from textacular.
Ugh. So, it looks like we could monkey patch ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::StatementPool#configure_connection or similar to send this call. I'm not super happy about that idea, but there are worse sins. What do folks think about this strategy?
from textacular.
I was thinking about this in the bath. This is not so easy, since set_limit can be changed anywhere and the connection pool can also be reloaded on the fly, for example by reap.
textacular can't guarantee that the set_limit value from the config will be used unless it is set in every query. But if we set it per query, non-textacular queries can be affected, because set_limit will be changed often in some cases.
I think there's also some hook (I can't find it right now) to set up an AR connection. But set_limit can still be overridden on basically "every line of code" by a custom query.
There's only one way to make this 100% deterministic:
- select the current set_limit and save it
- set set_limit to the desired value
- run the query
- set set_limit back to the original value
^ 2-4 queries are needed for this (maybe in callbacks)
That's why I think it is not easy to implement this in textacular.
Does it make sense?
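The four steps above can be sketched roughly as follows. This assumes a connection object that responds to select_value and execute, as ActiveRecord adapters do, and uses pg_trgm's show_limit() and set_limit() functions. The helper name with_similarity_limit and the RecordingConnection stand-in are hypothetical; the stand-in only records statements so the sketch runs without a database.

```ruby
# Stand-in connection: records each statement instead of sending it to Postgres.
class RecordingConnection
  attr_reader :statements

  def initialize(current_limit)
    @current_limit = current_limit
    @statements = []
  end

  def select_value(sql)
    @statements << sql
    @current_limit
  end

  def execute(sql)
    @statements << sql
  end
end

# Hypothetical helper: saves the current limit, sets the desired one,
# runs the block, then restores the original limit even if the block raises.
def with_similarity_limit(connection, limit)
  original = connection.select_value("SELECT show_limit()")
  connection.execute("SELECT set_limit(#{Float(limit)})")
  yield
ensure
  connection.execute("SELECT set_limit(#{Float(original)})") if original
end

connection = RecordingConnection.new(0.3)
with_similarity_limit(connection, 0.1) do
  connection.execute("SELECT 1") # the fuzzy search would run here
end
```

That is three extra statements around every search, which is the cost being weighed in this thread.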
from textacular.
I think evjan's solution, scope.fuzzy_search(query, 0.2), is simple and acceptable for multiple frameworks besides Rails.
Depending on how fuzzy_search is implemented under the hood, the second param could be made optional. A notice in the documentation that this performs two executions (set_limit first, then the search) would be appropriate and probably not a dealbreaker for people looking to use this gem compared to an elasticsearch or pg_search alternative. I'd like to use this gem for basic fuzzy-search with indexing, for example.
If this is not backwards-compatible, please advise.
from textacular.
Why not follow the same pattern we use for searchable_language and define a class method in the model?
def self.similarity_threshold
  0.1
end
Then textacular itself executes set_limit() before each search.
from textacular.
What about adding it as another parameter to fuzzy_search... Object.fuzzy_search(..., similarity_threshold: 0.1)
If similarity_threshold isn't set, use the default, but if it is, then call set_limit(similarity_threshold)?
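A rough sketch of how an optional keyword could decide whether an extra statement is sent. The helper fuzzy_search_statements is hypothetical and says nothing about how fuzzy_search is actually implemented; it only returns the SQL statements that would be run, in order.

```ruby
# Hypothetical helper: returns the statements to run for one search.
# Without a threshold only the search is sent; with one, set_limit goes first.
def fuzzy_search_statements(search_sql, similarity_threshold: nil)
  return [search_sql] if similarity_threshold.nil?

  ["SELECT set_limit(#{Float(similarity_threshold)})", search_sql]
end

fuzzy_search_statements("SELECT 1")
fuzzy_search_statements("SELECT 1", similarity_threshold: 0.1)
```

Since set_limit applies to the connection rather than to a single query, a threshold set by one call would remain in effect for later searches on the same connection that omit the keyword.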
from textacular.
@pawurb where did you put this code? It seems to fail in an initializer
from textacular.
In an initializer. The problem is that it does not work on initial DB setup and on CI systems, because the PG extension is not yet loaded and so the set_limit function is missing. You need to handle that by setting some ENV variables.
Let me know if the problem persists and send me the bug trace and maybe I can help.
from textacular.
It's actually not even that the set_limit function doesn't exist; I get a NameError:
...
10: from /usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:614:in `block (2 levels) in <class:Engine>'
9: from /usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:656:in `load_config_initializer'
8: from /usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/notifications.rb:170:in `instrument'
7: from /usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:657:in `block in load_config_initializer'
6: from /usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:42:in `load'
5: from /usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:42:in `load'
4: from /Users/bradrobertson/Code/my-app/config/initializers/pg_trgm_limit.rb:7:in `<main>'
3: from /usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/active_support.rb:42:in `load_missing_constant'
2: from /usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/active_support.rb:53:in `rescue in load_missing_constant'
1: from /usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/active_support.rb:8:in `without_bootsnap_cache'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/active_support.rb:53:in `block in load_missing_constant':
uninitialized constant ActiveRecord::ConnectionAdapters::PostgreSQLAdapter (NameError)
But I know it's not a typo because that code works in the console.
from textacular.
@bradrobertson looks like an issue with bootsnap not loading dependencies before the initializer file is executed. Maybe try to require active_record first? I've never used bootsnap, so I can't help with that.
from textacular.
@bradrobertson would you mind contributing this to README.md?
from textacular.