
Comments (26)

smooshy avatar smooshy commented on May 31, 2024 1

For now, I'm just setting it manually before I build the ActiveRecord query.

ActiveRecord::Base.connection.execute("SELECT set_limit(0.2);")

I imagine the easiest way to do it if you wanted to override the default would be to make a configuration option that would take effect globally.

from textacular.

evjan avatar evjan commented on May 31, 2024 1

I'm also having this issue, thanks @smooshy for the suggestion to run the set_limit query before the search.

One way might be to give fuzzy_search the limit like this: scope.fuzzy_search(query, 0.2). What do you think?

from textacular.

pawurb avatar pawurb commented on May 31, 2024 1

I came up with this solution for Ruby on Rails:

SIMILARITY_LIMIT = '0.1'

ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.set_callback :checkout, :after do
  raw_connection.exec("SELECT set_limit(#{SIMILARITY_LIMIT});")
end

from textacular.

bradrobertson avatar bradrobertson commented on May 31, 2024 1

I actually ended up wrapping it in a

ActiveSupport.on_load(:active_record) do; end

block and that worked. thanks!

from textacular.

benhamill avatar benhamill commented on May 31, 2024

Can you propose an API? In general, Textacular is meant to be super simple. For very tricky stuff, the more complex pg_search gem might be a better fit.

from textacular.

benhamill avatar benhamill commented on May 31, 2024

Right now, Textacular doesn't really have configuration, so... Where would you specify this?

from textacular.

smooshy avatar smooshy commented on May 31, 2024

I have a service object that makes the call to search so I just call it there. Works for my situation, but not very flexible.

from textacular.

smooshy avatar smooshy commented on May 31, 2024

If I were to add a configuration, I might consider application.rb:

config.textacular.pgtrgm_limit = 0.2

from textacular.

ecin avatar ecin commented on May 31, 2024

Something in config.textacular sounds nice. Perhaps #threshold and #threshold= methods?

Documentation for these methods should state that this affects the global state of the Postgres database, so that all trigram searches will use this new threshold limit.

from textacular.

benhamill avatar benhamill commented on May 31, 2024

Remember that Rails is not the only environment where we'd like Textacular to be useful. So, whatever solution we have should have a stand-alone interface. If there's sugar to be added when you're using Rails, then great. But we shouldn't forget about the other case.

from textacular.

Epigene avatar Epigene commented on May 31, 2024

A configuration option would be very useful. @ecin has already stated it well.

from textacular.

benhamill avatar benhamill commented on May 31, 2024

That suggestion would only work in Rails land. I'm interested in a solution that doesn't expand the dependencies of this gem. Is it worth introducing some kind of configuration object within Textacular?

Also, IIRC, this has to be called once per connection. Or am I mistaken and there's a way to set it globally? I guess we could iterate over all of ActiveRecord's connections to fix them all for the life of the process? Is that something AR exposes publicly?
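A stand-alone configuration object with no Rails dependency could look roughly like the sketch below. The module name `FuzzyConfigSketch`, its `configure` method, and `set_limit_sql` are all hypothetical names made up for illustration; none of this is Textacular's actual API. It only stores the threshold and builds the SQL, which would still have to be run once per connection.

```ruby
# Hypothetical stand-alone configuration object; not part of Textacular's API.
module FuzzyConfigSketch
  class Configuration
    # pg_trgm's default similarity limit is 0.3.
    DEFAULT_THRESHOLD = 0.3

    attr_reader :similarity_threshold

    def initialize
      @similarity_threshold = DEFAULT_THRESHOLD
    end

    # Coerce to Float and reject values outside 0..1 so that nothing
    # unexpected is ever interpolated into SQL.
    def similarity_threshold=(value)
      limit = Float(value)
      raise ArgumentError, "threshold must be between 0 and 1" unless (0.0..1.0).cover?(limit)

      @similarity_threshold = limit
    end
  end

  def self.configuration
    @configuration ||= Configuration.new
  end

  def self.configure
    yield configuration
  end

  # SQL that would have to be executed once on each new connection.
  def self.set_limit_sql
    "SELECT set_limit(#{configuration.similarity_threshold})"
  end
end

FuzzyConfigSketch.configure { |config| config.similarity_threshold = 0.2 }
puts FuzzyConfigSketch.set_limit_sql
```

In Rails the `configure` call could sit in an initializer; outside Rails it could be called anywhere before the first search.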

from textacular.

simi avatar simi commented on May 31, 2024

The config variable should live in Textacular.fuzzy_threshold (probably backed by https://github.com/steveklabnik/request_store or Thread.current). But I wonder whether we need to send this before every query. We could also send it as part of every query.

SELECT set_limit(0.1); 
SELECT "posts".*, COALESCE(similarity("posts"."name", 'marcela'), 0) AS "rank64611571751258688"
FROM "posts"
WHERE (("posts"."name" % 'marcela'))
ORDER BY "rank64611571751258688" DESC

vs

SELECT "posts".*, set_limit(0.1) AS textacular_treshold, COALESCE(similarity("posts"."name", 'marcela'), 0) AS "rank64611571751258688"
FROM "posts"
WHERE (("posts"."name" % 'marcela'))
ORDER BY "rank64611571751258688" DESC

We can change that on a per-query basis or fall back to the default one. But I'm not sure we need to send it with every query. Maybe it would be better to add an option to enable it first. But that looks too complicated for this gem :(

Any ideas?

from textacular.

benhamill avatar benhamill commented on May 31, 2024

I'm wary of the latter, especially given that it seems newer versions of AR treat SELECT clauses pretty naively. If we make a complex one all the time, will we be asking for more problems like the ones we currently experience with count?

It's a pity this is per-connection...

Does anyone know if AR lets the user supply a kind of initialization block that runs when a connection is established? If so, we could maybe solve this with a note in the documentation.

from textacular.

simi avatar simi commented on May 31, 2024

If I understand this correctly, it is not possible to set this property per request. That's the spirit. To keep this gem simple, there's no way to handle this. I can try to contribute some options to pg_search instead.

from textacular.

benhamill avatar benhamill commented on May 31, 2024

I was thinking about this on my drive home. If we're not going to execute two queries every time, which I think could have a negative performance impact, or make our SELECT statements all the more complex, we'll have to figure out how to set the fuzziness limit on all the connections in the connection pool.

So I started investigating in the AR docs and playing around in Pry.

I was hoping we could do something like this:

ActiveRecord::Base.connection_pool.connections.each do |connection|
  connection.execute("SELECT set_limit(#{new_limit})")
end

And if that were the case, we could just make a method on Textacular that you could call to set this. Call it in an initializer or something, no configuration required.

However:

[1] pry(main)> ActiveRecord::Base.connection_pool.connections.size
=> 1
[2] pry(main)> threads = [0...10].map { Thread.new { Character.count } }
=> [#<Thread:0x007f1ccb9d0a78@(pry):2 sleep>]
[3] pry(main)> D, [2015-03-30T20:03:58.336277 #1104] DEBUG -- :    (0.6ms)  SELECT COUNT(*) FROM "characters"
[3] pry(main)> 
[4] pry(main)> ActiveRecord::Base.connection_pool.connections.size
=> 2

What this shows is that AR doesn't create the connections in the pool until something needs them. I'll keep looking by spelunking the AR code to see if there's an inflection point when new connections are created that we might jump into, somehow.

from textacular.

benhamill avatar benhamill commented on May 31, 2024

Ugh. So, it looks like we could monkey patch ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::StatementPool#configure_connection or similar to send this call. I'm not super happy about that idea, but there are worse sins. What do folks think about this strategy?

from textacular.

simi avatar simi commented on May 31, 2024

I was thinking about this in the bath. This is not so easy, since set_limit can be changed anywhere and the connection pool can also be reloaded on the fly, for example by reap.

textacular can't guarantee that set_limit (from config) will be used unless it is set in every query. But if we set this per query, non-textacular queries can be affected, because set_limit will be changed often in some cases.

I think there's also some hook (I can't find it right now) to set up an AR connection. But still, set_limit can be overridden basically on "every line of code" by a custom query.

There's only one way to make this 100% deterministic:

  1. select the current limit (show_limit()) and save it
  2. set the limit to the desired value with set_limit
  3. run query
  4. set the limit back to the original value

^ 2-4 queries needed for this (maybe in callbacks)
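The four steps above could be sketched as a small helper. This is only an illustration under assumptions: `with_similarity_limit` is a hypothetical name, and the connection is assumed to respond to `select_value` and `execute` the way ActiveRecord connection adapters do. `show_limit()` is the pg_trgm function that reads the current limit. The `RecordingConnection` class is a stand-in that only records SQL so the sketch runs without a database.

```ruby
# Hypothetical helper: run a block with a temporary pg_trgm limit, then put
# the previous limit back.
def with_similarity_limit(connection, limit)
  limit = Float(limit) # never interpolate raw input into SQL
  # 1. save the current limit
  previous = connection.select_value("SELECT show_limit()").to_f
  # 2. set the desired limit
  connection.execute("SELECT set_limit(#{limit})")
  # 3. run the query
  yield
ensure
  # 4. restore the original limit, even if the query raised
  connection.execute("SELECT set_limit(#{previous})") if previous
end

# Stand-in connection that only records SQL; a real adapter would run it.
class RecordingConnection
  attr_reader :statements

  def initialize(current_limit)
    @current_limit = current_limit
    @statements = []
  end

  def select_value(sql)
    @statements << sql
    @current_limit
  end

  def execute(sql)
    @statements << sql
  end
end

connection = RecordingConnection.new(0.3)
with_similarity_limit(connection, 0.1) do
  connection.execute("SELECT 1 /* the fuzzy query would go here */")
end
puts connection.statements
```

The save and restore only help if all four statements go through the same connection, since the limit is per-connection.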

That's the reason why I think it is not easy to implement this into textacular.

Does it make sense?

from textacular.

Grantimus9 avatar Grantimus9 commented on May 31, 2024

I think evjan's solution is simple and acceptable for multiple frameworks besides Rails. scope.fuzzy_search(query, 0.2)

Depending on how fuzzy_search is implemented under the hood, the second param could be made optional. A notice in the documentation that this performs two executions, one to set_limit and one to perform the search, would be appropriate and probably not a dealbreaker for people looking to use this gem compared to an elasticsearch or pg_search alternative. I'd like to use this gem for basic fuzzy-search with indexing, for example.

If this is not backwards-compatible, please advise.

from textacular.

haggen avatar haggen commented on May 31, 2024

Why not follow the same pattern we use for searchable_language and define a class method in the model?

def self.similarity_threshold
    0.1
end

Then textacular itself executes set_limit() before each search.
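As a rough sketch of that lookup (hypothetical code, not how Textacular is implemented): check whether the model defines `similarity_threshold` and fall back to pg_trgm's default of 0.3 otherwise. The helper name `similarity_threshold_for` is made up, and plain classes stand in for ActiveRecord models.

```ruby
# pg_trgm's default similarity limit.
DEFAULT_SIMILARITY_THRESHOLD = 0.3

# Hypothetical lookup: prefer a model-level similarity_threshold class
# method, otherwise use the default.
def similarity_threshold_for(model)
  if model.respond_to?(:similarity_threshold)
    Float(model.similarity_threshold)
  else
    DEFAULT_SIMILARITY_THRESHOLD
  end
end

# Plain classes stand in for ActiveRecord models here.
class Post
  def self.similarity_threshold
    0.1
  end
end

class Comment; end

puts similarity_threshold_for(Post)
puts similarity_threshold_for(Comment)
```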

from textacular.

michaeldever avatar michaeldever commented on May 31, 2024

What about adding it as another parameter to fuzzy_search... Object.fuzzy_search(..., similarity_threshold: 0.1)

If similarity_threshold isn't set, use the default; if it is set, call set_limit(similarity_threshold) first?
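To make the backwards-compatibility point concrete, here is a minimal sketch of the optional keyword. `statements_for_fuzzy_search` is a hypothetical helper that only decides which SQL statements would be sent; it is not Textacular's implementation. Without the keyword nothing extra is sent, so existing callers would be unaffected.

```ruby
# Hypothetical helper: list the SQL statements a search would send.
# With no threshold given, only the search itself is sent.
def statements_for_fuzzy_search(search_sql, similarity_threshold: nil)
  return [search_sql] if similarity_threshold.nil?

  # Float() rejects anything that is not numeric before it reaches SQL.
  ["SELECT set_limit(#{Float(similarity_threshold)})", search_sql]
end

search = %q{SELECT "posts".* FROM "posts" WHERE ("posts"."name" % 'marcela')}
puts statements_for_fuzzy_search(search)
puts statements_for_fuzzy_search(search, similarity_threshold: 0.1)
```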

from textacular.

bradrobertson avatar bradrobertson commented on May 31, 2024

@pawurb where did you put this code? It seems to fail in an initializer

from textacular.

pawurb avatar pawurb commented on May 31, 2024

In an initializer. The problem is that it does not work on initial DB setup and on CI systems, because the PG extension is not yet loaded, so the set_limit function is missing. You need to handle that by setting some ENV variables.

Let me know if the problem persists and send me the bug trace and maybe I can help.

from textacular.

bradrobertson avatar bradrobertson commented on May 31, 2024

It's actually not even that the set_limit function doesn't exist; I get a NameError:

...
10: from /usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:614:in `block (2 levels) in <class:Engine>'
 9: from /usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:656:in `load_config_initializer'
 8: from /usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/notifications.rb:170:in `instrument'
 7: from /usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:657:in `block in load_config_initializer'
 6: from /usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:42:in `load'
 5: from /usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:42:in `load'
 4: from /Users/bradrobertson/Code/my-app/config/initializers/pg_trgm_limit.rb:7:in `<main>'
 3: from /usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/active_support.rb:42:in `load_missing_constant'
 2: from /usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/active_support.rb:53:in `rescue in load_missing_constant'
 1: from /usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/active_support.rb:8:in `without_bootsnap_cache'

/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.2.1/lib/bootsnap/load_path_cache/core_ext/active_support.rb:53:in `block in load_missing_constant': 
uninitialized constant ActiveRecord::ConnectionAdapters::PostgreSQLAdapter (NameError)

But I know it's not a typo because that code works in the console.

from textacular.

pawurb avatar pawurb commented on May 31, 2024

@bradrobertson looks like an issue with bootsnap not loading dependencies before the initializer file is executed. Maybe try to require active_record first? I've never used bootsnap, so I can't help with that.

from textacular.

simi avatar simi commented on May 31, 2024

@bradrobertson would you mind contributing this to README.md?

from textacular.
