asari's Introduction

Asari

Note on 2013-01-01 Amazon API support

We are actively working on a 1.0 version with support for the latest API along with complete Active Asari support. It should be solid enough to use in production via:

gem "asari", :git => "git@github.com:wellbredgrapefruit/asari.git", :branch => "1.0"

Description

Asari is a Ruby wrapper for AWS CloudSearch, with optional ActiveRecord support for easy integration with your Rails apps.

Why Asari?

"Asari" is Japanese for "rummaging search." Seemed appropriate.

Usage

Your Search Domain

Amazon CloudSearch gives you a Search Endpoint and a Document Endpoint. When specifying your search domain in Asari, omit the leading "search-" from the search endpoint's hostname. For example, if your search endpoint is "search-beavis-er432w3er.us-east-1.cloudsearch.amazonaws.com", the search domain you use in Asari is "beavis-er432w3er". Your region is the second dot-separated item in the hostname. In this example it is "us-east-1".
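As a rough illustration of the rule above, the search domain and region can be pulled out of an endpoint hostname with a few lines of plain Ruby. `parse_search_endpoint` is a hypothetical helper written for this example; it is not part of Asari.

```ruby
# Hypothetical helper (not part of Asari): split a CloudSearch search
# endpoint into the search domain and region described above.
def parse_search_endpoint(endpoint)
  host = endpoint.sub(%r{\Ahttps?://}, "")   # tolerate an optional scheme
  labels = host.split(".")
  {
    domain: labels[0].sub(/\Asearch-/, ""),  # drop the leading "search-"
    region: labels[1]                        # second dot-separated item
  }
end

parts = parse_search_endpoint("search-beavis-er432w3er.us-east-1.cloudsearch.amazonaws.com")
puts parts[:domain]
puts parts[:region]
```

The two values are what you would pass to Asari.new and to the aws_region setting, respectively.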

Basic Usage

asari = Asari.new("my-search-domain-asdfkljwe4") # CloudSearch search domain
asari.add_item("1", { :name => "Tommy Morgan", :email => "[email protected]"})
asari.search("tommy") #=> ["1"] - a list of document IDs
asari.search("tommy", :rank => "name") # Sort the search
asari.search("tommy", :rank => ["name", :desc]) # Sort the search descending
asari.search("tommy", :rank => "-name") # Another way to sort the search descending

Boolean Query Usage

asari.search(filter: { and: { title: "donut", type: "cruller" }})
asari.search("boston creme", filter: { and: { title: "donut", or: { type: "cruller|twist" }}}) # Full text search and nested boolean logic

Boolean Queries can also be provided as strings that will be directly used in the query request. This allows for more control over the query string as well as multiple uses of the same field name (where Ruby hashes don't allow for non-unique keys.)

asari.search(filter: "(or type:'donut' type:'bagel')")
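To make the relationship between the hash form and the string form concrete, here is a minimal sketch of how a nested filter hash could be flattened into a boolean query string. `to_boolean_query` is a made-up name and a simplified illustration, not Asari's actual serializer, so the exact string Asari sends may differ.

```ruby
# Illustrative only: a simplified serializer, not Asari's implementation.
# Nested :and / :or / :not hashes become parenthesized groups; ranges and
# integers are emitted bare, everything else is single-quoted.
def to_boolean_query(terms)
  terms.map do |key, value|
    if %i[and or not].include?(key) && value.is_a?(Hash)
      "(#{key} #{to_boolean_query(value)})"
    elsif value.is_a?(Range) || value.is_a?(Integer)
      "#{key}:#{value}"
    else
      "#{key}:'#{value}'"
    end
  end.join(" ")
end

flat   = to_boolean_query({ and: { title: "donut", type: "cruller" } })
nested = to_boolean_query({ and: { title: "donut", or: { type: "cruller|twist" } } })
puts flat
puts nested
```

Because each hash key can appear only once, a sketch like this can never emit the same field twice inside one group, which is the case the string form covers.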

For more information on how to use Cloudsearch boolean queries, see the documentation.

Geospatial Query Usage

While CloudSearch does not natively support location search, you can implement rudimentary location search by representing latitude and longitude as integers in your search domain. Asari has a Geography module you can use to simplify the conversion of latitude and longitude to cartesian coordinates, as well as the generation of a coordinate box to search within. Asari's Boolean Query syntax can then be used to search within the area. Note that because CloudSearch only supports 32-bit unsigned integers, latitude and longitude can only be stored to two decimal places. This means very precise search isn't possible using Asari and CloudSearch.

coordinates = Asari::Geography.degrees_to_int(lat: 45.52, lng: 122.68)
  #=> { lat: 2506271416, lng: 111298648 }
asari.add_item("1", { name: "Tommy Morgan", lat: coordinates[:lat], lng: coordinates[:lng] })
  #=> nil
coordinate_box = Asari::Geography.coordinate_box(lat: 45.52, lng: 122.68, meters: 7500)
  #=> { lat: 2505521415..2507021417, lng: 111263231..111334065 }
asari.search("tommy", filter: { and: coordinate_box })
  #=> ["1"] - a list of document IDs

For more information on how to use Cloudsearch for location search, see the documentation.
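The 32-bit limit mentioned above can be checked with a little arithmetic. The sketch below is not Asari's source: the meters-per-degree constant and the +180 shift are assumptions, chosen because together they reproduce the latitude value shown in the example above. Longitude is left out because its scaling depends on latitude.

```ruby
# Illustrative sketch, not Asari's source. The constant is an assumption:
# roughly the number of meters in one degree of latitude.
METERS_PER_DEGREE_OF_LATITUDE = 111_133
MAX_UINT32 = 2**32 - 1

# Shift latitude by 180 so the value is never negative, convert to meters,
# then keep the requested number of decimal places by scaling to an integer.
def latitude_to_int(degrees, decimal_places = 2)
  ((degrees + 180) * METERS_PER_DEGREE_OF_LATITUDE * 10**decimal_places).round
end

example_lat    = latitude_to_int(45.52)
north_pole_2dp = latitude_to_int(90.0, 2)
north_pole_3dp = latitude_to_int(90.0, 3)

puts example_lat
puts north_pole_2dp <= MAX_UINT32   # two decimal places still fit
puts north_pole_3dp <= MAX_UINT32   # a third decimal place overflows
```

Under these assumptions, the largest possible latitude fits in an unsigned 32-bit field at two decimal places but not at three, which is where the precision limit comes from.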

Sandbox Mode

Because there is no "local" version of CloudSearch, and search instances can be kind of expensive, you shouldn't have to have a development version of your index set up in order to use Asari. Because of that, Asari has a "sandbox" mode where it does nothing with add/update/delete requests and just returns an empty collection for any searches. This sandbox mode is enabled by default - any time you want to actually connect to the search index, just do the following:

Asari.mode = :production

You can turn the sandbox back on, if you like, by setting the mode to :sandbox again.
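The behaviour described above can be pictured with a toy class. `ToySearchClient` is invented for this example and keeps documents in memory; it only mirrors the idea of a class-level mode switch and is not how Asari itself is implemented.

```ruby
# Toy illustration of the sandbox pattern; this is not Asari's code and
# stores documents in memory instead of talking to CloudSearch.
class ToySearchClient
  class << self
    attr_writer :mode

    def mode
      @mode ||= :sandbox   # sandbox is the default
    end
  end

  def initialize
    @documents = {}
  end

  def add_item(id, fields)
    return nil if self.class.mode == :sandbox   # silently do nothing
    @documents[id] = fields
    nil
  end

  def search(term)
    return [] if self.class.mode == :sandbox    # always an empty result
    matches = @documents.select do |_id, fields|
      fields.values.any? { |v| v.to_s.downcase.include?(term.downcase) }
    end
    matches.keys
  end
end

client = ToySearchClient.new
client.add_item("1", { :name => "Tommy Morgan" })
sandbox_hits = client.search("tommy")

ToySearchClient.mode = :production
client.add_item("1", { :name => "Tommy Morgan" })
production_hits = client.search("tommy")
```

In sandbox mode both calls are no-ops, so code that indexes and searches can run in development and tests without a live search domain.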

Pagination

Asari defaults to a page size of 10 (because that's CloudSearch's default), but it allows you to specify pagination parameters with any search:

asari.search("tommy", :page_size => 30, :page => 10)

The results you get back from Asari#search aren't actually Array objects, either: they're Asari::Collection objects, which are (currently) API-compatible with will_paginate:

results = asari.search("tommy", :page_size => 30, :page => 10)
results.total_entries #=> 5000
results.total_pages   #=> 167
results.current_page  #=> 10
results.offset        #=> 300
results.page_size     #=> 30
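The total_pages figure above is a ceiling division of the hit count by the page size. A minimal sketch of that arithmetic follows; `total_pages` here is a standalone illustration, not the method on Asari::Collection.

```ruby
# Standalone illustration of the page-count arithmetic.
def total_pages(total_entries, page_size)
  (total_entries + page_size - 1) / page_size   # integer ceiling division
end

pages = total_pages(5000, 30)
puts pages
```

With 5000 hits and 30 per page, 166 pages are full and one more holds the remaining 20 hits.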

Retrieving Data From Index Fields

By default Asari only returns the document IDs for any hits returned from a search. If you have result-enabled an index field, you can have Asari return that field in the result set without having to hit a database to get the results. Simply pass the :return_fields option with an array of fields:

results = asari.search "Beavis", :return_fields => ["name", "address"]

The result will look like this

{"23" => {"name" => "Beavis", "address" => "One CNN Center,  Atlanta"},
"54" => {"name" => "Beavis C", "address" => "Cornholio Way, USA"}}
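A result of this shape can be consumed like any hash keyed by document ID. The sketch below uses a plain Hash as a stand-in for the returned object, which is an assumption made for illustration.

```ruby
# A plain Hash standing in for the search result shown above.
results = {
  "23" => { "name" => "Beavis",   "address" => "One CNN Center,  Atlanta" },
  "54" => { "name" => "Beavis C", "address" => "Cornholio Way, USA" }
}

# Document IDs are the keys; the requested fields are the values.
ids = results.keys
names_by_id = results.each_with_object({}) do |(id, fields), memo|
  memo[id] = fields["name"]
end

names_by_id.each { |id, name| puts "#{id}: #{name}" }
```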

ActiveRecord

By default the ActiveRecord module for Asari is not included in your project. To use it you will need to require it via

require 'asari/active_record'

You can take advantage of that module like so:

class User < ActiveRecord::Base
  include Asari::ActiveRecord

  #... other stuff...

  asari_index("search-domain-for-users", [:name, :email, :twitter_handle, :favorite_sweater])
end

This will automatically set up before_destroy, after_create, and after_update hooks for your AR model to keep the data in sync with your CloudSearch index - the second argument to asari_index is the list of fields to maintain in the index, and can represent any function on your AR object. You can then interact with your AR objects as follows:

# Klass.asari_find returns a list of model objects in an
# Asari::Collection...
User.asari_find("tommy") #=> [<User:...>, <User:...>, <User:...>]
User.asari_find("tommy", :rank => "name")

# or with a specific instance, if you need to manually do some index
# management...
@user.asari_add_to_index
@user.asari_update_in_index
@user.asari_remove_from_index

You can also specify a :when option, like so:

asari_index("search-domain-for-users", [:name, :email, :twitter_handle,
:favorite_sweater], :when => :indexable)

or

asari_index("search-domain-for-users", [:name, :email, :twitter_handle,
:favorite_sweater], :when => Proc.new { |user| !user.admin && user.indexable })

This provides a way to mark records that shouldn't be in the index. The :when option can be either a symbol - indicating a method on the object - or a Proc that accepts the object as its first parameter. If the method/Proc returns true when the object is created, the object is indexed - otherwise it is left out of the index. If the method/Proc returns true when the object is updated, the object is indexed - otherwise it is deleted from the index (if it has already been added). This lets you be sure that you never have inappropriate data in your search index.
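The create/update rules above boil down to a small decision table. The following sketch spells it out in plain Ruby; `SketchUser`, `should_index?` and `index_action` are hypothetical names for this example and do not claim to match Asari's internals.

```ruby
# Hypothetical stand-in for an ActiveRecord model.
SketchUser = Struct.new(:admin, :indexable)

# Evaluate a :when condition given as a Symbol (method name) or a Proc.
def should_index?(record, condition)
  return true if condition.nil?
  result = condition.is_a?(Proc) ? condition.call(record) : record.send(condition)
  result ? true : false
end

# Decide what happens to the index entry for a lifecycle event.
def index_action(record, condition, event)
  if should_index?(record, condition)
    event == :create ? :add_to_index : :update_in_index
  else
    event == :create ? :skip : :remove_from_index
  end
end

not_admin = Proc.new { |user| !user.admin && user.indexable }
regular   = SketchUser.new(false, true)
admin     = SketchUser.new(true, true)

puts index_action(regular, :indexable, :create)
puts index_action(admin, not_admin, :update)
```

The asymmetry is deliberate: a record that fails the condition on create is simply never added, while one that fails it on update is removed in case it was indexed earlier.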

Because index updates are done as part of the AR lifecycle by default, you also might want to have control over how Asari handles index update errors - it's kind of problematic, if, say, users can't sign up on your site because CloudSearch isn't available at the moment. By default Asari just raises these exceptions when they occur, but you can define a special handler if you want using the asari_on_error method:

class User < ActiveRecord::Base
  include Asari::ActiveRecord

  asari_index(... )

  def self.asari_on_error(exception)
    Airbrake.notify(...)
    true
  end
end

In the above example we decide that, instead of raising exceptions every time, we're going to log exception data to Airbrake so that we can review it later and then return true so that the AR lifecycle continues normally.

AWS Region

By default, Asari assumes that you're operating in us-east-1, which is probably not a helpful assumption for some of you. To fix this, either set the aws_region property on your raw Asari object:

a = Asari.new("my-search-domain")
a.aws_region = "us-west-1"

...Or provide the :aws_region option when you call asari_index on an ActiveRecord model:

class User < ActiveRecord::Base
  include Asari::ActiveRecord

  asari_index("my-search-domain", [:field1, :field2], :aws_region => "us-west-1")

  # ...
end

Get it

It's a gem named asari. Install it and make it available however you prefer.

Asari is developed on Ruby 1.9.3, and the ActiveRecord portion has been tested with Rails 3.2. I don't know off-hand of any reasons that it shouldn't work in other environments, but be aware that it hasn't (yet) been tested there.

Contributions

If Asari interests you and you think you might want to contribute, hit me up on Github. You can also just fork it and make some changes, but there's a better chance that your work won't be duplicated or rendered obsolete if you check in on the current development status first.

Gem requirements/etc. should be handled by Bundler.

License

Copyright (C) 2012 by Tommy Morgan

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

asari's People

Contributors

amoslanka, anolson, cvincent, emilsoman, geoff-parsons, itstommymorgan, kaiuhl, lgleasain, madcowley, mmeys, roychri


asari's Issues

ActiveRecord indexed objects are removed when updating them if they don't match the asari_should_index? method

Hi,

I was wondering why an ActiveRecord object indexed using the asari_index method is removed from cloud search after an update if it doesn't match the asari_should_index? method?

The behavior I want is to update the indexed object on cloud search only when some attributes have changed. I index the users of my app with their name and their avatar, so I don't want to update them on cloud search if these attributes don't change... and I don't want them to be deleted from cloud search either.

That's why I suggest removing the self.asari_remove_item(obj) line from the asari_update_item method.

def asari_update_item(obj)
  if self.asari_when
    unless asari_should_index?(obj)
      self.asari_remove_item(obj) # I suggest to remove this line
      return
    end
  end
  data = {}
  self.asari_fields.each do |field|
    data[field] = obj.send(field)
  end
  self.asari_instance.update_item(obj.send(:id), data)
rescue Asari::DocumentUpdateException => e
  self.asari_on_error(e)
end

direction of future architecture

Hi all,

I came upon Asari when looking for the best Rails gem for the best search solution given my architecture. Given the trend towards cloud compute, AWS's dominance, the amount of work they're putting into their platform, and Asari's position as the only gem compatible with the newest API, I think this gem is well-positioned for a serious amount of interest over the next few years.

Compared to the most popular Rails search gems out there, Asari is missing a lot of functionality. The big name here has always been Sunspot, an adapter for Solr. Some things Sunspot can do that Asari can't:

  • All query options - Faceting, Native Geo, Stats, and More Like This. Some of these things have been added in Asari's forks, but not to the same level of usability as Sunspot
  • A block-oriented search function. This allows for better error checking, the ability to nest arbitrary ruby logic within the search block, and greater readability especially with nested logic
  • Various thread-safe, retrying, instrumented, logged, and asynchronous (with sunspot_index_queue) sessions
  • Config generators and indexing rake tasks
  • ActiveRecord objects attached to results retrieved through a single DB request
  • A local server version for development and testing - obviously Sunspot accomplishes this with a managed local Solr install, but I think it's worth considering integrating sunspot-solr as this feature is HUGE for saving development time, simplifying tests, and improving effective coverage. The AWS SDKs provide good stubbing ability though, if we chose to go down that route instead.
  • Schema configuration provided by a class method on Searchable classes - a schema file works for many people too, I'm sure, but some may find an in-object config function a more logical arrangement. It provides more flexibility for index-related config functions (especially if adapted to a simpler interface than having to know what's going on in CloudSearch), and can be adapted at runtime via metaprogramming

Ransack, which works with native DB search functions, has seen a lot of nice additions recently too - it has some nice form helpers (with translations) we may want to include.

So, establishing that there's a lot of functionality left to build and that hopefully a big user base will grow around this use case, how should we build this?

In my mind, the easiest solution would be to get Sunspot working directly with CloudSearch. That way we could get all this functionality without any of the coding or maintenance.

The question has been asked a few places - will AWS ever expose the native Solr API? If so, Sunspot will work out of the box. Something tells me this will never happen - CloudSearch provides a large amount of abstraction on top of Solr, i.e. the support for array fields and the simplified option set. I have a support request in to their team to see if they'll speak to current plans. I'll bring any replies to this thread.

Another option would be creating a new gem, say sunspot_cloudsearch, that monkey-patches Sunspot to work with CloudSearch. Most of the actual client functionality comes from RSolr which could be re-implemented using existing Asari code. (That said, I think any future implementation should use the AWS SDK V2 client instead - it supports every single operation we could want and takes care of all the HTTP stuff.) Once that's done, some patches to the Sunspot core could get much or all of the above working.

A final option would be copying the current code for most of this from Sunspot and building it into this repo. So far, this project has kept it super simple, which is great. Including all of the above makes things significantly more complex. Depending on how it's accomplished (and if the maintainers are interested in taking this next step), perhaps this should all be done in a separate gem.

After a month of taking a stab at extending Asari myself, I need to get back to tactical work. But I'd like to wrap up what I've done so far in a way that demonstrates where to go next. So I wanted to ask the question:

How do the maintainers of Asari envision it growing in the future? What's the architecture?

Sorry for the long explanation, I'm hoping to get schooled by someone who's spent a lot of time researching this problem.

Searching literals isn't supported

Searching a literal field requires a different kind of search term (bq=literal-name:'literal-value'), but q= is currently hardcoded, so literals can't currently be searched (and maybe can't be modified either).

Option to index asynchronously?

Is there an option to index to cloudsearch asynchronously using resque? This would be really useful for asari_index in models when there are communication problems between cloudsearch and the server. Also, if there are bulk updates, it won't slow down things.

Handle ActiveSupport::TimeWithZone in convert_date_or_time

I'm having some trouble submitting a pull request since my fork is based off of @cbilgili's right now, but I wanted to bring this up anyhow. Right now the convert_date_or_time method in lib/asari.rb doesn't work for subclasses of Time, etc. One very common example of this is the Rails class ActiveSupport::TimeWithZone. I didn't realize this in testing and ended up with my docs oddly sorted because their created_at fields got indexed with odd values like "24" instead of the result of calling to_i.

I propose that line get changed like this:

return obj unless [Time, Date, DateTime].any? {|klass| obj.is_a? klass }

Thoughts?
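For anyone following along, the difference between an exact class comparison and the is_a? check proposed above can be seen without Rails. `ZonedTimeSketch` is a hypothetical Time subclass used as a stand-in, and the exact-class check is only assumed to be what misses such values today.

```ruby
require "date"

# Hypothetical stand-in for a Time subclass; not ActiveSupport::TimeWithZone itself.
class ZonedTimeSketch < Time; end

value = ZonedTimeSketch.at(0)

# An exact-class check (assumed to be the cause of the problem) misses it:
exact_match = [Time, Date, DateTime].include?(value.class)

# The proposed is_a? check also accepts subclasses:
kind_match = [Time, Date, DateTime].any? { |klass| value.is_a?(klass) }

puts exact_match
puts kind_match
```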

License missing from gemspec

RubyGems.org doesn't report a license for your gem. This is because it is not specified in the gemspec of your last release.

via e.g.

spec.license = 'MIT'
# or
spec.licenses = ['MIT', 'GPL-2']

Including a license in your gemspec is an easy way for rubygems.org and other tools to check how your gem is licensed. As you can imagine, scanning your repository for a LICENSE file or parsing the README, and then attempting to identify the license or licenses is much more difficult and more error prone. So, even for projects that already specify a license, including a license in your gemspec is a good practice. See, for example, how rubygems.org uses the gemspec to display the rails gem license.

There is even a License Finder gem to help companies/individuals ensure all gems they use meet their licensing needs. This tool depends on license information being available in the gemspec. This is an important enough issue that even Bundler now generates gems with a default 'MIT' license.

I hope you'll consider specifying a license in your gemspec. If not, please just close the issue with a nice message. In either case, I'll follow up. Thanks for your time!

Appendix:

If you need help choosing a license (sorry, I haven't checked your readme or looked for a license file), GitHub has created a license picker tool. Code without a license specified defaults to 'All rights reserved'-- denying others all rights to use of the code.
Here's a list of the license names I've found and their frequencies

p.s. In case you're wondering how I found you and why I made this issue, it's because I'm collecting stats on gems (I was originally looking for download data) and decided to collect license metadata, too, and make issues for gemspecs not specifying a license as a public service :). See the previous link or my blog post about this project for more information.

How to specify q.parser type?

It appears that q.parser is set to structured for all queries somehow, but I'm not seeing that anywhere in the codebase here. Is there a way to override this through asari?

Thanks,
Sean

authentication?

Hey, I was just wondering how you go about authentication via IAMs with cloudsearch, is this possible?

Little doubt

Wouldn't it be better if asari only updated documents when the indexed fields changed, instead of the whole object? I can do a pull request with that if you want.

Search Spec failing

I'm getting the following failure when running the specs. Can anyone else confirm they're passing? If so I'll dig in and figure out what's wrong on my side, but I suspect it's just a minor thing related to cleanup of geo

.................................................................F

Failures:

  1) Asari geography searching builds a proper query string
     Failure/Error: HTTParty.should_receive(:get).with("http://search-testdomain.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=&bq=%28and+lat%3A2505771415..2506771417+lng%3A111275735..111322958%29&size=10")
       Double received :get with unexpected arguments
         expected: ("http://search-testdomain.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=&bq=%28and+lat%3A2505771415..2506771417+lng%3A111275735..111322958%29&size=10")
              got: ("http://search-testdomain.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=&bq=%28and+lat%3A2505771415..2506771417+lng%3A2358260777..2359261578%29&size=10")
     # ./spec/search_spec.rb:149:in `block (3 levels) in <top (required)>'

Finished in 0.04652 seconds
66 examples, 1 failure

Failed examples:

rspec ./spec/search_spec.rb:148 # Asari geography searching builds a proper query string

'or' syntax clarification

From the readme, this is how an 'or' query is suggested:

asari.search("boston creme", filter: { and: { title: "donut", or: { type: "cruller", type: "twist" }}})

Notice the { type: "cruller", type: "twist" }

Is that really how an 'or' query is intended to be done in Asari? It won't work, the resulting hash will be {:type=>"twist"} and the query will only return those of type 'twist'
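The collapse is plain Ruby hash behaviour and easy to reproduce. The sketch below also shows one possible workaround, building the string form of the filter that the README says is passed through as-is. It is an illustration only and does not call Asari.

```ruby
# Building the hash from pairs avoids a duplicate-key literal warning;
# the later value for :type still replaces the earlier one.
collapsed = [[:type, "cruller"], [:type, "twist"]].to_h

# One way around it: build the string form of the filter directly.
types = %w[cruller twist]
or_filter = "(or " + types.map { |t| "type:'#{t}'" }.join(" ") + ")"

puts collapsed.inspect
puts or_filter
```

The resulting string has the same shape as the README's "(or type:'donut' type:'bagel')" example, so it could be combined with the other conditions by hand.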

EOFError

Hi guys, have you ever stumbled on this error?
Asari::DocumentUpdateException: EOFError: end of file reached

Thanks!
