Giter VIP home page Giter VIP logo

google-site-search's Introduction

Gem: google-site-search

Description

This gem was created to aid in the querying and parsing of the Google Site Search api.

In the simplest use case it will query your google site search for a term and supply you with an object containing the results. However I’ve built this gem with the intention that you will want to explicitly handle what and how the specific results are stored.

Installation

Add the following to your projects Gemfile.

gem 'google-site-search', :git => "[email protected]:dvallance/google-site-search.git"

Require the code if necessary (note: some frameworks like rails are set to auto-require gems for you by default)

require 'google-site-search'

Usage

The simpliest way to use the gem is by providing just a search query term and your search engine unique id code (e.g. looks like this 00255077836266642015:u-scht7a-8i and is located in your google site search control panel)

# just assign the query to an object
search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i")

# object has search attributes like
puts search.next_results_url
puts search.previous_results_url
puts search.xml
puts search.spelling
puts search.spelling_url

# object has an array of each specific result that contains title, description and its link by default
search.results.each do |result|
  puts result.title
  puts result.description
  puts result.link
end

The query method expects a valid url so if you wanted to supply your own you can! However I have created a builder class to help with proper url creation and to help do some of the work for you.

Multiple Search

Since google only allows a max of 20 returned results I have added a method that will capture up to n number of results.

# the array will be up to 5 search objects if the query actually has that many results.
# has soon as a search doesn't have a next_results_url the method stops.
array_of_search_results = GoogleSiteSearch.query_multiple(5, *YOUR_URL*)

Caching

To aid in caching I added a a caching_key method which sorts the query parameters, removes some unique parameters that google adds (i.e. an ie parameter which has something to do with related searchs), and compresses the result. It should be a unique representation of a search url, and used to cache results.

Blocks (and a caching example)

Both the query and query_multiple methods can take a block which executes for each Search object found.

# here is a rails example using caching and supplying a block
url = GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i")

@search = Rails.cache.fetch(GoogleSiteSearch::caching_key(url.to_s), :expires => 1.day) do |search|
  GoogleSiteSearch.query(url, SearchResult) do |search|
    # I can do something with the search objects.
    # possibly some custom analytics for our searchs?
    search.url # we can access the object
  end
end

Advanced Usage

An important requirement for this gem was to be able to use structured data for:

  • querying the search api itself (i.e. filtering and sorting )

  • displaying specific information in views (i.e. display a specific field like’author’, or ‘product_type’)

Therefore I allow the developer to supply his own “Results” class to the query and allow them to parse each result xml element explicitly.

The default Result class is as follows:

class Result
  attr_reader :title, :link, :description
    def initialize(node)
    @title = node.find_first("T").content
    @link = node.find_first("UE").content
    @description = node.find_first("S").content
  end
end

As you can see it is very simple. Your class simply needs an initialize method that will recieve an xml node, which it can then do with as it pleases. After it is initialized it is added to the search.results array as shown previously.

See

Pagination

The google search api does the work of pagination for us, supplying the next and previous urls. The urls are relative paths and contain the search engine id parameter. Since this is a security concern I strip out the search engine id when I store them in the Search.next_results_url and Search.previous_results_url methods. This makes them safe to put in links on views and it is why you must supply the search engine id again on the paginate call; so the full url can be rebuilt for the query call.

search2 = GoogleSiteSearch.query(GoogleSiteSearch.paginate(search1.next_results_url, "00255077836266642015:u-scht7a-8i"))

Pagination Simple Example

This works and is fairly straight forward.

In your controller:

if params[:move]
  @search = GoogleSiteSearch.query(GoogleSiteSearch.paginate(params[:move], "00255077836266642015:u-scht7a-8i"))
else
  @search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i", :num => 5))
end

In your view:

<% if @search.previous_results_url %>
  <%= link_to "Previous", search_url(:move => @search.previous_results_url) %>
<% end %>
<% if @search.next_results_url %>
  <%= link_to "More", search_url(:move => @search.next_results_url) %>
<% end %>

Escaping

If you start passing around the url’s in parameters you may run into issues if you don’t escape/unescape the url. If so try…

View adds escape:

<%= link_to "Previous", search_url(:move => CGI::escape(@search.previous_results_url)) %>

Controller unescapes:

@search = GoogleSiteSearch.query(GoogleSiteSearch.paginate(CGI::unescape(params[:move]), "00255077836266642015:u-scht7a-8i"))

Filtering and Sorting

See Filtering and sorting search results.

Filtering

Google expects filtering to be on the “search query” itself. However I feel my end users won’t and shouldn’t be aware of all the possible filtering options (most of my filtering will be based off of dataobject values I supply myself). So I try and keep the filters and actual “search term” separate.

From the google reference link above an example filter search query is halloween more:pagemap:document-author:lisamorton

# using the example above would look like this.
search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween", "00255077836266642015:u-scht7a-8i", :filter =>  "more:pagemap:document-author:lisamorton")

Separate Search Term From Filters

The full “search query” is returned by google’s api and stored in the Search object in a few spots. (i.e @search.search_query method and @search.spelling_q).

To separate the search term from the filter use:

search_term, filters = GoogleSiteSearch.separate_search_term_from_filters(@search.search_query)

Sorting

Sorting would also be done by specifing a sort option.

search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween", "00255077836266642015:u-scht7a-8i", :filter =>  "more:pagemap:document-author:lisamorton", :sort => "data-sdate")

Other Params

Any [param=value] query string additions you want to add can be assigned like the sorting above. For example to limit the search results return, to 5, would look like…

# get only 5 search results with the filtering and sorting from above still applyed.
search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween more:pagemap:document-author:lisamorton", "00255077836266642015:u-scht7a-8i", :sort => "date-sdate", :num => "5" )

Author

David Vallance

google-site-search's People

Contributors

dvallance avatar gbh avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.