
serpapi-ruby's Introduction

SerpApi Ruby Library


Integrate search data into your Ruby application. This library is the official wrapper for SerpApi (https://serpapi.com).

SerpApi supports Google, Google Maps, Google Shopping, Baidu, Yandex, Yahoo, eBay, App Stores, and more.

Installation

Ruby 1.9.3 (or more recent), JRuby 9.1.17 (or more recent), or TruffleRuby 19.3.0 (or more recent) is required.

Bundler

gem 'serpapi', '~> 1.0.0'

Gem

$ gem install serpapi

Ruby Gem page

Simple Usage

require 'serpapi'
client = SerpApi::Client.new api_key: "serpapi_api_key"
results = client.search q: "coffee", engine: "google"
pp results

This example runs a search for "coffee" on Google. It then returns the results as a regular Ruby Hash. See the playground to generate your own code.
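The returned Hash uses symbol keys, so you can drill into it like any Ruby Hash. A minimal sketch (the sample below is an illustrative, truncated shape, not a real API response):

```ruby
# Illustrative, truncated result shape (hypothetical sample, not a real API response)
results = {
  search_metadata: { id: 'abc123', status: 'Success' },
  organic_results: [
    { position: 1, title: 'Coffee - Wikipedia', link: 'https://en.wikipedia.org/wiki/Coffee' }
  ]
}

# Drill in like any Ruby Hash
titles = results[:organic_results].map { |r| r[:title] }
puts titles.first # => "Coffee - Wikipedia"
```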

Advanced Usage

Search API

# load gem
require 'serpapi'

# serpapi client created with default parameters
client = SerpApi::Client.new(api_key: "secret_key", engine: "google")

# We recommend that you keep your keys safe.
# At least, don't commit them in plain text.
# More about configuration via environment variables: 
# https://hackernoon.com/all-the-secrets-of-encrypting-api-keys-in-ruby-revealed-5qf3t5l

# search query overview (more fields available depending on search engine)
params = {
  # select the search engine (full list: https://serpapi.com/)
  engine: "google",
  # actual search query
  q: "Coffee",
  # then add search-engine-specific options.
  # for example: google specific parameters: https://serpapi.com/search-api
  google_domain: "Google Domain",
  location: "Location Requested", # example: Portland,Oregon,United States [see: Location API](#Location-API)
  device: "desktop|mobile|tablet",
  hl: "Google UI Language",
  gl: "Google Country",
  safe: "Safe Search Flag",
  num: "Number of Results",
  start: "Pagination Offset",
  tbm: "nws|isch|shop",
  tbs: "Custom search criteria",
  # tweak HTTP client behavior
  async: false, # set to true to enable non-blocking (async) calls
  timeout: 60, # HTTP timeout in seconds, on the client side only
}

# formatted search results as a Hash
#  serpapi.com converts HTML -> JSON
results = client.search(params)

# raw search engine HTML as a String
#  serpapi.com acts as a proxy to provide high throughput, no search limit, and more.
raw_html = client.html(params)

Google search documentation. More hands on examples are available below.
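Conceptually, the client merges the default parameters given to the constructor with the per-call parameters, with per-call values winning. A minimal sketch of that merge using plain Ruby Hashes (an illustration, not the gem's internals):

```ruby
# Defaults stored by the client constructor
defaults = { api_key: 'secret_key', engine: 'google' }

# Parameters passed to an individual search call
call_params = { q: 'Coffee', location: 'Portland,Oregon,United States', num: nil }

# Per-call values override the defaults; nil values are dropped
query = defaults.merge(call_params).compact
# query contains api_key, engine, q and location, but not num
```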

Documentation

Location API

require 'serpapi'
client = SerpApi::Client.new() 
location_list = client.location(q: "Austin", limit: 3)
puts "number of locations: #{location_list.size}"
pp location_list

It prints the first 3 locations matching Austin (Texas, Texas, Rochester):

[{
  :id=>"585069bdee19ad271e9bc072",
  :google_id=>200635,
  :google_parent_id=>21176,
  :name=>"Austin, TX",
  :canonical_name=>"Austin,TX,Texas,United States",
  :country_code=>"US",
  :target_type=>"DMA Region",
  :reach=>5560000,
  :gps=>[-97.7430608, 30.267153],
  :keys=>["austin", "tx", "texas", "united", "states"]
  }
  # ...
]

NOTE: api_key is not required for this endpoint.

Search Archive API

This API retrieves previous search results from the archive using a search_id.

First, you need to run a search and save the search id.

require 'serpapi'
client = SerpApi::Client.new(api_key: 'secret_api_key', engine: 'google')
results = client.search(q: "Coffee", location: "Portland")
search_id = results[:search_metadata][:id]

Now let's retrieve the previous search results from the archive.

require 'serpapi'
client = SerpApi::Client.new(api_key: 'secret_api_key')
results = client.search_archive(search_id)
pp results

This code prints the search results from the archive. :)

Account API

require 'serpapi'
client = SerpApi::Client.new(api_key: 'secret_api_key')
pp client.account

It prints your account information.

Bulk Search API

If you have a high volume of searches (e.g., >= 1 million) that don't need to be live, you can use our Bulk Search API. Just pass the async parameter:

client = SerpApi::Client.new api_key: 'secret_api_key', async: true
searches = [
  { engine: "google", q: "coffee" },
  { engine: "google", q: "tea" },
  { engine: "google", q: "hot chocolate milk" },
  # ...
]
# Submit async searches
async_searches = searches.map do |search|
  client.search(search)
end
# Get an ETA using the last search's scheduled time
bulk_search_eta = async_searches.last[:search_metadata][:scheduled_at]
# After the searches are done processing (i.e., past `bulk_search_eta`),
# retrieve the results from the archive.
async_search_results = async_searches.map do |search|
  client.search_archive(search[:search_metadata][:id])
end
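Since the scheduled time is only an estimate, a more robust approach is to poll the archive until each search reports Cached or Success. A hypothetical helper sketch, with the fetch call injected so the example stays self-contained (in real code, pass `client.method(:search_archive)`):

```ruby
# Hypothetical polling helper: `fetcher` stands in for client.search_archive
def wait_for_results(search_id, fetcher:, max_attempts: 30, delay: 1)
  max_attempts.times do
    results = fetcher.call(search_id)
    return results if results[:search_metadata][:status] =~ /Cached|Success/
    sleep delay
  end
  nil # gave up after max_attempts
end
```

Usage with the real client would look like `wait_for_results(search_id, fetcher: client.method(:search_archive))`.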

Basic example per search engine

Search bing

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'bing')
results = client.search({
  'q': 'coffee'
})
pp results[:organic_results]
# ENV['API_KEY'] holds your secret API key, available from https://serpapi.com

Search baidu

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'baidu')
results = client.search({
  'q': 'coffee'
})
pp results[:organic_results]

Search yahoo

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'yahoo')
results = client.search({
  'p': 'coffee'
})
pp results[:organic_results]

Search youtube

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'youtube')
results = client.search({
  'search_query': 'coffee'
})
pp results[:video_results]

Search walmart

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'walmart')
results = client.search({
  'query': 'coffee'
})
pp results[:organic_results]

Search ebay

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'ebay')
results = client.search({
  '_nkw': 'coffee'
})
pp results[:organic_results]

Search naver

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'naver')
results = client.search({
  'query': 'coffee'
})
pp results[:ads_results]

Search home depot

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'home_depot')
results = client.search({
  'q': 'table'
})
pp results[:products]

Search apple app store

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'apple_app_store')
results = client.search({
  'term': 'coffee'
})
pp results[:organic_results]

Search duckduckgo

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'duckduckgo')
results = client.search({
  'q': 'coffee'
})
pp results[:organic_results]

Search google

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'google')
results = client.search({
  "q": "coffee"
})
})
pp results[:organic_results]

Search google scholar

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'google_scholar')
results = client.search({
  'q': 'coffee'
})
pp results[:organic_results]

Search google autocomplete

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'google_autocomplete')
results = client.search({
  'q': 'coffee'
})
pp results[:suggestions]

Search google product

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'google_product')
results = client.search({
  'q': 'coffee',
  'product_id': '4172129135583325756'
})
pp results[:product_results]

Search google reverse image

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'google_reverse_image')
results = client.search({
  'image_url': 'https://i.imgur.com/5bGzZi7.jpg'
})
pp results[:image_sizes]

Search google events

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'google_events')
results = client.search({
  'q': 'coffee'
})
pp results[:events_results]

Search google local services

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'google_local_services')
results = client.search({
  'q': 'Electrician',
  'data_cid': 'ChIJOwg_06VPwokRYv534QaPC8g'
})
pp results[:local_ads]

Search google maps

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'google_maps')
results = client.search({
  'q': 'pizza',
  'll': '@40.7455096,-74.0083012,15.1z',
  'type': 'search'
})
pp results[:local_results]

Search google jobs

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'google_jobs')
results = client.search({
  'q': 'coffee'
})
pp results[:jobs_results]

Search google play

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'google_play')
results = client.search({
  'q': 'kite',
  'store': 'apps'
})
pp results[:organic_results]

Search google images

require 'serpapi'
client = SerpApi::Client.new(api_key: ENV['API_KEY'], engine: 'google_images')
results = client.search({
  "q": "coffee"
})
pp results[:images_results]

Migration quick guide

If you were already using the [google-search-results-ruby gem](https://github.com/serpapi/google-search-results-ruby), here are the changes.

# load library
# old way 
require 'google_search_results'
# new way
require 'serpapi'

# define a search
# old way to describe the search
search = GoogleSearch.new(search_params)
# new way 
default_parameter = {api_key: "secret_key", engine: "google"}
client = SerpApi::Client.new(default_parameter)
# an instance of the serpapi client is created
# where the default parameters are stored in the client.
#   like api_key, engine
#  then each subsequent API call can be made with additional parameters.

# override an existing parameter
# old way
search.params[:location] = "Portland,Oregon,United States"
# new way
# just provide the search call with the parameters.
results = client.search({location: "Portland,Oregon,United States", q: "Coffee"})

# search format return as raw html
# old way
html_results = search.get_html
# new way
raw_html = client.html(params)
# where params is a Hash containing additional key / value pairs

# search format returns a Hash
# old way
hash_results = search.get_hash
# new way
results = client.search(params)
# where params are the search parameters (they override the default parameters set in the constructor).

# search as raw JSON format
# old way
json_results = search.get_json
# new way
results = client.search(params)

# The prefix get_ is removed from all other methods,
#  since it's evident that a method returns something.
# old -> new way
search.get_search_archive -> client.search_archive
search.get_account -> client.account
search.get_location -> client.location

Most notable improvements:

  • Removed parameter validation on the client side (the source of most bugs).
  • Reduced logic complexity in our implementation (faster performance).
  • Better documentation.

Advanced search API usage

Highly scalable batching

The Search API supports non-blocking searches via the option async: true.

  • Non-blocking (async=true): a single parent process can handle unlimited concurrent searches. Development is more complex, but this allows handling many simultaneous connections.
  • Blocking (async=false): many processes must be forked and synchronized to handle concurrent searches. The code is easier to write, but this strategy is I/O- and compute-intensive because each process holds a network connection open.
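The blocking strategy can also be parallelized with plain Ruby threads instead of forked processes. Here the search lambda is a stub standing in for client.search, just to sketch the shape (this assumes the real client is thread safe):

```ruby
queries = %w(coffee tea chocolate)

# Stub standing in for a blocking client.search call
search = ->(q) { { search_parameters: { q: q }, search_metadata: { status: 'Success' } } }

# One thread per search; each thread blocks on its own connection
threads = queries.map { |q| Thread.new { search.call(q) } }
results = threads.map(&:value) # Thread#value joins and returns the block's result
```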

Here is an example of asynchronous searches using Ruby:

require 'serpapi'
# target MAANG companies
company_list = %w(meta amazon apple netflix google)
client = SerpApi::Client.new(engine: 'google', async: true, api_key: ENV['API_KEY'])
search_queue = Queue.new
company_list.each do |company|
  # submit the search - non-blocking
  result = client.search(q: company)
  if result[:search_metadata][:status] =~ /Cached|Success/
    puts "#{company}: search done"
    next
  end

  # add pending results to the queue
  search_queue.push(result)
end

puts "wait until all searches are cached or successful"
until search_queue.empty?
  result = search_queue.pop
  # extract the search id
  search_id = result[:search_metadata][:id]

  # retrieve the search from the archive - blocking
  search_archived = client.search_archive(search_id)
  if search_archived[:search_metadata][:status] =~ /Cached|Success/
    puts "#{search_archived[:search_parameters][:q]}: search done"
    next
  end

  # push the pending result back onto the queue
  search_queue.push(result)
end

search_queue.close
puts 'done'

This code shows a simple solution for batching searches asynchronously through a queue. Each search takes a few seconds to complete on the SerpApi service and the search engine, so by the time the first element pops out of the queue, the search result may already be available in the archive. If not, the search_archive call blocks until the results are ready.

Supported Ruby versions

Ruby versions validated by GitHub Actions:

Change logs

  • [2023-02-20] 1.0.0 Full API support

Developer Guide

Key goals

  • Brand centric instead of search engine based
    • No hard-coded logic per search engine
  • Simple HTTP client (lightweight, reduced dependency)
    • No magic default values
    • Thread safe
  • Easy extension
  • Defensive code style (raise a custom exception)
  • TDD
  • Best API coding practice per platform
  • KISS principle

Inspirations

This project's source code and coding style were inspired by some of the most awesome Ruby gems:

Code quality expectations

  • 0 lint offense: rake lint
  • 100% tests passing: rake test
  • 100% code coverage: rake test (simple-cov)


Design : UML diagram

Class diagram

classDiagram
  Application *-- serpapi 
  serpapi *-- Client
  class Client {
    engine String
    api_key String
    params Hash
    search() Hash
    html() String
    location() String
    search_archive() Hash
    account() Hash
  }
  openuri <.. Client
  json <.. Client
  Ruby <.. openuri
  Ruby <.. json

search() : Sequence diagram

sequenceDiagram
    Client->>SerpApi.com: search() : http request 
    SerpApi.com-->>SerpApi.com: query search engine
    SerpApi.com-->>SerpApi.com: parse HTML into JSON
    SerpApi.com-->>Client: JSON string payload
    Client-->>Client: decode JSON into Hash

where:

  • The end user implements the application.
  • Client refers to SerpApi::Client.
  • SerpApi.com is the backend HTTP / REST service.
  • Engine refers to Google, Baidu, Bing, and more.

The SerpApi.com service (backend):

  • executes a scalable search on engine: "google" using the search query q: "coffee".
  • parses the messy HTML responses from Google on the backend.
  • returns a standardized JSON response.

The class SerpApi::Client (client side / Ruby):

  • formats the request to the SerpApi.com server.
  • executes an HTTP GET request.
  • parses the JSON into a Ruby Hash using a standard JSON library. Et voila!
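The client-side steps above can be sketched in a few lines of plain Ruby. This is an illustration of the flow, not the gem's actual implementation, and the sample payload is hypothetical:

```ruby
require 'json'
require 'uri'

# 1. Format the request to the SerpApi.com server
params = { engine: 'google', q: 'coffee', api_key: 'secret_key' }
uri = URI('https://serpapi.com/search.json')
uri.query = URI.encode_www_form(params)

# 2. An HTTP GET on `uri` (e.g. via Net::HTTP) returns a JSON string payload;
#    a hypothetical, truncated payload:
payload = '{"search_metadata":{"status":"Success"},"organic_results":[]}'

# 3. Decode the JSON into a Ruby Hash with symbol keys
results = JSON.parse(payload, symbolize_names: true)
```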

Continuous integration

We love "true open source", continuous integration, and Test Driven Development (TDD). We use RSpec to test our infrastructure around the clock via GitHub Actions to achieve the best QoS (Quality of Service).

The directory spec/ includes specifications which serve the dual purpose of examples and functional tests.

Set your secret API key in your shell before running a test.

export API_KEY="your_secret_key"

Install testing dependencies

$ bundle install
# or
$ rake dependency

Check code quality using Lint.

$ rake lint

Run regression.

$ rake test

To run the full flow.

$ rake

Open coverage report generated by rake test

open coverage/index.html

Open ./Rakefile for more information.

Contributions are welcome. Feel free to submit a pull request!

TODO

  • [ ] Release version 1.0.0

serpapi-ruby's People

Contributors

hartator, ilyazub, jvmvik


serpapi-ruby's Issues

This gem requires a significant refactor

Hi,

A few minutes into the code and this is what strikes me:

  1. File structure.
    • serpapi/serpapi.rb that hosts just a Client class - should be serpapi/client.rb
    • SerpApi::Client::VERSION that is used as a gem version - should be in a separate file serpapi/version.rb using SerpApi::VERSION namespace
    • serpapi/error.rb does not include corresponding Error module, just SerpApiException class - should be enclosed in unified Errors module, e.g.:
      # serpapi/errors.rb
      module SerpApi
        module Errors
          # all your custom errors
        end
      end
    • specs should represent actual file structure, e.g.:
      spec/serpapi/client_spec.rb    // client config specs
      spec/serpapi/client/*_spec.rb  // all the integration tests that are currently just laying around
      spec/serpapi_spec.rb           // should include VERSION check only
      
  2. Don't use open-uri. It creates a temporary file for each response, i.e. a redundant IO operation that just degrades performance. BTW, why do you have different HTTP clients in different gems? Some use faraday, some open-uri, some http, etc.
  3. This:
    attr_accessor :timeout
    # Default parameters provided in the constructor (hash)
    attr_accessor :params

    def initialize(params = {})
    # set default read timeout
    @timeout = params[:timeout] || params['timeout'] || 120
    @timeout.freeze
    # delete this client only configuration keys
    params.delete('timeout') if params.key? 'timeout'
    params.delete(:timeout) if params.key? :timeout
    # set default params safely in memory
    @params = params.clone || {}
    @params.freeze
    end

    def engine
    @params['engine'] || @params[:engine]
    end

    def api_key
    @params['api_key'] || @params[:api_key]
    end

    Can be simplified to:
    class SerpApi # should be Client
      DEFAULT_TIMEOUT = 120
    
      attr_accessor :timeout
      attr_reader :params, :engine, :api_key
    
      def initialize(params = {})
        @params = params.transform_keys(&:to_sym)
    
        @timeout = @params[:timeout] || DEFAULT_TIMEOUT
        @engine  = @params[:engine]
        @api_key = @params[:api_key]
      end
    end
  4. So which one?
    # HTTP timeout in seconds (default: 120)

    # timeout [Integer] HTTP read max timeout in seconds (default: 60s)
  5. No need for nil as the third param, it's optional:
    get("/searches/#{search_id}.#{format}", format, nil)
  6. This:
    query = (@params || {}).merge(params || {})

    Can be simplified to:
    query = @params.merge(params)
    # @params can't be nil because of default value {} in #initialize
    # params can't be nil because of default value {} in #get
  7. This:
    query.delete_if { |_, value| value.nil? }

    Can be simplified to:
    query.compact!
  8. SerpApiException should not be raised for HTTP transport errors, it's misleading just from the naming convention. There should be a separate error class or just a transparent bypass.
    rescue OpenURI::HTTPError => e
    data = JSON.parse(e.io.read)
    raise SerpApiException, "error: #{data['error']} from url: #{url}" if data.key?('error')
    raise SerpApiException, "fail: get url: #{url} response: #{data}"
    rescue => e
    raise SerpApiException, "fail: get url: #{url} caused by: #{e}"
    end
  9. No full branch coverage in simplecov, you just ignore a bunch of untested places:
    # do this
    SimpleCov.start do
      enable_coverage :branch
    
      add_filter "/spec/"
    end
  10. describe/it blocks are just bad: describe should refer to namespaced objects; currently they play the context role.
  11. No VCRs, and the dependence on a specific API_KEY with a high request limit, make testing impossible for contributors other than those who work at the company.
