
google-search-results-python's People

Contributors

ajsierra117, dimitryzub, elizost, gbcfxs, gutoarraes, hartator, heyalexej, ilyazub, justinrobertohara, jvmvik, kennethreitz, lf2225, manoj-nathwani, paplorinc


google-search-results-python's Issues

Cannot increase the offset between returned results using pagination

I am trying to use the pagination feature based on the code at (https://github.com/serpapi/google-search-results-python#pagination-using-iterator). I want to request 20 results per API call, but pagination iterates by 10 results by default instead of 20, meaning my requests end up overlapping.

I think I have found a solution to this. Looking inside the package, pagination.py has a Pagination class that takes a page_size argument controlling the size of the offset between returned results.
The Pagination class is used in serp_api_client.py inside the pagination method starting on line 170, but the page_size argument isn't passed through there. I added page_size=10 on lines 170 and 174, and now I can set the page size by calling search.pagination(page_size=20). Can this change be made in the code?
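
A minimal sketch of the intended usage, assuming pagination() is patched to accept and forward a page_size argument (the API key is a placeholder):

from serpapi import GoogleSearch

search = GoogleSearch({
    "engine": "google",
    "q": "coffee",
    "api_key": "YOUR_API_KEY",  # placeholder
})

# With the patch, each page advances the offset by 20 instead of 10,
# so consecutive requests no longer overlap.
for page in search.pagination(page_size=20):
    print(len(page.get("organic_results", [])))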

[Google Jobs API] Support for Pagination

Because Google Jobs does not return the serpapi_pagination key and instead expects the start parameter for paging, the current version of the library does not support pagination for Google Jobs. Pagination support should be added for Google Jobs. The iterator currently stops as soon as the key is missing:

# stop if backend miss to return serpapi_pagination
if not 'serpapi_pagination' in result:
  raise StopIteration

# stop if no next page
if not 'next' in result['serpapi_pagination']:
    raise StopIteration
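
Until pagination support is added, a workaround is to drive the start parameter manually. A rough sketch, assuming the standard Google Jobs response key jobs_results and the default page step of 10:

import os
from serpapi import GoogleSearch

params = {
    "engine": "google_jobs",
    "q": "barista new york",
    "api_key": os.getenv("API_KEY"),
    "start": 0,
}

all_jobs = []
while True:
    results = GoogleSearch(params).get_dict()
    jobs = results.get("jobs_results", [])
    if not jobs:
        break  # no more pages: Google Jobs simply stops returning results
    all_jobs.extend(jobs)
    params["start"] += 10  # Google Jobs paginates in steps of 10

print(f"Collected {len(all_jobs)} job results")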


SSLCertVerificationError [SSL: CERTIFICATE_VERIFY_FAILED] error

A user reported receiving this error:

SSLCertVerificationError Traceback (most recent call last)
/opt/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
698 # Make the request on the httplib connection object.
--> 699 httplib_response = self._make_request(
700 conn,

SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1125)

The solution for them was to turn off the VPN.
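
If the error persists after disabling the VPN, a quick check is whether the machine can reach the API host using certifi's CA bundle. This is only a diagnostic sketch, not part of the library:

import certifi
import requests

# Success here suggests the certificate store is fine and a VPN/proxy was
# intercepting TLS; the same failure points at the local certificate setup.
response = requests.get("https://serpapi.com", verify=certifi.where(), timeout=30)
print(response.status_code)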

macOS installation issue

Installing the package via pip fails:

Collecting google-search-results
  Using cached https://files.pythonhosted.org/packages/08/eb/38646304d98db83d85f57599d2ccc8caf325961e8792100a1014950197a6/google_search_results-1.5.2.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/3m/91gj9l890y71886_7sfndl3r0000gn/T/pip-install-YVqFKL/google-search-results/setup.py", line 7, in <module>
        with open(path.join(here, 'SHORT_README.rst'), encoding='utf-8') as f:
      File "/usr/local/Cellar/python@2/2.7.15_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 898, in open
        file = __builtin__.open(filename, mode, buffering)
    IOError: [Errno 2] No such file or directory: '/private/var/folders/3m/91gj9l890y71886_7sfndl3r0000gn/T/pip-install-YVqFKL/google-search-results/SHORT_README.rst'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/3m/91gj9l890y71886_7sfndl3r0000gn/T/pip-install-YVqFKL/google-search-results/

Running macOS Catalina and Python 2.7.

~ ❯❯❯ pip --version
pip 19.0.2 from /usr/local/lib/python2.7/site-packages/pip (python 2.7)
~ ❯❯❯ python --version
Python 2.7.15

Provide a more convenient way to paginate via the Python package

Currently, the way to paginate searches is to get serpapi_pagination.current and increase the start (or offset) parameter in a loop, just like with regular HTTP requests to serpapi.com/search without an API wrapper.

import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "coffee",
    "tbm": "nws",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

print(f"Current page: {results['serpapi_pagination']['current']}")

for news_result in results["news_results"]:
    print(f"Title: {news_result['title']}\nLink: {news_result['link']}\n")

while 'next' in results['serpapi_pagination']:
    search.params_dict[
        "start"] = results['serpapi_pagination']['current'] * 10
    results = search.get_dict()

    print(f"Current page: {results['serpapi_pagination']['current']}")

    for news_result in results["news_results"]:
        print(
            f"Title: {news_result['title']}\nLink: {news_result['link']}\n"
        )

A more convenient approach for an official API wrapper would be to provide a function like search.paginate(callback: Callable) which properly calculates the offset for the specific search engine and loops through pages until the end.

import os
from serpapi import GoogleSearch

def print_results(results):
  print(f"Current page: {results['serpapi_pagination']['current']}")

  for news_result in results["news_results"]:
    print(f"Title: {news_result['title']}\nLink: {news_result['link']}\n")

params = {
    "engine": "google",
    "q": "coffee",
    "tbm": "nws",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
search.paginate(print_results)
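
A rough sketch of how such a paginate() helper could work under the hood; the function name, the fixed offset arithmetic, and the page_size default are assumptions for illustration, not the library's implementation:

from typing import Callable
from serpapi import GoogleSearch

def paginate(search: GoogleSearch, callback: Callable[[dict], None], page_size: int = 10):
    """Invoke `callback` for every page until the API stops returning a next page."""
    results = search.get_dict()
    callback(results)

    while 'next' in results.get('serpapi_pagination', {}):
        # Offset-based engines only; token-based engines (e.g. YouTube) would
        # need the `next` URL parsed instead of a simple start increment.
        search.params_dict['start'] = results['serpapi_pagination']['current'] * page_size
        results = search.get_dict()
        callback(results)

# usage: paginate(GoogleSearch(params), print_results)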

@jvmvik @hartator What do you think?

{'error':'We couldn't find your API Key.'}

from serpapi.google_search_results import GoogleSearchResults

client = GoogleSearchResults({"q": "coffee", "serp_api_key": "************************"})

result = client.get_dict()

I tried giving my API key from serpstack, yet I am left with this error. Any help would be much appreciated.

Knowledge Graph object not being sent in response.

Some queries that return a knowledge graph both in my own Google search and in the SerpApi Playground are not returning the 'knowledge_graph' key in my own application.

Code:

params = {
    'q': 'Aspen Pumps Ltd',
    'engine': 'google',
    'api_key': <api_key>,
    'num': 100
  }

result_set = GoogleSearchResults(params).get_dict()

print(result_set.keys())

Evaluation:

dict_keys(['search_metadata', 'search_parameters', 'search_information', 'ads', 'shopping_results', 'organic_results', 'related_searches', 'pagination', 'serpapi_pagination'])

Manual Results:

https://www.google.com
Screenshot 2019-07-31 at 15 48 55

https://serpapi.com/playground
Screenshot 2019-07-31 at 15 49 10

SerpApiClient.get_search_archive fails with format='html'

SerpApiClient.get_search_archive assumes all results must be loaded as a JSON, so it fails when using format='html'

GoogleSearchResults({}).get_search_archive(search_id='5df0db57ab3f5837994cd5a1', format='html')
---------------------------------------------------------------------------                                                                                                                                   JSONDecodeError                           Traceback (most recent call last)
<ipython-input-8-b6d24cb47bf7> in <module>
----> 1 GoogleSearchResults({}).get_search_archive(search_id='5df0db57ab3f5837994cd5a1', format='html')

C:\ProgramData\Anaconda3\lib\site-packages\serpapi\serp_api_client.py in get_search_archive(self, search_id, format)
78             dict|string: search result from the archive
79         """
---> 80         return json.loads(self.get_results("/searches/{0}.{1}".format(search_id, format)))
81
82     def get_account(self):

C:\ProgramData\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
352             parse_int is None and parse_float is None and
353             parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
355     if cls is None:
356         cls = JSONDecoder

C:\ProgramData\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
337
338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
340         end = _w(s, end).end()
341         if end != len(s):

C:\ProgramData\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
355             obj, end = self.scan_once(s, idx)
356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
358         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
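
A possible fix is to decode JSON only when a JSON payload was requested. A sketch of what a patched get_search_archive could look like (not the current library code):

import json

def get_search_archive(self, search_id, format='json'):
    """Return an archived search; skip JSON decoding when HTML was requested."""
    raw = self.get_results("/searches/{0}.{1}".format(search_id, format))
    if format == 'html':
        return raw  # raw HTML string
    return json.loads(raw)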

Connection issue

Hi,
One of the users of my code gets the following error when creating a client (screenshot attached).
I suppose it is related to their machine settings, as it doesn't happen to other users.
Thanks for helping
P.S. I am fairly new to coding.

ImportError: cannot import name 'GoogleSearch' from 'serpapi'

After creating a subscriber account on SerpApi, I was given an API key and installed the package with "pip install google-search-results".

But whenever I try to run my Django app, I get this error:

File "/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/site-packages/django/utils/autoreload.py", line 64, in wrapper
fn(*args, **kwargs)
File "/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/site-packages/django/core/management/commands/runserver.py", line 125, in inner_run
autoreload.raise_last_exception()
File "/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/site-packages/django/utils/autoreload.py", line 87, in raise_last_exception
raise _exception[1]
File "/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/site-packages/django/core/management/init.py", line 394, in execute
autoreload.check_errors(django.setup)()
File "/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/site-packages/django/utils/autoreload.py", line 64, in wrapper
fn(*args, **kwargs)
File "/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/site-packages/django/init.py", line 24, in setup
apps.populate(settings.INSTALLED_APPS)
File "/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/site-packages/django/apps/registry.py", line 116, in populate
app_config.import_models()
File "/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/site-packages/django/apps/config.py", line 269, in import_models
self.models_module = import_module(models_module_name)
File "/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1014, in _gcd_import
File "", line 991, in _find_and_load
File "", line 975, in _find_and_load_unlocked
File "", line 671, in _load_unlocked
File "", line 843, in exec_module
File "", line 219, in _call_with_frames_removed
File "/Users/MyProjects/topsearch/topsearch/searchapp/models.py", line 3, in
from serpapi import GoogleSearch
ImportError: cannot import name 'GoogleSearch' from 'serpapi' (/Users/nazibabdullah/opt/miniconda3/envs/topsearch/lib/python3.8/site-packages/serpapi/__init__.py)
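
GoogleSearch only exists in newer releases of google-search-results; older releases export GoogleSearchResults instead, so upgrading the package usually resolves this. A defensive sketch that works with both (placeholder API key):

try:
    from serpapi import GoogleSearch
except ImportError:
    # Older google-search-results releases only expose GoogleSearchResults.
    from serpapi import GoogleSearchResults as GoogleSearch

search = GoogleSearch({"q": "coffee", "api_key": "YOUR_API_KEY"})
print(search.get_dict().get("search_metadata"))

It is also worth checking that no local file or directory named serpapi shadows the installed package.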

How to resolve the Connection aborted error when calling SerpApi

Hi,
A new scraper here.
In my API call, I get the following error. Would you please let me know if I am doing anything wrong here? Thanks a lot.

https://serpapi.com/search
---------------------------------------------------------------------------
ConnectionResetError                      Traceback (most recent call last)
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    676                 headers=headers,
--> 677                 chunked=chunked,
    678             )

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    380         try:
--> 381             self._validate_conn(conn)
    382         except (SocketTimeout, BaseSSLError) as e:

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/connectionpool.py in _validate_conn(self, conn)
    977         if not getattr(conn, "sock", None):  # AppEngine might not have  `.sock`
--> 978             conn.connect()
    979 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/connection.py in connect(self)
    370             server_hostname=server_hostname,
--> 371             ssl_context=context,
    372         )

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/util/ssl_.py in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir, key_password, ca_cert_data)
    385         if HAS_SNI and server_hostname is not None:
--> 386             return context.wrap_socket(sock, server_hostname=server_hostname)
    387 

/anaconda/envs/azureml_py36/lib/python3.6/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
    406                          server_hostname=server_hostname,
--> 407                          _context=self, _session=session)
    408 

/anaconda/envs/azureml_py36/lib/python3.6/ssl.py in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context, _session)
    816                         raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 817                     self.do_handshake()
    818 

/anaconda/envs/azureml_py36/lib/python3.6/ssl.py in do_handshake(self, block)
   1076                 self.settimeout(None)
-> 1077             self._sslobj.do_handshake()
   1078         finally:

/anaconda/envs/azureml_py36/lib/python3.6/ssl.py in do_handshake(self)
    688         """Start the SSL/TLS handshake."""
--> 689         self._sslobj.do_handshake()
    690         if self.context.check_hostname:

ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

ProtocolError                             Traceback (most recent call last)
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    448                     retries=self.max_retries,
--> 449                     timeout=timeout
    450                 )

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    726             retries = retries.increment(
--> 727                 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
    728             )

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    409             if read is False or not self._is_method_retryable(method):
--> 410                 raise six.reraise(type(error), error, _stacktrace)
    411             elif read is not None:

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/packages/six.py in reraise(tp, value, tb)
    733             if value.__traceback__ is not tb:
--> 734                 raise value.with_traceback(tb)
    735             raise value

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    676                 headers=headers,
--> 677                 chunked=chunked,
    678             )

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    380         try:
--> 381             self._validate_conn(conn)
    382         except (SocketTimeout, BaseSSLError) as e:

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/connectionpool.py in _validate_conn(self, conn)
    977         if not getattr(conn, "sock", None):  # AppEngine might not have  `.sock`
--> 978             conn.connect()
    979 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/connection.py in connect(self)
    370             server_hostname=server_hostname,
--> 371             ssl_context=context,
    372         )

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/urllib3/util/ssl_.py in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir, key_password, ca_cert_data)
    385         if HAS_SNI and server_hostname is not None:
--> 386             return context.wrap_socket(sock, server_hostname=server_hostname)
    387 

/anaconda/envs/azureml_py36/lib/python3.6/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
    406                          server_hostname=server_hostname,
--> 407                          _context=self, _session=session)
    408 

/anaconda/envs/azureml_py36/lib/python3.6/ssl.py in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context, _session)
    816                         raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 817                     self.do_handshake()
    818 

/anaconda/envs/azureml_py36/lib/python3.6/ssl.py in do_handshake(self, block)
   1076                 self.settimeout(None)
-> 1077             self._sslobj.do_handshake()
   1078         finally:

/anaconda/envs/azureml_py36/lib/python3.6/ssl.py in do_handshake(self)
    688         """Start the SSL/TLS handshake."""
--> 689         self._sslobj.do_handshake()
    690         if self.context.check_hostname:

ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
<ipython-input-26-45ac328ca8f8> in <module>
      1 question = 'where to get best coffee'
----> 2 results = performSearch(question)

<ipython-input-25-5bc778bad4e2> in performSearch(question)
     12 
     13     search = GoogleSearch(params)
---> 14     results = search.get_dict()
     15     return results

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/serpapi/serp_api_client.py in get_dict(self)
    101             (alias for get_dictionary)
    102         """
--> 103         return self.get_dictionary()
    104 
    105     def get_object(self):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/serpapi/serp_api_client.py in get_dictionary(self)
     94             Dict with the formatted response content
     95         """
---> 96         return dict(self.get_json())
     97 
     98     def get_dict(self):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/serpapi/serp_api_client.py in get_json(self)
     81         """
     82         self.params_dict["output"] = "json"
---> 83         return json.loads(self.get_results())
     84 
     85     def get_raw_json(self):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/serpapi/serp_api_client.py in get_results(self, path)
     68             Response text field
     69         """
---> 70         return self.get_response(path).text
     71 
     72     def get_html(self):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/serpapi/serp_api_client.py in get_response(self, path)
     57             url, parameter = self.construct_url(path)
     58             print(url)
---> 59             response = requests.get(url, parameter, timeout=self.timeout)
     60             return response
     61         except requests.HTTPError as e:

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/requests/api.py in get(url, params, **kwargs)
     73     """
     74 
---> 75     return request('get', url, params=params, **kwargs)
     76 
     77 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/requests/api.py in request(method, url, **kwargs)
     59     # cases, and look like a memory leak in others.
     60     with sessions.Session() as session:
---> 61         return session.request(method=method, url=url, **kwargs)
     62 
     63 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    540         }
    541         send_kwargs.update(settings)
--> 542         resp = self.send(prep, **send_kwargs)
    543 
    544         return resp

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/requests/sessions.py in send(self, request, **kwargs)
    653 
    654         # Send the request
--> 655         r = adapter.send(request, **kwargs)
    656 
    657         # Total elapsed time of the request (approximately)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    496 
    497         except (ProtocolError, socket.error) as err:
--> 498             raise ConnectionError(err, request=request)
    499 
    500         except MaxRetryError as e:

ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
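
Connection resets like this are usually transient network or proxy issues. A small retry wrapper around the call is one way to make a script more resilient; this is a sketch, not part of the library:

import time
import requests
from serpapi import GoogleSearch

def get_dict_with_retry(params, retries=3, backoff=5):
    """Retry the search a few times when the connection is reset."""
    for attempt in range(retries):
        try:
            return GoogleSearch(params).get_dict()
        except requests.exceptions.ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (attempt + 1))  # simple linear backoff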

You need a valid browser to continue exploring our API

This is the error message you get when you don't supply a private key. I think information on this site should be provided regarding:

  1. How to get an API key
  2. Is a key free or how much does it cost
  3. Are there limits to using the key (hits/hour or whatever)

The service provided by the repo is very valuable, but whether I can use it or not depends on the answers to these questions.

get_html() Returns JSON Instead of HTML

A customer reported the get_html() method for this library returns a JSON response instead of the expected HTML.

I may be misunderstanding something about what the get_html method is intended to do, but I checked this locally and the customer's report appears to be correct:

Screenshot 2024-01-05 at 9 34 23 AM Screenshot 2024-01-05 at 9 42 56 AM
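
A minimal way to reproduce the report (the API key is a placeholder); if the bug is present, the printed prefix is JSON rather than an HTML document:

from serpapi import GoogleSearch

search = GoogleSearch({"q": "coffee", "api_key": "YOUR_API_KEY"})
html = search.get_html()
print(html[:80])  # expected: an HTML document; reported: a JSON string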

Python 3.8+, Fatal Python error: Segmentation fault when calling requests.get(URL, params) with docker python-3.8.2-slim-buster/openssl 1.1.1d and python-3.9.10-slim-buster/openssl 1.1.1d

Here's the trace:

Python 3.9.10 (main, Mar  1 2022, 21:02:54) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> requests.get('https://serpapi.com', {"api_key":VALID_API_KEY, "engine": "google_jobs", "q": "Barista"})
Fatal Python error: Segmentation fault

Current thread 0x0000ffff8e999010 (most recent call first):
  File "/usr/local/lib/python3.9/ssl.py", line 1173 in send
  File "/usr/local/lib/python3.9/ssl.py", line 1204 in sendall
  File "/usr/local/lib/python3.9/http/client.py", line 1001 in send
  File "/usr/local/lib/python3.9/http/client.py", line 1040 in _send_output
  File "/usr/local/lib/python3.9/http/client.py", line 1280 in endheaders
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 395 in request
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 496 in _make_request
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 790 in urlopen
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 486 in send
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 703 in send
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 589 in request
  File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 59 in request
  File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 73 in get
  File "<stdin>", line 1 in <module>
Segmentation fault

This is not specific to one engine, it also applies to google_images if I swap the engine.

Dockerfile:

FROM python:3.9.10-slim-buster

ENV PYTHONUNBUFFERED 1
ENV PYTHONDONTWRITEBYTECODE 1

# OLD: RUN apt-get update && apt-get upgrade -y && apt-get install gcc -y && apt-get install apt-utils -y

# Install build-essential for celery worker otherwise it says gcc not found
RUN apt-get update \
  # dependencies for building Python packages
  && apt-get install -y build-essential \
  # psycopg2 dependencies
  && apt-get install -y libpq-dev \
  # Additional dependencies
  && apt-get install -y telnet netcat \
  # cleaning up unused files
  && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY ./compose/local/flask/start /start
RUN sed -i 's/\r$//g' /start
RUN chmod +x /start

# COPY ./compose/local/flask/celery/worker/start /start-celeryworker
# RUN sed -i 's/\r$//g' /start-celeryworker
# RUN chmod +x /start-celeryworker

# COPY ./compose/local/flask/celery/beat/start /start-celerybeat
# RUN sed -i 's/\r$//g' /start-celerybeat
# RUN chmod +x /start-celerybeat

# COPY ./compose/local/flask/celery/flower/start /start-flower
# RUN sed -i 's/\r$//g' /start-flower
# RUN chmod +x /start-flower

COPY . .

# COPY entrypoint.sh /usr/local/bin/
# ENTRYPOINT ["entrypoint.sh"]

docker-compose.yml:

version: "3.9"

services:
  flask_app:
    restart: always
    container_name: flask_app
    image: meder/flask_live_app:1.0.0
    command: /start
    build: .
    ports:
      - "4000:4000"
    volumes:
      - .:/app
    env_file:
      - local.env
    environment:
      - FLASK_ENV=development
      - FLASK_APP=app.py
    depends_on:
      - db
  db:
    container_name: flask_db
    image: postgres:16.1-alpine
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_USER=USER
      - POSTGRES_PASSWORD=PW
      - POSTGRES_DB=DB
    volumes: 
      - postgres_data:/var/lib/postgresql/data
  redis:
    container_name: redis
    image: redis:7.2-alpine
    ports:
      - "6379:6379"
volumes:
  postgres_data: {}

And here is requirements.txt (I didn't update it after moving from the original 3.8.2 to 3.9.10):

flask==3.0.0
psycopg2-binary==2.9.9
google-search-results==2.4.2

The above trace came from opening a bash shell into my Docker container and running requests.get after importing it, like so:

docker exec -it flask_app bash

The host machine runs this fine, but it uses LibreSSL 2.8.3 / Python 3.8.16. Based on other tickets/issues here, it seems there may be something on the SSL side of the backend that's triggering this. I would appreciate some insight.

Someone ran into this on Stack Overflow and the selected answer was updating the timeout: https://stackoverflow.com/questions/74774784/cheerypy-server-is-timing-out (no guarantee this is the same issue, just a reference).
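
If it does turn out to be timeout-related, as in that Stack Overflow thread, one low-effort thing to try inside the container is an explicit timeout on the raw request. Purely a diagnostic sketch (the API key is a placeholder):

import requests

# An explicit timeout avoids indefinite hangs and mirrors the workaround
# suggested in the linked Stack Overflow answer.
response = requests.get(
    "https://serpapi.com/search.json",
    params={"api_key": "VALID_API_KEY", "engine": "google_jobs", "q": "Barista"},
    timeout=30,
)
print(response.status_code)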

[Pagination] Pagination isn't correct and it skips index by one


Since the start value starts from 0, the correct second page should be 10 and not 11.

This behaviour is also causing pages to be skipped, so customers are getting confusing results (screenshot attached).

Intercom Link
First recognized by @marm123.

I think the start increment in pagination.py needs to be replaced by:

self.client.params_dict['start'] += 0

I don't know whether this would cause errors on other engines, but it may also fix the issue for every other engine.

[Version] Update PyPi to include the most up-to-date version

Currently, PyPi allows users to easily install our library using pip. However, the library has since been updated to remove the print method (screenshot below), while the PyPi version still includes it, causing confusion among users. Some of them think that the printed URL should contain the data from their search and contact us about SerpApi not working, while others simply ask for the print to be removed for clarity.

Current state (screenshot attached):

The user confused about the data not being available in the printed link.
Another user confused about the data not being available in the printed link

The user asking to remove the print method for clarity (they installed it through PyPi)
Another user asking to remove the print method

Link for account details for PyPi

[Feature request] Make `async: True` do everything under the hood

From a user perspective, the less setup required the better. I personally find the second example (example.py) more user-friendly, especially for less technical users.

The user just has to add async: True and doesn't have to spend another hour or so figuring out how Queue or something else works.

@jvmvik @ilyazub @hartator what do you guys think?

@aliayar @marm123 @schaferyan have you guys noticed similar issues for the users or have any users requested similar things?


What if instead of this:

# async batch requests: https://github.com/serpapi/google-search-results-python#batch-asynchronous-searches

from serpapi import YoutubeSearch
from queue import Queue
import os, re, json

queries = [
    'burly',
    'creator',
    'doubtful'
]

search_queue = Queue()

for query in queries:
    params = {
        'api_key': '...',                 
        'engine': 'youtube',              
        'device': 'desktop',              
        'search_query': query,          
        'async': True,                   # ❗
        'no_cache': 'true'
    }
    search = YoutubeSearch(params)       
    results = search.get_dict()         
    
    if 'error' in results:
        print(results['error'])
        break

    print(f"Add search to the queue with ID: {results['search_metadata']}")
    search_queue.put(results)

data = []

while not search_queue.empty():
    result = search_queue.get()
    search_id = result['search_metadata']['id']

    print(f'Get search from archive: {search_id}')
    search_archived = search.get_search_archive(search_id)
    
    print(f"Search ID: {search_id}, Status: {search_archived['search_metadata']['status']}")

    if re.search(r'Cached|Success', search_archived['search_metadata']['status']):
        for video_result in search_archived.get('video_results', []):
            data.append({
                'title': video_result.get('title'),
                'link': video_result.get('link'),
                'channel': video_result.get('channel').get('name'),
            })
    else:
        print(f'Requeue search: {search_id}')
        search_queue.put(result)

Users can do something like this and we handle everything under the hood:

# example.py
# testable example
# example import: from serpapi import async_search

from async_search import async_search
import json

queries = [
    'burly',
    'creator',
    'doubtful',
    'minecraft' 
]

# or as we typically pass params dict
data = async_search(queries=queries, api_key='...', engine='youtube', device='desktop')

print(json.dumps(data, indent=2))
print('All searches completed')

Under the hood code example:

# async_search.py
# testable example

from serpapi import YoutubeSearch
from queue import Queue
import os, re

search_queue = Queue()

def async_search(queries, api_key, engine, device):
    data = []
    for query in queries:
        params = {
            'api_key': api_key,                 
            'engine': engine,              
            'device': device,              
            'search_query': query,          
            'async': True,                  
            'no_cache': 'true'
        }
        search = YoutubeSearch(params)       
        results = search.get_dict()         
        
        if 'error' in results:
            print(results['error'])
            break

        print(f"Add search to the queue with ID: {results['search_metadata']}")
        search_queue.put(results)

    while not search_queue.empty():
        result = search_queue.get()
        search_id = result['search_metadata']['id']

        print(f'Get search from archive: {search_id}')
        search_archived = search.get_search_archive(search_id)
        
        print(f"Search ID: {search_id}, Status: {search_archived['search_metadata']['status']}")

        if re.search(r'Cached|Success', search_archived['search_metadata']['status']):
            for video_result in search_archived.get('video_results', []):
                data.append({
                    'title': video_result.get('title'),
                    'link': video_result.get('link'),
                    'channel': video_result.get('channel').get('name'),
                })
        else:
            print(f'Requeue search: {search_id}')
            search_queue.put(result)
            
    return data

Is there a specific reason we haven't done it before?

Pagination iterator doesn't work for APIs with token-based pagination

For several APIs, parsing serpapi_pagination.next is the only way to update params_dict with correct values. Incrementing params.start won't work for Google Scholar Profiles, Google Maps, or YouTube.

# increment page
self.start += self.page_size

Google Scholar Profiles

The Google Scholar Profiles API has pagination.next_page_token instead of serpapi_pagination.next.

pagination.next is a next page URI like https://serpapi.com/search.json?after_author=0QICAGE___8J&engine=google_scholar_profiles&hl=en&mauthors=label%3Asecurity where after_author is set to next_page_token.

Google Maps

In Google Maps Local Results API there's only serpapi_pagination.next with a URI like https://serpapi.com/search.json?engine=google_maps&ll=%4040.7455096%2C-74.0083012%2C14z&q=Coffee&start=20&type=search

YouTube

In YouTube Search Engine Results API there's serpapi_pagination.next_page_token similar to Google Scholar Profiles. serpapi_pagination.next is a URI with sp parameter set to next_page_token.

@jvmvik What do you think about parsing serpapi_pagination.next in Pagination#__next__?

- self.start += self.page_size
+ self.client.params_dict.update(dict(parse.parse_qsl(parse.urlsplit(result['serpapi_pagination']['next']).query)))

Here's an example of endless pagination of Google Scholar Authors (scraped 190 pages and manually stopped).
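
A sketch of what parsing serpapi_pagination.next could look like inside Pagination.__next__; the helper name and the fallback to the pagination key are assumptions for illustration:

from urllib.parse import parse_qsl, urlsplit

def apply_next_page(client, result):
    """Copy the query parameters of the `next` URL into the client's params."""
    pagination = result.get('serpapi_pagination') or result.get('pagination') or {}
    next_url = pagination.get('next')
    if not next_url:
        raise StopIteration
    client.params_dict.update(dict(parse_qsl(urlsplit(next_url).query)))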

google scholar pagination not returning final results page

I am using the pagination method with the google_scholar engine to return all results for a search term. When I use a for loop to iterate the pagination and put the results in a list, it doesn't return the final page of results, instead stopping at the penultimate page (code snippet and terminal output below).

import serpapi
import os
from loguru import logger
from dotenv import load_dotenv

load_dotenv()

search_string = '"Singer Instruments" PhenoBooth'

# Pagination allows iterating through all pages of results
logger.info("Initialising search through serpapi")
search = serpapi.GoogleSearch(
    {
        "engine": "google_scholar",
        "q": search_string,
        "api_key": os.getenv("SERPAPI_KEY"),
        "as_ylo": 1900,
    }
)
pages = search.pagination(start=0, page_size=20)

# get dict for each page of results and store in list
results_list = []
page_number = 1
for page in pages:
    logger.info(f"Retrieving results page {page_number}")
    results_list.append(page)
    page_number += 1

gscholar_results = results_list[0]["search_information"]["total_results"]
print(f"results reported by google scholar: {gscholar_results}")

paper_count = 0
for page in results_list:
    for paper in page["organic_results"]:
        paper_count += 1

print(f"number of papers in results: {paper_count}")

Screenshot 2021-07-30 at 17 05 11

If I check my searches on serpapi.com, results are being generated for all pages (see the example below). So the problem is not that the results aren't generated; they're just not coming out of the pagination iterator for some reason.

Screenshot 2021-07-30 at 17 06 21

Exception not handled on SerpApiClient.get_json

I am experiencing unexpected behaviour when running thousands of queries. For some reason, the API sometimes returns an empty response. It happens at random (perhaps 1 time out of 10,000).

When this happens, the SerpApiClient.get_json method does not handle the empty response, so json.loads() raises a JSONDecodeError.

I attach an image to clarify the issue.


It seems to be a problem with the API service. I'm not sure whether it should be solved with exception handling, by handling status code 204 (empty response), or whether there is a bug on the servers.

To reproduce the exception:

import json
json.loads('')

Do you recommend any guidelines for handling the problem in the meantime, while you review the issue in the source code?

Thanks.
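
Until the client handles empty responses itself, a thin wrapper is one way to cope. A sketch; the retry count and delay are arbitrary:

import json
import time
from serpapi import GoogleSearch

def get_json_safe(params, retries=3, delay=2):
    """Return the parsed response, retrying when the API answers with an empty body."""
    for attempt in range(retries):
        try:
            return GoogleSearch(params).get_json()
        except json.JSONDecodeError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)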

How to get "related articles" links from google scholar via serpapi?

I am using SerpApi to fetch Google Scholar papers. There is always a link called "related articles" under each article, but SerpApi doesn't appear to expose any URL for fetching the data behind those links.

Screenshot 2022-07-14 at 3 05 07 AM

Serp API result :

Screenshot 2022-07-14 at 3 15 16 AM

Can I directly call this URL https://scholar.google.com/scholar?q=related:gemrYG-1WnEJ:scholar.google.com/&scioq=Multi-label+text+classification+with+latent+word-wise+label+information&hl=en&as_sdt=0,21 using SerpApi?

Python package should not include tests

When installing via pip, the installation includes the tests directory:

mkdir deps
python3 -m pip install --target deps "google-search-results==2.4.2"
ls -1 deps/tests

Outputs:

__init__.py
__pycache__
test_account_api.py
(etc)

Tests should be excluded.
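
The usual fix is to exclude the tests package when building the distribution. A setuptools sketch, assuming the project's setup.py uses find_packages:

from setuptools import setup, find_packages

setup(
    name="google-search-results",
    packages=find_packages(exclude=["tests", "tests.*"]),  # keep tests out of the sdist/wheel
)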

google scholar pagination skips result 20

When retrieving results from Google Scholar using the pagination() method, the first article on the second page of google scholar is always missing.

I think this is caused by the following snippet in the update() method of google-search-results-python/serpapi/pagination.py:

def update(self):
    self.client.params_dict["start"] = self.start
    self.client.params_dict["num"] = self.num
    if self.start > 0:
        self.client.params_dict["start"] += 1

This seems to mean that for all pages except the first, paginate increases start by 1. So for the first page it requests results starting at 0 and ending at 19 (if page_size=20), but for the second page it requests results starting at 21 and ending at 40, skipping result 20.

If I delete the if statement, the code seems to work as intended and I get result 19 back.
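
For reference, this is what update() looks like without the off-by-one branch; a sketch of the suggested change, not the released code:

def update(self):
    # Pass the zero-based offset straight through: the second page then
    # starts at result 20 (with page_size=20) instead of 21.
    self.client.params_dict["start"] = self.start
    self.client.params_dict["num"] = self.num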

KeyError when Calling Answer Box

I've attempted to get the results from the answer box using the documentation here.

I noticed the Playground does not return these results either.

Is there any way to get this URL also?

Output Returned when Attempting to Run the Sample Provided:

from serpapi import GoogleSearch

params = {
  "q": "What's the definition of transparent?",
  "hl": "en",
  "gl": "us",
  "api_key": ""
}

search = GoogleSearch(params)
results = search.get_dict()
answer_box = results['answer_box']
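
Since Google does not return an answer box for every query, guarding the lookup avoids the KeyError. A small sketch (the API key is left empty as in the sample):

from serpapi import GoogleSearch

params = {
  "q": "What's the definition of transparent?",
  "hl": "en",
  "gl": "us",
  "api_key": ""
}

results = GoogleSearch(params).get_dict()
answer_box = results.get("answer_box")  # None when no answer box is returned
if answer_box is None:
    print("No answer box for this query")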

Different results from serpapi (Google Trends) versus Google Trends site

I'm having an issue (which is causing a serious headache / project issue for us) where the results from serpapi are different from those returned by Google Trends when querying the website directly.

Below is a simple code snippet to reproduce:

from serpapi import GoogleSearch
import pandas as pd

PARAMS = {'engine': 'google_trends',
          'data_type' : 'RELATED_QUERIES',
          'q' : "health insurance",
          'geo': "IE",
          'date' : "2022-01-01 2022-12-31",
          'hl' : 'en-GB',
          'csv' : True,
          'api_key' : '[Key]'}

search = GoogleSearch(PARAMS) 
results = search.get_dict() 

rel = results['related_queries']['top']
df = pd.DataFrame(rel)

df[["query", "value"]]

df.to_csv("serpapi_results.csv")

Attached is a screenshot showing the difference in results, along with both CSV files:

Also attached is a screenshot of the same query on the Google Trends site.

serpapi_results.csv
relatedQueries_google_results.csv

Can you let me know if this is a known issue or if I've made some mistake in my API call?
Thanks,
Ronan

[Discuss] Wrapper longer response times caused by some overhead/additional processing

@jvmvik this issue is for discussion.

I'm not 100% sure what the cause is, but there might be some overhead or additional processing in the wrapper that causes longer response times. Or is it as it should be? Let me know if that's the case.

Results of making 50 requests:

Direct requests to serpapi.com/search.json: ~7.192448616027832 seconds
Requests to serpapi.com through the API wrapper: ~135.2969319820404 seconds
Async batch requests with Queue: ~24.80349826812744 seconds

Making a direct request to serpapi.com/search.json:

import aiohttp
import asyncio
import os
import json
import time

async def fetch_results(session, query):
    params = {
        'api_key': '...',
        'engine': 'youtube',
        'device': 'desktop',
        'search_query': query,
        'no_cache': 'true'
    }
    
    url = 'https://serpapi.com/search.json'
    async with session.get(url, params=params) as response:
        results = await response.json()

    data = []

    if 'error' in results:
        print(results['error'])
    else:
        for result in results.get('video_results', []):
            data.append({
                'title': result.get('title'),
                'link': result.get('link'),
                'channel': result.get('channel').get('name'),
            })

    return data

async def main():
    # 50 queries
    queries = [
        'burly',
        'creator',
        'doubtful',
        'chance',
        'capable',
        'window',
        'dynamic',
        'train',
        'worry',
        'useless',
        'steady',
        'thoughtful',
        'matter',
        'rotten',
        'overflow',
        'object',
        'far-flung',
        'gabby',
        'tiresome',
        'scatter',
        'exclusive',
        'wealth',
        'yummy',
        'play',
        'saw',
        'spiteful',
        'perform',
        'busy',
        'hypnotic',
        'sniff',
        'early',
        'mindless',
        'airplane',
        'distribution',
        'ahead',
        'good',
        'squeeze',
        'ship',
        'excuse',
        'chubby',
        'smiling',
        'wide',
        'structure',
        'wrap',
        'point',
        'file',
        'sack',
        'slope',
        'therapeutic',
        'disturbed'
    ]

    data = []

    async with aiohttp.ClientSession() as session:
        tasks = []
        for query in queries:
            task = asyncio.ensure_future(fetch_results(session, query))
            tasks.append(task)

        start_time = time.time()
        results = await asyncio.gather(*tasks)
        end_time = time.time()

        data = [item for sublist in results for item in sublist]

    print(json.dumps(data, indent=2, ensure_ascii=False))
    print(f'Script execution time: {end_time - start_time} seconds') # ~7.192448616027832 seconds

asyncio.run(main())

Same code but using the wrapper YoutubeSearch (not 100% sure if valid comparison):

import aiohttp
import asyncio
from serpapi import YoutubeSearch
import os
import json
import time

async def fetch_results(session, query):
    params = {
        'api_key': '...',
        'engine': 'youtube',
        'device': 'desktop',
        'search_query': query,
        'no_cache': 'true'
    }
    search = YoutubeSearch(params)
    results = search.get_json()

    data = []

    if 'error' in results:
        print(results['error'])
    else:
        for result in results.get('video_results', []):
            data.append({
                'title': result.get('title'),
                'link': result.get('link'),
                'channel': result.get('channel').get('name'),
            })

    return data

async def main():
    queries = [
        'burly',
        'creator',
        'doubtful',
        'chance',
        'capable',
        'window',
        'dynamic',
        'train',
        'worry',
        'useless',
        'steady',
        'thoughtful',
        'matter',
        'rotten',
        'overflow',
        'object',
        'far-flung',
        'gabby',
        'tiresome',
        'scatter',
        'exclusive',
        'wealth',
        'yummy',
        'play',
        'saw',
        'spiteful',
        'perform',
        'busy',
        'hypnotic',
        'sniff',
        'early',
        'mindless',
        'airplane',
        'distribution',
        'ahead',
        'good',
        'squeeze',
        'ship',
        'excuse',
        'chubby',
        'smiling',
        'wide',
        'structure',
        'wrap',
        'point',
        'file',
        'sack',
        'slope',
        'therapeutic',
        'disturbed'
    ]

    data = []

    async with aiohttp.ClientSession() as session:
        tasks = []
        for query in queries:
            task = asyncio.ensure_future(fetch_results(session, query))
            tasks.append(task)
        
        start_time = time.time()
        results = await asyncio.gather(*tasks)
        end_time = time.time()

        data = [item for sublist in results for item in sublist]

    print(json.dumps(data, indent=2, ensure_ascii=False))
    print(f'Script execution time: {end_time - start_time} seconds') # ~135.2969319820404 seconds

Using async batch requests with Queue:

from serpapi import YoutubeSearch
from urllib.parse import (parse_qsl, urlsplit)
from queue import Queue
import os, re, json
import time

# 50 queries
queries = [
    'burly',
    'creator',
    'doubtful',
    'chance',
    'capable',
    'window',
    'dynamic',
    'train',
    'worry',
    'useless',
    'steady',
    'thoughtful',
    'matter',
    'rotten',
    'overflow',
    'object',
    'far-flung',
    'gabby',
    'tiresome',
    'scatter',
    'exclusive',
    'wealth',
    'yummy',
    'play',
    'saw',
    'spiteful',
    'perform',
    'busy',
    'hypnotic',
    'sniff',
    'early',
    'mindless',
    'airplane',
    'distribution',
    'ahead',
    'good',
    'squeeze',
    'ship',
    'excuse',
    'chubby',
    'smiling',
    'wide',
    'structure',
    'wrap',
    'point',
    'file',
    'sack',
    'slope',
    'therapeutic',
    'disturbed'
]

search_queue = Queue()

for query in queries:
    params = {
        'api_key': '...',                 
        'engine': 'youtube',              
        'device': 'desktop',              
        'search_query': query,          
        'async': True,                   
        'no_cache': 'true'
    }

    search = YoutubeSearch(params)       # where data extraction happens
    results = search.get_dict()          # JSON -> Python dict
    
    if 'error' in results:
        print(results['error'])
        break

    print(f"Add search to the queue with ID: {results['search_metadata']}")
    search_queue.put(results)

data = []

start_time = time.time()

while not search_queue.empty():
    result = search_queue.get()
    search_id = result['search_metadata']['id']

    print(f'Get search from archive: {search_id}')
    search_archived = search.get_search_archive(search_id)
    
    print(f"Search ID: {search_id}, Status: {search_archived['search_metadata']['status']}")

    if re.search(r'Cached|Success', search_archived['search_metadata']['status']):
        for video_result in search_archived.get('video_results', []):
            data.append({
                'title': video_result.get('title'),
                'link': video_result.get('link'),
                'channel': video_result.get('channel').get('name'),
            })
    else:
        print(f'Requeue search: {search_id}')
        search_queue.put(result)
        
print(json.dumps(data, indent=2))
print('All searches completed')

execution_time = time.time() - start_time
print(f'Script execution time: {execution_time} seconds') # ~24.80349826812744 seconds
