anthonyhseb / googlesearch
Python library for scraping Google search results
License: MIT License
Describe what you were trying to get done.
Tell us what happened, what went wrong, and what you expected to happen.
$ ./go
from: too many arguments
./go: line 2: syntax error near unexpected token `('
./go: line 2: `response = GoogleSearch().search("something")'
googlesearch/googlesearch/googlesearch.py
Line 26 in 099550d
has a prefetch_pages
parameter. The readme ( https://github.com/anthonyhseb/googlesearch/blob/3dc96936ab857c3173ecb4c7a4856e11a2d346f5/README.rst#features ) mentions a prefetch_results
parameter that does not appear to exist.
I was trying to work out why I get 503 errors so often (worse than with manual searches) and ran into this mismatch.
Traceback (most recent call last):
File "c:\Abhi\installer\Anaconda3\Lib\site-packages\googlesearch\tempCodeRunnerFile.py", line 1, in
urllib.request
NameError: name 'urllib' is not defined
File "c:\Abhi\installer\Anaconda3\Lib\site-packages\googlesearch\googlesearch.py" response = search.search(query, count)
File "c:\Abhi\installer\Anaconda3\Lib\site-packages\googlesearch\googlesearch.py", line 39, in search
totalText = soup.select(GoogleSearch.TOTAL_SELECTOR)[0].children.next().encode('utf-8')
PS C:\Users\Abhishek Singh\Desktop\Automate Tasks> python -u "c:\Users\Abhishek Singh\Downloads\test2.py"
Traceback (most recent call last):
File "c:\Users\Abhishek Singh\Downloads\test2.py", line 2, in
response = GoogleSearch().search("something")
File "C:\Abhi\installer\Anaconda3\lib\site-packages\googlesearch\googlesearch.py", line 39, in search
ch1 = soup.select(GoogleSearch.TOTAL_SELECTOR)[0].children
IndexError: list index out of range
PS C:\Users\Abhishek Singh\Desktop\Automate Tasks> python -u "c:\Users\Abhishek Singh\Downloads\test2.py"
Traceback (most recent call last):
File "c:\Users\Abhishek Singh\Downloads\test2.py", line 2, in
response = GoogleSearch().search("something")
File "C:\Abhi\installer\Anaconda3\lib\site-packages\googlesearch\googlesearch.py", line 39, in search
totalText = soup.select(GoogleSearch.TOTAL_SELECTOR)[0].children.next().encode('utf-8')
IndexError: list index out of range
from googlesearch.googlesearch import GoogleSearch
response = GoogleSearch().search("something")
for result in response.results:
    print("Title: " + result.title)
    print("Content: " + result.getText())
Paste the command(s) you ran and the output.
If there was a crash, please include the traceback here.
For a couple of days now, search has not been working; this is due to a failing selector.
It would be nice if we could override the constants. Currently they are bound to the GoogleSearch class, so extending the class doesn't change the behaviour, because the class attributes are accessed as GoogleSearch.<attr> instead of self.<attr>.
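A minimal sketch (hypothetical class and attribute names) of the lookup problem described above: a class-qualified reference always resolves on that class, while a self-qualified one resolves on the instance's actual class, so only the latter can be overridden by a subclass.

```python
# Class-qualified vs self-qualified attribute lookup.
class Base:
    SELECTOR = "div.old"

    def via_class(self):
        return Base.SELECTOR   # ignores any subclass override

    def via_self(self):
        return self.SELECTOR   # picks up a subclass override


class Patched(Base):
    SELECTOR = "div.new"
```

Patched().via_class() still returns "div.old", while Patched().via_self() returns "div.new"; switching GoogleSearch to self-qualified lookups would make the constants overridable.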
Hi all. I'm using googlesearch in a project and recently received the error:
HTTP Error 429: Too Many Requests
Anybody else getting this? How did you resolve it? Do I need to switch to the custom search API?
I just installed google-search from PyPI for Python 3.5. Importing the library raises ImportError: No module named 'urllib2'. I think the package needs an update in this regard; this might help.
merose@Panther:~$ sudo -H pip3 install --upgrade google-search
Collecting google-search
Collecting lxml (from google-search)
Downloading lxml-4.1.0-cp35-cp35m-manylinux1_x86_64.whl (5.5MB)
100% |████████████████████████████████| 5.6MB 239kB/s
Requirement already up-to-date: beautifulsoup4 in /usr/local/lib/python3.5/dist-packages (from google-search)
Installing collected packages: lxml, google-search
Found existing installation: lxml 3.5.0
Uninstalling lxml-3.5.0:
Successfully uninstalled lxml-3.5.0
Successfully installed google-search-1.0.2 lxml-4.1.0
merose@Panther:~$ python3
Python 3.5.2 (default, Sep 14 2017, 22:51:06)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from googlesearch.googlesearch import GoogleSearch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/googlesearch/googlesearch.py", line 6, in <module>
import urllib2
ImportError: No module named 'urllib2'
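A common workaround (a sketch, not the library's actual code) is a compatibility import, since urllib2 was merged into urllib.request in Python 3:

```python
# Compatibility shim: prefer the Python 3 module, fall back to urllib2 on
# Python 2. Code below can then call urllib2.urlopen(...) on either version.
try:
    import urllib.request as urllib2  # Python 3
except ImportError:
    import urllib2                    # Python 2
```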
googlesearch.__version__ does not match: the install says google-search-1.1.1.
Simple search.
I get no results, no errors, and nothing.
sander@brixit:~$ python3 -m pip install google-search
Collecting google-search
Downloading google_search-1.1.1-py2.py3-none-any.whl (6.5 kB)
Requirement already satisfied: soupsieve in ./.local/lib/python3.8/site-packages (from google-search) (2.2.1)
Requirement already satisfied: lxml in ./.local/lib/python3.8/site-packages (from google-search) (4.6.2)
Requirement already satisfied: beautifulsoup4 in ./.local/lib/python3.8/site-packages (from google-search) (4.9.3)
Installing collected packages: google-search
Successfully installed google-search-1.1.1
sander@brixit:~$ python3
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from googlesearch.googlesearch import GoogleSearch
>>> response = GoogleSearch().search("something")
>>> response.results
[]
>>>
Same result on another machine:
>>> from googlesearch.googlesearch import GoogleSearch
>>> response = GoogleSearch().search("something")
>>> print(response.results)
[]
>>> response.
response.results response.total
>>> response.total
4
>>>
Am I doing something wrong?
It gives this error when I execute: ModuleNotFoundError: No module named 'urllib2'.
I'm trying to run this code, which I got from the library's page on pypi.org:
from googlesearch.googlesearch import GoogleSearch
response = GoogleSearch().search("something")
for result in response.results:
    print("Title: " + result.title)
    print("Content: " + result.getText())
Hi 👊
This is my first visit to this fine repo, but it seems you have been working hard to keep all dependencies updated so far.
Once you have closed this issue, I'll create separate pull requests for every update as soon as I find one.
That's it for now!
Happy merging! 🤖
Hi, I just ran the sample code and got the error message "Service Unavailable".
Could you please help me understand it? Has Google blocked it? Is there another way to load Google search results into Python?
Thank you very much!
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError: Service Unavailable
Hello,
How can I set the default search language?
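The library exposes no documented language option, but Google's results page honours the `hl` query parameter, so one workaround is to build (or patch) the search URL yourself. A sketch with a hypothetical helper:

```python
from urllib.parse import urlencode

# Hypothetical helper: build a Google search URL with an explicit interface
# language. `base` mirrors what GoogleSearch.SEARCH_URL points at, but the
# exact constant in the library may differ.
def build_search_url(query, hl="en", base="https://www.google.com/search"):
    return base + "?" + urlencode({"q": query, "hl": hl})
```

For example, build_search_url("something", hl="de") produces a URL containing q=something&hl=de.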
Is there planned support for Python 3?
Thanks,
Robert
When I try to run the snippet in the example, I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/Projects/PycharmProjects/webcrawler/venv/lib/python3.8/site-packages/googlesearch/browser_agents.txt'
gs = GoogleSearch()
results = gs.search(term, 20)
for result in results:
    print("Title: " + result.title)
    print("Content: " + result.getText())
    print()
The issue is in MANIFEST.in. By default, when building Python packages, non-.py files are not included. Adding
include googlesearch/browser_agents.txt
to MANIFEST.in should resolve the issue.
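For reference, the packaging change would look roughly like this (a sketch; with setuptools, include_package_data=True in setup() is the complementary switch that copies the file into installed packages, not just the sdist):

```
# MANIFEST.in
include googlesearch/browser_agents.txt
```

Alternatively, package_data={"googlesearch": ["browser_agents.txt"]} in setup() achieves the same for built distributions.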
Hi everyone :),
I have an input CSV file that provides the keywords I want to search. I set up my search to fetch the first two URLs, but I want to filter them: if a URL isn't good, I want to move on to the next one.
Is that possible with the google-search library?
Thank you
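The filtering itself isn't something google-search provides, but it's easy in plain Python. A sketch with hypothetical helper names, where the URL check is supplied by the caller:

```python
import csv

def keywords_from_csv(path, column=0):
    """Read one keyword per row from the given CSV column."""
    with open(path, newline="") as f:
        return [row[column] for row in csv.reader(f) if row]

def first_good_url(urls, is_good):
    """Return the first URL accepted by is_good, or None if none pass."""
    for url in urls:
        if is_good(url):
            return url
    return None
```

For each keyword you could then run GoogleSearch().search(keyword, num_results=2) as elsewhere in this thread, collect [r.url for r in response.results], and pass that list to first_good_url with your own predicate.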
Always getting HTTP Error (429?)
File "C:\Users\hp\Documents\pythonPractice\googleSearchResult.py", line 9, in
response = GoogleSearch().search("okayu",num_results=5,prefetch_pages=False,num_prefetch_threads=10)
File "C:\Users\hp\anaconda3\lib\site-packages\googlesearch\googlesearch.py", line 57, in search
with closing(opener.open(GoogleSearch.SEARCH_URL +
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 563, in error
result = self._call_chain(*args)
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 755, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 563, in error
result = self._call_chain(*args)
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 755, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Users\hp\anaconda3\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError: Too Many Requests
from googlesearch.googlesearch import GoogleSearch
response = GoogleSearch().search("okayu",num_results=5,prefetch_pages=False,num_prefetch_threads=10)
for result in response.results:
    print("Title: " + result.title)
    print("Content: " + result.getText())
HTTPError: HTTP Error 500: Internal Server Error
from googlesearch.googlesearch import GoogleSearch
response = GoogleSearch().search("tacita dean", num_results=150)
for i, result in enumerate(response.results):
    if 'test' in result.url.lower():
        print(i, result.url)
        break
Hi, this documentation is a little thin. Could you please expand it?
Ali İlteriş Keskin
__version__ from __init__.py says 1.0.0?
As the title says, when using the default prefetch_pages argument and trying to get more than 10 results, the program feels like it is running forever!
Here is a screenshot with prefetch_pages=False:
Here is a screenshot with prefetch_pages=True; almost 3 minutes in, the program is still running.
Trying to get more than 10 results
I'm using a simple test script, barely modified from the example in the readme. I think the TOTAL_SELECTOR is missing from the results page, which seems to happen when Google displays "card" results (or whatever those are called). You can see an example with the query "danny elfman emmy award":
This could presumably be fixed by checking for the presence of the selector. However, I am interested in the total count as well, so I'm wondering if you have any ideas for another way to get that, for queries that trigger the "card" display.
Traceback (most recent call last):
File "google_search_test.py", line 21, in <module>
r = GoogleSearch().search(query, prefetch_pages=False)
File "/Users/abernstein/venvs/0/lib/python2.7/site-packages/googlesearch/googlesearch.py", line 39, in search
totalText = soup.select(GoogleSearch.TOTAL_SELECTOR)[0].children.next().encode('utf-8')
IndexError: list index out of range
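A defensive sketch of the guard suggested above (hypothetical helper name): check the selector result before indexing, and return None when the total is absent:

```python
def parse_total(snippets):
    """snippets: texts found by soup.select(TOTAL_SELECTOR); may be empty."""
    if not snippets:
        return None                      # "card" layout: no total shown
    digits = "".join(ch for ch in snippets[0] if ch.isdigit())
    return int(digits) if digits else None
```

Callers would then treat a None total as "unknown" instead of crashing with IndexError; it doesn't recover the count for "card" pages, but it keeps the result list usable.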
encoding error : input conversion failed due to input error, bytes 0x81 0x84 0x35 0x29
from googlesearch.googlesearch import GoogleSearch
response = GoogleSearch().search("tacita dean", num_results=50)
for i, result in enumerate(response.results):
    print(i, result.url)
I've been trying to create a way to handle the 429 error that comes up when google tries to throttle you, but no matter what I do the app still crashes. Any idea why or what I can do to handle it?
The code I'm using is:
try:
    k = search(query, num=1, stop=1, pause=20)
except urllib.error.HTTPError:
    print('failed')
Thanks,
Dan
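One way to handle the 429 without crashing is to retry with exponential backoff rather than a single try/except. A sketch with hypothetical names; `fetch` is whatever call raises on throttling, and the delay values are arbitrary:

```python
import time

def search_with_backoff(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with doubling delays; re-raise on final failure."""
    delay = base_delay
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:        # e.g. urllib.error.HTTPError with code 429
            if attempt == retries - 1:
                raise
            sleep(delay)
            delay *= 2           # exponential backoff
```

Even with backoff, Google will keep throttling aggressive scraping, so the Custom Search API remains the reliable option for sustained volume.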
response = GoogleSearch().search("something")
File "C:\ProgramData\Anaconda3\lib\site-packages\googlesearch\googlesearch.py", line 64, in search
totalText = soup.select(GoogleSearch.TOTAL_SELECTOR)[0].children.__next__()
IndexError: list index out of range