riddlerq / simple_image_download
Python script that lets you auto download images from google images using tags
License: MIT License
I made a progress bar that shows progress across the keywords, because I was downloading around a thousand pictures. I think this would be a great addition to your program.
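A keyword-level progress bar along these lines can be built with only the standard library. This is a minimal sketch, not the submitted patch: `download_with_progress` and its `download_one` callback are hypothetical names, where `download_one` stands in for whatever downloads the images for one keyword (for example `response().download(keyword, limit)`).

```python
import sys
from typing import Callable, Iterable, TextIO


def download_with_progress(
    keywords: Iterable[str],
    download_one: Callable[[str], None],
    width: int = 30,
    stream: TextIO = sys.stdout,
) -> int:
    """Call download_one for each keyword and redraw a text progress bar.

    Returns the number of keywords processed.
    """
    keywords = list(keywords)
    total = len(keywords)
    for done, keyword in enumerate(keywords, start=1):
        download_one(keyword)
        filled = int(width * done / total)
        bar = "#" * filled + "-" * (width - filled)
        # "\r" returns to the start of the line so the bar is redrawn in place
        stream.write(f"\r[{bar}] {done}/{total}")
        stream.flush()
    stream.write("\n")
    return total
```

The third-party tqdm package gives the same effect by wrapping the keyword list, if an extra dependency is acceptable.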
I have to parse the file and folder names generated by simple_image_download, and I would like to be able to pass a flag that replaces spaces with underscores, since that makes the names more consistent and easier to handle programmatically.
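A sketch of what such a flag might do, assuming folder and file names are derived directly from the keyword; `make_name` and `replace_spaces` are hypothetical and not part of the library.

```python
import re


def make_name(keyword: str, replace_spaces: bool = False) -> str:
    """Build a folder/file name from a search keyword.

    replace_spaces is a hypothetical flag: when True, every run of
    whitespace becomes a single underscore.
    """
    name = keyword.strip()
    if replace_spaces:
        name = re.sub(r"\s+", "_", name)
    return name
```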
I've been testing the library and found three important problems.
1. Various exceptions stop the program. This could be fixed simply by handling the possible exceptions raised by the requests and urllib libraries.
As a temporary workaround, in simple_image_download.py, inside def check_webpage(url):, below

    try:
        request = requests.get(url, allow_redirects=True, timeout=10)
        if 'html' not in str(request.content):
            checked_url = request

add this exception handler:

    except requests.exceptions.RequestException as e:
        print("requests exception:", url)
This fixes the exception problem by letting the code continue executing even if an error occurs while fetching pages or downloading files.
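The same idea, sketched with the standard library's urllib instead of requests so that it runs without extra dependencies. This is an illustration, not the library's actual check_webpage: it keeps the original 'html' heuristic and returns None instead of raising when a fetch fails. With requests, catching requests.exceptions.RequestException plays the role of the exception tuple below.

```python
import http.client
import urllib.request
from typing import Optional


def check_webpage(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Fetch url and return its body, or None if it looks like HTML or fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            content = response.read()
    except (OSError, http.client.HTTPException, ValueError) as exc:
        # OSError covers URLError, HTTPError, timeouts and connection resets;
        # ValueError covers malformed URLs. Log and skip instead of crashing.
        print("skipping", url, "->", exc)
        return None
    # Same heuristic as the original: an HTML page is not an image file
    if b"html" in content:
        return None
    return content
```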
2. Passing the download function multiple keywords that contain spaces makes the program use only the first word before the space as the keyword and drop the rest of the string.
As a temporary workaround, in simple_image_download.py, under def generate_search_url(keywords):, change the line

    keywords_to_search = [str(item).strip() for item in keywords.split(',')][0].split()

to

    keywords_to_search = keywords.split(',')
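The difference between the two lines can be checked directly. The helper names below are hypothetical; `keywords_after` also strips the space that `keywords.split(',')` leaves at the start of every item after a comma, which is an addition to the suggested fix.

```python
from typing import List


def keywords_before(keywords: str) -> List[str]:
    # Original line: keeps only the FIRST comma-separated item,
    # then splits that item on whitespace into separate keywords
    return [str(item).strip() for item in keywords.split(',')][0].split()


def keywords_after(keywords: str) -> List[str]:
    # Split on commas only, strip surrounding spaces, and drop empty items
    return [item.strip() for item in keywords.split(',') if item.strip()]
```

For "green apples, red cars", the original line yields the two keywords "green" and "apples" and loses "red cars" entirely.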
3. Google does not respond to some requests. Based on some tests I ran, it helps to change the contents of the HEADERS dictionary in simple_image_download.py to:

    HEADERS = {
        'User-Agent': "Mozilla/5.0 (Windows; U; Windows NT 6.1; WOW64) AppleWebKit/602.42 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36",
        "Accept-Encoding": "*",
        "Connection": "keep-alive",
    }
On line 37, if the search query returns no results on Google Images, Python goes into an infinite loop.
` 35 try:
36 new_line = raw_html.find('"https://', end_object + 1)
---> 37 end_object = raw_html.find('"', new_line + 1)
38
39 buffor = raw_html.find('\\', new_line + 1, end_object)
KeyboardInterrupt: `
Maybe check the HTML for the string "did not match any image results" first and raise an error?
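One way to stop the loop is to check the result of every str.find call, which returns -1 when nothing is found, and to look for the no-results message up front as suggested. This is a simplified sketch, not the library's parser (it ignores the backslash handling around `buffor`), and it assumes Google still uses the quoted phrase on empty result pages; `extract_urls` and `NO_RESULTS_MARKER` are hypothetical names.

```python
from typing import List

# Assumption: the phrase Google shows when an image search has no results
NO_RESULTS_MARKER = "did not match any image results"


def extract_urls(raw_html: str, limit: int) -> List[str]:
    """Collect up to `limit` quoted https:// URLs from raw_html.

    Every str.find result is checked for -1, so a page without matches
    ends the loop instead of restarting the search from the beginning.
    """
    if NO_RESULTS_MARKER in raw_html:
        raise ValueError("search returned no image results")
    urls = []
    end_object = -1  # so the first search starts at index 0
    while len(urls) < limit:
        new_line = raw_html.find('"https://', end_object + 1)
        if new_line == -1:
            break  # no more candidate URLs
        end_object = raw_html.find('"', new_line + 1)
        if end_object == -1:
            break  # unterminated string at the end of the page
        urls.append(raw_html[new_line + 1:end_object])
    return urls
```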
HTTPSConnectionPool(host='upload.wikimedia.org', port=443): Max retries exceeded with url: /wikipedia/commons/thumb/7/71/2010-kodiak-bear-1.jpg/1200px-2010-kodiak-bear-1.jpg (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000001A335181C88>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
Could you kindly help me with this issue?
Thank you.
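WinError 10060 is a connection timeout: the host did not answer in time. If the failure is intermittent, retrying with a growing delay may help; if the host is unreachable from that network (for example because of a firewall or proxy), retries will not. A minimal sketch with a hypothetical `with_retries` helper; the default `retry_on=(OSError,)` also covers requests' exceptions, since `requests.exceptions.RequestException` derives from IOError, an alias of OSError.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run action, retrying on retry_on exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide (e.g. skip this URL)
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    raise ValueError("attempts must be at least 1")
```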
When I run this code, it only downloads files like this one:
But sometimes, when I download a lot of images without specifying '.gif', some GIFs are downloaded without any problem.
from simple_image_download import simple_image_download as simp
response = simp.simple_image_download
response().download('cars', 10, '.gif' )
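If the goal is to keep only GIFs, one option is to filter candidate URLs by the extension of their path before downloading. The suffix is only a hint, since a server can return a different format than the name suggests, so checking the downloaded bytes is a useful second step. `filter_by_extension` is a hypothetical helper, not part of the library.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def filter_by_extension(urls: Iterable[str], extension: str) -> List[str]:
    """Keep only URLs whose path (ignoring any query string) ends with extension."""
    ext = extension.lower()
    if not ext.startswith("."):
        ext = "." + ext
    return [u for u in urls if urlparse(u).path.lower().endswith(ext)]
```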
When I try to pip install simple_image_download, it asks for python-magic-bin, but python-magic-bin fails to install.
This code seems to download the same n images over and over for any search term. That is, x_i = x_{i+nk} for any natural number k. For example, when I search for "eastern cottontail", I only get 84 unique images. This is a showstopper for me. Ideally, the library would detect duplicate images and skip them.
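Exact duplicates can be dropped by hashing each downloaded file and skipping digests already seen. A minimal sketch with a hypothetical `unique_images` helper; it only catches byte-identical files, so resized or re-encoded copies would need a perceptual hash instead. De-duplicating URLs before downloading is a cheaper first pass.

```python
import hashlib
from typing import Iterable, List, Set


def unique_images(blobs: Iterable[bytes]) -> List[bytes]:
    """Return blobs in order, skipping any whose SHA-256 digest was already seen."""
    seen: Set[str] = set()
    unique = []
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier download
        seen.add(digest)
        unique.append(blob)
    return unique
```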
When I try to install it on Ubuntu 20.04, I get these errors:
ERROR: Could not find a version that satisfies the requirement python-magic-bin==0.4.14 (from versions: none)
ERROR: No matching distribution found for python-magic-bin==0.4.14
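python-magic-bin appears to publish prebuilt wheels only for Windows and macOS, which is consistent with pip reporting "from versions: none" on Ubuntu; on Linux the usual route is python-magic together with the system libmagic library. If the package only needs to recognize common image formats, a small signature check in pure Python is one possible stand-in. This is a hypothetical sketch, not what the library does, and it covers only the formats listed.

```python
from typing import Optional

# Leading "magic" bytes of a few common image formats
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),  # short signature, so false positives are possible
]


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess an image format from its first bytes; None if unrecognized."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    # WebP: "RIFF", a 4-byte size field, then "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```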
Hi, thanks for this wonderful library. I tried downloading images and their URLs, but I got the error below:
`---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in
1 from simple_image_download import simple_image_download as simp
2 response = simp.simple_image_download()
----> 3 response.search_urls('Circular economy', limit=5)
4 for url in response.cache:
5 print(url)
AttributeError: 'simple_image_download' object has no attribute 'search_urls'`
I tried it on Google Colab and on Windows, and both give the same error. Can you please help me fix this? Thanks in advance.
There is a bug in the extensions parameter: I cannot pass more than one extension, because the code does set([value]), so passing a list of extensions as value raises an error.
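A sketch of how the parameter could accept either a single extension or several. Wrapping a string in a list works, but wrapping a list produces set([[...]]), and lists are unhashable, hence the error. `normalize_extensions` is a hypothetical helper; lower-casing and adding the leading dot are extra assumptions.

```python
from typing import Iterable, Set, Union


def normalize_extensions(value: Union[str, Iterable[str]]) -> Set[str]:
    """Return a set of lower-case extensions, each with a leading dot."""
    # A lone string must be wrapped; iterating it directly would yield characters
    items = [value] if isinstance(value, str) else list(value)
    result = set()
    for ext in items:
        ext = ext.lower()
        result.add(ext if ext.startswith(".") else "." + ext)
    return result
```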
`from simple_image_download import simple_image_download as simp
response = simp.simple_image_download
queries = candidates['scientificName'].tolist()
for query in queries:
response().download(query, 1)
print(response().urls(query, 1))`
Does Google block an IP after a certain number of calls? I have a for loop that requests pictures for 100 queries, but only 33 return results. Is there something I am missing?
The code worked fine on macOS 10 with Python 3.8, but on Windows 10 with the same Python version the download always gets stuck at 0%.
Could you please add an option to set the size of images to be downloaded?
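Until such an option exists, one workaround is to filter images by pixel size after download. The sketch below reads width and height from PNG and GIF headers only, using the standard library; `image_size` and `big_enough` are hypothetical helpers, and other formats (JPEG in particular) would need a real image library such as Pillow.

```python
import struct
from typing import Optional, Tuple


def image_size(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) for PNG or GIF data, else None."""
    # PNG: 8-byte signature, 4-byte chunk length, b"IHDR", then width/height (big-endian)
    if len(data) >= 24 and data.startswith(b"\x89PNG\r\n\x1a\n") and data[12:16] == b"IHDR":
        width, height = struct.unpack(">II", data[16:24])
        return width, height
    # GIF: 6-byte signature, then logical screen width/height (little-endian)
    if len(data) >= 10 and data[:6] in (b"GIF87a", b"GIF89a"):
        width, height = struct.unpack("<HH", data[6:10])
        return width, height
    return None


def big_enough(data: bytes, min_width: int, min_height: int) -> bool:
    """True if the image size is known and meets both minimums."""
    size = image_size(data)
    return size is not None and size[0] >= min_width and size[1] >= min_height
```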
I get the error TypeError: download() got an unexpected keyword argument 'extensions' when running the script on Linux (Raspberry Pi OS). The error does not occur when running the same code on Windows 11.
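Since the same code works on Windows 11, a likely cause (an assumption) is that the two machines have different versions of the package installed, one whose download() accepts extensions and one that does not. Running `pip show simple_image_download` on each machine shows the installed version; the hypothetical helper below checks at runtime whether a function accepts a given keyword argument.

```python
import inspect
from typing import Any, Callable


def accepts_keyword(func: Callable[..., Any], name: str) -> bool:
    """Return True if func can be called with name=... as a keyword argument."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword name
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```

On the installation that raises the TypeError, `accepts_keyword(response().download, "extensions")` would return False.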
The code splits strings on whitespace and creates a separate URL and search for every word. Searching for "green apples" gives a folder with images of "green" and a separate folder of (red) "apples".
You would think quoting would solve it ("'green apples'"), but that causes the package to create a URL to search for every character in the phrase: ', g, r, e, ...
This needs to be fixed so that the search accepts any query that can be typed into Google Images, e.g. " +'green apples' clipart ".
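A sketch of building the search URL from the whole query, encoded as a single parameter instead of being split into words. The per-character searches are consistent with code iterating over a string where it expects a list, though that is a guess. `image_search_url` is a hypothetical helper, and the `tbm=isch` parameter (Google's image-search switch) is an assumption about how the URL is formed, not the library's code.

```python
from urllib.parse import urlencode


def image_search_url(query: str) -> str:
    """Build a Google Images search URL for the whole query string.

    The query is percent-encoded as one value (spaces, quotes and '+'
    included) and is never split into words or characters.
    """
    return "https://www.google.com/search?" + urlencode({"q": query, "tbm": "isch"})
```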
About half of the images I try to download end up 'unreadable', and often they are not even from the search results. For example, I tried downloading on a day when Google had a special banner, and it kept downloading the banner instead of the images. Is there a fix for this, perhaps an update to the code?