riddlerq / simple_image_download

132.0 7.0 55.0 25 KB

Python script that lets you auto download images from google images using tags

License: MIT License

Python 100.00%

simple_image_download's Issues

Progress Bar

I made a progress bar to show the progress of the keywords because I downloaded like a thousand pictures. I think this will be a great addition to your program.

Feature Request: An API to replace the spaces in the downloaded files and folders with underscores

I have to parse the file names and folder names generated by simple_image_download, and I would like to be able to pass a flag that replaces spaces with underscores, as that makes the output more consistent and easier to handle programmatically.
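Until such a flag exists, the renaming can be done after the download. The following is a minimal sketch, not part of the library: `underscore_name` and `underscore_tree` are hypothetical helper names, and the download folder is assumed to be an ordinary directory tree on disk.

```python
from pathlib import Path


def underscore_name(name):
    """Collapse each run of whitespace in a file or folder name to one underscore."""
    return "_".join(name.split())


def underscore_tree(root):
    """Rename every entry below root (root itself is left alone).

    Entries are processed deepest first, so a folder is only renamed
    after everything inside it has already been handled.
    """
    paths = sorted(Path(root).rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in paths:
        new_name = underscore_name(path.name)
        if new_name != path.name:
            path.rename(path.with_name(new_name))
```

Calling `underscore_tree` on the folder that holds the downloads would turn a folder named `green apples` into `green_apples`, along with the files inside it.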

problem with exception handling, keywords input and headers dict

I've been testing the library and found 3 important problems.
1- Several different exceptions cause the program to stop. This could be fixed by handling the possible exceptions raised by the requests and urllib libraries.
As a temporary solution, in simple_image_download.py, under def check_webpage(url):, below
    try:
        request = requests.get(url, allow_redirects=True, timeout=10)
        if 'html' not in str(request.content):
            checked_url = request
add this exception handler:
    except requests.exceptions.RequestException as e:
        print("requests exception:", url)
        pass
This fixes the exception problems and lets the code continue its execution even if an error occurs while fetching pages or downloading files.

2- Passing the download function keywords that contain spaces causes the program to use only the first word before the space as the keyword and to drop the rest of the string.
As a temporary solution, in simple_image_download.py, under def generate_search_url(keywords):, change the line
    keywords_to_search = [str(item).strip() for item in keywords.split(',')][0].split()
to
    keywords_to_search = keywords.split(',')

3- Google does not respond to some requests. Based on the tests I performed, it helps to change the contents of the HEADERS = { dictionary in simple_image_download.py to:
    'User-Agent': "Mozilla/5.0 (Windows; U; Windows NT 6.1; WOW64) AppleWebKit/602.42 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36",
    "Accept-Encoding": "*",
    "Connection": "keep-alive"
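The idea behind point 1 can be sketched without depending on the library itself. This is only an illustration under stated assumptions: it swaps in the standard library's urllib for requests so that it is self-contained, and this `check_webpage` is a stand-in written for the example, not the library's own function. With requests, the equivalent catch would be the `requests.exceptions.RequestException` handler quoted above.

```python
import http.client
import urllib.request


def check_webpage(url, timeout=10):
    """Return the response body, or None if the fetch fails or the body looks like HTML.

    Stand-in for illustration only: any network or URL error is reported
    and swallowed, so a loop over many URLs keeps running.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
    except (OSError, ValueError, http.client.HTTPException) as exc:
        # OSError covers urllib.error.URLError, timeouts and refused connections;
        # ValueError covers strings that are not URLs at all.
        print("request failed:", url, exc)
        return None
    # Same heuristic as the snippet in point 1: skip anything that mentions 'html'.
    if b"html" in body:
        return None
    return body
```

A caller can then skip every `None` result and move on to the next URL instead of stopping at the first failure.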

Goes into infinite loop if the google image search "did not match any image results"

On line 37, if the search query does NOT return any results on Google Images, Python goes into an infinite loop.

```
     35 try:
     36     new_line = raw_html.find('"https://', end_object + 1)
---> 37     end_object = raw_html.find('"', new_line + 1)
     38
     39     buffor = raw_html.find('\\', new_line + 1, end_object)

KeyboardInterrupt:
```

Maybe find the string "did not match any image results" in html file first and raise error?
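Another option is to stop as soon as `str.find` reports that nothing is left, since it returns -1 when the substring is absent. The sketch below assumes the loop from the traceback scans the page for quoted `https://` strings; `extract_links` is a hypothetical name, not a function of the library.

```python
def extract_links(raw_html, limit):
    """Collect up to `limit` quoted https:// strings from raw_html.

    The loop ends when str.find returns -1, so a page without any
    matching link yields an empty list instead of looping forever.
    """
    links = []
    position = 0
    while len(links) < limit:
        start = raw_html.find('"https://', position)
        if start == -1:
            break  # no further opening quote followed by https://
        end = raw_html.find('"', start + 1)
        if end == -1:
            break  # unterminated string, nothing usable left
        links.append(raw_html[start + 1:end])
        position = end + 1
    return links
```

The caller can raise an error or print a message when the returned list is empty, which covers the "did not match any image results" case without searching for that exact text.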

cant download.

HTTPSConnectionPool(host='upload.wikimedia.org', port=443): Max retries exceeded with url: /wikipedia/commons/thumb/7/71/2010-kodiak-bear-1.jpg/1200px-2010-kodiak-bear-1.jpg (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000001A335181C88>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

Kindly help me out with this issue.
Thank you.

No support for gif

When I run this code, it only downloads files like this one:

But sometimes when I download a lot of images without typing '.gif', some gifs are downloaded with no problem.

from simple_image_download import simple_image_download as simp
response = simp.simple_image_download
response().download('cars', 10, '.gif' )

Unable to install and use python-magic-bin==0.4.14 for installation

When I try to pip install simple_image_download, it asks for python-magic-bin, but python-magic-bin fails to install.

Will repeatedly download the same images

This code seems to download the same n images over and over when I search for any term. That is, x_{i} = x_{i+nk}, where k is any natural number. For example, when I search for "eastern cottontail", I only get 84 unique images. This is a show stopper for me. Ideally, I would like the library to detect duplicate images and skip them.
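Exact duplicates could be filtered by hashing the downloaded bytes before saving them. This is a minimal sketch, not the library's behaviour: `DuplicateFilter` is a hypothetical helper, and it only catches byte-identical files, not resized or re-encoded copies of the same picture.

```python
import hashlib


class DuplicateFilter:
    """Remember the SHA-256 digest of every image seen so far."""

    def __init__(self):
        self._seen = set()

    def is_new(self, content):
        """Return True the first time these exact bytes are seen, False afterwards."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

A download loop would call `is_new(image_bytes)` and write the file only when it returns True.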

ERROR: Could not find a version that satisfies the requirement python-magic-bin==0.4.14 (from versions: none)

When I try to install on Ubuntu 20.04, I get the following errors:
ERROR: Could not find a version that satisfies the requirement python-magic-bin==0.4.14 (from versions: none)
ERROR: No matching distribution found for python-magic-bin==0.4.14

AttributeError: 'simple_image_download' object has no attribute 'search_urls'

Hi, thanks for this wonderful library. I tried downloading the images and the URLs for those images, but I got the error below:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in
      1 from simple_image_download import simple_image_download as simp
      2 response = simp.simple_image_download()
----> 3 response.search_urls('Circular economy', limit=5)
      4 for url in response.cache:
      5     print(url)

AttributeError: 'simple_image_download' object has no attribute 'search_urls'
```

I am trying this on Google Colab and also tried on Windows, but it gives the same error. Can you please help me fix this? Thanks in advance.

Extensions cannot accept more than one entry

There is a bug in the extensions parameter. I cannot pass more than one extension, because the code calls set([value]); if I pass a list of extensions as value, an error is thrown.
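The error is expected from plain Python: `set([value])` tries to store the list itself as a single element, and lists are unhashable, so a `TypeError` is raised. One possible fix, sketched here with a hypothetical `normalize_extensions` helper (not part of the library), accepts either a single string or any iterable of strings:

```python
def normalize_extensions(value):
    """Return a set of lowercase extensions, each with exactly one leading dot.

    Accepts a single string such as '.jpg' or an iterable such as
    ['jpg', '.PNG'].
    """
    if isinstance(value, str):
        value = [value]
    return {"." + ext.strip().lower().lstrip(".") for ext in value}
```

With this in place, callers that pass one extension and callers that pass several get the same kind of result.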

Only 33/100 queries returned

```
from simple_image_download import simple_image_download as simp

response = simp.simple_image_download
queries = candidates['scientificName'].tolist()
for query in queries:
    response().download(query, 1)
    print(response().urls(query, 1))
```

Does Google block an IP after so many calls? I have a for loop that attempts to get pictures for 100 queries, but only 33 are returned. Is there something I am missing?

Download stuck at 0%

The code worked fine under macOS 10 with Python 3.8, but under Windows 10 with the same Python version the download always gets stuck at 0%.

File size

Could you please add an option to set the size of images to be downloaded?

Unable to use "extensions" argument on linux.

Getting error TypeError: download() got an unexpected keyword argument 'extensions' when trying to run script on Linux (RPi OS). Error does not occur when running same code in Windows 11.

cannot search for "green apples"

The code splits the search string on spaces and creates a separate URL and search for every word. Searching for "green apples" gives a folder with images of "green" and a separate folder of (red) "apples".

You would think it could be solved by quoting, "'green apples'", but that causes the package to create a url to search every character in that phrase -- ', g, r, e, ...

This needs to be fixed so that the image search accepts anything that can be searched on images.google, e.g., " +'green apples' clipart ".
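A way to keep multi-word queries intact is to split only on commas and URL-encode each whole phrase. The sketch below is an assumption about how the query could be built, not the library's code: `split_keywords` and `build_search_url` are hypothetical names, and the exact search URL the library uses is not shown in this thread, so the one below is only a plausible Google Images form.

```python
from urllib.parse import quote_plus


def split_keywords(keywords):
    """Split on commas only, so 'green apples' stays a single query."""
    return [item.strip() for item in keywords.split(",") if item.strip()]


def build_search_url(keyword):
    """URL-encode the whole phrase; spaces become '+', quotes and '+' are escaped."""
    # Assumed URL shape, for illustration only.
    return "https://www.google.com/search?q=" + quote_plus(keyword) + "&tbm=isch"
```

Because the phrase is encoded as a whole, quotes and operators inside it reach the search engine unchanged instead of being split into single characters.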

Downloads corrupted and sometimes wrong images

So, half of the images I try to download end up 'unreadable'. And a lot of the time, they are not even in the search results. For example: I tried downloading on a day when Google had a special banner. It kept downloading the banner instead of the image. Any fix for this? Maybe updating the code?
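Part of the 'unreadable' files could be caught before saving by checking the first bytes of each download against well-known image signatures. This is a rough sketch with a hypothetical `detect_image_type` helper, covering only JPEG, PNG and GIF. It rejects HTML error pages saved under an image name, but it cannot detect truncated files or a valid image that is simply the wrong picture, such as the banner.

```python
# Leading bytes ("magic numbers") of three common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": ".jpg",
    b"\x89PNG\r\n\x1a\n": ".png",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
}


def detect_image_type(content):
    """Return the extension matching the leading bytes, or None if nothing matches."""
    for signature, extension in SIGNATURES.items():
        if content.startswith(signature):
            return extension
    return None
```

A download loop could discard any response for which this returns None and try the next URL instead.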
