
image_search's Introduction

Image Search Python Package


A simple Python package for saving images from Bing and Google without API keys. It drives a web browser to scrape the images returned by web searches.

This should be used for educational and personal purposes only. I am not responsible for any issues that may arise from scraping such sources. All images are copyrighted and owned by their respective owners; I do not claim any ownership.

Ensure you have the appropriate version of ChromeDriver on your machine if you would like to scrape from Google Images.
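One way to check this before starting a Google scrape is sketched below. It assumes the driver binary is named chromedriver and is reachable through PATH; the helper name chromedriver_version is hypothetical and not part of image_search.

```python
import shutil
import subprocess


def chromedriver_version(executable="chromedriver"):
    """Return the driver's version banner, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # ChromeDriver prints its version when called with --version.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = chromedriver_version()
    if version is None:
        print("chromedriver was not found on PATH")
    else:
        print("Found:", version)
```

Compare the reported version with the installed Chrome version; the two generally need to match.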

Usage

usage: image_search [-h] [--limit LIMIT] [--json] [--url URL]
                [--adult-filter-off]
                engine query

Example: Google Images

image_search google cat --limit 10 --json

This will download 10 cat images and metadata from Google Images.

Example: Bing Images

image_search bing dog --limit 10 --json

This will download 10 dog images and metadata from Bing Images.

FAQs

How can I download a specific number of images?

  • Use the --limit flag to set the number of images you want to download.

How do I search with custom filters on Google Images?

  • Use the --url flag to pass your own search URL that already contains the filter.
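A filtered URL for the --url flag could be built as in the sketch below. The tbm=isch and tbs=isz:l parameters are assumptions about Google's own URL scheme (image search, large images only), are not documented by this package, and may change. The function name google_images_url is hypothetical.

```python
from urllib.parse import urlencode


def google_images_url(query, tbs=None):
    """Build a Google Images search URL, optionally with a tbs filter string."""
    params = {"q": query, "tbm": "isch"}  # tbm=isch is assumed to select image search
    if tbs:
        params["tbs"] = tbs  # e.g. "isz:l" is assumed to mean large images
    return "https://www.google.com/search?" + urlencode(params)


print(google_images_url("cat", tbs="isz:l"))
# https://www.google.com/search?q=cat&tbm=isch&tbs=isz%3Al
```

The printed URL can then be passed on the command line, for example: image_search google cat --url "https://www.google.com/search?q=cat&tbm=isch&tbs=isz%3Al"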

Disclaimer

This program lets you scrape/download many images from Bing and Google. Please do not download any image in violation of its copyright terms. Google Images and Bing Images are merely search engines that index images and allow you to find them. Neither Google nor Bing produces these images, so neither holds the copyright on any of them. The original creators of the images own the copyrights.

Images published in the United States are automatically copyrighted by their owners, even if they do not explicitly carry a copyright warning. You may not reproduce copyrighted images without their owner's permission, except in "fair use" cases. You could risk lawyers' warnings, cease-and-desist letters, and copyright suits. Please be careful, and make sure you are not violating any laws!

image_search's People

Contributors

arthurg, rushilsrivastava


image_search's Issues

Python 3 import error

Issue Template

Stacktrace:

Traceback (most recent call last):
  File "/home/cabox/workspace/images/image_search/bin/image_search", line 9, in <module>
    load_entry_point('image-search==0.0.1', 'console_scripts', 'image_search')()
  File "/home/cabox/workspace/images/image_search/lib/python3.4/site-packages/pkg_resources.py", line 351, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/cabox/workspace/images/image_search/lib/python3.4/site-packages/pkg_resources.py", line 2363, in load_entry_point
    return ep.load()
  File "/home/cabox/workspace/images/image_search/lib/python3.4/site-packages/pkg_resources.py", line 2088, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/home/cabox/workspace/images/image_search/lib/python3.4/site-packages/image_search/console.py", line 5, in <module>
    import _bing
ImportError: No module named '_bing'

Running Python 3.4; this seems to be an import error.
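A likely cause is that Python 3 no longer supports implicit relative imports, so a bare import _bing inside a package module only works on Python 2. The sketch below builds a throwaway package (the name demo_search and its contents are hypothetical stand-ins, not the real image_search sources) to show that the explicit form from . import _bing resolves on Python 3.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Create a tiny package whose console module imports a sibling module."""
    pkg = Path(root) / "demo_search"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "_bing.py").write_text("ENGINE = 'bing'\n")
    # Explicit relative import: works on Python 3, unlike a bare "import _bing".
    (pkg / "console.py").write_text(
        "from . import _bing\n\n"
        "def main():\n"
        "    return _bing.ENGINE\n"
    )
    return pkg


with tempfile.TemporaryDirectory() as root:
    build_demo_package(root)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        console = importlib.import_module("demo_search.console")
        result = console.main()
    finally:
        sys.path.remove(root)

print(result)  # bing
```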

URL Filename for Image File

Hey, is it possible to use the image's URL as the file name instead of "Scraper_1", "Scraper_2", etc.?
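As a rough sketch of how a URL could be turned into a file name, the helper below keeps the host and path and replaces characters that are unsafe on common file systems. The function filename_from_url is hypothetical and is not something image_search provides; the scraper would still need to be changed to call it.

```python
import re
from urllib.parse import unquote, urlparse


def filename_from_url(url, max_length=100):
    """Derive a file-system-safe name from an image URL (query string dropped)."""
    parsed = urlparse(url)
    raw = parsed.netloc + unquote(parsed.path)
    # Collapse every run of characters outside a conservative safe set into "_".
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return safe[:max_length] or "image"


print(filename_from_url("https://example.com/pets/My%20Cat.jpg?size=large"))
# example.com_pets_My_Cat.jpg
```

Very long URLs are truncated to max_length, which can cut off the extension, and two URLs that differ only in their query string map to the same name.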

Is there a way to not return the JSON file for each image?

I am downloading lots of images and want to be able to simply copy and paste the folder of images by themselves after scraping them. I don't want to have to remove each JSON file manually. Is there a flag that stops a .json file from being written for each image?
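The README examples pass --json to request metadata, so leaving that flag out may already be enough; that has not been verified here. As a fallback, a small cleanup script could delete the metadata files after a run. The sketch below assumes the metadata files end in .json and sit somewhere under the download folder; remove_json_sidecars is a hypothetical helper.

```python
from pathlib import Path


def remove_json_sidecars(folder):
    """Delete every .json file under folder and return how many were removed."""
    removed = 0
    for path in Path(folder).rglob("*.json"):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed


if __name__ == "__main__":
    # "dataset" is the output folder name mentioned in the issues; adjust as needed.
    print(remove_json_sidecars("dataset"), "JSON files removed")
```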

Error installing requirements

Could not find a version that satisfies the requirement urllib (from -r requirements.txt (line 3)) (from versions: )
No matching distribution found for urllib (from -r requirements.txt (line 3))
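In Python 3, urllib ships with the standard library, so pip has nothing to install for it. One possible workaround is to drop that entry from requirements.txt before installing. The sketch below filters a list of requirement lines; the sample lines are invented for illustration and are not the project's actual requirements file.

```python
import re


def drop_stdlib_requirements(lines, stdlib_names=("urllib",)):
    """Return requirement lines with standard-library module names removed."""
    kept = []
    for line in lines:
        # Take the bare package name before any version specifier or extras.
        name = re.split(r"[<>=!~\[; ]", line.strip(), maxsplit=1)[0].lower()
        if name in stdlib_names:
            continue
        kept.append(line)
    return kept


sample = ["selenium", "requests>=2.0", "urllib", "urllib3"]  # hypothetical contents
print(drop_stdlib_requirements(sample))
# ['selenium', 'requests>=2.0', 'urllib3']
```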

Integrate a Webdriver Manager to make it easier for users to use Google Images Scraping

Currently, you need to have Chromedriver in your path to run the script. I would like to integrate a webdriver manager, possibly this one, to make this easier to operate. This will download Chromedriver for a user that doesn't have it.

This is an issue in the first place because Chromedriver, like other web drivers, is a binary file that depends on the OS and the Chrome version.

Alternatively, this also works with Geckodriver, and since Google Scraping isn't browser dependent, Geckodriver should be a fallback option.
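The fallback idea could look roughly like the sketch below, which picks the first driver binary found on PATH. It does not download anything and is not a webdriver manager; the function name first_available_driver and the candidate order are assumptions for illustration.

```python
import shutil


def first_available_driver(candidates=("chromedriver", "geckodriver"),
                           which=shutil.which):
    """Return (name, path) for the first driver found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return name, path
    return None


if __name__ == "__main__":
    found = first_available_driver()
    if found is None:
        print("No supported web driver found on PATH")
    else:
        print("Using %s at %s" % found)
```

The which parameter is injectable so the selection logic can be tested without any driver installed.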

Encoding Issue

Charmap encoding errors occur while dumping scraped data into 'dataset/logs/google/source.html'. Passing an explicit encoding when opening the file with open() might help.
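A minimal sketch of that suggestion follows. Without an explicit encoding, open() uses the platform default, which on some Windows setups is a charmap codec that cannot represent every character in a scraped page. The helper name save_page_source is hypothetical.

```python
def save_page_source(html, path):
    """Write scraped HTML to path as UTF-8, regardless of the platform default."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)


if __name__ == "__main__":
    save_page_source("<p>caf\u00e9 \u2713</p>", "source.html")
```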

Not downloading

[%] Successfully launched Chrome Browser
[%] Successfully opened link.
[%] Scrolling down.
[%] Successfully clicked 'Show More Button'.
[%] Reached end of Page.
Traceback (most recent call last):
  File "google_scraper.py", line 120, in <module>
    source = search(url)
  File "google_scraper.py", line 55, in search
    f = open('dataset/logs/google/source.html', 'w+')
FileNotFoundError: [Errno 2] No such file or directory: 'dataset/logs/google/source.html'
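The traceback suggests that the dataset/logs/google directory does not exist when open() is called; opening a file in 'w+' mode creates the file but not its parent directories. A possible fix, sketched under that assumption, is to create the directories first. The helper name open_log_file is hypothetical.

```python
import os


def open_log_file(path):
    """Create any missing parent directories, then open path for writing."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return open(path, "w+", encoding="utf-8")


if __name__ == "__main__":
    with open_log_file("dataset/logs/google/source.html") as handle:
        handle.write("<html></html>")
```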

download 0 images?

Issue Template

Please provide what type of issue this is (i.e. bug, suggestion, etc.), your stacktrace (if applicable), and steps taken to try to fix or avoid this bug/issue (if applicable).

I tried to search Google for "cat" with a limit of 10 images:

tumh@tumh-Predator-G9-592:~/images$ PATH=$PATH:/home/tumh/images image_search google cat --limit 10 --json
/usr/local/lib/python2.7/dist-packages/pyOpenSSL-19.1.0-py2.7.egg/OpenSSL/crypto.py:12: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in a future release.
  from cryptography import x509

===============================================

[%] Successfully launched ChromeDriver
[%] Successfully opened link.
/usr/local/lib/python2.7/dist-packages/selenium-4.0.0a6.post2-py2.7.egg/selenium/webdriver/remote/webdriver.py:592: UserWarning: find_element_by_* commands are deprecated. Please use find_element() instead
  warnings.warn("find_element_by_* commands are deprecated. Please use find_element() instead")
[%] Scrolling down.
/usr/local/lib/python2.7/dist-packages/selenium-4.0.0a6.post2-py2.7.egg/selenium/webdriver/remote/webdriver.py:392: UserWarning: find_element_by_* commands are deprecated. Please use find_element() instead
  warnings.warn("find_element_by_* commands are deprecated. Please use find_element() instead")
[%] Reached end of Page.
[%] Closed ChromeDriver.
[%] Indexed 0 Possible Images.

===============================================

[%] Getting Image Information.


[%] Done. Downloaded 0 images.

===============================================

any idea?

Cannot create dataset folder

When I run the Bing or Google scrape, the dataset folder is not created. After the download finishes, there is no dataset folder.

ModuleNotFoundError: No module named _bing

C:\Users\Calvi\Desktop>image_search google cat --limit 10 --json
Traceback (most recent call last):
  File "C:\Users\Calvi\AppData\Local\Programs\Python\Python38-32\Scripts\image_search-script.py", line 11, in <module>
    load_entry_point('image-search==0.0.1', 'console_scripts', 'image_search')()
  File "c:\users\calvi\appdata\local\programs\python\python38-32\lib\site-packages\pkg_resources\__init__.py", line 490, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "c:\users\calvi\appdata\local\programs\python\python38-32\lib\site-packages\pkg_resources\__init__.py", line 2855, in load_entry_point
    return ep.load()
  File "c:\users\calvi\appdata\local\programs\python\python38-32\lib\site-packages\pkg_resources\__init__.py", line 2446, in load
    return self.resolve()
  File "c:\users\calvi\appdata\local\programs\python\python38-32\lib\site-packages\pkg_resources\__init__.py", line 2452, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "c:\users\calvi\appdata\local\programs\python\python38-32\lib\site-packages\image_search\console.py", line 5, in <module>
    import _bing
ModuleNotFoundError: No module named '_bing'
