jmd_imagescraper's Introduction

Hello!

I'm Joe. I'm 47 and I'm a nerd. I've been a professional software developer, a nightclub DJ, and a chef, and I've run a couple of businesses.

I've been coding for over 30 years (C++, VB, ASP, .NET, C#, HTML, Oracle, SQL Server, Python). I'm currently learning data science, machine learning, deep learning, etc., and right now I'm focussed on fast.ai. This is what I want to do until I retire.

I am not actively applying for positions at the moment, as I'm concentrating my time on finishing the fast.ai course, but I'm open to conversation. If you wish to discuss a job in the London/Surrey area (or a remote working job), I'd be happy to hear from you.

:octocat: joedockrill.github.io/blog
🔗 linkedin.com/in/joe-dockrill/
📧 [email protected]


jmd_imagescraper's People

Contributors

butchland, joedockrill

jmd_imagescraper's Issues

Safe search off

I don't know if there is a parameter for turning safe search off (like "kp=-2" in the URL), but if there is, could you implement it?
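To illustrate the request above, here is a minimal sketch of appending "kp=-2" to a DuckDuckGo search URL. The function name build_search_url and its safe_search_off flag are hypothetical and not part of jmd_imagescraper, and whether the image search endpoint used by the library honours this parameter is an assumption that would need checking.

```python
from urllib.parse import urlencode

BASE_URL = "https://duckduckgo.com/"


def build_search_url(keywords, safe_search_off=False):
    """Build a search URL, optionally adding kp=-2 (safe search off).

    Hypothetical helper for illustration only.
    """
    params = {"q": keywords}
    if safe_search_off:
        # "kp=-2" is the value quoted in the issue for safe search off.
        params["kp"] = "-2"
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("red panda", safe_search_off=True))
```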

Add user agent when getting search token

I had a cron job scraping images from DuckDuckGo which suddenly stopped with the following error:

Traceback (most recent call last):
  File "/path/to/jmd_imagescraper/core.py", line 211, in duckduckgo_search
    links = duckduckgo_scrape_urls(keywords, max_results, img_size, img_type, img_layout, img_color)
  File "/path/to/jmd_imagescraper/core.py", line 82, in duckduckgo_scrape_urls
    assert match is not None, "Failed to obtain search token"
AssertionError: Failed to obtain search token

This was caused by DuckDuckGo returning an invalid response for https://duckduckgo.com/418.htm, so the library was unable to obtain the token. I also saw other users of your library suddenly hit that error, but they did not report it.

This issue can easily be solved by adding any user agent to that initial request. Here is my suggestion:

resp = requests.post(BASE_URL, data=params, headers={'user-agent': 'my-cool-user-agent/1.0.0'})

I tested that and it worked again.
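The suggestion above could be sketched roughly as follows, using only the standard library so it runs without extra dependencies. The function names, the user agent string, and the token regex are assumptions made for illustration; they are not taken from the library's source, and DuckDuckGo may change its page format at any time.

```python
import re
import urllib.parse
import urllib.request

BASE_URL = "https://duckduckgo.com/"
# Hypothetical identifier; any descriptive user agent string should do.
USER_AGENT = "my-cool-user-agent/1.0.0"
# Assumption: the token appears in the page as "vqd=<digits and dashes>&".
TOKEN_PATTERN = re.compile(r"vqd=([\d-]+)&")


def build_token_request(keywords):
    """Build the initial POST request with an explicit User-Agent header."""
    data = urllib.parse.urlencode({"q": keywords}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL, data=data, headers={"User-Agent": USER_AGENT}, method="POST"
    )


def extract_token(html):
    """Return the search token, or raise a descriptive error if it is missing."""
    match = TOKEN_PATTERN.search(html)
    if match is None:
        raise RuntimeError("Failed to obtain search token")
    return match.group(1)


def fetch_token(keywords, timeout=10):
    """Send the request and parse the token out of the response body."""
    request = build_token_request(keywords)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_token(resp.read().decode("utf-8", errors="replace"))
```

Raising an exception rather than using assert keeps the check active even when Python runs with optimizations enabled, and lets a cron job catch and log the failure.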

Feeding display_image_cleaner() path not working

I'm not sure what's going on. I can't get the image cleaner to work within my script. I've fed it an absolute path, a pathlib path from the cwd, and an os cwd path after changing directory, and no matter what, I get: TypeError: expected str, bytes or os.PathLike object, not NoneType

Full stack trace of the error:

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-8-88fd12b35813> in <module>
    553     print(imgdir, defaultprint=True)
    554     print("Image Management Browser:")
--> 555     display_image_cleaner(imgdir)
    556 
    557 

4 frames

/usr/local/lib/python3.7/dist-packages/jmd_imagescraper/imagecleaner.py in display_image_cleaner(path)
    187     icln_create_widgets(ICLN_BATCH_SZ)
    188     _,_,_,_,_,ddlFolder,_ = icln_pager.children
--> 189     icln_render_batch(ddlFolder.value, 0)

/usr/local/lib/python3.7/dist-packages/jmd_imagescraper/imagecleaner.py in icln_render_batch(folder, batch, force_reload)
    131 
    132   if(folder == "/"): folder = ""
--> 133   path = icln_base_path/folder
    134 
    135   if((icln_folder != folder) or (force_reload)):

/usr/lib/python3.7/pathlib.py in __truediv__(self, key)
    923 
    924     def __truediv__(self, key):
--> 925         return self._make_child((key,))
    926 
    927     def __rtruediv__(self, key):

/usr/lib/python3.7/pathlib.py in _make_child(self, args)
    702 
    703     def _make_child(self, args):
--> 704         drv, root, parts = self._parse_args(args)
    705         drv, root, parts = self._flavour.join_parsed_parts(
    706             self._drv, self._root, self._parts, drv, root, parts)

/usr/lib/python3.7/pathlib.py in _parse_args(cls, args)
    656                 parts += a._parts
    657             else:
--> 658                 a = os.fspath(a)
    659                 if isinstance(a, str):
    660                     # Force-cast str subclasses to str (issue #21127)

TypeError: expected str, bytes or os.PathLike object, not NoneType

Output:

/content/drive/MyDrive/AI/Stable_Diffusion/images_out/time_to_stabilize
Image Management Browser:

As you can see, the path is correct, and leads to the images folder, but for some reason it always thinks the path isn't a string or bytes.
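Reading the traceback above, the failing expression is icln_base_path/folder, and pathlib's __truediv__ receives the None as its key argument. That suggests the None is the folder value taken from the ddlFolder dropdown rather than the path passed to display_image_cleaner(). One possible cause, which is an assumption and not confirmed here, is that the directory has no subfolders for the dropdown to list. A minimal sketch of a guard follows; join_folder and list_subfolders are hypothetical helpers, not part of jmd_imagescraper.

```python
from pathlib import Path


def list_subfolders(base_path):
    """Return the names of the immediate subfolders of base_path, sorted."""
    return sorted(p.name for p in Path(base_path).iterdir() if p.is_dir())


def join_folder(base_path, folder):
    """Join base_path and folder, failing early with a clearer message.

    Hypothetical helper: Path(base_path) / None raises a TypeError, so the
    None case is checked explicitly before joining.
    """
    if folder is None:
        raise ValueError(
            f"No folder selected under {base_path}; "
            "does it contain any subfolders?"
        )
    if folder == "/":
        folder = ""
    return Path(base_path) / folder
```

Checking list_subfolders() on the directory before launching the cleaner would show whether there is anything for the dropdown to select.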

Search results different from what I see in browser

First of all, great repo!

I tried this out, but almost all of the downloaded images are different from the images I see in the browser for the same query. What could be the cause of this?
The downloaded images are relevant to the query, just not what I see in the browser.
