Comments (25)
Can you share the command line you used so I can reproduce it? Also, I'm not sure what you mean by web driver.
from bulk-bing-image-downloader.
By web driver, I mean that when we use the selenium library for scraping, we need to provide chromedriver to it. You are only using urllib here, so maybe something like that can be provided for urllib as well?
python bbid.py -f ./birds_name_bbid_new -o ~/dataset/images_bbid/ --limit 950
here is the file birds_name_bbid_new...
https://drive.google.com/open?id=1QnIZC--d4JoWEJzcHIoOX0uSbJ3zE6ns
from bulk-bing-image-downloader.
It works for me. What OS do you use?
from bulk-bing-image-downloader.
Ubuntu 18.04 LTS
from bulk-bing-image-downloader.
Are you able to download more than 100 images?
from bulk-bing-image-downloader.
Yep. There are 4202 images. Please check the exit code of the command (via echo $? after bbid finishes).
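For anyone who prefers to capture the exit code from Python rather than with echo $?, here is a minimal sketch. The helper name run_and_report is hypothetical and not part of bbid; it only wraps the standard subprocess module.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run a command and return its exit code (0 conventionally means success)."""
    completed = subprocess.run(cmd)
    return completed.returncode


# Stand-in command that exits with status 1, so the sketch runs anywhere.
# To check bbid itself, pass your own argument list instead, for example
# [sys.executable, "bbid.py", "-f", "<keyword file>", "-o", "<output dir>"].
code = run_and_report([sys.executable, "-c", "import sys; sys.exit(1)"])
print("exit code:", code)
```

A non-zero value here means the same thing as a non-zero echo $?: the process reported a failure even if nothing was printed.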
from bulk-bing-image-downloader.
I just ran the script again. It does not throw any error, but the exit code is still 1!
from bulk-bing-image-downloader.
Can you show me the output of ulimit -n? Afterwards, increase this limit via ulimit -n 1024 and see if it changes anything on the next run.
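The same open-file limit can be inspected from inside Python with the standard resource module (Unix only). This is a rough sketch, not something bbid does itself, and the function names are made up for illustration. A change made this way applies only to the current Python process and its children, much like ulimit applies to the current shell.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def set_soft_limit(target):
    """Set the soft limit to target, capped at the hard limit, and return the new value."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


soft, hard = open_file_limits()
print("soft limit:", soft, "hard limit:", hard)
```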
from bulk-bing-image-downloader.
it's 1024
from bulk-bing-image-downloader.
Maybe you can also play with the --threads option.
from bulk-bing-image-downloader.
Yes, but I do not have a very high-speed internet connection (although it's stable), so I think 20 is enough for me.
Which OS do you test this script on?
from bulk-bing-image-downloader.
I mean reduce it, not increase it. I developed it on Ubuntu, but now I use macOS.
from bulk-bing-image-downloader.
Sure, reducing the thread count might help. I will get back to you after the next run...
from bulk-bing-image-downloader.
Tried it, still the same issue...
from bulk-bing-image-downloader.
Have you also tried using only 1 or 2 threads? If that doesn't make a difference, perhaps use strace as a last resort to see what's going on.
from bulk-bing-image-downloader.
Yes, I've used 1 thread as well. Now I'm thinking of changing the script a little to include selenium and see how it works.
from bulk-bing-image-downloader.
selenium is very slow, but I guess in your case it would be better than just stopping entirely
from bulk-bing-image-downloader.
yes
from bulk-bing-image-downloader.
Related #15
I would need some help from somebody who can troubleshoot this. I can't fix it if I can't reproduce it.
from bulk-bing-image-downloader.
Sure, I'll help you troubleshoot. I think you'll be able to reproduce it if you give the script a large keyword file. Once the run ends, check the output directory: it will neither have downloaded images for all the keywords in the input file nor created directories for all of them.
You can use the input file I uploaded earlier to try to reproduce it.
I'll try to debug this weekend, see what's happening, and let you know. I haven't had time to troubleshoot yet.
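To make the "missing keywords" check repeatable, something like the sketch below could list which keywords ended up without a directory or with an empty one. It assumes the input file has one keyword per line and that each keyword gets a subdirectory named exactly after it; both are assumptions about the layout, and missing_keywords is a hypothetical helper, not part of bbid.

```python
from pathlib import Path


def missing_keywords(keyword_file, output_dir):
    """Return keywords whose output subdirectory is absent or contains no files.

    Assumes one keyword per line and one subdirectory per keyword,
    named exactly like the keyword (an assumption, adjust as needed).
    """
    lines = Path(keyword_file).read_text(encoding="utf-8").splitlines()
    keywords = [line.strip() for line in lines if line.strip()]
    out = Path(output_dir)
    missing = []
    for keyword in keywords:
        target = out / keyword
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(keyword)
    return missing
```

Running it after a download and posting the resulting list would show whether the failures cluster at the end of the file or are spread across it.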
from bulk-bing-image-downloader.
I actually did use the exact same file you posted here, with the same CLI arguments, and it downloads over 1000 images for me.
from bulk-bing-image-downloader.
for all the keywords?
from bulk-bing-image-downloader.
Indeed, no. Can you check whether increasing the timeout on this line helps?
Bulk-Bing-Image-downloader/bbid.py
Line 77 in 2c858cf
from bulk-bing-image-downloader.
@ostrolucky the script stops downloading after nearly 500 images even though I set the limit to 2000. Do you know how I can download more images?
(base) mona@mona:~/research/Bulk-Bing-Image-downloader$ ./bbid.py -s 'cat' --limit 2000
from bulk-bing-image-downloader.
You need to make sure there actually are more than 2000 unique images on Bing first. After that, you might want to experiment with changing the sleep line of code I posted; it's a Bing issue.
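The referenced line of bbid.py is not reproduced in this thread, so the following is only a generic sketch of the idea of combining a timeout with a pause between retries when fetching with urllib. It is not bbid's code; fetch_with_retry and its parameters (timeout, retries, delay) are made up for illustration, and the opener and sleep arguments exist only so the function can be exercised without network access.

```python
import socket
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, timeout=10.0, retries=3, delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying on timeouts or URL errors with a growing pause."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except (urllib.error.URLError, socket.timeout) as exc:
            last_error = exc
            if attempt < retries - 1:
                # Wait longer after each failure before trying again.
                sleep(delay * (attempt + 1))
    raise last_error
```

Larger timeout and delay values trade speed for a better chance of completing on a slow or rate-limited connection.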
from bulk-bing-image-downloader.