Comments (11)
@shobrook I have bypassed the bot detection. If it still occurs, the user is prompted with a link to Stack Overflow to whitelist themselves. It works fine. You can check my forked repository.
Fork: https://github.com/hotheadhacker/rebound
from rebound.
Having the same issue here. I pip installed it just yesterday but I'm getting the above-mentioned error.
Hey all, I'm aware of this issue. It seems that Stack Overflow has gotten stricter about bot detection and is doing a captcha check every time rebound makes a request. One solution is to use the StackExchange API instead of a web scraper, but this would require rebound users to register for an API token. It would also require a refactor of rebound.py. Please let me know if you have any other ideas or would like to work on this.
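For anyone weighing the API route, here is a minimal sketch of what a search call could look like. It assumes the public Stack Exchange API 2.3 `search/advanced` endpoint; the function names `build_search_url` and `search_questions` are made up for illustration and are not part of rebound.py. The `key` parameter is treated as optional here, on the assumption that anonymous requests are accepted with a smaller quota.

```python
import gzip
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.stackexchange.com/2.3"


def build_search_url(error_message, api_key=None, page_size=5):
    """Build a search/advanced URL for an error message (hypothetical helper)."""
    params = {
        "order": "desc",
        "sort": "relevance",
        "q": error_message,
        "site": "stackoverflow",
        "pagesize": page_size,
    }
    if api_key:
        # Assumption: the key only raises the request quota and is optional.
        params["key"] = api_key
    return API_ROOT + "/search/advanced?" + urllib.parse.urlencode(params)


def search_questions(error_message, api_key=None):
    """Return (title, link) pairs for matching questions. Needs network access."""
    url = build_search_url(error_message, api_key)
    with urllib.request.urlopen(url, timeout=10) as response:
        raw = response.read()
        # The API compresses its responses; only the gzip case is handled here.
        if response.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return [(item["title"], item["link"]) for item in json.loads(raw)["items"]]
```

Calling `search_questions("IndexError: list index out of range")` would replace the scraping of the search results page; fetching the answers themselves would need further calls and is not shown.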
It would take a lot of time to switch from a web scraper to an API module, and the API call quota won't be enough and will become a bottleneck.
Why don't we use a more advanced web scraper?
@hotheadhacker Like what?
Let me fork this and explain
@shobrook I think I know what the problem is. Making many requests in a short amount of time, with a random user agent for every request, will trigger the captcha every time. I suggest using a single user agent per answer search and randomizing it only when the program is run (or just using a fixed UA, but that isn't a very good idea). I don't know if it will fix it, but it certainly is a step in the right direction. Another option is to use the user's UA from the default browser, so the requests don't differ from normal browsing. I will look into it later and try to fix it. For now, I changed the UA to be randomized only when the program is run, fixed some minor anti-pattern issues and cleaned up the code.
You can check my fork here: https://github.com/cristicretu/rebound
I will also try using a unique user agent: Google's Googlebot user agent (https://developers.google.com/search/blog/2019/10/updating-user-agent-of-googlebot). It has sometimes fixed the captcha issues.
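To make the "randomize once per run" idea concrete, here is a rough sketch using only the standard library. The two user-agent strings and the names `SESSION_USER_AGENT` and `build_request` are placeholders for illustration; this is not the code from either fork.

```python
import random
import urllib.request

# Placeholder list; the real list in rebound.py may differ.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
]

# Picked once when the module is loaded, so every request made during
# this run of the program presents the same user agent.
SESSION_USER_AGENT = random.choice(USER_AGENTS)


def build_request(url):
    """Create a request that reuses the per-run user agent (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": SESSION_USER_AGENT})
```

Every call to `build_request` within one run then carries the same header, which is the opposite of picking a new random UA per request.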
Thanks @cristicretu and @hotheadhacker. It seems like the user-agents are the issue here. Is there a reason why we can't just remove the list of user-agents and use the user's default agent when making the request to SO?
Is there a reason why we can't just remove the list of user-agents and use the user's default agent when making the request to SO?
That is the only solution, I think. Getting the user's default agent is a little bit tricky, but I will try to do it.
My idea is to use webbrowser to open a tab where you can get the UA, then pass it to the script and continue. This should happen only on the first run, after which the UA is stored and reused.
Do you have another idea? @shobrook
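The "ask once, then store it" part could look roughly like the sketch below. It leaves out how the UA is actually obtained from the browser: `ask_user_agent` is a stand-in callable (for example, one that opens a page showing the UA and lets the user paste it back), and the JSON layout and the name `load_user_agent` are assumptions for illustration.

```python
import json
from pathlib import Path


def load_user_agent(config_path, ask_user_agent):
    """Return the stored user agent, asking for it only on the first run.

    ask_user_agent is a hypothetical callable that returns the UA string.
    """
    path = Path(config_path)
    if path.exists():
        # Later runs: reuse what was stored the first time.
        return json.loads(path.read_text(encoding="utf-8"))["user_agent"]

    # First run: obtain the UA and persist it for next time.
    user_agent = ask_user_agent()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"user_agent": user_agent}), encoding="utf-8")
    return user_agent
```

A call such as `load_user_agent(Path.home() / ".rebound" / "config.json", lambda: input("Paste your user agent: "))` would prompt only once; the path is just an example.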
Hi @shobrook, I somehow managed to get past the captcha, but the approach has some dependencies.
Workflow:
- Run the current script
- If captcha page comes up, try to solve the captcha
- If the captcha only requires ticking a checkbox, it will pass. If an advanced captcha shows up, the user is redirected to manual verification in Chrome
This depends on starting Google Chrome in debugging mode first and using the Selenium web driver to interact with the captcha, which may cause issues depending on the device and platform. With this method, though, I find that once the captcha has been solved it does not occur again until Chrome in debugging mode is restarted. In the best case, the captcha does not show up after a Chrome restart either.
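The fallback branch of the workflow above (hand the page to the user when a captcha appears) can be sketched without Selenium. The Chrome debugging-mode part is not shown. The marker strings used to detect a captcha page are guesses, not taken from Stack Overflow's actual markup, and `looks_like_captcha` and `handle_page` are hypothetical names.

```python
import webbrowser

# Assumption: a captcha page mentions one of these phrases somewhere in its HTML.
CAPTCHA_MARKERS = ("captcha", "human verification")


def looks_like_captcha(html):
    """Heuristic check for a captcha page (markers are guesses)."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def handle_page(html, url, open_in_browser=webbrowser.open):
    """Return the HTML if it is usable; otherwise send the user to verify manually."""
    if looks_like_captcha(html):
        # Open the same URL in the default browser so the user can solve the captcha.
        open_in_browser(url)
        return None
    return html
```

The caller would retry the request after `handle_page` returns `None`. Passing `open_in_browser` as a parameter keeps the function easy to exercise without launching a real browser.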
Hi guys, did you come to any conclusions? Do you need help fixing the issue? I'm having the same problem, and since I thought the idea of the project was very good, I wanted to see it work...