Describe the bug
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='duckduckgo.com', port=443): Read timed out. (read timeout=3.0)
The exception is raised on line 26 of idt/duckgo.py, inside search():
    def search(self):
        URL = 'https://duckduckgo.com/'
        PARAMS = {'q': self.data}
        HEADERS = {
            'authority': 'duckduckgo.com',
            'accept': 'application/json, text/javascript, */*; q=0.01',
            'sec-fetch-dest': 'empty',
            'x-requested-with': 'XMLHttpRequest',
            'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36',
            'sec-fetch-site': 'same-origin',
            'sec-fetch-mode': 'cors',
            'referer': 'https://duckduckgo.com/',
            'accept-language': 'en-US,en;q=0.9'}
        res = requests.post(URL, data=PARAMS, timeout=3.000)  # exception occurs here after timeout is exhausted
        search_object = re.search(r'vqd=([\d-]+)\&', res.text, re.M|re.I)
        #print(search_object)
        if not search_object:
            return -1;
To Reproduce
Reproducing with the valid URL (URL = 'https://duckduckgo.com/') can take a while because it depends on the request actually timing out, but an invalid URL also makes the request fail with an unhandled exception.
Steps to reproduce the behavior:
- Change line 26 in idt/duckgo.py to URL = 'https://duckduckgozzzzzzz1238971873 .com/'
- The exception is not handled, and downloading stops instead of retrying a set number of times.
Expected behavior
requests.exceptions.ReadTimeout should be handled: the request should be retried a set number of times, and if it still fails, the download should move on to the next keyword or class.
Currently we catch the error like this, starting from line 26 in duckgo.py:
    cur_req_num = 0
    max_req_num = 500
    while True:
        try:
            res = requests.post(URL, data=PARAMS, timeout=3.000)
            search_object = re.search(r'vqd=([\d-]+)\&', res.text, re.M|re.I)
            #print(search_object)
            if not search_object:
                # No vqd token in the response: count it as a failed attempt
                cur_req_num += 1
                print(f"Attempt {cur_req_num}\nRequest failed for {URL}. Retrying again!")
                if cur_req_num >= max_req_num:
                    print(f"Max request({max_req_num}) to {URL} reached. Moving to next keyword if any.")
                    return -1
                continue
            break
        except Exception as e:
            # Any exception (including ReadTimeout) also counts as a failed attempt
            cur_req_num += 1
            print(f"Attempt {cur_req_num}\nException {e} occurred for {URL}. Retrying again!")
            if cur_req_num >= max_req_num:
                print(f"Max request({max_req_num}) to {URL} reached. Moving to next keyword if any.")
                return -1
There should be a better way than this.
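One possible direction, as a rough sketch rather than a finished patch: move the retry logic into a small helper that makes a bounded number of attempts, waits a little longer between each one, and only catches the exception types it is told to retry. The name retry_call and its parameters (max_attempts, base_delay, retry_on, sleep) are made up for illustration and are not part of idt or requests.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    func: Callable[[], Optional[T]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call func until it returns a non-None value or attempts run out.

    Returns None when every attempt failed, so the caller can move on
    to the next keyword. Exceptions not listed in retry_on propagate.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = func()
            if result is not None:
                return result
            reason = "no usable result"
        except retry_on as exc:
            reason = repr(exc)
        print(f"Attempt {attempt}/{max_attempts} failed: {reason}")
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return None


# Hypothetical use inside search(), keeping the original request and regex:
#
# def fetch_token():
#     res = requests.post(URL, data=PARAMS, timeout=3.0)
#     return re.search(r'vqd=([\d-]+)\&', res.text, re.M | re.I)
#
# search_object = retry_call(
#     fetch_token, retry_on=(requests.exceptions.RequestException,))
# if search_object is None:
#     return -1
```

Catching requests.exceptions.RequestException (the base class of ReadTimeout) instead of a bare Exception would let real bugs surface instead of being retried, and a much smaller attempt limit than 500 with a growing delay would avoid hammering the server when it is down. The defaults here are placeholders to tune.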