cf-clearance-scraper's Introduction

Hi there 👋

  • 👀 I’m interested in the internet, computers, programming, cybersecurity, and gaming
  • 🌱 I’m currently learning HTML and CSS
  • 💬 Ask me about Python
  • ⚡ Fun fact: My favorite animals are cats and guinea pigs
  • 📫 You can reach me via Discord @ Xewdy#7378 or via email @ [email protected]

cf-clearance-scraper's People

Contributors

deepsource-io[bot], dependabot[bot], xewdy444

cf-clearance-scraper's Issues

Why can't it run in a Docker container?

The Docker image is the following one:

mcr.microsoft.com/playwright/python

You can try it, and you will get:

root@c274d227588f:/home# python main.py -u https://nowsecure.nl -f cookies.json -v
[13:19:00] [INFO] Launching headless browser...
[13:19:01] [INFO] Going to https://nowsecure.nl...
[13:19:02] [INFO] Solving cloudflare challenge [JavaScript]...
[13:19:30] [ERROR] Failed to retrieve cf_clearance cookie.

Doesn't click the checkbox

The driver does not detect the checkbox, so it never clicks it and fails. It also doesn't detect that the challenge is an interactive one that has to be clicked. This happens both with the Chrome driver and with the other one.

Failed to retrieve cf_clearance cookie.

I get this error message when trying to run main.py. I have Playwright 1.28 installed:

$ python3.9 main.py -u https://nowsecure.nl -f cookies.json -v
[23:27:32] [INFO] Launching headless browser...
[23:27:35] [INFO] Going to https://nowsecure.nl...
[23:27:35] [INFO] Solving cloudflare challenge [Managed]...
[23:28:18] [ERROR] Failed to retrieve cf_clearance cookie.

import Error as PlaywrightError

I got this error log:
Traceback (most recent call last):
  File "main.py", line 11, in <module>
    from playwright._impl._api_types import Error as PlaywrightError

I'm using Python 3.8. Could you tell me which version of Python is recommended?
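
The failing import reaches into `playwright._impl`, a private package whose module layout may change between releases. A minimal sketch of a more defensive import follows, assuming the public `playwright.sync_api` module exports `Error` in the installed release. `import_first` is a hypothetical helper for illustration, not part of the project.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module missing in this environment, try the next one
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    raise ImportError(f"none of {candidates!r} could be imported")


# Prefer the public location, then fall back to the private path from the traceback.
# The final builtins entry only keeps this sketch runnable without Playwright installed.
PlaywrightError = import_first([
    ("playwright.sync_api", "Error"),
    ("playwright._impl._api_types", "Error"),
    ("builtins", "Exception"),
])
```

If the first candidate resolves, the rest of the script can keep using the `PlaywrightError` name unchanged.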

Failed to retrieve the cf_clearance cookie on roobet.com

Hello, for some reason it detects the challenge on roobet.com but can't solve it. Any idea why?
C:\Users\Usuario\Downloads\CFScraperOB2>python main.py -u https://roobet.com/ -v
[17:18:10] Checking for cloudflare challenge...
[17:18:10] Cloudflare challenge detected. Fetching cf_clearance cookie...
[17:18:11] Launching headless browser...
[17:18:11] Going to https://roobet.com/...
[17:18:13] Failed to retrieve cf_clearance cookie.

Feature request

Is your feature request related to a problem? Please describe.
Hello, I need a cf_clearance cookie for each proxy I send requests through, but those requests don't trigger the challenge, so I can't get the cookies I need to keep my program working.

Describe the solution you'd like
Maybe a mode that, when active, makes every request receive a challenge.

Describe alternatives you've considered
I can do this using a network sniffer: once Cloudflare sees that traffic is being captured, it always throws the challenge. That doesn't happen through proxies, since the sniffer is not active there. I was wondering whether the program could disguise itself as a sniffer so that the challenge is thrown.

Additional context
.

failed on nhentai.net

  • OS: Ubuntu 22.04
  • Python 3.10.12
python3 main.py -v -f cookies.json https://nhentai.net/
[03:30:22] [INFO] Launching headless browser...
[03:30:23] [INFO] Going to https://nhentai.net/...
[03:30:24] [INFO] Solving Cloudflare challenge [Managed]...
[03:30:55] [ERROR] Failed to retrieve a Cloudflare clearance cookie.

help with user agent

So I'm trying to get the cf_clearance cookie from a site, and if I use the default user agent it gets the cookie with no issues.

However, it only works with Firefox user agents. If I try to use a Chrome user agent, it doesn't work.
The user agent I need to use is:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36

I have no issues passing the verification in the Chrome browser itself, so I don't know why this script only works with Firefox user agents.

All my scraping scripts are set to use a portable Chrome browser and driver, so it has to work with the above user agent.

Any advice?

Checkbox does not pass

In the most recent commit, the checkbox does not pass. I’ve investigated your code and found that it does not detect the Turnstile iframe.

Please check this.

[ERROR] Execution context was destroyed, most likely because of a navigation

But isn't it the normal behavior for CF checks? At least for me it reloads the page whether it lets me through or not even if I'm visiting CF-protected pages normally outside this tool. So it's expected that after passing the checks the page could be navigated away, because as far as I know, CF passes some data through the query string appended to the current URL.
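
One way to tolerate the reload described above is to retry the failed step once the new page has loaded. The following is only a sketch: `retry_on_navigation` and the `action` callable are hypothetical names, and matching on the message text is an assumption based on the error quoted in the title.

```python
import time

NAVIGATION_HINT = "Execution context was destroyed"


def retry_on_navigation(action, attempts=3, delay=0.5):
    """Call action(); retry if it fails with the navigation error message."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as err:
            # Re-raise anything unrelated, and give up after the last attempt.
            if NAVIGATION_HINT not in str(err) or attempt == attempts:
                raise
            time.sleep(delay)  # give the reloaded page a moment to settle
```

Any callable that performs the page interaction could be passed as `action`; errors with a different message propagate immediately.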

error

[10:10:21] [INFO] Solving Cloudflare challenge [Managed]...
[10:17:36] [ERROR] Timeout 15000ms exceeded.
=========================== logs ===========================
checking visibility of locator("#challenge-spinner")

[10:17:36] [ERROR] Failed to retrieve the Cloudflare clearance cookie.

User agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36

How can I fix this?

[ERROR]

(No Cloudflare challenge detected.) Hello Xewdy444, is there any way to resolve this?

help needed

I have a problem adding this solver to my bot. Would you please help me with it? I am facing a Turnstile/hCaptcha challenge on my target website.

'playwright' module missing

from playwright._impl._api_types import Error as PlaywrightError
ModuleNotFoundError: No module named 'playwright'

Am I missing an update?

playwright import error

Was using this just fine before but suddenly got the following error.

Command: py main.py -v -f cookies.json https://flipd.gg

Error:
Traceback (most recent call last):
  File "C:\Users\oSana\Desktop\Development\Python\Projects\FlipdBumper\CF-Clearance-Scraper\main.py", line 12, in <module>
    from playwright.sync_api import Frame, sync_playwright
  File "C:\Users\oSana\AppData\Local\Programs\Python\Python310\lib\site-packages\playwright\sync_api\__init__.py", line 25, in <module>
    import playwright.sync_api._generated
  File "C:\Users\oSana\AppData\Local\Programs\Python\Python310\lib\site-packages\playwright\sync_api\_generated.py", line 25, in <module>
    from playwright._impl._accessibility import Accessibility as AccessibilityImpl
  File "C:\Users\oSana\AppData\Local\Programs\Python\Python310\lib\site-packages\playwright\_impl\_accessibility.py", line 17, in <module>
    from playwright._impl._connection import Channel
  File "C:\Users\oSana\AppData\Local\Programs\Python\Python310\lib\site-packages\playwright\_impl\_connection.py", line 35, in <module>
    from pyee import EventEmitter
  File "C:\Users\oSana\AppData\Local\Programs\Python\Python310\lib\site-packages\pyee\__init__.py", line 120, in <module>
    from pyee.trio import TrioEventEmitter as TrioEventEmitter # noqa
  File "C:\Users\oSana\AppData\Local\Programs\Python\Python310\lib\site-packages\pyee\trio.py", line 7, in <module>
    import trio
  File "C:\Users\oSana\AppData\Local\Programs\Python\Python310\lib\site-packages\trio\__init__.py", line 19, in <module>
    from ._core import TASK_STATUS_IGNORED as TASK_STATUS_IGNORED # isort: skip
  File "C:\Users\oSana\AppData\Local\Programs\Python\Python310\lib\site-packages\trio\_core\__init__.py", line 9, in <module>
    from ._entry_queue import TrioToken
  File "C:\Users\oSana\AppData\Local\Programs\Python\Python310\lib\site-packages\trio\_core\_entry_queue.py", line 129, in <module>
    @attr.s(eq=False, hash=False, slots=True)
TypeError: attrs() got an unexpected keyword argument 'eq'
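
The last frame suggests that the installed `attrs` package predates the `eq` argument that `trio` passes to `attr.s`, so upgrading `attrs` is a likely fix. A small sketch for confirming this without going through the failing import chain is shown below. `accepts_keyword` is a hypothetical helper, and the check is skipped when `attrs` is not installed.

```python
import importlib
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


try:
    attr = importlib.import_module("attr")
except ImportError:
    attr = None  # attrs is not installed here, nothing to check

if attr is not None:
    print("attr.s accepts eq:", accepts_keyword(attr.s, "eq"))
```

If this reports that `eq` is not accepted, the traceback above is what one would expect from that environment.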

Playwright browser can not get through site even after solving

Hey, this was working well about a couple of days ago. It would be able to go to the site, detect the challenge and solve it then continue to a different page. At the moment it will just auto-solve the challenge and repeat the process. I attached some images and a video for visual context.

After the solution has been done it will log this error:
image

Vid of the browser repeating the solve instead of getting through to the site:
https://github.com/Xewdy444/CF-Clearance-Scraper/assets/93611007/c61c100b-fd7f-4a56-8d1c-37623c471ea3

Error in js solving

C:\Users\Usuario\Downloads\CF-Clearance-Scraper>python main.py -u https://www.bang.com/ -f cookies.txt -v
[00:33:07] Checking for cloudflare challenge...
[00:33:08] Cloudflare challenge detected. Fetching cf_clearance cookie...
[00:33:08] Launching headless browser...
[00:33:09] Going to https://www.bang.com/...
[00:33:10] Solving cloudflare challenge [JavaScript]...
Traceback (most recent call last):
  File "main.py", line 208, in <module>
    main()
  File "main.py", line 185, in main
    cookies = get_cookies(args)
  File "main.py", line 100, in get_cookies
    solve_challenge(page)
  File "main.py", line 49, in solve_challenge
    verify_button = page.get_by_role("button", name=verify_button_pattern)
AttributeError: 'Page' object has no attribute 'get_by_role'
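
An `AttributeError` like this usually points to an installed Playwright release that is older than the one the script was written against. The sketch below checks the installed version before running. It assumes `Page.get_by_role` first appeared in Playwright 1.27, which should be confirmed against the release notes, and `parse_version` and `meets_minimum` are hypothetical helpers.

```python
from importlib import metadata

MIN_PLAYWRIGHT = (1, 27)  # assumed first release with Page.get_by_role


def parse_version(text):
    """Turn '1.28.0' into (1, 28, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version_text, minimum=MIN_PLAYWRIGHT):
    """Return True if version_text is at least the given minimum tuple."""
    return parse_version(version_text) >= minimum


try:
    installed = metadata.version("playwright")
except metadata.PackageNotFoundError:
    installed = None  # Playwright is not installed in this environment

if installed is not None and not meets_minimum(installed):
    print(f"playwright {installed} looks too old; try: pip install -U playwright")
```

After upgrading, the matching browser binaries may also need to be reinstalled with `playwright install`.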

Issues changing User Agent

Hello, most of the time when I change the user agent, I get a handshake timeout.
This is an example, first using the default UA and then using Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36:

C:\Users\Usuario\Downloads\CFScraperOB2>python main.py -u https://www.petflow.com/ -v
[15:20:52] Checking for cloudflare challenge...
[15:20:52] Cloudflare challenge detected. Fetching cf_clearance cookie...
[15:20:54] Launching headless browser...
[15:20:56] Going to https://www.petflow.com/...
[15:20:57] Solving cloudflare challenge [Managed]...
[15:20:58] Cookie: cf_clearance=ey4oPPQnPTDdZ6LHuC3tg_BiVHJV.BTJCsiUHt7TG78-1668882058-0-150

C:\Users\Usuario\Downloads\CFScraperOB2>python main.py -u https://www.petflow.com/ -v
[15:35:19] Checking for cloudflare challenge...
[15:35:34] _ssl.c:1108: The handshake operation timed out

chromedriver keeps clicking checkbox

I was testing with chromedriver on commit 57c1696. I did fix TURNSTILE_FRAME with correct XPath to find the new iframe on site https://nowsecure.nl. When running it with -d option, I can see it keeps clicking the checkbox but it cannot pass the validation. I tried different versions of Chrome but no luck with that. Is there any way to successfully get the cookie by using chromedriver?

error

hey - this used to work well before, but is now failing consistently. Wondering if you are facing the same problem too?

Tried in headful mode too. See this failure:
image

No Cloudflare challenge detected

Getting the message "No Cloudflare challenge detected." when using the scraper.

Command: python3 main.py -v -f cookies.json https://www.ebgames.co.nz

The site does have Cloudflare protection and I get a cf_clearance cookie browsing normally so not sure why this is happening.

Any help appreciated.

[Feature request] Show source data

Hello, sorry for making all these requests. I'd love to help you, so feel free to contact me if you need anything.
Would it be possible to add an option to show the whole source data from the page?
