crackgeetest's Introduction

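Python 3 Web Crawler Development in Practice (Python3 网络爬虫开发实战)

This book explains how to build web crawlers with Python 3. It begins with a detailed walkthrough of environment setup and crawler fundamentals, then covers request libraries such as urllib and requests, parsing tools such as Beautiful Soup, XPath, and pyquery, and storage in text files and various databases. It goes on to show, through several case studies, how to scrape Ajax data and how to crawl dynamic websites with Selenium and Splash. Next come practical crawling techniques: crawling through proxies and maintaining a dynamic proxy pool, using ADSL dial-up proxies, cracking image, Geetest, TouClick (点触), and grid CAPTCHAs, crawling sites that require a simulated login, and maintaining a Cookies pool. The book also addresses the mobile internet, discussing how to crawl apps with tools such as Charles, mitmdump, and Appium. It then introduces the pyspider and Scrapy frameworks and distributed crawling, and closes with Bloom Filter efficiency optimization, crawler deployment with Docker and Scrapyd, and crawler management with Gerapy.

This book is published and distributed by Turing Education (Posts & Telecom Press). All rights reserved; reproduction is prohibited.

Author: Cui Qingcai (崔庆才)

Where to buy:

Join the reader group:

Video resources:

Python 3 Crawlers: Three Hands-On Case Studies

Do It Yourself! Hands-On Python 3 Web Crawler Case Studies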

crackgeetest's People

Contributors

germey

crackgeetest's Issues

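Problem with the captured images

Before the slider is moved, the image containing the gap is already displayed, so the two captured images are identical and the gap position cannot be found. Is there another way to get the image without the gap, or some other approach to cracking it? Please help.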

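How did you determine threshold = 60 and left = 60?

Hello!

Great repository.

Well done.

I would like to know one thing.

In the is_pixel_equal() and get_gap() functions...

How did you determine threshold = 60 and left = 60?

Thank you very much for your help!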

There is an error

CAPTCHA position 171 330 304 562
CAPTCHA position 171 330 304 562
Gap position 60
Slide track [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 2, 2, 2, 2]
Traceback (most recent call last):
  File "H:\crack.py", line 219, in <module>
    crack.crack()
  File "H:\crack.py", line 206, in crack
    EC.text_to_be_present_in_element((By.CLASS_NAME, 'geetest_success_radar_tip_content'), '验证成功'))
  File "C:\Python\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

python 3.6.2
pillow 4.2.1
selenium 3.4.3
chromedriver 2.32
google chrome 61.0.3163.100

No retry after a failed drag verification

After a drag verification fails, the script does not try the drag again. It simply waits until the timeout expires and then raises an error:
`TimeoutException                          Traceback (most recent call last)
in <module>()
    216 if __name__ == '__main__':
    217     crack = CrackGeetest()
--> 218     crack.crack()

in crack(self)
    204
    205         success = self.wait.until(
--> 206             EC.text_to_be_present_in_element((By.CLASS_NAME, 'geetest_success_radar_tip_content'), '验证成功'))
    207         print(success)
    208

~\Anaconda3\lib\site-packages\selenium\webdriver\support\wait.py in until(self, method, message)
     78             if time.time() > end_time:
     79                 break
---> 80         raise TimeoutException(message, screen, stacktrace)
     81
     82     def until_not(self, method, message=''):

TimeoutException: Message:
`

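Fixing the image crop on high-resolution screens

captcha = screenshot.crop((left, top, right, bottom))
The author's display is probably not high-DPI and uses 100% text scaling. Mine is 1920*1080, and I set text scaling to 125% for comfort. In that case, multiply each crop argument by the scaling factor:
captcha = screenshot.crop((left * 1.25, top * 1.25, right * 1.25, bottom * 1.25))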
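
The same correction can be written once for any scaling factor. The following is a minimal sketch, not code from the repository: `scale_box` is a hypothetical helper, the sample coordinates are made up for illustration, and the assumption is that the screenshot is larger than the page coordinates by exactly `scale` (1.25 for 125% text scaling).

```python
def scale_box(box, scale):
    """Scale a (left, top, right, bottom) box from page coordinates
    to screenshot pixels, rounding each edge to a whole pixel."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return tuple(int(round(value * scale)) for value in box)


# Illustrative coordinates only (not taken from the issue), at 125% scaling.
print(scale_box((100, 200, 360, 316), 1.25))  # (125, 250, 450, 395)
```

With Pillow this would be used as `screenshot.crop(scale_box((left, top, right, bottom), 1.25))`. In a browser session the factor can usually be read from `window.devicePixelRatio`, for example with `driver.execute_script('return window.devicePixelRatio')`; whether that value matches the operating system's text-scaling setting in every setup is an assumption worth checking.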

crack() only runs twice

CAPTCHA position 171 330 436 694
CAPTCHA position 171 330 436 694
Gap position 60
Slide track [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 2, 2, 2, 2]
Traceback (most recent call last):
  File "G:/spider_study/0816/spider081602.py", line 421, in <module>
    crack.crack()
  File "G:/spider_study/0816/spider081602.py", line 409, in crack
    EC.text_to_be_present_in_element((By.CLASS_NAME, 'geetest_success_radar_tip_content'), '验证成功'))
  File "C:\Users\69039\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

selenium.common.exceptions.WebDriverException

selenium.common.exceptions.WebDriverException: Message: UnknownError: Cannot release a button when no button is pressed.'UnknownError: Cannot release a button when no button is pressed.' when calling method: [wdIMouse::up]

I get the error above, and I am sure my button's id is unique.

action = ActionChains(driver)
action.click_and_hold(button).perform()
distance = 250
track = get_track(distance)
for i in track:
    action.move_by_offset(xoffset=i, yoffset=0).perform()
    action.release(button).perform()
    time.sleep(0.5)
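
One likely cause, judging only from the snippet above: `release` sits inside the `for` loop, so the button is released after the first step and a later iteration tries to release a button that is no longer pressed. A minimal sketch of the usual ordering follows: hold once, move through the whole track, release once. `drag_along_track` and `make_chain` are hypothetical names; `make_chain` is assumed to return a fresh object offering the `ActionChains` methods `click_and_hold`, `move_by_offset`, `release`, and `perform`, so that no chain is reused between steps.

```python
import time


def drag_along_track(make_chain, element, track, pause=0.5):
    """Press `element`, move through every offset in `track`, then release once."""
    make_chain().click_and_hold(element).perform()
    for x in track:
        # Move only; the button stays pressed for the whole track.
        make_chain().move_by_offset(xoffset=x, yoffset=0).perform()
    time.sleep(pause)  # brief pause before letting go
    make_chain().release().perform()


# Assumed usage with Selenium (driver, button and get_track as in the snippet above):
#   from selenium.webdriver import ActionChains
#   drag_along_track(lambda: ActionChains(driver), button, get_track(250))
```

Building a new chain for each step is a deliberate choice in this sketch: it avoids depending on whether a reused chain replays previously queued actions on a later `perform()`, which may differ between Selenium versions.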
