
51job-spider's People

Contributors

alige32, chenjiandongx, dongpoliu, rubinliudongpo

51job-spider's Issues

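I started all the required programs, but the result is not what I expected

2019-09-23 01:04:32,404 - 爬取第 748 条岗位详情
After it finished crawling (the log line above reads "crawling job detail no. 748"), it just stalled here. Only post_require_new.txt was produced, and I can't tell what went wrong.
Partway through, the output contained this: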
Traceback (most recent call last):
  File "src\gevent\greenlet.py", line 766, in gevent._greenlet.Greenlet.run
  File "C:/Users/Administrator/Job/job_spider.py", line 106, in post_require
    html = resp.content.decode("gbk")
UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 1060: illegal multibyte sequence
2019-09-22T17:03:11Z <Greenlet at 0x178d629c158: <bound method JobSpider.post_require of <__main__.JobSpider object at 0x00000178D6288C88>>> failed with UnicodeDecodeError

Could you help me figure this out?
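
The traceback shows a strict `decode("gbk")` raising on a byte sequence that is not valid GBK. One possible workaround, offered only as a sketch: try GBK first, then its superset GB18030, and finally decode with replacement characters so a single bad page does not kill the greenlet. The helper name `decode_html` and the fallback order are assumptions for illustration, not code from the project.

```python
def decode_html(raw: bytes, encodings=("gbk", "gb18030")) -> str:
    """Decode page bytes, tolerating sequences that are invalid in GBK.

    Hypothetical helper: the name and the fallback order are assumptions.
    """
    for enc in encodings:
        try:
            # Strict decoding first, so clean pages are returned unchanged.
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the readable text, mark bad bytes with U+FFFD.
    return raw.decode(encodings[0], errors="replace")


clean = decode_html("岗位详情".encode("gbk"))
damaged = decode_html(b"abc\xae")  # 0xae is a dangling GBK lead byte
```

In the spider this would stand in for the line `html = resp.content.decode("gbk")`, for example `html = decode_html(resp.content)`.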

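Salary calculation is wrong

x + (y - x) * 0.4 should be changed to (x + y) * 0.5 * 0.5. The first 0.5 takes the midpoint of the salary range, and the second 0.5 discounts the froth from fake job postings (an optimistic estimate).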
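
To make the difference concrete, here is a small sketch comparing the two formulas for one salary range. The function names and the sample range of 10 to 20 are made up for illustration; only the two expressions come from the issue.

```python
def current_estimate(low: float, high: float) -> float:
    # Formula quoted in the issue as the existing one:
    # a point 40% of the way from the low end to the high end.
    return low + (high - low) * 0.4


def proposed_estimate(low: float, high: float) -> float:
    # Formula proposed in the issue: midpoint of the range,
    # then halved to discount fake postings.
    return (low + high) * 0.5 * 0.5


low, high = 10, 20  # hypothetical salary range
print(current_estimate(low, high))   # 14.0
print(proposed_estimate(low, high))  # 7.5
```

For this range the proposed formula gives roughly half the current estimate, so the change would shift every aggregated salary figure noticeably.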

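Is there a mistake here?

counter[seg] = counter.get(seg, 1) + 1

Shouldn't the default for seg be 0 here? You add 1 afterwards, so if seg is not found and the default is 1, adding another 1 makes the dict value 2 the first time the key appears.

counter[seg] = counter.get(seg, 0) + 1

Shouldn't it be like the line above?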
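
The off-by-one described above can be checked with a short sketch. The helper `count_with_default` and the sample word list are hypothetical; only the `counter.get(seg, default) + 1` expression is taken from the issue.

```python
from collections import Counter


def count_with_default(segments, default):
    """Count occurrences using counter.get(seg, default) + 1."""
    counter = {}
    for seg in segments:
        counter[seg] = counter.get(seg, default) + 1
    return counter


words = ["python", "sql", "python"]      # hypothetical segmented words
buggy = count_with_default(words, 1)     # every count is one too high
fixed = count_with_default(words, 0)     # matches the true frequencies
reference = dict(Counter(words))         # standard-library equivalent

print(buggy)  # {'python': 3, 'sql': 2}
print(fixed)  # {'python': 2, 'sql': 1}
```

With a default of 1 every word is overcounted by exactly one, which barely changes the ranking of frequent words but inflates rare ones. `collections.Counter` avoids the question of the default altogether.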

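Blocked after crawling about 100 pages

I tweaked your code to crawl more data, but after roughly 100 pages I got blocked. Since then I have not been able to fetch any data at all. Using a proxy, changing my IP, and changing the browser User-Agent all made no difference. Is 51job's anti-scraping really that strong?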

AttributeError: 'NoneType' object has no attribute 'find_all'

Traceback (most recent call last):
  File "job_spider.py", line 338, in <module>
    spider.run()
  File "job_spider.py", line 329, in run
    self.job_spider()
  File "job_spider.py", line 90, in job_spider
    bs = BeautifulSoup(html, "lxml").find("div", class_="dw_table").find_all("div", class_="el")
AttributeError: 'NoneType' object has no attribute 'find_all'
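
The error means `find("div", class_="dw_table")` returned `None`, that is, the fetched page contained no such element (for instance because the layout changed or a block page was served; the traceback alone does not say which). A minimal defensive sketch follows. The helper name `extract_job_rows` is hypothetical, and it uses the built-in `"html.parser"` instead of the `"lxml"` parser from the traceback to avoid the extra dependency.

```python
from bs4 import BeautifulSoup


def extract_job_rows(html: str) -> list:
    """Return the div.el rows inside div.dw_table, or [] if the table is absent.

    Hypothetical helper; the class names are copied from the traceback.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("div", class_="dw_table")
    if table is None:
        # The page has no listing table: report "no rows" instead of crashing.
        return []
    return table.find_all("div", class_="el")


listing = '<div class="dw_table"><div class="el">a</div><div class="el">b</div></div>'
rows = extract_job_rows(listing)
missing = extract_job_rows("<html><body>no listing here</body></html>")
```

Returning an empty list lets the caller log the page and move on, which also makes it easier to inspect what HTML was actually received.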

Hello, author

May I ask which visualization tool you used? The charts look great.
