

Some Python Spider

Scrape some useful or interesting information from the internet.

Get Code

git clone https://github.com/Chloe-Y/somePythonSpider.git

Jobs

Proxy Pool

Music

Picture

Books info

Novel


Lagou.com

🐣 Enter a job keyword to scrape job listings from Lagou.com and save them to a SQLite3 database. The default region is Shenzhen.

steps

cd lagou
python lagou.py
# running in Sublime Text is suggested; cmd causes a Unicode encode error, so job titles and companies are skipped
# you can change the keyword in the file or type it in when prompted
# afterwards, you can query job details from the sqlite3 database
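
To make the "save, then query" step concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `jobs` and its columns are assumptions for illustration only; the schema that lagou.py actually creates may differ.

```python
import sqlite3


def save_jobs(conn, jobs):
    """Create a hypothetical `jobs` table if needed and insert (title, company, salary) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (title TEXT, company TEXT, salary TEXT)"
    )
    conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", jobs)
    conn.commit()


def find_jobs(conn, keyword):
    """Return all rows whose title contains `keyword`."""
    cur = conn.execute(
        "SELECT title, company, salary FROM jobs WHERE title LIKE ?",
        (f"%{keyword}%",),
    )
    return cur.fetchall()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
save_jobs(
    conn,
    [
        ("Python Developer", "Example Co", "15k-25k"),
        ("Data Analyst", "Sample Ltd", "10k-18k"),
    ],
)
rows = find_jobs(conn, "Python")
```

Parameterized queries (the `?` placeholders) keep a keyword typed by the user from being interpreted as SQL.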

preview lagou

51Job.com

🐝 Scrape job details from the pages behind a 51job link and save them to a CSV file.

preview 51jobs

steps

cd 51Jobs
scrapy crawl get51Jobs
# search for jobs on 51job.com and copy the URL
# run the scrapy spider
# paste the 51job URL to get the job details and save them to a CSV file

Search for jobs on 51job.com, copy the link, and paste it once the spider is running: search

The scraped CSV file needs some tidying up: csv file
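
What "tidying up" involves is not spelled out here. As one possible interpretation, the sketch below collapses stray whitespace in each cell and drops blank and duplicate rows using the standard csv module. The function name `tidy_rows` and the sample data are hypothetical.

```python
import csv
import io


def tidy_rows(rows):
    """Collapse whitespace in every cell, then drop empty rows and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        cells = tuple(" ".join(cell.split()) for cell in row)
        if not any(cells) or cells in seen:
            continue  # skip blank lines and rows already kept
        seen.add(cells)
        cleaned.append(list(cells))
    return cleaned


# Sample input standing in for the scraped file.
raw = "title,company\r\n Python Dev ,Example Co\r\n\r\nPython Dev,Example Co\r\n"
rows = tidy_rows(csv.reader(io.StringIO(raw)))
```

To clean a real file, pass `csv.reader(open(path, newline="", encoding="utf-8"))` instead of the in-memory sample and write the result back with `csv.writer`.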


xicidaili.com

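🐘 Scrape high-anonymity proxies from Xici Daili (xicidaili.com), then use a thread pool to connect to Baidu and verify that each proxy works.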

steps

cd proxy
python xici.py
# input the page number you want to crawl (100 proxies/page)
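
The thread-pool validation described above can be sketched roughly as follows. This is not the code in xici.py: the function names, the test URL, the timeout, and the worker count are all assumptions, and it uses only the standard library.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

TEST_URL = "https://www.baidu.com"  # assumed check target


def proxy_works(proxy, timeout=5):
    """Return True if TEST_URL can be fetched through `proxy`, given as "host:port"."""
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    )
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(TEST_URL, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Timeouts, refused connections, and bad responses all count as failures.
        return False


def filter_proxies(proxies, check=proxy_works, workers=20):
    """Run `check` on every proxy in a thread pool and keep the ones that pass."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, proxies))
    return [proxy for proxy, ok in zip(proxies, results) if ok]
```

Threads suit this job because each check spends nearly all its time waiting on the network. The `check` parameter is there so the filtering logic can be tried without real proxies, for example `filter_proxies(candidates, check=lambda p: True)`.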

preview, I used Pillow to merge these two pictures xici

gatherproxy.com

🐑 Scrape China-based proxies from gatherproxy, then use a thread pool to connect to Baidu and verify that each proxy works.

steps

cd proxy
python getProxy.py

preview gatherProxy


app-echo.com

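🎶 Log in to your personal account to get your user id, then scrape the user's liked music and download the songs with multiple processes.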

steps

cd music
python echoDownload.py
# input your user id
# start download
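
As a rough illustration of the multi-process download step, the sketch below fans a list of (title, url) pairs out to a process pool. It is not the code in echoDownload.py: the function names, the `.mp3` extension, and the pool size are assumptions.

```python
import os
import re
import urllib.request
from multiprocessing import Pool


def safe_filename(title, ext=".mp3"):
    """Replace characters that are invalid in Windows file names; `.mp3` is an assumed extension."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return (cleaned or "untitled") + ext


def download_song(task):
    """Download one (url, path) pair and return the path that was written."""
    url, path = task
    urllib.request.urlretrieve(url, path)
    return path


def download_all(songs, dest_dir, processes=4):
    """Download every (title, url) pair in `songs` into `dest_dir` using a process pool."""
    os.makedirs(dest_dir, exist_ok=True)
    tasks = [
        (url, os.path.join(dest_dir, safe_filename(title))) for title, url in songs
    ]
    with Pool(processes=processes) as pool:
        return pool.map(download_song, tasks)
```

On Windows, call `download_all(...)` from inside an `if __name__ == "__main__":` block, because multiprocessing re-imports the script in each worker process.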

Enter the user id to download: echo

The downloads look like this: echoDL


unsplash.com

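🌊 Search unsplash for pictures by keyword, collect the links, then use a subprocess to add the links to the IDM task list for download. The speed is pretty good!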

steps

cd picture
python unsplash.py
# enter keyword to search photo
# enter page number you want to crawl
# enter photo type you want to download
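
Handing links to IDM from a subprocess might look like the sketch below. The install path and function names are assumptions, and the switches (`/d` URL, `/p` folder, `/f` file name, `/n` silent, `/a` add to queue, `/s` start queue) follow IDM's command-line documentation as I understand it, so check them against your IDM version.

```python
import subprocess

# Assumed default install location; adjust for your machine.
IDM_EXE = r"C:\Program Files (x86)\Internet Download Manager\IDMan.exe"


def idm_command(url, folder, filename, exe=IDM_EXE):
    """Build the argument list that queues one URL in IDM without starting it."""
    return [exe, "/d", url, "/p", folder, "/f", filename, "/n", "/a"]


def queue_in_idm(url, folder, filename, exe=IDM_EXE):
    """Add one download to the IDM queue (requires IDM to be installed)."""
    subprocess.run(idm_command(url, folder, filename, exe=exe), check=True)


def start_idm_queue(exe=IDM_EXE):
    """Ask IDM to start processing its main queue."""
    subprocess.run([exe, "/s"], check=True)
```

Passing the command as a list rather than a single string avoids quoting problems when a URL or folder contains spaces or `&`.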

preview unsplash


douban

📚 Enter a tag or keyword to scrape book information from Douban: ID, title, author, summary, and so on.

steps

cd douban
python doubanBook.py
# enter books keyword
# enter the number of books you want to save

query doubanbooks database querybooks


sto.cc

🌀 Scrape novels from the sto.cc website; a proxy connection is required.

steps

cd novel
python stocc.py
# paste sto.cc website novel link

52shuku.com

🐳 Use a scrapy spider to scrape novels from 52shuku8.com and save them as txt files.

steps

cd book/book52shuku
scrapy crawl get52shuku
# paste novel link and start download novel to txt file

