Crawl some useful or interesting information from the internet.
git clone https://github.com/Chloe-Y/somePythonSpider.git
🐣 Enter a job keyword to crawl job listings from Lagou and save them to a SQLite3 database; the search area defaults to Shenzhen
cd lagou
python lagou.py
# recommended to run in Sublime; running in Windows cmd may raise a UnicodeEncodeError and drop the job title & company
# you can either change the keyword in the file or enter it at the prompt
# afterwards you can query the job details from the SQLite3 database
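A minimal sketch of the SQLite3 step, in case you want to see how the rows end up queryable; the table name and columns here are assumptions for illustration, the real schema is defined in lagou.py.

import sqlite3

conn = sqlite3.connect('lagou.db')
conn.execute('CREATE TABLE IF NOT EXISTS jobs (title TEXT, company TEXT, salary TEXT)')
# rows would come from the parsed Lagou response; this one is a placeholder
rows = [('Python Engineer', 'ExampleCompany', '15k-25k')]
conn.executemany('INSERT INTO jobs VALUES (?, ?, ?)', rows)
conn.commit()
# the later query mentioned above could look like this
for row in conn.execute("SELECT * FROM jobs WHERE title LIKE '%Python%'"):
    print(row)
conn.close()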
🐝 Crawl job details from a 51job link and save them to a CSV file
cd 51Jobs
scrapy crawl get51Jobs
# search for jobs on 51job.com and copy the result URL
# run the Scrapy spider
# paste the 51job URL; the job details are fetched and saved to a CSV file
Search for jobs on 51job.com, copy the link, then paste it once the spider is running.
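A rough sketch of what the spider shape looks like, assuming the spider name get51Jobs from the command above; the selectors and item fields below are placeholders, not the project's real ones.

import scrapy

class Get51JobsSpider(scrapy.Spider):
    name = 'get51Jobs'  # matches the `scrapy crawl get51Jobs` command

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # the pasted search-result URL becomes the start URL
        self.start_urls = [input('paste 51job url: ').strip()]

    def parse(self, response):
        # placeholder selectors; the project defines the real ones
        for job in response.css('div.el'):
            yield {
                'title': job.css('a::attr(title)').get(),
                'company': job.css('span.t2 a::text').get(),
            }

Running scrapy crawl get51Jobs -o jobs.csv would also produce a CSV via Scrapy's feed export, though the project may write the file itself.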
🐘 Crawl high-anonymity proxies from Xici Proxy, then use a thread pool to connect to Baidu and verify that each proxy works
cd proxy
python xici.py
# enter the number of pages you want to crawl (100 proxies/page)
Preview: Pillow was used to merge the two screenshots.
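A minimal sketch of the thread-pool check, assuming requests is used to hit Baidu through each proxy; the timeout and worker count are arbitrary.

import requests
from concurrent.futures import ThreadPoolExecutor

def check(proxy):
    # a proxy is considered usable if Baidu answers through it in time
    try:
        requests.get('https://www.baidu.com',
                     proxies={'http': proxy, 'https': proxy}, timeout=5)
        return proxy
    except requests.RequestException:
        return None

def validate(proxies):
    # check the proxies concurrently and keep only the working ones
    with ThreadPoolExecutor(max_workers=20) as pool:
        return [p for p in pool.map(check, proxies) if p]

# usage: validate(['http://1.2.3.4:8080', ...]) with the addresses crawled above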
🐑 Crawl China-based proxies from gatherproxy, then use the same thread-pool check against Baidu to verify that each proxy works
cd proxy
python getProxy.py
🎶 Log in to your account to get your user id, then crawl the music you have liked and download the songs with multiple processes
cd music
python echoDownload.py
# input your user id
# start download
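A minimal sketch of the multi-process download step, assuming each liked track has already been resolved to a direct URL; the names below are illustrative, not echoDownload.py's real ones.

import os
import requests
from multiprocessing import Pool

def download(task):
    # task is an (url, filename) pair resolved from the user's liked music
    url, filename = task
    resp = requests.get(url, timeout=30)
    with open(os.path.join('songs', filename), 'wb') as f:
        f.write(resp.content)

if __name__ == '__main__':
    os.makedirs('songs', exist_ok=True)
    tasks = []  # fill with (url, filename) pairs gathered by the crawler
    with Pool(4) as pool:  # download four songs at a time
        pool.map(download, tasks)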
🌊 Search Unsplash photos by keyword, collect the links, then use a subprocess to add them to the IDM download queue; the speed is pretty good!
cd picture
python unsplash.py
# enter a keyword to search for photos
# enter the number of pages you want to crawl
# enter the photo type you want to download
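The IDM hand-off can be sketched like this; the install path and the /d /a /s flags come from IDM's documented command line, but treat them as assumptions and check your own setup.

import subprocess

IDM = r'C:\Program Files (x86)\Internet Download Manager\IDMan.exe'  # adjust to your install

def queue_in_idm(urls):
    for url in urls:
        # /d <url> hands the file to IDM, /a adds it to the queue without starting it
        subprocess.run([IDM, '/d', url, '/a'])
    # /s starts the queued downloads
    subprocess.run([IDM, '/s'])

# usage: queue_in_idm(photo_links) where photo_links comes from the Unsplash search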
📚 Enter a tag or keyword to crawl book information from Douban: ID, title, author, summary, and so on
cd douban
python doubanBook.py
# enter a book keyword or tag
# enter the number of books you want to save
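A rough sketch of the fetch-and-parse loop, assuming the public tag page at book.douban.com/tag/<keyword>; the selectors are assumptions, and Douban may throttle requests or require extra headers.

import requests
from bs4 import BeautifulSoup

def crawl_books(keyword, limit):
    url = f'https://book.douban.com/tag/{keyword}'
    resp = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
    soup = BeautifulSoup(resp.text, 'html.parser')
    books = []
    # selector names are assumptions, not necessarily what doubanBook.py uses
    for item in soup.select('li.subject-item')[:limit]:
        title = item.select_one('h2 a')
        pub = item.select_one('div.pub')
        books.append({'title': title.get_text(strip=True) if title else '',
                      'info': pub.get_text(strip=True) if pub else ''})
    return books

# usage: crawl_books('python', 20); how the results are saved is up to doubanBook.py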
🌀 Crawl novels from sto.cc; a proxy connection is required
cd novel
python stocc.py
# paste a sto.cc novel link
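Because sto.cc needs a proxy connection, every request goes through a proxies dict; a minimal sketch, assuming requests and a proxy obtained from the proxy scripts above.

import requests

def fetch_page(url, proxy):
    # route the request through the proxy, as a direct connection may be blocked
    resp = requests.get(url,
                        proxies={'http': proxy, 'https': proxy},
                        headers={'User-Agent': 'Mozilla/5.0'},
                        timeout=10)
    resp.encoding = resp.apparent_encoding  # the pages may not declare their encoding
    return resp.text

# usage: fetch_page('<sto.cc chapter url>', 'http://1.2.3.4:8080')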
🐳 Crawl novels from 52shuku8.com with a Scrapy spider and save them as txt files
cd book/book52shuku
scrapy crawl get52shuku
# paste the novel link; the novel is downloaded and saved as a txt file
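One way the txt output could be wired up is a small item pipeline that appends each chapter; this is a sketch, and the field names ('title', 'content') are assumptions rather than the project's real item schema.

class TxtWriterPipeline:
    def open_spider(self, spider):
        self.file = open('novel.txt', 'a', encoding='utf-8')

    def process_item(self, item, spider):
        # append each crawled chapter to one txt file
        self.file.write(item['title'] + '\n\n')
        self.file.write(item['content'] + '\n\n')
        return item

    def close_spider(self, spider):
        self.file.close()

Enable such a pipeline through ITEM_PIPELINES in settings.py if you adapt this approach.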