
winter

Introduction

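This is a simple Scrapy spider. Its purpose is to back up every answer the well-known Zhihu user winter has posted up to now (May 31, 2015), before winter deletes all of them. For the reason, see --> winter's project page.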

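This is a small Scrapy practice project. With only a few changes, it can crawl all answers of any Zhihu user. If you also use Scrapy, feedback and corrections are welcome :D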
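Retargeting the crawl at another user mostly comes down to changing one identifier. The sketch below is a minimal illustration, assuming a user's answer list is paginated at URLs of the form https://www.zhihu.com/people/<user_id>/answers?page=<n>. That layout is an assumption about Zhihu's 2015 site, not something this project states, and answers_page_url and start_urls_for are hypothetical helpers.

```python
from urllib.parse import quote, urlencode

# Assumed URL layout (not confirmed by this project): a Zhihu user's answer
# list is paginated as /people/<user_id>/answers?page=<n>.
BASE = "https://www.zhihu.com/people/{user}/answers"


def answers_page_url(user_id: str, page: int = 1) -> str:
    """Build the URL of one page of a user's answer list (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    url = BASE.format(user=quote(user_id, safe=""))
    return f"{url}?{urlencode({'page': page})}"


def start_urls_for(user_id: str, pages: int) -> list[str]:
    """Seed URLs for the first `pages` pages; changing user_id retargets the crawl."""
    return [answers_page_url(user_id, p) for p in range(1, pages + 1)]


if __name__ == "__main__":
    # "some-user" is a placeholder, not a real account.
    for url in start_urls_for("some-user", 3):
        print(url)
```

In a Scrapy spider, a list like this could serve as start_urls; the project itself instead follows next-page links, so this is only one way to parameterize the target user.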

Environment & Usage

Currently implemented features

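Run scrapy list on the command line to see the three spiders:

  • q_test: crawls all questions, with their links, on the first page of winter's answers
  • question: also follows the next-page links, crawls every question winter has answered along with its link, and stores them in the database
  • answer: reads all links from the database, visits each detail page, and crawls the question's full description and the full text of winter's answer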
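The question and answer spiders hand work to each other through a database. The README does not say which database is used, so the sketch below assumes SQLite through Python's standard sqlite3 module; the table name questions and the helpers open_db, save_questions and load_question_urls are hypothetical. It shows the handoff: question writes (title, url) pairs, deduplicated on the URL, and answer later reads the URLs back to visit each detail page.

```python
import sqlite3

# Hypothetical schema: one row per answered question. The URL is the key,
# so re-running the question spider does not create duplicate rows.
SCHEMA = """
CREATE TABLE IF NOT EXISTS questions (
    url   TEXT PRIMARY KEY,
    title TEXT NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_questions(conn: sqlite3.Connection, items: list[tuple[str, str]]) -> None:
    """Store (title, url) pairs found by the question spider; skip repeated URLs."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO questions (title, url) VALUES (?, ?)", items
        )


def load_question_urls(conn: sqlite3.Connection) -> list[str]:
    """Return every stored link for the answer spider, in insertion order."""
    rows = conn.execute("SELECT url FROM questions ORDER BY rowid")
    return [url for (url,) in rows]


if __name__ == "__main__":
    conn = open_db()
    # Placeholder titles and URLs, not real scraped data.
    save_questions(conn, [("Q1", "/question/1"), ("Q1", "/question/1"), ("Q2", "/question/2")])
    print(load_question_urls(conn))
```

Keying on the URL keeps the step idempotent, which matters when a paginated crawl is interrupted and restarted.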

todo

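  • If a question's description is too long, Zhihu collapses part of it; the project currently cannot fetch the part hidden behind "Show more"
  • Handle rich text, such as images and <a> links inside the content
  • winter's column has not been crawled yet
  • winter's original project can "remove all upvotes and batch-replace all answers"; this cannot be done here without the author's permissions, but it could later be tried on one's own account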
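For the rich-text item, one possible starting point is to collect image sources and link targets from an answer's HTML before saving it. The sketch below uses the standard library's html.parser; RichTextLinks and extract_rich_text are hypothetical names, and Zhihu's actual markup (for example, lazy-loaded images whose real URL may sit in an attribute other than src) is not modeled here.

```python
from html.parser import HTMLParser


class RichTextLinks(HTMLParser):
    """Collect <img src> and <a href> values from an HTML fragment (sketch)."""

    def __init__(self) -> None:
        super().__init__()
        self.images: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        values = dict(attrs)
        if tag == "img" and values.get("src"):
            self.images.append(values["src"])
        elif tag == "a" and values.get("href"):
            self.links.append(values["href"])


def extract_rich_text(html: str) -> tuple[list[str], list[str]]:
    """Return (image sources, link targets) found in an answer's HTML."""
    parser = RichTextLinks()
    parser.feed(html)
    parser.close()
    return parser.images, parser.links


if __name__ == "__main__":
    sample = '<p>text <img src="a.png"/> <a href="https://example.com">link</a></p>'
    print(extract_rich_text(sample))
```

Self-closing tags such as <img .../> are routed through handle_starttag by HTMLParser's default handle_startendtag, so both forms are covered.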

Updates

  • Resolved todo item 1: overly long descriptions can now be fetched in full

Contributors

  • freeyiyi1993

