
zhihu-spider's Introduction

答乎

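Crawls Zhihu questions, periodically fetches their comment, answer, and view counts, tracks and analyzes how each question's popularity trends over time, and presents the results as charts. Current features: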

  1. Log in with a Zhihu account;
  2. Track questions and crawl them on a schedule;
  3. Search for questions;
  4. Discover questions.
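
As an illustration of the trend tracking described above, here is a minimal sketch of how periodic snapshots of a question's counters could be turned into per-interval growth figures for charting. The snapshot shape (`time`, `answers`, `comments`, `views`) and the function name `computeTrend` are assumptions made for this example, not the project's actual data model.

```javascript
// Hypothetical snapshot shape: { time, answers, comments, views },
// where `time` is a numeric timestamp. Not taken from the project's code.
function computeTrend(snapshots) {
  // Sort a copy by time so the caller's array is left untouched.
  const sorted = [...snapshots].sort((a, b) => a.time - b.time);
  const deltas = [];
  for (let i = 1; i < sorted.length; i += 1) {
    const prev = sorted[i - 1];
    const curr = sorted[i];
    // Growth of each counter since the previous snapshot.
    deltas.push({
      time: curr.time,
      answers: curr.answers - prev.answers,
      comments: curr.comments - prev.comments,
      views: curr.views - prev.views,
    });
  }
  return deltas;
}
```

Each returned entry is the change since the previous crawl, which is what a popularity chart would typically plot against time.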

How to use

For now, the only supported way to use it is to host it on your own server.

Development

  1. cd server && npm run dev
  2. cd client && npm run dev

Production

  1. Set the production server address in /client/config/index.js
  2. cd server && npm start
  3. cd client && npm run build

Notes and disclaimer

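This system is for learning and personal use only! It uses crawling techniques to let users log in with their Zhihu account, but it stores each user's Zhihu cookie in the database. Please do not use this system for malicious purposes.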

zhihu-spider's People

Contributors

wuomzfx


zhihu-spider's Issues

Captcha is not displayed

2019.12.4
server: npm start
client: npm run build
The site is deployed with Caddy.
After opening the site, the captcha is not displayed.

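Designing the scheduled-task interval and crawler request timing

Requirements:

1. Crawl a website (with pagination) on a schedule; once all pages have been crawled, store the results in the database.

2. Run the crawler on a schedule to repeat step 1.

The problem: when there are many pages and the network is unstable, crawling all pages can take longer than the interval at which the scheduled task restarts the crawler. The earlier crawl then ends up incomplete or fails. How can this be solved? Thanks.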
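
One common way to handle the overlap described in this issue, shown as a minimal sketch and not as this project's implementation, is to skip a scheduled run while the previous crawl is still in progress, and to write to the database only after every page has been fetched. The names `createGuardedRunner`, `crawlAllPages`, `fetchPage`, and `saveAll` are hypothetical helpers introduced for illustration, as is the `{ items, hasNext }` page shape.

```javascript
// Wraps an async task so that a new invocation is skipped
// while an earlier invocation is still running.
function createGuardedRunner(task) {
  let running = false;
  return async function run() {
    if (running) {
      return 'skipped'; // previous crawl has not finished yet
    }
    running = true;
    try {
      await task();
      return 'completed';
    } finally {
      running = false; // always release the guard, even on errors
    }
  };
}

// Fetches every page first, then stores the combined result once.
// `fetchPage(page)` is assumed to resolve to { items, hasNext };
// `saveAll(items)` is assumed to persist them (both hypothetical).
async function crawlAllPages(fetchPage, saveAll) {
  const items = [];
  let page = 1;
  for (;;) {
    const result = await fetchPage(page);
    items.push(...result.items);
    if (!result.hasNext) break;
    page += 1;
  }
  await saveAll(items);
  return items.length;
}

// Possible wiring (not executed here):
// const run = createGuardedRunner(() => crawlAllPages(fetchPage, saveAll));
// setInterval(run, 10 * 60 * 1000);
```

With this guard, a timer tick that fires during a slow crawl is simply dropped, so two crawls never run at the same time. An alternative with the same effect is to schedule the next run with `setTimeout` only after the current one finishes.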
