baidu_qx

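Baidu Migration (百度迁徙) data crawler

I. Overview

This crawler collects move-in and move-out data for every city in China, along with the percentage of each city's move-in or move-out volume that is attributable to each province. This corresponds to the ranking section of the Baidu Migration page.

II. Project Details

Libraries used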

  • datetime
  • json
  • scrapy
  • urllib

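Note: install scrapy before running this project on your machine. datetime, json, and urllib are part of Python's standard library and need no separate installation.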
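
As a rough sketch of how the standard-library modules listed above can fit together, the snippet below builds a query URL with urllib and decodes a response body with json. The endpoint, the parameter names, and the optional callback wrapper are all assumptions made for illustration; they are not taken from this project's spiders.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, used only for illustration.
BASE_URL = "https://example.com/migration/cityrank"


def build_url(city_id, direction, date):
    """Return a query URL for one city, one direction, and one day."""
    query = urlencode({"id": city_id, "type": direction, "date": date})
    return f"{BASE_URL}?{query}"


def parse_payload(text):
    """Decode a JSON body, tolerating an optional callback(...) wrapper."""
    text = text.strip()
    start, end = text.find("("), text.rfind(")")
    # Only strip the wrapper when the body is not already plain JSON.
    if not text.startswith(("{", "[")) and start != -1 and end > start:
        text = text[start + 1:end]
    return json.loads(text)
```

In a Scrapy spider, a URL built this way would typically be passed to scrapy.Request, and the callback would hand response.text to a parser like the one above.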

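The repository already contains the move-in and move-out data for every city for 2020-01-01 and 2020-01-02. The province percentages, however, have only been crawled for a single city on 2020-01-01.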

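III. Usage

  1. Clone the project to your local machine.
  2. Install Python 3.6 or later, then install Scrapy with pip install scrapy. The other libraries listed above (datetime, json, urllib) ship with Python and cannot be installed with pip.
  3. Change into the project directory with cd baidu_qx (this is the project root, not the inner baidu_qx directory that contains the spiders directory).
  4. Run scrapy crawl city_rank -o city_rank.csv to crawl the migration data for all cities.
  5. Run scrapy crawl provincerank -o province_rank.csv to crawl the per-province percentages of each city's migration volume.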

Note: you can change the date range to crawl in baidu_qx/baidu_qx/spiders/city_rank and baidu_qx/baidu_qx/spiders/provincerank. The default range is 2020-01-01 to 2020-02-10.
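
A date range like the default one can be enumerated with datetime. The following is a minimal sketch, not the spiders' actual code; the helper name date_strings and the YYYYMMDD output format are assumptions.

```python
from datetime import date, timedelta


def date_strings(start, end, fmt="%Y%m%d"):
    """Yield one formatted string per day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current.strftime(fmt)
        current += timedelta(days=1)


# The default range mentioned above: 2020-01-01 through 2020-02-10.
dates = list(date_strings(date(2020, 1, 1), date(2020, 2, 10)))
```

Changing the two date(...) arguments is enough to cover a different period; each resulting string would then drive one request per city and direction.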

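IV. Field Descriptions

Fields of the city_rank table:

city_name the city being crawled
date the day the migration data refers to
inOrOUt_city the city that people come from or go to
inOrout move-in (move_in) or move-out (move_out)
inOrout_city_province_name the province that inOrOUt_city belongs to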
value
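
Once city_rank.csv has been exported, the fields above can be read back by name. The sketch below uses csv.DictReader on a few made-up rows; the sample cities and numbers are placeholders, and treating value as a number is an assumption, since the field list does not describe it.

```python
import csv
import io

# Made-up rows in the city_rank column layout; the values are placeholders.
SAMPLE = """city_name,date,inOrOUt_city,inOrout,inOrout_city_province_name,value
CityA,20200101,CityB,move_in,ProvinceX,1.5
CityA,20200101,CityC,move_out,ProvinceY,2.0
CityB,20200101,CityA,move_in,ProvinceZ,0.5
"""


def load_rows(handle):
    """Read rows keyed by column name and convert value to float."""
    rows = []
    for row in csv.DictReader(handle):
        row["value"] = float(row["value"])  # assumption: value is numeric
        rows.append(row)
    return rows


def select(rows, city_name, direction):
    """Keep the rows for one city and one direction (move_in or move_out)."""
    return [
        r for r in rows
        if r["city_name"] == city_name and r["inOrout"] == direction
    ]


rows = load_rows(io.StringIO(SAMPLE))
move_in_to_a = select(rows, "CityA", "move_in")
```

For a real export, replace io.StringIO(SAMPLE) with open("city_rank.csv", newline="", encoding="utf-8"). Reading by column name keeps the code independent of the column order in the exported file.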

Fields of the provincerank table:

city_name the city being crawled
date the date
inOrout move-in (move_in) or move-out (move_out)
province_name the province
value
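
One way to use the provincerank fields is to group records by city, date, and direction, then rank the provinces within each group. This is a sketch under assumptions: the records are made up, top_provinces is a hypothetical helper, and value is taken to be the province's percentage share.

```python
from collections import defaultdict

# Made-up records in the provincerank layout; the values are placeholders.
RECORDS = [
    {"city_name": "CityA", "date": "20200101", "inOrout": "move_in",
     "province_name": "ProvinceX", "value": 12.5},
    {"city_name": "CityA", "date": "20200101", "inOrout": "move_in",
     "province_name": "ProvinceY", "value": 30.0},
    {"city_name": "CityA", "date": "20200101", "inOrout": "move_out",
     "province_name": "ProvinceX", "value": 8.0},
]


def top_provinces(records, n=3):
    """Return the n largest (province_name, value) pairs per group."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["city_name"], rec["date"], rec["inOrout"])
        groups[key].append((rec["province_name"], rec["value"]))
    return {
        key: sorted(items, key=lambda item: item[1], reverse=True)[:n]
        for key, items in groups.items()
    }


ranking = top_provinces(RECORDS, n=1)
```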

Contributors

xinzizi77
