lianjia's Introduction

LianJiaSpider: over 1,000 records per minute

Preface

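  • This project implements its features on top of this web page's interface.
  • Currently supported cities: Shanghai (上海), Beijing (北京), Guangzhou (广州), Shenzhen (深圳), Yantai (烟台), Xiamen (厦门), Changsha (长沙), Zhengzhou (郑州). To request another city, open an Issue; I check them regularly.
  • Author: Mrx; WeChat: xwk245776832; email: [email protected]. If you have any questions, send an email and I will help as best I can.
  • The parameters the GET request requires are computed by a JavaScript script on the web page. Working out that computation was the main difficulty, and this project solves it. There is no limit on the number of calls or on speed, and 100,000+ records for Shanghai were crawled without triggering anti-crawling measures.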
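The version history of this project notes that, since 1.1.6, the authorization value is obtained by calling an md5 function directly. The sketch below only illustrates the general idea of deriving a request signature from GET parameters with MD5. The function name `sign_params`, the secret, the parameter names, and the concatenation order are all assumptions made for illustration; they are not the scheme the site or this library actually uses.

```python
import hashlib


def sign_params(params: dict, secret: str) -> str:
    """Return an MD5 hex digest over a secret plus the sorted query parameters.

    Hypothetical scheme: the real secret, parameter order, and encoding
    are not documented in this README.
    """
    # Sort the keys so the same parameters always produce the same signature.
    payload = secret + "".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder values, chosen only to show the call shape.
    demo = {"city_id": "310000", "max_lng": "121.5", "min_lng": "121.4"}
    print(sign_params(demo, secret="example-secret"))
```

Sorting the keys makes the signature independent of the order in which the caller built the dictionary, which is a common property of this kind of scheme.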

Running

  • Example code

1. Prepare the district boundary database

import Lianjia.lianjia as lj
lj.SaveCityBorderIntoDB('上海')
# Saves the boundary longitude/latitude of every district in Shanghai to the file district.db in the directory

The table structure of district.db is as follows

create table 城市名 -- 城市名 = the city name, e.g. 上海
(
  id int PRIMARY KEY ,
  name text,
  longitude text,
  latitude text,
  border text,
  unit_price int,
  count int
)
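Once step 1 has finished, district.db can be inspected with Python's standard `sqlite3` module. The following is a minimal sketch, assuming the table is named after the city exactly as in the schema above; the helper name `list_districts` is hypothetical and not part of the library.

```python
import sqlite3


def list_districts(conn: sqlite3.Connection, city: str) -> list:
    """Return (name, unit_price, count) rows for one city table, priciest first."""
    # Table names cannot be bound as SQL parameters, so quote the identifier by hand.
    table = '"' + city.replace('"', '""') + '"'
    query = f'SELECT name, unit_price, "count" FROM {table} ORDER BY unit_price DESC'
    return conn.execute(query).fetchall()
```

For example, `list_districts(sqlite3.connect('district.db'), '上海')` would return one tuple per district, provided the file already exists in the working directory.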

2. Crawl the second-hand housing community data within each district

import Lianjia.lianjia as lj
#lj.SaveCityBorderIntoDB('上海')
lj.HoleCityDown('上海')
# Saves information on every community with homes for sale in the city to the file LianJia_area.db in the directory

The table structure of LianJia_area.db is as follows

create table 城市名 -- 城市名 = the city name, e.g. 上海
(
  id int PRIMARY KEY ,
  district text,
  name text,
  longitude text,
  latitude text,
  unit_price int,
  count int
)
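The community table lends itself to simple per-district summaries. The sketch below assumes that `count` is the number of listings in a community and that `unit_price` is that community's average unit price; the README does not state either, so treat both as assumptions. The function name `district_summary` is hypothetical.

```python
import sqlite3


def district_summary(conn: sqlite3.Connection, city: str) -> list:
    """Return (district, communities, listings, weighted_unit_price) per district."""
    # Table names cannot be bound as SQL parameters, so quote the identifier by hand.
    table = '"' + city.replace('"', '""') + '"'
    query = f'''
        SELECT district,
               COUNT(*) AS communities,
               SUM("count") AS listings,
               SUM(unit_price * "count") * 1.0 / SUM("count") AS weighted_unit_price
        FROM {table}
        WHERE "count" > 0
        GROUP BY district
        ORDER BY weighted_unit_price DESC
    '''
    return conn.execute(query).fetchall()
```

Weighting by `count` keeps a community with a single listing from moving the district average as much as one with hundreds.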

3. Crawl the details of every house for sale in each community

import Lianjia.lianjia as lj
#lj.SaveCityBorderIntoDB('上海')
#lj.HoleCityDown('上海')
lj.GetCompleteHousingInfo('上海')
# Saves the details of every house in every community with homes for sale to the file DetailInfo.db in the directory

The table structure of DetailInfo.db is as follows

create table 城市名 -- 城市名 = the city name, e.g. 上海
(houseId PRIMARY  KEY , 
houseCode, title, appid, 
source, imgSrc, layoutImgSrc, 
imgSrcUri,layoutImgSrcUri, 
roomNum, square, buildingArea, 
buildYear, isNew, ctime,
mtime, orientation, floorStat, 
totalFloor, decorateType, 
hbtName,isYezhuComment, 
isGarage, houseType, isFocus, 
status, isValid, signTime,
signSource, signSourceCn, 
isDisplay, address, community, 
communityId,communityName, 
communityUrl, communityUrlEsf, 
districtId, districtUrl, districtName, 
regionId, regionUrl, regionName, 
bbdName, bbdUrl, houseCityId,
subwayInfo, schoolName, schoolArr, 
bizcircleFullSpell, house_video_info , 
price,unitPrice, viewUrl, listPrice, 
publishTime, isVilla, villaNoFloorLevel,
villaName, tags)
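Because the detail table has many columns, it is often handy to pull out only a few of them. The following is a rough sketch that exports selected columns to CSV with the standard library; the helper name `export_listings` is hypothetical, and the column names passed in must match those in the schema above.

```python
import csv
import sqlite3


def export_listings(conn: sqlite3.Connection, city: str, columns: list, out) -> int:
    """Write the chosen columns of a city's table to `out` as CSV; return the row count."""
    table = '"' + city.replace('"', '""') + '"'
    # Look up the real column names so a typo fails early with a clear message.
    known = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    missing = [name for name in columns if name not in known]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    column_list = ", ".join('"' + name + '"' for name in columns)
    writer = csv.writer(out)
    writer.writerow(columns)  # header row
    written = 0
    for row in conn.execute(f"SELECT {column_list} FROM {table}"):
        writer.writerow(row)
        written += 1
    return written
```

`out` can be any writable text file object, for instance one opened with `open('houses.csv', 'w', newline='', encoding='utf-8')`.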

  • Run steps 1, 2, and 3 above in order; otherwise errors will occur.

  • Alternatively, run the following code directly, but note that it takes a long time.

import Lianjia.lianjia as lj
city='上海'
lj.SaveCityBorderIntoDB(city)
lj.HoleCityDown(city)
lj.GetCompleteHousingInfo(city)
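Since the three steps must run in order and the full run is slow, a small wrapper that times each step and stops at the first failure can make long runs easier to follow. This is a sketch only; `run_steps` is a hypothetical helper, not part of the library, and it takes the step functions as arguments so that it does not depend on how they are implemented.

```python
import time


def run_steps(city, steps):
    """Run (label, callable) steps in order, timing each; stop at the first failure."""
    timings = []
    for label, step in steps:
        start = time.perf_counter()
        try:
            step(city)
        except Exception as exc:
            # Later steps depend on earlier output, so do not continue past a failure.
            raise RuntimeError(f"step {label!r} failed for {city}") from exc
        timings.append((label, time.perf_counter() - start))
    return timings
```

With the library imported as `lj`, the call could look like `run_steps('上海', [('borders', lj.SaveCityBorderIntoDB), ('communities', lj.HoleCityDown), ('houses', lj.GetCompleteHousingInfo)])`, which returns a list of `(label, seconds)` pairs.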

2. Advanced usage

  • Example
# To be updated later; that is all for now

3. Version history

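  • 1.1.0
  1. Reverse-engineered the Lianjia map API protocol to search for housing by longitude/latitude region
  2. Simple crawler for the Shanghai urban area
  • 1.1.5
  1. Added pip packaging; the project can be installed directly with pip install LianJiaSpider
  2. Added more cities
  • 1.1.6
  1. Removed the JS module that simulated obtaining authorization (change made by @Wen Peiyu)
  2. Added and modified the corresponding functions to obtain authorization by calling an md5 function directly
  • 1.2.0
  1. Fixed the error raised by lj.GetCompleteHousingInfo(city) after the JS module that simulated obtaining authorization was removed
  • 1.2.1
  1. Added data for Guangzhou and Shenzhen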

BTW, does anyone have job recommendations in Shanghai? The ops work I do now is really tedious!

lianjia's People

Contributors

xjkj123, wenpeiyu
