dataspider's Introduction

DataSpider

Making all kinds of data easy for everyone to use

Preface

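The dirtiest, most tedious part of acquiring data is downloading and cleaning it, and downloading data from many different sources is especially painful. This crawler system wraps up the dirty part so that clean data can be obtained through code. (How the data is stored afterwards is outside the scope of this project.)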

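Although it is called a crawler system, it is not a crawler in the traditional sense; it is closer to an interface for searching and collecting information.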

I hope you have fun with it.

Some crawlers are left undocumented for reasons I would rather not go into. Thanks for understanding.

Finance crawlers

Caixin crawler

The Caixin crawler is less a crawler than a search-and-download system. The first step is to get all the article links, using this interface:

from bdata.finance.caixin_news import query_urls
query_urls(from_date, to_date, query_words)

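This function searches for hyperlinks to all articles that contain query_words. from_date and to_date are the start and end dates, and query_words is the keyword. Dates use the format yyyy-mm-dd. Example usage (the keyword 英镑 means "British pound"):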

query_urls('2016-09-01', '2016-09-30', '英镑')
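
Because query_urls takes its dates as yyyy-mm-dd strings, it can help to check them before searching. The following is only a minimal sketch and not part of bdata: the helper name check_date_range is hypothetical, and since the return value of query_urls is not described here, the actual call is shown only in a comment.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # the yyyy-mm-dd format described for query_urls


def check_date_range(from_date, to_date):
    """Hypothetical helper (not part of bdata): validate a yyyy-mm-dd range.

    Raises ValueError if either string cannot be parsed or if the range is
    reversed. Returns both dates re-formatted as zero-padded yyyy-mm-dd.
    """
    start = datetime.strptime(from_date, DATE_FORMAT)
    end = datetime.strptime(to_date, DATE_FORMAT)
    if start > end:
        raise ValueError("from_date must not be later than to_date")
    return start.strftime(DATE_FORMAT), end.strftime(DATE_FORMAT)


# Usage, assuming bdata is installed:
# from bdata.finance.caixin_news import query_urls
# urls = query_urls(*check_date_range('2016-09-01', '2016-09-30'), '英镑')
```

Checking the dates up front means a typo such as '2016/09/01' fails immediately with a ValueError instead of being sent to the search.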

Social network crawlers

Douban crawler

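The Douban crawler uses Douban's API, but the request rate appears to be limited; if you have a Douban API key, a contribution would be welcome. Only books and movies are supported at the moment. The API looks like this (taking the JSON structure for a book as an example):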

from bdata.social_network.douban import get_book_json
get_book_json(ID)

See the comments in the source file for details. Functions are named according to this pattern:

get_[movie/book]_[json/info](id)
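
The naming pattern above can be turned into a small lookup. This is a sketch under stated assumptions: douban_function_name and call_douban are hypothetical helpers, and it assumes that all four functions live in bdata.social_network.douban alongside get_book_json, which this document does not confirm.

```python
import importlib

KINDS = ("movie", "book")  # resource types named in the pattern
FORMS = ("json", "info")   # output forms named in the pattern


def douban_function_name(kind, form):
    """Build a function name following get_[movie/book]_[json/info]."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    if form not in FORMS:
        raise ValueError(f"form must be one of {FORMS}, got {form!r}")
    return f"get_{kind}_{form}"


def call_douban(kind, form, item_id):
    """Look the function up by name and call it (requires bdata to be installed).

    Assumption: every name produced by douban_function_name exists in the
    bdata.social_network.douban module.
    """
    module = importlib.import_module("bdata.social_network.douban")
    return getattr(module, douban_function_name(kind, form))(item_id)
```

For example, douban_function_name('book', 'json') gives 'get_book_json', the function imported in the example above.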

Other crawlers

DNC email leak

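This file downloads all the emails exposed in the 2016 leak from the Democratic National Committee (DNC) email server, a little over twenty thousand in total. Because of where the hosting server is located, users behind the firewall need a proxy or VPN to download them. The file provides two interfaces, get_mail_data and save_mail: the first only reads an email into a string, and the second only saves it to local disk.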
