
spider's Introduction

Python Web Crawler

douban movie

Scrapes the Douban movie site for a given region, extracts the information for each film now showing, and sorts the films by score in descending order.
To execute:

set the url in main_2.py (e.g. url="http://movie.douban.com/nowplaying/beijing/")
./main_2.py

urllib2 ships with Python 2. Install bs4 and chardet if you do not already have them.
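The descending sort by score described above can be sketched as follows. This is a minimal illustration, not the code in main_2.py: the sample data and the key names `title` and `score` are assumptions, and films without a score are simply placed last.

```python
# Hypothetical sample in a list-of-dictionaries shape; the key names
# "title" and "score" are assumptions, not taken from spider_2.py.
films = [
    {"title": "Film A", "score": "7.9"},
    {"title": "Film B", "score": "8.6"},
    {"title": "Film C", "score": ""},  # no score available yet
]


def score_of(film):
    """Return the score as a float; films without a usable score get -1.0."""
    try:
        return float(film.get("score", ""))
    except (TypeError, ValueError):
        return -1.0


def sort_by_score(items):
    """Return a new list ordered from highest to lowest score."""
    return sorted(items, key=score_of, reverse=True)


if __name__ == "__main__":
    for film in sort_by_score(films):
        print(film["title"], film["score"] or "n/a")
```

Converting the score to a float before comparing avoids string ordering, where "10.0" would sort below "9.0".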

dytt8 movie

Scrapes the dytt8 (电影天堂) movie site to collect the latest films and their download links.
To execute:

set the url in main_2.py (e.g. url="http://www.dytt8.net/html/gndy/dyzz/index.html")
./main_2.py

urllib2 ships with Python 2. Install bs4 and chardet if you do not already have them.

douban movie

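Scrapes the films now showing in a given region from Douban Movie (豆瓣电影) and sorts them by score from high to low. To execute:

set the url in main_2.py (e.g. url="http://movie.douban.com/nowplaying/beijing/")
./main_2.py

main_2.py and spider_2.py target Python 2.x; spider_3.py targets Python 3.x (to be updated). Libraries used: urllib2, bs4, chardet. Install any that are missing.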
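For the pending Python 3.x version, the fetching step could look roughly like the sketch below. It is an assumption about how a port might be written, not the contents of spider_3.py: `urllib.request` is the Python 3 replacement for `urllib2`, while the `USER_AGENT` value and the function names are hypothetical. chardet is treated as optional and is only consulted when the page is not valid UTF-8.

```python
import urllib.request

try:
    import chardet  # optional third-party dependency
except ImportError:
    chardet = None

# Hypothetical header value, chosen only for this sketch.
USER_AGENT = "Mozilla/5.0 (compatible; spider-sketch)"


def build_request(url):
    """Build a request with a User-Agent header (Python 3 form of urllib2.Request)."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def decode_page(raw):
    """Decode raw bytes: try UTF-8, then chardet's guess, then replace bad bytes."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    if chardet is not None:
        guess = chardet.detect(raw).get("encoding")
        if guess:
            try:
                return raw.decode(guess)
            except (LookupError, UnicodeDecodeError):
                pass
    return raw.decode("utf-8", errors="replace")


def fetch(url, timeout=10):
    """Download a page and return it as text (performs a network request)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return decode_page(resp.read())
```

Calling `fetch("http://movie.douban.com/nowplaying/beijing/")` would return the page text, which can then be handed to bs4 for parsing.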

dytt8 movie

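Scrapes the latest films on dytt8 (电影天堂) together with their download links. To execute:

set the url in main_2.py (e.g. url="http://www.dytt8.net/html/gndy/dyzz/index.html")
./main_2.py

main_2.py and spider_2.py target Python 2.x; spider_3.py targets Python 3.x (to be updated). Libraries used: urllib2, bs4, chardet. Install any that are missing.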

Both return a list of dictionaries: [{dict}, {dict}, {dict}, ...]
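As a small illustration of working with that list-of-dictionaries result, the sketch below saves it as JSON and pulls out one field. The key names `title` and `download` and the sample values are assumptions made for this example; the real keys are whatever spider_2.py produces.

```python
import json

# Hypothetical result in the list-of-dictionaries shape; key names and
# values are placeholders, not real scraper output.
results = [
    {"title": "电影甲", "download": "ftp://example.invalid/a.mkv"},
    {"title": "电影乙", "download": "ftp://example.invalid/b.mkv"},
]


def to_json(items):
    """Serialize the results; ensure_ascii=False keeps Chinese titles readable."""
    return json.dumps(items, ensure_ascii=False, indent=2)


def titles(items):
    """Collect the title of every entry."""
    return [item["title"] for item in items]


if __name__ == "__main__":
    print(to_json(results))
```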

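##Remark

dytt8 (电影天堂) uses a piece of JavaScript as an anti-scraping measure, so the JS functions have to be picked out with regular expressions and then processed in Python.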
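The regex approach can be sketched as follows. The page fragment, the function names `getKey` and `getPath`, and the string-concatenation step are all hypothetical stand-ins: the actual dytt8 script is not reproduced here, and the pattern assumes function bodies without nested braces.

```python
import re

# Hypothetical page fragment; the real anti-scraping script differs.
SAMPLE_HTML = """
<html><body>
<script type="text/javascript">
function getKey(){ return "abc" + "123"; }
function getPath(){ return "/html/" + "index.html"; }
</script>
</body></html>
"""

# Contents of every <script> element.
SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.S | re.I)
# Named function definitions; assumes the body contains no nested braces.
FUNC_RE = re.compile(r"function\s+(\w+)\s*\(([^)]*)\)\s*\{([^{}]*)\}")
# Double-quoted string literals without escape sequences.
STRING_RE = re.compile(r'"([^"\\]*)"')


def extract_functions(html):
    """Map each JS function name found in the page to its body text."""
    functions = {}
    for script in SCRIPT_RE.findall(html):
        for name, _args, body in FUNC_RE.findall(script):
            functions[name] = body.strip()
    return functions


def join_string_literals(body):
    """Redo a simple JS concatenation such as "a" + "b" in Python."""
    return "".join(STRING_RE.findall(body))


if __name__ == "__main__":
    for name, body in extract_functions(SAMPLE_HTML).items():
        print(name, "->", join_string_literals(body))
```

Regular expressions only cope with simple, flat scripts like this one; anything with nested blocks would need a real JavaScript parser or interpreter.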

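WeChat official accounts can now be searched on Sogou, so scraping them is no longer difficult. What is scraped here are the image links of the official account 碉堡. The finished demo is hosted on coding.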

##TODO

douban_movie and dytt8_movie for Python 3.x

Contributors

omengye, pnpie
