
spider's Introduction

URL: http://weixin.sogou.com/
Technology: PyQuery, MongoDB, requests
Statement: Search sogou.com for WeChat (weixin) articles published by official accounts. An IP proxy pool is used to get around the site's per-IP request limit. The spider crawls all articles matching a keyword and saves selected information to MongoDB.
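
The proxy-pool idea can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `fetch_with_proxies` and the `fetch` callback are hypothetical names, and the download step is injected so the rotation logic can be read (and tried) without network access.

```python
from typing import Callable, Iterable, Optional


def fetch_with_proxies(url: str,
                       proxies: Iterable[str],
                       fetch: Callable[[str, str], Optional[str]]) -> Optional[str]:
    """Try each proxy in turn until `fetch` returns a page body.

    `fetch(url, proxy)` is a hypothetical callback: it returns the page text,
    returns None when the response looks blocked, or raises on network errors.
    """
    for proxy in proxies:
        try:
            body = fetch(url, proxy)
        except Exception:
            # Any failure is treated as a dead proxy; move on to the next one.
            continue
        if body is not None:
            return body
    # Every proxy failed or was blocked.
    return None
```

With requests, the callback could wrap something like `requests.get(url, proxies={"http": proxy}, timeout=5)` and return `None` when the response indicates the IP has been limited; how that is detected depends on the site and is not specified here.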


URL: http://www.budejie.com/
Technology: md5, MongoDB, requests, BeautifulSoup
Statement: Crawl the short videos from budejie and save selected information to MongoDB.
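
The entry lists md5 without saying what it is used for. One common use in crawlers, assumed here, is hashing downloaded content to get a stable key so the same video is not stored twice. The function names and the record fields below are illustrative only.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Return the hex MD5 digest of the raw bytes, used here as a dedup key."""
    return hashlib.md5(data).hexdigest()


def build_record(video_url: str, data: bytes) -> dict:
    """Build a document for storage; the field names are hypothetical."""
    return {
        "_id": content_key(data),  # identical content maps to the same key
        "url": video_url,
        "size": len(data),
    }
```

Using the digest as the `_id` means a second insert of the same content would collide on the key instead of creating a duplicate document.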


URL: http://www.toutiao.com/search_content/
Technology: MongoDB, requests, BeautifulSoup, multiprocessing, md5
Statement: Crawl image galleries and image articles. Images are saved to MongoDB, and MP4 videos are saved to local disk.


URL: https://www.qcloud.com/community/all/

Technology: requests, BeautifulSoup, MongoDB, multiprocessing, jieba, wordcloud, matplotlib

Statement: Crawl all articles from qcloud/community.

Use the 'jieba' word segmenter to split every article into words and build a word-frequency count.

Then use 'wordcloud' and 'matplotlib' to draw a word cloud showing the main topics of qcloud/community.

Save all articles to MongoDB.
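
The word-counting step might look something like the sketch below. It uses only the standard library so it runs anywhere; `count_words`, its `tokenize` parameter, and the `min_length` filter are assumptions made for illustration, not names from the repository.

```python
from collections import Counter
from typing import Callable, Iterable, List


def count_words(articles: Iterable[str],
                tokenize: Callable[[str], List[str]] = str.split,
                min_length: int = 2) -> Counter:
    """Count how often each token appears across all articles.

    `tokenize` defaults to whitespace splitting; tokens shorter than
    `min_length` characters are skipped (an assumed noise filter).
    """
    counts = Counter()
    for text in articles:
        for token in tokenize(text):
            token = token.strip()
            if len(token) >= min_length:
                counts[token] += 1
    return counts
```

For Chinese text, `jieba.lcut` can be passed as `tokenize`, and the resulting counts can be handed to `WordCloud.generate_from_frequencies` from the wordcloud package before displaying the image with matplotlib.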


URL: http://www.17k.com/list/

Technology: requests

Statement: Crawl the novels and save each one as an HTML file.
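
The entry does not say how the HTML files are produced. One possible reading, assumed here, is that the extracted chapter text is wrapped in a small standalone page. `chapter_to_html` is a hypothetical helper written for this illustration.

```python
import html


def chapter_to_html(title: str, paragraphs: list) -> str:
    """Wrap a chapter's title and paragraphs in a minimal HTML page."""
    # Escape the text so characters like '<' do not break the markup.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{safe_title}</title></head>\n"
        f"<body>\n<h1>{safe_title}</h1>\n{body}\n</body></html>\n"
    )
```

The returned string can then be written to disk with `open(path, "w", encoding="utf-8")`.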


URL: https://www.taobao.com

Technology: MongoDB, selenium, PyQuery

Statement: Crawl the information for all goods returned by a Taobao keyword search.

Selenium drives a Chrome browser to load the pages and get the page source. PyQuery then parses the HTML, extracts the goods information, and saves it to a MongoDB database.


URL: http://online.zhihuishu.com/CreateCourse/learning/videoList?courseId=

Technology: Selenium, PyQuery

Statement: Use Selenium to drive a Chrome browser so that a program watches the course videos automatically.

The program handles the pop-up prompt windows by closing them, and checks when each course video is over.


URL: http://172.21.160.114:8080/portal/templatePage/20160629121614757/login_custom.jsp?userip=172.18.13.152&userurl=http://172.18.13.1 (The URL is dynamic, so you need to connect to the school WiFi (STBU-EDU) to get the IP and port.)

Technology: Selenium

Statement: Use Selenium to drive a Chrome browser and log in by brute force: the program tries every student number (for example 2015201001) with the password 8888.

URL: https://book.douban.com/latest

Technology: re, requests

Statement: Use requests to fetch the source of Douban's latest-books page, then use regular expressions to extract the new book information.
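
Regex extraction of this kind can be sketched as below. The markup in `SAMPLE_HTML` is invented for the example and is not Douban's real page structure, so the pattern would need to be adapted to the actual source; `extract_books` is likewise a hypothetical name.

```python
import re

# Hypothetical markup for illustration only; NOT copied from the real page.
SAMPLE_HTML = """
<li class="book">
  <a class="title" href="https://example.com/book/1">First Book</a>
  <span class="author">Author One</span>
</li>
<li class="book">
  <a class="title" href="https://example.com/book/2">Second Book</a>
  <span class="author">Author Two</span>
</li>
"""

# Named groups keep the extracted fields self-describing.
BOOK_PATTERN = re.compile(
    r'<a class="title" href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>\s*'
    r'<span class="author">(?P<author>[^<]+)</span>'
)


def extract_books(page_source: str) -> list:
    """Return one dict (url, title, author) per book found in the page."""
    return [match.groupdict() for match in BOOK_PATTERN.finditer(page_source)]
```

In the real spider the input would be the text of the response returned by `requests.get`, rather than a hard-coded sample.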

URL: http://maoyan.com/board/4?offset=

Technology: re, requests

Statement: Use requests to fetch the Maoyan movie leaderboard, then use regular expressions to extract the movie information.
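
The trailing `offset=` in the URL suggests the leaderboard is paginated. A minimal sketch of building the page URLs follows; the totals used as defaults (100 entries, 10 per page) are assumptions for illustration and should be checked against the live site, and `page_urls` is a hypothetical helper.

```python
BASE_URL = "http://maoyan.com/board/4?offset="


def page_urls(total: int = 100, page_size: int = 10) -> list:
    """Build one URL per leaderboard page by stepping the offset.

    `total` and `page_size` are assumed values, not taken from the source.
    """
    return [BASE_URL + str(offset) for offset in range(0, total, page_size)]
```

Each URL can then be fetched with requests and passed to the regex extraction step.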
