URL: http://weixin.sogou.com/
Technology: PyQuery, MongoDB, requests
Statement: Use Sogou search to find WeChat (weixin) articles published by official accounts.
Use an IP proxy pool to work around the site's per-IP request limits.
Crawl all articles that match a keyword
and save some information about them to MongoDB.
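The proxy pool idea above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `fetch_with_proxies`, the `fetch` callable, and the `max_attempts` parameter are all assumptions made for this example.

```python
from itertools import cycle
from typing import Callable, Iterable, Optional


def fetch_with_proxies(
    url: str,
    proxies: Iterable[str],
    fetch: Callable[[str, str], str],
    max_attempts: int = 5,
) -> Optional[str]:
    """Try `fetch(url, proxy)` with one proxy after another.

    `fetch` is a hypothetical callable that returns the page body or
    raises an exception when the proxy fails or the request is blocked.
    Returns None if every attempt fails.
    """
    proxy_list = list(proxies)
    if not proxy_list:
        raise ValueError("at least one proxy is required")
    pool = cycle(proxy_list)  # rotate through the proxies in order
    for _ in range(max_attempts):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except Exception:
            # This proxy failed or was blocked; move on to the next one.
            continue
    return None
```

With requests, `fetch` could be something like `lambda url, proxy: requests.get(url, proxies={"http": proxy}, timeout=10).text`; passing the fetcher in keeps the rotation logic independent of the HTTP library.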
URL: http://www.budejie.com/
Technology: md5, MongoDB, requests, BeautifulSoup
Statement: Crawl short videos from budejie and save some information about them to MongoDB.
URL: http://www.toutiao.com/search_content/
Technology: MongoDB, requests, BeautifulSoup, multiprocessing, md5
Statement: Crawl image galleries and image articles. Save the images to MongoDB and save the MP4 videos to local disk.
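The entries above list md5 without saying what it is used for. One common use in crawlers, assumed here, is naming each downloaded file after the MD5 digest of its content so that the same file is never stored twice. The helper name `content_filename` and the default extension are hypothetical.

```python
import hashlib


def content_filename(content: bytes, extension: str = "mp4") -> str:
    """Build a file name from the MD5 hex digest of the file content.

    Identical content always maps to the same name, so a repeated
    download overwrites the earlier copy instead of duplicating it.
    """
    digest = hashlib.md5(content).hexdigest()
    return f"{digest}.{extension}"
```

The digest serves only as a deduplication key in this sketch; MD5 is not suitable where collision resistance matters for security.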
URL: https://www.qcloud.com/community/all/
Technology: requests, BeautifulSoup, MongoDB, multiprocessing, jieba, wordcloud, matplotlib
Statement: Crawl all articles from the qcloud community,
use the 'jieba' word segmenter to split every article into words and count the word frequencies,
then use 'wordcloud' and 'matplotlib' to draw a word cloud that shows the main topics of the qcloud community.
Save all articles to MongoDB.
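The word-counting step can be sketched as below. The tokenizer is passed in as a parameter so that the example runs without jieba installed; the function name `count_words` and the `min_length` filter are assumptions for illustration, not the project's code.

```python
from collections import Counter
from typing import Callable, Iterable, List


def count_words(
    articles: Iterable[str],
    tokenize: Callable[[str], List[str]],
    min_length: int = 2,
) -> Counter:
    """Count word frequencies across all articles.

    `tokenize` splits one article into words. Tokens shorter than
    `min_length` (after stripping whitespace) are skipped, which drops
    most punctuation and single-character tokens.
    """
    counter: Counter = Counter()
    for text in articles:
        counter.update(
            word for word in tokenize(text) if len(word.strip()) >= min_length
        )
    return counter
```

With jieba installed, `count_words(articles, jieba.lcut)` would segment Chinese text, and the resulting counter could be passed to `WordCloud.generate_from_frequencies` from the wordcloud package. For Chinese text, the word cloud would likely also need a font that contains CJK glyphs.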
Technology: requests
Statement: Crawl novels and save each novel as an HTML file.
Technology: MongoDB, selenium, PyQuery
Statement: Crawl the information for all goods returned by a Taobao keyword search.
Use Selenium to drive a Chrome browser and get the page source, parse the HTML with PyQuery to extract the goods information, and save that information to MongoDB.
URL: http://online.zhihuishu.com/CreateCourse/learning/videoList?courseId=
Technology: Selenium, PyQuery
Statement: Use Selenium to drive a Chrome browser so that a program watches the course videos automatically.
The program handles the pop-up prompt windows by closing them and detects when a course video has finished.
URL: http://172.21.160.114:8080/portal/templatePage/20160629121614757/login_custom.jsp?userip=172.18.13.152&userurl=http://172.18.13.1 (The URL is dynamic, so you must connect to the school WiFi (STBU-EDU) to obtain the IP and port)
Technology: Selenium
Statement: Use Selenium to drive a Chrome browser and log in by brute force: try every student number (such as 2015201001) with the password 8888.
URL: https://book.douban.com/latest
Technology: re, requests
Statement: Use requests to fetch the source of Douban's latest books page, and use regular expressions to extract the new book information.
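A rough sketch of the regex extraction step follows. The HTML in `SAMPLE_HTML` is made up for illustration, and the real Douban markup will differ, so the pattern would need adapting; the names `BOOK_PATTERN` and `extract_books` are also hypothetical.

```python
import re
from typing import Dict, List

# Matches one <li> item at a time; re.S lets "." span line breaks.
BOOK_PATTERN = re.compile(
    r'<li>.*?<a href="(?P<url>[^"]+)"[^>]*>(?P<title>[^<]+)</a>'
    r'.*?<p class="author">(?P<author>[^<]+)</p>.*?</li>',
    re.S,
)


def extract_books(html: str) -> List[Dict[str, str]]:
    """Return one dict (url, title, author) per book found in `html`."""
    return [
        {key: value.strip() for key, value in match.groupdict().items()}
        for match in BOOK_PATTERN.finditer(html)
    ]


# Invented sample markup, used only to exercise the pattern.
SAMPLE_HTML = """
<ul>
  <li>
    <a href="https://book.douban.com/subject/1/">First Book</a>
    <p class="author">Author A</p>
  </li>
  <li>
    <a href="https://book.douban.com/subject/2/">Second Book</a>
    <p class="author">Author B</p>
  </li>
</ul>
"""
```

In a real run, the `html` argument would be the `.text` of the response returned by `requests.get` for the page.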
URL: http://maoyan.com/board/4?offset=
Technology: re, requests
Statement: Use requests to fetch the Maoyan movie leaderboard, and use regular expressions to extract the movie information.
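The trailing `offset=` in the URL suggests that the leaderboard is paginated. A minimal sketch of building the page URLs, assuming 100 entries shown 10 per page (both numbers are assumptions, as is the helper name `board_page_urls`):

```python
from typing import List

BASE_URL = "http://maoyan.com/board/4?offset="


def board_page_urls(total: int = 100, page_size: int = 10) -> List[str]:
    """Build one URL per leaderboard page by stepping the offset."""
    return [f"{BASE_URL}{offset}" for offset in range(0, total, page_size)]
```

Each URL could then be fetched with requests and the page source passed to the regex extraction step.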