Data Analysis of Lagou Job
This repository holds the code for analyzing job data from Lagou. Its main functions are:
- Crawl job data from Lagou to get the latest information on Internet-industry jobs.
- Analyze and visualize the data.
- Crawl job detail pages and generate a word cloud as a "Job Impression".
- Store interviewees' comments in MongoDB so they can later be used to train a machine-learning NLP task.
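As a rough illustration of the MongoDB storage step, the sketch below shapes one comment as a document and inserts a batch. The field names (`company_id`, `text`, `tags`, `crawled_at`), the helper names, and the `lagou`/`comments` database and collection names are all assumptions made for this example, not the schema the repository actually uses.

```python
from datetime import datetime, timezone


def build_comment_doc(company_id, comment_text, tags=()):
    """Shape one interviewee comment as a MongoDB-style document.

    The schema here is hypothetical and only for illustration.
    """
    return {
        "company_id": company_id,
        "text": comment_text.strip(),
        "tags": list(tags),
        "crawled_at": datetime.now(timezone.utc),
    }


def save_comments(collection, docs):
    """Insert the comments whose text is not empty and return how many were kept.

    `collection` is any object exposing insert_many(), such as a
    pymongo Collection.
    """
    docs = [d for d in docs if d["text"]]
    if docs:
        collection.insert_many(docs)
    return len(docs)


# Usage against a real server (requires pymongo and a running mongod):
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://localhost:27017")["lagou"]["comments"]
#   save_comments(coll, [build_comment_doc(1, "Friendly interviewers")])
```

Taking the collection as a parameter keeps the document-shaping logic usable without a running database.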
- Install third-party libraries:
  sudo pip3 install -r requirements.txt
- [Optional] Install MongoDB and start the MongoDB service:
  sudo service mongod start
- Clone this project from GitHub.
- Run m_lagou_spider.py to crawl job data; it generates a collection of Excel files in the ./data directory.
- Run hot_words_generator.py to segment the job descriptions into words; it returns the top 30 hot words and a word cloud figure.
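The hot-word step can be pictured with a minimal sketch like the one below. It is not the code in hot_words_generator.py: the function name `top_hot_words`, the stop-word list, and the regex tokenizer are assumptions made for illustration. For Chinese text, a word segmenter such as `jieba.lcut` could be passed as `tokenize` instead.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real run would need a fuller one.
STOP_WORDS = {"and", "the", "of", "to", "in", "a", "with", "for"}


def default_tokenize(text):
    """Lower-case the text and split it into ASCII word tokens (keeps + and #)."""
    return re.findall(r"[a-z+#]+", text.lower())


def top_hot_words(texts, n=30, tokenize=default_tokenize):
    """Count the tokens across all texts and return the n most frequent ones."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in tokenize(text) if w not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["Python and SQL", "python, Java", "Python"]
    for word, freq in top_hot_words(sample, n=30):
        print(word, freq)
```

The resulting (word, count) pairs can be turned into a dict and handed to a word cloud library to draw the figure.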