xichen-de / parallelwebcrawler
This is a multi-threaded web crawler that crawls a website and extracts the links on each page. It then follows those links until it reaches the depth limit. Finally, it calculates the most popular words across the crawled pages and saves the result to a file.
License: MIT License
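
The flow described above (fetch pages in parallel, follow links up to a depth limit, then tally word counts) can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual code: the `PAGES` dictionary, the `fetch` and `crawl` functions, and the `workers` parameter are all hypothetical names, and an in-memory dictionary stands in for real HTTP requests and HTML parsing. It also assumes the depth limit counts levels including the start page, and it leaves out writing the result to a file.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "website": URL -> (page text, outgoing links).
# A real crawler would download and parse HTML instead.
PAGES = {
    "a": ("apple banana apple", ["b", "c"]),
    "b": ("banana cherry", ["a", "d"]),
    "c": ("apple cherry cherry", []),
    "d": ("durian durian durian durian", []),
}


def fetch(url):
    """Stand-in for download + parse: returns (word counts, outgoing links)."""
    text, links = PAGES.get(url, ("", []))
    return Counter(re.findall(r"[a-z]+", text.lower())), links


def crawl(start_urls, max_depth, workers=4):
    """Crawl level by level, fetching each level's pages in parallel.

    Assumption: max_depth is the number of levels visited, start pages included.
    Returns (total word counts, set of visited URLs).
    """
    visited = set()
    totals = Counter()
    frontier = list(start_urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth):
            # De-duplicate the frontier and skip pages already crawled.
            batch = [u for u in dict.fromkeys(frontier) if u not in visited]
            if not batch:
                break
            visited.update(batch)
            frontier = []
            # Results are merged in the calling thread, so no locking is needed.
            for counts, links in pool.map(fetch, batch):
                totals.update(counts)
                frontier.extend(links)
    return totals, visited


if __name__ == "__main__":
    counts, pages = crawl(["a"], max_depth=2)
    print(sorted(pages))
    print(counts.most_common(3))
```

Merging each page's counts in the calling thread keeps the shared `Counter` free of data races; a design that updates shared state from worker threads would need a lock or a concurrent map instead.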