This project demonstrates the well-known PageRank and HITS algorithms.
The project depends on the JSoup and GraphStream jars.
A bare-bones breadth-first search (BFS) crawler is implemented for crawling the web.
Assign seed URLs to start crawling from.
Specify a .txt file containing keywords used to filter which URLs are put in the frontier.
Specify the number of threads assigned to the crawlers and the number of crawling iterations.
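The frontier-filtering steps above can be sketched as follows. This is a hedged, illustrative sketch only: the real crawler fetches pages with JSoup, whereas here fetchLinks() is a stand-in backed by a small in-memory map so the BFS control flow is runnable offline; the names crawl, fetchLinks, and the sample URLs are assumptions, not the project's actual API.

```java
import java.util.*;

// Sketch of a bare-bones BFS crawler with a keyword filter on the frontier.
// fetchLinks() is a hypothetical stand-in for a JSoup page fetch.
public class FrontierSketch {
    // Tiny in-memory "web": each URL maps to the links found on its page.
    static final Map<String, List<String>> FAKE_WEB = Map.of(
        "http://seed", List.of("http://news/a", "http://spam/x"),
        "http://news/a", List.of("http://news/b"),
        "http://news/b", List.of());

    static List<String> fetchLinks(String url) {
        return FAKE_WEB.getOrDefault(url, List.of());
    }

    // BFS from the seed URLs; a discovered URL enters the frontier
    // only if it matches one of the filter keywords.
    static Set<String> crawl(List<String> seeds, Set<String> keywords, int iterations) {
        Deque<String> frontier = new ArrayDeque<>(seeds);
        Set<String> visited = new LinkedHashSet<>(seeds);
        for (int i = 0; i < iterations && !frontier.isEmpty(); i++) {
            String url = frontier.poll();
            for (String link : fetchLinks(url)) {
                boolean matches = keywords.stream().anyMatch(link::contains);
                if (matches && visited.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Keyword "news" keeps the news pages and filters out the spam link.
        Set<String> visited = crawl(List.of("http://seed"), Set.of("news"), 10);
        System.out.println(visited);
    }
}
```

In the actual project, multiple crawler threads would poll the frontier concurrently; this single-threaded version shows only the filtering logic.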
The graph of the Web (its incoming and outgoing links) is stored internally as HashMap&lt;WebURL, HashMap&lt;WebURL, Integer&gt;&gt;, accessed in a concurrent fashion. The PageRank and HITS algorithms can be applied to this graph. Graphs can be exported and re-imported later. It is also possible to visualize the tiny part of the Web that contributes the most, by specifying the number of top-ranked nodes to be drawn.
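To make the adjacency representation concrete, here is a minimal PageRank sketch over that nested-map structure, using String node ids in place of the project's WebURL class (an assumption for self-containment). The inner map's Integer value is treated as the multiplicity of links from one page to another; damping factor and iteration count are illustrative defaults, not the project's settings.

```java
import java.util.*;

// Minimal PageRank sketch over a Map<source, Map<target, linkCount>> graph,
// mirroring the project's HashMap<WebURL, HashMap<WebURL, Integer>> layout
// with String ids instead of WebURL (illustrative assumption).
public class PageRankSketch {
    static Map<String, Double> pageRank(Map<String, Map<String, Integer>> outLinks,
                                        double damping, int iterations) {
        // Collect every node appearing as a source or a target.
        Set<String> nodes = new HashSet<>(outLinks.keySet());
        for (Map<String, Integer> targets : outLinks.values()) {
            nodes.addAll(targets.keySet());
        }
        int n = nodes.size();

        // Start with a uniform distribution.
        Map<String, Double> rank = new HashMap<>();
        for (String v : nodes) rank.put(v, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            Map<String, Double> next = new HashMap<>();
            for (String v : nodes) next.put(v, (1.0 - damping) / n);
            for (String u : nodes) {
                Map<String, Integer> targets = outLinks.getOrDefault(u, Map.of());
                int outDegree = targets.values().stream().mapToInt(Integer::intValue).sum();
                if (outDegree == 0) {
                    // Dangling node: spread its rank evenly over all nodes.
                    for (String v : nodes) next.merge(v, damping * rank.get(u) / n, Double::sum);
                } else {
                    // Distribute rank proportionally to link multiplicity.
                    for (Map.Entry<String, Integer> e : targets.entrySet()) {
                        next.merge(e.getKey(),
                                   damping * rank.get(u) * e.getValue() / outDegree,
                                   Double::sum);
                    }
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny example graph: a -> b, a -> c, b -> c, c -> a.
        Map<String, Map<String, Integer>> web = new HashMap<>();
        web.put("a", Map.of("b", 1, "c", 1));
        web.put("b", Map.of("c", 1));
        web.put("c", Map.of("a", 1));
        System.out.println(pageRank(web, 0.85, 50));
    }
}
```

In this example, "c" ends up ranked highest because both "a" and "b" link to it; the ranks always sum to 1, which is a handy sanity check after an export/import round trip.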