Python-WebCrawler

This is a web crawler written in Python. You will have to install BeautifulSoup before you can use it.

Install BeautifulSoup

The steps are given below:

  1. Go to this site: http://pypi.python.org/pypi/beautifulsoup4

  2. Download the file "beautifulsoup4-4.1.3.tar.gz"

  3. Unpack the file into a convenient location

  4. Open a terminal and change to the unpacked folder

  5. Execute the following commands:

    python setup.py build

    python setup.py install

  6. If the install is successful, you will not see any errors in the terminal.
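To confirm the install beyond the absence of errors, a short check like the one below can help. It is a minimal sketch: the helper name `installed_bs4_version` is made up for illustration, and it relies only on the fact that BeautifulSoup 4 is imported as the `bs4` package.

```python
import importlib


def installed_bs4_version():
    """Return the installed BeautifulSoup 4 version string, or None if it is missing."""
    try:
        bs4 = importlib.import_module("bs4")
    except ImportError:
        return None
    # bs4 normally exposes its version as a module attribute; fall back if absent.
    return getattr(bs4, "__version__", "unknown")


version = installed_bs4_version()
if version is None:
    print("BeautifulSoup is not installed")
else:
    print("BeautifulSoup version:", version)
```

If this reports that BeautifulSoup is not installed, repeat step 5 with the same Python interpreter you plan to run the crawler with.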

Running the crawler

Download the crawler.py file from the repo. This file is used to crawl a given site. I have listed a few use cases below:

The following command will display the total number of links found on a particular website after crawling:

python crawler.py http://website.com

If you want to crawl only up to a particular depth, then:

python crawler.py -d 2 http://website.com
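The idea of a depth limit can be sketched as a breadth-first traversal that stops following links once a page sits at the limit. This is not the code in crawler.py, whose internals are not shown here. The names `crawl`, `get_links`, `SITE` and `fake_get_links` are hypothetical, the in-memory `SITE` stands in for real HTTP requests, and treating the start page as depth 0 is an assumption about what `-d` means.

```python
from collections import deque


def crawl(start_url, get_links, max_depth=None):
    """Breadth-first crawl from start_url; return the set of URLs discovered.

    get_links(url) must return an iterable of URLs linked from url.
    max_depth=None means no limit; the start page is at depth 0 (an assumption).
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen


# Hypothetical in-memory "site" used in place of real HTTP requests.
SITE = {
    "http://website.com": ["http://website.com/a", "http://website.com/b"],
    "http://website.com/a": ["http://website.com/c"],
    "http://website.com/b": [],
    "http://website.com/c": ["http://website.com/d"],
    "http://website.com/d": [],
}


def fake_get_links(url):
    """Return the outgoing links recorded for url in the fake site."""
    return SITE.get(url, [])


limited = crawl("http://website.com", fake_get_links, max_depth=2)
unlimited = crawl("http://website.com", fake_get_links)
print(len(limited), len(unlimited))
```

Keeping a `seen` set prevents the crawl from revisiting pages that link to each other, which matters on real sites where navigation menus repeat on every page.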

If you want only the links found on this particular URL:

python crawler.py -l http://website.com
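Collecting the links of a single page comes down to parsing its HTML and reading the `href` of every `<a>` tag. The sketch below illustrates that step with the standard library's `html.parser` instead of BeautifulSoup so that it runs without any install. The names `LinkCollector`, `links_on_page` and `sample_html` are invented for illustration and are not taken from crawler.py.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "/about" against the page URL.
                self.links.append(urljoin(self.base_url, value))


def links_on_page(html, base_url):
    """Return the list of links found in the given HTML text."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


sample_html = '<p><a href="/about">About</a> <a href="http://other.example/x">X</a></p>'
print(links_on_page(sample_html, "http://website.com"))
```

With BeautifulSoup installed, `BeautifulSoup(html, "html.parser").find_all("a", href=True)` would select the same tags.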

There are many other options you can explore. Execute the following in the terminal and you will see a bunch of options:

python crawler.py --help
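For readers curious how a command line like the ones above could be parsed, here is a minimal sketch using `argparse`. Only the `-d` and `-l` flags and the positional URL come from the examples above. The long names `--depth` and `--links`, the help texts, and the choice of `argparse` itself are assumptions, and crawler.py may be implemented differently.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented invocations (a sketch only)."""
    parser = argparse.ArgumentParser(
        prog="crawler.py",
        description="Crawl a website and report the links found.",
    )
    parser.add_argument("url", help="URL to start crawling from")
    # The long option names below are assumptions, not taken from crawler.py.
    parser.add_argument("-d", "--depth", type=int, default=None,
                        help="crawl only up to this depth")
    parser.add_argument("-l", "--links", action="store_true",
                        help="list only the links found on the given URL")
    return parser


args = build_parser().parse_args(["-d", "2", "http://website.com"])
print(args.url, args.depth, args.links)
```

A parser built this way also gets `--help` for free, since `argparse` generates the usage text from the declared arguments.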
