PyScrapper

Project Status: WIP - Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

WIP DISCLAIMER

Some of the projects in this repo are broken because the websites they scrape have changed, so they are being reworked to be fully functional again. Contributions are welcome: fork the repo and open a pull request with your updates.

Web scraping series in Python.

Forked and maintained by Ivan Nieto [email protected]

Original work by Shivam Bansal [email protected]

Module dependencies:

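mechanize, BeautifulSoup (for Python 2.x) | bs4 (for Python 3.x), json, re, requests, urlparse, urllib

json, re, urllib and urlparse ship with Python (urlparse became urllib.parse in Python 3.x), so only the remaining modules need to be installed: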

    pip install <module_name>

Projects

Google Movies

    Script to scrape Google Movies, retrieving a list of theaters with their addresses,
    movie listings, movie genres and showtimes for a given location.

    This script outputs a JSON file with the response.

Zomato Top Restaurants

    Script to scrape the top 25 trending restaurants, with their rank, rating and details,
    for the given cities on the zomato.com website.
    
    It outputs a separate JSON response for each city.
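
The per-city output could be written along these lines. This is only a minimal sketch: the function name `write_city_files`, the file naming scheme and the `city`/`restaurants` keys are assumptions made for illustration, not the layout the script actually uses.

```python
import json
from pathlib import Path


def write_city_files(results_by_city, output_dir):
    """Write one JSON file per city and return the paths that were written.

    `results_by_city` maps a city name to a list of restaurant dicts.
    The file name and the top-level keys are hypothetical choices.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for city, restaurants in results_by_city.items():
        # e.g. "New Delhi" -> new_delhi.json
        path = output_dir / (city.lower().replace(" ", "_") + ".json")
        with path.open("w", encoding="utf-8") as handle:
            json.dump(
                {"city": city, "restaurants": restaurants},
                handle,
                indent=2,
                ensure_ascii=False,
            )
        written.append(path)
    return written
```

Keeping one file per city means a failed scrape for one city does not affect the output already written for the others.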

Finance and Stock

    Scrapes the last closing price for all the quotes from various sites
    such as Google, Yahoo and Bloomberg.

Live Weather

    Scrape the weather details for the morning, afternoon and night from a particular website.

Daily Horoscope

    Scrapes the daily horoscope details for each sign and writes the output as text files.
    Multiple websites are scraped to get the details.

Train Details

    Scrape the details of a train from IRCTC, given its train number.

Website Top Keywords

    Create a list of the most frequently occurring words on a website,
    along with their frequencies.
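
As an illustration of the idea, the counting step can be sketched with the standard library alone. The names `top_keywords`, `limit` and `min_length` are hypothetical, and the project itself may rely on BeautifulSoup rather than `html.parser`; the sketch takes HTML that has already been downloaded.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def top_keywords(html, limit=10, min_length=3):
    """Return up to `limit` (word, count) pairs, most frequent first."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Keep alphabetic tokens only and drop very short words.
    words = [w for w in re.findall(r"[a-z']+", text) if len(w) >= min_length]
    return Counter(words).most_common(limit)
```

A real run would usually also filter out stop words such as "the" or "and", which otherwise dominate the list.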

News Scrapping

    Scrape the news from various news sources.

Alexa Top Websites

    Get the list of top 25 websites of a country.

Movie Details

    Get the movie details from IMDB and RottenTomatoes.

US President State of Union Speech

    Scrape the speech transcripts of all US Presidents from 1700 to the present.

Spider Algorithm

    The spider algorithm is a typical web scraping technique for fetching all the URLs (and similar links) of a web page.
    It also follows the URLs it finds and fetches their URLs in turn,
    so it collects URLs that are not part of the originally requested page.
    It is implemented in two ways: a plain version and one using mechanize.
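
The crawl described above can be sketched as a breadth-first traversal. This is not the repository's implementation: it uses only the standard library (no mechanize), and `crawl`, `extract_links`, `fetch_html` and `max_pages` are names assumed for this example. The `fetch` parameter is an assumption that lets the download step be swapped out, for instance for testing.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) URLs found in `html`, without fragments."""
    parser = _LinkParser()
    parser.feed(html)
    links = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def fetch_html(url):
    """Default downloader: fetch a page and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl; returns the set of every URL discovered."""
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        fetched += 1
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```

The `seen` set prevents pages that link to each other from being visited twice, and `max_pages` bounds the crawl, since following every link without a limit rarely terminates on the open web.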

Rework ToDo

  • Google Movies
  • Zomato Top Restaurants
  • Finance and Stock
  • Live Weather
  • Daily Horoscope
  • Train Details
  • Website Top Keywords
  • News Scrapping
  • Alexa Top Websites
  • Movie Details
  • US President State of Union Speech
  • Spider Algorithm
