Giter VIP home page Giter VIP logo

crunchbasewrapper's Introduction

Project Name

CrunchbaseWrapper

Description

CrunchbaseWrapper provides functionality to:

  • scrape data from crunchbase.com using the requests library of Python along with the crunchbase api
  • store the data into an SQLite database using the SQLAlchemy library which provides ORM functionality to Python
  • store the data into a JSON file using the json library of Python
  • read data from the SQLite database filtered on unique company name
  • read data from the JSON file filtered on unique company name

Code has been modularized into functions and written in OOP manner with usage of classes.

There are five files:

  • crunchbase.py is the main file containing the Crunchbase class which provides an interface to scrape data and read-write into a database / JSON file. Regex are used to extract information from the response.text string of requests.
  • database_orm.py contains the ORMInterface class which provides functionality to read and write data in an SQLite database using SQLAlchemy ORM. Additional classes Company and SocialAccount help define the two tables of the database namely, companies and social_accounts which are linked using a foreign key.
  • database_json.py contains the JsonDB class which provides functionality to read and write data in a JSON file using the json library.
  • simple_scraper.py contains the Scraper class which uses the requests library to scrape data from a given URL.
  • re_demo.py is an additional separate file for experimental purposes to show the functionality of the StanfordCoreNLP parser in sentence splitting and tokenization. This is used to obtain the first five sentences of a text body.

Installation

A setup.py file is present which installs the required dependencies automatically on running: python3 setup.py develop

Usage

To test the project:

  • run python3 crunchbase.py which will provide a menu driven interface
  • choose option 1 to get data from crunchbase and write it into an SQLite database and JSON file
  • choose option 3 to retrieve the data based on a company name (ex: apple) which is also entered
  • choose option 1 again to show that data is not added if it already exists
  • choose option 4 to exit

On exit, crunchbase_json.db and crunchbase_data.json would have been created.

crunchbasewrapper's People

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.