Giter VIP home page Giter VIP logo

db_utils's Introduction

db_utils library

Introduction

db-utils is a Python package that standardizes interactions with various types of databases. db-utils is a collection of modules that lowers the bar to viewing, extracting, and analyzing data from various sources including:

  • Redshift
  • Snowflake
  • Postgres
  • Mysql
  • Sqlserver
  • sqlite
  • s3
  • dynamoDB

Docker

One line command to start a Docker container with db-utils installed. This will run on any system that is running Docker. A jupyter notebook will open up on port 8888, just copy and paste the url from the terminal into your preferred browser.

docker run -p 8888:8888 hannaj06/db-utils:latest

docker-compose

Below is an example docker-compose.yaml file. With this configuration jupyter notebook changes persist in the location defined by <local_notebooks>. Below is a sample of the .databases.conf file which will be bindmounted to the docker image.

docker-compose.yaml

version: "2.1"
services:
  dbutils_jupyter:
    image: hannaj06/db-utils
    ports:
      - 8888:8888
    volumes:
      - ${HOME}/.databases.conf:/root/.databases.conf
      - <local_notebooks>:/notebooks

.databases.conf

[redshift]
host=<redshift_host>
user=<user>
password=<redshift_password>
port=<port>
database=<db>

[s3]
aws_access_key_id=<access_id>
aws_secret_access_key=<secret_access>
region=<aws_region>
default_bucket=<default_bucket>

Installation on local environment

  • sudo apt-get update

Required system packages:

  • sudo apt-get install python3-dev (Ubuntu)
  • sudo apt-get apt-get install g++ (Ubuntu)
  • sudo apt-get install libpq-dev (Ubuntu)
  • sudo apt-get install unixodbc-dev (Ubuntu)
  • brew install postgresql (MacOS)
pip install db_utils

pg_connect class (previously DBUtil)

A database connection class to interact with Postgres or Redshift

Basic Usage:

  • create database configuration file
  • example below is called .databases.conf
    [redshift_example]
    host=redshift.example.com
    user=test_user
    password=password
    port=5439
    database=test_db

    >>> from db_utils.pg_connect import pg_connect
    >>>
    >>> db = pg_connect('redshift_example', '.databases.conf')
    >>> db.get_arr_from_query('select * from test', pprint=True)

snowflake_connect class

A database connection class to interact with snowflake

Basic Usage:

  • create database configuration file
  • example below is called .databases.conf
    [snowflake]
    account=abc123.us-east-1
    host=abc123.us-east-1.snowflakecomputing.com
    user=test_user
    password=password
    port=443
    database=test_db
    aws_access_key_id=<key_id>
    aws_secret_access_key=<secret_key>

snowflake_s3 class

A child class of snowflake_connect class used to retrieve large datasets in small chunks

Basic Usage:

  • create database configuration file
  • example below is called .databases.conf
  • note the additional fields required
    [snowflake]
    account=abc123.us-east-1
    host=abc123.us-east-1.snowflakecomputing.com
    user=test_user
    password=password
    port=443
    database=test_db
    aws_access_key_id=<key_id>
    aws_secret_access_key=<secret_key>
    default_bucket=

example) Loading large data set into memory in chunks

    >>> from db_utils.snowflake_connect import snowflake_s3
    >>> import os
    >>>
    >>> file_format = '''
    TYPE = CSV
    COMPRESSION = NONE
    '''
    >>>
    >>>
    >>> with snowflake_s3('snowflake', '.databases.conf') as db:
    >>>     db.cursor('SELECT * FROM example_large_table', file_format=file_format, pprint=True)
    >>>
    >>>     while True:
    >>>         file = db.fetch(contents=True)
    >>>
    >>>         if file:
    >>>             for row in file:
    >>>                 print(row)
    >>>
    >>>         else:
    >>>             break


sqlite_connect class

A database connection class to interact with SQLite


    >>> from db_utils.sqlite_connect import sqlite_connect
    >>>
    >>> db = sqlite_connect('test.db')
    >>> db.get_df_from_query('select * from test_table', pprint=True)

s3_connect class

Connection library for interacting with S3

Basic Usage:

  • add s3 section to .databases.conf file (created in previous example)
    [s3]
        aws_access_key_id=<key_id>
        aws_secret_access_key=<secret_key>
        default_bucket=<bucket>


    >>> from db_utils.s3_connect import s3_connect
    >>>
    >>> s3 = s3_connect('.databases.conf', 's3')
    >>> s3.list_keys(prefix='examples')

example) grab file from s3 into memory as stringIO object

    >>> from db_utils.s3_connect import s3_connect
    >>>
    >>> s3 = s3_connect('.databases.conf', 's3')
    >>> s3.get_contents('example_file', stringIO=True)
    >>> s3.read()

sql_server connect class

Requirements:

Basic Usage:

  • add sql server section to .databases.conf file (created in previous example)
    [sql_server]
    driver=ODBC Driver 17 for SQL Server
    server=127.0.0.1
    user=bill
    password=gates
    database=master

    >>> from db_utils.sql_server_connect import sql_server_connect
    >>> db = sql_server_connect('sql_server', 'databases.conf')
    >>>
    >>> db.get_arr_from_query('''SELECT * FROM SYSOBJECTS''', pprint=True))

dynamodb_connect class

Connection library for interacting with Dynamodb

timer class

Helper class to time long running processes

Basic Usage:

from db_utils.timer import timer

t = timer()
t.lap('s')
> 5.469961

db_utils's People

Contributors

hannaj06 avatar ptjames avatar briankinney avatar luseleafpaper avatar

Stargazers

 avatar

Watchers

James Cloos avatar Cloud Cray avatar Cameron French avatar  avatar  avatar  avatar  avatar

Forkers

hannaj06

db_utils's Issues

Error on install

When I try to install db_utils with pip install db_utils I have this error :

 error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [25 lines of output]
      /usr/lib/python3/dist-packages/setuptools/config/setupcfg.py:515: SetuptoolsDeprecationWarning: The license_file parameter is deprecated, use license_files instead.
        warnings.warn(msg, warning_class)
      running egg_info
      creating /tmp/pip-pip-egg-info-ydtruado/psycopg2.egg-info
      writing /tmp/pip-pip-egg-info-ydtruado/psycopg2.egg-info/PKG-INFO
      writing dependency_links to /tmp/pip-pip-egg-info-ydtruado/psycopg2.egg-info/dependency_links.txt
      writing top-level names to /tmp/pip-pip-egg-info-ydtruado/psycopg2.egg-info/top_level.txt
      writing manifest file '/tmp/pip-pip-egg-info-ydtruado/psycopg2.egg-info/SOURCES.txt'
      
      Error: pg_config executable not found.
      
      pg_config is required to build psycopg2 from source.  Please add the directory
      containing pg_config to the $PATH or specify the full executable path with the
      option:
      
          python setup.py build_ext --pg-config /path/to/pg_config build ...
      
      or with the pg_config option in 'setup.cfg'.
      
      If you prefer to avoid building psycopg2 from source, please install the PyPI
      'psycopg2-binary' package instead.
      
      For further information please check the 'doc/src/install.rst' file (also at
      <https://www.psycopg.org/docs/install.html>).
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.

My Operating system : ubuntu
My pip version : 23.0.1

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.