
aridanalysis_py's Introduction

aridanalysis

DRY out your regression analysis!


Python Package for Inferential Regression and EDA Analysis!

For data scientists, performing exploratory data analysis (EDA) and regression analysis is paramount to analyzing trends in data. Moreover, following the DRY (Don't Repeat Yourself) principle is regarded as a major priority for maximizing code quality. Yet data scientists facing these tasks often start the entire process from scratch, wasting time and effort while compromising code quality. The aridanalysis package strives to remedy this problem by giving users an easy-to-implement EDA function alongside three robust regression functions that simplify these analytical processes and produce an easy-to-read interpretation of the input data. Users will no longer have to write many lines of code to explore their data effectively.

Package Functions

arid_eda

This function takes in the data frame of interest and generates summary statistics as well as basic exploratory data analysis plots to help users understand the overall behaviour of the explanatory and response variables.
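To make the idea of "summary statistics" concrete, here is a minimal standard-library sketch of the kind of numbers an EDA step typically reports. It is not the arid_eda implementation: the helper names `summarize` and `pearson` and the toy `houses` data are hypothetical, and a plain dict of lists stands in for a data frame.

```python
import statistics


def summarize(columns):
    """Return per-column summary statistics for a dict of numeric lists."""
    summary = {}
    for name, values in columns.items():
        summary[name] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),  # sample standard deviation
            "min": min(values),
            "max": max(values),
        }
    return summary


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data: a response ("price") and one feature ("rooms").
houses = {"price": [200, 250, 310, 400], "rooms": [2, 3, 3, 5]}
stats = summarize(houses)
corr = pearson(houses["rooms"], houses["price"])
print(stats["price"], round(corr, 3))
```

A correlation close to 1 or -1 between a feature and the response is the sort of signal a correlation heatmap makes visible at a glance.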

arid_linreg

This function takes in the data frame of interest and performs a linear regression with the given regularization and features. The function then outputs an sklearn regression model for prediction and an equivalent statsmodels regression model to provide inference.
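The split between a model for prediction and a model for inference can be illustrated with a small standard-library sketch. This is not how arid_linreg is implemented (the package relies on sklearn and statsmodels, as stated above); it assumes a single feature, no regularization, and hypothetical helper names `fit_simple_ols` and `predict`.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)
    slope_se = math.sqrt(sigma2 / sxx)  # standard error of the slope
    return {"intercept": intercept, "slope": slope, "slope_se": slope_se}


def predict(model, x_new):
    """Prediction side: apply the fitted line to new inputs."""
    return [model["intercept"] + model["slope"] * a for a in x_new]


# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

model = fit_simple_ols(x, y)
# Inference side: how large is the slope relative to its uncertainty?
t_stat = model["slope"] / model["slope_se"]
print(model, t_stat, predict(model, [6]))
```

The coefficients serve prediction, while the standard error and t-statistic are the kind of quantities an inference-oriented summary reports.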

arid_logreg

This function takes in a data frame and performs either binomial or multinomial classification based on user inputs. The function then outputs an sklearn logistic regression model for prediction and an equivalent statsmodels logit regression model to provide inference.

arid_countreg

This function takes a data frame, its categorical and continuous variables, and other user inputs to perform a Poisson regression. The function returns an sklearn Poisson regressor model for prediction and a statsmodels wrapper model for inference purposes.

Usage

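import pandas as pd
import aridanalysis as aa
from vega_datasets import data

# house_prices is assumed to be a user-supplied DataFrame with these columns
>>> dataframe, plots = aa.arid_eda(house_prices,
                                    'price',
                                    'continuous',
                                    ['rooms', 'age', 'garage'])
# iris dataset loaded from vega_datasets
>>> iris_data = data.iris()
>>> dataframe, plots = aa.arid_eda(iris_data,
                                    'species',
                                    'categorical',
                                    ['petalWidth', 'sepalWidth', 'petalLength'])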
tdf = pd.DataFrame(
    {
         "x1": [1, 0, 0],
         "x2": [0, 1.0, 0],
         "x3": [0, 0, 1],
         "x4": ["a", "a", "b"],
         "y": [1, 3, -1.0],
    }
)
>>> aa.arid_linreg(tdf, "y")

data = [
    [32, "male", 80, 0],
    [26, "female", 65, 1],
    [22, "female", 75, 1],
    [36, "male", 85, 0],
    [45, "male", 82, 1],
    [18, "female", 57, 0],
    [57, "male", 60, 1],
]

df = pd.DataFrame(
    data, 
    columns=[
        "x1", 
        "x2", 
        "x3", 
        "y"
    ]
)
>>> aa.arid_logreg(df, "y")

df = pd.DataFrame(
    {
        "x1": ["bad", "good", "bad"],
        "x2": [34.56, 34.21, 19.57],
        "y": [6, 8, 14],
    }
)
>>> aa.arid_countreg(df, "y", con_features=["x2"], cat_features=["x1"], model="additive", alpha=1)

Python Ecosystem Role

This package builds on the EDA and statistical analysis provided by the pandas, scikit-learn and statsmodels Python packages to streamline data visualization and model analysis. Some existing packages help with these tasks; however, aridanalysis aims to spare users the work of going through pandas profiling while also providing different regression analysis interpretations.

Related Packages

  • Edapython: This package is similar to Pandas profiling but does not create an HTML report as output. Our package aims to combine the best of Pandas profiling with missing-value analysis and the most important visualizations, including a correlation heatmap.
  • regression (PyPI): This package is a web app for loading tabular data to perform regression analysis. It differs from our package in that it only performs the regression modelling without any analysis or EDA.
  • mlinsights (PyPI): This package is an extension to scikit-learn and implements a number of specialized models such as quantile regression. Unlike our package, it does not include any EDA or analysis, and is meant simply to mimic the scikit-learn environment while adding modelling features.

Installation

$ pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple aridanalysis
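After installing, a quick way to confirm that the package is visible to the interpreter is to query its installed version. The sketch below assumes Python 3.8 or newer (for `importlib.metadata`) and that the distribution is named `aridanalysis`, as in the pip command above; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("aridanalysis")
print("aridanalysis version:", version or "not installed")
```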

Dependencies

  • python = "^3.7"
  • pandas = "^1.2.2"
  • scikit-learn = "^0.24.1"
  • altair = "^4.1.0"
  • seaborn = "^0.11.1"
  • statsmodels = "^0.12.2"
  • vega-datasets = "^0.9.0"
  • pytest = "^6.2.2"

Documentation

The official documentation is hosted on Read the Docs: https://aridanalysis.readthedocs.io/en/latest/

Contributors

Group 8 Members:
Craig McLaughlin : @cmmclaug
Daniel Ortiz Nunez : @danielon-5
Neel Phaterpekar : @nphaterp
Santiago Rugeles Schoonewolff : @ansarusc

We welcome and recognize all contributions. You can see a list of all current contributors in the contributors tab.

Credits

This package was created with Cookiecutter and the UBC-MDS/cookiecutter-ubc-mds project template, modified from the pyOpenSci/cookiecutter-pyopensci project template and the audreyr/cookiecutter-pypackage project template.


aridanalysis_py's Issues

Python arid_eda() Exception Handling Verification

Suggest we assign a person that did not write the function (Neel) to validate:

  • Exception handling
    • Handles incorrect inputs
    • Handles errors during execution via throwing exceptions
    • Error messages are clear and informative
    • Exceptions are covered by tests
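The checklist above can be sketched as a small validation function plus checks that each bad input raises the expected exception. This is a hypothetical illustration, not the package's code: `validate_eda_inputs` and `raises` are made-up names, a dict of lists stands in for a data frame, and the argument order simply mirrors the arid_eda call in the Usage section. In the actual test suite, `pytest.raises` would normally play the role of the `raises` helper.

```python
def validate_eda_inputs(data, response, response_type, features):
    """Raise a clear exception if any argument is unusable (hypothetical checks)."""
    if not isinstance(data, dict):
        raise TypeError("data must be a dict mapping column names to lists")
    if response not in data:
        raise KeyError(f"response column {response!r} not found in data")
    if response_type not in ("continuous", "categorical"):
        raise ValueError(
            "response_type must be 'continuous' or 'categorical', "
            f"got {response_type!r}"
        )
    missing = [name for name in features if name not in data]
    if missing:
        raise KeyError(f"feature columns not found in data: {missing}")


def raises(exc_type, func, *args):
    """Return True if func(*args) raises exc_type, False if it returns normally."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


# Hypothetical toy data standing in for a data frame.
toy = {"price": [1.0, 2.0], "rooms": [2, 3]}

# Valid input passes silently.
validate_eda_inputs(toy, "price", "continuous", ["rooms"])

# Each class of bad input maps to a specific exception type.
print(raises(TypeError, validate_eda_inputs, [1, 2], "price", "continuous", []))
print(raises(ValueError, validate_eda_inputs, toy, "price", "ordinal", ["rooms"]))
print(raises(KeyError, validate_eda_inputs, toy, "price", "continuous", ["age"]))
```

Checking inputs up front keeps error messages tied to the argument that caused them, instead of surfacing later as an unrelated failure inside a plotting or modelling call.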

Milestone 2 Tasks

https://pages.github.ubc.ca/MDS-2020-21/DSCI_524_collab-sw-dev_students/materials/assignments/milestone2.html

Submission

  • Create new Python release named v0.2.0
  • Create new R release named v0.1.0
  • Canvas submission (including repo url of public repo for R repo and Python Repo, url of release for R and Python)

Checklist

  • 1. Write test cases and code iteratively
    • arid_eda function
      • Revise function
      • Write test function
      • Write function
      • Handles exceptions with suitable error messages
    • arid_linreg function
      • Revise function
      • Write test function
      • Write function
      • Handles exceptions with suitable error messages
    • arid_logreg function
      • Revise function
      • Write test function
      • Write function
      • Handles exceptions with suitable error messages
    • arid_countreg function
      • Revise function
      • Write test function
      • Write function
      • Handles exceptions with suitable error messages
  • 2. Make sure all functions pass pytest package
  • 3. Exception handling
    • Erroneous Inputs are handled for all functions
    • Exceptions are tested and provide useful messages
  • 4. Verify package functions can be installed and used
  • 5. Create project structure for R repo
    • R project structure
    • name related to package
    • MIT license
    • DESCRIPTION file.
    • CONTRIBUTING.md
    • usethis::use_code_of_conduct
    • copy README.md containing:
      • summary paragraph
      • bulleted list of functions
      • package fitting in R ecosystem
  • 6. R Function specifications
    • Neel Function
    • Craig Function
    • Daniel Function
    • Santiago Function
  • 7. R Project GitHub Mechanics
    • Always add meaningful commit messages
    • Use GitHub Flow workflow
    • Manage GitHub issues for communication
      • Copy this task list or a subset over to R project
    • Verify use of proper grammar and full sentences throughout project.

Complete Python arid_countreg() Function

Finish the Python package arid_countreg function requirements

  • arid_countreg function
    • Revise function specifications if necessary
    • Write test_arid_countreg function before writing main function that covers all code branches
    • Write function to perform the stated task
    • Handles exceptions and erroneous inputs with suitable error messages

Python arid_countreg() Exception Handling Verification

Suggest we assign a person that did not write the function (Santiago) to validate:

  • Exception handling
    • Handles incorrect inputs
    • Handles errors during execution via throwing exceptions
    • Error messages are clear and informative
    • Exceptions are covered by tests

Milestone 1 meeting notes

Tuesday, February 23rd, 5pm

  • Introduced ourselves
  • Went through Milestone 1 document
  • Discussed the team contract
  • Set up provisional GitHub group repo: https://github.com/UBC-MDS/DSCI524_Group8
  • Discussed possible topics and chose a regression and EDA package as a possible topic
  • Agreed on following things to do before next meeting:
    • Milestone 1 checklist and meeting notes: Daniel
    • Topic proposal draft: Santiago
    • Code of conduct: Craig
    • Contributing: Neel

Next meeting: Thursday February 25th 2pm PT during lab

Milestone 2 Meeting Minutes

March 1st 2021 Milestone 2 Kickoff:

  • Make sure Python package is usable by end of milestone
  • Revise functions first
  • Write tests before functions
  • Iterate functions/tests
  • Neel/Daniel to set up R repo
  • Wait until lecture before assigning all issues
  • Aim to have Python test and function beta for Thursday
  • Create the R repo structure first before working on R tasks
  • We should continue to have early group kickoff meetings
  • Update issues with notes when creating PRs

Complete Python arid_linreg() Function

Finish the Python package arid_linreg function requirements

  • arid_linreg function
    • Revise function specifications if necessary
    • Write test_arid_linreg function before writing main function that covers all code branches
    • Write function to perform the stated task
    • Handles exceptions and erroneous inputs with suitable error messages

Package Review: ReadTheDocs Link Appears to be Broken

Package Review Comment:

I don't see any docs when I click the docs badge and go to the Read the Docs website.

Something happened to our Read the Docs links. I can't see the docs from the badge or from the links anymore.

Complete Python arid_logreg() Function

Finish the Python package arid_logreg function requirements

  • arid_logreg function
    • Revise function specifications if necessary
    • Write test_arid_logreg function before writing main function that covers all code branches
    • Write function to perform the stated task
    • Handles exceptions and erroneous inputs with suitable error messages

Complete Python arid_eda() Function

Finish the Python package arid_eda function requirements

  • arid_eda function
    • Revise function specifications if necessary
    • Write test_arid_eda function before writing main function that covers all code branches
    • Write function to perform the stated task
    • Handles exceptions and erroneous inputs with suitable error messages

Python arid_logreg() Exception Handling Verification

Suggest we assign a person that did not write the function (Daniel) to validate:

  • Exception handling
    • Handles incorrect inputs
    • Handles errors during execution via throwing exceptions
    • Error messages are clear and informative
    • Exceptions are covered by tests

Python arid_linreg() Exception Handling Verification

Suggest we assign a person that did not write the function (Craig) to validate:

  • Exception handling
    • Handles incorrect inputs
    • Handles errors during execution via throwing exceptions
    • Error messages are clear and informative
    • Exceptions are covered by tests

Milestone 1 tasks

https://pages.github.ubc.ca/MDS-2020-21/DSCI_524_collab-sw-dev_students/materials/assignments/milestone1.html

Submission

  • Create new release named v0.1.0
  • Canvas submission (including repo url of public repo, url of release and url of team contract)

Checklist

  • 1. Team work contract: https://docs.google.com/document/d/1Mj09ipWSXNT89dr6_NltGJKo0Yzihm4130N_78ANriE/edit#
  • 2. Pick a topic
  • 3. Create project structure for Python:
    • project structure
    • name related to package
    • MIT license
    • update CONTRIBUTORS.md (should be moved to README)
    • edit CONDUCT.md
    • agree and edit CONTRIBUTING.md
    • build README.md containing:
      • summary paragraph
      • bulleted list of functions
      • package fitting in Python ecosystem
  • 4. Function specifications:
    • Neel Function
    • Craig Function
    • Daniel Function
    • Santiago Function
  • 5. Manage issues
