
corrpy's Introduction


CorrPy

Latest Update Date: 2019 Feb.

Overview

This package helps users calculate correlation coefficients and the covariance matrix of data that contain missing values. Computing these quantities requires the standard deviation of the data, but real-world data are not always clean and tidy. NumPy's standard deviation and correlation coefficient functions return nan when the data contain missing values. This package aims to overcome this obstacle by handling missing values when calculating the standard deviation, correlation coefficients and the covariance matrix. CorrPy uses the listwise deletion method to handle missing values: it removes the rows of a data frame in which missing values are present.
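As a rough illustration of listwise deletion, the sketch below drops every row of a small 2-D array that contains a NaN. The helper name `drop_incomplete_rows` is hypothetical and is not part of CorrPy; it only shows the idea under the assumption that missing values are encoded as NaN.

```python
import numpy as np

def drop_incomplete_rows(data):
    """Listwise deletion: keep only rows with no missing values (hypothetical helper)."""
    arr = np.asarray(data, dtype=float)
    # A row is complete when none of its entries is NaN.
    complete = ~np.isnan(arr).any(axis=1)
    return arr[complete]

data = [[1.0, -6.0],
        [2.0, np.nan],
        [np.nan, 8.0],
        [4.0, 9.0]]
clean = drop_incomplete_rows(data)
print(clean)  # only the first and last rows remain
```

Any statistic computed on `clean` then uses the same set of complete observations for every column.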

Note: If the course timeline permits, CorrPy will also handle missing values via single imputation with the mean: replacing the missing values with the mean of the existing values.
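Mean imputation as described in the note could look roughly like the following. This is a minimal sketch, not CorrPy code: `impute_with_mean` is a hypothetical name, and missing values are assumed to be NaN.

```python
import numpy as np

def impute_with_mean(values):
    """Replace each NaN with the mean of the observed values (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    observed_mean = np.nanmean(arr)  # mean that ignores NaN entries
    return np.where(np.isnan(arr), observed_mean, arr)

filled = impute_with_mean([1, 2, np.nan, 4, 5])
print(filled)  # the NaN is replaced by 3.0, the mean of 1, 2, 4 and 5
```

Unlike listwise deletion, this keeps every observation, at the cost of shrinking the variance of the imputed column.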

Team

Name                     Slack Handle         Github.com     Link
KERA YUCEL               @KERA YUCEL          @K3ra-y        Kera's link
GOPALAKRISHNAN ANDIVEL   @Krish               @Gopsathvik    Krish's link
WEISHUN DENG             @Wilson Deng         @xiaoweideng   Wilson's link
Mengda Yu                @Mengda(Albert) Yu   @mru4913       Albert's link

Installation

CorrPy can be installed with pip in a command window:

pip install git+https://github.com/UBC-MDS/CorrPy.git

Branch Coverage Test

To measure branch coverage, we use coverage.py, which can be installed with pip install coverage.

We also provide a Makefile to automate the process. Run the following command to see the branch coverage report.

make report_branch

The results are shown below.

Name                            Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------------------------
CorrPy/__init__.py                  4      0      0      0   100%
CorrPy/corr_plus.py                26      0     12      0   100%
CorrPy/cov_mx.py                   20      0      8      0   100%
CorrPy/std_plus.py                 15      0      8      0   100%
CorrPy/test/__init__.py             0      0      0      0   100%
CorrPy/test/test_corr_plus.py      41      0      0      0   100%
CorrPy/test/test_cov_mx.py         45      0      0      0   100%
CorrPy/test/test_std_plus.py       35      0      0      0   100%
---------------------------------------------------------------------------

Test

To run all the tests, we use pytest via make test_all.


Functions

Standard Deviation (std_plus)

The standard deviation measures how far the data points are spread around the mean, which gives insight into the variation in the data. This function automatically handles missing values in the input.



std_plus takes the frustration of missing values out of workflows.


Example:

>>> import numpy as np
>>> from CorrPy import std_plus  # assumes std_plus is exported at package level
>>> x = [1, 2, np.nan, 4, np.nan, 6]
>>> std_plus(x)
array([1.920286436967152])

>>> y = [1, 2, np.inf, 4, np.nan, 6, "a"]
>>> std_plus(y)
array([1.920286436967152])
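The behaviour in the example above can be approximated with plain NumPy. The sketch below is not the package's implementation; `finite_numbers` and `std_ignoring_missing` are hypothetical names, and it assumes that strings, NaN and infinite values are all simply discarded before the population standard deviation is taken.

```python
import math
import numbers
import numpy as np

def finite_numbers(values):
    """Keep only real numbers that are neither NaN nor infinite (hypothetical helper)."""
    kept = []
    for v in values:
        # Skip strings and other non-numeric entries, then drop NaN and inf.
        if isinstance(v, numbers.Real) and math.isfinite(v):
            kept.append(float(v))
    return kept

def std_ignoring_missing(values):
    """Population standard deviation (ddof=0) of the usable entries."""
    return float(np.std(finite_numbers(values)))

y = [1, 2, np.inf, 4, np.nan, 6, "a"]
result = std_ignoring_missing(y)
print(result)  # roughly 1.92, computed from the values 1, 2, 4 and 6
```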

Correlation Coefficients (corr_plus)

The correlation coefficient measures the direction of the relationship between two variables as well as its magnitude. This function automatically handles missing values in the input.


Example:

>>> import numpy as np
>>> from CorrPy import corr_plus  # assumes corr_plus is exported at package level
>>> x = [1, 2, np.nan, 4, 5]
>>> y = [-6, -7, -8, 9, True]
>>> corr_plus(x, y)
array([0.7391090892601785])
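A comparable result can be obtained with NumPy alone by keeping only the positions where both inputs are present. This is a sketch under stated assumptions rather than CorrPy's code: `corr_complete_pairs` is a hypothetical name, and the boolean True is assumed to be treated as 1.

```python
import numpy as np

def corr_complete_pairs(x, y):
    """Pearson correlation over positions where both values are present (hypothetical helper)."""
    x_arr = np.asarray(x, dtype=float)
    y_arr = np.asarray(y, dtype=float)  # booleans become 0.0 / 1.0
    both_present = ~np.isnan(x_arr) & ~np.isnan(y_arr)
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
    return float(np.corrcoef(x_arr[both_present], y_arr[both_present])[0, 1])

x = [1, 2, np.nan, 4, 5]
y = [-6, -7, -8, 9, True]
r = corr_complete_pairs(x, y)
print(r)  # roughly 0.739
```

Here the third pair is dropped because x is missing there, so the correlation is computed from the remaining four pairs.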

Covariance Matrix (cov_mx)

A covariance matrix displays the variances and covariances together. The diagonal elements represent the variances, and the covariances are represented by the other elements of the matrix. This function builds on the two functions above.


Example:

>>> import numpy as np
>>> from CorrPy import cov_mx  # assumes cov_mx is exported at package level
>>> x = [1, 2, np.nan, 4, 5]
>>> y = [-6, -7, -8, 9, True]
>>> cov_mx([x, y])
array([[ 2.33333333, 12.66666667],
       [12.66666667, 80.33333333]])

How does the CorrPy package fit into the Python ecosystem?

The following functions already exist in the Python ecosystem, but they do not handle missing values. The CorrPy package implements the calculation of the standard deviation, correlation coefficients and covariance matrix with missing values handled.

NumPy Standard Deviation (numpy.std): https://docs.scipy.org/doc/numpy-1.14.2/reference/generated/numpy.std.html

NumPy Correlation Coefficients (numpy.corrcoef): https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.corrcoef.html

NumPy Covariance Matrix (numpy.cov): https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.cov.html
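The gap these functions leave can be seen with a short check: a single NaN in the input propagates into the results of `numpy.std`, `numpy.corrcoef` and `numpy.cov`. The data below are made up for illustration.

```python
import numpy as np

x = [1, 2, np.nan, 4, 5]
y = [-6, -7, -8, 9, 1]

std_value = np.std(x)           # nan: the missing value propagates
corr_value = np.corrcoef(x, y)  # entries involving x are nan
cov_value = np.cov(x, y)        # entries involving x are nan; var(y) stays finite

print(std_value)
print(corr_value)
print(cov_value)
```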

Milestone Progress

Milestone     Tasks
Milestone 1   Proposal
Milestone 2   Function Code
              Test Code

corrpy's Issues

Milestone 2 - Python functions (high priority)

The following summarizes the expectations for this milestone:

  • 3 Python functions (by Feb 12th)
  • std_plus (Albert)
  • corr_plus (Wilson)
  • cov_mx
    (While doing this, if you think of new test cases, you are not required to write the new tests; instead, create an issue for them, which you can complete during Milestone 3.)

Following GitHub Flow workflow

Starting from this milestone, you must follow the GitHub Flow workflow. Each team member must do at least one review and each member must have some part of their code reviewed by other team members.

  • Kera
  • Krish
  • Wilson
  • Albert

In particular, each team member will

  • create a branch
  • work on the function you are responsible for in this branch
  • add commits
  • open a pull request (you should be at this step by the end of the lab on Feb 12th)
  • wait for the code review and feedback from other team members
  • review the code of another team member via their pull request
  • deploy your changes
  • merge if your branch is not causing any problems

Just some notes:

  1. To keep our code format consistent, it is better to limit lines to 120 characters. For further information, please read https://stackoverflow.com/questions/88942/why-does-pep-8-specify-a-maximum-line-length-of-79-characters
  2. To toggle the warning line in RStudio: https://support.rstudio.com/hc/en-us/community/posts/207625357-Toggle-80-character-warning-line
  3. For Atom: https://stackoverflow.com/questions/49616864/limiting-line-length-in-atom
  4. For VS Code: https://stackoverflow.com/questions/29968499/vertical-rulers-in-visual-studio-code

Milestone 3 - Documentation in CorrPy

  • Documentation
  • Py package README (including usage and installation instructions)
    pip install git+PACKAGE_URL.git
    devtools::install_github("PACKAGE_NAME") for R packages
  • Code documentation
  • ensure that the test suite for each function provides 100% branch coverage (if possible). Please include documentation (e.g. as we did on the board in class) that shows you've checked for this.
  • functions
  • tests

Milestone 02 feedback

Hello,

  • I see that you are on track and did everything required in Milestone 02.
  • The package setup and tests run without any problems.
  • You have used the GitHub workflow efficiently, including branching, issue tracker,...
  • I suggest you relocate your tests folder to the main directory to comply with the conventions

Thanks!
