Giter VIP home page Giter VIP logo

copulas's Introduction

“sdv-dev” An open source project from Data to AI Lab at MIT.

Development Status PyPi Shield Travis CI Shield Coverage Status Downloads

Copulas

Overview

Copulas is a Python library for modeling multivariate distributions and sampling from them using copula functions. Given a table containing numerical data, we can use Copulas to learn the distribution and later on generate new synthetic rows following the same statistical properties.

Some of the features provided by this library include:

  • A variety of distributions for modeling univariate data.
  • Multiple Archimedean copulas for modeling bivariate data.
  • Gaussian and Vine copulas for modeling multivariate data.
  • Automatic selection of univariate distributions and bivariate copulas.

Supported Distributions

Univariate

  • Gaussian
  • Beta
  • Gamma
  • Gaussian KDE
  • Truncated Gaussian

Archimedean Copulas (Bivariate)

  • Clayton
  • Frank
  • Gumbel

Multivariate

  • Gaussian
  • D-Vine
  • C-Vine
  • R-Vine

Install

Requirements

Copulas has been developed and tested on Python 3.5, 3.6 and 3.7

Also, although it is not strictly required, the usage of a virtualenv is highly recommended in order to avoid interfering with other software installed in the system where Copulas is run.

Install with pip

The easiest and recommended way to install Copulas is using pip:

pip install copulas

This will pull and install the latest stable release from PyPi.

If you want to install from source or contribute to the project please read the Contributing Guide.

Quickstart

In this short quickstart, we show how to model a multivariate dataset and then generate synthetic data that resembles it.

import warnings
warnings.filterwarnings('ignore')

from copulas.datasets import sample_trivariate_xyz
from copulas.multivariate import GaussianMultivariate
from copulas.visualization import compare_3d

# Load a dataset with 3 columns that are not independent
real_data = sample_trivariate_xyz()

# Fit a gaussian copula to the data
copula = GaussianMultivariate()
copula.fit(real_data)

# Sample synthetic data
synthetic_data = copula.sample(len(real_data))

# Plot the real and the synthetic data to compare
compare_3d(real_data, synthetic_data)

The output will be a figure with two plots, showing what both the real and the synthetic data that you just generated look like:

Quickstart

What's next?

For more details about Copulas and all its possibilities and features, please check the documentation site.

There you can learn more about how to contribute to Copulas in order to help us developing new features or cool ideas.

Credits

Copulas is an open source project from the Data to AI Lab at MIT which has been built and maintained over the years by the following team:

Related Projects

SDV

SDV, for Synthetic Data Vault, is the end-user library for synthesizing data in development under the HDI Project. SDV allows you to easily model and sample relational datasets using Copulas thought a simple API. Other features include anonymization of Personal Identifiable Information (PII) and preserving relational integrity on sampled records.

CTGAN

CTGAN is a GAN based model for synthesizing tabular data. It's also developed by the MIT's Data to AI Lab and is under active development.

copulas's People

Contributors

aliciasun avatar amontanez24 avatar csala avatar jdtheripperpc avatar k15z avatar kveerama avatar manuelalvarezc avatar paulolimac avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.