awesome-synthetic-data

A curated list of resources dedicated to Synthetic Data

If you want to contribute to this list, read the contribution guidelines first. Please add your favorite synthetic data resource by raising a pull request.

Also, a listed repository should be deprecated if:

  • The repository's owner explicitly says that "this library is not maintained".
  • The repository has had no commits for a long time (2 to 3 years).

Contents

Research Summaries and Trends

Back to Top

Tutorials

Back to Top

Reading Content

Back to Top

Introductions and Guides to Synthetic Data

Blogs and Newsletters

Videos and Online Courses

Videos and Online Courses

Back to Top

Diffusion Models

Libraries

Open Source Generative Synthetic Data Models, Libraries and Frameworks | Back to Top

Text, Tabular and Time-Series

  • gretel-synthetics - Generative models for structured and unstructured text, tabular, and multi-variate time-series data featuring differentially private learning.
  • SDV - Synthetic Data Generator for tabular, relational, and time series data.
  • Synthea - Synthetic Patient Population Simulator.
  • ydata-synthetic - Synthetic structured data generators.
  • synthpop - A tool for producing synthetic versions of microdata.
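
As a rough illustration of the idea behind the tabular generators listed above, the sketch below fits an independent Gaussian to each numeric column and then samples new rows. It is not the API of any listed library: the names `fit_marginals` and `sample_rows` and the toy (age, income) rows are hypothetical. Treating columns as independent is a simplifying assumption, so correlations between columns are lost.

```python
import random
import statistics


def fit_marginals(rows):
    """Estimate a (mean, stdev) pair for each numeric column."""
    columns = list(zip(*rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def sample_rows(marginals, n, seed=0):
    """Draw n synthetic rows, sampling every column independently."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for mu, sigma in marginals] for _ in range(n)]


# Hypothetical "real" table with two numeric columns: age and income.
real = [[34, 52000.0], [45, 61000.0], [29, 48000.0], [51, 75000.0]]

marginals = fit_marginals(real)
synthetic = sample_rows(marginals, n=3)
```

The listed libraries go well beyond this toy: they model dependencies between columns, handle categorical and relational data, and in some cases add privacy guarantees.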

Image

Audio

  • Jukebox - OpenAI's generative model for music.

Simulation

  • AirSim - AirSim is a simulator for drones, cars and more, built on Unreal and Unity engines.
  • Nvidia Dataset Synthesizer - NDDS is a UE4 plugin from NVIDIA to empower computer vision researchers to export high-quality synthetic images with metadata.
  • OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms.
  • Unity Perception - Perception toolkit for sim2real training and validation in Unity.

Video

Academic Papers

Back to Top

Language Models

  • Evaluating Large Language Models Trained on Code (2021) Mark Chen et al. [pdf]

Generative Adversarial Networks (GANs)

  • Modeling Tabular Data using Conditional GAN (2019) Xu et al. [pdf]
  • Generating Long Videos of Dynamic Scenes (2022) Tim Brooks et al. [pdf]
  • Generative Adversarial Networks (2014) Ian J. Goodfellow et al. [pdf]
  • Conditional Generative Adversarial Nets (2014) Mehdi Mirza et al. [pdf]
  • Wasserstein GAN (2017) Martin Arjovsky et al. [pdf]
  • Improved Training of Wasserstein GANs (2017) Ishaan Gulrajani et al. [pdf]
  • Time-series Generative Adversarial Networks (2019) Jinsung Yoon et al. [pdf]

Diffusion Models

  • Generative Modeling by Estimating Gradients of the Data Distribution (2021) Yang Song [pdf]
  • Diffusion Models are Autoencoders (2021) S. Dieleman [pdf]
  • Deep Unsupervised Learning using Nonequilibrium Thermodynamics (2015) J Sohl-Dickstein et al. [pdf]
  • KNN-Diffusion: Image Generation via Large-Scale Retrieval (2022) Oron Ashual [pdf]

Fair AI

  • A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle (2021) Harini Suresh, John Guttag [pdf]
  • DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks (2021) Boris van Breugel et al. [pdf]
  • On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (2021) Emily M. Bender et al. [pdf]
  • A Survey on Bias and Fairness in Machine Learning (2022) Ninareh Mehrabi [pdf]
  • AI Fairness (Approaches & Mathematical Definitions) (2022) Jonathan Hui [blog]
  • AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias (2018) Rachel K. E. Bellamy et al. [pdf]

Algorithmic Privacy

  • Deep Learning with Differential Privacy (2016) Abadi et al. [pdf]
  • An Efficient DP-SGD Mechanism for Large Scale NLP Models (2021) Dupuy et al. [pdf]
  • PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees (2018) Jordon et al. [pdf]
  • Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence (2021) Cao et al. [pdf]
  • Differentially Private Fine-tuning of Language Models (2022) Yu et al. [pdf]
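
Several of the papers above build on DP-SGD, whose core step clips each per-example gradient to a fixed L2 norm and adds Gaussian noise before averaging. The following is a minimal sketch of that single step in plain Python. The function names, the toy gradients, and the parameter values are hypothetical. It performs no privacy accounting, so it does not yield a privacy guarantee by itself.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the clipping norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


# Hypothetical per-example gradients for a two-parameter model.
grads = [[3.0, 4.0], [0.1, -0.2], [-1.5, 0.5]]
rng = random.Random(0)
update = dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single example can influence the update, which is what lets the added noise mask an individual record's contribution.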

Services

Synthetic Data as API with higher level functionality such as model training, fine-tuning, and generation | Back to Top

Prominent Synthetic Data Research Labs

Back to Top

Datasets

Back to Top

License

License - CC0
