The Declarative Data Generator




Synth is a tool for generating realistic data using a declarative data model. Synth is database agnostic and can scale to millions of rows of data.

Why Synth

Synth answers a simple question: there are so many ways to consume data, so why are there no frameworks for generating it?

Synth provides a robust, declarative framework for specifying constraint-based data generation, solving the following problems developers regularly face:

  1. You're creating an app from scratch and have no way to populate your fresh schema with correct, realistic data.
  2. You're doing integration testing or QA against production data, even though you know it is bad practice.
  3. You want to see how your system will scale if your database suddenly has 10x the amount of data.

Synth solves exactly these problems with a flexible declarative data model which you can version control in git, peer review, and automate.

Key Features

The key features of Synth are:

  • Data as Code: Data generation is described using a declarative configuration language allowing you to specify your entire data model as code.

  • Import from Existing Sources: Synth can import data from existing sources and automatically create data models. Synth currently has alpha support for Postgres.

  • Data Inference: While ingesting data, Synth automatically infers the relations, distributions and types of the dataset.

  • Database Agnostic: Synth supports semi-structured data and is database agnostic - playing nicely with SQL and NoSQL databases.

  • Semantic Data Types: Synth integrates with the (amazing) Python Faker library, supporting generation of thousands of semantic types (e.g. credit card numbers, email addresses etc.) as well as locales.
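To make the data inference feature more concrete, here is a minimal sketch of the general idea in plain Python. It is not Synth's actual algorithm; the function name `infer_fields` and the sample records are hypothetical, and it only tracks observed types and numeric ranges.

```python
from collections import defaultdict


def infer_fields(records):
    """Summarise each field's observed types and numeric value range."""
    summary = defaultdict(lambda: {"types": set(), "min": None, "max": None})
    for record in records:
        for name, value in record.items():
            field = summary[name]
            field["types"].add(type(value).__name__)
            # Track a range only for numeric values (bool is excluded on purpose).
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                field["min"] = value if field["min"] is None else min(field["min"], value)
                field["max"] = value if field["max"] is None else max(field["max"], value)
    return dict(summary)


# Hypothetical sample records standing in for imported rows.
sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 7, "email": "b@example.com"},
]
inferred = infer_fields(sample)
```

A real inference step would also need to look at relations between collections and at value distributions, which this sketch leaves out.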

Installation & Getting Started

To get started quickly, check out the docs.

Examples

Building a data model from scratch

To start generating data without having a source to import from, you need to first initialise a workspace using synth init:

$ mkdir workspace && cd workspace && synth init

Inside the workspace we'll create a namespace for our data model and call it my_app:

$ mkdir my_app

Next let's create a users collection using Synth's configuration language, and put it into my_app/users.json:

{
    "type": "array",
    "length": {
        "type": "number",
        "constant": 1
    },
    "content": {
        "type": "object",
        "id": {
            "type": "number",
            "id": {}
        },
        "email": {
            "type": "string",
            "faker": {
                "generator": "email"
            }
        },
        "joined_on": {
            "type": "string",
            "date_time": {
                "format": "%Y-%m-%d",
                "subtype": "naive_date",
                "begin": "2010-01-01",
                "end": "2020-01-01"
            }
        }
    }
}
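To see what this schema describes, here is a rough plain-Python sketch that produces records of the same shape. It is only an illustration, not how Synth works internally: the function `generate_users` is hypothetical, the email addresses are simple placeholders rather than Faker output, and treating the `end` date as inclusive is an assumption.

```python
import itertools
import random
from datetime import date, timedelta


def generate_users(size, seed=0):
    """Build `size` user records shaped like the users collection."""
    rng = random.Random(seed)
    ids = itertools.count(1)  # mirrors the incrementing "id" field
    begin, end = date(2010, 1, 1), date(2020, 1, 1)
    span_days = (end - begin).days
    users = []
    for _ in range(size):
        user_id = next(ids)
        # Assumption: both "begin" and "end" are valid dates to draw.
        joined = begin + timedelta(days=rng.randrange(span_days + 1))
        users.append({
            "id": user_id,
            "email": f"user{user_id}@example.com",  # placeholder address
            "joined_on": joined.strftime("%Y-%m-%d"),
        })
    return users


users = generate_users(2)
```

The declarative schema expresses the same constraints without any of this imperative code, which is what makes it easy to review and version.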

Finally, generate data using the synth generate command:

$ synth generate my_app/ --size 2 | jq
{
  "users": [
    {
      "email": "[email protected]",
      "id": 1,
      "joined_on": "2014-12-14"
    },
    {
      "email": "[email protected]",
      "id": 2,
      "joined_on": "2013-04-06"
    }
  ]
}
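Because the output is plain JSON, it is easy to check in a test. The sketch below assumes the output has been captured as a string; the sample document and its email addresses are hypothetical placeholders, and `check_users` is an illustrative helper, not part of Synth.

```python
import json
from datetime import date

# Hypothetical sample shaped like the output above; addresses are placeholders.
raw = """
{"users": [
  {"email": "first@example.com", "id": 1, "joined_on": "2014-12-14"},
  {"email": "second@example.com", "id": 2, "joined_on": "2013-04-06"}
]}
"""


def check_users(document):
    """Return a list of problems found in the generated users collection."""
    users = document["users"]
    problems = []
    ids = [user["id"] for user in users]
    if len(set(ids)) != len(ids):
        problems.append("duplicate ids")
    for user in users:
        joined = date.fromisoformat(user["joined_on"])
        # Bounds taken from the "begin" and "end" values in users.json.
        if not date(2010, 1, 1) <= joined <= date(2020, 1, 1):
            problems.append(f"joined_on out of range for id {user['id']}")
        if "@" not in user["email"]:
            problems.append(f"email looks malformed for id {user['id']}")
    return problems


problems = check_users(json.loads(raw))
```

An empty `problems` list means the sample satisfies the constraints declared in the schema.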

Building a data model from Postgres

If you have an existing database, Synth can create the data model for you by importing data from your database.

To get started, initialise your Synth workspace locally:

$ mkdir synth_workspace && cd synth_workspace && synth init

Then use the synth import command to build a data model from your Postgres database:

$ synth import tpch --from postgres://user:pass@localhost:5432/tpch
Building customer collection...
Building primary keys...
Building foreign keys...
Ingesting data for table customer...  10 rows done.

Finally, generate data into another instance of Postgres:

$ synth generate tpch --to postgres://user:pass@localhost:5433/tpch
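Note that the import and generate commands above point at different ports (5432 and 5433). If you script these steps, a small guard like the following sketch can help avoid writing generated data back into the source database. The helper `same_database` is hypothetical and only compares host, port, and database name.

```python
from urllib.parse import urlsplit

POSTGRES_DEFAULT_PORT = 5432


def same_database(source_uri, target_uri):
    """True when both URIs name the same host, port, and database."""
    src, dst = urlsplit(source_uri), urlsplit(target_uri)
    src_key = (src.hostname, src.port or POSTGRES_DEFAULT_PORT, src.path)
    dst_key = (dst.hostname, dst.port or POSTGRES_DEFAULT_PORT, dst.path)
    return src_key == dst_key


source = "postgres://user:pass@localhost:5432/tpch"
target = "postgres://user:pass@localhost:5433/tpch"

if same_database(source, target):
    raise SystemExit("Refusing to generate into the source database")
```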

Why Rust

We decided to build Synth from the ground up in Rust. We love Rust, and given the scale of data we wanted Synth to generate, it made sense as a first choice. The combination of memory safety, performance, expressiveness and a great community made it a no-brainer, and we've never looked back!

Get in touch

If you would like to learn more, or you would like support for your use case, feel free to open an issue on GitHub.

If your query is more sensitive, you can email [email protected] and we'll happily chat about your use case.

If you intend to use Synth, we recommend joining our growing Discord community.

About Us

The Synth project is backed by OpenQuery. We are a Y Combinator-backed startup based in London, England. We are passionate about data privacy, developer productivity, and building great tools for software engineers.

Contributing

First of all, we sincerely appreciate all contributions to Synth, large or small, so thank you.

See the contributing section for details.

License

Synth is source-available and licensed under the Apache 2.0 License.
