recap

What is Recap?

Recap reads and writes schemas from web services, databases, and schema registries in a standard format.

⭐️ If you like this project, please give it a star! It helps the project get more visibility.

Supported Formats

Format Read Write
Avro
Protobuf
JSON Schema
Snowflake
PostgreSQL
MySQL
BigQuery
Confluent Schema Registry
Hive Metastore

Install

Install Recap and all of its optional dependencies:

pip install 'recap-core[all]'

You can also select specific dependencies:

pip install 'recap-core[avro,kafka]'

See pyproject.toml for a list of optional dependencies.

Usage

CLI

Recap comes with a command-line interface that can list and read schemas from external systems.

List the children of a URL:

recap ls postgresql://user:pass@host:port/testdb
[
  "pg_toast",
  "pg_catalog",
  "public",
  "information_schema"
]

Keep drilling down:

recap ls postgresql://user:pass@host:port/testdb/public
[
  "test_types"
]

Read the schema for the test_types table as a Recap struct:

recap schema postgresql://user:pass@host:port/testdb/public/test_types
{
  "type": "struct",
  "fields": [
    {
      "type": "int64",
      "name": "test_bigint",
      "optional": true
    }
  ]
}
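The JSON printed by `recap schema` is plain data, so any JSON library can inspect it. The following is a minimal sketch, assuming the output shape shown above (a struct whose `fields` each carry a `name`, a simple string `type`, and an `optional` flag). The helper `describe_fields` is hypothetical and not part of Recap, and treating a missing `optional` key as `False` is an assumption.

```python
import json

# Output of `recap schema`, copied from the example above.
SCHEMA_JSON = """
{
  "type": "struct",
  "fields": [
    {"type": "int64", "name": "test_bigint", "optional": true}
  ]
}
"""


def describe_fields(schema: dict) -> list:
    """Return (name, type, optional) for each field of a struct schema.

    Hypothetical helper; assumes every field uses a simple string type.
    """
    if schema.get("type") != "struct":
        raise ValueError("expected a struct schema")
    return [
        # Assumption: a field without an "optional" key is required.
        (field.get("name"), field["type"], field.get("optional", False))
        for field in schema.get("fields", [])
    ]


fields = describe_fields(json.loads(SCHEMA_JSON))
print(fields)
```

Nested or parameterized types would need a richer walk than this flat listing.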

Gateway

Recap comes with a stateless HTTP/JSON gateway that can list and read schemas from data catalogs and databases.

Start the server at http://localhost:8000:

recap serve

List the schemas in a PostgreSQL database:

curl http://localhost:8000/gateway/ls/postgresql://user:pass@host:port/testdb
["pg_toast","pg_catalog","public","information_schema"]

And read a schema:

curl http://localhost:8000/gateway/schema/postgresql://user:pass@host:port/testdb/public/test_types
{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}

The gateway fetches schemas from external systems in real time and returns them as Recap schemas.

An OpenAPI schema is available at http://localhost:8000/docs.
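The curl calls above can also be made from Python with the standard library. This is a rough sketch that only mirrors the URL layout shown in those examples (`/gateway/ls/<url>` and `/gateway/schema/<url>`). The names `gateway_url` and `fetch` are hypothetical helpers, not part of Recap, and the sketch assumes the system URL is appended unencoded, as in the curl commands.

```python
import json
from urllib.request import urlopen

# Base address used by `recap serve` in the examples above.
GATEWAY = "http://localhost:8000/gateway"


def gateway_url(operation: str, system_url: str, base: str = GATEWAY) -> str:
    """Build a gateway URL for the "ls" or "schema" operation."""
    if operation not in ("ls", "schema"):
        raise ValueError(f"unsupported operation: {operation}")
    # Assumption: the system URL is appended as-is, like the curl examples.
    return f"{base}/{operation}/{system_url}"


def fetch(operation: str, system_url: str):
    """Call the gateway and decode its JSON response."""
    with urlopen(gateway_url(operation, system_url)) as response:
        return json.load(response)


print(gateway_url("ls", "postgresql://user:pass@host:port/testdb"))
```

Calling `fetch("ls", ...)` needs a running `recap serve` and a reachable database, so only the URL construction is exercised here.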

Registry

You can store schemas in Recap's schema registry.

Start the server at http://localhost:8000:

recap serve

Put a schema in the registry:

curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}' \
    http://localhost:8000/registry/some_schema

Get the schema (and version) from the registry:

curl http://localhost:8000/registry/some_schema
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]

Put a new version of the schema in the registry:

curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int32","name":"test_int","optional":true}]}' \
    http://localhost:8000/registry/some_schema

List schema versions:

curl http://localhost:8000/registry/some_schema/versions
[1,2]

Get a specific version of the schema:

curl http://localhost:8000/registry/some_schema/versions/1
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]

The registry uses fsspec to store schemas in a variety of filesystems like S3, GCS, ABS, and the local filesystem. See the registry docs for more details.

An OpenAPI schema is available at http://localhost:8000/docs.
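The registry responses shown above are small JSON documents: a `[schema, version]` pair for a lookup and a list of integers for the version listing. Here is a minimal sketch of decoding them, assuming exactly those shapes. The function `parse_registry_entry` is a hypothetical helper, not part of Recap.

```python
import json

# Response bodies copied from the curl examples above.
GET_RESPONSE = (
    '[{"type":"struct","fields":'
    '[{"type":"int64","name":"test_bigint","optional":true}]},1]'
)
VERSIONS_RESPONSE = "[1,2]"


def parse_registry_entry(body: str):
    """Split a registry lookup response into (schema, version)."""
    schema, version = json.loads(body)
    return schema, version


schema, version = parse_registry_entry(GET_RESPONSE)
latest = max(json.loads(VERSIONS_RESPONSE))  # highest stored version number
print(version, latest, schema["fields"][0]["name"])
```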

API

Recap has recap.converters and recap.clients packages.

  • Converters convert schemas to and from Recap schemas.
  • Clients read schemas from external systems (databases, schema registries, and so on) and use converters to return Recap schemas.

Read a schema from PostgreSQL:

from recap.clients import create_client

with create_client("postgresql://user:pass@host:port/testdb") as c:
    # Keep the result; the converter examples below use it as `struct`
    struct = c.schema("testdb", "public", "test_types")

Convert the schema to Avro, Protobuf, and JSON schemas:

from recap.converters.avro import AvroConverter
from recap.converters.protobuf import ProtobufConverter
from recap.converters.json_schema import JSONSchemaConverter

avro_schema = AvroConverter().from_recap(struct)
protobuf_schema = ProtobufConverter().from_recap(struct)
json_schema = JSONSchemaConverter().from_recap(struct)

Transpile schemas from one format to another:

from recap.converters.json_schema import JSONSchemaConverter
from recap.converters.avro import AvroConverter

json_schema = """
{
    "type": "object",
    "$id": "https://recap.build/person.schema.json",
    "properties": {
        "name": {"type": "string"}
    }
}
"""

# Use Recap as an intermediate format to convert JSON schema to Avro
struct = JSONSchemaConverter().to_recap(json_schema)
avro_schema = AvroConverter().from_recap(struct)

Store schemas in Recap's schema registry:

from recap.storage.registry import RegistryStorage
from recap.types import StructType, IntType

storage = RegistryStorage("file:///tmp/recap-registry-storage")
version = storage.put(
    "postgresql://localhost:5432/testdb/public/test_table",
    StructType(fields=[IntType(32)])
)
storage.get("postgresql://localhost:5432/testdb/public/test_table")

# Get all versions of a schema
versions = storage.versions("postgresql://localhost:5432/testdb/public/test_table")

# List all schemas in the registry
schemas = storage.ls()

Docker

Recap's gateway and registry are also available as a Docker image:

docker run \
    -p 8000:8000 \
    -e 'RECAP_URLS=["postgresql://user:pass@localhost:5432/testdb"]' \
    ghcr.io/recap-build/recap:latest

See Recap's Docker documentation for more details.
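The `RECAP_URLS` value in the command above looks like a JSON array of URLs, which is awkward to quote by hand. A small sketch of producing a shell-safe `-e` argument follows, assuming the variable is indeed a JSON list (the Docker documentation is the authority on this).

```python
import json
import shlex

# URLs to expose through the container, taken from the example above.
urls = ["postgresql://user:pass@localhost:5432/testdb"]

# Assumption: RECAP_URLS holds a JSON-encoded list of URLs.
env_arg = "RECAP_URLS=" + json.dumps(urls)

# shlex.quote wraps the value in single quotes so the shell passes it intact.
print("-e " + shlex.quote(env_arg))
```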

Schema

See Recap's type spec for details on Recap's type system.

Documentation

Recap's documentation is available at recap.build.
