protoc-gen-bq-schema

This is a fork of the GoogleCloudPlatform/protoc-gen-bq-schema repository addressing these PRs:

The default branch of this repository is develop.

Two satellite repositories:

protoc-gen-bq-schema is a plugin for the Protocol Buffers compiler (protoc).

It converts messages written in .proto format into JSON schema files for BigQuery.

This lets you reuse existing .proto data definitions for BigQuery.
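
As a rough illustration of that conversion, the sketch below builds BigQuery schema entries for three fields shaped like those in the example later in this README (an int32, a nested message holding a repeated int32, and a string). The PROTO_TO_BQ table and the to_bq_field helper are hypothetical and cover only a few types; they are not the plugin's actual implementation.

```python
import json

# Hypothetical, partial type table for illustration only.
PROTO_TO_BQ = {
    "int32": "INTEGER",
    "string": "STRING",
    "bool": "BOOLEAN",
    "message": "RECORD",
}


def to_bq_field(name, proto_type, repeated=False, fields=None):
    """Build one BigQuery schema entry from a simplified field description."""
    entry = {
        "name": name,
        "type": PROTO_TO_BQ[proto_type],
        "mode": "REPEATED" if repeated else "NULLABLE",
    }
    if fields:
        entry["fields"] = fields  # nested columns of a RECORD
    return entry


schema = [
    to_bq_field("a", "int32"),
    to_bq_field("b", "message", fields=[to_bq_field("a", "int32", repeated=True)]),
    to_bq_field("c", "string"),
]
print(json.dumps(schema, indent=1))
```

The printed JSON has the same shape as the .schema files shown further down: a list of entries with name, type, and mode, where a RECORD carries its nested columns under fields.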

Installation

go get github.com/chuhlomin/protoc-gen-bq-schema

Usage

protoc --bq-schema_out=path/to/outdir foo.proto

Both protoc and protoc-gen-bq-schema must be on your $PATH.

The generated JSON schema files have the .schema suffix. Each file's path is derived from the message's package name and its table_name option (for example, foo/bar_table.schema).
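
The naming rule can be sketched as a small helper. This is an illustration rather than the plugin's code: schema_path is a hypothetical name, and turning dots in a package name into directory separators is an assumption that this README does not state.

```python
import posixpath


def schema_path(package, table_name):
    """Return the relative output path for one table's schema file."""
    # Assumption: dots in a package name become directory separators.
    package_dir = package.replace(".", "/")
    return posixpath.join(package_dir, table_name + ".schema")


print(schema_path("foo", "bar_table"))
```

For package foo and table name bar_table this yields foo/bar_table.schema, relative to the directory passed to --bq-schema_out.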

If you do not already have the standard Google Protobuf libraries in your proto_path, you'll need to specify them directly on the command line (and potentially need to copy bq_schema.proto into a proto_path directory as well), like this:

protoc --bq-schema_out=path/to/out/dir foo.proto --proto_path=. --proto_path=<path_to_google_proto_folder>/src

Example

Suppose that we have the following foo.proto:

syntax = "proto3";

package foo;

import "google/type/date.proto";
import "bq/bq_table.proto";
import "bq/bq_field.proto";

message Bar {
  option (gen_bq_schema.bigquery_opts).table_name = "bar_table";
  option (gen_bq_schema.bigquery_opts).extra_fields = "f:INTEGER";
  option (gen_bq_schema.bigquery_opts).extra_fields = "g:RECORD:Baz";

  message Nested {
    repeated int32 a = 1;
  }

  enum EnumAllowingAlias {
    option allow_alias = true;
    UNKNOWN = 0;
    STARTED = 1;
    RUNNING = 1;
  }

  int32 a = 1; // field comment
  Nested b = 2;
  string c = 3;

  bool d = 4 [(gen_bq_schema.bigquery).ignore = true];
  uint64 e = 5 [
    (gen_bq_schema.bigquery) = {
      require: true
      type_override: 'TIMESTAMP'
    }
  ];

  google.type.Date date = 6 [(gen_bq_schema.bigquery).type_override = "DATE"];

  EnumAllowingAlias status = 8;
}

message Baz {
  int32 a = 1;
}

Running protoc --bq-schema_out=. foo.proto generates a file named foo/bar_table.schema.

The message foo.Baz is ignored because it does not set the gen_bq_schema.bigquery_opts option.
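
The field-level options used on d and e in foo.proto can be read as adjustments to a field's schema entry. The sketch below is one plausible reading, not the plugin's implementation: apply_field_options is a hypothetical helper, and it assumes that ignore drops the field, require sets the mode to REQUIRED, and type_override replaces the type.

```python
def apply_field_options(field, options):
    """Return the adjusted schema entry, or None when the field is ignored."""
    if options.get("ignore"):
        return None
    adjusted = dict(field)  # copy so the input entry is left untouched
    if options.get("require"):
        adjusted["mode"] = "REQUIRED"  # assumed effect of require: true
    if "type_override" in options:
        adjusted["type"] = options["type_override"]
    return adjusted


d = apply_field_options(
    {"name": "d", "type": "BOOLEAN", "mode": "NULLABLE"},
    {"ignore": True},
)
e = apply_field_options(
    {"name": "e", "type": "INTEGER", "mode": "NULLABLE"},
    {"require": True, "type_override": "TIMESTAMP"},
)
print(d)
print(e)
```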

The plugin parameter enumsasints=true marshals all enums as integers instead of strings: protoc --bq-schema_out=enumsasints=true:. foo.proto.
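
A minimal sketch of what the parameter changes for an enum field such as status: the column type switches from STRING to INTEGER. The enum_column helper is hypothetical, and the NULLABLE mode is assumed by analogy with the other singular fields shown in this README.

```python
def enum_column(name, enums_as_ints=False):
    """Describe the column for an enum field under either setting."""
    return {
        "name": name,
        "type": "INTEGER" if enums_as_ints else "STRING",
        "mode": "NULLABLE",  # assumed mode for a singular field
    }


print(enum_column("status"))
print(enum_column("status", enums_as_ints=True))
```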

Docker Hub

You can use chuhlomin/protoc-gen-bq-schema image on Docker Hub.

Example Docker run:

mkdir bq_schema
docker run -i -t -v $(pwd):/workdir \
  chuhlomin/protoc-gen-bq-schema:1.6 \
  -I/workdir \
  -I/workdir/bq \
  --bq-schema_out=/workdir/bq_schema \
  /workdir/foo.proto

Example Drone step: .drone.yml

  - name: build
    image: chuhlomin/protoc-gen-bq-schema:1.6
    commands:
      - mkdir bq_schema
      - protoc -I/protobuf/ -I. -Ibq --bq-schema_out=bq_schema foo.proto

Local Development

To test and build the binaries inside an isolated Docker container (recommended):

docker run -i -t -v $(pwd):/workdir golang:1.12.14-alpine3.10 /bin/sh

apk add --no-cache make git gcc libc-dev protobuf
cd /workdir
make clean test install
make examples

exit

To test and build the plugin binary on your machine, run the following commands:

make clean test install

# (optionally) build a Docker image
docker build -t protoc-gen-bq-schema:local .

License

protoc-gen-bq-schema is licensed under the Apache License version 2.0.

This is not an official Google product.

protoc-gen-bq-schema's Issues

How can this library support proto extensions?

First of all, big thanks to @chuhlomin for keeping this repo alive 🙏

My issue:
I have proto files that have extensions defined in them, but it seems like this tool doesn't support them and simply ignores them. Now, am I not using this tool properly, is it intended, or is it just a missing feature?

Cheers

Issue with generated descriptions on fields in RECORD

Given the following proto:

syntax = "proto3";
package foo;
import "bq_table.proto";

message Bar {
  option (gen_bq_schema.bigquery_opts).table_name = "bar_table";

  // ThingA Description
  repeated ThingA things = 1;
}

message Baz {
  option (gen_bq_schema.bigquery_opts).table_name = "baz_table";

  // ThingB Description
  ThingB thing = 1;
}

message ThingA {
  // ThingA ID
  string thing_id = 1;
}

message ThingB {
  // ThingB ID
  string other_thing_id = 1;
}

The following schemas are generated:

bar_table.schema

[
 {
  "name": "things",
  "type": "RECORD",
  "mode": "REPEATED",
  "description": "ThingA Description",
  "fields": [
   {
    "name": "thing_id",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "ThingA ID"
   }
  ]
 }
]

baz_table.schema

[
 {
  "name": "thing",
  "type": "RECORD",
  "mode": "NULLABLE",
  "description": "ThingB Description",
  "fields": [
   {
    "name": "other_thing_id",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "ThingA ID"
   }
  ]
 }
]

Note that the field inside the RECORD has the same description, ThingA ID, in both schemas. In baz_table.schema it should be ThingB ID, taken from the comment on other_thing_id.
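
One way to surface this kind of mismatch is to list every description in a generated schema next to its dotted field path. The sketch below does that for the baz_table.schema content shown above, embedded as a string so the example is self-contained. The list_descriptions helper is hypothetical and is not part of the plugin.

```python
import json

# The baz_table.schema content reported in this issue.
BAZ_SCHEMA = """
[
 {
  "name": "thing",
  "type": "RECORD",
  "mode": "NULLABLE",
  "description": "ThingB Description",
  "fields": [
   {
    "name": "other_thing_id",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "ThingA ID"
   }
  ]
 }
]
"""


def list_descriptions(fields, prefix=""):
    """Yield (dotted field path, description) for every field, depth first."""
    for field in fields:
        path = prefix + field["name"]
        yield path, field.get("description", "")
        # Recurse into the nested columns of a RECORD, if any.
        yield from list_descriptions(field.get("fields", []), path + ".")


for path, description in list_descriptions(json.loads(BAZ_SCHEMA)):
    print(f"{path}: {description}")
```

The second printed line pairs thing.other_thing_id with ThingA ID, which makes the wrong description easy to spot when compared against the comments in the .proto file.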
