Giter VIP home page Giter VIP logo

Comments (8)

ejona86 avatar ejona86 commented on June 19, 2024 1

For reference, shared dictionary compression examples in HTTP:
https://chromium.googlesource.com/chromium/src/+/53.0.2744.1/net/sdch/README.md (vcdiff, the first example of this)
https://learn.microsoft.com/en-us/deployedge/learnmore-zsdch-compression (zstd)
https://chromestatus.com/feature/5124977788977152 (originally just brotli)

Those last two are related, and link to a shared draft rfc

from grpc-go.

coxley avatar coxley commented on June 19, 2024

Here's an example for how I will probably work around this until there's a better way. Feel free to steer me another direction :)

Protobuf

syntax = "proto3";

package zstd;
option go_package = "repro/zstd";

// Compressed is a wrapper for zstd encoded data in lieu of better ways of sharing metadata
message Compressed {
  string category = 1;
  string namespace = 2;
  bytes data = 3;
}

Codec

// codec is a dictionary-aware compression implementation
//
// Given a category and namespace, it knows how to find the correct dictionary
// to compress with. For decompression, it reads the dictionary ID from the
// zstd payload.
//
// This could later be augmented to have decompression IDs category:namespace
// aware vs. assuming they're globally scoped.
type codec struct {
	Category   string
	Namespace string
}

func (c *codec) Marshal(v any) ([]byte, error) {
	return marshal(c.Category, c.Namespace, v)
}

func (c *codec) Unmarshal(data []byte, v any) error {
	return unmarshal(data, v)
}

func (c *codec) Name() string {
	return "zstd-wrapper"
}

// marshal and compress 'v', wrap in [zstd.Compressed], and return the result
// of marshalling that
//
// Skip wrapping if 'v' is already a [zstd.Compressed]
func marshal(category, namespace string, v any) ([]byte, error) {
	switch vv := v.(type) {
	case *zstd.Compressed:
		return proto.Marshal(vv)
	case proto.Message:
		data, err := proto.Marshal(vv)
		if err != nil {
			return nil, err
		}
		container := &zstd.Compressed{
			Category:   category,
			Namespace: namespace,
			// TODO: Imagine there was compression happening before this
			Data: data,
		}

		fmt.Println("using the codec for compression")
		return proto.Marshal(container)
	default:
		return nil, fmt.Errorf("unsure what to do for type: %T", v)
	}
}

// unmarshal data into a [zstd.Compressed], then unmarshal it's data field into 'v'
func unmarshal(data []byte, v any) error {
	switch vv := v.(type) {
	case *zstd.Compressed:
		return proto.Unmarshal(data, vv)
	case proto.Message:
		var container zstd.Compressed
		if err := proto.Unmarshal(data, &container); err != nil {
			return err
		}
		fmt.Println("using the codec for decompression")
		// TODO: Imagine there was decompression happening before this
		return proto.Unmarshal(container.Data, vv)
	default:
		return fmt.Errorf("unsure what to do for type: %T", v)
	}
}

Usage

package main

import (
	"context"
	"fmt"
	"log"
	"net"

	zstd "repro/gen"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/encoding"
	pb "google.golang.org/grpc/examples/helloworld/helloworld"
	"google.golang.org/protobuf/proto"
)

const addr = "localhost:50051"

type server struct {
	pb.UnimplementedGreeterServer
}

func (s *server) SayHello(ctx context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
	return &pb.HelloReply{Message: "Hello " + in.GetName()}, nil
}

func main() {
	// Register default codec without category/namespace for decompression
	encoding.RegisterCodec(&codec{})

	lis, err := net.Listen("tcp", addr)
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	s := grpc.NewServer()
	pb.RegisterGreeterServer(s, &server{})

	go func() {
		if err := s.Serve(lis); err != nil {
			log.Fatalf("failed to serve: %v", err)
		}
	}()

	// Now connect from client
	conn, err := grpc.Dial(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("did not connect: %v", err)
	}
	defer conn.Close()

	c := pb.NewGreeterClient(conn)
	r, err := c.SayHello(
		context.Background(),
		&pb.HelloRequest{Name: "coxley"},
		// TODO: This would ideally be cached and retrieved via an accessor somewhere
		grpc.ForceCodec(&codec{Category: "test", Namespace: "foo"}),
	)
	if err != nil {
		log.Fatalf("could not greet: %v", err)
	}
	fmt.Printf("Result: %v\n", r)
}

from grpc-go.

dfawley avatar dfawley commented on June 19, 2024

Another option here would be to pass UseCompressor and SetSendCompressor an optional any argument that gets passed to the compressor as an extra parameter to Compress. This would not be a backward compatible change (it could be if we made it a variadic parameter, but that's a bit ugly from an API documentation POV). With this approach, Decompress can't really be parameterized, so the compressor would need to encode whatever the decompressor needs in its header.

If we wanted to pass the metadata around, then we could pass the metadata to both sides (outgoing metadata to the compressor, and incoming metadata to the decompressor). This then has the unfortunate effect of requiring the metadata to be polluted in order to parameterize the compressor.

Instead of passing metadata, we could pass the context directly. This would enable parameterization via the context instead of the metadata. I'm not sure we want to encourage blocking on this path, though, and passing a context might do that.

from grpc-go.

coxley avatar coxley commented on June 19, 2024

And with interesting timing, this recent one: https://developer.chrome.com/blog/shared-dictionary-compression

from grpc-go.

dfawley avatar dfawley commented on June 19, 2024

From a quick reading, it looks like none of these approaches requires decompression to have access to anything besides the compressed data -- is that correct? So we don't have any known use case that requires passing the incoming metadata to the decompressor?

from grpc-go.

coxley avatar coxley commented on June 19, 2024

@dfawley I think that could be true. Some cases may need to have multiple decompressor "names" for ID space scoping if that's the direction we go, though. (eg: zstd-groupA, zstd-groupB)

At least in zstd, the dictionary ID is limited to 4 bytes. I'm not sure how other companies "scope" their dictionary registries, but it's at least a consideration. Maybe not end of the world.

from grpc-go.

dfawley avatar dfawley commented on June 19, 2024

My proposed change, then would be:

package grpc

func UseCompressor(name string, compressorOptions ...any) CallOption {}

func SetSendCompressor(ctx context.Context, name string, compressorOptions ...any) error {}
package encoding

type Compressor interface {
	Compress(/*see below*/, ...any) /*see below*/
	...
}

Essentially this simply adds "...any" to all the places where we set and invoke the compressor.

Regarding the Compressor interface, we are planning some changes to the encoding package to support scatter/gather and memory re-use. Those will be covered in #6619 and I would propose incorporating these changes into the work there.

How does all of that sound?

from grpc-go.

coxley avatar coxley commented on June 19, 2024

@dfawley Sorry for the late reply — my Github notifications organization is abysmal.

I think that sounds good as long as it does for you!

from grpc-go.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.