bakdata / quick
The Fastest Way to Create Live Data Products
Home Page: https://bakdata.github.io/quick
License: Apache License 2.0
The manager should prepare the deployment of a range mirror and must pass the range field to it. The user passes the range field with the --range-field <field> option through the Quick CLI. For that, we need a new endpoint in the manager.
Similar to the documentation for range queries, we want to add a description of multi-subscriptions. This means we will have a part with a basic intro and an example, plus a separate section that goes into detail (for developers and interested users).
The gateway cannot retrieve values of a given list with a non-string type (e.g., long or integer).
For a given schema:
type Query {
findProducts(productId: [Int]): [Product] @topic(name: "schema-product-topic-test", keyArgument: "productId")
}
type Product {
productId: Int!,
name: String,
}
{
findProducts(productId: [123, 456]) {
productId
name
}
}
Currently, the gateway returns this error:
{
"errors": [
{
"message": "Exception while fetching data (/findProducts) : class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')",
"locations": [
{
"line": 2,
"column": 5
}
],
"path": [
"findProducts"
],
"extensions": {
"classification": "DataFetchingException"
}
}
],
"data": {
"findProducts": null
}
}
The ListArgumentFetcher casts the arguments object to a list of strings:
final List<V> results = this.dataFetcherClient.fetchResults((List<String>) arguments);
This cast fails for a list of integers or longs.
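A type-safe alternative is to convert each argument explicitly instead of casting the whole list. The sketch below is an illustration, not Quick's actual code; the class and method names are assumptions.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of a type-safe alternative to the failing cast in ListArgumentFetcher.
// The class and method names here are hypothetical.
public class ArgumentConversion {

    // Instead of casting List<?> to List<String>, convert each element explicitly.
    // This works for Integer, Long, and String keys alike.
    static List<String> toStringArguments(List<?> arguments) {
        return arguments.stream()
                .map(String::valueOf)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> productIds = List.of(123, 456);
        // The converted list can then be handed to a client that expects strings.
        System.out.println(toStringArguments(productIds));
    }
}
```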
We require a GraphQL-to-Protobuf converter similar to GraphQLToAvroConverter.
It will be called when a user creates a new topic with a GraphQL schema and expects Quick to use Protobuf schemas.
As a stopgap solution for #88, we disabled the validation rule (KeyInformation) that clashed with the semantics of multi-subscriptions. We now want to refactor the validation rules so that the KeyInformation rule can be used again.
For MultiSubscriptionFetcher, we only have tests that involve string types. In the scope of this issue, we want to add extra tests that cover complex types (Avro and Protobuf).
As in the title.
The Gateway needs to prepare the range query request to send to the mirror. This can be done with a range data fetcher. The fetcher then sends a GET request to the mirror endpoint. Example: GET /user-request-mirror/mirror/range/1?from=1&to=2
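Composing the mirror request boils down to formatting the endpoint path from the key and range arguments. A minimal sketch, with a hypothetical helper class, following the URL layout from the example above:

```java
// Sketch of how a range data fetcher might compose the mirror's range endpoint URL.
// The path layout follows the example in this issue; the class itself is hypothetical.
public class RangeUrlBuilder {

    static String rangeUrl(String mirrorHost, String key, String from, String to) {
        return String.format("%s/mirror/range/%s?from=%s&to=%s", mirrorHost, key, from, to);
    }

    public static void main(String[] args) {
        // Reproduces: GET /user-request-mirror/mirror/range/1?from=1&to=2
        System.out.println(rangeUrl("/user-request-mirror", "1", "1", "2"));
    }
}
```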
Last updated: 04.05.2022
Milestone: Protobuf support
Development: 0.7
This issue describes our approach for the support of Protobuf in Quick.
Protobuf is a data format for (de-)serializing data that has gained a lot of support in the Kafka ecosystem recently. It is comparable to Avro, which so far is the only schema format supported by Quick.
We track all related issues in the Protobuf support milestone. As per the roadmap, the development of this feature is planned for Quick 0.7.
With the implementation of this enhancement, Quick supports:
Goal: users can create topics that are backed by Protobuf schemas
First, let's look into what happens when the user creates a new topic. Quick:
The steps affected by the proposed change are 1 and 3:
Quick additionally requires a way to let users decide between Avro and Protobuf. There are (at least) the following two options to implement this:
The advantage of option 1 is the flexibility that comes with it: a user can decide on the schema format with each topic creation.
However, this can also become repetitive, since most users stick to a single format. It also complicates the overall implementation: we would then require a way to propagate the information per topic.
We therefore start with option 2. If users require option 1, we can still add it later.
Goal: components should be able to tell which schema format a topic uses.
All the following goals require a mechanism in place that tells the corresponding components whether the topics use Avro or Protobuf. Since we start with a global environment variable as described in goal 1, this configuration can be used.
Other options that allow more granular configurations are:
Goal: users can ingest data into topics backed by Protobuf schemas
The ingest-service uses the TypeResolver
to transform JSON to Avro. We therefore require an additional implementation of TypeResolver for Protobuf. The configuration of the TypeResolver happens in QuickTopicTypeService. Here Quick has to differentiate between Avro and Protobuf and set the resolver accordingly.
Goal: gateways can query topics backed by Protobuf schemas
This is dependent on goal 6 (mirror). During a GraphQL query, the gateway forwards requests to corresponding mirror applications. The communication between gateway and mirror uses REST + JSON. Therefore, the underlying schema format is transparent from the gateway's point of view.
Goal: gateways can subscribe to topics backed by Protobuf schemas
Similar to the data ingest, the GraphQL subscription uses the SerDe provided by the QuickTopicTypeService. Since Quick can't know the exact message type, it has to use DynamicMessage. This is similar to the way Quick currently uses Avro's GenericRecord.
Goal: mirrors can read data from topics backed by Protobuf schemas
As in the data ingest, mirrors use the QuickTopicTypeService to get a TypeResolver for (de-)serializing data. Thus, the mirror can handle Protobuf with the updated TypeResolver.
We should consider adding a section to our documentation that addresses the supported GraphQL elements. For example, we do not support GraphQL schemas with Union or Interface types.
We're still on Micronaut 2.5 and could benefit from some of the latest updates.
The mirror should expose the REST API GET /user-request-mirror/mirror/range/1?from=1&to=2
. Moreover, we should implement a getRange function in the QueryService interface.
What should happen if a user passes the --no-point flag and no range is specified at the same time?
Some possible scenarios:
- Fall back to point=true and create a topic and the corresponding mirror -> log to the user what happens.
- The --retention-time option should only be set with the --point option and not with the --range-field option.
The mirror should be able to read data that was serialized with Protobuf.
This should mainly be done through #24. We have to evaluate if further changes to the mirror are necessary.
The user should be able to switch between both schema types as required. Right now, Quick assumes all schemas are Avro.
As described in #17, when creating a new topic, Quick checks whether the corresponding subject for its schema already exists in the schema registry.
We need to make sure the existing mechanism also works with Protobuf.
We require the automatic conversion from the Kafka schema to GraphQL for checking schema compatibility and automatically creating the target schema.
With #3, we therefore need the conversion from Protobuf to GraphQL.
Currently, the Internal Topic Registry does not contain the precise schema type of a topic. Concretely, if Quick is configured with one of the supported schemas (i.e., Avro or Protobuf), the Internal Topic Registry registers each topic only with the value type SCHEMA.
Expected behaviour: the Internal Topic Registry stores the correct schema type in the value type of a topic.
Current behaviour: the Internal Topic Registry registers the value type of a topic with a schema as SCHEMA.
Here is a screenshot of a registered topic with schema:
quick topic create example-topic --key-type int --value-type schema --schema test-gateway.mytype
The TopicController class sets the value type to SCHEMA
. The TopicService does not check if the schema type is Avro or Protobuf.
Consider the scenario in which a user wants to make a query according to the following:
type Query {
getPurchase(id: String): Purchase @topic(name: "purchase", keyArgument: "purchaseId")
}
with a concrete query being:
{
getPurchase(id: "abc") {
productId
}
}
What happens behind the scenes, for example in QueryKeyArgumentFetcher.get(), is that the value is fetched via the DataFetcherClient. If we start and end up with JSON, we make unnecessary conversions. Thus, we might refrain from working with different data types (e.g., Double, Protobuf, Avro) in the gateway and work directly with JSON.
For this:
a) MirrorDataFetcherClient
has to be rebuilt so that it does not work with TypeResolver<V>
but with JSON,
b) MirrorClient
should not receive a resolver that works on a given data type but simply on JSON.
Additionally, it might be worth considering to completely remove the generic V parameter from the gateway.
After upgrading Kafka to 3.1, the ingestion time increased noticeably. I had to add sleeps in the e2e tests after each ingest. We should investigate what the main reason is.
As described in #17, we require a configuration variable for letting users set their desired schema format (e.g., Avro, Protobuf).
Imagine the following scenario: a user has a topic filled with records and only wants to query them, so they only need a mirror that consumes the data from their topic. But right now, they have to create a topic with quick and move their records to the new topic to make them queryable. This limits the users.
We should consider updating the mirror creation command in the CLI so it sends the correct key and value type. The manager should check the Internal Topic Registry and register the topic with the correct key and value type. This check can be done in the KubernetesMirrorService.
I wanted to test multi-subscriptions using CLI. I created a gateway and applied the following schema:
type Query {
findPurchase(purchaseId: String): Purchase @topic(name: "purchase", keyArgument: "purchaseId")
allPurchases: [Purchase!] @topic(name: "purchase")
}
type Purchase {
purchaseId: String!
productId: Int!
userId: Int!
product: Product @topic(name: "product", keyField: "productId")
amount: Int
price: Price
}
type Product {
productId: Int!
name: String
description: String
price: Price
}
type Price {
total: Float
currency: String
}
type Click {
userId: Int!
timestamp: Int
}
type Subscription {
userStatistics: UserStatistics
}
type UserStatistics {
purchase: Purchase @topic(name: "purchase")
click: Click @topic(name: "click")
}
I received the following error: Internal Server Error: {"type":"errors/serverError","title":"Internal Server Error","code":500,"detail":"An unexpected error occurred:When the return type is not a list for a non-mutation and non-subscription type, key information (keyArgument or keyField) is needed.","uriPath":"/control/schema"} Could not apply schema to gateway: multisubstest
Investigating the error message led me to one of the validation rules: KeyInformation.
The semantics of multi-subscriptions, which transfer topic directives from the Subscription type to the user-defined type, clash with this rule.
As described by https://github.com/bakdata/kafka-key-value-store, Quick mirrors should expose information about the different partitions and the corresponding hosts.
Currently, Quick only supports point queries. Some use cases need support for range queries. Interactive Queries enable querying the state store in Kafka Streams. For range queries, we found these approaches:
- The range() method in the ReadOnlyKeyValueStore interface.
- The prefixScan() method.
- Adding a RangeQuery class to the IQs.
We should evaluate these approaches and set up a roadmap for the implementation.
The partition routing in the gateway is not working properly for mirrors with range index. There are some bugs and problems I noticed while working with the partition routing more in-depth:
The StreamsStateHost creates a wrong URL, so the request never arrives at the mirror.
The getResponseFromFallbackService method in MirrorRequestManagerWithFallback creates a wrong URL because it ignores the query parameters of the initial URL.
The MirrorDataFetcherClient creates a PartitionedMirrorClient using the String SerDe. This causes the partition calculation to be wrong. We need to know the type of the key that the user sends with the query request.
Expected behaviour: the partition router updates the partition-to-host map correctly.
Current behaviour: the partition router in the gateway fails to fetch the information from a mirror and cannot access the correct mirror.
We should add tracing to Quick to troubleshoot interactions between the microservices. The traces can be gathered with the OpenTelemetry API and SDK. For the tracing backend, we can choose between Jaeger, Zipkin, or SigNoz.
We use Checkstyle for our coding style. But currently, we are not enforcing the check in our CI runs. We should add this check and let the CI fail on violations.
Currently, we fetch information about the partition-host mapping once, during the initialisation of the PartitionRouter. Should the mapping change, for example because another replica is added, cache misses will occur, as there is no update mechanism. Thus, it is advisable to introduce such a mechanism, i.e., the possibility to update the mapping dynamically.
A possible approach:
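One option is a router that periodically re-fetches the mapping in the background. The sketch below is a hypothetical illustration of that idea; the class name and the supplier-based fetch are assumptions, and in Quick the fetch would call the mirror's partition-info endpoint.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of a partition router that periodically refreshes its
// partition-to-host mapping instead of fetching it only once at start-up.
public class RefreshingPartitionRouter {

    private final Map<Integer, String> partitionToHost = new ConcurrentHashMap<>();
    private final Supplier<Map<Integer, String>> fetchMapping;

    public RefreshingPartitionRouter(Supplier<Map<Integer, String>> fetchMapping, long refreshSeconds) {
        this.fetchMapping = fetchMapping;
        this.refresh(); // initial load, as done today
        // Periodically re-fetch the mapping so that added replicas are picked up.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable);
            thread.setDaemon(true); // don't keep the JVM alive
            return thread;
        });
        scheduler.scheduleAtFixedRate(this::refresh, refreshSeconds, refreshSeconds, TimeUnit.SECONDS);
    }

    void refresh() {
        this.partitionToHost.putAll(this.fetchMapping.get());
    }

    public String hostFor(int partition) {
        return this.partitionToHost.get(partition);
    }

    public static void main(String[] args) {
        RefreshingPartitionRouter router =
                new RefreshingPartitionRouter(() -> Map.of(0, "mirror-0", 1, "mirror-1"), 30);
        System.out.println(router.hostFor(1));
    }
}
```

An event-driven alternative would be to refresh the mapping on demand whenever a lookup misses, which avoids polling at the cost of one failed request per topology change.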
The Gateway should be able to read and store the information for a range query
type Query {
userRequests(
userId: Int
timestampFrom: Int
timestampTo: Int
): [UserRequests] @topic(name: "user-request-range",
keyArgument: "userId",
rangeFrom: "timestampFrom",
rangeTo: "timestampTo")
}
type UserRequests {
userId: Int
serviceId: Int
timestamp: Int
requests: Int
success: Int
}
This issue is related to #49 and should address its limitations, i.e., adjusting SubscriptionFetcher and MultiSubscriptionFetcher to work with JSON.
We have two existing solutions, "Creating and querying real-time Customer Profiles" and "Real-time Monitoring and Analytics", that should be moved into the examples section of our documentation.
When we have a single mirror, then the situation is clear. All partitions are located on this particular mirror, and when there is a request for the value of a given key, it is retrieved seamlessly.
The problem arises when we have more than one replica of a mirror:
Scenario: Mirror1 stores partitions 1 and 4, and Mirror2 stores partitions 2 and 3. Let's say that we want to get data for the key="x"
. Our hashing function h says that h("x") = 1
which means that the value for the key "x" is stored in the first partition, which is located in Mirror1.
Currently, a request from the gateway goes to the (Kubernetes?) service, which chooses a replica in a round-robin fashion. This means that it is statistically wrong half of the time. When this happens, the request has to be redirected to the other replica.
Because a mirror knows which partitions it stores (it has a mapping between partitions and hosts), we can use this information to introduce routing based on the partition mapping.
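The routing idea can be sketched in a few lines. Note that Kafka's default partitioner hashes the serialized key bytes with murmur2; the plain hashCode below is only a simplified stand-in for the hashing function h from the scenario, and the class and mapping are hypothetical.

```java
import java.util.Map;

// Simplified sketch of routing a request directly to the replica that holds the
// key's partition. Kafka's default partitioner hashes the *serialized* key with
// murmur2; a plain hashCode stands in for h here to keep the example short.
public class PartitionRouting {

    static int partitionFor(String key, int numPartitions) {
        return Math.abs(key.hashCode() % numPartitions); // stand-in for Kafka's murmur2-based hash
    }

    static String routeTo(String key, int numPartitions, Map<Integer, String> partitionToHost) {
        return partitionToHost.get(partitionFor(key, numPartitions));
    }

    public static void main(String[] args) {
        // Hypothetical mapping: Mirror1 stores partitions 0 and 1, Mirror2 stores 2 and 3.
        Map<Integer, String> mapping = Map.of(0, "mirror1", 1, "mirror1", 2, "mirror2", 3, "mirror2");
        // Route the request for key "x" straight to the replica that owns its partition.
        System.out.println(routeTo("x", 4, mapping));
    }
}
```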
Questions:
Extra ideas:
Currently, we have concentrated only on range queries with topic directives on the Query type. It would be nice to also support range queries on field types.
type Query {
product(key: Int, timestampFrom: Int, timestampTo: Int): ProductInfo!
}
type ProductInfo {
key: String!
info: [Info!] @topic(name: "info-topic", keyField: "key", rangeFrom: "timestampFrom", rangeTo: "timestampTo")
}
type Info {
key: Int
timestamp: Int!
}
The TypeResolver is responsible for (de-)serializing data from and to strings. Quick requires this functionality for the ingest (String -> Protobuf) and mirror/gateway (Protobuf -> String). This can probably be done with Protobuf's included JSON conversion.
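Protobuf's bundled JSON conversion lives in protobuf-java-util's JsonFormat, which works together with DynamicMessage. The sketch below illustrates both directions; the descriptor is built by hand only so that the example is self-contained, whereas in Quick it would come from the Schema Registry. The message and field names are assumptions.

```java
import com.google.protobuf.DescriptorProtos.DescriptorProto;
import com.google.protobuf.DescriptorProtos.FieldDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;
import com.google.protobuf.util.JsonFormat;

// Sketch of a Protobuf TypeResolver based on protobuf's bundled JSON conversion
// (JsonFormat from protobuf-java-util). The hand-built descriptor is for
// illustration only; in Quick it would come from the Schema Registry.
public class ProtobufJsonResolver {

    static Descriptor buildDescriptor() throws Exception {
        FileDescriptorProto file = FileDescriptorProto.newBuilder()
                .setName("user_requests.proto")
                .setSyntax("proto3")
                .addMessageType(DescriptorProto.newBuilder()
                        .setName("UserRequests")
                        .addField(FieldDescriptorProto.newBuilder()
                                .setName("user_id").setNumber(1)
                                .setType(FieldDescriptorProto.Type.TYPE_INT32))
                        .addField(FieldDescriptorProto.newBuilder()
                                .setName("requests").setNumber(2)
                                .setType(FieldDescriptorProto.Type.TYPE_INT32)))
                .build();
        return FileDescriptor.buildFrom(file, new FileDescriptor[0])
                .findMessageTypeByName("UserRequests");
    }

    // String -> Protobuf (ingest direction)
    static DynamicMessage fromJson(String json) throws Exception {
        DynamicMessage.Builder builder = DynamicMessage.newBuilder(buildDescriptor());
        JsonFormat.parser().merge(json, builder);
        return builder.build();
    }

    // Protobuf -> String (mirror/gateway direction)
    static String toJson(DynamicMessage message) throws Exception {
        return JsonFormat.printer().print(message);
    }

    public static void main(String[] args) throws Exception {
        DynamicMessage message = fromJson("{\"userId\": 1, \"requests\": 10}");
        System.out.println(toJson(message));
    }
}
```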
Executing the gateway create command with the -s flag results in an error.
Example:
quick gateway create -s schema.graphql schematest
(quick gateway create --schema schema.graphql schematest
)
results in:
Internal Server Error: An unexpected error occurred:while parsing a block mapping
in 'reader', line 1, column 1:
apiVersion: v1
^
expected <block end>, but found '}'
in 'reader', line 14, column 1:
}
^
schema.graphql is:
type Query {
getPerson(id: ID): Person @topic(name: "person", keyArgument: "id")
getCorporate(id: ID): Corporate @topic(name: "corporate", keyArgument: "id")
}
type Person {
id: ID!
corporateId: ID!
firstName: String
lastName: String
# birthday: String
# birthLocation: String
corporate: Corporate @topic(name: "corporate", keyField: "corporateId")
}
type Corporate {
id: ID!
referenceId: String
name: String
street: String
city: String
}
There is a problem with transferring the schema.graphql to the .yml file with Thymeleaf. After correcting the schema.graphql manually so that the indentation is 2 spaces instead of 4, the error changes to:
mapping values are not allowed here
in 'reader', line 12, column 35:
findPurchase(purchaseId: String): Purchase @topic(name: "purchas ...
Issue description
Issuing the following command doesn't lead to an error message:
quick gateway create example
Expected behaviour
An error should be displayed
Current behaviour
No error
Steps to Reproduce
quick gateway create example
quick gateway create example
quick gateway create example
Detailed Description
Executing commands one after another yields the following result: Create gateway example (this may take a few seconds)
instead of an error.
Similar behaviour is expected when creating an application or a mirror because the corresponding services also reference the KubernetesManagerClient.deploy
function.
Range queries upper bound should be exclusive.
Sometimes it is desirable to use TestPyPI as the instance of the Python Package Index, for example when a specific new functionality of Quick demands a more recent version of the CLI that has not been deployed yet.
The possibility of choosing the package index and version while building a Docker image would be nice to have.
The index and the version of CLI could be passed to the docker build command as arguments. For example:
docker build --build-arg index=test --build-arg version=0.7.0.dev6 .
When this is done, the documentation can be adjusted accordingly.
The mirror should implement a custom processor and create the flattened-key (zero-padded) index structure.
Regarding streams app deployments, the manager is limited to open (public) container repositories. In order to allow users to deploy an app from a private repo, a possibility to add imagePullSecrets through quick and quick-cli should be added.
Example:
quick app deploy tiny-url-counter \
--registry bakdata \
--image quick-examples-tinyurl \
--tag 0.0.1 \
--imagePullSecrets=secret \
--args input-topics=track-fetch output-topic=count-fetch productive=false
We assume that a user has (uses) only one secret. Checking multiple secrets won’t be supported for now.
Below, some examples which show invalid behaviour or missing checks.
type Product {
id: ID!
name: String!
}
type Query {
getProduct(id: ID): Product @topic(name: "product-topic")
}
The query is missing the keyArgument, but since the mutation rule passes, no error is thrown.
Another example:
type Query {
getProduct(productId: ID): ProductInfo
}
type ProductInfo {
product: Product @topic(name: "product-topic")
url: String @topic(name: "url-topic", keyArgument: "productId")
}
type Mutation {
setClick(clickCount: Long): Long @topic(name: "click-topic")
}
The mutation should contain two inputs, but this is not validated.
To develop and debug Quick locally, we can use a tool like Telepresence.
We decided to drop the possibility of not creating a mirror index. A mirror index is always created by default. Thus, there is no need for the --point or --no-point flags. Since the manager has already been adjusted to these flags, it must be adjusted again.
We should provide documentation and a brief example of how the user can define and run range queries.
Development: 0.8
Last updated: 06.10.2022
This issue describes our approach for the support of Range queries in Quick.
Goal: the user defines range mirrors that index the data for range queries
During topic creation, the user can pass a --range-field <field> option. This option deploys a mirror with an extra state store containing the range query index.
Example:
quick topic create user-request-range --key integer --value schema --schema gateway.UserRequests --range-field timestamp
This command sends a request to the manager, and the manager prepares the deployment of a mirror called user-request-range
. This mirror creates two indexes:
- a range index over the key (userId) and the timestamp
- a point index over the key (userId)
Goal: the user defines the range (from and to) in the GraphQL Query type
The user needs to define the range query and arguments in the GraphQL schema. The GraphQL schema should contain the necessary information for the range data fetcher. For simplicity, we decided to extend the @topic directive. The @topic directive gets two new arguments, rangeFrom and rangeTo. These two arguments define the range for a specific field.
Example:
type Query {
userRequests(
userId: Int
timestampFrom: Int
timestampTo: Int
): [UserRequests] @topic(name: "user-request-range",
keyArgument: "userId",
rangeFrom: "timestampFrom",
rangeTo: "timestampTo")
}
type UserRequests {
userId: Int
serviceId: Int
timestamp: Int
requests: Int
success: Int
}
Goal: the gateway extracts the range information and prepares the request to the mirror
Given the example below:
# query from 1 to 2
{
userRequests(userId: 1, timestampFrom: 1, timestampTo: 2) {
requests
}
}
The range data fetcher gets the necessary information and prepares a range call to the mirror's range endpoint: GET /user-request-mirror/mirror/range/1?from=1&to=2. It is important to note that the range query's upper bound is exclusive, i.e., it is not included in the range. For this specific example, only the value at timestamp 1 is included in the returned result.
Goal: the mirror builds a range index in a separate state store
The mirror needs a new processor to prepare a range index in a separate state store for range queries. Consider the following example. The topic contains the following information:
| key (userId) | value |
|---|---|
| 1 | {timestamp: 1, serviceId: 2, requests: 10, success: 8} |
| 1 | {timestamp: 2, serviceId: 3, requests: 5, success: 3} |
| 2 | {timestamp: 1, serviceId: 4, requests: 7, success: 2} |
The range mirror will materialize the topic in RocksDB in two ways.

As a range index with the flattened key:

| key | value |
|---|---|
| 1_00000000001 | {timestamp: 1, serviceId: 2, requests: 10, success: 8} |
| 1_00000000002 | {timestamp: 2, serviceId: 3, requests: 5, success: 3} |
| 2_00000000001 | {timestamp: 1, serviceId: 4, requests: 7, success: 2} |

As a point index with the latest value per key:

| key | value |
|---|---|
| 1 | {timestamp: 2, serviceId: 3, requests: 5, success: 3} |
| 2 | {timestamp: 1, serviceId: 4, requests: 7, success: 2} |
Goal: a flattened string key in the mirror
The mirror implements the processor API to create an index that supports range queries. This index is a flattened string combining the topic key and the value over which range queries are requested. The index needs to pad the values with zeros (10 digits for Int, 19 digits for Long) to keep the lexicographic order. The generic format of the key in the state store is <topicKeyValue>_<zero_paddings><rangeFieldValue>. In our example, if we have a topic with userId as its key and want to create a range over the timestamp, the key in the state store would look like this:
1_00000000001
The flattened-key approach creates a unique key for each user and timestamp. Therefore, all values are accessible when running a range query.
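Building the flattened key can be sketched with a format string; the 10-digit padding below follows the rule for Int stated above, and the helper class is hypothetical.

```java
// Sketch of building the flattened range-index key described above.
// Int range values are padded to 10 digits so that string comparison
// preserves numeric order (use 19 digits for Long fields).
public class KeyFlattener {

    static String flatten(Object topicKey, int rangeValue) {
        // %010d left-pads the range field with zeros
        return String.format("%s_%010d", topicKey, rangeValue);
    }

    public static void main(String[] args) {
        System.out.println(flatten(1, 1));
        // Lexicographic order now matches numeric order, e.g.
        // flatten(1, 2) sorts before flatten(1, 10).
    }
}
```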
Goal: a service that calls the Interactive Query API to fetch the data from the range state store
When the mirror receives the request GET /user-request-mirror/mirror/range/<key>?from=<rangeFrom>&to=<rangeTo> (e.g., GET /user-request-mirror/mirror/range/1?from=1&to=2), it creates the range-from argument (in the above example, this would be 00000000001_00000000001) and the range-to argument (in the example, this value would be 00000000001_00000000002), passes these values to the range method of the Interactive Queries, puts the returned values in a list, and returns them to the requesting gateway.
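The lookup itself can be illustrated without Kafka Streams: below, a TreeMap stands in for the RocksDB-backed state store, since both iterate keys in lexicographic order. The class is a hypothetical sketch; it uses an inclusive lower bound and an exclusive upper bound, matching the range semantics described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of the mirror's range lookup. A TreeMap stands in for the
// RocksDB-backed state store; like the store's range scan, it iterates
// keys in lexicographic order. The upper bound is excluded.
public class RangeLookup {

    static List<String> range(TreeMap<String, String> store, String key, int from, int to) {
        String fromKey = String.format("%s_%010d", key, from);
        String toKey = String.format("%s_%010d", key, to);
        // inclusive lower bound, exclusive upper bound
        return new ArrayList<>(store.subMap(fromKey, true, toKey, false).values());
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("1_0000000001", "{timestamp: 1, requests: 10}");
        store.put("1_0000000002", "{timestamp: 2, requests: 5}");
        store.put("2_0000000001", "{timestamp: 1, requests: 7}");
        // A range query for key 1 with from=1&to=2 returns only the timestamp-1 record.
        System.out.println(range(store, "1", 1, 2));
    }
}
```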
Kafka 3.0 has been out for a while. We should check whether there are any problems when running on Kafka 3.0.