
numaproj / numaflow


Kubernetes-native platform to run massively parallel data/streaming jobs

Home Page: https://numaflow.numaproj.io/

License: Apache License 2.0

Dockerfile 0.12% Makefile 0.40% Go 70.23% Awk 0.01% Shell 0.65% Smarty 0.07% Lua 0.02% HTML 0.04% CSS 0.41% TypeScript 15.69% JavaScript 0.01% Rust 12.31% Python 0.02% Mustache 0.02%
kubernetes stream-processing data-processing k8s pipeline map-reduce hacktoberfest

numaflow's Introduction

Numaflow

Go Report Card slack GoDoc License Release Version CII Best Practices

Welcome to Numaflow! A Kubernetes-native, serverless platform for running scalable and reliable event-driven applications. Numaflow decouples event sources and sinks from the processing logic, allowing each component to auto-scale independently based on demand. With out-of-the-box sources and sinks and built-in observability, developers can focus on their processing logic without worrying about event consumption, boilerplate code, or operational complexity. Each step of the pipeline can be written in any programming language, so you can use the best language for each step, or simply the one you are most familiar with.

Numaflow was created by the Intuit Argo team to address community needs for continuous event processing, leveraging their expertise to deliver a scalable, robust, serverless platform for event-driven applications.

Numaflow Pipeline

Use Cases

  • Event driven applications: Process events as they happen, e.g., updating inventory and sending customer notifications in e-commerce.
  • Real time analytics: Analyze data instantly, e.g., social media analytics, observability data processing.
  • Inference on streaming data: Perform real-time predictions, e.g., anomaly detection.
  • Workflows running in a streaming manner.

Key Features

  • Kubernetes-native: If you know Kubernetes, you already know how to use Numaflow.
  • Serverless: Focus on your code and let the system scale up and down based on demand.
  • Language agnostic: Use your favorite programming language.
  • Exactly-Once semantics: No input element is duplicated or lost even as pods are rescheduled or restarted.
  • Auto-scaling with back-pressure: Each vertex automatically scales from zero to whatever is needed.

Data Integrity Guarantees

  • Minimally provide at-least-once semantics
  • Provide exactly-once semantics for unbounded and near real-time data sources
  • Preserving order is not required

Roadmap

  • Mono Vertex to bypass ISB for simple use cases (1.4)

Demo

Numaflow Demo


numaflow's People

Contributors

ashwinidulams, ayildirim21, bbehnke, bulkbeing, chandankumar4, darshansimha, dependabot[bot], dpadhiar, dseapy, edlee2121, github-actions[bot], inishchith, jasonzeshengchen, juliev0, jy4096, keranyang, kohlisid, krithika3, mshakira, rohanashar, samhith-kakarla, shubhamdixit863, syayi, tasneemkoushar, tczhao, veds-g, vigith, whynowy, xdevxy, yhl25


numaflow's Issues

Refine logs for buffer validation in init containers

The following log is displayed in the init container; it is expected but misleading, so we need to refine it.

{"level":"error","ts":1662516312.3855174,"logger":"numaflow.isbsvc-buffer-validate","caller":"commands/isbsvc_buffer_validate.go:61","msg":"Buffers validation failed, will retry if the limit is not reached","error":"failed to query information of stream \"ss-osam-mlp-fci-pipeline-dev-devx-osammlpfci-usw2-e2e-ss-osam-mlp-fci-pipeline-numaflowosamrouter-fcipluginappintr-inf\", nats: stream not found"}

User Defined Source

Is your feature request related to a problem? Please describe.
Similar to the user-defined sink, users should be able to write their own user-defined sources without coupling to the Numaflow code.


Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

Reduce logging in Redis ISB

Describe the bug
The Redis ISB implementation emits a lot of error logs during unit testing. This makes debugging tests and identifying failures much more difficult.

To Reproduce
Steps to reproduce the behavior:

  1. Run make test and look at the logs


User Defined Transformer and Filter

Summary

The input source can have a filter functionality to filter the incoming data.
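As a sketch of the idea, a user-defined filter could be a plain function over the message payload. The function name and signature below are hypothetical, not Numaflow's actual UDF interface:

```go
package main

import (
	"fmt"
	"strings"
)

// filter is a hypothetical user-defined filter: it keeps payloads
// containing the substring "error" and drops everything else. The
// real Numaflow UDF interface differs; this only illustrates the
// filtering idea described above.
func filter(payload []byte) ([]byte, bool) {
	if strings.Contains(string(payload), "error") {
		return payload, true // keep
	}
	return nil, false // drop
}

func main() {
	for _, m := range [][]byte{[]byte("error: disk full"), []byte("info: all good")} {
		if out, keep := filter(m); keep {
			fmt.Println(string(out)) // prints only "error: disk full"
		}
	}
}
```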

What change needs making?

Use Cases

When would you use this?



Install script

Is your feature request related to a problem? Please describe.
Make it easier to install NumaFlow, particularly as the install steps change between releases.

Describe the solution you'd like
An install script, e.g. install.bash.

Describe alternatives you've considered
A more verbose install guide that users must read.

Additional context
N/A



AWS SQS source and sink

Is your feature request related to a problem? Please describe.
It would be useful to be able to source events from AWS SQS.

Describe the solution you'd like
A source that turns AWS SQS messages into pipeline messages for further processing.

Describe alternatives you've considered
None

Additional context
None



GCP Pub/Sub source and sink

Is your feature request related to a problem? Please describe.
It would be useful for GCP users to be able to source events from the GCP Pub/Sub service.

Describe the solution you'd like
A source that turns GCP Pub/Sub events into messages for further processing.

Describe alternatives you've considered
None

Additional context
None



Investigate if there's any memory leak in autoscaler

Describe the bug

Controller with 1GB memory keeps getting OOM restart.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • Kubernetes: [e.g. v1.18.6]
  • Numaflow: [e.g. v0.5.1]

Additional context
Add any other context about the problem here.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.

Documentation around how the pipeline handles errors (retryable and non-retryable errors)

Summary

What change needs making?

Documentation around how the pipeline handles errors (retryable and non-retryable errors).

Use Cases

When would you use this?

This will help in understanding the platform better and help the developers in handling the retryable errors in a better way without panicking.



Watermark displays incorrectly

Describe the bug

With a simple 3-vertex pipeline using a generator source, start it with only the source vertex at replica=1 and the other vertices at 0 replicas. Only the source vertex is expected to show its timestamp moving; however, as the screenshot below shows, the 2nd vertex displays the same watermark, which is wrong.

Screen Shot 2022-08-08 at 5 58 59 PM




Documentation Feedback Improvements for Numaflow

Here is some feedback we should consider for Numaflow's current documentation.

  • We should follow the basic structure of https://github.com/argoproj/argo-workflows, which provides a high-level description of why the project was built and pointers to the documentation, including the quickstart.
  • We should add a capability matrix.
  • We should have a section that explains when users should use Argo Workflows versus Numaflow.
  • All documented examples need to cover: how to set the pipeline up, how to pump data into it, how to view the results, and how to validate them.
  • More details on how to create a UDF and enable it in a pipeline.
  • There needs to be a link between setting up the pipeline and viewing it in the UI.

feat: HTTP pull source

Is your feature request related to a problem? Please describe.
It would be desirable to have a source that periodically pulls from an HTTP endpoint to generate messages.
This would also be very useful for demos, since we could pull from services everyone can access rather than just synthetically generated data.

Describe the solution you'd like
A source that generates a stream of data by periodically pulling from an HTTP endpoint.

Describe alternatives you've considered
Using an S3 bucket as a source (not much interesting data is available to everyone in S3 buckets).

Additional context
None



Vertex level latency calculation

Summary

Be able to provide vertex-level latency data directly, without going to the metrics dashboard.

Use Cases

It could be displayed in the UI.



Improve auto reconnection mechanism.

Is your feature request related to a problem? Please describe.

The following error message is logged when the pipeline stops working. We need a better auto-reconnection mechanism.

{"level":"warn","ts":1654726946.6985655,"logger":"numaflow.sink-processor","caller":"forward/forward.go:161","msg":"failed to read fromBuffer","vertex":"http-pipeline-output","error":"failed to fetch messages from jet stream subject \"http-pipeline-numaflow-system-http-pipeline-cat-output\", nats: Leadership Change"}

UI is not showing the pipeline if it has autoscaling

Describe the bug
If a vertex scales to 0, the UI fails to show the pipeline; it looks like the metrics API call is failing.





Add Podman and Minikube to development tools

Summary

Docker is one container ecosystem; Podman is another. k3d provides a local Kubernetes environment but is not supported on all operating systems; for those, we can offer Minikube.

Use Cases

Development: more choices of local Kubernetes and container tooling.



Pipeline UI broken when vertex is scaled to 0

Describe the bug
When a vertex is scaled down to 0, the pipeline UI page displays empty content, with the following error seen in the browser console.

Screen Shot 2022-08-08 at 2 47 46 PM




Show human-readable timestamps in Daemon Service log

Summary

Currently our Daemon Service log shows timestamp in epoch format.

{"level":"info","timestamp":1665774352.9100938,"logger":"numaflow","caller":"service/pipeline_watermark_query.go:123","msg":"Head watermark retrieved for vertex cat is -1."}

We should change it to a human-readable timestamp for a better development experience. An ideal log could be as follows:

{"level":"info","timestamp":"Fri, 14 Oct 2022 19:07:16 GMT","logger":"numaflow","caller":"service/pipeline_watermark_query.go:123","msg":"Head watermark retrieved for vertex cat is -1."}

What change needs making?
Change logger configuration.
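For illustration, the conversion from epoch seconds to an RFC 1123 string can be done with the standard library. The actual fix would configure the zap logger's time encoder rather than converting after the fact, and note that Go renders the UTC zone as "UTC" rather than "GMT":

```go
package main

import (
	"fmt"
	"time"
)

// epochToHuman converts an epoch-seconds timestamp, as currently
// logged by the daemon service, into an RFC 1123 string like the
// proposed log format.
func epochToHuman(epoch float64) string {
	sec := int64(epoch)
	nsec := int64((epoch - float64(sec)) * 1e9)
	return time.Unix(sec, nsec).UTC().Format(time.RFC1123)
}

func main() {
	fmt.Println(epochToHuman(1665774352.9100938))
	// → Fri, 14 Oct 2022 19:05:52 UTC
}
```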

Use Cases

For developers to easily read logs with clear timestamps.

When would you use this?

Not urgent.



Replace the JSON format of isb.message with a binary format

Today we have implemented MarshalBinary and UnmarshalBinary that encode and decode using JSON as the intermediary format. We should try binary formats instead.

The code change itself is minimal; we should try encoding/gob first.

  • Run the performance test before the change (measure CPU usage, TPS, etc.)
  • Make the change
  • Compare the performance
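A minimal sketch of swapping the JSON intermediary for encoding/gob inside MarshalBinary/UnmarshalBinary. The `message` struct is a simplified stand-in for the real isb type; the type alias avoids infinite recursion, because gob itself honors encoding.BinaryMarshaler:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// message is a simplified stand-in for the real isb message struct.
type message struct {
	ID      string
	Payload []byte
}

// wire has the same fields as message but no methods, so gob encodes
// its fields directly instead of calling MarshalBinary recursively.
type wire message

func (m *message) MarshalBinary() ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode((*wire)(m)); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func (m *message) UnmarshalBinary(data []byte) error {
	return gob.NewDecoder(bytes.NewReader(data)).Decode((*wire)(m))
}

func main() {
	in := &message{ID: "m1", Payload: []byte("hello")}
	b, err := in.MarshalBinary()
	if err != nil {
		panic(err)
	}
	out := &message{}
	if err := out.UnmarshalBinary(b); err != nil {
		panic(err)
	}
	fmt.Println(out.ID, string(out.Payload)) // → m1 hello
}
```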

Kafka e2e test broken

The Kafka e2e test is broken with the following error.

2022/05/21 00:54:13 create Kafka topic "e2e-topic-jtl8v"
2022/05/21 00:54:13 GET http://127.0.0.1:8378/kafka/create-topic?topic=e2e-topic-jtl8v
2022/05/21 00:54:13 > 500 Internal Server Error
2022/05/21 00:54:13 > kafka: controller is not available
suite.go:63: test panicked: 500 Internal Server Error
goroutine 102 [running]:
runtime/debug.Stack()
/usr/local/Cellar/go/1.18.1/libexec/src/runtime/debug/stack.go:24 +0x65
github.com/stretchr/testify/suite.failOnPanic(0xc000682340)
/Users/jwang21/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:63 +0x3e
panic({0x2189a20, 0xc000614a80})
/usr/local/Cellar/go/1.18.1/libexec/src/runtime/panic.go:838 +0x207
github.com/numaproj/numaflow/test/fixtures.InvokeE2EAPI({0x23acbc9?, 0xf?}, {0xc0006ad508?, 0x1?, 0x1?})
/Users/jwang21/workspace/numaproj/numaflow/test/fixtures/e2eapi.go:31 +0x37e
github.com/numaproj/numaflow/test/fixtures.CreateKafkaTopic()
/Users/jwang21/workspace/numaproj/numaflow/test/fixtures/kafka.go:15 +0xe9
github.com/numaproj/numaflow/test/kafka-e2e.(*KafkaSuite).TestKafkaSourceSink(0xc0002569a0)
/Users/jwang21/workspace/numaproj/numaflow/test/kafka-e2e/kafka_test.go:77 +0x4a
reflect.Value.call({0xc0003fb800?, 0xc000114050?, 0x13?}, {0x23953b6, 0x4}, {0xc0000f9e70, 0x1, 0x1?})
/usr/local/Cellar/go/1.18.1/libexec/src/reflect/value.go:556 +0x845
reflect.Value.Call({0xc0003fb800?, 0xc000114050?, 0xc0002569a0?}, {0xc000082e70, 0x1, 0x1})
/usr/local/Cellar/go/1.18.1/libexec/src/reflect/value.go:339 +0xbf
github.com/stretchr/testify/suite.Run.func1(0xc000682340)
/Users/jwang21/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:158 +0x4b6
testing.tRunner(0xc000682340, 0xc000630480)
/usr/local/Cellar/go/1.18.1/libexec/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
/usr/local/Cellar/go/1.18.1/libexec/src/testing/testing.go:1486 +0x35f

Support gRPC based udf

Currently, we are using HTTP to communicate between the UDF container and the main container.
In order to support data processing paradigms, we need a schematized protocol between the UDF container and the main container, and gRPC seems to be a good fit.

Improve code comments

Is your feature request related to a problem? Please describe.
Make it easier for new developers to read and understand the code.

Describe the solution you'd like
Add a block comment to each file and a comment for each method, type definition etc. describing the intended purpose for the code.

Describe alternatives you've considered
None

Additional context
None



Smarter autoscaling

Summary

With the current autoscaling strategy, when a high-throughput pipeline (e.g. one with a large backlog to process) hits a processing-rate bottleneck (due to the ISB or anything else), the replica count still goes up until it reaches scale.max or some kind of balance, which is not expected.

We need to make the autoscaler smarter: for example, if a vertex with 5 replicas performs about the same as with 6, we should only run 5.



Flaky e2e test

kubectl kustomize test/manifests | sed '[email protected]/numaproj/@k3d-e2e-registry:5111/@' | sed 's/:latest/:de4bd57104487580b386a23461b2f2711db5c62b/' | kubectl -n numaflow-system apply -l app.kubernetes.io/part-of=numaflow --prune=false --force -f -
customresourcedefinition.apiextensions.k8s.io/interstepbufferservices.numaflow.numaproj.io created
customresourcedefinition.apiextensions.k8s.io/pipelines.numaflow.numaproj.io created
customresourcedefinition.apiextensions.k8s.io/vertices.numaflow.numaproj.io created
serviceaccount/numaflow-sa created
clusterrole.rbac.authorization.k8s.io/numaflow-aggregate-to-admin created
clusterrole.rbac.authorization.k8s.io/numaflow-aggregate-to-edit created
clusterrole.rbac.authorization.k8s.io/numaflow-aggregate-to-view created
clusterrole.rbac.authorization.k8s.io/numaflow-role created
clusterrolebinding.rbac.authorization.k8s.io/numaflow-binding created
configmap/numaflow-controller-config created
deployment.apps/controller-manager created
kubectl -n numaflow-system wait --for=condition=Ready --timeout 60s pod --all
error: no matching resources found
make: *** [Makefile:168: start] Error 1

Flaky test case

{"level":"error","ts":1654024581.5282629,"logger":"numaflow","caller":"forward/forward.go:312","msg":"Retrying failed msgs","errors":{"(writer) Buffer full! isb.BufferWriteErr{Name:"writer", Full:true, InternalErr:false, Message:"Buffer full!"}":1},"stacktrace":"github.com/numaproj/numaflow/pkg/isb/forward.(*InterStepDataForward).writeToBuffer\n\t/home/runner/work/numaflow/numaflow/pkg/isb/forward/forward.go:312\ngithub.com/numaproj/numaflow/pkg/isb/forward.(*InterStepDataForward).writeToBuffers\n\t/home/runner/work/numaflow/numaflow/pkg/isb/forward/forward.go:270\ngithub.com/numaproj/numaflow/pkg/isb/forward.(*InterStepDataForward).forwardAChunk\n\t/home/runner/work/numaflow/numaflow/pkg/isb/forward/forward.go:219\ngithub.com/numaproj/numaflow/pkg/isb/forward.(*InterStepDataForward).Start.func1\n\t/home/runner/work/numaflow/numaflow/pkg/isb/forward/forward.go:109"}
{"level":"info","ts":1654024581.528482,"logger":"numaflow","caller":"generator/tickgen.go:244","msg":"Context.Done is called. returning from the inner function"}
tickgen_test.go:75: msgs read [0]
tickgen_test.go:76:
Error Trace: tickgen_test.go:76
Error: "0" is not greater than "0"
Test: TestStop
--- FAIL: TestStop (0.14s)

To support namespaced installation

Summary

The Numaflow controller/server should be installable at namespace scope.



Add RED/Saturation metrics

Is your feature request related to a problem? Please describe.
I need to see RED metrics for my pipeline, such as inbound messages, outbound messages, and error counts, for both the main container and the UDF sidecar.

Describe the solution you'd like
Add the corresponding metrics to Prometheus.
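As a stdlib-only sketch of the counters involved, using expvar so it runs self-contained; the actual change would register equivalents with the Prometheus Go client, and the metric names here are hypothetical:

```go
package main

import (
	"expvar"
	"fmt"
)

// RED-style counters per container; in the real fix these would be
// Prometheus counters with vertex/container labels.
var (
	inboundMessages  = expvar.NewInt("udf_inbound_messages_total")
	outboundMessages = expvar.NewInt("udf_outbound_messages_total")
	processingErrors = expvar.NewInt("udf_errors_total")
)

// process simulates handling one message, updating the counters:
// every message is counted inbound; empty payloads count as errors,
// everything else is forwarded and counted outbound.
func process(msg string) {
	inboundMessages.Add(1)
	if msg == "" {
		processingErrors.Add(1)
		return
	}
	outboundMessages.Add(1)
}

func main() {
	for _, m := range []string{"a", "", "b"} {
		process(m)
	}
	fmt.Println(inboundMessages.Value(), outboundMessages.Value(), processingErrors.Value())
	// → 3 2 1
}
```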




Too many K/V consumers overwhelms JetStream cluster

Describe the bug
With watermarks enabled, when running too many pods for a vertex, too many consumers are created for a K/V bucket, which crashes the JetStream cluster. In that scenario, the vertex pods cannot connect to JetStream anymore, and the JetStream pod repeatedly logs [249] 2022/10/14 18:33:04.612790 [WRN] Consumer assignment not cleaned up, retrying.

Here is an example: the vertex is scaled up to 40 pods, and the number of consumers seen in the JetStream console reaches more than 15K.

Screen Shot 2022-10-14 at 11 29 22 AM




Move watermark to millisecond granularity

Currently, the watermark progresses at second granularity, which is too slow for near-real-time systems. Make sure that progression at the millisecond level will not affect the overall load on the watermark store.

Tasks

  • write a proposal on what it takes to make watermark progress at the millisecond level
  • implement millisecond progression

Flaky E2E tests caused by port-forward and logs streaming

Using port-forward to verify testing results from pod logs is widely used in E2E tests, but it has become unreliable recently: sometimes it fails with a message like the one below, which means streaming the logs failed; usually a retry gets through.

Watching POD: simple-pipeline-watermark-cat3-0-np6xu
error: Get "https://172.18.0.3:10250/containerLogs/numaflow-system/simple-pipeline-watermark-cat3-0-np6xu/main?follow=true": EOF    functional_test.go:180: Expected vertex pod log contains "Start processing udf messages"

This issue could be caused by:

  1. The runner instances assigned by GitHub Actions have limited resources; this issue started happening after we added more test cases.
  2. Unknown bugs in our code: log streaming or port-forward not released?
  3. The k3s version matters (v1.21 worked well; versions after v1.21 became flaky).

In the long term, we need to figure out a new approach to running E2E tests.

Pipeline level latency

Summary

Be able to provide the latency of a pipeline (from source to sinks).

Use Cases



Unexpected scaling down to 0

When a vertex has read and processed all its messages and keeps retrying writes to the next ISB (due to back pressure), the pending-message calculation counts only the number of pending messages, which is 0. This leads to a scale-down to 0 and a pod termination (even though the code tries to shut down cleanly after receiving SIGTERM, terminationGracePeriodSeconds eventually forces a shutdown). The result is that the buffer information in the UI shows non-zero ack-pending messages while the pod count is 0.

JS error when click a vertex with zero pod

Describe the bug
After scaling the vertex down to 0, clicking it on the UI shows the error in the screenshot below.

Screen Shot 2022-08-08 at 6 13 07 PM




Pipeline stops due to error "nats: no responders available for request"

Seeing the following error, which caused the pipeline to stop.

2022-06-23T23:51:50.431Z ERROR numaflow.udf-processor jetstream/reader.go:134 Failed to ack message {"vertex": "kafka-rw-pipeline-p1", "bufferReader": "xxxxx", "stream": "kafka-rw-pipeline-oss-analytics-sampledataflow-usw2-prd-kafka-rw-pipeline-input-p1", "subject": "kafka-rw-pipeline-oss-analytics-sampledataflow-usw2-prd-kafka-rw-pipeline-input-p1", "error": "nats: no responders available for request"}



Versioning udf/udsink request/response

Summary

The contract between the main container and the UDF/udsink containers needs to be versioned, so that any future change does not break it.
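A minimal sketch of what a versioned envelope could look like; the field names and version scheme are hypothetical, not Numaflow's actual wire format. The version field lets either side detect a mismatch before interpreting the payload:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Request is a hypothetical versioned envelope for the
// main-container <-> UDF/udsink contract.
type Request struct {
	Version string          `json:"version"`
	Payload json.RawMessage `json:"payload"`
}

const supportedVersion = "v1"

// handle rejects requests whose contract version does not match,
// instead of silently misinterpreting the payload.
func handle(raw []byte) error {
	var req Request
	if err := json.Unmarshal(raw, &req); err != nil {
		return err
	}
	if req.Version != supportedVersion {
		return fmt.Errorf("unsupported contract version %q, want %q", req.Version, supportedVersion)
	}
	return nil
}

func main() {
	fmt.Println(handle([]byte(`{"version":"v1","payload":{}}`))) // → <nil>
	fmt.Println(handle([]byte(`{"version":"v2","payload":{}}`))) // version mismatch error
}
```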



Flaky test

2022-05-24T18:29:10.0207030Z === RUN   TestJetStreamBufferWriterBufferFull
2022-05-24T18:29:10.0208160Z     writer_test.go:167: 
2022-05-24T18:29:10.0208496Z            Error Trace:    writer_test.go:167
2022-05-24T18:29:10.0209167Z            Error:          Expected nil, but got: isb.BufferWriteErr{Name:"testJetStreamBufferWrite", Full:true, InternalErr:false, Message:"Buffer full!"}
2022-05-24T18:29:10.0209684Z            Test:           TestJetStreamBufferWriterBufferFull
2022-05-24T18:29:10.0209994Z     writer_test.go:167: 
2022-05-24T18:29:10.0210640Z            Error Trace:    writer_test.go:167
2022-05-24T18:29:10.0211344Z            Error:          Expected nil, but got: isb.BufferWriteErr{Name:"testJetStreamBufferWrite", Full:true, InternalErr:false, Message:"Buffer full!"}
2022-05-24T18:29:10.0211863Z            Test:           TestJetStreamBufferWriterBufferFull
2022-05-24T18:29:10.0227631Z     writer_test.go:174: 
2022-05-24T18:29:10.0228178Z            Error Trace:    writer_test.go:174
2022-05-24T18:29:10.0228590Z            Error:          An error is expected but got nil.
2022-05-24T18:29:10.0229042Z            Test:           TestJetStreamBufferWriterBufferFull
2022-05-24T18:29:10.0229716Z --- FAIL: TestJetStreamBufferWriterBufferFull (2.03s)
2022-05-24T18:29:10.0230319Z panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2022-05-24T18:29:10.0230801Z    panic: runtime error: invalid memory address or nil pointer dereference
2022-05-24T18:29:10.0231363Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1312bd6]
2022-05-24T18:29:10.0231765Z 
2022-05-24T18:29:10.0231887Z goroutine 198 [running]:
2022-05-24T18:29:10.0232206Z testing.tRunner.func1.2({0x13ca380, 0x1dfc2f0})
2022-05-24T18:29:10.0232915Z    /opt/hostedtoolcache/go/1.17.10/x64/src/testing/testing.go:1209 +0x36c
2022-05-24T18:29:10.0233242Z testing.tRunner.func1()
2022-05-24T18:29:10.0233796Z    /opt/hostedtoolcache/go/1.17.10/x64/src/testing/testing.go:1212 +0x3b6
2022-05-24T18:29:10.0234354Z panic({0x13ca380, 0x1dfc2f0})
2022-05-24T18:29:10.0235031Z    /opt/hostedtoolcache/go/1.17.10/x64/src/runtime/panic.go:1047 +0x266
2022-05-24T18:29:10.0235506Z github.com/numaproj/numaflow/pkg/isb/jetstream.TestJetStreamBufferWriterBufferFull(0x0)
2022-05-24T18:29:10.0236269Z    /home/runner/work/numaflow/numaflow/pkg/isb/jetstream/writer_test.go:175 +0x776
2022-05-24T18:29:10.0236627Z testing.tRunner(0xc0000fed00, 0x15417b8)
2022-05-24T18:29:10.0236996Z    /opt/hostedtoolcache/go/1.17.10/x64/src/testing/testing.go:1259 +0x230
2022-05-24T18:29:10.0237304Z created by testing.(*T).Run
2022-05-24T18:29:10.0237661Z    /opt/hostedtoolcache/go/1.17.10/x64/src/testing/testing.go:1306 +0x727
2022-05-24T18:29:10.0238011Z FAIL       github.com/numaproj/numaflow/pkg/isb/jetstream  3.261s

Implement shuffle

We need to pigeonhole keyed and non-keyed partitions to a fixed number of ISBs. This should be a consistent operation (a partition should always be mapped to the same ISB).

  • Write a whitepaper on
    • the requirements of shuffle
    • implementation details
  • Implementation
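A minimal sketch of the consistency requirement, assuming a stable hash plus modulo assignment. A production shuffle may prefer consistent hashing so that changing the ISB count does not remap most keys:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// assignISB deterministically maps a partition key to one of n ISBs.
// FNV-1a is stable across processes and restarts, so the same key
// always lands on the same ISB, which is the consistency property
// the shuffle requires.
func assignISB(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % n
}

func main() {
	for _, k := range []string{"user-1", "user-2", "user-3"} {
		fmt.Printf("%s -> ISB %d\n", k, assignISB(k, 4))
	}
	// Repeated calls always agree:
	fmt.Println(assignISB("user-1", 4) == assignISB("user-1", 4)) // → true
}
```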
