
numaproj / numaflow


Kubernetes-native platform to run massively parallel data/streaming jobs

Home Page: https://numaflow.numaproj.io/

License: Apache License 2.0

Dockerfile 0.12% Makefile 0.40% Go 70.23% Awk 0.01% Shell 0.65% Smarty 0.07% Lua 0.02% HTML 0.04% CSS 0.41% TypeScript 15.69% JavaScript 0.01% Rust 12.31% Python 0.02% Mustache 0.02%
kubernetes stream-processing data-processing k8s pipeline map-reduce hacktoberfest

numaflow's Introduction

Numaflow

Go Report Card slack GoDoc License Release Version CII Best Practices

Welcome to Numaflow! A Kubernetes-native, serverless platform for running scalable and reliable event-driven applications. Numaflow decouples event sources and sinks from the processing logic, allowing each component to auto-scale independently based on demand. With out-of-the-box sources and sinks and built-in observability, developers can focus on their processing logic without worrying about event consumption, boilerplate code, or operational complexity. Each step of the pipeline can be written in any programming language, so you can use the best language for each step, or simply the one you are most familiar with.

Numaflow was created by the Intuit Argo team to address community needs for continuous event processing, leveraging their expertise to deliver a scalable, robust, serverless platform for event-driven applications.

Numaflow Pipeline

Use Cases

  • Event driven applications: Process events as they happen, e.g., updating inventory and sending customer notifications in e-commerce.
  • Real time analytics: Analyze data instantly, e.g., social media analytics, observability data processing.
  • Inference on streaming data: Perform real-time predictions, e.g., anomaly detection.
  • Workflows running in a streaming manner.

Key Features

  • Kubernetes-native: If you know Kubernetes, you already know how to use Numaflow.
  • Serverless: Focus on your code and let the system scale up and down based on demand.
  • Language agnostic: Use your favorite programming language.
  • Exactly-Once semantics: No input element is duplicated or lost even as pods are rescheduled or restarted.
  • Auto-scaling with back-pressure: Each vertex automatically scales from zero to whatever is needed.

Data Integrity Guarantees

  • Minimally provide at-least-once semantics
  • Provide exactly-once semantics for unbounded and near real-time data sources
  • Preserving order is not required

Roadmap

  • Mono Vertex to bypass ISB for simple use cases (1.4)

Demo

Numaflow Demo


numaflow's People

Contributors

ashwinidulams, ayildirim21, bbehnke, bulkbeing, chandankumar4, darshansimha, dependabot[bot], dpadhiar, dseapy, edlee2121, github-actions[bot], inishchith, jasonzeshengchen, juliev0, jy4096, keranyang, kohlisid, krithika3, mshakira, rohanashar, samhith-kakarla, shubhamdixit863, syayi, tasneemkoushar, tczhao, veds-g, vigith, whynowy, xdevxy, yhl25


numaflow's Issues

Refine logs for buffer validation in init containers

The following log is displayed in the init container; it is expected but misleading, so we need to refine it.

{"level":"error","ts":1662516312.3855174,"logger":"numaflow.isbsvc-buffer-validate","caller":"commands/isbsvc_buffer_validate.go:61","msg":"Buffers validation failed, will retry if the limit is not reached","error":"failed to query information of stream \"ss-osam-mlp-fci-pipeline-dev-devx-osammlpfci-usw2-e2e-ss-osam-mlp-fci-pipeline-numaflowosamrouter-fcipluginappintr-inf\", nats: stream not found"}

User Defined Source

Is your feature request related to a problem? Please describe.
Similar to the user-defined sink, users should be able to write their own user-defined sources without coupling to the Numaflow code.


Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

Reduce logging in Redis ISB

Describe the bug
The Redis ISB implementation emits a lot of error logs during unit testing. This makes debugging tests and identifying failures much more difficult.

To Reproduce
Steps to reproduce the behavior:

  1. Run make test and look at the logs


User Defined Transformer and Filter

Summary

The input source can have a filter functionality to filter the incoming data.
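As a sketch of the idea, a user-defined filter could be a plain function over the message payload. The function name and signature below are hypothetical, not Numaflow's actual UDF interface:

```go
package main

import (
	"fmt"
	"strings"
)

// filter is a hypothetical user-defined filter: it keeps payloads
// containing the substring "error" and drops everything else. The
// real Numaflow UDF interface differs; this only illustrates the
// filtering idea described above.
func filter(payload []byte) ([]byte, bool) {
	if strings.Contains(string(payload), "error") {
		return payload, true // keep
	}
	return nil, false // drop
}

func main() {
	for _, m := range [][]byte{[]byte("error: disk full"), []byte("info: all good")} {
		if out, keep := filter(m); keep {
			fmt.Println(string(out)) // prints only "error: disk full"
		}
	}
}
```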

What change needs making?

Use Cases

When would you use this?



Install script

Is your feature request related to a problem? Please describe.
Make it easier to install NumaFlow, particularly as the install steps change between releases.

Describe the solution you'd like
An install script, e.g. install.bash.

Describe alternatives you've considered
A more verbose install guide that users must read.

Additional context
N/A



AWS SQS source and sink

Is your feature request related to a problem? Please describe.
It would be useful to be able to source events from AWS SQS.

Describe the solution you'd like
A source that turns AWS SQS messages into pipeline messages for further processing.

Describe alternatives you've considered
None

Additional context
None



GCP Pub/Sub source and sink

Is your feature request related to a problem? Please describe.
It would be useful for GCP users to be able to source events from the GCP Pub/Sub service.

Describe the solution you'd like
A source that turns GCP Pub/Sub events into messages for further processing.

Describe alternatives you've considered
None

Additional context
None



Investigate if there's any memory leak in autoscaler

Describe the bug

Controller with 1GB memory keeps getting OOM restart.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • Kubernetes: [e.g. v1.18.6]
  • Numaflow: [e.g. v0.5.1]

Additional context
Add any other context about the problem here.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.

Documentation around how the pipeline handles errors (retryable and non-retryable errors)

Summary

What change needs making?

Documentation around how the pipeline handles errors (retryable and non-retryable errors).

Use Cases

When would you use this?

This will help in understanding the platform better and help the developers in handling the retryable errors in a better way without panicking.



Watermark displays incorrectly

Describe the bug

With a simple 3-vertex pipeline using a generator source, start it with only the source vertex at replica=1 and the other vertices at 0 replicas. Only the source vertex is expected to show its timestamp moving; however, as the screenshot below shows, the 2nd vertex displays the same watermark, which is wrong.

Screen Shot 2022-08-08 at 5 58 59 PM




Documentation Feedback Improvements for Numaflow

Here is some feedback we should consider for Numaflow's current documentation.

  • We should follow the basic structure of https://github.com/argoproj/argo-workflows, which provides a high-level description of why the project was built and pointers to the documentation, including the quickstart.
  • We should add a capability matrix.
  • We should have a section that explains when users should use Argo Workflows versus Numaflow.
  • All documented examples need to cover: how to set the pipeline up, how to pump data into it, how to view the results, and how to validate them.
  • More details on how to create a UDF and enable it in a pipeline.
  • There needs to be a link between setting up the pipeline and viewing it in the UI.

feat: HTTP pull source

Is your feature request related to a problem? Please describe.
It would be desirable to have a source that periodically pulls from an HTTP endpoint to generate messages.
This would also be very useful for demos, since we could pull from services everyone can access rather than just synthetically generated data.

Describe the solution you'd like
A source that generates a stream of data by periodically pulling from an HTTP endpoint.

Describe alternatives you've considered
Using an S3 bucket as a source (not much interesting data is available to everyone in S3 buckets).

Additional context
None



Vertex level latency calculation

Summary

Be able to provide vertex-level latency data directly, without going to the metrics dashboard.

Use Cases

It could be displayed in the UI.



Improve auto reconnection mechanism.

Is your feature request related to a problem? Please describe.

The following error message is logged when the pipeline stops working. We need a better auto-reconnection mechanism.

{"level":"warn","ts":1654726946.6985655,"logger":"numaflow.sink-processor","caller":"forward/forward.go:161","msg":"failed to read fromBuffer","vertex":"http-pipeline-output","error":"failed to fetch messages from jet stream subject \"http-pipeline-numaflow-system-http-pipeline-cat-output\", nats: Leadership Change"}

UI is not showing the pipeline if it has autoscaling

Describe the bug
If a vertex scales to 0, the UI fails to show the pipeline; it looks like the metrics API call is failing.





Add Podman and Minikube to development tools

Summary

Docker is one container ecosystem; Podman is another. k3d provides a local Kubernetes environment but is not supported on all operating systems; for those, we can offer Minikube.

Use Cases

Development: more choices of local Kubernetes and container tooling.



Pipeline UI broken when vertex is scaled to 0

Describe the bug
When a vertex is scaled down to 0, the pipeline UI page displays empty content, with the following error seen in the browser console.

Screen Shot 2022-08-08 at 2 47 46 PM




Show human-readable timestamps in Daemon Service log

Summary

Currently our Daemon Service log shows timestamp in epoch format.

{"level":"info","timestamp":1665774352.9100938,"logger":"numaflow","caller":"service/pipeline_watermark_query.go:123","msg":"Head watermark retrieved for vertex cat is -1."}

We should change it to a human-readable timestamp for a better development experience. An ideal log could be as follows:

{"level":"info","timestamp":"Fri, 14 Oct 2022 19:07:16 GMT","logger":"numaflow","caller":"service/pipeline_watermark_query.go:123","msg":"Head watermark retrieved for vertex cat is -1."}

What change needs making?
Change logger configuration.
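For illustration, the conversion from epoch seconds to an RFC 1123 string can be done with the standard library. The actual fix would configure the zap logger's time encoder rather than converting after the fact, and note that Go renders the UTC zone as "UTC" rather than "GMT":

```go
package main

import (
	"fmt"
	"time"
)

// epochToHuman converts an epoch-seconds timestamp, as currently
// logged by the daemon service, into an RFC 1123 string like the
// proposed log format.
func epochToHuman(epoch float64) string {
	sec := int64(epoch)
	nsec := int64((epoch - float64(sec)) * 1e9)
	return time.Unix(sec, nsec).UTC().Format(time.RFC1123)
}

func main() {
	fmt.Println(epochToHuman(1665774352.9100938))
	// → Fri, 14 Oct 2022 19:05:52 UTC
}
```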

Use Cases

For developers to easily read logs with clear timestamps.

When would you use this?

Not urgent.



Replace the JSON format of isb.message with a binary format

Today we have implemented MarshalBinary and UnmarshalBinary that encode and decode using JSON as the intermediary format. We should try binary formats instead.

The code change itself is minimal; we should try encoding/gob first.

  • Run the performance test before the change (measure CPU usage, TPS, etc.)
  • Make the change
  • Compare the performance
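A minimal sketch of swapping the JSON intermediary for encoding/gob inside MarshalBinary/UnmarshalBinary. The `message` struct is a simplified stand-in for the real isb type; the type alias avoids infinite recursion, because gob itself honors encoding.BinaryMarshaler:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// message is a simplified stand-in for the real isb message struct.
type message struct {
	ID      string
	Payload []byte
}

// wire has the same fields as message but no methods, so gob encodes
// its fields directly instead of calling MarshalBinary recursively.
type wire message

func (m *message) MarshalBinary() ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode((*wire)(m)); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func (m *message) UnmarshalBinary(data []byte) error {
	return gob.NewDecoder(bytes.NewReader(data)).Decode((*wire)(m))
}

func main() {
	in := &message{ID: "m1", Payload: []byte("hello")}
	b, err := in.MarshalBinary()
	if err != nil {
		panic(err)
	}
	out := &message{}
	if err := out.UnmarshalBinary(b); err != nil {
		panic(err)
	}
	fmt.Println(out.ID, string(out.Payload)) // → m1 hello
}
```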

Kafka e2e test broken

The Kafka e2e test is broken with the following error.

2022/05/21 00:54:13 create Kafka topic "e2e-topic-jtl8v"
2022/05/21 00:54:13 GET http://127.0.0.1:8378/kafka/create-topic?topic=e2e-topic-jtl8v
2022/05/21 00:54:13 > 500 Internal Server Error
2022/05/21 00:54:13 > kafka: controller is not available
suite.go:63: test panicked: 500 Internal Server Error
goroutine 102 [running]:
runtime/debug.Stack()
/usr/local/Cellar/go/1.18.1/libexec/src/runtime/debug/stack.go:24 +0x65
github.com/stretchr/testify/suite.failOnPanic(0xc000682340)
/Users/jwang21/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:63 +0x3e
panic({0x2189a20, 0xc000614a80})
/usr/local/Cellar/go/1.18.1/libexec/src/runtime/panic.go:838 +0x207
github.com/numaproj/numaflow/test/fixtures.InvokeE2EAPI({0x23acbc9?, 0xf?}, {0xc0006ad508?, 0x1?, 0x1?})
/Users/jwang21/workspace/numaproj/numaflow/test/fixtures/e2eapi.go:31 +0x37e
github.com/numaproj/numaflow/test/fixtures.CreateKafkaTopic()
/Users/jwang21/workspace/numaproj/numaflow/test/fixtures/kafka.go:15 +0xe9
github.com/numaproj/numaflow/test/kafka-e2e.(*KafkaSuite).TestKafkaSourceSink(0xc0002569a0)
/Users/jwang21/workspace/numaproj/numaflow/test/kafka-e2e/kafka_test.go:77 +0x4a
reflect.Value.call({0xc0003fb800?, 0xc000114050?, 0x13?}, {0x23953b6, 0x4}, {0xc0000f9e70, 0x1, 0x1?})
/usr/local/Cellar/go/1.18.1/libexec/src/reflect/value.go:556 +0x845
reflect.Value.Call({0xc0003fb800?, 0xc000114050?, 0xc0002569a0?}, {0xc000082e70, 0x1, 0x1})
/usr/local/Cellar/go/1.18.1/libexec/src/reflect/value.go:339 +0xbf
github.com/stretchr/testify/suite.Run.func1(0xc000682340)
/Users/jwang21/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:158 +0x4b6
testing.tRunner(0xc000682340, 0xc000630480)
/usr/local/Cellar/go/1.18.1/libexec/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
/usr/local/Cellar/go/1.18.1/libexec/src/testing/testing.go:1486 +0x35f

Support gRPC based udf

Currently, we are using HTTP to communicate between the UDF container and the main container.
In order to support data processing paradigms, we need a schematized protocol between the UDF container and the main container, and gRPC seems to be a good fit.

Improve code comments

Is your feature request related to a problem? Please describe.
Make it easier for new developers to read and understand the code.

Describe the solution you'd like
Add a block comment to each file and a comment for each method, type definition etc. describing the intended purpose for the code.

Describe alternatives you've considered
None

Additional context
None



Smarter autoscaling

Summary

With the current autoscaling strategy, when a high-throughput pipeline (e.g. one with a large backlog to process) hits a processing-rate bottleneck (due to the ISB or anything else), the replica count still goes up until it reaches scale.max or some kind of balance, which is not expected.

We need to make the autoscaler smarter: for example, if a vertex with 5 replicas performs about the same as with 6, we should only run 5.



Flaky e2e test

kubectl kustomize test/manifests | sed '[email protected]/numaproj/@k3d-e2e-registry:5111/@' | sed 's/:latest/:de4bd57104487580b386a23461b2f2711db5c62b/' | kubectl -n numaflow-system apply -l app.kubernetes.io/part-of=numaflow --prune=false --force -f -
customresourcedefinition.apiextensions.k8s.io/interstepbufferservices.numaflow.numaproj.io created
customresourcedefinition.apiextensions.k8s.io/pipelines.numaflow.numaproj.io created
customresourcedefinition.apiextensions.k8s.io/vertices.numaflow.numaproj.io created
serviceaccount/numaflow-sa created
clusterrole.rbac.authorization.k8s.io/numaflow-aggregate-to-admin created
clusterrole.rbac.authorization.k8s.io/numaflow-aggregate-to-edit created
clusterrole.rbac.authorization.k8s.io/numaflow-aggregate-to-view created
clusterrole.rbac.authorization.k8s.io/numaflow-role created
clusterrolebinding.rbac.authorization.k8s.io/numaflow-binding created
configmap/numaflow-controller-config created
deployment.apps/controller-manager created
kubectl -n numaflow-system wait --for=condition=Ready --timeout 60s pod --all
error: no matching resources found
make: *** [Makefile:168: start] Error 1

Flaky test case

{"level":"error","ts":1654024581.5282629,"logger":"numaflow","caller":"forward/forward.go:312","msg":"Retrying failed msgs","errors":{"(writer) Buffer full! isb.BufferWriteErr{Name:"writer", Full:true, InternalErr:false, Message:"Buffer full!"}":1},"stacktrace":"github.com/numaproj/numaflow/pkg/isb/forward.(*InterStepDataForward).writeToBuffer\n\t/home/runner/work/numaflow/numaflow/pkg/isb/forward/forward.go:312\ngithub.com/numaproj/numaflow/pkg/isb/forward.(*InterStepDataForward).writeToBuffers\n\t/home/runner/work/numaflow/numaflow/pkg/isb/forward/forward.go:270\ngithub.com/numaproj/numaflow/pkg/isb/forward.(*InterStepDataForward).forwardAChunk\n\t/home/runner/work/numaflow/numaflow/pkg/isb/forward/forward.go:219\ngithub.com/numaproj/numaflow/pkg/isb/forward.(*InterStepDataForward).Start.func1\n\t/home/runner/work/numaflow/numaflow/pkg/isb/forward/forward.go:109"}
{"level":"info","ts":1654024581.528482,"logger":"numaflow","caller":"generator/tickgen.go:244","msg":"Context.Done is called. returning from the inner function"}
tickgen_test.go:75: msgs read [0]
tickgen_test.go:76:
Error Trace: tickgen_test.go:76
Error: "0" is not greater than "0"
Test: TestStop
--- FAIL: TestStop (0.14s)

To support namespaced installation

Summary

The Numaflow controller/server should be installable at namespace scope.



Add RED/Saturation metrics

Is your feature request related to a problem? Please describe.
I need to see RED metrics for my pipeline, such as inbound messages, outbound messages, and error counts, for both the main container and the UDF sidecar.

Describe the solution you'd like
Add the corresponding metrics to Prometheus.
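As a stdlib-only sketch of the counters involved, using expvar so it runs self-contained; the actual change would register equivalents with the Prometheus Go client, and the metric names here are hypothetical:

```go
package main

import (
	"expvar"
	"fmt"
)

// RED-style counters per container; in the real fix these would be
// Prometheus counters with vertex/container labels.
var (
	inboundMessages  = expvar.NewInt("udf_inbound_messages_total")
	outboundMessages = expvar.NewInt("udf_outbound_messages_total")
	processingErrors = expvar.NewInt("udf_errors_total")
)

// process simulates handling one message, updating the counters:
// every message is counted inbound; empty payloads count as errors,
// everything else is forwarded and counted outbound.
func process(msg string) {
	inboundMessages.Add(1)
	if msg == "" {
		processingErrors.Add(1)
		return
	}
	outboundMessages.Add(1)
}

func main() {
	for _, m := range []string{"a", "", "b"} {
		process(m)
	}
	fmt.Println(inboundMessages.Value(), outboundMessages.Value(), processingErrors.Value())
	// → 3 2 1
}
```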




Too many K/V consumers overwhelms JetStream cluster

Describe the bug
With watermarks enabled, when running too many pods for a vertex, too many consumers are created for a K/V bucket, which crashes the JetStream cluster. In that scenario, the vertex pods cannot connect to JetStream anymore, and the JetStream pod repeatedly logs [249] 2022/10/14 18:33:04.612790 [WRN] Consumer assignment not cleaned up, retrying.

Here is an example: the vertex is scaled up to 40 pods, and the number of consumers seen in the JetStream console reaches more than 15K.

Screen Shot 2022-10-14 at 11 29 22 AM




Move watermark to millisecond granularity

Currently, the watermark progresses at second granularity, which is too slow for near-real-time systems. Make sure that progression at the millisecond level will not affect the overall load on the watermark store.

Tasks

  • write a proposal on what it takes to make watermark progress at the millisecond level
  • implement millisecond progression

Flaky E2E tests caused by port-forward and logs streaming

Using port-forward to verify testing results from pod logs is widely used in E2E tests, but it has become unreliable recently: sometimes it fails with a message like the one below, which means streaming the logs failed; usually a retry gets through.

Watching POD: simple-pipeline-watermark-cat3-0-np6xu
error: Get "https://172.18.0.3:10250/containerLogs/numaflow-system/simple-pipeline-watermark-cat3-0-np6xu/main?follow=true": EOF    functional_test.go:180: Expected vertex pod log contains "Start processing udf messages"

This issue could be caused by:

  1. The runner instances assigned by GitHub Actions have limited resources; this issue started happening after we added more test cases.
  2. Unknown bugs in our code: log streaming or port-forward not released?
  3. The k3s version matters (v1.21 worked well; versions after v1.21 became flaky).

In the long term, we need to figure out a new approach to running E2E tests.

Pipeline level latency

Summary

Be able to provide the latency of a pipeline (from source to sinks).

Use Cases



Unexpected scaling down to 0

When a vertex has read and processed all its messages and keeps retrying writes to the next ISB (due to back pressure), the pending-message calculation counts only the number of pending messages, which is 0. This leads to a scale-down to 0 and a pod termination (even though the code tries to shut down cleanly after receiving SIGTERM, terminationGracePeriodSeconds eventually forces a shutdown). The result is that the buffer information in the UI shows non-zero ack-pending messages while the pod count is 0.

JS error when click a vertex with zero pod

Describe the bug
After scaling the vertex down to 0, clicking it on the UI shows the error in the screenshot below.

Screen Shot 2022-08-08 at 6 13 07 PM




Pipeline stops due to error "nats: no responders available for request"

Seeing the following error, which caused the pipeline to stop.

2022-06-23T23:51:50.431Z ERROR numaflow.udf-processor jetstream/reader.go:134 Failed to ack message {"vertex": "kafka-rw-pipeline-p1", "bufferReader": "xxxxx", "stream": "kafka-rw-pipeline-oss-analytics-sampledataflow-usw2-prd-kafka-rw-pipeline-input-p1", "subject": "kafka-rw-pipeline-oss-analytics-sampledataflow-usw2-prd-kafka-rw-pipeline-input-p1", "error": "nats: no responders available for request"}



Versioning udf/udsink request/response

Summary

The contract between the main container and the UDF/udsink containers needs to be versioned, so that any future change does not break it.
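A minimal sketch of what a versioned envelope could look like; the field names and version scheme are hypothetical, not Numaflow's actual wire format. The version field lets either side detect a mismatch before interpreting the payload:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Request is a hypothetical versioned envelope for the
// main-container <-> UDF/udsink contract.
type Request struct {
	Version string          `json:"version"`
	Payload json.RawMessage `json:"payload"`
}

const supportedVersion = "v1"

// handle rejects requests whose contract version does not match,
// instead of silently misinterpreting the payload.
func handle(raw []byte) error {
	var req Request
	if err := json.Unmarshal(raw, &req); err != nil {
		return err
	}
	if req.Version != supportedVersion {
		return fmt.Errorf("unsupported contract version %q, want %q", req.Version, supportedVersion)
	}
	return nil
}

func main() {
	fmt.Println(handle([]byte(`{"version":"v1","payload":{}}`))) // → <nil>
	fmt.Println(handle([]byte(`{"version":"v2","payload":{}}`))) // version mismatch error
}
```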



Flaky test

2022-05-24T18:29:10.0207030Z === RUN   TestJetStreamBufferWriterBufferFull
2022-05-24T18:29:10.0208160Z     writer_test.go:167: 
2022-05-24T18:29:10.0208496Z            Error Trace:    writer_test.go:167
2022-05-24T18:29:10.0209167Z            Error:          Expected nil, but got: isb.BufferWriteErr{Name:"testJetStreamBufferWrite", Full:true, InternalErr:false, Message:"Buffer full!"}
2022-05-24T18:29:10.0209684Z            Test:           TestJetStreamBufferWriterBufferFull
2022-05-24T18:29:10.0209994Z     writer_test.go:167: 
2022-05-24T18:29:10.0210640Z            Error Trace:    writer_test.go:167
2022-05-24T18:29:10.0211344Z            Error:          Expected nil, but got: isb.BufferWriteErr{Name:"testJetStreamBufferWrite", Full:true, InternalErr:false, Message:"Buffer full!"}
2022-05-24T18:29:10.0211863Z            Test:           TestJetStreamBufferWriterBufferFull
2022-05-24T18:29:10.0227631Z     writer_test.go:174: 
2022-05-24T18:29:10.0228178Z            Error Trace:    writer_test.go:174
2022-05-24T18:29:10.0228590Z            Error:          An error is expected but got nil.
2022-05-24T18:29:10.0229042Z            Test:           TestJetStreamBufferWriterBufferFull
2022-05-24T18:29:10.0229716Z --- FAIL: TestJetStreamBufferWriterBufferFull (2.03s)
2022-05-24T18:29:10.0230319Z panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2022-05-24T18:29:10.0230801Z    panic: runtime error: invalid memory address or nil pointer dereference
2022-05-24T18:29:10.0231363Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1312bd6]
2022-05-24T18:29:10.0231765Z 
2022-05-24T18:29:10.0231887Z goroutine 198 [running]:
2022-05-24T18:29:10.0232206Z testing.tRunner.func1.2({0x13ca380, 0x1dfc2f0})
2022-05-24T18:29:10.0232915Z    /opt/hostedtoolcache/go/1.17.10/x64/src/testing/testing.go:1209 +0x36c
2022-05-24T18:29:10.0233242Z testing.tRunner.func1()
2022-05-24T18:29:10.0233796Z    /opt/hostedtoolcache/go/1.17.10/x64/src/testing/testing.go:1212 +0x3b6
2022-05-24T18:29:10.0234354Z panic({0x13ca380, 0x1dfc2f0})
2022-05-24T18:29:10.0235031Z    /opt/hostedtoolcache/go/1.17.10/x64/src/runtime/panic.go:1047 +0x266
2022-05-24T18:29:10.0235506Z github.com/numaproj/numaflow/pkg/isb/jetstream.TestJetStreamBufferWriterBufferFull(0x0)
2022-05-24T18:29:10.0236269Z    /home/runner/work/numaflow/numaflow/pkg/isb/jetstream/writer_test.go:175 +0x776
2022-05-24T18:29:10.0236627Z testing.tRunner(0xc0000fed00, 0x15417b8)
2022-05-24T18:29:10.0236996Z    /opt/hostedtoolcache/go/1.17.10/x64/src/testing/testing.go:1259 +0x230
2022-05-24T18:29:10.0237304Z created by testing.(*T).Run
2022-05-24T18:29:10.0237661Z    /opt/hostedtoolcache/go/1.17.10/x64/src/testing/testing.go:1306 +0x727
2022-05-24T18:29:10.0238011Z FAIL       github.com/numaproj/numaflow/pkg/isb/jetstream  3.261s

Implement shuffle

We need to pigeonhole keyed and non-keyed partitions to a fixed number of ISBs. This should be a consistent operation (a partition should always be mapped to the same ISB).

  • Write a whitepaper on
    • the requirements of shuffle
    • implementation details
  • Implementation
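A minimal sketch of the consistency requirement, assuming a stable hash plus modulo assignment. A production shuffle may prefer consistent hashing so that changing the ISB count does not remap most keys:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// assignISB deterministically maps a partition key to one of n ISBs.
// FNV-1a is stable across processes and restarts, so the same key
// always lands on the same ISB, which is the consistency property
// the shuffle requires.
func assignISB(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % n
}

func main() {
	for _, k := range []string{"user-1", "user-2", "user-3"} {
		fmt.Printf("%s -> ISB %d\n", k, assignISB(k, 4))
	}
	// Repeated calls always agree:
	fmt.Println(assignISB("user-1", 4) == assignISB("user-1", 4)) // → true
}
```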
