Comments (8)
Hi @elanv, sorry for the late reply.
Blocking mode was added to support use cases such as collecting metrics when users rely on Apache Beam, which is not possible in detached mode. It can potentially be deprecated once we add support for Application mode, but we are not there yet.
For now I want us to keep supporting this mode, and I like the idea of failing the submission if there are already running jobs and letting the user address it.
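As a rough illustration of the "fail the submission if jobs are already running" idea, here is a minimal Python sketch. It assumes the check is made against a payload shaped like the response of Flink's `/jobs/overview` REST endpoint (a `jobs` list whose entries carry `jid` and `state`). The function names, the exception class, and the choice of which states count as terminal are hypothetical and are not taken from the operator's code.

```python
# Assumption: these are the states treated as "no longer running".
# Anything else (RUNNING, CREATED, RESTARTING, ...) blocks a new submission.
TERMINAL_STATES = {"FINISHED", "CANCELED", "FAILED"}


class JobsAlreadyRunning(Exception):
    """Hypothetical error raised when a submission should be refused."""


def active_jobs(overview):
    """Return the IDs of jobs in the payload that are not in a terminal state."""
    return [
        job["jid"]
        for job in overview.get("jobs", [])
        if job.get("state") not in TERMINAL_STATES
    ]


def ensure_no_active_jobs(overview):
    """Raise JobsAlreadyRunning if any job in the payload is still active."""
    active = active_jobs(overview)
    if active:
        raise JobsAlreadyRunning(
            "refusing to submit; active jobs: " + ", ".join(active)
        )
```

A submitter could call `ensure_no_active_jobs` with the parsed response before running `flink run`, and surface the error so the user can cancel or clean up the existing job first.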
from flink-on-k8s-operator.
Thank you for explaining, @regadas.
I'm also looking for support from upstream to resolve this issue.
https://issues.apache.org/jira/browse/FLINK-24365
Hi @elanv! I was revisiting this issue, and on a second look the JobID does get printed out in both modes. So I guess that, to track it accurately, we can inspect the pod logs and get it from there instead.
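A minimal Python sketch of pulling the ID out of captured submitter output might look like the following. The regular expression is an assumption: it looks for the token `JobID`, optionally followed by a colon, and then a 32-character lowercase hex string, which covers a line such as `Job has been submitted with JobID <id>` as well as the `(JobID: <id>)` form that appears in the exception message quoted later in this thread. The helper name is hypothetical, and the actual wording of the log line can differ between Flink versions and submission modes.

```python
import re

# Assumption: the job ID is a 32-character lowercase hex string that follows
# the literal token "JobID", with an optional colon in between.
JOB_ID_PATTERN = re.compile(r"JobID:?\s+([0-9a-f]{32})\b")


def extract_job_id(log_text):
    """Return the first job ID found in the log text, or None if there is none."""
    match = JOB_ID_PATTERN.search(log_text)
    return match.group(1) if match else None
```

Returning `None` rather than raising leaves it to the caller to decide what a missing ID means, for example that the client version does not print it in the mode being used.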
Looks good! Thanks for the work, @regadas.
Once that PR is merged, I will proceed with my work.
When I looked more closely, the ID was not printed out in versions <= 1.9 with blocking mode. (This could be noted in the documentation.)
For now, your approach seems the best way. If application mode is supported later, it would be good to introduce a more stable method, because getting the ID from a pod, which is transient, does not seem reliable, whether via the termination log or via the log stream. If Beam is supported in application mode later, it may be worth considering the Flink REST API to submit a job with an ID created by the operator. I tried to use the REST API in the past, but it was not adopted due to several issues. (GoogleCloudPlatform/flink-on-k8s-operator#372)
Hi @elanv!
Are you sure it's not being logged? Looking at release 1.9.3, it should be: https://github.com/apache/flink/blob/6d23b2c38c7a8fd8855063238c744923e1985a63/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L764
@regadas I tested again a while ago with Flink versions 1.9.0 ~ 1.9.3, and the result is the same.
I canceled the job after 8 minutes in the Flink web console. Additional logs were printed out after "The program finished with the following exception", but there was no ID until the end.
---------- Submitting job ----------
/opt/flink/bin/flink run --jobmanager t13-jobmanager:8081 --class org.apache.flink.streaming.examples.statemachine.StateMachineExample --parallelism 1 ./examples/streaming/StateMachineExample.jar --brokers sm-cp-kafka.kafka.svc.cluster.local:9092 --kafka-topic flink-app
Starting execution of program
Usage with built-in data generator: StateMachineExample [--error-rate <probability-of-invalid-transition>] [--sleep <sleep-per-record-in-ms>]
Usage with Kafka: StateMachineExample --kafka-topic <topic> [--brokers <brokers>]
Options for both the above setups:
[--backend <file|rocks>]
[--checkpoint-dir <filepath>]
[--async-checkpoints <true|false>]
[--incremental-checkpoints <true|false>]
[--output <filepath> OR null for stdout]
Reading from kafka topic flink-app @ sm-cp-kafka.kafka.svc.cluster.local:9092
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7a95df7eb581c2f98bf7a1f26eb24af1)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:60)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
at org.apache.flink.streaming.examples.statemachine.StateMachineExample.main(StateMachineExample.java:142)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:604)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:466)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1008)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1081)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1081)
Caused by: org.apache.flink.runtime.client.JobCancellationException: Job was cancelled.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:149)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
... 18 more
Ahah, I see. I also initially thought it could be due to logging properties, but according to the line I shared it should have been logged to stdout as well ...
So this means that neither of the solutions will work for older versions.
In detached mode, the ID was printed out in versions <= 1.9. It looks like a Flink client bug.