
Comments (8)

regadas avatar regadas commented on June 2, 2024

Hi @elanv sorry for the late reply.

The blocking mode was added to support use cases such as collecting metrics when users rely on Apache Beam, which is not possible in detached mode. It can potentially be deprecated once we add support for Application mode, but we are not there yet.

Right now I want us to keep supporting this mode, and I do like the idea of failing the submission if there are already running jobs and letting the user address it.
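As a rough illustration of the "fail the submission if jobs are already running" idea, here is a minimal sketch. It assumes the check is fed the parsed JSON of Flink's `/jobs/overview` REST response (a `jobs` list whose entries carry `jid` and `state`), and it treats `FINISHED`, `FAILED`, and `CANCELED` as the only terminal states. The function names are hypothetical and not part of the operator.

```python
# Job states treated as terminal in this sketch; any other state counts as active.
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELED"}


def active_job_ids(jobs_overview):
    """Return the IDs of jobs in the overview that are not in a terminal state."""
    return [
        job["jid"]
        for job in jobs_overview.get("jobs", [])
        if job.get("state") not in TERMINAL_STATES
    ]


def check_can_submit(jobs_overview):
    """Raise RuntimeError if any job is still active; otherwise return None."""
    active = active_job_ids(jobs_overview)
    if active:
        raise RuntimeError(
            "refusing to submit: jobs still active: " + ", ".join(active)
        )
```

The caller would fetch the overview, call `check_can_submit`, and surface the error to the user instead of submitting a second job.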

from flink-on-k8s-operator.

elanv avatar elanv commented on June 2, 2024

Thank you for explaining @regadas.
I'm also looking for support from upstream to resolve this issue.
https://issues.apache.org/jira/browse/FLINK-24365


regadas avatar regadas commented on June 2, 2024

Hi @elanv! I was revisiting this issue, and on a second look the JobId does get printed out in both modes. So I guess that to track it accurately we can inspect the pod logs and get it from there instead.
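A minimal sketch of pulling the ID out of captured log text could look like the following. It assumes the ID shows up as a 32-character lowercase hexadecimal string preceded by the token `JobID` (optionally followed by a colon); the exact wording of the surrounding log line may differ between Flink versions, and `extract_job_id` is a hypothetical helper, not operator code.

```python
import re

# "JobID", an optional colon, whitespace, then a 32-character lowercase hex ID.
JOB_ID_PATTERN = re.compile(r"JobID:?\s+([0-9a-f]{32})")


def extract_job_id(log_text):
    """Return the first job ID found in the log text, or None if there is none."""
    match = JOB_ID_PATTERN.search(log_text)
    return match.group(1) if match else None
```

Returning `None` rather than raising lets the caller decide how to handle logs that never contain an ID.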


elanv avatar elanv commented on June 2, 2024

Looks good! Thanks for the work @regadas.
Once that PR is merged, I will proceed with my work.

When I looked more closely, the ID was not printed out in blocking mode with Flink versions <= 1.9. (This could be covered in the documentation.)

For now, your approach seems the best way. If application mode is supported later, it would be good to introduce a more stable method, because getting the ID from a pod, which is transient, does not seem reliable, whether via the termination log or via the log stream. If Beam is supported in application mode later, it may be worth considering using the Flink REST API to submit a job with an ID created by the operator. I tried to use the REST API in the past, but it was not adopted due to several issues. (GoogleCloudPlatform/flink-on-k8s-operator#372)
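To make the "ID created by the operator" idea concrete, here is a small sketch that generates an ID in the same shape as the IDs Flink prints (32 lowercase hexadecimal characters) and builds a JSON request body around it. It assumes the jar run endpoint (`POST /jars/<jar-id>/run`) accepts a `jobId` field alongside `entryClass` and `parallelism`; that should be checked against the REST documentation for the Flink version in use. The helper names are hypothetical, and no request is actually sent.

```python
import json
import secrets


def new_job_id():
    """Return a random ID of 32 lowercase hex characters (16 random bytes)."""
    return secrets.token_hex(16)


def build_run_request(job_id, entry_class, parallelism):
    """Build the JSON body for a jar run request (field names are assumptions)."""
    return json.dumps(
        {
            "jobId": job_id,  # assumed field: ID chosen by the caller
            "entryClass": entry_class,
            "parallelism": parallelism,
        }
    )
```

Because the caller picks the ID before submitting, it can track the job without parsing anything out of pod logs.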


regadas avatar regadas commented on June 2, 2024

Hi @elanv!

Are you sure it's not being logged? Looking at release 1.9.3, it should be: https://github.com/apache/flink/blob/6d23b2c38c7a8fd8855063238c744923e1985a63/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L764


elanv avatar elanv commented on June 2, 2024

@regadas I tested again a while ago with Flink versions 1.9.0 through 1.9.3 and the result is the same.
I canceled the job after 8 minutes in the Flink web console. Additional logs were printed after "The program finished with the following exception", but the ID did not appear until the very end.

---------- Submitting job ----------
/opt/flink/bin/flink run --jobmanager t13-jobmanager:8081 --class org.apache.flink.streaming.examples.statemachine.StateMachineExample --parallelism 1 ./examples/streaming/StateMachineExample.jar --brokers sm-cp-kafka.kafka.svc.cluster.local:9092 --kafka-topic flink-app
Starting execution of program
Usage with built-in data generator: StateMachineExample [--error-rate <probability-of-invalid-transition>] [--sleep <sleep-per-record-in-ms>]
Usage with Kafka: StateMachineExample --kafka-topic <topic> [--brokers <brokers>]
Options for both the above setups:
	[--backend <file|rocks>]
	[--checkpoint-dir <filepath>]
	[--async-checkpoints <true|false>]
	[--incremental-checkpoints <true|false>]
	[--output <filepath> OR null for stdout]

Reading from kafka topic flink-app @ sm-cp-kafka.kafka.svc.cluster.local:9092


------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7a95df7eb581c2f98bf7a1f26eb24af1)
	at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
	at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:60)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
	at org.apache.flink.streaming.examples.statemachine.StateMachineExample.main(StateMachineExample.java:142)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:604)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:466)
	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
	at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1008)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1081)
	at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1081)
Caused by: org.apache.flink.runtime.client.JobCancellationException: Job was cancelled.
	at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:149)
	at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
	... 18 more


regadas avatar regadas commented on June 2, 2024

Ahah, I see. I also initially thought that it could be due to logging properties, but according to the line I shared it should have been logging to stdout as well...

So this means that neither of the solutions will work for older versions.


elanv avatar elanv commented on June 2, 2024

In detached mode, the ID was printed out in versions <= 1.9. It looks like a Flink client bug.
