
Comments (80)

livelace avatar livelace commented on July 19, 2024

In other words, why don't we subscribe to pod status events / wait for pod creation to complete? Can we check pod status through "Verify OpenShift Deployment"?

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

@livelace the "Verify OpenShift Deployment" step currently stops after seeing the RC go to Complete, but after seeing your scenario, I realize it could do better.

I'll start looking into adding monitoring of the deployer pod status to that step's logic (perhaps the other deploy-related steps as well - we'll review).

@bparees - FYI

from jenkins-plugin.

bparees avatar bparees commented on July 19, 2024

@livelace perhaps you could use the http check step to confirm the pod is running? or a readiness check in your DC that confirms the pod came up (which will block the deployment completion).
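As a rough illustration, such a readiness check in the DC's pod template might look like the following sketch (the check script path and timings are hypothetical placeholders, not part of the original suggestion):

# Hypothetical fragment of dc.spec.template.spec.containers[0];
# per the suggestion above, the deployment is not considered complete
# until this probe succeeds.
readinessProbe:
  exec:
    command: ["/bin/sh", "-c", "/share/check.sh"]   # placeholder script that exits 0 only when the service is really up
  initialDelaySeconds: 5
  timeoutSeconds: 5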

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@bparees My service is not HTTP-capable; I had thought about this.

My case:

  1. First build step - start service1.
  2. Second build step - start service2.
  3. I want to start a third build step, which depends on 1/2. I run into problems:

a) I don't know whether service1 and service2 are up and running and all hooks have completed, so I can't stop the Jenkins tasks, because everything appears to be all right.

b) I can't scale the deployments down to zero at the proper time, because I don't know whether all the tasks inside the pods have completed.

I can't properly manage the tasks, because I don't know their states.

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

Not to overly distract from this thread but I should have deployer pod
state verification working either later today or tomorrow.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@gabemontero It will be great!

from jenkins-plugin.

bparees avatar bparees commented on July 19, 2024

@gabemontero deployer pod state, or just pod state?

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

@bparees I'll look for both to a degree. Testing shows the deployer pod is pruned, at least when successful. So I'll first see if we have a deployer pod in a non-complete state. If a deployer pod no longer exists, I'll confirm that a running pod exists for the correct generation of the deployment.

from jenkins-plugin.

bparees avatar bparees commented on July 19, 2024

the replication controller (deployment) ought to reflect the state of the deployer pod, so i don't see the value in looking at the deployer pod.

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

I have not seen that yet, at least in what I was previously examining from the output provided and in my reproduction with the evil postStart hook, but I'll double check when I get back to the office. The deployment phase still said Complete.

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

Yep, at least with the latest level from upstream origin, @bparees is correct wrt the RC being sufficient. Adding the same lifecycle: postStart sabotage, the RC ends up in Failed state per the deployment.phase annotation on the RC. I think my earlier repro did not go far enough or something. I could have sworn I saw it go to Complete, but I now consistently see it go to Failed after several runs.

So we are at one of two spots, @livelace:

  1. you could try adding a "Verify OpenShift Deployment" step and hopefully you see the same results
  2. if your output at https://paste.fedoraproject.org/350950/60028895/ was in fact captured after the Pod failed, then I suspect your version of OpenShift is far enough back from the latest that you are seeing a difference in deployment behavior (certainly that component has evolved some this last release cycle). If that is the case, it may simply be a matter of when you can upgrade.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

Not working:

  1. [root@openshift-master1 ~]# oc version
    oc v1.1.6
    kubernetes v1.2.0-36-g4a3f9c5
  2. Jenkins console output (verbose mode), job with verification, job completed without any errors:

https://paste.fedoraproject.org/351461/91294146/

  3. RC status:

https://paste.fedoraproject.org/351462/46009139/

[root@openshift-master1 ~]# oc get rc
NAME                                        DESIRED   CURRENT   AGE
testing-11.0-drweb-netcheck-nossl-peer1-1   0         0         17h
testing-11.0-drweb-netcheck-nossl-peer1-2   1         1         16h
testing-11.0-drweb-netcheck-nossl-peer2-1   0         0         17h
testing-11.0-drweb-netcheck-nossl-peer2-2   0         0         16h
testing-11.0-drweb-netcheck-nossl-peer3-1   0         0         17h
testing-11.0-drweb-netcheck-nossl-peer3-2   0         0         16h

  4. Pod status:

https://paste.fedoraproject.org/351463/46009150/
http://prntscr.com/apk3ey

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

NAME                                              READY     STATUS             RESTARTS   AGE
testing-11.0-drweb-netcheck-nossl-peer1-2-6zkg7   0/1       CrashLoopBackOff   14         1h

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

"Verify whether the pods are up" in settings will be enough :)

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

@livelace I'll see if I can pull a v1.1.6 version of OpenShift and reproduce what you are seeing, but at the moment it appears that we are falling into category 2) from my earlier comment. If that does prove to be true, then rather than adding the new step, we'll want you to try the existing step against v1.2.0 when it becomes available (that is the "latest version" I was testing against).

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

@livelace - one additional request while I try to reproduce at a lower level of code - when you reproduce, is the equivalent of the testing-11.0-drweb-netcheck-nossl-peer1-2-deploy pod from your last repro staying around long enough for you to dump its contents to json/yaml ? If so, can you provide that as well (assuming you'll need to reproduce again to do so)

thanks

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

ok, I went to the same level as @livelace and could not reproduce. One additional question did occur to me ... do you create a successful deployment, then scale it down, edit the DC to introduce the

lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        - exit 1

and then scale to 1 and verify the deployment?

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@gabemontero Hello.

No, DC has hook from the beginning.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

After creating "DC" has zero count.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

Creation progress - https://paste.fedoraproject.org/351916/14601346/

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

Error - https://paste.fedoraproject.org/351917/60134739/

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

After the error occurs I can scale the DC down and repeat it all again.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

I can modify the script (to exit 0) that runs inside the hook, and everything will be fine with the DC (without any modification of its configuration).

I can also modify the script (to exit 0) during the attempt to set up the DC, and the DC will work fine.

PS. This is possible because the hook uses a dedicated script that contains "exit 1".

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

Hey @livelace - not sure what you mean by "creation progress". I just see
another Pod yaml for a Pod created by a replication controller.


from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

"Creation progress" - scale DC to 1.

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

Thanks for the additional details. I have a couple of thoughts on reworking my repro attempts. I'll report back when I have something tangible.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

I can grant access to my test environment within an hour, I think.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@gabemontero Can you connect over SSH to my environment?

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

OK, I've reproduced it. I did:

  1. before starting a deployment, added your lifecycle postStart hook with exit 0
  2. deployed, then scaled back down to 0
  3. edited the DC, changing the lifecycle postStart hook to exit 1
  4. scaled to 1 ... the Pod fails, but the next generation of the RC says it completed successfully.

Note, if I start with the lifecycle postStart hook exiting with 1 and initial replicas of 1, then the RC is marked as Failed. This is basically what my recent repro attempts did. And now that I understand what is going on, I'm pretty positive that my very first repro attempt, where I saw the RC in Complete state, was one where I edited a previously used DC to add the lifecycle postStart hook with the exit 1 check. So good for me that I was not imagining things originally :-).

Now, what to do about this. It is not a given we want to address this with a new plugin step.

  1. this could be a deployment bug that needs to get addressed, with the RC reflecting the state of the pod
  2. the nuance of updating a DC which has been deployed one way, scaled down, edited, and redeployed could be "against current design" or some such.
  3. certainly the postStart-hook-induced failure is merely a means of producing an unexpected container startup failure, but are there nuances wrt using that to tank the container, where a container dying on startup "naturally" would have different characteristics?

@bparees: thoughts? ... and I thought about tagging our friends in platform mgmt now, but decided on getting a sanity check from you before officially pulling that trigger.

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

I'll try the exit 1, but initial replicas 0 permutation, then scale to 1, as well ... see if that is different.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

It's strange; in my situation the problem exists immediately after the DC is imported. But in my situation the initial replica count equals 0.

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

So it also occurred for me when:

  1. create with exit 1 and replicas 0
  2. RC is created with state "Complete", but of course no pod was started up
  3. then scale to 1, and RC stays complete when pod fails.

So one interpretation is that the openshift.io/deployment.phase annotation on the RC is only updated when the RC is initially created (as part of doing the first deployment, where replicas could be either 1 or 0). If we cause the Pod failure in conjunction with the RC initially coming up, that annotation reflects the error. But once the RC is created, perhaps that annotation is no longer maintained (either by design, or incorrectly, and hence a bug). If by design, then I'm not seeing where else in the RC we could infer Pod state. Perhaps I'm missing something, but if not, then the plugin step does in fact have to pull the Pod up directly.
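For reference, a sketch of where that annotation lives on the RC (the name and the Failed value here mirror the failure case described above; everything else is omitted):

# Sketch of the relevant part of an RC produced by a deployment; only the
# annotation discussed above is shown.
apiVersion: v1
kind: ReplicationController
metadata:
  name: test-1                               # hypothetical: deployment #1 of dc/test
  annotations:
    openshift.io/deployment.phase: Failed    # Complete or Failed in the scenarios above
spec:
  replicas: 1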

Next steps from my perspective: the @bparees sanity check, followed most likely by platform team engagement, with either a bug fix on their end or the original change I was envisioning for "verify openshift deployment" to check Pod state in addition to RC state.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

It is cool, thanks for your help!

from jenkins-plugin.

bparees avatar bparees commented on July 19, 2024

@gabemontero it sounds like it's probably working correctly if i understand the scenario. The RC did complete successfully (deployed successfully). The fact that the RC can't be scaled up to 1 because basically you've got a bad pod definition isn't going to reverse the fact that the deployment succeeded. (that is, scaling is not the same as deploying).

it is a bit hokey since you have to start with a count of 0 to get there. If the original replica count was 1, the deployment never would have succeeded, as you saw.

so you can run it by platform management, but i think it's basically working as we'd expect... so the question comes back to "what, if anything, can we do about this?"

doesn't the replica count verification step handle this scenario? that is, you can always add another step to verify that the correct number of replicas are running, which in this scenario, they won't be.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@bparees I'm not sure that is a "bad pod definition", because it is correct. I set up my script to run after pod initialization; that script can legitimately return 0 or 1, and the DC should reflect both of those possible outcomes.

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

The replica count of the RC is showing 1 in this error scenario @bparees.
I have a "live" version of the error state and that is what it is showing.
And yeah, based on what you outlined, I don't think we should pull in
platform mgmt.

Thus, I think we need to introduce the Pod state verification I had started
earlier this week.

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

Add it within the existing "Verify OpenShift Deployment" step.

from jenkins-plugin.

bparees avatar bparees commented on July 19, 2024

if the pod isn't reported as running, it seems like a mistake for the RC to be reporting the replica count as 1.

from jenkins-plugin.

bparees avatar bparees commented on July 19, 2024

@livelace if you want to prevent the deployment from succeeding, you need to use a readiness check, not a post-start hook.

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

I could see applying your earlier rationale to this facet as well. Perhaps we should engage platform mgmt here, but I'm still coming around to the view that adding the redundancy of checking the pod state is a good thing regardless, given the complexities and ongoing (at least from my perspective) evolution in this particular area.


from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@bparees I understand this, but not every configuration contains a service. I think we could devise something with liveness probes, but that complicates the task compared to just detecting the pod status.

from jenkins-plugin.

ironcladlou avatar ironcladlou commented on July 19, 2024

@Kargakis with a post lifecycle hook, can the container become ready and then not ready if the hook fails? It's possible the old rolling updater will observe the first ready state and move on even though the pod will become not ready soon.

from jenkins-plugin.

ironcladlou avatar ironcladlou commented on July 19, 2024

Upon closer inspection, @bparees was right with his earlier statements: the example is using the default rolling params, so 25% max unavailable (for the desired replica count of 1, this means the minimum number of ready pods to maintain during the update is 0). So, scale-down of the old RC will proceed regardless of new RC pod readiness. The absence of a readiness check on the RC's container specs means that the new RC's active pod count will not be coupled to pod readiness, and it doesn't seem like a lifecycle post hook failure will affect the RC's active replica count.

Seems like you need a readiness check to define what it means for your pod to be ready; the failure of a post hook may not necessarily imply the pod is not ready.
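A sketch of the strategy settings being described, with the defaults written out explicitly (the values shown are illustrative, not a recommendation):

# Hypothetical dc.spec.strategy fragment. With the default 25% maxUnavailable and a
# desired count of 1, the minimum ready pods to maintain during the update is 0, so
# the old RC is scaled down regardless of new-pod readiness unless a readinessProbe
# is defined on the containers.
strategy:
  type: Rolling
  rollingParams:
    maxUnavailable: 25%   # default; lower it (e.g. 0) to require availability during the update
    maxSurge: 25%         # default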

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

The rolling updater doesn't care about readiness when scaling up:/

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

Not working with liveness/readiness probes:

  1. Liveness case:

nginx-live.yaml - https://paste.fedoraproject.org/352642/14602069/
RC status - https://paste.fedoraproject.org/352639/46020692/
Pod status - https://paste.fedoraproject.org/352640/02069641/

  2. Readiness case:

nginx-ready.yaml - https://paste.fedoraproject.org/352646/60207412/
RC status - https://paste.fedoraproject.org/352650/20746814/
Pod status - https://paste.fedoraproject.org/352652/02075051/

All cases were checked with Jenkins task (scale + verify).

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

[root@openshift-master1 ~]# cat /share/run.sh
#!/bin/bash

exit 1
[root@openshift-master1 ~]# cat /share/check.sh
#!/bin/bash

exit 0

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

Guys, tell me something, please :)

from jenkins-plugin.

ironcladlou avatar ironcladlou commented on July 19, 2024

Two things here:

  1. If your concern is a violation of your minimum availability requirements, use readiness checks and a >0 availability threshold. The rolling updater won't scale down below your threshold given you have readiness checks in place.
  2. If your concern is the updater scaling up without regards to pod readiness, we'll need to take the discussion to origin/kubernetes, because the rolling updater progresses scale-ups based on replica count of the RC which doesn't take readiness into account.

cc @Kargakis

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

Hey @livelace - to build on what @ironcladlou outlined, @bparees and I have had some discussion offline. I have a prototype for the plugin which inspects Pod state, but @bparees has convinced me that it only handles rudimentary cases, and that we should finish the path of understanding what is needed for the ReplicationController to more accurately reflect the state of the Pods.... ideally, we still want the jenkins plugin to stop its examination at the ReplicationController level, and leverage all the infrastructure in place on the DC, RC, and Pod side (and avoid duplicating similar tech in the plugin).

But let's see how @ironcladlou 's 1) and 2) progress, and then we'll level set.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@ironcladlou @gabemontero

Ok, thanks. I understand this and agree with your conclusions. At the moment I can detect the DC and hook statuses by myself (through "status" files). But I need to know how this case will be solved.

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

Hey @livelace - to get resolution, at least based on my interpretation of things, the key point is 2) from #33 (comment).

To that end, an issue should be opened against https://github.com/openshift/origin, referencing this issue and basically copying/pasting @ironcladlou 's item 2) from that comment.

As the end user, it would be best if you open the issue. Are you comfortable doing that?

thanks

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

Quick status update: leaving this issue open for now, and will monitor openshift/origin#8507 to see how that progresses. I'm anticipating things progressing so that the jenkins-plugin can continue to take its validation logic only to the replication controller level, but let's wait and see. Ideally that occurs and I'll then close this out.

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

All the links are dead unfortunately. Why is a readiness probe not working for your case, @livelace? There are different kinds of probes: if you cannot use HTTP, you can run shell commands (exec) or try to open a TCP connection.
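For a non-HTTP service, the two non-HTTP probe variants look roughly like this sketch (the port and script are placeholders):

# Hypothetical readinessProbe variants for a container that does not speak HTTP.
readinessProbe:            # variant 1: succeed once a TCP connection can be opened
  tcpSocket:
    port: 5432             # placeholder port
  initialDelaySeconds: 5
# variant 2: run a shell command inside the container
# readinessProbe:
#   exec:
#     command: ["/bin/sh", "-c", "/share/check.sh"]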

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@Kargakis No, readiness and liveness probes don't work in this case. I repeat @bparees's statement:

"the fundamental issue in my mind is that the replication controller is reporting an active count of 1 despite the fact that the only pod that exists is in a FAILED state."

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

Ok. Actually we want lifecycle hooks to be processed in the context of the deployment status. If a hook ends with an error, the deployment should be considered a failure. The main process of the pod may be working well and pass all the tests (liveness/readiness), but if the hook returned an error the deployment status should be failed, because the hook is an important part of the pod and its readiness.

Sorry for jumping from one thread to the other but upstream deployments have no hooks yet. If your problem is that you want a failed hook to fail your deployment you can specify FailurePolicy == Abort

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

@livelace any news here? Did hooks work for you?

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@Kargakis I'm sad :(

apiVersion: v1
kind: List
items:

- apiVersion: "v1"
  kind: "DeploymentConfig"
  metadata:
    name: "test"
  spec:
    template:
      metadata:
        labels:
          name: "test"
      spec:
        containers:
          - 
            name: "nginx"
            image: "nginx:latest"

            lifecycle:
              postStart:
                exec:
                  command: [ "exit", "1" ]
    replicas: 1
    selector:
      name: "test"

    strategy:
      type: "Rolling"
      rollingParams:
        pre:
          failurePolicy: "Abort"
          execNewPod:
            containerName: "nginx"
            command: [ "true" ]

[root@openshift-master1 ~]# oc get pods
NAME              READY     STATUS             RESTARTS   AGE
test-1-deploy     1/1       Running            0          8m
test-1-hook-pre   0/1       Completed          0          5m
test-1-oo71c      0/1       CrashLoopBackOff   3          6m

[root@openshift-master1 ~]# oc get dc
NAME      REVISION   REPLICAS   TRIGGERED BY
test      1          1          config

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

If the pod inside is broken, we should mark the DC as failed and the replica count should be 0, because an existing replica isn't the same thing as a working replica.

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

@livelace the timeout for deployments is at 10m. If you waited 2 more minutes you would see that the deployment will be marked as failed and eventually scaled down to zero:)

[vagrant@localhost sample-app]$ oc get pods
NAME              READY     STATUS             RESTARTS   AGE
test-1-deploy     1/1       Running            0          10m
test-1-hook-pre   0/1       Completed          0          10m
test-1-utjds      0/1       CrashLoopBackOff   6          9m
[vagrant@localhost sample-app]$ oc get pods
NAME              READY     STATUS      RESTARTS   AGE
test-1-deploy     0/1       Error       0          10m
test-1-hook-pre   0/1       Completed   0          10m
[vagrant@localhost sample-app]$ oc status
In project test on server https://10.0.2.15:8443

dc/test deploys docker.io/library/nginx:latest 
  deployment #1 failed 10 minutes ago

1 warning identified, use 'oc status -v' to see details.

Note that your use of the deployment hook didn't do anything and the deployment hook was complete. I think in your case you would want a post hook with Abort policy with a script that makes sure your application pod is up and running. Also readiness probes can help.
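A minimal sketch of that suggestion, assuming the check is wrapped in a script (the container name and script path follow the earlier example and are placeholders):

# Hypothetical rollingParams fragment: a post hook with Abort policy whose
# script is expected to verify that the application pod is actually up
# before the deployment is considered successful.
strategy:
  type: Rolling
  rollingParams:
    post:
      failurePolicy: Abort
      execNewPod:
        containerName: nginx
        command: ["/bin/sh", "-c", "/share/check.sh"]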

Regarding the timeout, we will make it configurable eventually.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@Kargakis

The deployment hook is useful when we need to be sure that a new version of the software works as expected, and to go back to a working deployment if it fails. But we have a different situation. Our deployment should be launched, and if the container hook (which does a bunch of things inside; it is dynamic configuration) returns an error, the deployment should be marked as failed.

The deployment hook isn't working for us, because it only runs during a deployment. We need the case where a container hook causes an error during scaling to be handled; we need a failure policy for container hooks.

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

@smarterclayton @ironcladlou, @livelace wants container hooks to be taken into account for deployments. Thoughts?

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

@livelace actually can you try to run a deployment with a container postStart hook that fails and see if it works for you after 10 minutes?

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@Kargakis

dc/test deploys docker.io/library/nginx:latest 
  deployment #1 failed 11 minutes ago
Every 1,0s: oc get pods | grep ^test-                                                                                                                           Tue May 31 20:16:58 2016

test-1-deploy                                      0/1       Error       0          13m
[root@openshift-master1 ~]# oc get rc
NAME      DESIRED   CURRENT   AGE
test-1    0         0         17m

@bparees @gabemontero Can we detect and wait for this behavior during "Scale Deployment" in Jenkins?

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

"Verify whether the specified number of replicas are up" already exist.

from jenkins-plugin.

bparees avatar bparees commented on July 19, 2024

@livelace detect and wait for what exactly? I still haven't seen a satisfactory answer for why the replication controller is reporting N current pods when those pods are in a failed state.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@bparees @Kargakis Ok. I think there are at least two options:

  1. Report the replica count at an early stage (immediately after a container hook returns an error).
  2. When we do "Scale Deployment", wait and check that the deployment is not in a failed state.
  3. ?

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

I still haven't seen a satisfactory answer for why the replication controller is reporting N current pods when those pods are in a failed state.

Because it's not the job for a replication controller but for a deployment. The rc/rs will always report what it has created but it cannot know if those pods are running.

Report about replica count on early stage (immediately after a container hook return an error).
When we do "Scale Deployment", we may wait and check that this deployment not in failed state.

You shouldn't deploy zero replicas and scale after the fact. That's why we use deployments in the first place, otherwise we would still use replication controllers. Deployments ensure that your pods are able to run. Replication controllers cannot do that by design.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

Deployments ensure that your pods are able to run. Replication controllers cannot do that by design.

Our deployments can always run and they work without any problem. The pods work fine. But we use container hooks, which launch integration tests against other services in different pods.

Creating deployment configurations for every possible combination of our software is impossible. And we can't keep the pods online all the time.

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

But we use container hooks, which launch integration tests with other services in different pods.

Would it make sense to group all those containers together? Or are those other services independent components of your system?

And we can't hold pods online all time.

Do you really need to scale down to zero here or could you just stop directing traffic to those pods?

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

Would it make sense to group all those containers together? Or are those other services independent components of your system?

Yes. For example: three containers, each of which needs specific settings, all communicating with each other, plus "external" services (not in the same pod) that are also connected to the others.

Do you really need to scale down to zero here or could you just stop directing traffic to those pods?

We need:

step 1: start/scale DC to N
step 2: wait for test completion
step 3: stop/scale DC to zero

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

We need:

step 1: start/scale DC to N
step 2: wait tests completion
step 3: stop/scale DC to zero

Try setting dc.spec.test=true

You should deploy it every time you need it to run (oc deploy NAME --latest).
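A sketch of what that looks like on the DC from the earlier example (only the added field is shown):

# Hypothetical fragment of the earlier DeploymentConfig with the test flag set;
# replicas stay at zero except while a deployment is running.
spec:
  test: true
  replicas: 1

Each test run is then kicked off with oc deploy NAME --latest, as noted above.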

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@Kargakis Thanks, but:

[root@openshift-master1 11.0]# oc explain dc.spec.test
FIELD: test

DESCRIPTION:
Test ensures that this deployment config will have zero replicas except
while a deployment is running. This allows the deployment config to be used
as a continuous deployment test - triggering on images, running the
deployment, and then succeeding or failing. Post strategy hooks and After
actions can be used to integrate successful deployment with an action.

We need:

  1. Scale DC1 (contains some services with some settings/tests via a container hook)
  2. Scale DC2 (contains some services with some settings/tests via a container hook)
  3. Scale DC3, which communicates with DC1 and DC2 (contains some services with some settings/tests via a container hook).
  4. Wait for results.
  5. Scale DC1/DC2/DC3 to zero

And what we have instead, step by step:

  1. Trigger deployment DC1. Deployment DC1 completes and is powered down.
  2. Trigger deployment DC2. Deployment DC2 completes and is powered down.
  3. DC3 can't communicate with DC1/DC2, because they were stopped.

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

There are a couple of things you can do in such a scenario. One is you can setup posthooks in DC1/DC2 to wait for DC3 to complete. DC3 comes up, runs its tests, completes, DC1 and DC2 complete. All are scaled down automatically because they have dc.spec.test=true.

You can also play around with custom deployments:
http://lists.openshift.redhat.com/openshift-archives/dev/2016-May/msg00037.html

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

You could also setup a prehook for DC3 to start DC1 and DC2 and wait for them to be running so you would need to run just DC3.

$ oc deploy dc3 --latest
---> DC3 prehook starts DC1 and DC2 and waits for them to be running...
---> DC1 and DC2 are running
---> Prehook exits
---> DC1 and DC2 should wait on their posthooks at this point
---> DC3 is running
---> DC3 completes, is scaled down because it has dc.spec.test=true
---> Posthooks for DC1 and DC2 exit, they complete and are scaled down because dc.spec.test=true

Complicated but it could work.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

@Kargakis Thanks, I will try it later.

PS. It is a thorny path for us :)

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

The thing is that by scaling deployment configs up/down to zero instead of actually deploying them, you lose all the benefits you get from using deployments. Replication controllers by design cannot detect failures, and I don't think that will ever change, especially in light of ... having deployments :)

from jenkins-plugin.

0xmichalis avatar 0xmichalis commented on July 19, 2024

DC3 also sounds a lot like a Job:
https://docs.openshift.org/latest/dev_guide/jobs.html
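If DC3 really is just a run-to-completion test driver, a Job is a minimal sketch of that alternative (the image, command, and names here are placeholders, and the exact API version depends on your cluster):

# Hypothetical Job equivalent of DC3: runs the integration tests once and completes.
apiVersion: batch/v1
kind: Job
metadata:
  name: dc3-tests
spec:
  template:
    metadata:
      name: dc3-tests
    spec:
      containers:
        - name: tests
          image: nginx:latest                          # placeholder image
          command: ["/bin/sh", "-c", "/share/run.sh"]  # placeholder test entrypoint
      restartPolicy: Never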

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

The thing is that by scaling deployment configs up/down to zero instead of actually deploying them, you lose all the benefits you get from using deployments.

Actually the deployments work fine, because they are triggered by "image change", and that's OK.

DC3 is the easiest thing in the configuration. I looked at "Jobs" previously.

from jenkins-plugin.

livelace avatar livelace commented on July 19, 2024

The main challenge is using multiple deployments (with services inside) together. At the moment, after scaling, we just check a file flag that indicates test execution inside the deployment (with Jenkins's help). But we want a more mature/correct mechanism.

from jenkins-plugin.

gabemontero avatar gabemontero commented on July 19, 2024

My interpretation is that the discussions here have circled back to openshift/origin#8507

We also got clarification from @Kargakis back in #33 (comment) on why the RC was reporting what it was reporting.

And based on the discussions noted in #33 (comment), we still don't want the plugin to start interrogating Pod state.

Of course, this is still an evolving area. If changes occur around the multiple deployment orchestration or what the RC reports wrt Pod state, we can look into associated changes in the plugin.

But with the above preamble, I'm going to go ahead and close this one out. Please continue discussions either in openshift/origin#8507 or new issues to be opened against origin or k8s if the discussion broadens.

Thanks.

from jenkins-plugin.
