Comments (10)

nojnhuh commented on July 26, 2024

I'd be surprised if this were a general problem, since I don't remember ever seeing it in e2e. Can you provide more details about what the cluster template looked like, or whether you were doing anything to it around the time it was being deleted? It would also help to know what the capz-controller-manager was logging for the CAPZ resources while they were stuck, and what the full YAML of those resources was at that point.

from cluster-api-provider-azure.

dtzar commented on July 26, 2024

I think the errors looked like this while it was in that state. These logs are from when the AKS cluster could not be created.

ASO logs

I0214 23:48:07.738380       1 generic_reconciler.go:96] "msg"="Encountered error, re-queuing..." "logger"="controllers.ResourceGroupController" "name"="argocluster" "namespace"="default" "result"={"Requeue":false,"RequeueAfter":180000000000}
I0214 23:51:07.737209       1 common.go:58] "msg"="Reconcile invoked" "annotations"={"serviceoperator.azure.com/credential-from":"argocluster-aso-secret","serviceoperator.azure.com/operator-namespace":"capz-system","serviceoperator.azure.com/reconcile-policy":"skip","serviceoperator.azure.com/resource-id":"/subscriptions/#-4518-4d05-9d6d-a29b5cf33a8d/resourceGroups/argocluster"} "conditions"="[Condition [Ready], Status = \"False\", ObservedGeneration = 1, Severity = \"Warning\", Reason = \"Failed\", Message = \"error getting status for resource ID \\\"/subscriptions/#-4518-4d05-9d6d-a29b5cf33a8d/resourceGroups/argocluster\\\": getting resource with ID: \\\"/subscriptions/#-4518-4d05-9d6d-a29b5cf33a8d/resourceGroups/argocluster\\\": ClientSecretCredential: unable to resolve an endpoint: server response error:\\n Get \\\"https://login.microsoftonline.com/#-345f-445e-90b7-31ac0c5cf7ef/v2.0/.well-known/openid-configuration\\\": dial tcp: lookup login.microsoftonline.com on 10.96.0.10:53: server misbehaving\", LastTransitionTime = \"2024-02-14 23:41:18 +0000 UTC\"]" "creationTimestamp"="2024-02-14T23:40:55Z" "deletionTimestamp"=null "finalizers"=["serviceoperator.azure.com/finalizer"] "generation"=1 "kind"={"kind":"ResourceGroup","apiVersion":"resources.azure.com/v1api20200601storage"} "logger"="controllers.ResourceGroupController" "name"="argocluster" "namespace"="default" "owner"=null "ownerReferences"=[{"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","kind":"AzureManagedControlPlane","name":"argocluster","uid":"acd970a7-adff-4fb8-af20-98014aee4f3d","controller":true,"blockOwnerDeletion":true}] "resourceVersion"="732785" "uid"="162c81d9-4718-49e8-a5f3-d3a7a305d12e"
I0214 23:51:07.737286       1 generic_reconciler.go:335] "msg"="Skipping creation/update of resource due to policy" "logger"="controllers.ResourceGroupController" "name"="argocluster" "namespace"="default" "serviceoperator.azure.com/reconcile-policy"="skip"
I0214 23:51:07.737464       1 azure_generic_arm_reconciler_instance.go:421] "msg"="Resource successfully created/updated" "azureName"="argocluster" "logger"="controllers.ResourceGroupController" "name"="argocluster" "namespace"="default" "resourceID"="/subscriptions/##-4518-4d05-9d6d-a29b5cf33a8d/resourceGroups/argocluster"
I0214 23:51:07.737700       1 recorder.go:104] "msg"="Using credential from \"default/argocluster-aso-secret\"" "logger"="events" "object"={"kind":"ResourceGroup","namespace":"default","name":"argocluster","uid":"162c81d9-4718-49e8-a5f3-d3a7a305d12e","apiVersion":"resources.azure.com/v1api20200601storage","resourceVersion":"732785"} "reason"="CredentialFrom" "type"="Normal"
E0214 23:51:30.238458       1 generic_reconciler.go:361] "msg"="Encountered error impacting Ready condition" "error"="Reason: Failed, Severity: Warning, RetryClassification: RetrySlow, Cause: error getting status for resource ID \"/subscriptions/#-4518-4d05-9d6d-a29b5cf33a8d/resourceGroups/argocluster\": getting resource with ID: \"/subscriptions/#-4518-4d05-9d6d-a29b5cf33a8d/resourceGroups/argocluster\": ClientSecretCredential: unable to resolve an endpoint: server response error:\n Get \"https://login.microsoftonline.com/#-345f-445e-90b7-31ac0c5cf7ef/v2.0/.well-known/openid-configuration\": dial tcp: lookup login.microsoftonline.com on 10.96.0.10:53: server misbehaving" "logger"="controllers.ResourceGroupController" "name"="argocluster" "namespace"="default"
I0214 23:51:30.280058       1 generic_reconciler.go:96] "msg"="Encountered error, re-queuing..." "logger"="controllers.ResourceGroupController" "name"="argocluster" "namespace"="default" "result"={"Requeue":false,"RequeueAfter":180000000000}

CAPZ logs

I0214 23:58:49.719804       1 azuremanagedcluster_controller.go:169] "Successfully reconciled" logger="controllers.AzureManagedClusterReconciler.Reconcile" controller="azuremanagedcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AzureManagedCluster" AzureManagedCluster="default/argocluster" namespace="default" name="argocluster" reconcileID="4f7de029-f703-4b0b-9c80-1fd98c860b88" kind="AzureManagedCluster" namespace="default" name="argocluster" x-ms-correlation-request-id="821674f9-3b7e-4d53-ad30-05646b3e8a39" cluster="argocluster" controlPlane="argocluster"
I0214 23:58:49.720816       1 azuremanagedmachinepool_controller.go:194] "AzureManagedControlPlane is not initialized" logger="controllers.AzureManagedMachinePoolReconciler.Reconcile" controller="azuremanagedmachinepool" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AzureManagedMachinePool" AzureManagedMachinePool="default/argocluster-pool0" namespace="default" name="argocluster-pool0" reconcileID="ce0d7739-b492-45e9-8e32-e3225ac5b3e5" namespace="default" name="argocluster-pool0" kind="AzureManagedMachinePool" x-ms-correlation-request-id="07d964de-46a9-424a-8caa-a300e0848848" ownerCluster="argocluster"
I0214 23:58:57.004553       1 azuremanagedcontrolplane_controller.go:234] "Reconciling AzureManagedControlPlane" logger="controllers.AzureManagedControlPlaneReconciler.reconcileNormal" controller="azuremanagedcontrolplane" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AzureManagedControlPlane" AzureManagedControlPlane="default/argocluster" namespace="default" name="argocluster" reconcileID="496335b7-c7c5-42d6-b1a7-4e99ef398eca" x-ms-correlation-request-id="cced4b6d-844f-46f0-8d73-a07c07b8d9ff"
E0214 23:58:57.004718       1 managedcontrolplane.go:427] "Unable to determine if ManagedControlPlaneScope VNET is managed by capz, assuming unmanaged" err="VirtualNetwork.network.azure.com \"argocluster\" not found" logger="scope.ManagedControlPlaneScope.IsVnetManaged" x-ms-correlation-request-id="28d3fddd-4e53-4226-ac43-d50c841734ff" AzureManagedCluster="argocluster"

dtzar commented on July 26, 2024

Unfortunately, I can't reproduce the problem right now - but will circle back when/if I do.

nojnhuh commented on July 26, 2024

It almost seems like there's something wonky in your Workload ID setup or your sub or something. I've never seen this kind of error in e2e or locally for me.

dtzar commented on July 26, 2024

I have a live repro of this now. I'm pretty sure you can reproduce it by doing the following:

  1. Let CAPZ create a cluster
  2. Power off the CAPZ management cluster
  3. Manually delete the AKS cluster resource (but leave everything else in the RG)
  4. Bring the CAPZ management cluster back online

Here are the logs I have right now from ASO (also using the 1.14.0 release).

I0314 21:49:12.626378       1 recorder.go:104] "msg"="Reason: ResourceNotFound, Severity: Warning, RetryClassification: RetrySlow, Cause: The Resource 'Microsoft.ContainerService/managedClusters/argocluster' under resource group 'argocluster' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix: PUT https://management.azure.com/subscriptions/####/resourceGroups/argocluster/providers/Microsoft.ContainerService/managedClusters/argocluster/agentPools/pool1\n--------------------------------------------------------------------------------\nRESPONSE 404: 404 Not Found\nERROR CODE: ResourceNotFound\n
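Both this message and the earlier `Encountered error impacting Ready condition` line share the same `Reason: ..., Severity: ..., RetryClassification: ..., Cause: ...` layout, which makes it easy to tell the two failures apart (`Failed` from the DNS lookup earlier, `ResourceNotFound` here). A rough sketch for pulling those fields out of a log message, assuming the layout is stable; `conditionError` and `parseConditionError` are hypothetical helpers, not part of ASO:

```go
package main

import (
	"fmt"
	"regexp"
)

// conditionRe matches the "Reason: ..., Severity: ..., RetryClassification: ..., Cause: ..."
// layout seen in the log messages. (?s) lets the cause span multiple lines.
var conditionRe = regexp.MustCompile(
	`(?s)Reason: (\w+), Severity: (\w+), RetryClassification: (\w+), Cause: (.*)`)

// conditionError holds the fields extracted from one message (hypothetical type).
type conditionError struct {
	Reason   string
	Severity string
	Retry    string
	Cause    string
}

// parseConditionError returns the parsed fields and false if the layout does not match.
func parseConditionError(msg string) (conditionError, bool) {
	m := conditionRe.FindStringSubmatch(msg)
	if m == nil {
		return conditionError{}, false
	}
	return conditionError{Reason: m[1], Severity: m[2], Retry: m[3], Cause: m[4]}, true
}

func main() {
	msg := "Reason: ResourceNotFound, Severity: Warning, RetryClassification: RetrySlow, " +
		"Cause: The Resource 'Microsoft.ContainerService/managedClusters/argocluster' " +
		"under resource group 'argocluster' was not found."
	if c, ok := parseConditionError(msg); ok {
		fmt.Println(c.Reason, c.Retry) // prints "ResourceNotFound RetrySlow"
	}
}
```

Here the 404 comes from a PUT on the `pool1` agent pool whose parent managed cluster no longer exists, which is consistent with step 3 of the repro.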

nawazkh commented on July 26, 2024

#4609 may or may not be connected to this issue, but since both issues involve deletion, I am mentioning it here as well.
