vmware / cloud-provider-for-cloud-director
Kubernetes External Cloud Provider for VMware Cloud Director
License: Other
It is not possible to use the CCM with recent Kubernetes versions, e.g. 1.22 through 1.24.
Add support for latest Kubernetes versions
No response
No response
Hi, I have an AVI load balancer in front of the VCD cells and I am getting a strange issue with it:
I0104 17:46:07.745301 1 auth.go:49] Using VCD OpenAPI version [36.0]
I0104 17:46:08.168096 1 cloud.go:92] Error initializing client from secrets: [unable to get swagger client from secrets: [unable to get bearer token from serets: [failed to set authorization header: [error finding LoginUrl: could not find valid version for login: could not retrieve supported versions: error fetching versions: [ParseErr]: error parsing error body for non-200 request: XML syntax error on line 6: element
The LB immediately rejects the requests as malformed and returns a 400 error code. I can see the provider is attempting to access the /api/versions endpoint. If I open a bash shell in the container, I can curl this endpoint from within the container without issue.
If I bypass the AVI load balancer and point the URL directly at a VCD cell, it works fine. I wonder whether this has only been tested directly against cells, where the cell's web server is more lax about standards.
VCD 10.3.1
AVI LB 20.1.4
cloud-provider-for-cloud-director: tested with 1.0.0, 1.0.1 and 1.0.2
1. Install a cluster on VCD 10.3.1 with an AVI load balancer in front of the VCD portal
The provider should be able to access the API.
No response
Some automation that uses SemVer syntax to validate image tags fails because the images are tagged with `.latest` appended.
Remove `.latest` from image tags.
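To illustrate why a tag such as 1.1.0.latest trips SemVer validation, here is a minimal sketch in Go. The regular expression is a simplified approximation of the SemVer 2.0.0 grammar, and the name `isSemVer` is hypothetical; it is not taken from any particular automation.

```go
package main

import (
	"fmt"
	"regexp"
)

// semverRe is a simplified SemVer 2.0.0 pattern: MAJOR.MINOR.PATCH with an
// optional "-prerelease" and "+build" suffix. It is an approximation for
// illustration, not the full grammar.
var semverRe = regexp.MustCompile(`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)

// isSemVer reports whether tag matches the simplified pattern above.
func isSemVer(tag string) bool {
	return semverRe.MatchString(tag)
}

func main() {
	for _, tag := range []string{"1.1.0", "1.1.0.latest", "1.1.0-latest"} {
		fmt.Printf("%-14s valid=%v\n", tag, isSemVer(tag))
	}
}
```

Under this pattern a fourth dot-separated component is rejected, while the same word attached as a pre-release identifier (after a hyphen) is accepted.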
No response
No response
Currently, if the nodePort or port values of a LoadBalancer service are changed in k8s after the service is created, the ingress pool / virtual service / DNAT rules in vCD are not updated with the new values, resulting in Pods not being accessible.
When a LoadBalancer Service is updated with new port values, the Load Balancer config in vCD is automatically updated to match.
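A minimal sketch of how such a change could be detected, assuming the controller can list the ports currently configured in vCD. The `portDetails` type loosely follows the `vcdsdk.PortDetails` fields visible in the provider logs quoted elsewhere on this page; `portsChanged` is a hypothetical helper, not part of the provider.

```go
package main

import "fmt"

// portDetails loosely follows the vcdsdk.PortDetails fields seen in the logs.
type portDetails struct {
	Protocol     string
	PortSuffix   string // port name, e.g. "http"
	ExternalPort int32  // Service port
	InternalPort int32  // nodePort
}

// portsChanged reports whether the desired ports differ from the existing
// configuration, matching entries by port name.
func portsChanged(existing, desired []portDetails) bool {
	if len(existing) != len(desired) {
		return true
	}
	byName := make(map[string]portDetails, len(existing))
	for _, p := range existing {
		byName[p.PortSuffix] = p
	}
	for _, d := range desired {
		if cur, ok := byName[d.PortSuffix]; !ok || cur != d {
			return true
		}
	}
	return false
}

func main() {
	existing := []portDetails{{"TCP", "http", 80, 30520}}
	desired := []portDetails{{"TCP", "http", 80, 31000}} // nodePort was edited
	fmt.Println("update needed:", portsChanged(existing, desired))
}
```

When the function returns true, the controller would need to reconcile the pool, virtual service and DNAT rule for the affected port.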
No response
No response
If you have two LoadBalancer services that use private IPs and share the same name in two different namespaces, one of them overrides the other in VCD, causing one of them to disappear.
Create two Services of type LoadBalancer with the same metadata.name in different namespaces; one of them will override the other.
Services should not disappear or override each other.
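The collision can be illustrated with a small sketch. It assumes, without confirmation from the provider source, that the VCD resource name is derived from the Service name alone; both helper functions are hypothetical, and only the `ingress-vs-` prefix is taken from the names shown on this page.

```go
package main

import "fmt"

// nameOnlyKey mimics a resource name derived from the Service name alone.
// The namespace argument is deliberately ignored to show the collision.
func nameOnlyKey(namespace, name string) string {
	return "ingress-vs-" + name
}

// namespacedKey includes the namespace so that equal Service names in
// different namespaces map to different VCD resource names.
func namespacedKey(namespace, name string) string {
	return "ingress-vs-" + namespace + "-" + name
}

func main() {
	fmt.Println(nameOnlyKey("team-a", "web") == nameOnlyKey("team-b", "web"))     // true: collision
	fmt.Println(namespacedKey("team-a", "web") == namespacedKey("team-b", "web")) // false
}
```

Joining namespace and name with a hyphen can still be ambiguous in edge cases, so a real fix might prefer a unique identifier such as the Service UID.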
No response
To provide load-balanced services on VCD-internal networks.
Can we make it possible to create LBs using a user-specified IP or an IP taken from a CIDR?
No response
No response
I was running 1.0.2 on k8s v1.21.2+vmware.1, and edited the deployment vmware-cloud-director-ccm to change the image to projects.registry.vmware.com/vmware-cloud-director/cloud-provider-for-cloud-director:1.1.0.latest.
The new vmware-cloud-director-ccm pod then logged continual messages like this, and no new NATs/ALBs were created in vCD:
E0201 14:52:00.261818 1 leaderelection.go:330] error retrieving resource lock kube-system/cloud-controller-manager: leases.coordination.k8s.io "cloud-controller-manager" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0201 14:52:03.107259 1 leaderelection.go:330] error retrieving resource lock kube-system/cloud-controller-manager: leases.coordination.k8s.io "cloud-controller-manager" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0201 14:52:06.108764 1 leaderelection.go:330] error retrieving resource lock kube-system/cloud-controller-manager: leases.coordination.k8s.io "cloud-controller-manager" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0201 14:52:10.287091 1 leaderelection.go:330] error retrieving resource lock kube-system/cloud-controller-manager: leases.coordination.k8s.io "cloud-controller-manager" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0201 14:52:14.167106 1 leaderelection.go:330] error retrieving resource lock kube-system/cloud-controller-manager: leases.coordination.k8s.io "cloud-controller-manager" is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
I edited the clusterrole (kubectl -n kube-system edit clusterrole system:cloud-controller-manager) and added the following to the end:
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
  - get
  - list
  - update
After this the lease was acquired and NATs/ALBs were created again.
1. Upgrade to 1.1.0
2. Tail the logs for the new ccm pod
3. See permissions error relating to leases
4. New NAT/ALB config is not added
Service account should have correct permissions
NATs/ALBs should be created following upgrade, as before
No response
The generated swagger clients [1,2] annotate all fields with `omitempty`. When a field with this annotation has a value equal to the type's zero value, Go considers it empty and omits it from the marshaled output.
Some fields have a zero value that carries meaning. For example, `GracefulTimeoutPeriod` has a zero value of 0. This value has meaning: it means the timeout should be disabled. In fact, the SDK client (in `pkg/vcdsdk`) disables this timeout. However, this has no effect: the swagger client removes the field from the API request, and so the VCD service assigns the default timeout (a value of 1).
This problem is well-known by the Kubernetes community; it affected the core Kubernetes APIs. It is a topic of the Kubernetes API Conventions.
[1] https://github.com/vmware/cloud-provider-for-cloud-director/tree/868f15c9090e5b7799782047759cd0b5d069f4c7/pkg/vcdswaggerclient_36_0
[2] https://github.com/vmware/cloud-provider-for-cloud-director/tree/868f15c9090e5b7799782047759cd0b5d069f4c7/pkg/vcdswaggerclient_37_2
I'll try to create a failing unit test.
The swagger client must not omit fields with values that have meaning, when the values happen to be Go zero values.
For a quick demonstration of how `omitempty` works when marshaling, see https://go.dev/play/p/CAOw2aCY3Gk
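The behaviour, and one common way around it, can be sketched with the standard `encoding/json` package. The struct names and the JSON key `gracefulTimeoutPeriod` are assumptions made for illustration; they are not copied from the generated swagger client.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// poolValue mirrors the generated style: a plain int32 with omitempty,
// so an explicit 0 is indistinguishable from "not set".
type poolValue struct {
	GracefulTimeoutPeriod int32 `json:"gracefulTimeoutPeriod,omitempty"`
}

// poolPointer uses a pointer: nil means "not set", a pointer to 0 means
// "explicitly disabled" and is kept in the output.
type poolPointer struct {
	GracefulTimeoutPeriod *int32 `json:"gracefulTimeoutPeriod,omitempty"`
}

// marshal returns the JSON encoding of v as a string.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	zero := int32(0)
	fmt.Println(marshal(poolValue{GracefulTimeoutPeriod: 0}))       // {}
	fmt.Println(marshal(poolPointer{GracefulTimeoutPeriod: &zero})) // {"gracefulTimeoutPeriod":0}
	fmt.Println(marshal(poolPointer{}))                             // {}
}
```

Switching meaningful-zero fields to pointers, or dropping `omitempty` from them, are the two usual options; which one fits the generated clients is a decision for the maintainers.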
When you create a LoadBalancer service in k8s with multiple ports, such as the one below, this generates multiple virtual services in Cloud Director instead of a single virtual service with multiple ports.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-ingress-controller
spec:
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: http
      nodePort: 32697
    - name: https
      protocol: TCP
      port: 443
      targetPort: https
      nodePort: 30032
  selector:
    app: nginx-ingress
  type: LoadBalancer
Results in the following virtual services being added to vCD
ingress-vs-ingress-nginx-ingress-controller-____-http Virtual IP: x.x.x.1 Ports: 80 (L4)
ingress-vs-ingress-nginx-ingress-controller-____-https Virtual IP: x.x.x.2 Ports: 443 (L4)
From the vCD UI you cannot add multiple ports either, other than by using a range in the TCP Proxy field, such as 80-443.
However, multiple ports do seem to work: if you edit the Virtual Service in NSX-ALB and add the additional port, it is displayed correctly in vCD, where another TCP Proxy entry appears with an x to delete it, and it can be modified and saved. The UI just doesn't appear to have a button for adding additional TCP Proxy ports at this time, but the API seems to function correctly.
The API call vCD does to retrieve and display multiple ports has the following structure for servicePorts
https://VCD_FQDN/cloudapi/1.0.0/edgeGateways/urn:vcloud:gateway:GUID/loadBalancer/virtualServiceSummaries?page=1&pageSize=15&sortAsc=name&links=true
"servicePorts": [{
    "tcpUdpProfile": {
      "name": null,
      "type": "-",
      "systemDefined": null
    },
    "portStart": 80,
    "portEnd": 80,
    "sslEnabled": false
  }, {
    "tcpUdpProfile": {
      "name": null,
      "type": "-",
      "systemDefined": null
    },
    "portStart": 443,
    "portEnd": 443,
    "sslEnabled": false
  }
],
When I try to edit the port in vCD, though, the API returns an error stating: Edge Gateway k8s-edge can have multiple service ports only with additional licensing. Please contact your service provider.
None of this quite makes sense: AVI seems to support multiple ports under this license, yet vCD rejects the edit, so maybe this is not possible.
Cloud Director: 10.3.2
NSX-T: 3.1.3.1
AVI (NSX-ALB): 21.1.2 with Basic License
CCM: Built from main branch
1. Create a load balancer service in vCD with multiple port definitions
2. Observe that multiple virtual services are created in NSX-ALB when there should only be one
A single virtual service with multiple L4 ports is created, instead of multiple services.
It seems like a fairly fundamental feature for a single IP to be able to listen on multiple ports, especially when talking about HTTP and HTTPS as these cannot be on different IP addresses else a website may not work correctly.
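As a rough sketch of the requested behaviour, the ports of one Service could be folded into a single virtual service body with several `servicePorts` entries. The field names `portStart`, `portEnd` and `sslEnabled` come from the API response quoted above; the `virtualService` type, the `name` field and `buildVirtualService` are hypothetical, and VCD may still reject such a request depending on licensing, as the error message above suggests.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// servicePort mirrors the portStart/portEnd/sslEnabled fields from the
// virtualServiceSummaries response.
type servicePort struct {
	PortStart  int  `json:"portStart"`
	PortEnd    int  `json:"portEnd"`
	SSLEnabled bool `json:"sslEnabled"`
}

// virtualService is a hypothetical, trimmed-down request body.
type virtualService struct {
	Name         string        `json:"name"`
	ServicePorts []servicePort `json:"servicePorts"`
}

// buildVirtualService creates one virtual service carrying every port,
// instead of one virtual service per port.
func buildVirtualService(name string, ports []int) virtualService {
	vs := virtualService{Name: name}
	for _, p := range ports {
		vs.ServicePorts = append(vs.ServicePorts, servicePort{PortStart: p, PortEnd: p})
	}
	return vs
}

func main() {
	vs := buildVirtualService("ingress-vs-ingress-nginx-ingress-controller", []int{80, 443})
	out, err := json.MarshalIndent(vs, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```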
We hit a nil pointer exception because of this change: vmware/cluster-api-provider-cloud-director@3c10715#diff-59b60be5f954bd671a4e14145d5d90fa00ac1585211788e31632b9df9b84a8f7R122.
`ovdcNetwork.OrgVdc` is nil in our case.
The provider shouldn't crash.
Slack thread: https://kubernetes.slack.com/archives/C04JFT7GDGR/p1677700001396329
The organization vDC the network belongs to. This should be unset if the network is owned by a vDC Group.
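A defensive lookup along these lines would avoid the crash. This is only a sketch: the struct types are simplified stand-ins rather than the real swagger models, and the `OwnerRef` fallback field is an assumption about where the owner is recorded when `OrgVdc` is unset.

```go
package main

import (
	"errors"
	"fmt"
)

// entityRef is a simplified stand-in for a VCD entity reference.
type entityRef struct {
	Name string
	ID   string
}

// orgVdcNetwork is a simplified stand-in for the network model.
type orgVdcNetwork struct {
	Name     string
	OrgVdc   *entityRef // nil when the network is owned by a vDC Group
	OwnerRef *entityRef // assumed fallback; hypothetical in this sketch
}

// ownerName returns the owning vDC or vDC Group name without dereferencing
// a nil pointer, and returns an error instead of panicking.
func ownerName(n *orgVdcNetwork) (string, error) {
	switch {
	case n == nil:
		return "", errors.New("network is nil")
	case n.OrgVdc != nil:
		return n.OrgVdc.Name, nil
	case n.OwnerRef != nil:
		return n.OwnerRef.Name, nil
	default:
		return "", fmt.Errorf("network %q has neither OrgVdc nor OwnerRef set", n.Name)
	}
}

func main() {
	grouped := &orgVdcNetwork{Name: "net-1", OwnerRef: &entityRef{Name: "vdc-group-1"}}
	fmt.Println(ownerName(grouped))
}
```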
The Docker image is hosted at harbor-repo.vmware.com/vcloud/cloud-provider-for-cloud-director:main-branch.latest, but this is inaccessible.
Is harbor-repo.vmware.com GA? Or would it be better to use docker.io or quay.io instead? I just want to know whether this is a temporary issue or whether there is no uptime guarantee for that registry.
1. `curl harbor-repo.vmware.com`
No response
Hi, the image cloud-provider-for-cloud-director:main-branch.8946fef, which is referenced in manifests/cloud-director-ccm.yaml, was removed.
docker pull harbor-repo.vmware.com/vcloud/cloud-provider-for-cloud-director:main-branch.8946fef
Error response from daemon: unknown: artifact vcloud/cloud-provider-for-cloud-director:main-branch.8946fef not found
This causes the manifest to fail.
docker pull harbor-repo.vmware.com/vcloud/cloud-provider-for-cloud-director:main-branch.8946fef
The pull should complete successfully.
No response
When we try to install an ingress controller which fails to install due to reason such as
When we then try to uninstall the ingress or delete the service, the cloud controller manager does not delete the resources it created in vCloud Director, such as the AppPortProfile, pools, DNAT rules and virtual services.
It should clean up the created resources when the service is deleted, even if load balancer creation was unsuccessful.
No response
After I deleted the DNAT rule manually and then deleted the Kubernetes Service, the remaining resources (virtual service, LB pool, application port profile) were not deleted. Has anyone checked this test case?
Here is the log for a normal Service deletion:
I0706 08:36:07.291178 1 client.go:117] successfully refreshed all clients
I0706 08:36:07.305516 1 loadbalancer.go:313] Deleting virtual service [ingress-vs-hello-world-u4cpuwjd] and lb pool [ingress-pool-hello-world-u4cpuwjd]
I0706 08:36:07.305574 1 loadbalancer.go:325] Deleting loadbalancer for ports [[]vcdsdk.PortDetails{vcdsdk.PortDetails{Protocol:"TCP", PortSuffix:"http", ExternalPort:80, InternalPort:30520, UseSSL:false, CertAlias:""}}]
I0706 08:36:07.583971 1 gateway.go:75] Obtained Gateway [XPLAT-VPC_GW] for Network Name [vdc-group-v695hb3v] of type [NSXT_FLEXIBLE_SEGMENT]
I0706 08:36:12.921990 1 gateway.go:1375] Deleted virtual service [ingress-vs-hello-world-u4cpuwjd-http]
I0706 08:36:20.402882 1 gateway.go:910] Deleted loadbalancer pool [ingress-pool-hello-world-u4cpuwjd-http]
I0706 08:36:27.768517 1 gateway.go:702] Deleted DNAT rule [dnat-ingress-vs-hello-world-u4cpuwjd-http] on gateway [XPLAT-VPC_GW]
I0706 08:36:27.768537 1 gateway.go:706] Checking if App Port Profile [appPort_dnat-ingress-vs-hello-world-u4cpuwjd-http] in org [000023-xplat] exists
I0706 08:36:27.991429 1 gateway.go:735] Deleting App Port Profile [appPort_dnat-ingress-vs-hello-world-u4cpuwjd-http] in org [000023-xplat]
E0706 08:36:31.258996 1 controller.go:307] error processing service example/hello-world (will retry): failed to delete load balancer: Unable to delete load balancer for virtual-service [ingress-vs-hello-world-u4cpuwjd] and lb pool [ingress-pool-hello-world-u4cpuwjd]: [error when removing vip [103.160.79.18] from RDE: [error getting current vips: [error when getting defined entity: [403 Forbidden]]]]
I0706 08:36:31.259058 1 event.go:291] "Event occurred" object="example/hello-world" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to delete load balancer: Unable to delete load balancer for virtual-service [ingress-vs-hello-world-u4cpuwjd] and lb pool [ingress-pool-hello-world-u4cpuwjd]: [error when removing vip [103.160.79.18] from RDE: [error getting current vips: [error when getting defined entity: [403 Forbidden]]]]"
E0706 08:36:31.263249 1 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"hello-world.16ff3091f0806873", GenerateName:"", Namespace:"example", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Service", Namespace:"example", Name:"hello-world", UID:"87605dc2-c2c1-43a5-b35d-1a1b622950e0", APIVersion:"v1", ResourceVersion:"1119444", FieldPath:""}, Reason:"SyncLoadBalancerFailed", Message:"Error syncing load balancer: failed to delete load balancer: Unable to delete load balancer for virtual-service [ingress-vs-hello-world-u4cpuwjd] and lb pool [ingress-pool-hello-world-u4cpuwjd]: [error when removing vip [103.160.79.18] from RDE: [error getting current vips: [error when getting defined entity: [403 Forbidden]]]]", Source:v1.EventSource{Component:"service-controller", Host:""}, FirstTimestamp:time.Date(2022, time.July, 6, 8, 36, 31, 258970227, time.Local), LastTimestamp:time.Date(2022, time.July, 6, 8, 36, 31, 258970227, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "hello-world.16ff3091f0806873" is forbidden: unable to create new content in namespace example because it is being terminated' (will not retry!)
I0706 08:36:31.581393 1 client.go:61] Refreshing vcd client
I0706 08:36:31.581413 1 client.go:66] Is user sysadmin: [false]
I0706 08:36:32.477084 1 client.go:117] successfully refreshed all clients
I0706 08:36:32.712574 1 gateway.go:75] Obtained Gateway [XPLAT-VPC_GW] for Network Name [vdc-group-v695hb3v] of type [NSXT_FLEXIBLE_SEGMENT]
I0706 08:36:32.814538 1 event.go:291] "Event occurred" object="example/hello-world" kind="Service" apiVersion="v1" type="Normal" reason="DeletedLoadBalancer" message="Deleted load balancer"
Here is the log for a Service deletion after I had deleted the DNAT rule manually:
I0706 08:38:59.963452 1 client.go:61] Refreshing vcd client
I0706 08:38:59.963487 1 client.go:66] Is user sysadmin: [false]
I0706 08:39:01.103598 1 client.go:117] successfully refreshed all clients
I0706 08:39:01.496281 1 gateway.go:75] Obtained Gateway [XPLAT-VPC_GW] for Network Name [vdc-group-v695hb3v] of type [NSXT_FLEXIBLE_SEGMENT]
I0706 08:39:05.527633 1 event.go:291] "Event occurred" object="example/hello-world" kind="Service" apiVersion="v1" type="Normal" reason="DeletedLoadBalancer" message="Deleted load balancer"
1. Create a Kubernetes Service of type LoadBalancer
2. Delete the DNAT rule manually
3. Delete the Kubernetes Service of type LoadBalancer
Virtual service, Lb pool and app port profile are deleted after deleting svc k8s type Load balancer
No response
When nodes are added or removed, the TCP health check is removed from the load balancer in VCD.
When the deprecated machine IP gets removed from the LB, the TCP check is removed at the same time and not recreated.
The TCP check should remain at all times.
No response
We would like to add the node.kubernetes.io/instance-type label to nodes provisioned using CAPVCD, as other cloud providers do.
https://kubernetes.io/docs/reference/labels-annotations-taints/#nodekubernetesioinstance-type
We can use the sizing policy of the VCDMachine object as the instance type of the nodes. These labels can be added as metadata in the init or join configuration of the nodes.
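A minimal sketch of deriving the label from a sizing policy name. The helper names are hypothetical, and the sanitising step is an assumption: it is there because Kubernetes label values are limited to 63 characters drawn from alphanumerics, `-`, `_` and `.`, and must begin and end with an alphanumeric character, which a policy name may not satisfy.

```go
package main

import (
	"fmt"
	"strings"
)

const instanceTypeLabel = "node.kubernetes.io/instance-type"

// sanitizeLabelValue replaces characters that are not allowed in a label
// value with '-', trims leading/trailing separators and caps the length.
func sanitizeLabelValue(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9',
			r == '-', r == '_', r == '.':
			b.WriteRune(r)
		default:
			b.WriteRune('-')
		}
	}
	out := strings.Trim(b.String(), "-_.")
	if len(out) > 63 {
		out = strings.TrimRight(out[:63], "-_.")
	}
	return out
}

// instanceTypeLabels returns the label map for a node, or an empty map
// when no sizing policy is set.
func instanceTypeLabels(sizingPolicy string) map[string]string {
	labels := map[string]string{}
	if v := sanitizeLabelValue(sizingPolicy); v != "" {
		labels[instanceTypeLabel] = v
	}
	return labels
}

func main() {
	fmt.Println(instanceTypeLabels("TKG medium"))
}
```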
Check the original issue vmware/cluster-api-provider-cloud-director#536
No response
No response
We have a non-CSE cluster deployed in a network that doesn't allow direct connections to the VCD; instead we must use an HTTP/S proxy to connect. We have configured the vmware-cloud-director-ccm container to use the following configuration:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: proxy-config
  namespace: kube-system
data:
  HTTPS_PROXY: "http://<PROXY_IP_ADDRESS>:3128"
  HTTP_PROXY: "http://<PROXY_IP_ADDRESS>:3128"
  NO_PROXY: "localhost,127.0.0.1,10.20.172.10,<CLUSTER-SERVICE-CIDR-BLOCK>"
  http_proxy: "http://<PROXY_IP_ADDRESS>:3128"
  https_proxy: "http://<PROXY_IP_ADDRESS>:3128"
  no_proxy: "localhost,127.0.0.1,10.20.172.10,<CLUSTER-SERVICE-CIDR-BLOCK>"
---
Logging the HTTP requests and responses with the VCD reveals that the container is talking to the VCD, for example successfully listing networks from the VCD until a specific query times out and the binary exits:
F0613 11:05:16.950898 1 main.go:75] Cloud provider could not be initialized: [could not init cloud provider "vmware-cloud-director": failed to create GatewayManager: [error caching gateway related details: [unable to get OVDC network [<NETWORK_NAME>]: [unable to get all ovdc networks: [<nil>]: [Get "https://<VCD_FQDN>/cloudapi/1.0.0/orgVdcNetworks?page=1&pageSize=32": dial tcp <VCD_IP>:443: connect: connection timed out]]]]]
Capturing traffic on the control plane node of the cluster with tcpdump reveals that most traffic going to the VCD does so via the HTTP/S proxy, but then something tries to send a SYN packet to the VCD directly a couple of minutes before the timeout, tries retransmission six times and never receives a reply packet.
Configure the vmware-cloud-director-ccm container to use an HTTP/S proxy in an environment where direct connections to the VCD are not available.
All connections to the VCD should be made through the proxy.
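One possible cause, not confirmed by this report, is an HTTP client somewhere in the code path that is built with a custom `http.Transport` whose `Proxy` field is left unset. In Go's `net/http`, a transport with a nil `Proxy` connects directly, whereas `http.DefaultTransport` uses `http.ProxyFromEnvironment`. The sketch below shows the difference; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newBareClient builds a client with a custom transport and no Proxy
// function: it ignores HTTPS_PROXY / HTTP_PROXY and dials directly.
func newBareClient() *http.Client {
	return &http.Client{Transport: &http.Transport{}}
}

// newProxyAwareClient builds a client with a custom transport that still
// honours HTTPS_PROXY / HTTP_PROXY / NO_PROXY from the environment.
func newProxyAwareClient(timeoutSeconds int) *http.Client {
	return &http.Client{
		Timeout:   time.Duration(timeoutSeconds) * time.Second,
		Transport: &http.Transport{Proxy: http.ProxyFromEnvironment},
	}
}

// usesProxy reports whether the client's transport has a Proxy function.
// A nil Transport means http.DefaultTransport is used.
func usesProxy(c *http.Client) bool {
	rt := c.Transport
	if rt == nil {
		rt = http.DefaultTransport
	}
	tr, ok := rt.(*http.Transport)
	return ok && tr.Proxy != nil
}

func main() {
	fmt.Println("bare client consults proxy settings:", usesProxy(newBareClient()))
	fmt.Println("proxy-aware client consults proxy settings:", usesProxy(newProxyAwareClient(30)))
}
```

Auditing every place where the provider or its generated clients construct an `http.Transport` would show whether this explanation applies.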
No response
The CCM does not update the existing virtual service if the port name is updated in the LoadBalancer service. It reports an error saying there is an existing virtual service with the same IP and port combination.
It should update the existing virtual service, or delete the existing virtual service first and then create a new one.
No response
If a customer has configured an external IP range that doesn't have enough IPs, they can't create an extra load balancer by picking IPs from a different range.
Allow specifying which IP block to take an IP from when creating a service of type LoadBalancer.
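A small sketch of what picking an address from a user-specified block could look like, using the standard `net/netip` package. The function `firstFreeIP` and the idea of passing in the list of used addresses are assumptions for illustration; how the provider tracks used IPs is not shown in this issue. The sketch also treats every address in the block as usable, including the network address.

```go
package main

import (
	"fmt"
	"net/netip"
)

// firstFreeIP returns the first address in cidr that is not listed in used.
func firstFreeIP(cidr string, used []string) (string, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return "", fmt.Errorf("invalid CIDR %q: %w", cidr, err)
	}
	prefix = prefix.Masked() // start from the first address of the block

	taken := make(map[netip.Addr]bool, len(used))
	for _, u := range used {
		a, err := netip.ParseAddr(u)
		if err != nil {
			return "", fmt.Errorf("invalid used address %q: %w", u, err)
		}
		taken[a] = true
	}

	// Walk the block in order; Next returns the zero Addr past the end,
	// which Contains rejects, so the loop terminates.
	for a := prefix.Addr(); prefix.Contains(a); a = a.Next() {
		if !taken[a] {
			return a.String(), nil
		}
	}
	return "", fmt.Errorf("no free address in %s", cidr)
}

func main() {
	ip, err := firstFreeIP("192.168.8.0/29", []string{"192.168.8.0", "192.168.8.1"})
	fmt.Println(ip, err)
}
```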
No response
No response
As detailed on the README, when creating a k8s LoadBalancer service this will be created as HTTP / HTTPS, and HTTPS requires the creation of an SSL cert in vCD that is then applied to the virtual service.
However, if this is forwarding to a k8s ingress, it is likely that this ingress will serve many domains and be configured to use many certificates, so the configuration of a single certificate on the ALB virtual service is not required.
It would be better if the ALBs were L4 TCP only.
Option to create the ALB virtual service as L4 TCP, instead of HTTP/HTTPS with certificate.
Change the k8s service to NodePort instead of LoadBalancer. In this scenario no ALB config is added so it can be added manually.
However, it is actually faster to upload a cert, create the service, and then, once the virtual services and server pools have been added automatically, reconfigure the virtual services to be L4.
No response
If we create a LoadBalancer service in Kubernetes with port 80 and 443 - like the service created with https://projectcontour.io/quickstart/contour.yaml - the following happens:
2x ALB virtual services are created
2x ALB server pools are created
2x DNATs are created using the same public IP, that NAT through to the different IPs & ports used by the ALB virtual services, e.g.
We can then add the required firewall rule, e.g. allow all HTTP/HTTPS traffic to the ALB network 192.168.8.0/24 for example.
If the vCD Edge Gateway is backed by an NSX-T Tier-1 gateway, this will not work. Only one of the DNATs will work, e.g. HTTP will work but HTTPS will not. If we disable the HTTP DNAT, the HTTPS DNAT will start working.
If we connect directly to the 192.168.8.x IP of the ALB, both work.
The fix is to add applications to the NSX-T DNAT rules, e.g. HTTP and HTTPS as per the screenshot.
Ideally these would be added during the creation of the DNAT rule.
This was seen with vCD 10.3.2 and NSX-T 3.1.3.5.0.
1. Create k8s Loadbalancer service using vCD/NSX-T ALB/NSX-T Tier-1 Edge. DNATs will be created using a single public IP but different internal IPs/ports.
2. Test traffic direct to both ALB virtual services on their private IPs - OK
3. Test traffic to public DNAT IP - only one works. Disable HTTP DNAT - HTTPS will work.
Traffic to DNATs using a single public IP that is directed to different internal IPs/ports should work.
It works if applications are added to the DNATs via vCD.
Ideally these should be added during the automated creation of the rules.
No response
Some workloads exposed via ingress require the source IP to be the real one from the client.
Check annotation on service LB for a flag to enable Preserve Client IP.
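A sketch of the annotation check, assuming the flag is read from the Service's annotations map. The annotation key below is purely hypothetical (the issue does not name one), as is the helper `preserveClientIP`.

```go
package main

import (
	"fmt"
	"strconv"
)

// preserveClientIPAnnotation is a hypothetical key used only in this sketch.
const preserveClientIPAnnotation = "example.com/preserve-client-ip"

// preserveClientIP reports whether the annotation is present and parses as
// a true boolean; missing or malformed values default to false.
func preserveClientIP(annotations map[string]string) bool {
	v, ok := annotations[preserveClientIPAnnotation]
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(v)
	return err == nil && enabled
}

func main() {
	svcAnnotations := map[string]string{preserveClientIPAnnotation: "true"}
	fmt.Println("preserve client IP:", preserveClientIP(svcAnnotations))
}
```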
No response