kubernetes / cloud-provider

cloud-provider defines the shared interfaces which Kubernetes cloud providers implement. These interfaces allow various controllers to integrate with any cloud provider in a pluggable fashion. Also serves as an issue tracker for SIG Cloud Provider.

License: Apache License 2.0

Go 100.00%
k8s-sig-cloud-provider k8s-staging

cloud-provider's Issues

Should limit the LoadBalancing rule's resource name to less than 80 characters

When the cloud provider creates an LB, the LB rule's resource name follows the pattern a7600d690dbdf11e9955da6e322bb002-[subnet name]-TCP-20201. If the subnet name is long, the rule's resource name exceeds 80 characters and the Azure NRP fails the LB creation with the following error. The cloud provider should limit the LoadBalancing rule's resource name to less than 80 characters.

E0920 15:42:22.281115 1 azure_backoff.go:565] processHTTPRetryResponse: backoff failure, will retry, err=network.LoadBalancersClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="InvalidResourceName" Message="Resource name a3c276d5bdbbd11e9a0ab065512a4bb2-ms-72164-rice-vmn-eastus2-n-services-subnet-TCP-20201 is invalid. The name can be up to 80 characters long. It must begin with a word character, and it must end with a word character or with ''. The name may contain word characters or '.', '-', ''." Details=[]
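
A minimal sketch of one way to enforce the limit, assuming nothing about the Azure provider's actual naming code: when the full name would be too long, replace the subnet portion with a short hash so the name stays unique but short. The helper name and parameters below are illustrative.

package rulename

import (
	"crypto/sha256"
	"fmt"
)

const maxRuleNameLen = 80

// safeRuleName builds the usual <cluster>-<subnet>-<proto>-<port> name, falling
// back to a hashed subnet segment when the result would exceed 80 characters.
func safeRuleName(clusterPrefix, subnet, proto string, port int) string {
	name := fmt.Sprintf("%s-%s-%s-%d", clusterPrefix, subnet, proto, port)
	if len(name) <= maxRuleNameLen {
		return name
	}
	h := sha256.Sum256([]byte(subnet))
	short := fmt.Sprintf("%x", h[:4]) // 8 hex characters stand in for the long subnet name
	name = fmt.Sprintf("%s-%s-%s-%d", clusterPrefix, short, proto, port)
	if len(name) > maxRuleNameLen {
		name = name[:maxRuleNameLen]
	}
	return name
}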

Decoupling Cloud Providers from Kubernetes e2e testing framework

In the past few releases we've been focused on migrating the in-tree cloud providers (k8s.io/kubernetes/pkg/cloudprovider/providers) out-of-tree, but we've neglected the providers that are part of the e2e framework (https://github.com/kubernetes/kubernetes/tree/master/test/e2e/framework/providers), which also need to be removed before we can stop vendoring cloud SDKs. There's a lot of refactoring needed in the e2e framework before this is possible, so this will likely take a few releases.

This is mainly just a tracking issue for kubernetes/kubernetes#75604 & kubernetes/kubernetes#75601.

cc @timothysc @stevesloka @neolit123 @pohly

Standardize the Cloud Controller Manager Build/Release Process

Right now each provider is building/releasing the external cloud controller manager in its own way. It might be beneficial to standardize this going forward, or at least set some guidelines on what is expected from a cloud controller manager build/release.

Some questions to consider:

  • What should a CCM release include? Docker image? Binaries? Source Code?
  • What base images are acceptable for a CCM build? Does it even matter?

We've had this discussion multiple times at KubeCons and on SIG calls; it would be great to get some of those ideas vocalized here and formalized in a doc going forward.

cc @cheftako @jagosan @hogepodge @frapposelli @yastij @dims @justaugustus

Handle volume scheduling when nodes are shutdown

Currently, if a node gets shut down, pods using volumes don't get rescheduled, since we don't know whether the volumes are still being used.

Two solutions:

  • Create a flow that has an interlock between the node lifecycle controller, taintManager, PodGC, attach_detach_controller and kubelet; one of the drawbacks of this solution is that we need to tie taint removal to finishing the eviction (more specifically, to finishing the volume detach).

  • Rely on the nodeReadiness gate and let the cloud provider implement a node shutdown condition, then act upon it to detach the volume and remove it (a taint-based variant is sketched below); one of the drawbacks of relying on a condition is that conditions cannot be tolerated the way taints can.
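
A rough sketch of the signal such a flow could key off today, assuming the cloud node lifecycle controller's shutdown taint is the marker; the helper below is illustrative, not existing controller code.

package shutdowncheck

import v1 "k8s.io/api/core/v1"

// Taint applied by the cloud node lifecycle controller when the cloud provider
// reports the instance as shut down.
const shutdownTaintKey = "node.cloudprovider.kubernetes.io/shutdown"

// nodeIsShutdown reports whether the node carries the shutdown taint; the
// attach/detach flow could require this before force-detaching volumes.
func nodeIsShutdown(node *v1.Node) bool {
	for _, t := range node.Spec.Taints {
		if t.Key == shutdownTaintKey {
			return true
		}
	}
	return false
}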

cc'ing folks involved for thoughts @smarterclayton @liggitt @andrewsykim @jingxu97 and @yujuhong

/assign

Finalizer Protection for Service LoadBalancers

This is mainly a SIG Cloud Provider tracking/backlog issue; more details on the problem are in kubernetes/kubernetes#53451.

We have two (stale) PRs open to add finalizer protection support for cloud provider LBs:
kubernetes/kubernetes#54569
kubernetes/kubernetes#65912

We have users who have reported that their cloud LBs are not being deleted when they delete the corresponding Service. Adding finalizer protection would ensure the Service resource is not fully deleted until the corresponding LB is also deleted.
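
A minimal sketch of the protection being proposed: put a finalizer on the Service before the LB is provisioned and strip it only once the cloud LB is confirmed gone. The helpers are illustrative; the finalizer key shown is the one upstream eventually settled on (service.kubernetes.io/load-balancer-cleanup).

package lbfinalizer

import v1 "k8s.io/api/core/v1"

const lbCleanupFinalizer = "service.kubernetes.io/load-balancer-cleanup"

// addFinalizer is called before the cloud LB is created so the Service cannot
// be fully deleted while the LB still exists.
func addFinalizer(svc *v1.Service) {
	for _, f := range svc.Finalizers {
		if f == lbCleanupFinalizer {
			return
		}
	}
	svc.Finalizers = append(svc.Finalizers, lbCleanupFinalizer)
}

// removeFinalizer is called only after the cloud LB has been deleted, letting
// the API server finish deleting the Service.
func removeFinalizer(svc *v1.Service) {
	kept := svc.Finalizers[:0]
	for _, f := range svc.Finalizers {
		if f != lbCleanupFinalizer {
			kept = append(kept, f)
		}
	}
	svc.Finalizers = kept
}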

Remove unnecessary flags in cloud-controller-manager

There's a bunch of unnecessary flags that were added to the cloud-controller-manager when it was copied over from the kube-controller-manager. We should remove (or deprecate) those flags, specifically flags associated with a single provider (e.g. --cloud-provider-gce-lb-src-cidrs).

cc @timoreimann

Investigate API throttling in routes API calls

The Azure provider implemented caching in front of its routes interface to overcome API rate limiting (the routes controller is extremely aggressive in how it calls APIs). We should investigate whether this is a common problem across providers (especially at larger scale) and come up with a common solution if it makes sense.

ref: kubernetes/kubernetes#60646
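
A hedged sketch of the kind of read-through cache that could be shared across providers, assuming a wrapper around the existing cloudprovider.Routes interface; the type and TTL handling below are illustrative, not an existing cloud-provider API.

package routecache

import (
	"context"
	"sync"
	"time"

	cloudprovider "k8s.io/cloud-provider"
)

// cachedRoutes serves ListRoutes from a local copy for a short TTL so the
// routes controller's frequent reconciles don't translate 1:1 into cloud calls.
type cachedRoutes struct {
	cloudprovider.Routes
	mu      sync.Mutex
	ttl     time.Duration
	fetched time.Time
	routes  []*cloudprovider.Route
}

func (c *cachedRoutes) ListRoutes(ctx context.Context, clusterName string) ([]*cloudprovider.Route, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.routes != nil && time.Since(c.fetched) < c.ttl {
		return c.routes, nil
	}
	routes, err := c.Routes.ListRoutes(ctx, clusterName)
	if err != nil {
		return nil, err
	}
	c.routes, c.fetched = routes, time.Now()
	return routes, nil
}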

Support of instance metadata service

Before the CCM, the kubelet supported getting Node information from the cloud provider's instance metadata service. This includes:

• NodeName
• ProviderID
• NodeAddresses
• InstanceType
• AvailabilityZone

An instance metadata service helps reduce API throttling issues and speeds up node initialization. This is especially helpful for large clusters.

But with the CCM this is no longer possible, because the above functionality has been moved to the cloud controller manager. We should add this back into the kubelet.

Since cloud providers are moving out-of-tree, we may need to add new plugins to the kubelet, e.g. via gRPC.
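
Purely as a strawman for discussion (none of these names exist in the kubelet or in this library today), the plugin contract could be as small as a single metadata lookup:

package metadataplugin

import "context"

// InstanceMetadata mirrors the fields the kubelet used to obtain from the
// in-tree cloud provider at node registration time.
type InstanceMetadata struct {
	NodeName      string
	ProviderID    string
	NodeAddresses []string
	InstanceType  string
	Zone          string
}

// Provider is the contract a cloud-specific plugin (in-process or behind gRPC)
// would implement for the kubelet.
type Provider interface {
	GetInstanceMetadata(ctx context.Context) (*InstanceMetadata, error)
}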

See #14 (#14 is focused on the node controller, while this one is focused on cloud API throttling).

@andrewsykim @justaugustus @craiglpeters Any ideas on this?

Nodes are being registered multiple times in the cloud-controller-manager

In some situations, the cloud-controller-manager tries to register the same node twice. I need to dig into this a bit more, but my guess is that we are receiving a 2nd update event on a node that is currently being registered.

I think one reasonable solution is to store a map of nodes that are currently being registered and skip registration if the node exists in the map. Delete the entry from the map after the node finishes registration so it can be registered again later if desired.
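
A small sketch of that de-duplication, assuming nothing about the controller's actual fields; the type and method names are illustrative.

package nodereg

import "sync"

// registrar tracks nodes whose registration is currently in flight.
type registrar struct {
	mu         sync.Mutex
	inProgress map[string]bool
}

// tryBegin returns false if a registration for this node is already running,
// so the caller can skip the duplicate event.
func (r *registrar) tryBegin(nodeName string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.inProgress == nil {
		r.inProgress = make(map[string]bool)
	}
	if r.inProgress[nodeName] {
		return false
	}
	r.inProgress[nodeName] = true
	return true
}

// finish removes the node so a later, genuine re-registration can proceed.
func (r *registrar) finish(nodeName string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.inProgress, nodeName)
}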

Typically processing control loops twice is not an issue due to the idempotent nature of controllers, but node registration for a cloud provider can become pretty expensive so there may be some value in optimizing this. @tghartland has kindly left a detailed report of this in kubernetes/kubernetes#75285.

Investigate usage/requirements for ClusterID

In kubernetes/kubernetes#48954 & kubernetes/kubernetes#49215 we made ClusterID a requirement, and added a flag --allow-untagged-cloud on the kube-controller-manager. The intention there was to allow clusters to get away with not setting ClusterID for a few releases but eventually make it a requirement. It seems we never followed through with cleaning up the --allow-untagged-cloud flag.

More interestingly, it's not exactly clear how ClusterID is being consumed by both in-tree and out-of-tree cloud providers. It seems it's critical to AWS/GCE but not really used by others. Do we still need ClusterID? Should we use a more generic approach with labels/annotations? If we need it, should we go ahead and remove the --allow-untagged-cloud flag?

If the plan is to continue to support ClusterID, we should at least add better documentation for how this works.

cc @justinsb @rrati

Consuming cloud-provider appears problematic in 1.20

I've just upgraded the Brightbox cloud-controller to 1.20 and, in common with the AWS updates, I found the new interfaces into the library to be problematic, and the sample 'main' code did not work as I'd expected.

Primarily, the call to s.Config appears to happen too early: the command-line flags have not been enumerated at that point, so version and help don't work, and neither do the command-line flags.

I had to rewrite the top level interface along the lines of the AWS controller and move the call to s.Config back within the Cobra command function.

Of course it's possible I've completely misunderstood how this interface is supposed to work in its new configuration and if anybody can explain that I'd be grateful.

The code I ended up with is here: https://github.com/brightbox/brightbox-cloud-controller-manager/blob/8acb44fc63a74ca14c96daec039a4d2473881e52/app/cloudcontroller.go

Which is based on the AWS main file here: https://github.com/kubernetes/cloud-provider-aws/blob/3b384bb6e144446cb8015ab83269d6f99ac00898/cmd/aws-cloud-controller-manager/main.go

Document migration steps to CCM

We should document how a user would manually migrate their clusters from using in-tree cloud providers to out-of-tree cloud provider. The documented steps can be manual or via a tool like kubeadm.

Move cmd/cloud-controller-manager and its controllers to k8s.io/cloud-provider (staging)

Right now, out-of-tree providers end up vendoring a lot of k8s.io/kubernetes because they need to import cmd/cloud-controller-manager. cmd/cloud-controller-manager should be easier to consume, ideally living in its own repository or moved to k8s.io/cloud-provider.

Similar to what we did for the in-tree providers (k8s.io/kubernetes/pkg/cloudprovider/providers), we need to start pruning some of the internal dependencies of cmd/cloud-controller-manager as a starting point.

Define requirements for cloud config

Each provider is doing something different with cloud config, and we don't have a consistent story for what we expect providers to do with it. We should have a discussion with the necessary stakeholders (cloud providers, API reviewers, etc.) to better define what is expected for cloud config (future plans, deprecation policy, backwards compatibility, etc.). Maybe the way things are today is fine (letting the provider define how cloud config is used), but we should at least document this expectation better.

Backoff not respected due to resync

I’ve been investigating an issue in our custom controller-manager where it appears that the backoff settings from service.Controller (exponential from 5s to 5m) weren’t respected. Our backend was timing out, and the controller retried soon after (within 30s), which compounded the issue.

I believe that this is due to the resync interval:

Down in processNextWorkItem(), at the end of a failed attempt, the controller will:

  1. Schedule the service to be re-enqueued after backoff. (s.queue.AddRateLimited(key))
  2. Remove the service from the queue. (s.queue.Done(key))
  3. Wait for the backoff interval to elapse.
  4. Re-enqueue the service.

However, if a resync happens, then service.Controller re-enqueues it without backoff. Because of the way that DelayingQueue works, an item is not (as far as I can tell) considered to be queued during (3), so the service key is not deduplicated as it would be for consecutive calls to Add().

The end result is that, despite maxRetryDelay of 5 minutes, service.Controller will never wait more than 30s before its next attempt to sync services.

There is a separate, related issue in kubernetes/client-go#131. It’s sort of the reverse. Here, I think that the call to AddRateLimited() should cancel a future call to Add(). There, the author wants a call to Add() to cancel a past call to AddRateLimited().
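
One possible mitigation, sketched here as an assumption rather than existing service.Controller code: track which keys are currently backing off, and on resync re-enqueue those keys through the rate limiter instead of with a plain Add(), so the accumulated delay is preserved.

package servicesync

import (
	"sync"

	"k8s.io/client-go/util/workqueue"
)

type resyncGuard struct {
	queue      workqueue.RateLimitingInterface
	mu         sync.Mutex
	backingOff map[string]bool // keys whose last sync attempt failed
}

// enqueueOnResync is what the resync path would call instead of queue.Add.
func (g *resyncGuard) enqueueOnResync(key string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.backingOff[key] {
		// Keep the key behind its backoff delay instead of scheduling it
		// immediately and defeating the rate limiter.
		g.queue.AddRateLimited(key)
		return
	}
	g.queue.Add(key)
}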

Investigate support for multiple route tables

Problem Description

Many clouds support multiple route tables within a VPC, but the CCM cloud interface has poor support for them. Consider the ListRoutes() interface: it is unclear which route table's entries ListRoutes() should return.
For example, suppose RouteTableA has route entries [1,2,3,4] and RouteTableB has route entries [2,3,4,5].
What should ListRoutes() return? Neither [1,2,3,4,5] nor [2,3,4] is good. For [1,2,3,4,5], CreateRoute() would not be called for entries [1] and [5], even though it should be.
For [2,3,4], DeleteRoute() would not be called for a similar reason.

Potential Solutions

  1. Modify the existing interface to ListRoutes(tableID string) and add an AllRouteTables() []Table interface (as sketched after this list).

  2. Have ListRoutes() return one route table at a time (rotating through them); reconciliation would ensure the route tables eventually become consistent.
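
A sketch of how option 1 could look as Go types; these signatures are the proposal from this issue, not part of the current cloudprovider.Routes API.

package multiroute

import (
	"context"

	cloudprovider "k8s.io/cloud-provider"
)

// RouteTable identifies one cloud route table associated with the cluster's VPC.
type RouteTable struct {
	ID   string
	Name string
}

// MultiTableRoutes scopes route operations to an explicit route table.
type MultiTableRoutes interface {
	// AllRouteTables lists every route table the route controller should reconcile.
	AllRouteTables(ctx context.Context) ([]RouteTable, error)
	// ListRoutesInTable returns only the routes belonging to the given table.
	ListRoutesInTable(ctx context.Context, clusterName, tableID string) ([]*cloudprovider.Route, error)
}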

Better cloud LB names

LBs provisioned by Kubernetes on any cloud provider use auto-generated names based on the Service's UUID (e.g. a44e18e4c552b11e683bb02fff13e176), which are not very human-friendly. With kubernetes/kubernetes#66589 merged, it should be doable to have each provider set LB names based on its own naming requirements.

SIG CP tracking issue for kubernetes/kubernetes#69293.

This will need a KEP reviewed by SIG network & cloud-provider.

Extracting/Migrating the Credential Provider: KEP + Alpha Implementation

As part of the cloud provider extraction/migration, we should start to look into how the credential provider is going to be extracted so that the kubelet does not rely on cloud SDKs for image-pulling credentials, and so that future credential providers can be supported without adding them to the main tree.

We need to work with SIG Auth and propose a KEP to extract/migrate credential providers out-of-tree.

related: kubernetes/kubernetes#68810

cc @justinsb @mcrute

Removing cloud provider dependencies on k8s.io/kubernetes

Duplicating kubernetes/kubernetes#69585 to track milestones more easily.

As part of a long-running initiative to move cloud providers out of kubernetes/kubernetes, it's required to remove dependencies on kubernetes/kubernetes so we can place them into a staging directory. The following dependencies need to be removed from k8s.io/kubernetes/pkg/cloudprovider/providers:

Dependency checklist:

Module support for cloud-provider library

Updating the cloud provider libraries is a bit of a chore.

We really should be able to pull in all the updated module dependencies with

go get k8s.io/[email protected]

rather than delving around in the dependency libraries.

Pulling in the staged libraries is particularly difficult as there is no dependency information between them. A go.mod containing those would be a start.

I use the following script as a workaround for now:

#!/bin/sh
# Bump every staged Kubernetes dependency to the tags matching the supplied version.
[ $# -eq 1 ] || { echo "Supply new version number" >&2; exit 1; }

go get k8s.io/kubernetes@v$1 \
	k8s.io/cloud-provider@kubernetes-$1 \
	k8s.io/api@kubernetes-$1 \
	k8s.io/apimachinery@kubernetes-$1 \
	k8s.io/apiserver@kubernetes-$1 \
	k8s.io/apiextensions-apiserver@kubernetes-$1 \
	k8s.io/csi-api@kubernetes-$1 \
	k8s.io/kube-controller-manager@kubernetes-$1 \
	k8s.io/client-go@kubernetes-$1
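
Assuming the script is saved as something like update-deps.sh, invoking it as ./update-deps.sh 1.20.0 bumps every listed module to its kubernetes-1.20.0 tag (and k8s.io/kubernetes itself to v1.20.0) in one step.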

Allow NotReady nodes to take part in load balancing

This code says that only Ready nodes will be used when configuring cloud load balancers for a service, which means that Kubernetes control-plane problems may lead to workload outages. If for any reason the master sees nodes as NotReady (connectivity problems between the master and nodes, kubelet network issues like this one, cloud firewall misconfiguration, anything), such nodes will be removed from load balancing. It seems to me that if such problems occur in the cluster, the more stable behaviour would be to keep the load balancer targets as they are and let the load balancer rely on kube-proxy health checks (default or service-specific) to determine which nodes are ready to serve traffic.

kube-controller-manager -> cloud-controller-manager HA migration: KEP + alpha implementation

We need a KEP outlining how we intend to migrate existing clusters from using the kube-controller-manager to the cloud-controller-manager for the cloud provider specific parts of Kubernetes.

At KubeCon NA 2018 we discussed grouping the existing cloud controllers under one leader election that is shared by the kube-controller-manager and the cloud-controller-manager. For single-node control planes this is not needed, but for HA control planes we need a mechanism to ensure that no more than one kube-controller-manager or cloud-controller-manager is running the set of cloud controllers in a cluster.

Stage all in-tree cloud providers

Blocked on #1 & kubernetes/kubernetes#69585.

Phase 1 of removing in-tree cloud providers is to stage them and publish them to their respective out-of-tree repositories. See KEP-removing-in-tree-providers for more details.

Note that when we stage the providers, we actually only want to stage a subdirectory which acts as the provider package imported by the cloud-controller-manager and kube-controller-manager. For example, we want to move k8s.io/kubernetes/pkg/cloudprovider/providers/gce to k8s.io/kubernetes/staging/src/k8s.io/cloud-provider/gce/provider. Once that move is complete, we would publish k8s.io/kubernetes/staging/src/k8s.io/cloud-provider/gce/provider to k8s.io/cloud-provider-gce/provider. This allows owners of k8s.io/cloud-provider-gce to continue to develop other parts of the repository as long as the provider package is left untouched and developed through k8s.io/kubernetes. This is required because many providers already support out-of-tree providers, so we need a way to opt into only syncing the provider code without overwriting the entire repository. Some updates will be required to the publishing bot; see kubernetes/publishing-bot#156 for more details.

Outdated services may be sent in UpdateLoadBalancer() interface

Refer to the following code:

func (s *Controller) nodeSyncInternal(workers int) {
	startTime := time.Now()
	defer func() {
		latency := time.Since(startTime).Seconds()
		klog.V(4).Infof("It took %v seconds to finish nodeSyncInternal", latency)
		nodeSyncLatency.Observe(latency)
	}()

	if !s.needFullSyncAndUnmark() {
		// The set of nodes in the cluster hasn't changed, but we can retry
		// updating any services that we failed to update last time around.
		s.servicesToUpdate = s.updateLoadBalancerHosts(s.servicesToUpdate, workers)
		return
	}
	klog.V(2).Infof("Syncing backends for all LB services.")

	// Try updating all services, and save the ones that fail to try again next
	// round.
	s.servicesToUpdate = s.cache.allServices()
	numServices := len(s.servicesToUpdate)
	s.servicesToUpdate = s.updateLoadBalancerHosts(s.servicesToUpdate, workers)
	klog.V(2).Infof("Successfully updated %d out of %d load balancers to direct traffic to the updated set of nodes",
		numServices-len(s.servicesToUpdate), numServices)
}

When updateLoadBalancerHosts() fails, the failed services are saved locally in servicesToUpdate and consumed on the next retry. But in the meantime those services may be updated by clients, so the service specs held in servicesToUpdate can be outdated, and the wrong configuration may be applied when cloud providers use those stale specs to reconcile the load balancer.
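
A hedged sketch of the fix direction: keep only keys in the retry set and re-read the Service from the informer lister right before calling the cloud provider, so retries always act on the latest spec. The type and its fields below are stand-ins, not actual service.Controller code.

package staleretry

import (
	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	listers "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/tools/cache"
)

// staleSyncer stands in for the relevant parts of the service controller.
type staleSyncer struct {
	serviceLister listers.ServiceLister   // fed by the shared informer, always current
	ensureHosts   func(*v1.Service) error // assumed helper that reconciles the cloud LB
}

func (s *staleSyncer) syncService(key string) error {
	namespace, name, err := cache.SplitMetaNamespaceKey(key)
	if err != nil {
		return err
	}
	// Re-read the Service so the retry uses the freshest spec rather than the
	// copy cached when the previous attempt failed.
	svc, err := s.serviceLister.Services(namespace).Get(name)
	if apierrors.IsNotFound(err) {
		return nil // deleted in the meantime; nothing to update
	}
	if err != nil {
		return err
	}
	return s.ensureHosts(svc)
}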

/kind bug

Cloud controllers should not reuse names and cluster roles of KCM controllers

Currently, a cluster operator can run their external cloud controller manager with --use-service-account-credentials enabled or disabled. If the flag is enabled, then each controller (control loop) gets its own service account token, using a hardcoded name (like "node-controller"). This is the name used for the service account, so it also needs to be present in the role binding for that controller. Currently, cloud providers may not have taken the time to recreate each controller's role and role binding separately from the upstream roles, and are (at least in AWS's case) piggybacking on the upstream roles. We want to move to custom cluster roles and role bindings, and it might make sense to deprecate the upstream roles eventually to reduce confusion.

I propose we differentiate the names of the controllers once they move to the external CCM, so that they can be distinguished in audit logs and from similar loops in the KCM (e.g., "node-controller" currently refers to three separate controllers: the cloud-node, the cloud-node-lifecycle, and the in-tree node-lifecycle controllers, and maybe node-ipam too).

If we decided to do this, we could allow these names to be plumbed in from each cloud provider repository, so they can be cloud provider specific, or we could all agree on something like cloud-*-controller.

PV Admission breaks when external provider's CloudConfig diverges

cloud-provider-aws recently added a new NodeIPFamilies field to CloudConfig. This was not added to the in-tree implementation.

PV admission breaks on a cluster with the new CloudConfig field set along with these flags set on kube-controller-manager and kube-apiserver: --cloud-provider=external --cloud-config=/path/to/cloud.config

persistentvolumes "aws-" is forbidden: error querying AWS EBS volume aws://eu-west-1a/vol-0ee079da636e6e7d8: unable to read AWS cloud provider config file: warnings: can't store data at section "global", variable "NodeIPFamilies"
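
For illustration only (field placement assumed from cloud-provider-aws; real files will differ per cluster), a cloud.config like the following parses fine in the external CCM but is rejected by the in-tree gcfg-based parser, which refuses keys it does not know:

[Global]
Zone = eu-west-1a
NodeIPFamilies = ipv6
NodeIPFamilies = ipv4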

This can be seen in the In-Tree Volumes test failures in kops' IPv6 tests: job output and pod definitions for kube-controller-manager, kube-apiserver, and aws-cloud-controller-manager. Pod logs are also available here.

I wasn't sure where to open this issue, but I'm wondering how this should be handled. Should we not set the apiserver's --cloud-config? Add an equivalent dummy field in-tree so that cloud.config can be parsed? Create a different cloud.config file for in-tree components vs the external CCM? Or can the PV admission plugin be replaced or removed?

From searching for CloudConfigFile in k/k, PV admission seems to be the only use of the apiserver's --cloud-config flag. #4 discusses PV admission, but not in the context of the external migration.

--cloud-provider cannot be empty

Using one of the sample main files results in the error --cloud-provider cannot be empty and an exit status of 1. Specifying this flag doesn't change anything.

Investigate API Throttling in Node Controller

ref: kubernetes/kubernetes#75016

For large clusters, we're seeing API throttling from providers become more common. Taking the node controller as an example, it makes a "get node" API request per node on every sync loop. For a 1000-node cluster that could be 1000 GET requests per minute, which can result in users running out of API quota.
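
One mitigation worth sketching, with the provider-side capability assumed rather than taken from any existing interface: list all instances once per sync loop and index them locally, instead of issuing one call per node.

package nodesync

import "context"

// Instance is a minimal stand-in for whatever a provider returns per VM.
type Instance struct {
	ProviderID string
	Type       string
	Zone       string
}

// instanceLister is an assumed provider capability: one (paginated) List call
// per sync loop instead of N Get calls.
type instanceLister interface {
	ListAll(ctx context.Context) ([]Instance, error)
}

// buildIndex turns a single List response into a lookup map the node controller
// can consult for every node without further API calls.
func buildIndex(ctx context.Context, l instanceLister) (map[string]Instance, error) {
	instances, err := l.ListAll(ctx)
	if err != nil {
		return nil, err
	}
	idx := make(map[string]Instance, len(instances))
	for _, inst := range instances {
		idx[inst.ProviderID] = inst
	}
	return idx, nil
}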

Allow discovering node changes for load-balancer targets more frequently

The nodeSyncPeriod in service_controller.go defines the interval at which changes in nodes (additions, removals) are discovered for the purpose of updating a load balancer's target node set. It is currently hard-coded to 100 seconds and defined as a constant, which means an update in a node pool can take up to 100 seconds to be reflected in a cloud load balancer.

I'd like to explore opportunities to reduce latency at which node changes can propagate to load-balancers.

option for cloud-provider specific tags

Can we have the option for a CCM implementation to add cloud-provider-specific tags to nodes when they are created?

NOTE: This is the result of a Slack discussion in the cloud-provider channel here and the follow-on thread with @andrewsykim here.

The way I had envisioned this is to extend the Instances interface as follows:

	// InstanceTags returns a map of cloud-provider specific tags for the specified instance.
	// May be called multiple times. The keys of the map always will be prefixed with the
	// name of the cloud provider as "cloudprovider.kubernetes.io/<providername>/".
	InstanceTags(ctx context.Context, name types.NodeName) (map[string]string, error)
	// InstanceTagsByProviderID returns a map of cloud-provider specific tags for the specified instance.
	// May be called multiple times. The keys of the map always will be prefixed with the
	// name of the cloud provider as "cloudprovider.kubernetes.io/<providername>/".
	InstanceTagsByProviderID(ctx context.Context, providerID string) (map[string]string, error)

These would be idempotent, similar to (most of) the other functions in the interface, e.g. InstanceType() or InstanceTypeByProviderID().

Also note that the tags would have their keys prefixed with a cloud-provider-specific prefix, so as to prevent returned tags from clobbering anything the user has set, or Kubernetes-native tags. I picked cloudprovider.kubernetes.io/<providername>/ but anything will do.

@andrewsykim raised the valid issue that this might lead to all users of a CCM getting all of the tags. I think this is a problem for each CCM provider to solve in its own way. The cloud-provider implementation here would give each CCM the option to add tags; each CCM implementor would choose how to handle it: some would never add tags, because their user base wouldn't want it; others would always add tags, because their user base does; others would have tags, but controllable via config, whether CLI or env var options in the manifest that deploys the CCM, a CCM-controlling ConfigMap or some other mechanism. The key point is to create the option for each provider.

@andrewsykim also raised a possible alternate option, which may be complementary, specifically that we have an "add node" hook, also likely under Instances interface that would pass the node definition and allow the CCM to do whatever it needs with the definition of the node in kubernetes. This has the same tag (and other) issues as above, which could be involved in the same way. It is less idempotent, but has more options for CCM control of the node addition.

We could do both.

Finally, the implementation of this in cloud-provider is fairly straightforward. We would add the two funcs to the Instances interface, as above, and then extend getNodeModifiersFromCloudProvider to apply the tags as modifiers, see here.
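
A sketch of that last wiring step, with the surrounding plumbing assumed; the tags are written as annotations here purely for illustration (whether they should be labels or annotations, and how keys are validated, are open questions):

package tagmodifier

import v1 "k8s.io/api/core/v1"

// nodeModifier mirrors the func(*v1.Node) pattern used when the cloud node
// controller builds up changes to apply to a registering Node.
type nodeModifier func(*v1.Node)

// tagModifier copies the provider-returned tag map onto the Node. Keys are
// expected to already carry the provider-specific prefix discussed above.
func tagModifier(tags map[string]string) nodeModifier {
	return func(n *v1.Node) {
		if n.Annotations == nil {
			n.Annotations = map[string]string{}
		}
		for k, v := range tags {
			n.Annotations[k] = v
		}
	}
}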

Looking forward to comments and feedback.
