kubernetes-sigs / aws-ebs-csi-driver
CSI driver for Amazon EBS https://aws.amazon.com/ebs/
License: Apache License 2.0
The fsType format depends on the mount utility.
The supported file system types are whatever mkfs supports:
After KubeletPluginsWatcher graduated to beta [1], this feature is enabled by default.
Basically, it replaces the older driver registration mechanism, where the sidecar container used to register the driver in the Kubelet. With KubeletPluginsWatcher enabled, this registration is done by the Kubelet itself.
As a result, the current sample manifests no longer work.
[1] https://github.com/kubernetes/kubernetes/pull/68200/files
/kind bug
/assign
Add to list here: https://kubernetes-csi.github.io/docs/Drivers.html
We need to host the driver in a container registry. Currently it's being stored in my personal account [1], but it'd be better to store it in its own account.
[1] https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/Makefile#L1
Currently, the attacher and provisioner set service.spec.port to a dummy port, which is unnecessary: the attacher and provisioner should be deployed as a StatefulSet, where load balancing is not needed.
Instead, a headless service could be used as a best practice in the example YAML file.
It would be nice to have a README.md file under https://github.com/d-nishi/aws-ebs-csi-driver/tree/master/deploy/kubernetes that instructs first-time users on how to deploy the EBS CSI driver on AWS. Basic steps and best practices could all be added there.
Currently, users have to rely on the sample manifest files under deploy/kubernetes to deploy the EBS CSI driver.
A user who is looking for a production-ready EBS CSI driver to deploy on Kubernetes shouldn't need to think about editing manifest files or ask about best practices for editing them. We could leverage a Helm chart to make this a lot easier for users.
/kind feature
The cloud interface should differentiate client-side errors from server-side errors returned by the EC2 client. Otherwise, it is very confusing when a client-side error is reported as an internal error, and it prevents the caller from retrying the call properly.
Furthermore, with the error code reported, external-provisioner and external-attacher can choose to retry only on server errors. Retrying a client-side error is not very helpful, so the error could be returned immediately in favor of failing fast.
One example error is:
I1018 21:51:31.438313 1 controller.go:32] CreateVolume: called with args &csi.CreateVolumeRequest{Name:"pvc-f5305039-d31e-11e8-81f1-0a75e9a76798", CapacityRange:(*csi.CapacityRange)(0xc00016d230), VolumeCapabilities:[]*csi.VolumeCapability{(*csi.VolumeCapability)(0xc00001c6c0)}, Parameters:map[string]string{"iopsPerGB":"100", "type":"io1"}, ControllerCreateSecrets:map[string]string(nil), VolumeContentSource:(*csi.VolumeContentSource)(nil), AccessibilityRequirements:(*csi.TopologyRequirement)(0xc00035c500), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
E1018 21:51:31.740435 1 driver.go:94] GRPC error: rpc error: code = Internal desc = Could not create volume "pvc-f5305039-d31e-11e8-81f1-0a75e9a76798": could not create volume in EC2: InvalidParameterValue: Iops to volume size ratio of 100.000000 is too high; maximum is 50
status code: 400, request id: f79f1ca4-bec5-4135-9800-dee0ddfcf1a3
I1018 21:55:54.079264 1 controller.go:190] ControllerGetCapabilities: called with args &csi.ControllerGetCapabilitiesRequest{XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Similar to #38
There are several issues with our current build process:
This helps the CO decide the maximum number of volumes that can be attached to the node without going over the limit.
Reference:
https://github.com/container-storage-interface/spec/blob/master/spec.md#nodegetinfo
With latest topology support:
>> make test
go test -v -race github.com/bertinatto/ebs-csi-driver/pkg/...
# github.com/bertinatto/ebs-csi-driver/pkg/cloud/devicemanager
pkg/cloud/devicemanager/manager.go:217: Verbose.Infof format %s has arg device of wrong type *devicemanager.Device
The logging format for each incoming request prints memory addresses for any pointer fields in the request struct, which is not very useful during debugging. We could improve it to log concrete values.
One example of a CreateVolume request:
I1019 05:02:56.884487 1 controller.go:32] CreateVolume: called with args &csi.CreateVolumeRequest{Name:"pvc-2cf57f1d-d35c-11e8-81f1-0a75e9a76798", CapacityRange:(*csi.CapacityRange)(0xc00044c720), VolumeCapabilities:[]*csi.VolumeCapability{(*csi.VolumeCapability)(0xc0001b0640)}, Parameters:map[string]string(nil), ControllerCreateSecrets:map[string]string(nil), VolumeContentSource:(*csi.VolumeContentSource)(nil), AccessibilityRequirements:(*csi.TopologyRequirement)(0xc00035c870), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
We could use a logging format similar to external-provisioner's for requests and responses. As an example:
I1022 16:50:32.235405 1 controller.go:432] CreateVolumeRequest {Name:pvc-8f552e58-d61a-11e8-81f1-0a75e9a76798 CapacityRange:required_bytes:4294967296 VolumeCapabilities:[mount:<> access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] ControllerCreateSecrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1c" > > requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1a" > > requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1b" > > preferred:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1c" > > preferred:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1a" > > preferred:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1b" > > XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
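To illustrate the difference, here is a small standard-library sketch. The request types are simplified, hypothetical stand-ins for the generated CSI types, and JSON is just one possible way to print concrete values (it is not the format external-provisioner uses, and it would not redact secrets):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CapacityRange and CreateVolumeRequest are simplified, hypothetical
// stand-ins for the generated CSI request types.
type CapacityRange struct {
	RequiredBytes int64 `json:"required_bytes"`
}

type CreateVolumeRequest struct {
	Name          string            `json:"name"`
	CapacityRange *CapacityRange    `json:"capacity_range,omitempty"`
	Parameters    map[string]string `json:"parameters,omitempty"`
}

// formatRequest renders a request with the values behind its pointers
// instead of their memory addresses.
func formatRequest(req interface{}) string {
	b, err := json.Marshal(req)
	if err != nil {
		return fmt.Sprintf("<unprintable request: %v>", err)
	}
	return string(b)
}

func main() {
	req := &CreateVolumeRequest{
		Name:          "pvc-example",
		CapacityRange: &CapacityRange{RequiredBytes: 4294967296},
	}
	// %#v prints the nested pointer as an address such as (0xc00001c0a8).
	fmt.Printf("CreateVolume: called with args %#v\n", req)
	// The JSON form prints the concrete capacity value.
	fmt.Printf("CreateVolume: called with args %s\n", formatRequest(req))
}
```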
Every PR should have test coverage reported on the pull request. Here is an example:
kubernetes-sigs/aws-load-balancer-controller#647
We had a bot set up. However, it obviously isn't working.
Originally posted by @jsafrane in #8 (comment)
A contributing guideline is needed for contributors. We could follow the general Kubernetes guidelines.
Currently, the AWS cloud interface used by the EBS CSI controller service is instantiated with the AZ in which the controller service is running. As a result, EBS volumes are only ever created in the AZ where the controller service runs.
For example, suppose there are three nodes, one each in us-west-2a, us-west-2b and us-west-2c, and the controller service runs in us-west-2b while the application is deployed to us-west-2a. During volume creation, the new volume will always be created in us-west-2b, which causes the application deployment to fail (since an EBS volume and its instance have to be in the same AZ).
One way to fix this is to use a node selector to always deploy the application in the same AZ as the controller. Another is to use static provisioning, where the PV/PVC are created statically. However, neither approach is ideal: the first restricts where the controller service can run, and the second rules out dynamic provisioning.
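An alternative direction is to pick the zone from the topology requirement that accompanies the CreateVolume request. The sketch below assumes simplified, hypothetical `Topology` and `TopologyRequirement` types and a hypothetical `pickAvailabilityZone` helper; the zone key is the one that appears in the request logs elsewhere on this page:

```go
package main

import "fmt"

// zoneKey is the topology segment key seen in the CreateVolume request logs.
const zoneKey = "com.amazon.aws.csi.ebs/zone"

// Topology and TopologyRequirement are simplified, hypothetical stand-ins
// for the CSI accessibility requirement types.
type Topology struct {
	Segments map[string]string
}

type TopologyRequirement struct {
	Requisite []Topology
	Preferred []Topology
}

// pickAvailabilityZone returns the first preferred zone, falling back to the
// first requisite zone. An empty string means the caller expressed no zone
// requirement.
func pickAvailabilityZone(req *TopologyRequirement) string {
	if req == nil {
		return ""
	}
	for _, t := range req.Preferred {
		if zone, ok := t.Segments[zoneKey]; ok {
			return zone
		}
	}
	for _, t := range req.Requisite {
		if zone, ok := t.Segments[zoneKey]; ok {
			return zone
		}
	}
	return ""
}

func main() {
	req := &TopologyRequirement{
		Requisite: []Topology{{Segments: map[string]string{zoneKey: "us-west-2b"}}},
		Preferred: []Topology{{Segments: map[string]string{zoneKey: "us-west-2a"}}},
	}
	fmt.Println(pickAvailabilityZone(req))
}
```

The volume would then be created in the zone where the workload is expected to run, rather than in the controller's own zone.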
Driver implementation should support following operations:
References:
This README should help users to on-board to the driver in an easy way.
/kind bug
Inside the AttachDisk call, c.waitForAttachmentState is called even when the device is already assigned. We should only call it when the device is not yet attached.
As noted here
This is to ensure AttachVolume is fully completed before proceeding to mount.
We will also need to do something similar for DetachVolume, CreateVolume and DeleteVolume.
Update should include:
Should be added to hack/verify-all, similar to gofmt.
The Travis job should do:
Current import paths depend on github.com/bertinatto/ebs-csi-driver, which should be renamed to github.com/kubernetes-sigs/aws-ebs-csi-driver.
This has several benefits:
ldflags should be set to strip symbols using:
go build -ldflags '-s -w'
Creating encrypted EBS volumes is currently supported by the in-tree AWS EBS driver storage class. To achieve feature parity with the in-tree implementation, the CSI driver should support this feature as well.
We need to handle an extra error case: AWS.CreateVolume does not return an error when creating an encrypted volume using a non-existent or unavailable KMS key.
We should update this repo to use Go modules to manage dependencies.
Detailed docs: https://github.com/golang/go/wiki/Modules
Volume types will include gp2, io1, st1 and sc1. When io1 is specified, an extra iopsPerGB field should also be accepted.
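A rough sketch of how such parameters might be parsed, assuming the `type` and `iopsPerGB` keys seen in the CreateVolume request log elsewhere on this page. The `DiskOptions` struct and `parseParameters` function are hypothetical, and rejecting a missing type (rather than applying a default) is a choice made only for this sketch:

```go
package main

import (
	"fmt"
	"strconv"
)

// DiskOptions is a hypothetical container for the parsed parameters.
type DiskOptions struct {
	VolumeType string
	IOPSPerGB  int
}

// parseParameters validates the storage class parameters. iopsPerGB is only
// read, and required, when the volume type is io1.
func parseParameters(params map[string]string) (DiskOptions, error) {
	opts := DiskOptions{VolumeType: params["type"]}
	switch opts.VolumeType {
	case "gp2", "st1", "sc1":
		return opts, nil
	case "io1":
		raw, ok := params["iopsPerGB"]
		if !ok {
			return DiskOptions{}, fmt.Errorf("iopsPerGB is required for volume type io1")
		}
		iops, err := strconv.Atoi(raw)
		if err != nil || iops <= 0 {
			return DiskOptions{}, fmt.Errorf("invalid iopsPerGB %q", raw)
		}
		opts.IOPSPerGB = iops
		return opts, nil
	default:
		return DiskOptions{}, fmt.Errorf("unsupported volume type %q", opts.VolumeType)
	}
}

func main() {
	opts, err := parseParameters(map[string]string{"type": "io1", "iopsPerGB": "100"})
	fmt.Println(opts, err)

	_, err = parseParameters(map[string]string{"type": "io1"})
	fmt.Println(err)
}
```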
./aws-ebs-csi-driver --version should return the version, in JSON format, with the following information:
This should help enforce coding standards with external contributors.
There is a list of APIs that perform volume operations but do not consume the context object from the CSI driver interface or propagate it down to the underlying AWS API calls.
Those APIs should be updated to consume and propagate the context object. The benefit is that, once any call to the EBS CSI driver is cancelled, the whole call stack is cancelled properly.
Requirement
The automation tool/service should be able to:
Re-opening #1 as an issue.
The intent of this pull request is to complete the steps necessary to migrate coreos/alb-ingress-controller repository to k8s-sig-aws as a subproject.
The steps below use rules for new and donated repositories:
TODO - Complete check-list:
COMPLETED - as part of this PR:
PR #42 changed the CreateDisk behaviour by creating a volume in a random AZ if one isn't provided. However, the volume might be created in an AZ where there's no driver running.
This is the error that I'm getting:
Warning FailedAttachVolume 84s attachdetach-controller (combined from similar events): AttachVolume.Attach failed for volume "pvc-44c2c835-cc91-11e8-93bb-0e3fe2b5fad4" : rpc error: code = Internal desc = Could not attach volume "vol-06899fef1736f24b9" to node "i-0a7e7ab68218ef0e0": could not attach volume "vol-06899fef1736f24b9" to node "i-0a7e7ab68218ef0e0": InvalidVolume.ZoneMismatch: The volume 'vol-06899fef1736f24b9' is not in the same availability zone as instance 'i-0a7e7ab68218ef0e0'
zone/zones are needed for k8s versions v1.10 and v1.11. These two fields are deprecated in v1.12 in favor of allowedTopologies.
Currently, we are using Travis as a workaround to run the unit tests and sanity tests. It is suggested that everything should run in Prow.
TODOs:
Reference:
/kind bug
The e2e test EC2 client has a hard-coded region, which causes the e2e tests to fail when they run in a region other than us-east-1.
We can do something similar to what is discussed here: kubernetes/test-infra#9976
The current cache implementation has some issues that could be improved:
Deprioritize has to be called each time GetNext is called. Since the only time Deprioritize is called is right after GetNext, we can combine the two methods to improve maintainability and understandability and to reduce the chance that someone forgets to call Deprioritize.
GetNext is nondeterministic. This is caused by sort.Sort, which uses quicksort and is therefore not a stable sort.
Proposal:
Call Next() each time a volume is attached. Inside Next(), loop through all the combinations of device names and return the first one that is not in the cache set. This won't cause any performance issue, since the loop iteration count is very small (on the order of the number of device names, less than 100).
Call Remove(deviceName) each time a volume is detached. Inside Remove(), remove the device from the device name set.
There is a list of storage class parameters that will be passed down to the driver, and they should be picked up by the driver if specified. See here for the full list with descriptions.
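For the device-name cache proposal (Next() and Remove()), a minimal sketch might look like the following. The `nameAllocator` type, the `/dev/xvd` prefix and the two-letter suffix range are assumptions made for illustration, as is the choice to have Next() record the returned name as in use:

```go
package main

import (
	"errors"
	"fmt"
)

// devicePrefix and the suffix range used in Next are assumptions for this
// sketch, not the driver's actual naming scheme.
const devicePrefix = "/dev/xvd"

// nameAllocator is a hypothetical cache of device names that are in use.
type nameAllocator struct {
	inUse map[string]struct{}
}

func newNameAllocator() *nameAllocator {
	return &nameAllocator{inUse: make(map[string]struct{})}
}

// Next walks the candidate names in a fixed order and returns the first one
// that is not in use, marking it as in use. The fixed order makes the result
// deterministic, and the loop covers only 52 candidates.
func (a *nameAllocator) Next() (string, error) {
	for _, first := range "bc" {
		for second := 'a'; second <= 'z'; second++ {
			name := devicePrefix + string(first) + string(second)
			if _, taken := a.inUse[name]; !taken {
				a.inUse[name] = struct{}{}
				return name, nil
			}
		}
	}
	return "", errors.New("no device names available")
}

// Remove releases a device name so that a later Next call can return it.
func (a *nameAllocator) Remove(name string) {
	delete(a.inUse, name)
}

func main() {
	alloc := newNameAllocator()
	first, _ := alloc.Next()
	second, _ := alloc.Next()
	fmt.Println(first, second)

	alloc.Remove(first)
	reused, _ := alloc.Next()
	fmt.Println(reused == first)
}
```

Because allocation and bookkeeping happen in one call, there is no separate step that a caller could forget.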
There are several parameters we need to implement: