aws-ebs-csi-driver's People

Contributors

andrewsirenko, andyxiangli, ayberk, bertinatto, connorjc3, dkoshkin, dobsonj, gliptak, gnufied, gtxu, hanyuel, ialidzhikov, jieyu, jsafrane, k8s-ci-robot, kaezon, keznikl, krmichel, leakingtapan, mtougeron, nirmalaagash, rdpsin, risinger, t0rr3sp3dr0, talnevo, tomdymond, torredil, vdhanan, wongma7, zacharya

aws-ebs-csi-driver's Issues

Manifest files no longer work

After KubeletPluginsWatcher graduated to beta [1], the feature is enabled by default.

It replaces the older driver registration mechanism, in which the sidecar container registered the driver with the Kubelet. With KubeletPluginsWatcher enabled, this registration is done by the Kubelet itself.

As a result, the current sample manifests no longer work.

[1] https://github.com/kubernetes/kubernetes/pull/68200/files

/kind bug
/assign

Create Helm chart for driver

Currently, users have to rely on the sample manifest files under deploy/kubernetes to deploy the EBS CSI driver.

A user looking for a production-ready EBS CSI driver to deploy on Kubernetes shouldn't need to think about editing manifest files or ask about best practices for editing them. We could use a Helm chart to make this much easier for users.

https://hub.helm.sh/

Differentiate client side error vs server side error for volume cloud interface

/kind feature

The cloud interface should differentiate client-side errors from server-side errors returned by the EC2 client. Otherwise, it is very confusing when a client-side error is reported as an internal error, and it prevents the caller from retrying the call properly.

Furthermore, with the error code reported, external-provisioner and external-attacher can choose to retry only on server errors. Retrying a client-side error is not helpful, so the error can be returned immediately in favor of failing fast.

One example error is:

I1018 21:51:31.438313       1 controller.go:32] CreateVolume: called with args &csi.CreateVolumeRequest{Name:"pvc-f5305039-d31e-11e8-81f1-0a75e9a76798", CapacityRange:(*csi.CapacityRange)(0xc00016d230), VolumeCapabilities:[]*csi.VolumeCapability{(*csi.VolumeCapability)(0xc00001c6c0)}, Parameters:map[string]string{"iopsPerGB":"100", "type":"io1"}, ControllerCreateSecrets:map[string]string(nil), VolumeContentSource:(*csi.VolumeContentSource)(nil), AccessibilityRequirements:(*csi.TopologyRequirement)(0xc00035c500), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
E1018 21:51:31.740435       1 driver.go:94] GRPC error: rpc error: code = Internal desc = Could not create volume "pvc-f5305039-d31e-11e8-81f1-0a75e9a76798": could not create volume in EC2: InvalidParameterValue: Iops to volume size ratio of 100.000000 is too high; maximum is 50
        status code: 400, request id: f79f1ca4-bec5-4135-9800-dee0ddfcf1a3
I1018 21:55:54.079264       1 controller.go:190] ControllerGetCapabilities: called with args &csi.ControllerGetCapabilitiesRequest{XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}

Fix docker build issues

There are several issues with our current build process:

  • It references fedora-minimal as the base image, which is not available without a registry prefix
  • The driver build environment depends on where the Makefile is run, which causes two issues:
    • the golang version varies across environments
    • it cannot yet build for Linux on a Mac

make test fails because of glog format

>> make test
go test -v -race github.com/bertinatto/ebs-csi-driver/pkg/...
# github.com/bertinatto/ebs-csi-driver/pkg/cloud/devicemanager
pkg/cloud/devicemanager/manager.go:217: Verbose.Infof format %s has arg device of wrong type *devicemanager.Device

Improve request logging format

The log line for each incoming request prints a memory address wherever the request struct contains a pointer. This is not very useful during debugging. We could improve it to log the concrete values.

One example of a CreateVolume request:

I1019 05:02:56.884487       1 controller.go:32] CreateVolume: called with args &csi.CreateVolumeRequest{Name:"pvc-2cf57f1d-d35c-11e8-81f1-0a75e9a76798", CapacityRange:(*csi.CapacityRange)(0xc00044c720), VolumeCapabilities:[]*csi.VolumeCapability{(*csi.VolumeCapability)(0xc0001b0640)}, Parameters:map[string]string(nil), ControllerCreateSecrets:map[string]string(nil), VolumeContentSource:(*csi.VolumeContentSource)(nil), AccessibilityRequirements:(*csi.TopologyRequirement)(0xc00035c870), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}

We could log requests and responses in a format similar to external-provisioner's. As an example:

I1022 16:50:32.235405       1 controller.go:432] CreateVolumeRequest {Name:pvc-8f552e58-d61a-11e8-81f1-0a75e9a76798 CapacityRange:required_bytes:4294967296  VolumeCapabilities:[mount:<> access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] ControllerCreateSecrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1c" > > requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1a" > > requisite:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1b" > > preferred:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1c" > > preferred:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1a" > > preferred:<segments:<key:"com.amazon.aws.csi.ebs/zone" value:"us-east-1b" > >  XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
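
The difference between the two outputs can be reproduced with a small Go sketch. The request types below are hypothetical simplifications of the CSI messages, and encoding/json is used only to keep the example free of dependencies; it is one way to expand nested pointers into their values, not necessarily the format the driver should adopt.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// capacityRange and createVolumeRequest are hypothetical, simplified
// stand-ins for the CSI request messages.
type capacityRange struct {
	RequiredBytes int64 `json:"required_bytes"`
}

type createVolumeRequest struct {
	Name          string            `json:"name"`
	CapacityRange *capacityRange    `json:"capacity_range,omitempty"`
	Parameters    map[string]string `json:"parameters,omitempty"`
}

// describe renders a request with nested pointers expanded to their values.
func describe(req interface{}) string {
	b, err := json.Marshal(req)
	if err != nil {
		return fmt.Sprintf("<unprintable request: %v>", err)
	}
	return string(b)
}

func main() {
	req := &createVolumeRequest{
		Name:          "pvc-example",
		CapacityRange: &capacityRange{RequiredBytes: 4294967296},
	}

	// %#v prints the nested CapacityRange pointer as a memory address.
	fmt.Printf("CreateVolume: called with args %#v\n", req)

	// describe prints the value the pointer refers to.
	fmt.Printf("CreateVolume: called with args %s\n", describe(req))
}
```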

Dynamic Provisioning with Topology Awareness Support

Currently, the AWS cloud interface used by the EBS CSI controller service is instantiated with the same AZ as the one where the controller service is running. As a result, EBS volumes are only ever created in the AZ where the controller service runs.

For example, suppose there are three nodes, one each in us-west-2a, us-west-2b and us-west-2c, the controller service runs in us-west-2b, and the application is deployed to us-west-2a. During volume creation, the new volume will always be created in us-west-2b, which causes the application deployment to fail (since an EBS volume and the instance it attaches to have to be in the same AZ).

One workaround is to use a node selector to always deploy the application in the same AZ as the controller. Another is static provisioning, where the PV/PVC are created statically. However, neither approach is ideal: they either constrain where the controller service can run or defeat the purpose of dynamic provisioning.
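
A topology-aware alternative would let the controller choose the AZ from the accessibility requirements in the CreateVolume request instead of from its own location. The sketch below is only an illustration of that idea: the topology and topologyRequirement types are hypothetical simplifications of the CSI messages, pickZone is a hypothetical helper, and the segment key is the one that appears in the external-provisioner log elsewhere on this page.

```go
package main

import "fmt"

// zoneKey is the topology segment key seen in the external-provisioner log.
const zoneKey = "com.amazon.aws.csi.ebs/zone"

// topology and topologyRequirement are hypothetical, simplified versions
// of the CSI topology messages.
type topology struct {
	Segments map[string]string
}

type topologyRequirement struct {
	Requisite []topology
	Preferred []topology
}

// pickZone returns the first zone listed under Preferred, falling back to
// Requisite. It returns "" when the request carries no zone information.
func pickZone(req *topologyRequirement) string {
	if req == nil {
		return ""
	}
	for _, list := range [][]topology{req.Preferred, req.Requisite} {
		for _, t := range list {
			if zone, ok := t.Segments[zoneKey]; ok && zone != "" {
				return zone
			}
		}
	}
	return ""
}

func main() {
	req := &topologyRequirement{
		Requisite: []topology{
			{Segments: map[string]string{zoneKey: "us-west-2b"}},
			{Segments: map[string]string{zoneKey: "us-west-2a"}},
		},
		Preferred: []topology{
			{Segments: map[string]string{zoneKey: "us-west-2a"}},
		},
	}
	fmt.Println("create volume in", pickZone(req))
}
```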

Implement wait for volume attach/detach

As noted here

This is to ensure AttachVolume has fully completed before proceeding to mount.

We will also need to do something similar for DetachVolume, CreateVolume and DeleteVolume.

Add Travis as CI

The Travis job should do:

  • Build the package and verify the build passes
  • Run unit tests and verify they succeed
  • Run the sanity test and verify it succeeds

Implement support for encrypted volume

Creating an encrypted EBS volume is currently supported by the in-tree AWS EBS driver's storage class. To achieve feature parity with the in-tree implementation, the CSI driver should support this as well.

We need to handle an extra error case:

  • AWS.CreateVolume does not return an error when creating an encrypted volume with a KMS key that does not exist or is not available.

Add version command to ebs-csi-driver cmd

./aws-ebs-csi-driver --version should return the version in JSON format, including the following information:

  • git commit sha
  • driver version
  • go compiler version
  • build date
  • Platform OS and Arch

Add golint check

This should help maintain coding standards with external contributors.

Update cloud API to consume context

Several cloud APIs perform volume operations without consuming the context object from the CSI driver interface and propagating it down to the underlying AWS API calls.

Those APIs should be updated to accept the context object and pass it on. The benefit is that once any call to the EBS CSI driver is cancelled, the whole call stack is cancelled properly.

Setup automation to push container image to registry

Requirement
The automation tool/service should be able to:

  • build the container image when a PR is merged on GitHub - this eases the build process and helps prevent human error during container builds
  • push the container image to a container registry - Mainline builds should be pushed to the registry with the latest tag. If the GitHub event has a release tag, build the image with the release tag and push it with the same tag. This also prevents a human from accidentally pushing an image and potentially overwriting a tag.
  • push to different container registries, e.g. Docker Hub and ECR - This gives us flexibility in the long run, since once the driver is in production, AWS might require that the driver run from ECR.
  • be triggered from different GitHub repos of sig-aws, e.g. aws-alb-ingress controller, etc.

Transfer ownership to kubernetes-sigs as sig-aws subproject

Re-opening #1 as an issue.

The intent of this pull request is to complete the steps necessary to migrate coreos/alb-ingress-controller repository to k8s-sig-aws as a subproject.

The steps below use rules for new and donated repositories:

TODO - Complete check-list:

COMPLETED - as part of this PR:

  • Add the Kubernetes Code of Conduct to the repo.
  • Ensure that all code projects use the Apache License version 2.0.
  • Ensure that all OWNERS of the project are also active SIG members.
  • Ensure SIG membership votes using lazy consensus to create a new repository.

Cannot attach volume to instance in a different AZ

PR #42 changed the CreateDisk behaviour to create the volume in a random AZ if none is provided. However, the volume might be created in an AZ where there's no driver running.

This is the error that I'm getting:

Warning  FailedAttachVolume  84s  attachdetach-controller  (combined from similar events): AttachVolume.Attach failed for volume "pvc-44c2c835-cc91-11e8-93bb-0e3fe2b5fad4" : rpc error: code = Internal desc = Could not attach volume "vol-06899fef1736f24b9" to node "i-0a7e7ab68218ef0e0": could not attach volume "vol-06899fef1736f24b9" to node "i-0a7e7ab68218ef0e0": InvalidVolume.ZoneMismatch: The volume 'vol-06899fef1736f24b9' is not in the same availability zone as instance 'i-0a7e7ab68218ef0e0'

Basic testing infra with Prow

Currently, we are using Travis as a workaround to run unit tests and sanity tests. It is suggested that everything should run in Prow.

TODOs:

  • Every PR should trigger bot to run Prow job for AWS EBS CSI driver
  • Trigger unit test, sanity test, code verification/lint
  • Trigger integration test
  • Trigger e2e test
  • Trigger performance test (long term)

Reference:

Improve Cache Implementation

The current cache implementation has some issues that could be improved:

  • It requires an explicit Deprioritize call each time GetNext is called. Since Deprioritize is only ever called after GetNext, we can combine the two methods to improve maintainability and understandability, and reduce the chance that someone forgets to call Deprioritize
  • The behavior of GetNext is nondeterministic. This is caused by sort.Sort, which uses quicksort, an unstable sort.
  • The cache is not invalidated after a volume is detached.

Proposal:

  • Maintain a set of device names that are currently attached
  • Call Next() each time a volume is attached. Inside Next(), loop through all the combinations of device names and return the first one that is not in the cache set. This won't cause any performance issue, since the loop iteration count is very small (on the order of the number of device names, less than 100)
  • Call Remove(deviceName) each time a volume is detached. Inside Remove(), remove the device from the device name set.

Add support for storage class parameters

There is a list of storage class parameters that are passed down to the driver, and the driver should pick them up if specified. See here for the full list with descriptions.

There are several parameters we need to implement:

  • type #63
  • iopsPerGB #63
  • fsType #62
  • encrypted #16
  • kmsKeyId #16
  • zone/zones (deprecated in 1.12) #64
  • allowedTopologies #64
