aws / aws-k8s-tester
AWS Kubernetes tester, kubetest2 deployer implementation
License: Apache License 2.0
Cluster creation is expensive, taking more than 20 minutes.
Create the cluster once.
Run add-on tests.
Clean up add-on test resources.
Run more add-on tests.
Clean up add-on test resources.
(Optional) Delete the cluster.
e.g.
{
"version": "1.0",
"dataItems": [
{
"data": {
"Perc50": 2870.804716,
"Perc90": 6642.876753,
"Perc99": 9695.783054
},
"unit": "ms",
"labels": {
"Metric": "run_to_watch"
}
},
{
"data": {
"Perc50": 4842.424464,
"Perc90": 8896.50057,
"Perc99": 12095.952701
},
"unit": "ms",
"labels": {
"Metric": "schedule_to_watch"
}
},
{
"data": {
"Perc50": 4842.424464,
"Perc90": 8896.50057,
"Perc99": 12095.952701
},
"unit": "ms",
"labels": {
"Metric": "pod_startup"
}
},
{
"data": {
"Perc50": 0,
"Perc90": 0,
"Perc99": 0
},
"unit": "ms",
"labels": {
"Metric": "create_to_schedule"
}
},
{
"data": {
"Perc50": 2000,
"Perc90": 3000,
"Perc99": 10000
},
"unit": "ms",
"labels": {
"Metric": "schedule_to_run"
}
}
]
}
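The output above follows the Kubernetes perf-dash dataItems shape. As a hedged illustration, it can be consumed in Go like this (the type names are mine, not the tool's):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DataItem mirrors one entry of the "dataItems" array shown above.
type DataItem struct {
	Data   map[string]float64 `json:"data"`   // e.g. Perc50/Perc90/Perc99
	Unit   string             `json:"unit"`   // e.g. "ms"
	Labels map[string]string  `json:"labels"` // e.g. {"Metric": "pod_startup"}
}

// PerfData mirrors the top-level document.
type PerfData struct {
	Version   string     `json:"version"`
	DataItems []DataItem `json:"dataItems"`
}

// parsePerfData decodes a perf-dash style JSON document.
func parsePerfData(raw []byte) (PerfData, error) {
	var pd PerfData
	err := json.Unmarshal(raw, &pd)
	return pd, err
}

func main() {
	raw := `{"version":"1.0","dataItems":[{"data":{"Perc99":9695.783054},"unit":"ms","labels":{"Metric":"pod_startup"}}]}`
	pd, err := parsePerfData([]byte(raw))
	if err != nil {
		panic(err)
	}
	item := pd.DataItems[0]
	fmt.Println(item.Labels["Metric"], item.Data["Perc99"], item.Unit)
	// pod_startup 9695.783054 ms
}
```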
Here is an example:
# aws-k8s-tester version
{"git-commit":"36fe29fdb301","release-version":"v0.9.6","build-time":"2020-03-31_22:38:35"}
# export AWS_K8S_TESTER_EKS_ADD_ON_MANAGED_NODE_GROUPS_ENABLE=true
# export AWS_K8S_TESTER_EKS_ADD_ON_MANAGED_NODE_GROUPS_MNGS='{"eks-mng-8775":{"name":"eks-mng-8775","ami-type":"AL2_x86_64","asg-min-size":3,"asg-max-size":3,"asg-desired-capacity":3,"instance-types":["c5.xlarge"],"volume-size":40}}'
*********************************
overwriting config file from environment variables...
panic: reflect: call of reflect.Value.Field on zero Value
goroutine 1 [running]:
reflect.Value.Field(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2b4b780)
/usr/local/go/src/reflect/value.go:827 +0x123
github.com/aws/aws-k8s-tester/eksconfig.parseEnvs(0x260843b, 0x2e, 0x1fdeb20, 0x0, 0x1fdeba0, 0x0, 0x0, 0x0)
/home/ANT.AMAZON.COM/leegyuho/go/src/github.com/aws/aws-k8s-tester/eksconfig/env.go:206 +0x4d6
github.com/aws/aws-k8s-tester/eksconfig.(*Config).UpdateFromEnvs(0xc0009110e0, 0x0, 0x0)
/home/ANT.AMAZON.COM/leegyuho/go/src/github.com/aws/aws-k8s-tester/eksconfig/env.go:85 +0x273
github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/eks.configFunc(0xc000890000, 0xc0002add80, 0x0, 0x2)
/home/ANT.AMAZON.COM/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/eks/create.go:45 +0x10b
github.com/spf13/cobra.(*Command).execute(0xc000890000, 0xc0002add60, 0x2, 0x2, 0xc000890000, 0xc0002add60)
/home/ANT.AMAZON.COM/leegyuho/go/src/github.com/aws/aws-k8s-tester/vendor/github.com/spf13/cobra/command.go:830 +0x29d
github.com/spf13/cobra.(*Command).ExecuteC(0x3f09760, 0xc000811680, 0xc0004ff400, 0xc0004ff400)
/home/ANT.AMAZON.COM/leegyuho/go/src/github.com/aws/aws-k8s-tester/vendor/github.com/spf13/cobra/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
/home/ANT.AMAZON.COM/leegyuho/go/src/github.com/aws/aws-k8s-tester/vendor/github.com/spf13/cobra/command.go:864
main.main()
/home/ANT.AMAZON.COM/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/main.go:33 +0x31
Sync() calls IsEnabledAddOnManagedNodeGroups and the other IsEnabledXXX functions, which set add-on fields to nil when the add-ons are not enabled. UpdateFromEnvs then fails with the above error, because the field it tries to parse is now nil.
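The panic can be reproduced with a minimal reflect program; AddOn and Config below are hypothetical stand-ins for the eksconfig types:

```go
package main

import (
	"fmt"
	"reflect"
)

// AddOn stands in for an optional add-on config section.
type AddOn struct {
	Enable bool
}

// Config holds the section behind a pointer; Sync-like logic may nil it
// out when the add-on is disabled.
type Config struct {
	AddOnManagedNodeGroups *AddOn
}

// fieldOfNil walks the nil add-on pointer via reflect, the way an
// env-parsing loop might, and returns the recovered panic message.
func fieldOfNil(cfg *Config) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	v := reflect.ValueOf(cfg.AddOnManagedNodeGroups).Elem() // zero Value for a nil pointer
	_ = v.Field(0)                                          // panics here
	return ""
}

func main() {
	fmt.Println(fieldOfNil(&Config{}))
	// reflect: call of reflect.Value.Field on zero Value
}
```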
aws-k8s-tester/eksconfig/config.go
Lines 292 to 301 in 36fe29f
aws-k8s-tester/cmd/aws-k8s-tester/eks/create.go
Lines 39 to 41 in 36fe29f
What do you think? Fix it in UpdateFromEnvs, or stop nil-ing fields in the IsEnabledXXX functions?
/tmp/kubectl-test-v1.17.6 \
--kubeconfig=/tmp/sparklingbzw4ba.kubeconfig.yaml \
-n eks-2020061521-embarku7aaxl-cronjob \
get all
No resources found in eks-2020061521-embarku7aaxl-cronjob namespace.
export KUBECONFIG=/tmp/sparklingbzw4ba.kubeconfig.yaml
kubectl get namespace "eks-2020061521-embarku7aaxl-cronjob" -o json \
| tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" \
| kubectl replace --raw /api/v1/namespaces/eks-2020061521-embarku7aaxl-cronjob/finalize -f -
Store results in s3, compute delta in next run, fail if delta exceeds threshold.
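The threshold check could look like the following minimal sketch, assuming percentile values in milliseconds and a percentage threshold (all names here are hypothetical):

```go
package main

import "fmt"

// exceedsThreshold reports whether the current percentile regressed over
// the stored baseline by more than thresholdPct percent. A zero baseline
// means no prior run exists: store the current value and pass.
func exceedsThreshold(baselineMs, currentMs, thresholdPct float64) bool {
	if baselineMs == 0 {
		return false
	}
	deltaPct := (currentMs - baselineMs) / baselineMs * 100
	return deltaPct > thresholdPct
}

func main() {
	// Compare a new pod_startup Perc99 against a baseline fetched from S3.
	fmt.Println(exceedsThreshold(9695.78, 12000, 10)) // true: ~23.8% regression
	fmt.Println(exceedsThreshold(9695.78, 9800, 10))  // false: ~1.1% delta
}
```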
$ git clone https://github.com/aws/aws-k8s-tester.git
Cloning into 'aws-k8s-tester'...
remote: Enumerating objects: 858, done.
remote: Counting objects: 100% (858/858), done.
remote: Compressing objects: 100% (520/520), done.
remote: Total 37086 (delta 421), reused 550 (delta 235), pack-reused 36228
Receiving objects: 100% (37086/37086), 34.73 MiB | 8.92 MiB/s, done.
Resolving deltas: 100% (20344/20344), done.
error: invalid path 'eks/cluster-loader/artifacts/MetricsForE2E_load_2020-06-18T22:18:53-07:00.json'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
$ git --version
git version 2.27.0.windows.1
All stacks combined into one.
Because the eks package is internal, it seems I cannot import aws-k8s-tester as a library to do cluster setup in the test itself (without shelling out to an aws-k8s-tester binary). For example:
var cfg *eksconfig.Config

func setupCluster() error {
	cfg = eksconfig.NewDefault()
	tester, err := eks.NewTester(cfg)
	if err != nil {
		return err
	}
	return tester.Up()
}
Is there (or should there) be another way to do this, or should the eks package be split/made to be importable?
Currently, kubeadm (v1.13.2) does not work on Amazon Linux 2.
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "ip-192-168-222-150.us-west-2.compute.internal" as an annotation
[kubelet-check] Initial timeout of 40s passed.
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition
W1121 19:22:20.855] 2018/11/21 19:22:20 process.go:155: Step '/tmp/aws-k8s-tester767933017 eks --path=/tmp/aws-k8s-tester510470242 check cluster' finished in 6.625144142s
W1121 19:22:20.904] 2018/11/21 19:22:20 process.go:153: Running: ./cluster/kubectl.sh --match-server-version=false version
W1121 19:22:21.341] The connection to the server localhost:8080 was refused - did you specify the right host or port?
W1121 19:22:21.345] 2018/11/21 19:22:21 process.go:155: Step './cluster/kubectl.sh --match-server-version=false version' finished in 440.928862ms
W1121 19:22:21.345] 2018/11/21 19:22:21 e2e.go:341: Failed to reach api. Sleeping for 10 seconds before retrying...
W1121 19:22:31.345] 2018/11/21 19:22:31 process.go:153: Running: ./cluster/kubectl.sh --match-server-version=false version
W1121 19:22:31.438] The connection to the server localhost:8080 was refused - did you specify the right host or port?
W1121 19:22:31.441] 2018/11/21 19:22:31 process.go:155: Step './cluster/kubectl.sh --match-server-version=false version' finished in 96.048207ms
W1121 19:22:31.441] 2018/11/21 19:22:31 e2e.go:341: Failed to reach api. Sleeping for 10 seconds before retrying...
W1121 19:22:41.442] 2018/11/21 19:22:41 process.go:153: Running: ./cluster/kubectl.sh --match-server-version=false version
W1121 19:22:41.539] The connection to the server localhost:8080 was refused - did you specify the right host or port?
W1121 19:22:41.542] 2018/11/21 19:22:41 process.go:155: Step './cluster/kubectl.sh --match-server-version=false version' finished in 100.389731ms
W1121 19:22:41.542] 2018/11/21 19:22:41 e2e.go:341: Failed to reach api. Sleeping for 10 seconds before retrying...
W1121 19:22:51.542] 2018/11/21 19:22:51 process.go:153: Running: ./cluster/kubectl.sh --match-server-version=false version
W1121 19:22:51.638] The connection to the server localhost:8080 was refused - did you specify the right host or port?
W1121 19:22:51.640] 2018/11/21 19:22:51 process.go:155: Step './cluster/kubectl.sh --match-server-version=false version' finished in 97.6761ms
W1121 19:22:51.640] 2018/11/21 19:22:51 e2e.go:341: Failed to reach api. Sleeping for 10 seconds before retrying...
After debugging https://gubernator.k8s.io/build/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_aws-ebs-csi-driver/183/pull-aws-ebs-csi-driver-integration/79, the timeout turns out to be caused by the test getting stuck at detaching, while the output is not piped back to aws-k8s-tester.
aws-k8s-tester should pipe real-time stdout from the test instance.
Currently, we wrap the "kubectl" binary to create/delete Kubernetes resources, which also requires the https://github.com/kubernetes-sigs/aws-iam-authenticator binary.
Replace it with https://github.com/kubernetes/client-go.
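As a hedged sketch of the client-go direction (the kubeconfig path and namespace name below are placeholders; this requires the k8s.io/client-go dependency and a live cluster, so it is illustrative only):

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Reuse the kubeconfig the tester already writes, instead of exec-ing kubectl.
	restCfg, err := clientcmd.BuildConfigFromFlags("", "/tmp/test.kubeconfig.yaml")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(restCfg)
	if err != nil {
		panic(err)
	}
	// Create a namespace directly through the API; no kubectl wrapper needed.
	ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: "eks-test"}}
	_, err = clientset.CoreV1().Namespaces().Create(context.TODO(), ns, metav1.CreateOptions{})
	fmt.Println("create namespace:", err)
}
```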
https://github.com/aws/aws-k8s-tester/blob/master/eks/tester/tester.go could be redefined to reduce redundancy further.
// Tester defines tester.
type Tester interface {
// Name returns the name of the tester.
Name() string
// Create creates test objects, and waits for completion.
Create() error
// Delete deletes all test objects.
Delete() error
}
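With the interface reduced like this, add-on testers can be driven generically. A minimal stdlib-only sketch (noopTester and runAll are hypothetical, not part of the repo):

```go
package main

import "fmt"

// Tester matches the reduced interface above.
type Tester interface {
	Name() string
	Create() error
	Delete() error
}

// noopTester is a placeholder add-on tester that does nothing.
type noopTester struct{ name string }

func (t *noopTester) Name() string  { return t.name }
func (t *noopTester) Create() error { return nil }
func (t *noopTester) Delete() error { return nil }

// runAll creates each tester's objects, then cleans them up, stopping at
// the first error and reporting which tester failed.
func runAll(testers []Tester) error {
	for _, t := range testers {
		if err := t.Create(); err != nil {
			return fmt.Errorf("%s create: %w", t.Name(), err)
		}
		if err := t.Delete(); err != nil {
			return fmt.Errorf("%s delete: %w", t.Name(), err)
		}
	}
	return nil
}

func main() {
	err := runAll([]Tester{&noopTester{name: "cronjob"}, &noopTester{name: "cluster-loader"}})
	fmt.Println(err) // <nil>
}
```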
Some testers persist results on the local node, waiting to be downloaded by the log fetcher at the end. Can we upload them to S3 instead, or run the log fetcher more efficiently? The node may go away, and we would lose the results.
Add a role name argument to the tests that can be used for both EKS cluster creation and node group creation. That way an administrator can create the roles beforehand and run the tests with fewer permissions.
When running, the tests do a lot of polling waiting for state changes. It would be really neat to save the times each event took and print as a final log. Later on, we could use the times to delay the first poll until the estimated completion. For some events, this might vary too much, or they are too quick to matter, but for some multi minute actions, like creating or deleting an EKS cluster, we should be able to get some interesting metrics.
aws-k8s-tester v0.5.2
Running aws-k8s-tester delete cluster after successfully running aws-k8s-tester create cluster results in an attempt to delete the CloudFormation stack containing the ManagedNodeGroupRole. This delete fails, but the polling mechanism keeps polling for 15 minutes even though DELETE_FAILED is a terminal state:
{"level":"info","ts":"2020-01-02T09:08:36.117-0500","caller":"eks/managed-node-group-role.go:163","msg":"deleting managed node group role CFN stack","managed-node-group-role-cfn-stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-managed-node-group-role/1c47aba0-2d64-11ea-b0b8-0a134f485a1c"}
{"level":"info","ts":"2020-01-02T09:08:36.348-0500","caller":"cloudformation/cloudformation.go:40","msg":"polling stack","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-managed-node-group-role/1c47aba0-2d64-11ea-b0b8-0a134f485a1c","want":"DELETE_COMPLETE"}
{"level":"info","ts":"2020-01-02T09:08:46.896-0500","caller":"cloudformation/cloudformation.go:112","msg":"sleeping","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-managed-node-group-role/1c47aba0-2d64-11ea-b0b8-0a134f485a1c","initial-wait":"1m0s"}
{"level":"info","ts":"2020-01-02T09:09:46.897-0500","caller":"cloudformation/cloudformation.go:141","msg":"polling","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-managed-node-group-role/1c47aba0-2d64-11ea-b0b8-0a134f485a1c","stack-name":"eks-2020010213-pdx-us-west-2-belcb-managed-node-group-role","current":"DELETE_FAILED","want":"DELETE_COMPLETE","reason":"The following resource(s) failed to delete: [ManagedNodeGroupRole]. ","request-started":"1 minute ago"}
... continues for 15 minutes ...
{"level":"info","ts":"2020-01-02T09:23:26.751-0500","caller":"cloudformation/cloudformation.go:141","msg":"polling","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-managed-node-group-role/1c47aba0-2d64-11ea-b0b8-0a134f485a1c","stack-name":"eks-2020010213-pdx-us-west-2-belcb-managed-node-group-role","current":"DELETE_FAILED","want":"DELETE_COMPLETE","reason":"The following resource(s) failed to delete: [ManagedNodeGroupRole]. ","request-started":"14 minutes ago"}
{"level":"warn","ts":"2020-01-02T09:23:36.348-0500","caller":"cloudformation/cloudformation.go:53","msg":"wait aborted","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-managed-node-group-role/1c47aba0-2d64-11ea-b0b8-0a134f485a1c","error":"context deadline exceeded"}
{"level":"warn","ts":"2020-01-02T09:23:36.394-0500","caller":"eks/eks.go:508","msg":"failed to delete managed node group role","error":"context deadline exceeded"}
{"level":"info","ts":"2020-01-02T09:23:36.394-0500","caller":"eks/eks.go:513","msg":"sleeping before cluster deletion","wait":"10s"}
I believe that when DELETE_FAILED is found, even if the expected status is DELETE_COMPLETE, the polling agent should exit its loop.
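A hedged sketch of treating DELETE_FAILED (and the other terminal CloudFormation stack statuses) as a stop condition; the function and map names here are hypothetical, not the repo's:

```go
package main

import "fmt"

// terminalStates lists CloudFormation stack statuses that will never
// change on their own; once seen, further polling is pointless.
var terminalStates = map[string]bool{
	"CREATE_COMPLETE":   true,
	"CREATE_FAILED":     true,
	"DELETE_COMPLETE":   true,
	"DELETE_FAILED":     true,
	"ROLLBACK_COMPLETE": true,
	"ROLLBACK_FAILED":   true,
}

// shouldStopPolling returns stop=true when polling should end (desired
// status reached, or a terminal status hit) and ok=true only when the
// current status matches the desired one.
func shouldStopPolling(current, want string) (stop, ok bool) {
	if current == want {
		return true, true
	}
	return terminalStates[current], false
}

func main() {
	stop, ok := shouldStopPolling("DELETE_FAILED", "DELETE_COMPLETE")
	fmt.Println(stop, ok) // true false: exit the loop and report failure immediately
}
```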
When attempting to delete a cluster that was successfully created using aws-k8s-tester eks create cluster, I am getting "failed to detach ENI: AuthFailure: You do not have permission to access the specified resource" errors.
Here is the log output of the time in question, showing deletion of the VPC CloudFormation stack failing, force-deletion of subnets failing because of dependency violations, and then failures to detach the ENIs the subnets depend on, due to permissions issues:
{"level":"info","ts":"2020-01-02T09:35:17.542-0500","caller":"eks/vpc.go:367","msg":"deleting VPC CFN stack","vpc-cfn-stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-vpc/04d0d020-2d62-11ea-a307-02ca0801967a"}
{"level":"info","ts":"2020-01-02T09:35:18.222-0500","caller":"cloudformation/cloudformation.go:40","msg":"polling stack","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-vpc/04d0d020-2d62-11ea-a307-02ca0801967a","want":"DELETE_COMPLETE"}
{"level":"info","ts":"2020-01-02T09:35:38.623-0500","caller":"cloudformation/cloudformation.go:112","msg":"sleeping","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-vpc/04d0d020-2d62-11ea-a307-02ca0801967a","initial-wait":"1m30s"}
{"level":"info","ts":"2020-01-02T09:37:08.624-0500","caller":"cloudformation/cloudformation.go:141","msg":"polling","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-vpc/04d0d020-2d62-11ea-a307-02ca0801967a","stack-name":"eks-2020010213-pdx-us-west-2-belcb-vpc","current":"DELETE_IN_PROGRESS","want":"DELETE_COMPLETE","reason":"","request-started":"1 minute ago"}
{"level":"info","ts":"2020-01-02T09:37:09.090-0500","caller":"cloudformation/cloudformation.go:141","msg":"polling","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-vpc/04d0d020-2d62-11ea-a307-02ca0801967a","stack-name":"eks-2020010213-pdx-us-west-2-belcb-vpc","current":"DELETE_IN_PROGRESS","want":"DELETE_COMPLETE","reason":"","request-started":"1 minute ago"}
{"level":"info","ts":"2020-01-02T09:37:18.627-0500","caller":"cloudformation/cloudformation.go:141","msg":"polling","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-vpc/04d0d020-2d62-11ea-a307-02ca0801967a","stack-name":"eks-2020010213-pdx-us-west-2-belcb-vpc","current":"DELETE_IN_PROGRESS","want":"DELETE_COMPLETE","reason":"","request-started":"2 minutes ago"}
{"level":"info","ts":"2020-01-02T09:37:38.625-0500","caller":"cloudformation/cloudformation.go:141","msg":"polling","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-vpc/04d0d020-2d62-11ea-a307-02ca0801967a","stack-name":"eks-2020010213-pdx-us-west-2-belcb-vpc","current":"DELETE_IN_PROGRESS","want":"DELETE_COMPLETE","reason":"","request-started":"2 minutes ago"}
{"level":"info","ts":"2020-01-02T09:37:58.629-0500","caller":"cloudformation/cloudformation.go:141","msg":"polling","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-vpc/04d0d020-2d62-11ea-a307-02ca0801967a","stack-name":"eks-2020010213-pdx-us-west-2-belcb-vpc","current":"DELETE_IN_PROGRESS","want":"DELETE_COMPLETE","reason":"","request-started":"2 minutes ago"}
{"level":"info","ts":"2020-01-02T09:38:18.748-0500","caller":"cloudformation/cloudformation.go:141","msg":"polling","stack-id":"arn:aws:cloudformation:us-west-2:750630568209:stack/eks-2020010213-pdx-us-west-2-belcb-vpc/04d0d020-2d62-11ea-a307-02ca0801967a","stack-name":"eks-2020010213-pdx-us-west-2-belcb-vpc","current":"DELETE_IN_PROGRESS","want":"DELETE_COMPLETE","reason":"","request-started":"3 minutes ago"}
{"level":"warn","ts":"2020-01-02T09:38:18.748-0500","caller":"eks/vpc.go:398","msg":"deleting VPC for longer than 3 minutes; initiating force deletion","vpc-id":"vpc-0ac41feb956cec704"}
{"level":"warn","ts":"2020-01-02T09:38:19.344-0500","caller":"eks/vpc.go:408","msg":"tried force-delete subnet","subnet-id":"subnet-07c710757186382b4","error":"DependencyViolation: The subnet 'subnet-07c710757186382b4' has dependencies and cannot be deleted.\n\tstatus code: 400, request id: 5382ce4a-7e12-4d22-816b-052329cc54b4"}
{"level":"warn","ts":"2020-01-02T09:38:19.789-0500","caller":"eks/vpc.go:408","msg":"tried force-delete subnet","subnet-id":"subnet-0be1986fa232621a5","error":"DependencyViolation: The subnet 'subnet-0be1986fa232621a5' has dependencies and cannot be deleted.\n\tstatus code: 400, request id: 81078339-d414-440c-9b73-41cb1f8627b1"}
{"level":"warn","ts":"2020-01-02T09:38:20.237-0500","caller":"eks/vpc.go:408","msg":"tried force-delete subnet","subnet-id":"subnet-01bd134698a8c42b7","error":"DependencyViolation: The subnet 'subnet-01bd134698a8c42b7' has dependencies and cannot be deleted.\n\tstatus code: 400, request id: cdc65440-5a33-48de-95f7-17304ae3a5ce"}
{"level":"warn","ts":"2020-01-02T09:38:20.237-0500","caller":"eks/vpc.go:422","msg":"cleaning VPC dependencies","vpc-id":"vpc-0ac41feb956cec704"}
{"level":"info","ts":"2020-01-02T09:38:20.722-0500","caller":"eks/vpc.go:438","msg":"found ENI","eni-id":"eni-0afc8302280e22af7"}
{"level":"info","ts":"2020-01-02T09:38:20.722-0500","caller":"eks/vpc.go:438","msg":"found ENI","eni-id":"eni-0f263269a369cd713"}
{"level":"info","ts":"2020-01-02T09:38:20.722-0500","caller":"eks/vpc.go:438","msg":"found ENI","eni-id":"eni-020ecdf50af3a410f"}
{"level":"info","ts":"2020-01-02T09:38:20.722-0500","caller":"eks/vpc.go:438","msg":"found ENI","eni-id":"eni-06ae894f5173ffb23"}
{"level":"info","ts":"2020-01-02T09:38:20.722-0500","caller":"eks/vpc.go:438","msg":"found ENI","eni-id":"eni-0958fe720f0008ec1"}
{"level":"info","ts":"2020-01-02T09:38:20.722-0500","caller":"eks/vpc.go:438","msg":"found ENI","eni-id":"eni-097418533ad55149d"}
{"level":"warn","ts":"2020-01-02T09:38:20.722-0500","caller":"eks/vpc.go:449","msg":"detaching ENI","eni-id":"eni-0afc8302280e22af7"}
{"level":"error","ts":"2020-01-02T09:38:21.150-0500","caller":"eks/vpc.go:476","msg":"failed to detach ENI","eni-id":"eni-0afc8302280e22af7","error":"AuthFailure: You do not have permission to access the specified resource.\n\tstatus code: 400, request id: 8be91608-f92e-4e1d-85b5-588bbefe4631","stacktrace":"github.com/aws/aws-k8s-tester/eks.(*Tester).deleteVPC\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/vpc.go:476\ngithub.com/aws/aws-k8s-tester/eks.(*Tester).down\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/eks.go:526\ngithub.com/aws/aws-k8s-tester/eks.(*Tester).Down\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/eks.go:546\ngithub.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/eks.deleteClusterFunc\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/eks/delete.go:48\ngithub.com/spf13/cobra.(*Command).execute\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:830\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:914\ngithub.com/spf13/cobra.(*Command).Execute\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:864\nmain.main\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/main.go:37\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
{"level":"error","ts":"2020-01-02T09:38:26.652-0500","caller":"eks/vpc.go:476","msg":"failed to detach ENI","eni-id":"eni-0afc8302280e22af7","error":"AuthFailure: You do not have permission to access the specified resource.\n\tstatus code: 400, request id: 40bb0efd-8f1b-472f-9daa-0a2bdc321a0c","stacktrace":"github.com/aws/aws-k8s-tester/eks.(*Tester).deleteVPC\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/vpc.go:476\ngithub.com/aws/aws-k8s-tester/eks.(*Tester).down\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/eks.go:526\ngithub.com/aws/aws-k8s-tester/eks.(*Tester).Down\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/eks.go:546\ngithub.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/eks.deleteClusterFunc\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/eks/delete.go:48\ngithub.com/spf13/cobra.(*Command).execute\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:830\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:914\ngithub.com/spf13/cobra.(*Command).Execute\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:864\nmain.main\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/main.go:37\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
{"level":"error","ts":"2020-01-02T09:38:32.156-0500","caller":"eks/vpc.go:476","msg":"failed to detach ENI","eni-id":"eni-0afc8302280e22af7","error":"AuthFailure: You do not have permission to access the specified resource.\n\tstatus code: 400, request id: dc6721a1-77b6-4f27-b6a3-f0eadb13b73d","stacktrace":"github.com/aws/aws-k8s-tester/eks.(*Tester).deleteVPC\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/vpc.go:476\ngithub.com/aws/aws-k8s-tester/eks.(*Tester).down\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/eks.go:526\ngithub.com/aws/aws-k8s-tester/eks.(*Tester).Down\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/eks.go:546\ngithub.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/eks.deleteClusterFunc\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/eks/delete.go:48\ngithub.com/spf13/cobra.(*Command).execute\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:830\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:914\ngithub.com/spf13/cobra.(*Command).Execute\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:864\nmain.main\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/main.go:37\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
{"level":"error","ts":"2020-01-02T09:38:37.716-0500","caller":"eks/vpc.go:476","msg":"failed to detach ENI","eni-id":"eni-0afc8302280e22af7","error":"AuthFailure: You do not have permission to access the specified resource.\n\tstatus code: 400, request id: 91b030ef-4120-4d85-9dd9-f574475eb1d5","stacktrace":"github.com/aws/aws-k8s-tester/eks.(*Tester).deleteVPC\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/vpc.go:476\ngithub.com/aws/aws-k8s-tester/eks.(*Tester).down\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/eks.go:526\ngithub.com/aws/aws-k8s-tester/eks.(*Tester).Down\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/eks/eks.go:546\ngithub.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/eks.deleteClusterFunc\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/eks/delete.go:48\ngithub.com/spf13/cobra.(*Command).execute\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:830\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:914\ngithub.com/spf13/cobra.(*Command).Execute\n\t/Users/leegyuho/go/pkg/mod/github.com/spf13/[email protected]/command.go:864\nmain.main\n\t/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/main.go:37\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
I have no idea why the user would not have permission to access a resource that it had created just a few minutes earlier when creating the cluster itself... :(
If deletion of a namespace fails because the resource is not found, https://github.com/aws/aws-k8s-tester/blame/7708d34f8a152b8878746dabb2c355cc576a2e10/eks/namespace.go#L40 then we throw a 4xx. We should instead handle it, similar to:
Test run in CNI repo where test failed
https://app.circleci.com/jobs/github/aws/amazon-vpc-cni-k8s/185
aws-k8s-tester EKS deleteNamespace ("/tmp/cni-test/cluster-cni-test-24392/cni-test-24392.yaml", 'kubectl --kubeconfig=/tmp/cni-test/cluster-cni-test-24392/kubeconfig --namespace=cni-test-24392')
{"level":"info","ts":"2020-02-27T08:00:27.389Z","caller":"eks/namespace.go:40","msg":"deleting namespace","namespace":"cni-test-24392"}
{"level":"warn","ts":"2020-02-27T08:00:28.105Z","caller":"eks/eks.go:787","msg":"failed to delete namespace","namespace":"cni-test-24392","error":"namespaces \"cni-test-24392\" not found"}
I'd like to know how to configure the setup and what variables are available
Scale up existing ng/mng from x to y.
How to reproduce:
aws-k8s-tester eks create cluster --path <config>
...
aws-k8s-tester eks delete cluster --path <config>
...
Run aws-k8s-tester eks create cluster --path <config> again; it will fail with the following error:
createCluster ("/tmp/kubetest2.eks.fuyecheng", "/tmp/kubectl-test-1.14.10 --kubeconfig=/tmp/kubetest2.eks.fuyecheng.kubeconfig.yaml")
{"level":"info","ts":"2020-02-20T14:44:18.182+0800","caller":"eks/role.go:130","msg":"non-empty role given; no need to create a new one"}
{"level":"info","ts":"2020-02-20T14:44:18.182+0800","caller":"eks/vpc.go:259","msg":"non-empty VPC given; no need to create a new one"}
{"level":"info","ts":"2020-02-20T14:44:18.182+0800","caller":"eks/cluster.go:111","msg":"non-empty cluster given; no need to create a new one"}
{"level":"info","ts":"2020-02-20T14:44:18.184+0800","caller":"eks/kubeconfig.go:83","msg":"writing KUBECONFIG with 'aws eks update-kubeconfig'","kubeconfig-path":"/tmp/kubetest2.eks.fuyecheng.kubeconfig.yaml","aws-cli-path":"/usr/local/bin/aws","aws-args":["eks","--region=us-west-2","update-kubeconfig","--name=fuyecheng","--kubeconfig=/tmp/kubetest2.eks.fuyecheng.kubeconfig.yaml","--verbose"]}
Up.defer start ("/tmp/kubetest2.eks.fuyecheng", "/tmp/kubectl-test-1.14.10 --kubeconfig=/tmp/kubetest2.eks.fuyecheng.kubeconfig.yaml")
{"level":"warn","ts":"2020-02-20T14:44:19.805+0800","caller":"eks/eks.go:437","msg":"Up failed","request-started":"1 second ago","error":"'aws eks update-kubeconfig' failed (output \"\\nAn error occurred (ResourceNotFoundException) when calling the DescribeCluster operation: No cluster found for name: fuyecheng.\\n\", error exit status 254)"}
{"level":"warn","ts":"2020-02-20T14:44:19.805+0800","caller":"eks/eks.go:448","msg":"reverting resource creation"}
{"level":"info","ts":"2020-02-20T14:44:19.805+0800","caller":"eks/eks.go:451","msg":"waiting before clean up","wait":"1m40s"}
...
The role/VPC/cluster names still exist in the config, but the resources were already deleted in AWS. Maybe we can empty them in the eks delete cluster phase?
The CSI integration test is no longer used. We should deprecate it, either by removing the code or by marking it as a deprecated feature.
This could be helpful for users trying to understand what is required for the tool to run.
chmod +x /tmp/aws-k8s-tester
/tmp/aws-k8s-tester csi test integration --terminate-on-exit=true --timeout=20m --csi=166
{"level":"info","ts":1546496846.499395,"caller":"csi/test.go:68","msg":"starting CSI integration tests","csi":"166","timeout":1200}
{"level":"info","ts":1546496846.5020103,"caller":"csi/iam.go:119","msg":"created session and IAM client"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xc799fa]
goroutine 1 [running]:
github.com/aws/aws-k8s-tester/internal/csi.(*iamResources).deleteIAMResources(0x0, 0xc0000e0000, 0xc0005b56b0)
/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/internal/csi/iam.go:195 +0x3a
github.com/aws/aws-k8s-tester/internal/csi.createIAMResources.func1(0xc0005b5a58, 0xc0005b5a50)
/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/internal/csi/iam.go:102 +0x56
github.com/aws/aws-k8s-tester/internal/csi.createIAMResources(0x183d8dd, 0x9, 0x0, 0x1a914a0, 0xc0002f0130)
/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/internal/csi/iam.go:129 +0x70a
github.com/aws/aws-k8s-tester/internal/csi.createPermissions(0xc000472500, 0xc000443830, 0x1, 0xc0002eeac0)
/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/internal/csi/csi.go:68 +0x3c
github.com/aws/aws-k8s-tester/internal/csi.NewTester(0xc000472500, 0x1840001, 0xc, 0x7ffc5ac45537, 0x3)
/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/internal/csi/csi.go:34 +0x80
github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/csi.testIntegrationFunc(0xc000462f00, 0xc000443680, 0x0, 0x3)
/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/csi/test.go:75 +0x512
github.com/aws/aws-k8s-tester/vendor/github.com/spf13/cobra.(*Command).execute(0xc000462f00, 0xc0004435c0, 0x3, 0x3, 0xc000462f00, 0xc0004435c0)
/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/vendor/github.com/spf13/cobra/command.go:766 +0x2cc
github.com/aws/aws-k8s-tester/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x2a04f00, 0xc00046c780, 0xc000468780, 0xc000468280)
/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/vendor/github.com/spf13/cobra/command.go:852 +0x2fd
github.com/aws/aws-k8s-tester/vendor/github.com/spf13/cobra.(*Command).Execute(0x2a04f00, 0xc000462000, 0xc0003fbf88)
/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/vendor/github.com/spf13/cobra/command.go:800 +0x2b
main.main()
/Users/leegyuho/go/src/github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/main.go:46 +0x31
We should try to change this soon - in an effort to reduce our custom dependencies.
/cc @gyuho
https://github.com/aws/aws-k8s-tester/blob/master/go.mod#L5 pulls in a now-nonexistent version of Apache Thrift.
Recommend bumping it (a newer release points to Apache Thrift's new GitHub repo) or deleting it, as it's not clear where it's being used.
Affects kubernetes/test-infra#14172
For example:
github.com/aws/aws-k8s-tester/eks: create zip: malformed file path "eks/cluster-loader/artifacts/MetricsForE2E_load_2020-06-18T22:18:53-07:00.json": invalid char ':'
Go refuses to include those characters when creating the module zip archive, which in turn makes the module non-importable. Worse, fetching it through the module proxy results in a 410, which I believe is ultimately caused by the same inability to create the zip archive.
I hit an issue where, if I don't have ~/.aws/credentials and use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY instead, I cannot create a cluster: it fails in config.ValidateAndSetDefaults().
aws-k8s-tester/eksconfig/config.go
Lines 779 to 781 in ee248f9
aws-k8s-tester eks create cluster --path ./aws-k8s-tester-eks.yaml
failed to validate configuration "./aws-k8s-tester-eks.yaml" (AWSCredentialToMountPath "/Users/myname/.aws/credentials" does not exist)
Credentials files and environment variables are both native experiences we need to support. The use case: I mount an AWS secret on the pod that is used to create the cluster.
Very low priority; I can help add this support later.
Separated into a new issue from #44
Seeing an NPE:
$ ./aws-k8s-tester eks create cluster --path ./xyz-config
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x88 pc=0x19ce62e]
goroutine 1 [running]:
github.com/aws/aws-k8s-tester/eksconfig.(*Config).ValidateAndSetDefaults(0xc000276000, 0xd, 0xc000276000)
/Volumes/Unix/Projects/aws-k8s-tester/eksconfig/config.go:663 +0x97e
github.com/aws/aws-k8s-tester/cmd/aws-k8s-tester/eks.createClusterFunc(0xc00035cf00, 0xc0002009a0, 0x0, 0x2)
/Volumes/Unix/Projects/aws-k8s-tester/cmd/aws-k8s-tester/eks/create.go:72 +0xa0
github.com/spf13/cobra.(*Command).execute(0xc00035cf00, 0xc000200880, 0x2, 0x2, 0xc00035cf00, 0xc000200880)
/Volumes/Unix/Projects/go/pkg/mod/github.com/spf13/[email protected]/command.go:766 +0x2ae
github.com/spf13/cobra.(*Command).ExecuteC(0x42b9040, 0xc0000d8c80, 0xc0001b6280, 0xc000214000)
/Volumes/Unix/Projects/go/pkg/mod/github.com/spf13/[email protected]/command.go:852 +0x2ec
github.com/spf13/cobra.(*Command).Execute(...)
/Volumes/Unix/Projects/go/pkg/mod/github.com/spf13/[email protected]/command.go:800
main.main()
/Volumes/Unix/Projects/aws-k8s-tester/cmd/aws-k8s-tester/main.go:45 +0x32
My config:
$ cat fico-config
aws-credential-to-mount-path: /Users/nic/.aws/credentials
aws-iam-authenticator-download-url: https://amazon-eks.s3-us-west-2.amazonaws.com/1.12.7/2019-03-27/bin/darwin/amd64/aws-iam-authenticator
aws-iam-authenticator-path: /tmp/aws-k8s-tester/aws-iam-authenticator
aws-k8s-tester-download-url: https://github.com/aws/aws-k8s-tester/releases/download/0.2.9/aws-k8s-tester-0.2.9-darwin-amd64
aws-k8s-tester-path: /tmp/aws-k8s-tester/aws-k8s-tester
aws-region: us-west-2
cf-stack-vpc-parameter-subnet-01-block: "192.168.0.0/25"
cf-stack-vpc-parameter-subnet-02-block: "192.168.0.128/25"
cf-stack-vpc-parameter-subnet-03-block: "192.168.1.0/24"
cf-stack-vpc-parameter-vpc-block: "192.168.0.0/23"
cluster-name: a8-eks-190426-xyz-test
cluster-state:
created: "0001-01-01T00:00:00Z"
status-cluster-created: false
status-key-pair-created: false
status-policy-attached: false
status-role-created: false
status-vpc-created: false
status-worker-node-created: false
config-path: /Volumes/Unix/Projects/aws-k8s-tester/cni-test/xyz-config
down: true
eks-custom-endpoint: ""
enable-worker-node-ha: true
enable-worker-node-privileged-port-access: true
enable-worker-node-ssh: true
kubeconfig-path: /Users/nic/.kube/aws-k8s-tester/kubeconfig-xyz-test
kubectl-download-url: https://amazon-eks.s3-us-west-2.amazonaws.com/1.12.7/2019-03-27/bin/darwin/amd64/kubectl
kubectl-path: /tmp/aws-k8s-tester/kubectl
kubernetes-version: "1.12"
log-access: false
log-debug: false
log-outputs:
- stderr
security-group-id: ""
subnet-ids: null
tag: a8-eks-190426
updated-at: "2019-04-26T21:10:05.169623Z"
upload-bucket-expire-days: 2
upload-kubeconfig: false
upload-tester-logs: false
upload-worker-node-logs: false
vpc-id: ""
wait-before-down: 60000000000
worker-node-ami: ami-0923e4b35a30a5f53
worker-node-asg-desired-capacity: 1
worker-node-asg-max: 1
worker-node-asg-min: 1
worker-node-instance-type: m3.xlarge
worker-node-private-key-path: /Users/nic/.ssh/kube_aws_rsa-xyz-test
worker-node-volume-size-gb: 20
There are a bunch of lingering roles in the test account after tests finish. Some of them were created in January. It seems there is a resource-cleanup issue that needs to be addressed.
We have several open source projects that need to be tested with utilities for creating/tearing down clusters and setting up containers under test. However, we have developed similar testing utilities in different places, which makes them harder to maintain and makes it harder to share improvements. The following is the list of testing utilities:
We are going to use this repo to share Go testing code that can be used by multiple open source projects, including the EBS/EFS/FSx CSI drivers, CNI plugins, the App Mesh webhook, and the ALB Ingress Controller.
By hosting the code in aws-k8s-tester, we get several benefits:
- make command
- docker build command
- go test ...
- run-e2e-test script
#49 @leakingtapan @wongma7
As an alternative to VPC ID, allow specification of VPC/subnet CIDR ranges.
Currently this is only supported for node groups; it should be ported to MNGs as well.
hi, @gyuho
it seems that test get-worker-node-logs and test dump-cluster-logs do not exist anymore. How can I dump cluster logs now?
We are going to do a POC by rewriting the EBS CSI driver's run-e2e-test script, then use it for the FSx CSI driver to test out its usability.
The framework will implement the same logic as the run-e2e-test script, and it will be configurable so different projects can consume it. Assume the tests run from the project root. The configuration will be exposed through a configuration file:
k8s-test-config.yaml:
cluster:
aws:
region: us-west-2
nodeCount: 3
nodeSize: c5.large
kubernetesVersion: 1.14
kubeAPIServer:
featureGates:
CSIDriverRegistry: "true"
CSINodeInfo: "true"
CSIBlockVolume: "true"
VolumeSnapshotDataSource: "true"
kubelet:
featureGates:
CSIDriverRegistry: "true"
CSINodeInfo: "true"
CSIBlockVolume: "true"
build: |
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
IMAGE_TAG=$TEST_ID
IMAGE_NAME=$AWS_ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/aws-ebs-csi-driver
docker build -t $IMAGE_NAME:$IMAGE_TAG .
install: |
echo "Deploying driver"
source $(dirname "${BASH_SOURCE}")/utils/helm.sh
helm::install
helm::init
helm::wait_tiller
helm install --name aws-ebs-csi-driver \
--set enableVolumeScheduling=true \
--set enableVolumeResizing=true \
--set enableVolumeSnapshot=true \
--set image.repository=$IMAGE_NAME \
--set image.tag=$IMAGE_TAG \
./aws-ebs-csi-driver
uninstall: |
echo "Removing driver"
helm del --purge aws-ebs-csi-driver
test: |
go get -u github.com/onsi/ginkgo/ginkgo
export KUBECONFIG=$HOME/.kube/config
ginkgo -p -nodes=$NODES -v --focus="$FOCUS" tests/e2e -- -report-dir=$ARTIFACTS
/cc @wongma7
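The proposed k8s-test-config.yaml above could map onto Go types along these lines. All type names and yaml tags here are illustrative, not a final API (decoding would use a YAML library such as yaml.v2 against these tags):

```go
package main

import "fmt"

// TestConfig mirrors the proposed k8s-test-config.yaml schema.
// Hypothetical types sketching the config surface, not the framework's
// actual implementation.
type TestConfig struct {
	Cluster   ClusterSpec `yaml:"cluster"`
	Build     string      `yaml:"build"`     // shell snippet to build the image
	Install   string      `yaml:"install"`   // shell snippet to deploy the driver
	Uninstall string      `yaml:"uninstall"` // shell snippet to remove the driver
	Test      string      `yaml:"test"`      // shell snippet to run the e2e suite
}

type ClusterSpec struct {
	AWS           AWSSpec       `yaml:"aws"`
	KubeAPIServer ComponentSpec `yaml:"kubeAPIServer"`
	Kubelet       ComponentSpec `yaml:"kubelet"`
}

type AWSSpec struct {
	Region            string `yaml:"region"`
	NodeCount         int    `yaml:"nodeCount"`
	NodeSize          string `yaml:"nodeSize"`
	KubernetesVersion string `yaml:"kubernetesVersion"`
}

type ComponentSpec struct {
	FeatureGates map[string]string `yaml:"featureGates"`
}

func main() {
	cfg := TestConfig{
		Cluster: ClusterSpec{
			AWS: AWSSpec{Region: "us-west-2", NodeCount: 3, NodeSize: "c5.large", KubernetesVersion: "1.14"},
		},
	}
	fmt.Println(cfg.Cluster.AWS.Region)
}
```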
ref: aws/amazon-vpc-cni-k8s#770
We want to use an updated aws-k8s-tester binary (UpdateFromEnv was added in 0.5.1), but unfortunately v0.5.1 fails for me because it attempts to chmod the aws CLI:
failed to create EKS deployer error doing chmod on "/usr/local/bin/aws": chmod /usr/local/bin/aws: operation not permitted
Note that I already have the aws binary installed and it is world-executable:
jaypipes@thelio:~/go/src/github.com/aws/amazon-vpc-cni-k8s$ ls -l `which aws`
-rwxr-xr-x 1 root root 815 Jun 28 10:25 /usr/local/bin/aws
so I'm not sure why aws-k8s-tester is trying to chmod anything.
Running Ubuntu 18.04.3 LTS
Trying to install kubetest on an Ubuntu master node in a Kubernetes cluster, I get this:
go get -u k8s.io/test-infra/kubetest package github.com/aws/aws-k8s-tester/ekstester: cannot find package "github.com/aws/aws-k8s-tester/ekstester" in any of: /usr/lib/go-1.13/src/github.com/aws/aws-k8s-tester/ekstester (from $GOROOT) /home/vagrant/go/src/github.com/aws/aws-k8s-tester/ekstester (from $GOPATH)
Currently, env var overrides are only possible for kubetest. These would be helpful when creating a cluster or creating a config, however, so that we more easily control the cluster creation in a script without using sed on the config file.