
kops-cn's Introduction

English README

Disclaimer

This solution is recommended for testing; evaluate it carefully before using it in production.
If you need further discussion or feedback on the solution, contact [email protected] for additional support.
You are welcome to get in touch to help build the solution or submit requirements, and to report bugs in the GitHub project issues.

Important Notice

With the launch of EKS in the AWS China regions, this project is gradually being retired. It is planned to be closed on October 31, 2020, and the related mirrored images will be deleted.
Please refer to the new container-mirror project for the mirrored images.
Effective immediately, the project's image registry has changed from 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn to 048912060910.dkr.ecr.cn-north-1.amazonaws.com.cn.

About kops-cn

This project guides customers through using the open-source automation tool kops to build Kubernetes clusters in the AWS Ningxia or Beijing region. The images and files required during cluster creation have already been mirrored into China, so no VPN or proxy setup is needed.

Features

  • The Docker images required during cluster creation are hosted in Amazon ECR in the Beijing region.
  • The binaries and configuration files required during cluster creation are hosted in an Amazon S3 bucket in the Beijing region.
  • Simple and fast cluster setup and deployment.
  • No VPN or proxy required.
  • If you need additional Docker images mirrored, create a GitHub push or pull request; your request triggers CodeBuild (buildspec-nwcd.yml) to pull the images into ECR in AWS cn-north-1. See: image list.
  • Create the cluster with a single make create-cluster command.

Current Version

Only version 1.15 is provided at the moment.

Major version   Latest kops version   Matching Kubernetes version   AMI
1.15            1.15.2 (#118)         1.15.10                       kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13 (#96)

Steps

  1. Clone the project locally:
$ git clone https://github.com/nwcdlabs/kops-cn
$ cd kops-cn
  2. Install the kops and kubectl command-line clients on your machine: installation guide

You can also download the kops and kubectl binaries directly from the S3 buckets in the AWS China regions linked below:

kops_version='1.15.2'
k8s_version='v1.15.10'
#
# for Linux Environment
#
# download kops for linux
$ curl -L https://s3.cn-northwest-1.amazonaws.com.cn/kops-file/fileRepository/kops/$kops_version/linux/amd64/kops -o kops
$ chmod +x $_

# download kubectl for linux
$ curl -L https://s3.cn-northwest-1.amazonaws.com.cn/kops-file/fileRepository/kubernetes-release/release/$k8s_version/bin/linux/amd64/kubectl -o kubectl
$ chmod +x $_

#
# for Mac OS X Environment
#

# download kops for mac os x
$ curl -L https://s3.cn-northwest-1.amazonaws.com.cn/kops-file/fileRepository/kops/$kops_version/darwin/amd64/kops -o kops
$ chmod +x $_

# download kubectl for mac os x
$ curl -L https://s3.cn-northwest-1.amazonaws.com.cn/kops-bjs/fileRepository/kubernetes-release/release/$k8s_version/bin/darwin/amd64/kubectl -o kubectl
$ chmod +x $_


#
# move kops and kubectl into your $PATH
#
$ sudo mv ./kops /usr/local/bin/
$ sudo mv ./kubectl /usr/local/bin/

#
# confirm again that kops and kubectl are the expected stable versions
#
$ kops version
$ kubectl version

Please Note

If you have previously installed or upgraded the kops client, run kops version again to confirm that the client is the latest stable release; a version mismatch may cause functional problems.
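A quick way to double-check (a minimal sketch, assuming the versions pinned in the table above, kops 1.15.2 and kubectl v1.15.10):

$ kops version | grep -q '1.15.2' && echo "kops OK" || echo "unexpected kops version"
$ kubectl version --client | grep -q 'v1.15.10' && echo "kubectl OK" || echo "unexpected kubectl version"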

  3. Edit the Makefile. You need to set the following variables (they can also be overridden on the make command line, as sketched after the table):

Name                Description                                                               Values
TARGET_REGION       Deploy the cluster in the AWS Beijing or Ningxia region                   cn-north-1 or cn-northwest-1
AWS_PROFILE         Specify a different AWS profile if needed                                 default
KOPS_STATE_STORE    An S3 bucket for kops to store its configuration                          s3://YOUR_S3_BUCKET_NAME
VPCID               The VPC to deploy your cluster into                                       vpc-xxxxxxxx
MASTER_COUNT        Number of master nodes                                                    3 (changing this is not recommended)
MASTER_SIZE         Instance type for master nodes
NODE_SIZE           Instance type for worker nodes
NODE_COUNT          Number of worker nodes
SSH_PUBLIC_KEY      Path to your local SSH public key (or see here to generate a new one)     ~/.ssh/id_rsa.pub [default]
KUBERNETES_VERSION  Kubernetes version (changing this is not recommended)
KOPS_VERSION        kops version (changing this is not recommended)
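Assuming the Makefile follows normal GNU make conventions (command-line assignments override values in the file), the same variables can also be set on the make command line without editing the file, for example:

$ make create-cluster \
    TARGET_REGION=cn-northwest-1 \
    AWS_PROFILE=default \
    KOPS_STATE_STORE=s3://YOUR_S3_BUCKET_NAME \
    VPCID=vpc-xxxxxxxx \
    NODE_COUNT=2 NODE_SIZE=c5.large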
  4. Create the cluster:
make create-cluster
  5. Edit the cluster:
make edit-cluster

Paste the contents of spec-nwcd.yml under the spec section, then save and exit.

  6. Update the cluster:
make update-cluster
  7. Done.

Validation

Cluster creation takes roughly 3-5 minutes. After that, run

kops validate cluster

or

make validate-cluster

to verify that the cluster is in the ready state.

Check the cluster's API endpoint and version information.
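These are standard kubectl commands (an illustrative sketch, not project-specific) for looking at the endpoint and versions:

$ kubectl cluster-info        # API server and service endpoints
$ kubectl get nodes -o wide   # node status, IPs and kubelet version
$ kubectl version             # client and server versions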

Congratulations, you are done!

Finally, you can delete all cluster resources like this:

make delete-cluster

Add-on Installation

FAQ

Which AMI is used by default? Can I use other AMIs, such as CentOS or Amazon Linux 2?

The default AMI is the Debian Linux AMI, which is also the standard upstream kops AMI (see the explanation). However, kops does not officially publish its AMIs to cn-north-1 and cn-northwest-1 (please help upvote this issue). Until the official AMIs are published directly to these two China regions, we manually copy them into the Beijing and Ningxia regions following the steps described here. Please be aware that these AMIs are not published officially by kops; you are also encouraged to follow the linked instructions, look up the latest official Debian AMI here, and copy it into China yourself.

Apart from Debian Linux, the other officially supported AMIs should also work in principle, but there may be some known issues; see #91 and #96.

Cluster validation fails?

See issue #5.

How do I SSH into the master and worker nodes?

See issue #6. Note that on the Debian AMI you log in as the admin user: ssh admin@IP.
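For example, a minimal sketch (the key path and address are placeholders; use the key configured in SSH_PUBLIC_KEY and your node's actual IP):

$ ssh -i ~/.ssh/id_rsa admin@<master-or-node-ip>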

Can I run the master nodes in private subnets? How do I configure that?

See this explanation: #94

The Docker image I need does not exist in ECR.

The images already mirrored into the containerRegistry ECR repository in the AWS Beijing region are listed in required-images-mirrored.txt (see #105). If you need other images during cluster creation, edit required-images.txt; this forks a new branch in your GitHub account, from which you can submit a PR (pull request). Merging your PR triggers CodeBuild to pull the images defined in required-images.txt into the ECR repository. A few minutes later you will see the build badge change from in progress to passing.
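A rough sketch of the workflow (the image name below is purely illustrative; the file lives in the mirror/ directory of the repo):

$ echo "quay.io/coreos/flannel:v0.11.0-amd64" >> mirror/required-images.txt   # hypothetical image
$ git checkout -b mirror-my-image
$ git commit -am "request mirror for flannel v0.11.0"
$ git push origin mirror-my-image
# then open a pull request against nwcdlabs/kops-cn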

Current status:

Purpose of the required-images files

filename                        description
required-images.txt             request new images here via PR; merging triggers the CD pipeline to mirror them
required-images-mirrored.txt    images that have already been mirrored; do not submit PRs against this file
required-images-daily.txt       images that are re-mirrored automatically once a day

How do I find the full ECR path corresponding to an entry in required-images.txt?

See here.

See all FAQs here.

kops-cn's People

Contributors

alyanli, bigdrum, cplo, dean205, eagleye1115, elbertwang, fromthebridge, fsadykov, jansony1, jhaohai, jonkeyguan, missingcharacter, nowfox, pahud, satchinjoshi, seimutig, skrieger82, sunfuze, totorochina, walkley, xmubeta, yizhizoe, yujunz, zhangquanhao


kops-cn's Issues

How to specify c5.large spot instance in Ningxia region?

The C5 instance family, built on the Nitro hypervisor, is available in the Ningxia and Beijing regions and offers an excellent price-performance ratio.

Let's document how to build a kops-cn cluster in Ningxia with C5 Spot Instances and cut costs by up to 60%.

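One possible way to do this (a hedged sketch, not a verified recipe: the instance group name "nodes", the cluster name, and the price ceiling are assumptions; maxPrice is the standard kops InstanceGroup field for Spot bids):

$ export KOPS_STATE_STORE=s3://YOUR_S3_BUCKET_NAME
$ kops edit ig nodes --name cluster.zhy.k8s.local
# in the editor, under spec:, set for example:
#   machineType: c5.large
#   maxPrice: "0.30"
$ kops update cluster --name cluster.zhy.k8s.local --yes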

can't create cluster in China - error fetching https://.../channels/stable

Summary

The cluster can't be created if the machine running the kops client has a poor internet connection to GitHub. You can test the connectivity with:

curl https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable

How to reproduce this issue

  1. Turn on the -v flag in kops create:
kops create cluster \
     -v 9 \
     --cloud=aws \
...
  2. Run create-cluster.sh:

9801a7a9620b:kops-cn hunhsieh $ bash create-cluster.sh
I0129 01:36:33.093275 18727 create_cluster.go:1407] Using SSH public key: /Users/hunhsieh/.ssh/id_rsa.pub
I0129 01:36:33.093795 18727 factory.go:68] state store s3://pahud-kops-state-store-zhy
I0129 01:36:33.342730 18727 s3context.go:194] found bucket in region "cn-northwest-1"
I0129 01:36:33.342805 18727 s3fs.go:220] Reading file "s3://pahud-kops-state-store-zhy/cluster.zhy.k8s.local/config"
I0129 01:36:33.487717 18727 channel.go:97] resolving "stable" against default channel location "https://raw.githubusercontent.com/kubernetes/kops/master/channels/"
I0129 01:36:33.487769 18727 channel.go:102] Loading channel from "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable"
I0129 01:36:33.489467 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:37:03.492838 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:37:03.993923 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:37:33.997853 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:37:35.000502 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:38:05.001673 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:38:07.002652 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:38:37.009644 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:38:41.010692 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:39:11.012203 18727 context.go:231] hit maximum retries 5 with error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout

error reading channel "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout

How to get the ECR repo full path from required-images.txt ?

  1. git clone the repo and make sure the latest required-images.txt and display-remote-repos.sh are in the ./mirror sub-directory.
$ cd ./mirror
$ bash display-remote-repos.sh 

You'll immediately get the full ECR repo paths:

$ bash display-remote-repos.sh 
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kope-dns-controller:1.11.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-dnsmasq-nanny-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-sidecar-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-kube-dns-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/cluster-proportional-autoscaler-amd64:1.1.2-r2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/coredns:1.1.3
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/coredns:1.2.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:2.2.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/pause-amd64:3.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-controller-manager:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-scheduler:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-proxy:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-apiserver:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io-heptio-images-authenticator:v0.3.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/602401143452.dkr.ecr.us-west-2.amazonaws.com-amazon-k8s-cni:v1.3.2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-coreos-flannel:v0.10.0-amd64
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/ottoyiu-k8s-ec2-srcdst:v0.2.0-3-gc0c26eca
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/ottoyiu-k8s-ec2-srcdst:v0.2.2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/894847497797.dkr.ecr.us-west-2.amazonaws.com-aws-alb-ingress-controller:v1.1.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-node:v3.4.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-cni:v3.4.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-node:v2.6.12
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-cni:v1.11.8
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-kube-controllers:v1.0.5
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-kube-policy-controller:v0.7.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-calico-upgrade:v1.0.5
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/defaultbackend:1.4
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-kubernetes-ingress-controller-nginx-ingress-controller:0.20.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:3.2.18
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:3.2.24
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kubernetes-dashboard-amd64:v1.10.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxy_init:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxyv2:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/galley:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/pilot:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/mixer:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/kubectl:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/sidecar_injector:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxy_init:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxyv2:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/galley:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/pilot:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/mixer:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/kubectl:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/sidecar_injector:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io-kubernetes-helm-tiller:v2.12.3

k8s.gcr.io/kube-proxy:v1.11.6 is missing

After upgrading to 1.11.6, the kube-proxy pod failed to come up because this image is not mirrored. (I guess the same is true for many other 1.11.6 images.)

I wonder if we could have an automatic script enumerate all versions of the whitelisted images and mirror them, so we don't need to worry about this any more. A rough sketch of the idea follows.
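Sketch only, not the project's actual mirroring script; the tag list below is hypothetical:

src=k8s.gcr.io/kube-proxy
dst=937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-proxy
for tag in v1.11.5 v1.11.6 v1.11.7; do    # hypothetical whitelist of versions
  docker pull "$src:$tag"
  docker tag  "$src:$tag" "$dst:$tag"
  docker push "$dst:$tag"
done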

Can't pull image for Istio release-1.0-latest-daily

Got ImagePullBackOff errors while installing Istio.

Events:
  Type     Reason                 Age                From                                                  Message
  ----     ------                 ----               ----                                                  -------
  Normal   Scheduled              22m                default-scheduler                                     Successfully assigned istio-citadel-5768b899d4-jg226 to ip-172-31-60-71.cn-north-1.compute.internal
  Normal   SuccessfulMountVolume  22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  MountVolume.SetUp succeeded for volume "istio-citadel-service-account-token-skxg9"
  Warning  Failed                 22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Failed to pull image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily": rpc error: code = Unknown desc = Error response from daemon: manifest for 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily not found
  Warning  Failed                 22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Error: ErrImagePull
  Normal   BackOff                22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Back-off pulling image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily"
  Warning  Failed                 22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Error: ImagePullBackOff
  Normal   Pulling                22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  pulling image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily"

node NotReady after create and update

When I create, edit, and then update the cluster following the instructions, with 3 masters and 1 node in cn-northwest-1, the node ends up in NotReady status.

If I change the Makefile to set NETWORKING to flannel-vxlan, it works fine (a command-line sketch follows below).

My guess is that because the node has multiple private IPs, the node goes NotReady when the primary IP is not the first one.
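For reference, if the Makefile exposes the NETWORKING variable mentioned above, it can also be overridden on the make command line (standard GNU make behavior; an untested sketch):

$ make create-cluster NETWORKING=flannel-vxlan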

error when: kops edit cluster

kops edit cluster
tried to add:

  assets:
    containerRegistry: 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn
    fileRepository: https://s3.cn-north-1.amazonaws.com.cn/kops-bjs/fileRepository/
  docker:
    logDriver: ""
    registryMirrors:
    - https://registry.docker-cn.com

:wq to save; however, the editor re-opens and I get this error:
error populating cluster spec: error building complete spec: options did not converge after 10 iterations

I tried to ignore it and go on to the final step, kops validate cluster:
unexpected error during validation: error listing nodes: Get https://api-cluster-zhy-k8s-local-qpbf7n-1465482247.cn-northwest-1.elb.amazonaws.com.cn/api/v1/nodes: EOF

Going to the AWS console, I find the three master instances are out of service behind the ELB.
I checked that the security group rules are all correct.
I have already obtained an ICP exception for ports 80/8080/443, and I can telnet to the ELB on 443.

I googled it and found a similar issue: kubernetes/kops#5061

How can I fix it?

HOWTO - create multiple clusters in a VPC

This example shows how to create two kops clusters in the same VPC in the Ningxia region, each with its own cluster name.

cluster name

1st cluster: cluster1.zhy.k8s.local
2nd cluster: cluster2.zhy.k8s.local
(Note: the names must end with k8s.local.)

subnets

Prepare six subnets in one VPC as shown below; in this example each cluster uses three of them.


Makefiles

Prepare two Makefiles, cluster1.mk and cluster2.mk. Sample contents:

https://github.com/nwcdlabs/kops-cn/blob/master/samples/multi-clusters-in-shared-vpc/cluster1.mk
https://github.com/nwcdlabs/kops-cn/blob/master/samples/multi-clusters-in-shared-vpc/cluster2.mk

Create the first cluster

create cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk create-cluster

edit cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk edit-cluster

Paste in the contents of spec.yml.

update cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk update-cluster

validate cluster

Switch the kubectl context to cluster1:

$ kubectl config use-context cluster1.zhy.k8s.local

validate cluster1

$ AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk validate-cluster


Create the second cluster

create cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk create-cluster

edit cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk edit-cluster

Paste in the contents of spec.yml.

update cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk update-cluster

validate cluster

Switch the kubectl context to cluster2:

$ kubectl config use-context cluster2.zhy.k8s.local

validate cluster2

$ AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk validate-cluster


get po

Both clusters list all the Pods in kube-system, and all of them are Running.

error parsing SSH public key: ssh: no key found

I ran into the following issue when creating the cluster:
I0416 09:48:45.708089 3672 create_cluster.go:1407] Using SSH public key: /home/ec2-user/.ssh/id_rsa.pub

error reading cluster configuration "cluster.zhy.k8s.local": error reading s3://liuhongxi-kops-cn/cluster.zhy.k8s.local/config: Unable to list AWS regions: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: failed to find credentials in the environment.
SharedCredsLoad: failed to load profile, default.
EC2RoleRequestError: no EC2 instance role found
caused by: EC2MetadataError: failed to make EC2Metadata request
caused by:

<title>404 - Not Found</title>

404 - Not Found

make: *** [create-cluster] Error 1
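The NoCredentialProviders error above means the kops client could not find any AWS credentials. A minimal sketch of one way to provide them (the profile name "default" is an assumption):

$ aws configure --profile default     # enter access key, secret key and region
$ export AWS_PROFILE=default
$ aws sts get-caller-identity         # should print your account and ARN, not an error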

Primary interface IP is unreachable, causing out-of-service status behind the ELB

I am using a private topology and an internal ELB to create a K8s cluster.

# other parameters omitted
kops create cluster \
  --topology=private \
  --networking=amazon-vpc-routed-eni \
  --api-loadbalancer-type=internal

After the cluster started, each node was assigned two interfaces. However, I found only one node in service behind the ELB.

From the working node, the ip route table is:


core@ip-172-20-53-190 ~ $ ip route
default via 172.20.32.1 dev eth0 proto dhcp src 172.20.53.190 metric 1024
default via 172.20.32.1 dev eth1 proto dhcp src 172.20.61.156 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.20.32.0/19 dev eth0 proto kernel scope link src 172.20.53.190
172.20.32.1 dev eth0 proto dhcp scope link src 172.20.53.190 metric 1024
172.20.32.1 dev eth1 proto dhcp scope link src 172.20.61.156 metric 1024

You can see that the eth0 route is on top, which explains why the eth0 IP is reachable.

From the other two defunct nodes, the ip route output looks like:

core@ip-172-20-114-248 ~ $ ip route
default via 172.20.96.1 dev eth1 proto dhcp src 172.20.98.243 metric 1024
default via 172.20.96.1 dev eth0 proto dhcp src 172.20.114.248 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.20.96.0/19 dev eth1 proto kernel scope link src 172.20.98.243
172.20.96.0/19 dev eth0 proto kernel scope link src 172.20.114.248
172.20.96.1 dev eth1 proto dhcp scope link src 172.20.98.243 metric 1024
172.20.96.1 dev eth0 proto dhcp scope link src 172.20.114.248 metric 1024
core@ip-172-20-114-248 ~ $

The eth1 entry is on top, so only the eth1 IP is reachable.

I am not sure whether something is wrong with my setup. Hope someone can help. Thank you.

--Beta

faster mirror improvement

  1. Compare the image digest before docker push, e.g. with a helper function like this:

# returns 0 (success) when the given digest is already tagged in the ECR repo
image_already_mirrored() {
  local repo="$1" digest="$2" tag="$3"
  local res
  res=$(aws --profile bjs --region "$ECR_REGION" ecr describe-images --repository-name "$repo" \
    --query "imageDetails[?(@.imageDigest=='$digest')].contains(@.imageTags, '$tag') | [0]")

  if [ "$res" == "true" ]; then
    return 0
  else
    return 1
  fi
}

If this returns 0 we don't have to push the image to ECR, since it already exists there.
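A usage sketch built on the helper above (the surrounding variables such as $ECR_REPO are hypothetical and depend on the actual mirror script):

if image_already_mirrored "$repo" "$digest" "$tag"; then
  echo "already mirrored, skipping push"
else
  docker push "$ECR_REPO/$repo:$tag"
fi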

Create your cluster with existing subnets

Customers need to deploy their cluster into existing subnets, but the official kops page is not very clear about this, as you can see here:
https://github.com/kubernetes/kops/blob/master/docs/run_in_existing_vpc.md#shared-subnets

Specifically, these parts:

export SUBNET_ID=subnet-12345678 # replace with your subnet id
export SUBNET_CIDR=10.100.0.0/24 # replace with your subnet CIDR
export SUBNET_IDS=$SUBNET_IDS # replace with your comma separated subnet ids

What you really need to do is modify the create-cluster target in the Makefile as shown below, according to your subnets and zones, and leave the rest of the Makefile unchanged (only the relevant parts are shown):

.PHONY: create-cluster
create-cluster:  
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops create cluster \
     --cloud=aws \
     --name=$(CLUSTER_NAME) \
     --image=$(AMI) \
     --master-count=$(MASTER_COUNT) \
     --master-size=$(MASTER_SIZE) \
     --node-count=$(NODE_COUNT) \
     --node-size=$(NODE_SIZE)  \
     --vpc=$(VPCID) \
     --kubernetes-version=$(KUBERNETES_VERSION_URI) \
     --networking=amazon-vpc-routed-eni \
     --ssh-public-key=$(SSH_PUBLIC_KEY) \
     --zones=cn-northwest-1a,cn-northwest-1b \
     --subnets=subnet-2cf25a45,subnet-9315d7e8

1. Delete the original --zones option.
2. Then add the last two lines shown above.
3. The order of your subnets must match the order of your zones (a quick check is sketched below).
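As mentioned in point 3, you can sanity-check the subnet-to-zone mapping before creating the cluster (a sketch using the standard AWS CLI; the subnet IDs are the ones from the example above):

$ aws ec2 describe-subnets \
    --subnet-ids subnet-2cf25a45 subnet-9315d7e8 \
    --query 'Subnets[].{Id:SubnetId,AZ:AvailabilityZone}' \
    --output table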

amazon-vpc-routed-eni as default networking for cluster creation

https://github.com/nwcdlabs/kops-cn/blob/448b5fc45d47d525c9db6c6e54dfbb19a34b1c73/create-cluster.sh#L7-L18

As the AWS ALB Ingress Controller is now at v1.0.0 and its ip target mode requires the AWS VPC CNI, we should make amazon-vpc-routed-eni the default networking mode.

--networking amazon-vpc-routed-eni

The new creation script would look like this:

 kops create cluster \ 
      --cloud=aws \ 
      --name=$cluster_name \ 
      --image=$ami \ 
      --zones=$zones \ 
      --master-count=$master_count \ 
      --master-size=$master_size \ 
      --node-count=$node_count \ 
      --node-size=$node_size  \ 
      --vpc=$vpcid \ 
      --networking amazon-vpc-routed-eni \
      --kubernetes-version="$kubernetesVersion" \ 
      --ssh-public-key=$ssh_public_key 

Running create-cluster.sh fails

I ran the installation on my Mac.
After configuring env.config and running create-cluster.sh, I got the error message below:

I0130 10:08:37.058539 5466 create_cluster.go:1407] Using SSH public key: /Users/wangqi/.ssh/id_rsa
I0130 10:08:38.840289 5466 subnets.go:184] Assigned CIDR 172.0.32.0/19 to subnet cn-northwest-1a
I0130 10:08:38.840325 5466 subnets.go:184] Assigned CIDR 172.0.64.0/19 to subnet cn-northwest-1b
I0130 10:08:38.840362 5466 subnets.go:184] Assigned CIDR 172.0.96.0/19 to subnet cn-northwest-1c

error determining default DNS zone: error querying zones: RequestError: send request failed
caused by: Get https://route53.cn-northwest-1.amazonaws.com.cn/2013-04-01/hostedzone: dial tcp: lookup route53.cn-northwest-1.amazonaws.com.cn on 172.20.53.163:53: no such host
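The error above is a local DNS lookup failure for the Route 53 endpoint. A quick check from the machine running kops (plain diagnostic commands, nothing project-specific):

$ nslookup route53.cn-northwest-1.amazonaws.com.cn
$ curl -sI https://route53.cn-northwest-1.amazonaws.com.cn/ | head -n 1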

Unable to bring up k8s-ec2-srcdst deployment on CentOS 7

Because the CA certificate is located at a different path on CentOS, the k8s-ec2-srcdst container fails to start: it cannot find the certificate. This is mentioned in kubernetes/kops#4331.

Default certificate path: /etc/ssl/certs/ca-certificates.crt
Centos path: /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt

One workaround is to run the 'kubectl patch' command mentioned in the issue (a sketch is included below). Another thought I have is to change the kops source code and recompile it.

Any other good advice is welcome, thank you.
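For reference, one possible shape of the kubectl patch workaround (an untested sketch; the deployment name, container index, and mount paths are assumptions based on this issue, and the exact patch in kubernetes/kops#4331 may differ):

kubectl -n kube-system patch deployment k8s-ec2-srcdst --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/volumes", "value": [
    {"name": "ca-bundle", "hostPath": {"path": "/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt", "type": "File"}}
  ]},
  {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts", "value": [
    {"name": "ca-bundle", "mountPath": "/etc/ssl/certs/ca-certificates.crt", "readOnly": true}
  ]}
]'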

helm installation requires dependency update

According to the official Helm/Istio documentation, helm repo add and helm dep update are required before helm install:

Add istio.io chart repository and point to the daily release:

$ helm repo add istio.io https://storage.googleapis.com/istio-prerelease/daily-build/master-latest-daily/charts
Build the Helm dependencies:

$ helm dep update install/kubernetes/helm/istio
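Then run the install itself; a sketch in Helm 2 syntax (matching the era of this issue; the chart path and release name follow the Istio docs quoted above and may differ for your Istio version):

$ helm install install/kubernetes/helm/istio --name istio --namespace istio-system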

etcd3 as the default cluster

According to the kops etcd roadmap document
https://github.com/kubernetes/kops/blob/af4df08b694e2a1f8814a7b3649060477be67c86/docs/etcd/roadmap.md

etcd3 will eventually become the default etcd version; however, for now kops still defaults to v2.2.

We have a PR trying to get this sorted out, but for the moment, to stay aligned with upstream kops, we stick with v2.2.

If you prefer to provision etcd3 for your cluster, you can update spec.yml like this:
https://github.com/nwcdlabs/kops-cn/pull/29/files#diff-ce22796966d5547919fe1967f7781563

Hi @jansony1 , feel free to update this issue if you have any other useful insights.

Thanks.

How to use a customized nodeup binary?

We need to use a customized nodeup for our cluster. When I try to override the default URL with

export NODEUP_URL='https://s3-us-west-2.amazonaws.com/my-bucket/nodeup/linux/amd64/01/23/18/1516747024/nodeup'

It seems to be hijacked by the fileRepository setting.

I1214 16:28:21.474161   94429 builder.go:297] error reading hash file "https://s3.cn-north-1.amazonaws.com.cn/kops-bjs/fileRepository/my-bucket/nodeup/linux/amd64/01/23/18/1516747024/nodeup.sha1": file does not exist

you may have not staged your files correctly, please execute kops update cluster using the assets phase

Any suggestions?
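One direction to try, based purely on the hint in the log above (an untested sketch, not a confirmed fix): run the assets phase explicitly so kops stages the file referenced by NODEUP_URL, then apply the rest.

$ export NODEUP_URL='https://s3-us-west-2.amazonaws.com/my-bucket/nodeup/linux/amd64/01/23/18/1516747024/nodeup'
$ kops update cluster --name "$CLUSTER_NAME" --phase assets
$ kops update cluster --name "$CLUSTER_NAME" --yes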

Can't pull heptio-images-authenticator

Can't pull 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/heptio-images-authenticator:v0.3.0 when running aws-iam-authenticator.

Containers:
  aws-iam-authenticator:
    Container ID:
    Image:         937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/heptio-images-authenticator:v0.3.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      server
      --config=/etc/aws-iam-authenticator/config.yaml
      --state-dir=/var/aws-iam-authenticator
      --generate-kubeconfig=/etc/kubernetes/aws-iam-authenticator/kubeconfig.yaml

I can see the image listed in https://github.com/nwcdlabs/kops-cn/blob/master/mirror/required-images.txt#L32.

Add more scripts to the Makefile

Some customers may make a mistake in the "make edit-cluster" step, or may need to update their cluster, so they need a rolling-update option. Also, if they choose to drive their kops operations through the Makefile, we had better list all the operations there so they don't need to maintain a separate set of environment variables.

Here are two targets my customers have asked for, added here:

.PHONY: rolling-cluster
rolling-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops rolling-update cluster --name $(CLUSTER_NAME) --yes --cloudonly

.PHONY: get-cluster
get-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops get cluster --name $(CLUSTER_NAME)
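Usage would then be the same as the existing targets (assuming the same variables are set in the Makefile):

$ make rolling-cluster   # rolling-update every instance group (--cloudonly)
$ make get-cluster       # show the cluster this Makefile manages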

docker repository for Amazon VPC CNI

Symptom
When deploying kops with the Amazon VPC CNI (--networking=amazon-vpc-routed-eni), the aws-node DaemonSet fails with ImagePullBackOff.

Root cause
The generated image URL for aws-node is invalid:
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/602401143452.dkr.ecr.us-west-2.amazonaws.com-amazon-k8s-cni:1.0.0

In the YAML template, the image URL is generated from the parameter "Networking.AmazonVPC.ImageName", falling back to the default image URL in the us-west-2 ECR.

It works after changing the aws-node image URL to "pahud/amazon-k8s-cni:1.0.0".

Suggested Solution

  1. Specify "Networking.AmazonVPC.ImageName" in kops edit as follows:

     networking:
       amazonvpc:
         imageName: amazon-k8s-cni:1.0.0

  2. Add the image amazon-k8s-cni:1.0.0 to the Docker registry 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn

change the default AMI from CoreOS to Amazon Linux 2 LTS

Background

to make sure the OS is more compatible with other components such as

  1. AWS VPC CNI
  2. AWS ALB Ingress

and to eliminate potential maintenance complexity in the future.

TODO

  • make sure the latest AMI in cn-north-1 and cn-northwest-1 is compatible with the latest stable kops
  • make sure the latest AWS VPC CNI (1.3) is compatible
  • kubernetes/kops#6341 needs to be identified and fixed
  • CoreDNS as the replacement for kube-dns
  • make sure AWS ALB Ingress is compatible
  • update env.config and set Amazon Linux 2 LTS as the default AMI
