
kops-cn's Introduction

English README

Disclaimer

This solution is recommended for testing; evaluate it carefully before using it in production.
If you need further discussion or feedback on the solution, contact [email protected] for additional support.
You are welcome to get in touch to help build the solution or submit requirements, and to report bugs in the GitHub project issues.

Important Notice

With the launch of EKS in the AWS China regions, this project is gradually being retired. It is planned to be closed on October 31, 2020, and the related mirrored images will be deleted.
Please refer to the new container-mirror project for the mirrored images.
Effective immediately, the project's image registry has changed from 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn to 048912060910.dkr.ecr.cn-north-1.amazonaws.com.cn.

About kops-cn

This project guides customers through using the open-source automation tool kops to build Kubernetes clusters in the AWS Ningxia or Beijing region. The images and files required during cluster creation have already been mirrored into China, so no VPN or proxy setup is needed.

Features

  • The Docker images required during cluster creation are hosted in Amazon ECR in the Beijing region.
  • The binaries and configuration files required during cluster creation are hosted in an Amazon S3 bucket in the Beijing region.
  • Simple and fast cluster setup and deployment.
  • No VPN or proxy required.
  • If you need additional Docker images mirrored, create a GitHub push or pull request; your request triggers CodeBuild (buildspec-nwcd.yml) to pull the images into ECR in AWS cn-north-1. See: image list.
  • Create the cluster with a single make create-cluster command.

Current Version

Only version 1.15 is provided at the moment.

Major version   Latest kops version   Matching Kubernetes version   AMI
1.15            1.15.2 (#118)         1.15.10                       kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-05-13 (#96)

Steps

  1. Clone the project locally:
$ git clone https://github.com/nwcdlabs/kops-cn
$ cd kops-cn
  2. Install the kops and kubectl command-line clients on your machine: installation guide

You can also download the kops and kubectl binaries directly from the S3 buckets in the AWS China regions linked below:

kops_version='1.15.2'
k8s_version='v1.15.10'
#
# for Linux Environment
#
# download kops for linux
$ curl -L https://s3.cn-northwest-1.amazonaws.com.cn/kops-file/fileRepository/kops/$kops_version/linux/amd64/kops -o kops
$ chmod +x $_

# download kubectl for linux
$ curl -L https://s3.cn-northwest-1.amazonaws.com.cn/kops-file/fileRepository/kubernetes-release/release/$k8s_version/bin/linux/amd64/kubectl -o kubectl
$ chmod +x $_

#
# for Mac OS X Environment
#

# download kops for mac os x
$ curl -L https://s3.cn-northwest-1.amazonaws.com.cn/kops-file/fileRepository/kops/$kops_version/darwin/amd64/kops -o kops
$ chmod +x $_

# download kubectl for mac os x
$ curl -L https://s3.cn-northwest-1.amazonaws.com.cn/kops-bjs/fileRepository/kubernetes-release/release/$k8s_version/bin/darwin/amd64/kubectl -o kubectl
$ chmod +x $_


#
# move kops and kubectl into your $PATH
#
$ sudo mv ./kops /usr/local/bin/
$ sudo mv ./kubectl /usr/local/bin/

#
# confirm again that kops and kubectl are the expected stable versions
#
$ kops version
$ kubectl version

Please Note

If you have previously installed or upgraded the kops client, run kops version again to confirm that the client is the latest stable release; a version mismatch may cause functional problems.
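A quick way to double-check (a minimal sketch, assuming the versions pinned in the table above, kops 1.15.2 and kubectl v1.15.10):

$ kops version | grep -q '1.15.2' && echo "kops OK" || echo "unexpected kops version"
$ kubectl version --client | grep -q 'v1.15.10' && echo "kubectl OK" || echo "unexpected kubectl version"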

  3. Edit the Makefile. You need to set the following variables (they can also be overridden on the make command line, as sketched after the table):

Name                Description                                                               Values
TARGET_REGION       Deploy the cluster in the AWS Beijing or Ningxia region                   cn-north-1 or cn-northwest-1
AWS_PROFILE         Specify a different AWS profile if needed                                 default
KOPS_STATE_STORE    An S3 bucket for kops to store its configuration                          s3://YOUR_S3_BUCKET_NAME
VPCID               The VPC to deploy your cluster into                                       vpc-xxxxxxxx
MASTER_COUNT        Number of master nodes                                                    3 (changing this is not recommended)
MASTER_SIZE         Instance type for master nodes
NODE_SIZE           Instance type for worker nodes
NODE_COUNT          Number of worker nodes
SSH_PUBLIC_KEY      Path to your local SSH public key (or see here to generate a new one)     ~/.ssh/id_rsa.pub [default]
KUBERNETES_VERSION  Kubernetes version (changing this is not recommended)
KOPS_VERSION        kops version (changing this is not recommended)
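Assuming the Makefile follows normal GNU make conventions (command-line assignments override values in the file), the same variables can also be set on the make command line without editing the file, for example:

$ make create-cluster \
    TARGET_REGION=cn-northwest-1 \
    AWS_PROFILE=default \
    KOPS_STATE_STORE=s3://YOUR_S3_BUCKET_NAME \
    VPCID=vpc-xxxxxxxx \
    NODE_COUNT=2 NODE_SIZE=c5.large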
  4. Create the cluster:
make create-cluster
  5. Edit the cluster:
make edit-cluster

Paste the contents of spec-nwcd.yml under the spec section, then save and exit.

  6. Update the cluster:
make update-cluster
  7. Done.

Validation

Cluster creation takes roughly 3-5 minutes. After that, run

kops validate cluster

or

make validate-cluster

to verify that the cluster is in the ready state.

Check the cluster's API endpoint and version information.
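These are standard kubectl commands (an illustrative sketch, not project-specific) for looking at the endpoint and versions:

$ kubectl cluster-info        # API server and service endpoints
$ kubectl get nodes -o wide   # node status, IPs and kubelet version
$ kubectl version             # client and server versions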

Congratulations, you are done!

Finally, you can delete all cluster resources like this:

make delete-cluster

Add-on Installation

FAQ

Which AMI is used by default? Can I use other AMIs, such as CentOS or Amazon Linux 2?

The default AMI is the Debian Linux AMI, which is also the standard upstream kops AMI (see the explanation). However, kops does not officially publish its AMIs to cn-north-1 and cn-northwest-1 (please help upvote this issue). Until the official AMIs are published directly to these two China regions, we manually copy them into the Beijing and Ningxia regions following the steps described here. Please be aware that these AMIs are not published officially by kops; you are also encouraged to follow the linked instructions, look up the latest official Debian AMI here, and copy it into China yourself.

Apart from Debian Linux, the other officially supported AMIs should also work in principle, but there may be some known issues; see #91 and #96.

Cluster validation fails?

See issue #5.

How do I SSH into the master and worker nodes?

See issue #6. Note that on the Debian AMI you log in as the admin user: ssh admin@IP.
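For example, a minimal sketch (the key path and address are placeholders; use the key configured in SSH_PUBLIC_KEY and your node's actual IP):

$ ssh -i ~/.ssh/id_rsa admin@<master-or-node-ip>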

Can I run the master nodes in private subnets? How do I configure that?

See this explanation: #94

The Docker image I need does not exist in ECR.

The images already mirrored into the containerRegistry ECR repository in the AWS Beijing region are listed in required-images-mirrored.txt (see #105). If you need other images during cluster creation, edit required-images.txt; this forks a new branch in your GitHub account, from which you can submit a PR (pull request). Merging your PR triggers CodeBuild to pull the images defined in required-images.txt into the ECR repository. A few minutes later you will see the build badge change from in progress to passing.
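A rough sketch of the workflow (the image name below is purely illustrative; the file lives in the mirror/ directory of the repo):

$ echo "quay.io/coreos/flannel:v0.11.0-amd64" >> mirror/required-images.txt   # hypothetical image
$ git checkout -b mirror-my-image
$ git commit -am "request mirror for flannel v0.11.0"
$ git push origin mirror-my-image
# then open a pull request against nwcdlabs/kops-cn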

Current status:

Purpose of the required-images files

filename                        description
required-images.txt             request new images here via PR; merging triggers the CD pipeline to mirror them
required-images-mirrored.txt    images that have already been mirrored; do not submit PRs against this file
required-images-daily.txt       images that are re-mirrored automatically once a day

How do I find the full ECR path corresponding to an entry in required-images.txt?

See here.

See all FAQs here.

kops-cn's People

Contributors

alyanli, bigdrum, cplo, dean205, eagleye1115, elbertwang, fromthebridge, fsadykov, jansony1, jhaohai, jonkeyguan, missingcharacter, nowfox, pahud, satchinjoshi, seimutig, skrieger82, sunfuze, totorochina, walkley, xmubeta, yizhizoe, yujunz, zhangquanhao


kops-cn's Issues

How to specify c5.large spot instance in Ningxia region?

The C5 instance family, built on the Nitro hypervisor, is available in the Ningxia and Beijing regions and offers an excellent price-performance ratio.

Let's document how to build a kops-cn cluster in Ningxia with C5 Spot Instances and cut costs by up to 60%.

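One possible way to do this (a hedged sketch, not a verified recipe: the instance group name "nodes", the cluster name, and the price ceiling are assumptions; maxPrice is the standard kops InstanceGroup field for Spot bids):

$ export KOPS_STATE_STORE=s3://YOUR_S3_BUCKET_NAME
$ kops edit ig nodes --name cluster.zhy.k8s.local
# in the editor, under spec:, set for example:
#   machineType: c5.large
#   maxPrice: "0.30"
$ kops update cluster --name cluster.zhy.k8s.local --yes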

can't create cluster in China - error fetching https://.../channels/stable

Summary

The cluster can't be created if the machine running the kops client has a poor internet connection to GitHub. You can test the connectivity with:

curl https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable

How to reproduce this issue

  1. Turn on the -v flag in kops create:
kops create cluster \
     -v 9 \
     --cloud=aws \
...
  2. Run create-cluster.sh:

9801a7a9620b:kops-cn hunhsieh $ bash create-cluster.sh
I0129 01:36:33.093275 18727 create_cluster.go:1407] Using SSH public key: /Users/hunhsieh/.ssh/id_rsa.pub
I0129 01:36:33.093795 18727 factory.go:68] state store s3://pahud-kops-state-store-zhy
I0129 01:36:33.342730 18727 s3context.go:194] found bucket in region "cn-northwest-1"
I0129 01:36:33.342805 18727 s3fs.go:220] Reading file "s3://pahud-kops-state-store-zhy/cluster.zhy.k8s.local/config"
I0129 01:36:33.487717 18727 channel.go:97] resolving "stable" against default channel location "https://raw.githubusercontent.com/kubernetes/kops/master/channels/"
I0129 01:36:33.487769 18727 channel.go:102] Loading channel from "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable"
I0129 01:36:33.489467 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:37:03.492838 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:37:03.993923 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:37:33.997853 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:37:35.000502 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:38:05.001673 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:38:07.002652 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:38:37.009644 18727 context.go:227] retrying after error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout
I0129 01:38:41.010692 18727 context.go:159] Performing HTTP request: GET https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable
I0129 01:39:11.012203 18727 context.go:231] hit maximum retries 5 with error error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout

error reading channel "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": error fetching "https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable": Get https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable: dial tcp 151.101.196.133:443: i/o timeout

How to get the ECR repo full path from required-images.txt ?

  1. git clone the repo and make sure the latest required-images.txt and display-remote-repos.sh are in the ./mirror sub-directory.
$ cd ./mirror
$ bash display-remote-repos.sh 

You'll immediately get the full ECR repo paths:

$ bash display-remote-repos.sh 
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kope-dns-controller:1.11.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-dnsmasq-nanny-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-sidecar-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/k8s-dns-kube-dns-amd64:1.14.10
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/cluster-proportional-autoscaler-amd64:1.1.2-r2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/coredns:1.1.3
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/coredns:1.2.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:2.2.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/pause-amd64:3.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-controller-manager:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-scheduler:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-proxy:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-apiserver:v1.11.6
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io-heptio-images-authenticator:v0.3.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/602401143452.dkr.ecr.us-west-2.amazonaws.com-amazon-k8s-cni:v1.3.2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-coreos-flannel:v0.10.0-amd64
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/ottoyiu-k8s-ec2-srcdst:v0.2.0-3-gc0c26eca
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/ottoyiu-k8s-ec2-srcdst:v0.2.2
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/894847497797.dkr.ecr.us-west-2.amazonaws.com-aws-alb-ingress-controller:v1.1.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-node:v3.4.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-cni:v3.4.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-node:v2.6.12
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-cni:v1.11.8
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-kube-controllers:v1.0.5
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-kube-policy-controller:v0.7.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-calico-calico-upgrade:v1.0.5
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/defaultbackend:1.4
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/quay.io-kubernetes-ingress-controller-nginx-ingress-controller:0.20.0
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:3.2.18
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/etcd:3.2.24
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kubernetes-dashboard-amd64:v1.10.1
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxy_init:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxyv2:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/galley:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/pilot:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/mixer:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/kubectl:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/sidecar_injector:master-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxy_init:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/proxyv2:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/galley:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/pilot:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/mixer:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/kubectl:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/sidecar_injector:release-1.0-latest-daily
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io-kubernetes-helm-tiller:v2.12.3

k8s.gcr.io/kube-proxy:v1.11.6 is missing

After upgrading to 1.11.6, the kube-proxy pod failed to come up because this image is not mirrored. (I guess the same is true for many other 1.11.6 images.)

I wonder if we could have an automatic script enumerate all versions of the whitelisted images and mirror them, so we don't need to worry about this any more. A rough sketch of the idea follows.
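Sketch only, not the project's actual mirroring script; the tag list below is hypothetical:

src=k8s.gcr.io/kube-proxy
dst=937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/kube-proxy
for tag in v1.11.5 v1.11.6 v1.11.7; do    # hypothetical whitelist of versions
  docker pull "$src:$tag"
  docker tag  "$src:$tag" "$dst:$tag"
  docker push "$dst:$tag"
done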

Can't pull image for Istio release-1.0-latest-daily

Got ImagePullBackOff errors while installing Istio.

Events:
  Type     Reason                 Age                From                                                  Message
  ----     ------                 ----               ----                                                  -------
  Normal   Scheduled              22m                default-scheduler                                     Successfully assigned istio-citadel-5768b899d4-jg226 to ip-172-31-60-71.cn-north-1.compute.internal
  Normal   SuccessfulMountVolume  22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  MountVolume.SetUp succeeded for volume "istio-citadel-service-account-token-skxg9"
  Warning  Failed                 22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Failed to pull image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily": rpc error: code = Unknown desc = Error response from daemon: manifest for 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily not found
  Warning  Failed                 22m                kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Error: ErrImagePull
  Normal   BackOff                22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Back-off pulling image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily"
  Warning  Failed                 22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  Error: ImagePullBackOff
  Normal   Pulling                22m (x2 over 22m)  kubelet, ip-172-31-60-71.cn-north-1.compute.internal  pulling image "937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/gcr.io/istio-release/citadel:release-1.0-latest-daily"

node NotReady after create and update

When I create, edit, and then update the cluster following the instructions, with 3 masters and 1 node in cn-northwest-1, the node ends up in NotReady status.

If I change the Makefile to set NETWORKING to flannel-vxlan, it works fine (a command-line sketch follows below).

My guess is that because the node has multiple private IPs, the node goes NotReady when the primary IP is not the first one.
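For reference, if the Makefile exposes the NETWORKING variable mentioned above, it can also be overridden on the make command line (standard GNU make behavior; an untested sketch):

$ make create-cluster NETWORKING=flannel-vxlan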

error when: kops edit cluster

kops edit cluster
tried to add:

  assets:
    containerRegistry: 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn
    fileRepository: https://s3.cn-north-1.amazonaws.com.cn/kops-bjs/fileRepository/
  docker:
    logDriver: ""
    registryMirrors:
    - https://registry.docker-cn.com

:wq to save; however, the editor re-opens and I get this error:
error populating cluster spec: error building complete spec: options did not converge after 10 iterations

I tried to ignore it and go on to the final step, kops validate cluster:
unexpected error during validation: error listing nodes: Get https://api-cluster-zhy-k8s-local-qpbf7n-1465482247.cn-northwest-1.elb.amazonaws.com.cn/api/v1/nodes: EOF

Going to the AWS console, I find the three master instances are out of service behind the ELB.
I checked that the security group rules are all correct.
I have already obtained an ICP exception for ports 80/8080/443, and I can telnet to the ELB on 443.

I googled it and found a similar issue: kubernetes/kops#5061

How can I fix it?

HOWTO - create multiple clusters in a VPC

This example shows how to create two kops clusters in the same VPC in the Ningxia region, each with its own cluster name.

cluster name

1st cluster: cluster1.zhy.k8s.local
2nd cluster: cluster2.zhy.k8s.local
(Note: the names must end with k8s.local.)

subnets

Prepare six subnets in one VPC as shown below; in this example each cluster uses three of them.


Makefiles

Prepare two Makefiles, cluster1.mk and cluster2.mk. Sample contents:

https://github.com/nwcdlabs/kops-cn/blob/master/samples/multi-clusters-in-shared-vpc/cluster1.mk
https://github.com/nwcdlabs/kops-cn/blob/master/samples/multi-clusters-in-shared-vpc/cluster2.mk

Create the first cluster

create cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk create-cluster

edit cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk edit-cluster

Paste in the contents of spec.yml.

update cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk update-cluster

validate cluster

Switch the kubectl context to cluster1:

$ kubectl config use-context cluster1.zhy.k8s.local

validate cluster1

$ AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster1.zhy.k8s.local \
make -f cluster1.mk validate-cluster


Create the second cluster

create cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk create-cluster

edit cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk edit-cluster

Paste in the contents of spec.yml.

update cluster

AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk update-cluster

validate cluster

Switch the kubectl context to cluster2:

$ kubectl config use-context cluster2.zhy.k8s.local

validate cluster2

$ AWS_PROFILE=cn CUSTOM_CLUSTER_NAME=cluster2.zhy.k8s.local \
make -f cluster2.mk validate-cluster


get po

Both clusters list all the Pods in kube-system, and all of them are Running.

error parsing SSH public key: ssh: no key found

I ran into the following issue when creating the cluster:
I0416 09:48:45.708089 3672 create_cluster.go:1407] Using SSH public key: /home/ec2-user/.ssh/id_rsa.pub

error reading cluster configuration "cluster.zhy.k8s.local": error reading s3://liuhongxi-kops-cn/cluster.zhy.k8s.local/config: Unable to list AWS regions: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: failed to find credentials in the environment.
SharedCredsLoad: failed to load profile, default.
EC2RoleRequestError: no EC2 instance role found
caused by: EC2MetadataError: failed to make EC2Metadata request
caused by:

<title>404 - Not Found</title>

404 - Not Found

make: *** [create-cluster] Error 1
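The NoCredentialProviders error above means the kops client could not find any AWS credentials. A minimal sketch of one way to provide them (the profile name "default" is an assumption):

$ aws configure --profile default     # enter access key, secret key and region
$ export AWS_PROFILE=default
$ aws sts get-caller-identity         # should print your account and ARN, not an error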

Primary interface IP is unreachable, causing out-of-service status behind the ELB

I am using a private topology and an internal ELB to create a K8s cluster.

# other parameters omitted
kops create cluster \
  --topology=private \
  --networking=amazon-vpc-routed-eni \
  --api-loadbalancer-type=internal

After the cluster started, each node was assigned two interfaces. However, I found only one node in service behind the ELB.

From the working node, the ip route table is:


core@ip-172-20-53-190 ~ $ ip route
default via 172.20.32.1 dev eth0 proto dhcp src 172.20.53.190 metric 1024
default via 172.20.32.1 dev eth1 proto dhcp src 172.20.61.156 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.20.32.0/19 dev eth0 proto kernel scope link src 172.20.53.190
172.20.32.1 dev eth0 proto dhcp scope link src 172.20.53.190 metric 1024
172.20.32.1 dev eth1 proto dhcp scope link src 172.20.61.156 metric 1024

You can see that the eth0 route is on top, which explains why the eth0 IP is reachable.

From the other two defunct nodes, the ip route output looks like:

core@ip-172-20-114-248 ~ $ ip route
default via 172.20.96.1 dev eth1 proto dhcp src 172.20.98.243 metric 1024
default via 172.20.96.1 dev eth0 proto dhcp src 172.20.114.248 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.20.96.0/19 dev eth1 proto kernel scope link src 172.20.98.243
172.20.96.0/19 dev eth0 proto kernel scope link src 172.20.114.248
172.20.96.1 dev eth1 proto dhcp scope link src 172.20.98.243 metric 1024
172.20.96.1 dev eth0 proto dhcp scope link src 172.20.114.248 metric 1024
core@ip-172-20-114-248 ~ $

The eth1 entry is on top, so only the eth1 IP is reachable.

I am not sure whether something is wrong with my setup. Hope someone can help. Thank you.

--Beta

faster mirror improvement

  1. Compare the image digest before docker push, e.g. with a helper function like this:

# returns 0 (success) when the given digest is already tagged in the ECR repo
image_already_mirrored() {
  local repo="$1" digest="$2" tag="$3"
  local res
  res=$(aws --profile bjs --region "$ECR_REGION" ecr describe-images --repository-name "$repo" \
    --query "imageDetails[?(@.imageDigest=='$digest')].contains(@.imageTags, '$tag') | [0]")

  if [ "$res" == "true" ]; then
    return 0
  else
    return 1
  fi
}

If this returns 0 we don't have to push the image to ECR, since it already exists there.
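A usage sketch built on the helper above (the surrounding variables such as $ECR_REPO are hypothetical and depend on the actual mirror script):

if image_already_mirrored "$repo" "$digest" "$tag"; then
  echo "already mirrored, skipping push"
else
  docker push "$ECR_REPO/$repo:$tag"
fi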

Create your cluster with existing subnets

Customers need to deploy their cluster into existing subnets, but the official kops page is not very clear about this, as you can see here:
https://github.com/kubernetes/kops/blob/master/docs/run_in_existing_vpc.md#shared-subnets

Specifically, these parts:

export SUBNET_ID=subnet-12345678 # replace with your subnet id
export SUBNET_CIDR=10.100.0.0/24 # replace with your subnet CIDR
export SUBNET_IDS=$SUBNET_IDS # replace with your comma separated subnet ids

What you really need to do is modify the create-cluster target in the Makefile as shown below, according to your subnets and zones, and leave the rest of the Makefile unchanged (only the relevant parts are shown):

.PHONY: create-cluster
create-cluster:  
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops create cluster \
     --cloud=aws \
     --name=$(CLUSTER_NAME) \
     --image=$(AMI) \
     --master-count=$(MASTER_COUNT) \
     --master-size=$(MASTER_SIZE) \
     --node-count=$(NODE_COUNT) \
     --node-size=$(NODE_SIZE)  \
     --vpc=$(VPCID) \
     --kubernetes-version=$(KUBERNETES_VERSION_URI) \
     --networking=amazon-vpc-routed-eni \
     --ssh-public-key=$(SSH_PUBLIC_KEY) \
     --zones=cn-northwest-1a,cn-northwest-1b \
     --subnets=subnet-2cf25a45,subnet-9315d7e8

1. Delete the original --zones option.
2. Then add the last two lines shown above.
3. The order of your subnets must match the order of your zones (a quick check is sketched below).
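As mentioned in point 3, you can sanity-check the subnet-to-zone mapping before creating the cluster (a sketch using the standard AWS CLI; the subnet IDs are the ones from the example above):

$ aws ec2 describe-subnets \
    --subnet-ids subnet-2cf25a45 subnet-9315d7e8 \
    --query 'Subnets[].{Id:SubnetId,AZ:AvailabilityZone}' \
    --output table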

amazon-vpc-routed-eni as default networking for cluster creation

https://github.com/nwcdlabs/kops-cn/blob/448b5fc45d47d525c9db6c6e54dfbb19a34b1c73/create-cluster.sh#L7-L18

As the AWS ALB Ingress Controller is now at v1.0.0 and its ip target mode requires the AWS VPC CNI, we should make amazon-vpc-routed-eni the default networking mode.

--networking amazon-vpc-routed-eni

The new creation script would look like this:

 kops create cluster \ 
      --cloud=aws \ 
      --name=$cluster_name \ 
      --image=$ami \ 
      --zones=$zones \ 
      --master-count=$master_count \ 
      --master-size=$master_size \ 
      --node-count=$node_count \ 
      --node-size=$node_size  \ 
      --vpc=$vpcid \ 
      --networking amazon-vpc-routed-eni \
      --kubernetes-version="$kubernetesVersion" \ 
      --ssh-public-key=$ssh_public_key 

Running create-cluster.sh fails

I ran the installation on my Mac.
After configuring env.config and running create-cluster.sh, I got the error message below:

I0130 10:08:37.058539 5466 create_cluster.go:1407] Using SSH public key: /Users/wangqi/.ssh/id_rsa
I0130 10:08:38.840289 5466 subnets.go:184] Assigned CIDR 172.0.32.0/19 to subnet cn-northwest-1a
I0130 10:08:38.840325 5466 subnets.go:184] Assigned CIDR 172.0.64.0/19 to subnet cn-northwest-1b
I0130 10:08:38.840362 5466 subnets.go:184] Assigned CIDR 172.0.96.0/19 to subnet cn-northwest-1c

error determining default DNS zone: error querying zones: RequestError: send request failed
caused by: Get https://route53.cn-northwest-1.amazonaws.com.cn/2013-04-01/hostedzone: dial tcp: lookup route53.cn-northwest-1.amazonaws.com.cn on 172.20.53.163:53: no such host
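The error above is a local DNS lookup failure for the Route 53 endpoint. A quick check from the machine running kops (plain diagnostic commands, nothing project-specific):

$ nslookup route53.cn-northwest-1.amazonaws.com.cn
$ curl -sI https://route53.cn-northwest-1.amazonaws.com.cn/ | head -n 1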

Unable to bring up k8s-ec2-srcdst deployment on CentOS 7

Because the CA certificate is located at a different path on CentOS, the k8s-ec2-srcdst container fails to start: it cannot find the certificate. This is mentioned in kubernetes/kops#4331.

Default certificate path: /etc/ssl/certs/ca-certificates.crt
Centos path: /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt

One workaround is to run the 'kubectl patch' command mentioned in the issue (a sketch is included below). Another thought I have is to change the kops source code and recompile it.

Any other good advice is welcome, thank you.
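For reference, one possible shape of the kubectl patch workaround (an untested sketch; the deployment name, container index, and mount paths are assumptions based on this issue, and the exact patch in kubernetes/kops#4331 may differ):

kubectl -n kube-system patch deployment k8s-ec2-srcdst --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/volumes", "value": [
    {"name": "ca-bundle", "hostPath": {"path": "/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt", "type": "File"}}
  ]},
  {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts", "value": [
    {"name": "ca-bundle", "mountPath": "/etc/ssl/certs/ca-certificates.crt", "readOnly": true}
  ]}
]'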

helm installation requires dependency update

According to the official Helm/Istio documentation, helm repo add and helm dep update are required before helm install:

Add istio.io chart repository and point to the daily release:

$ helm repo add istio.io https://storage.googleapis.com/istio-prerelease/daily-build/master-latest-daily/charts
Build the Helm dependencies:

$ helm dep update install/kubernetes/helm/istio
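Then run the install itself; a sketch in Helm 2 syntax (matching the era of this issue; the chart path and release name follow the Istio docs quoted above and may differ for your Istio version):

$ helm install install/kubernetes/helm/istio --name istio --namespace istio-system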

etcd3 as the default cluster

According to the kops etcd roadmap document
https://github.com/kubernetes/kops/blob/af4df08b694e2a1f8814a7b3649060477be67c86/docs/etcd/roadmap.md

etcd3 will eventually become the default etcd version; however, for now kops still defaults to v2.2.

We have a PR trying to get this sorted out, but for the moment, to stay aligned with upstream kops, we stick with v2.2.

If you prefer to provision etcd3 for your cluster, you can update spec.yml like this:
https://github.com/nwcdlabs/kops-cn/pull/29/files#diff-ce22796966d5547919fe1967f7781563

Hi @jansony1 , feel free to update this issue if you have any other useful insights.

Thanks.

How to use a customized nodeup binary?

We need to use a customized nodeup for our cluster. When I try to override the default URL with

export NODEUP_URL='https://s3-us-west-2.amazonaws.com/my-bucket/nodeup/linux/amd64/01/23/18/1516747024/nodeup'

It seems to be hijacked by the fileRepository setting.

I1214 16:28:21.474161   94429 builder.go:297] error reading hash file "https://s3.cn-north-1.amazonaws.com.cn/kops-bjs/fileRepository/my-bucket/nodeup/linux/amd64/01/23/18/1516747024/nodeup.sha1": file does not exist

you may have not staged your files correctly, please execute kops update cluster using the assets phase

Any suggestions?
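One direction to try, based purely on the hint in the log above (an untested sketch, not a confirmed fix): run the assets phase explicitly so kops stages the file referenced by NODEUP_URL, then apply the rest.

$ export NODEUP_URL='https://s3-us-west-2.amazonaws.com/my-bucket/nodeup/linux/amd64/01/23/18/1516747024/nodeup'
$ kops update cluster --name "$CLUSTER_NAME" --phase assets
$ kops update cluster --name "$CLUSTER_NAME" --yes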

Can't pull heptio-images-authenticator

Can't pull 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/heptio-images-authenticator:v0.3.0 when running aws-iam-authenticator.

Containers:
  aws-iam-authenticator:
    Container ID:
    Image:         937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/heptio-images-authenticator:v0.3.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Args:
      server
      --config=/etc/aws-iam-authenticator/config.yaml
      --state-dir=/var/aws-iam-authenticator
      --generate-kubeconfig=/etc/kubernetes/aws-iam-authenticator/kubeconfig.yaml

I can see the image listed in https://github.com/nwcdlabs/kops-cn/blob/master/mirror/required-images.txt#L32.

Add more scripts to the Makefile

Some customers may make a mistake in the "make edit-cluster" step, or may need to update their cluster, so they need a rolling-update option. Also, if they choose to drive their kops operations through the Makefile, we had better list all the operations there so they don't need to maintain a separate set of environment variables.

Here are two targets my customers have asked for, added here:

.PHONY: rolling-cluster
rolling-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops rolling-update cluster --name $(CLUSTER_NAME) --yes --cloudonly

.PHONY: get-cluster
get-cluster:
	@KOPS_STATE_STORE=$(KOPS_STATE_STORE) \
	AWS_PROFILE=$(AWS_PROFILE) \
	AWS_REGION=$(AWS_REGION) \
	AWS_DEFAULT_REGION=$(AWS_DEFAULT_REGION) \
	kops get cluster --name $(CLUSTER_NAME)
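Usage would then be the same as the existing targets (assuming the same variables are set in the Makefile):

$ make rolling-cluster   # rolling-update every instance group (--cloudonly)
$ make get-cluster       # show the cluster this Makefile manages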

docker repository for Amazon VPC CNI

Symptom
When deploying kops with the Amazon VPC CNI (--networking=amazon-vpc-routed-eni), the aws-node DaemonSet fails with ImagePullBackOff.

Root cause
The generated image URL for aws-node is invalid:
937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn/602401143452.dkr.ecr.us-west-2.amazonaws.com-amazon-k8s-cni:1.0.0

In the YAML template, the image URL is generated from the parameter "Networking.AmazonVPC.ImageName", falling back to the default image URL in the us-west-2 ECR.

It works after changing the aws-node image URL to "pahud/amazon-k8s-cni:1.0.0".

Suggested Solution

  1. Specify "Networking.AmazonVPC.ImageName" in kops edit as follows:

     networking:
       amazonvpc:
         imageName: amazon-k8s-cni:1.0.0

  2. Add the image amazon-k8s-cni:1.0.0 to the Docker registry 937788672844.dkr.ecr.cn-north-1.amazonaws.com.cn

change the default AMI from CoreOS to Amazon Linux 2 LTS

Background

to make sure the OS is more compatible with other components such as

  1. AWS VPC CNI
  2. AWS ALB Ingress

and to eliminate potential maintenance complexity in the future.

TODO

  • make sure the latest AMI in cn-north-1 and cn-northwest-1 is compatible with the latest stable kops
  • make sure the latest AWS VPC CNI (1.3) is compatible
  • kubernetes/kops#6341 needs to be identified and fixed
  • CoreDNS as the replacement for kube-dns
  • make sure AWS ALB Ingress is compatible
  • update env.config and set Amazon Linux 2 LTS as the default AMI
