vmware / container-service-extension
Container Service for VMware vCloud Director
Home Page: https://vmware.github.io/container-service-extension
License: Other
Attempting to refresh the Ubuntu-16.04 template, I'm encountering the following (reproduced on 2 different CSE installs, so reasonably sure this is a scripting error):
Command: cse install --config config.yaml --template ubuntu-16.04 --update --amqp skip --ext skip
During this block (where the 'waiting for process ... to finish (x)' count increments on each repeat):
:::
Customizing vApp template 'ubuntu1604-temp'
Attempt #1
waiting for guest tools, status: vm='vim.VirtualMachine:vm-168', status=guestToolsNotRunning
waiting for guest tools, status: vm='vim.VirtualMachine:vm-168', status=guestToolsRunning
waiting for process 1299 on vm 'vim.VirtualMachine:vm-168' to finish (x)
:::
Get to a point where the following error is displayed:
exception, will retry in a few seconds, vm 'vim.VirtualMachine:vm-168'
exception: (vim.fault.GuestOperationsUnavailable) {
dynamicType = <unset>,
dynamicProperty = (vmodl.DynamicProperty) [],
msg = 'The guest operations agent could not be contacted.',
faultCause = <unset>,
faultMessage = (vmodl.LocalizableMessage) []
}
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vsphere_guest_run/vsphere.py", line 86, in execute_program_in_guest
processes = pm.ListProcessesInGuest(vm, creds, [pid])
File "/usr/lib/python3.6/site-packages/pyVmomi/VmomiSupport.py", line 580, in <lambda>
self.f(*(self.args + (obj,) + args), **kwargs)
File "/usr/lib/python3.6/site-packages/pyVmomi/VmomiSupport.py", line 386, in _InvokeMethod
return self._stub.InvokeMethod(self, info, args)
File "/usr/lib/python3.6/site-packages/pyVmomi/SoapAdapter.py", line 1366, in InvokeMethod
raise obj # pylint: disable-msg=E0702
pyVmomi.VmomiSupport.vim.fault.GuestOperationsUnavailable: (vim.fault.GuestOperationsUnavailable) {
dynamicType = <unset>,
dynamicProperty = (vmodl.DynamicProperty) [],
msg = 'The guest operations agent could not be contacted.',
faultCause = <unset>,
faultMessage = (vmodl.LocalizableMessage) []
}
This repeats every 5 seconds or so. If left alone, the customization fails completely after 5 attempts.
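The guest operations agent stays unreachable until open-vm-tools is repaired, so retrying on its own can only postpone the failure. The pattern in the log is a fixed pause between attempts plus a cap on the number of attempts. It looks roughly like the minimal Python sketch below. This is not vsphere_guest_run's or CSE's actual code. `GuestOperationsUnavailable` is a hypothetical stand-in for pyVmomi's fault class, and the 5 attempts / 5 second defaults simply mirror the numbers reported here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class GuestOperationsUnavailable(Exception):
    """Hypothetical stand-in for pyVmomi's vim.fault.GuestOperationsUnavailable."""


def retry_guest_operation(op: Callable[[], T], attempts: int = 5,
                          delay: float = 5.0) -> T:
    """Call op(), pausing `delay` seconds after each GuestOperationsUnavailable.

    Hypothetical helper: gives up with RuntimeError after `attempts` failures.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except GuestOperationsUnavailable as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # fixed pause, as in the log above
    raise RuntimeError(f"guest operation failed after {attempts} attempts") from last_error
```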
Logging in to the template VM and forcing dpkg reconfigure and reinstall of open-vm-tools allows the customization to complete successfully:
root@ubuntu1604-temp:~# apt install open-vm-tools
E: dpkg was interrupted, you must manually run 'dpkg --configure -a' to correct the problem.
root@ubuntu1604-temp:~# dpkg --configure -a
Setting up python-apt-common (1.1.0~beta1ubuntu0.16.04.1) ...
Setting up cloud-guest-utils (0.27-0ubuntu25) ...
Setting up libapt-inst2.0:amd64 (1.2.25) ...
:::
Processing triggers for initramfs-tools (0.122ubuntu8.10) ...
update-initramfs: Generating /boot/initrd.img-4.4.0-116-generic
W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.
Processing triggers for shim-signed (1.33.1~16.04.1+13-0ubuntu2) ...
No DKMS packages installed: not changing Secure Boot validation state.
Errors were encountered while processing:
open-vm-tools
root@ubuntu1604-temp:~# apt install open-vm-tools
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
open-vm-tools-desktop
The following packages will be upgraded:
open-vm-tools
1 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.
1 not fully installed or removed.
Need to get 0 B/432 kB of archives.
After this operation, 0 B of additional disk space will be used.
(Reading database ... 82065 files and directories currently installed.)
Preparing to unpack .../open-vm-tools_2%3a10.0.7-3227872-5ubuntu1~16.04.2_amd64.deb ...
Unpacking open-vm-tools (2:10.0.7-3227872-5ubuntu1~16.04.2) over (2:10.0.7-3227872-5ubuntu1~16.04.1) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
Processing triggers for systemd (229-4ubuntu21.1) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up open-vm-tools (2:10.0.7-3227872-5ubuntu1~16.04.2) ...
Configuration file '/etc/vmware-tools/tools.conf'
==> Modified (by you or by a script) since installation.
==> Package distributor has shipped an updated version.
What would you like to do about it ? Your options are:
Y or I : install the package maintainer's version
N or O : keep your currently-installed version
D : show the differences between the versions
Z : start a shell to examine the situation
The default action is to keep your current version.
*** tools.conf (Y/I/N/O/D/Z) [default=N] ? Y
Installing new version of config file /etc/vmware-tools/tools.conf ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
Once the next customization attempt starts after fixing open-vm-tools the customization completes successfully and the template VM is powered off and imported successfully as usual.
It looks like something is causing a reboot during the package installation which is halting/corrupting the process?
majorErrorCode="504" message="External service 'cse' failed to respond in the specified timeout (10 SECONDS)" minorErrorCode="GATEWAY_TIMEOUT"/>
Cluster deploys successfully and can be seen in cse:
$ vcd cse cluster list
IP master      VMs  name    template   vdc
192.168.0.108  4    test01  photon-v2  My Demo VDC
But attempting to view node details in cse gives an error:
$ vcd cse node list test01
Usage: vcd cse node list [OPTIONS] NAME
Error: no such child: {http://www.vmware.com/vcloud/v1.5}VmSpecSection
This works perfectly in my dev/test cluster, but not in a production cluster, so I'm keen to work out what difference between the environments is causing this. Kubernetes and the cluster seem fine: I can deploy containers successfully, etc.
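The error text matches the message lxml.objectify raises when code reads a child element that is not present (AttributeError: no such child: ...). vcd-cli builds on pyvcloud, which parses vCD responses with lxml.objectify. That is consistent with, but does not prove, the production vCD returning VM XML without a VmSpecSection element. Below is a minimal sketch of a tolerant lookup using the standard library's ElementTree, where `find()` returns None instead of raising. `find_vm_spec_section` is a hypothetical helper, not part of vcd-cli.

```python
import xml.etree.ElementTree as ET
from typing import Optional

VCLOUD_NS = "http://www.vmware.com/vcloud/v1.5"


def find_vm_spec_section(vm_xml: str) -> Optional[ET.Element]:
    """Hypothetical helper: return the Vm's VmSpecSection, or None if absent."""
    vm = ET.fromstring(vm_xml)
    # find() yields None for a missing child instead of raising,
    # unlike attribute access on an lxml.objectify element.
    return vm.find(f"{{{VCLOUD_NS}}}VmSpecSection")
```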
When you run cse run config.yaml following the instructions, you get a directory not found error.
root@cse [ ~ ]# cse run config.yaml
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Connected to AMQP server (rmq.corp.local:5672): success
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Connected to vCloud Director as system administrator (vcloud.corp.local:443): success
Find catalog 'cse': success
Find master template 'k8s-photon.ova': success
Find node template 'k8s-photon.ova': success
Connected to vCenter Server as [email protected] (vcenter.corp.local:443): success
Container Service Extension for vCloud Director running
see file cse.log for details
press Ctrl+C to finish
PV provisioner thread started, cse_msg_dir: /tmp/cse
directory '/tmp/cse/req' not found, PV provisioner stopped
This step should probably be added to the documentation, to be run before starting cse (the path comes from the cse_msg_dir key in config.yaml):
mkdir -p /tmp/cse/{req,res}
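The same step can be written as a small Python helper, for example for a pre-start hook. The sketch assumes the caller already has the cse_msg_dir value from config.yaml. Parsing the YAML is left out to stay standard-library only, and `ensure_cse_msg_dirs` is a hypothetical name.

```python
import os
from typing import List


def ensure_cse_msg_dirs(cse_msg_dir: str) -> List[str]:
    """Hypothetical helper: create <cse_msg_dir>/req and /res, like mkdir -p."""
    paths = []
    for sub in ("req", "res"):
        path = os.path.join(cse_msg_dir, sub)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths
```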
When attempting to update our Ubuntu-16.04 templates for CSE 0.4.2, the 'Customization' process from config.py fails every time at the same point (log attached):
ubuntu-16.04-failed-customization.txt
It appears to be an issue with the (redirected) https request from https://cloud.weave.works/ resulting in a 400 Bad Request error.
Have confirmed that the box running CSE (and the network where the template is being prepared) can both see/access the https://cloud.weave.works/ site.
The lines in config.py that appear to be causing this are below (specifically the 2nd line):
export kubever=$(kubectl version --client | base64 | tr -d '\n')
wget -O weave.yml "https://cloud.weave.works/k8s/net?k8s-version=$kubever&version=2.1.3"
curl -L git.io/weave -o /usr/local/bin/weave
chmod a+x /usr/local/bin/weave
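Here is what the first two lines compute, sketched in Python. The client version string is base64-encoded (the base64 CLI wraps its output at 76 characters, which `tr -d '\n'` undoes) and passed as the k8s-version query parameter. One difference is worth noting: the shell line passes characters such as +, / and = through unescaped, while `urlencode` below percent-escapes them. Whether that plays any part in the 400 response is an assumption to test, not a confirmed cause. `weave_manifest_url` is a hypothetical helper.

```python
import base64
from urllib.parse import urlencode


def weave_manifest_url(kubectl_version_output: str, weave_version: str = "2.1.3") -> str:
    """Hypothetical helper mirroring the kubever/wget lines above."""
    # Equivalent of: kubectl version --client | base64 | tr -d '\n'
    kubever = base64.b64encode(kubectl_version_output.encode()).decode()
    # urlencode percent-escapes '+', '/' and '=' in the encoded value.
    query = urlencode({"k8s-version": kubever, "version": weave_version})
    return "https://cloud.weave.works/k8s/net?" + query
```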
It should include all the steps, including the AMQP settings.
Just updated templates & CSE to 1.0.0 release and recomposed photon-v2 template. Template preparation / publish to catalog process completes without errors.
Attempting to deploy a new cluster based on photon-v2 template gives:
vcd cse cluster create mycluster --nodes 3 --template photon-v2 --cpu 2 --memory 4096 --ssh-key .ssh\id_pub --network "mynetwork" --storage-profile "mystorage"
create_cluster: Creating cluster mycluster(592ea858-1fc9-46eb-b7f4-c781e0ea5727)
create_cluster: Creating cluster vApp mycluster(592ea858-1fc9-46eb-b7f4-c781e0ea5727)
create_cluster: Creating master node for mycluster(592ea858-1fc9-46eb-b7f4-c781e0ea5727)
create_cluster: Initializing cluster mycluster(592ea858-1fc9-46eb-b7f4-c781e0ea5727)
task: 8eea1b16-82c4-437b-9482-37dc0511295f, result: error, message: Couldn't initialize cluster:
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.06.0-ce. Max validated version: 17.03
error: error validating "/root/weave.yml": error validating data: ValidationError(DaemonSet.spec.template.spec): unknown field "minReadySeconds" in io.k8s.api.core.v1.PodSpec; if you choose to ignore these errors, turn validation off with --validate=false
Appears to be a mismatch in the latest template provisioning between weave & Docker versions (?)
Have left this cluster in current state (with master node only deployed) if needed for further troubleshooting.
Hi @pacogomez,
I've just witnessed this so logging it.
I basically left my CSE environment running for a while. Since some environments might have these settings enabled and configured, it would probably be worth adding this check.
In vCloud Director you can set up lease settings for runtime and storage.
So after 7 days, for example, VMs whose lease hasn't been renewed by the user are powered off automatically by vCloud Director.
Yet from the vcd-cli cluster list, everything "seems" fine.
Maybe adding a "Status" column on the right side would be useful, to confirm that all VMs and connectivity are working as expected?
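Below is a minimal sketch of how such a Status column could be derived from the VM states vCD already reports. POWERED_ON and POWERED_OFF are status values vCD uses; POWERED_ON appears in the vcd vapp list output elsewhere in this thread. The record shape and the OK/DEGRADED/UNKNOWN labels are hypothetical, and a real check would also need to probe Kubernetes connectivity.

```python
from typing import Iterable, Mapping


def cluster_status(vms: Iterable[Mapping[str, str]]) -> str:
    """Hypothetical Status column: OK only if every VM reports POWERED_ON."""
    records = list(vms)
    if not records:
        return "UNKNOWN"  # no VMs found for this cluster
    stopped = sorted(vm["name"] for vm in records if vm.get("status") != "POWERED_ON")
    if not stopped:
        return "OK"
    return "DEGRADED: " + ", ".join(stopped) + " not powered on"
```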
in addition to DHCP
Hi @pacogomez,
We have deployed around 15 different clusters since yesterday and often we do not get all the nodes added to the kubernetes cluster. (this happened a lot)
Example:
vcd cluster create c3-more-awesome --network DEMO-Network --nodes 3
This deploys correctly, we have 1 master & 3 nodes available in vCD.
root@cse [ ~ ]# vcd vapp list
isDeployed  isEnabled  memoryAllocationMB  name                numberOfCpus  numberOfVMs  ownerName  status      storageKB
----------  ---------  ------------------  ------------------  ------------  -----------  ---------  ----------  ---------
true        true       2048                c3-more-awesome-m1  2             1            admin      POWERED_ON  36864000
true        true       2048                c3-more-awesome-n1  2             1            admin      POWERED_ON  36864000
true        true       2048                c3-more-awesome-n2  2             1            admin      POWERED_ON  36864000
true        true       2048                c3-more-awesome-n3  2             1            admin      POWERED_ON  36864000
But Kubernetes only shows two of the three worker nodes (c3-more-awesome-n3 is missing):
❯ kubectl get nodes
NAME                STATUS  ROLES   AGE  VERSION
c3-more-awesome-m1  Ready   master  2h   v1.8.1+f38e43b221d08
c3-more-awesome-n1  Ready   <none>  2h   v1.8.1+f38e43b221d08
c3-more-awesome-n2  Ready   <none>  2h   v1.8.1+f38e43b221d08
tsugliani at retina15 in ~
❯
It would be great to add the following feature to vcd-cli:
vcd cse cluster validate $cluster-name
to verify/validate the compliance of the resulting deployment.
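The core of such a validate command could be a set comparison between the VMs vCD reports and the nodes Kubernetes has registered. The minimal sketch below uses `compare_cluster` as a hypothetical helper and takes the names from the output above. Gathering the two lists (from the vCD API and from kubectl get nodes) is left out.

```python
from typing import Dict, Iterable, List


def compare_cluster(vcd_vm_names: Iterable[str],
                    k8s_node_names: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: cross-check vCD VMs against registered Kubernetes nodes."""
    vms, nodes = set(vcd_vm_names), set(k8s_node_names)
    return {
        "missing_from_k8s": sorted(vms - nodes),  # VM exists but never joined
        "unknown_to_vcd": sorted(nodes - vms),    # node registered without a matching VM
    }


if __name__ == "__main__":
    vms = ["c3-more-awesome-m1", "c3-more-awesome-n1",
           "c3-more-awesome-n2", "c3-more-awesome-n3"]
    nodes = ["c3-more-awesome-m1", "c3-more-awesome-n1", "c3-more-awesome-n2"]
    print(compare_cluster(vms, nodes))
    # {'missing_from_k8s': ['c3-more-awesome-n3'], 'unknown_to_vcd': []}
```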
Hi @pacogomez,
The documentation page could use some restructuring, especially the "Appendix" section at the end, which you need at the beginning (it feels odd to scroll down to fetch that information and then go back to the top to continue the process).
I would actually put it at the top of this page, or create another page with all the prerequisites, linked from the beginning.
Also, for the Photon OS 2.0 GA setup, I had to add python3-devel for pip3 install container-service-extension to work (it was missing the Python headers needed to compile pycrypto).
tdnf install -y build-essential python3-setuptools python3-tools python3-pip iputils
I added iputils for convenience, so people can quickly use ping to test connectivity and check that DNS records are configured correctly.
I'll add another issue on how to create a working RabbitMQ instance on top of Photon OS 2.0 later on.
This should send the server-side version.
Encountering 2 errors when using cse-install with the Ubuntu 16.04 template - both apparently being caused in /scripts/cust-ubuntu-16.04.sh:
The growpart /dev/sda 1 followed by resize2fs /dev/sda1 gives an error (unable to grow), since /dev/sda1 is already occupying the full (10GB) disk space in the ubuntu-temp VM. Not sure if this is due to a previous disk resize failing in my environment or not. Due to the error the script exits and no further customisation is performed.
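One option is to make the resize step conditional, skipping growpart and resize2fs when nothing follows the partition. Below is a minimal Python sketch of that check. It assumes Linux sysfs, where size and start are reported in 512-byte sectors, and sdX-style device names. The function name and the 2048-sector (1 MiB) threshold are hypothetical choices.

```python
import os


def partition_has_room_to_grow(disk: str = "sda", part_num: int = 1,
                               sysfs_root: str = "/sys/block",
                               min_free_sectors: int = 2048) -> bool:
    """Hypothetical guard: True if unallocated space follows the partition.

    Assumes sdX-style naming (sda -> sda1); NVMe devices name partitions differently.
    """
    disk_dir = os.path.join(sysfs_root, disk)
    part_dir = os.path.join(disk_dir, f"{disk}{part_num}")

    def read_sectors(path: str) -> int:
        with open(path) as f:
            return int(f.read().strip())  # values are 512-byte sectors

    disk_size = read_sectors(os.path.join(disk_dir, "size"))
    part_start = read_sectors(os.path.join(part_dir, "start"))
    part_size = read_sectors(os.path.join(part_dir, "size"))
    free_after = disk_size - (part_start + part_size)
    return free_after >= min_free_sectors
```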
Ubuntu has a strange habit of attempting to connect to archive.ubuntu.com and security.ubuntu.com over IPv6 (even when the primary NIC doesn't have IPv6 configured). This results in the 'apt-get' lines stalling while waiting for an IPv6 response on networks where it isn't configured.
Adding the following near the top of cust-ubuntu-16.04.sh disables this behaviour and allows updates to be retrieved/processed correctly:
echo 'net.ipv6.conf.all.disable_ipv6 = 1' >> /etc/sysctl.conf
echo 'net.ipv6.conf.default.disable_ipv6 = 1' >> /etc/sysctl.conf
echo 'net.ipv6.conf.lo.disable_ipv6 = 1' >> /etc/sysctl.conf
systemctl restart networking.service
(Obviously, if this is done before the existing DNS-related systemctl restart networking.service, the restart only needs to be done once.)
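Appending with >> adds duplicate lines every time the script runs. Below is a minimal Python sketch of an idempotent version that only appends missing lines. The helper name is hypothetical and it matches whole lines exactly. The settings still have to be loaded afterwards, for example with sysctl -p.

```python
from typing import Iterable

IPV6_DISABLE_LINES = (
    "net.ipv6.conf.all.disable_ipv6 = 1",
    "net.ipv6.conf.default.disable_ipv6 = 1",
    "net.ipv6.conf.lo.disable_ipv6 = 1",
)


def ensure_sysctl_lines(path: str, lines: Iterable[str] = IPV6_DISABLE_LINES) -> int:
    """Hypothetical helper: append each line unless already present; return count added.

    Exact-line match only: 'key=1' written without spaces would not be detected.
    """
    try:
        with open(path) as f:
            content = f.read()
    except FileNotFoundError:
        content = ""
    existing = {line.strip() for line in content.splitlines()}
    to_add = [line for line in lines if line not in existing]
    if to_add:
        with open(path, "a") as f:
            if content and not content.endswith("\n"):
                f.write("\n")  # don't glue onto an unterminated last line
            f.write("\n".join(to_add) + "\n")
    return len(to_add)
```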
Finally, the Ubuntu template appears to use the 'Flexible' network adapter instead of VMXNET3. I have tried changing this in the .ova prior to template customisation and it appears to work fine; not sure whether this can be built into the script processing or not?
Hi @pacogomez,
I think one of the most important issues to fix early on is to change the vcd cluster namespace, and probably use cse or kubernetes instead (I think cse would be better, as it is the project name).
In vCloud Director, a "cluster" often refers to the vCloud instance itself, a collection of cells designed for scale and availability, which has nothing to do with Kubernetes or the container-service-extension (cse) project.
Some people might also confuse it with vSphere cluster constructs.
To avoid any confusion, I truly believe this is the change that would make the most sense.
For example
tsugliani at retina15 in ~
❯ vcd cluster (621ms) 17:13:15
Usage: vcd cluster [OPTIONS] COMMAND [ARGS]...
Work with kubernetes clusters in vCloud Director.
Examples
vcd cluster list
Get list of kubernetes clusters in current virtual datacenter.
vcd cluster create k8s-cluster --nodes 2
Create a kubernetes cluster in current virtual datacenter.
vcd cluster delete k8s-cluster
Deletes a kubernetes cluster by id.
Options:
-h, --help Show this message and exit.
Commands:
config get cluster config
create create cluster
delete delete cluster
list list clusters
Using the cse namespace:
tsugliani at retina15 in ~
❯ vcd cse (621ms) 17:13:15
Usage: vcd cse [OPTIONS] COMMAND [ARGS]...
Work with kubernetes clusters in vCloud Director.
Examples
vcd cse list
Get list of kubernetes clusters in current virtual datacenter.
vcd cse create k8s-cluster --nodes 2
Create a kubernetes cluster in current virtual datacenter.
vcd cse delete k8s-cluster
Deletes a kubernetes cluster by id.
Options:
-h, --help Show this message and exit.
Commands:
config get cse kubernetes cluster config
create create cse kubernetes cluster
delete delete cse kubernetes cluster
list list cse kubernetes cluster
Also on the vcloud extensibility side, the API namespaces should be adapted in the same way:
vcd system extension create cse cse cse vcdext '/api/cluster, /api/cluster/.*, /api/cluster/.*/.*'
Using the cse namespace:
vcd system extension create cse cse cse vcdext '/api/cse, /api/cse/.*, /api/cse/.*/.*'
or, if cluster is wanted for clarity:
vcd system extension create cse cse cse vcdext '/api/cse/cluster, /api/cse/cluster/.*, /api/cse/cluster/.*/.*'
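The small Python sketch below shows which request paths each variant would route to CSE. It assumes each comma-separated entry is a regular expression matched against the whole request path. That matching rule is an assumption for illustration, not something confirmed in this thread. Under that rule, /api/cse/.* already covers deeper paths such as /api/cse/cluster/abc, since . also matches /.

```python
import re
from typing import Iterable, List


def parse_patterns(spec: str) -> List[str]:
    """Split the comma-separated pattern string given to `vcd system extension create`."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def routed_to_extension(path: str, patterns: Iterable[str]) -> bool:
    """True if `path` fully matches any pattern (assumed full-match regex rule)."""
    return any(re.fullmatch(p, path) for p in patterns)


if __name__ == "__main__":
    current = parse_patterns("/api/cluster, /api/cluster/.*, /api/cluster/.*/.*")
    proposed = parse_patterns("/api/cse, /api/cse/.*, /api/cse/.*/.*")
    for path in ("/api/cluster", "/api/cse", "/api/cse/cluster/abc"):
        print(path, routed_to_extension(path, current), routed_to_extension(path, proposed))
```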
Hope this makes sense.
Please document which versions of vCloud Director this is expected to be compatible with. Thanks!
Hi @pacogomez,
I've seen Issue #50, which could also be leveraged for this, but an alternative method (running CSE as a systemd service) would also be nice:
configuration file (for example):
/etc/vmware-cse/config.yaml
enable the service:
systemctl enable vmware-cse
start the service:
systemctl start vmware-cse
service status:
systemctl status vmware-cse
<-- (maybe cse.log files)
etc.
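Below is a minimal sketch of what that unit could contain, rendered from Python so the paths are easy to adjust. The unit name, the /etc/vmware-cse/config.yaml location and the cse binary path are hypothetical. `cse run --config` is the same invocation shown in the cse.service status output in another report in this thread.

```python
def render_cse_unit(config_path: str = "/etc/vmware-cse/config.yaml",
                    cse_bin: str = "/usr/local/bin/cse") -> str:
    """Render a hypothetical vmware-cse.service that runs CSE in the foreground."""
    return "\n".join([
        "[Unit]",
        "Description=Container Service Extension for VMware vCloud Director",
        "After=network-online.target",
        "Wants=network-online.target",
        "",
        "[Service]",
        f"ExecStart={cse_bin} run --config {config_path}",
        "Restart=on-failure",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])
```

After writing the result to /etc/systemd/system/vmware-cse.service, run systemctl daemon-reload. The enable, start and status commands listed above should then work.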
Enhancement request
Can we have an easy way to specify additional packages which should be installed when clusters are built?
Obviously we can manually manipulate the files in the /scripts folder to accomplish this right now, but it would be more flexible to have a switch on vcd cse cluster create (?) Ideally one capable of taking a list of extra packages to be installed on each server (master & nodes).
e.g.
vcd cse cluster create mycluster01 --nodes 3 --network my-network --template photon-v2 --extra-packages ceph-common
This would need to translate into an appropriate installation command for the template type - e.g. 'tdnf install -yq ceph-common' in Photon or 'apt install -yq ceph-common' for Ubuntu templates.
The problem I can see is knowing what 'extra packages' were specified in the cluster creation and making sure that these are also added to any new nodes deployed into the same cluster - so would need to be recorded for each deployed cluster and referenced by vcd cse node add...
For bonus points - allow additional repositories to be configured/added as well as packages...
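The minimal sketch below covers both halves of the request. It maps a template to its install command, and it remembers each cluster's extras so a later vcd cse node add could reuse them. The prefix-based template matching, the class and the in-memory store are hypothetical; a real implementation would need to persist this, for example as vApp metadata. The two install commands are the ones quoted above.

```python
import shlex
from typing import Dict, List, Sequence

# Commands quoted in the request, keyed by template-name prefix (hypothetical scheme).
INSTALL_COMMANDS = {
    "photon": "tdnf install -yq",
    "ubuntu": "apt install -yq",
}


def install_command(template: str, packages: Sequence[str]) -> str:
    """Build the install command for a template; empty string if no extras."""
    if not packages:
        return ""
    for prefix, command in INSTALL_COMMANDS.items():
        if template.startswith(prefix):
            return command + " " + " ".join(shlex.quote(p) for p in packages)
    raise ValueError(f"no package manager known for template {template!r}")


class ClusterExtras:
    """Hypothetical in-memory record of each cluster's extra packages."""

    def __init__(self) -> None:
        self._packages: Dict[str, List[str]] = {}

    def record(self, cluster: str, packages: Sequence[str]) -> None:
        self._packages[cluster] = list(packages)

    def command_for_new_node(self, cluster: str, template: str) -> str:
        # Reuse whatever was requested at cluster creation time.
        return install_command(template, self._packages.get(cluster, []))
```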
CSE Version: 0.3
vCD Version 9.0.0.2
vCenter Version 6.5 U1
Error Message:
vcd cse cluster create mycluster --network OrgtNet_A
create_cluster: Creating cluster mycluster(098da013-5741-48d8-9aef-ad7e230220f0)
create_cluster: Creating cluster vApp mycluster(098da013-5741-48d8-9aef-ad7e230220f0)
task: 0c46d46f-ffc7-4be3-ac25-3592558dc0ed, result: error, message: module 'random' has no attribute 'choices'
Config Check:
/root/.local/bin/cse check --config /cse/config.yaml
Validating CSE on vCD from file: /cse/config.yaml
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Connected to AMQP server (xxxxxxx:5672): success
Connected to vCenter Server as [email protected] (xxxxxx:443): success
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Connected to vCloud Director as system administrator (xxxxxx:443): success
Validating 'default' service broker
Find catalog 'CSE-Global': success
Validating template: photon-v1
Is default template: True
Find template 'CSE-Global', 'photon-custom-hw11-1.0-62c543d-k8s': success
Validating template: ubuntu-16.04
Is default template: False
Find template 'CSE-Global', 'ubuntu-16.04-server-cloudimg-amd64-k8s': success
The configuration is valid.
Service Start:
Service cse status
● cse.service - Container Service Extension for VMware vCloud Director
Loaded: loaded (/etc/systemd/system/cse.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2018-01-23 09:21:50 CET; 41s ago
Main PID: 1219 (bash)
Tasks: 7
Memory: 62.2M
CPU: 975ms
CGroup: /system.slice/cse.service
├─1219 bash /cse/cse.sh
└─1230 /usr/bin/python3 /root/.local/bin/cse run --config /cse/config.yaml
Jan 23 09:21:57 cse01 cse.sh[1219]: Validating template: photon-v1
Jan 23 09:21:57 cse01 cse.sh[1219]: Is default template: True
Jan 23 09:21:58 cse01 cse.sh[1219]: Find template 'CSE-Global', 'photon-custom-hw11-1.0-62c543d-k8s': success
Jan 23 09:21:58 cse01 cse.sh[1219]: Validating template: ubuntu-16.04
Jan 23 09:21:58 cse01 cse.sh[1219]: Is default template: False
Jan 23 09:21:58 cse01 cse.sh[1219]: Find template 'CSE-Global', 'ubuntu-16.04-server-cloudimg-amd64-k8s': success
Jan 23 09:21:58 cse01 cse.sh[1219]: Container Service Extension for vCloud Director running
Jan 23 09:21:58 cse01 cse.sh[1219]: config file: /cse/config.yaml
Jan 23 09:21:58 cse01 cse.sh[1219]: see file log file for details: cse.log
Jan 23 09:21:58 cse01 cse.sh[1219]: waiting for requests, press Ctrl+C to finish
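`random.choices` was added in Python 3.6. This error therefore suggests the interpreter behind /usr/bin/python3 on this host is 3.5 or older (Ubuntu 16.04, for instance, ships 3.5); `python3 --version` will confirm. The simplest fix is to run CSE on 3.6+. If code has to run on both, `random.choice` in a loop is an equivalent fallback. Below is a minimal sketch of that fallback; `random_suffix` is a hypothetical helper, not CSE's actual code.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits


def random_suffix(length: int = 8, alphabet: str = ALPHABET) -> str:
    """Hypothetical helper: random string via random.choice, which Python 3.5 has.

    On 3.6+ this is equivalent to ''.join(random.choices(alphabet, k=length)).
    """
    return "".join(random.choice(alphabet) for _ in range(length))
```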
Traceback (most recent call last):
File "/Users/pgomez/.virtualenvs/cse/bin/cse", line 10, in <module>
sys.exit(cli())
File "/Users/pgomez/.virtualenvs/cse/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/Users/pgomez/.virtualenvs/cse/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/Users/pgomez/.virtualenvs/cse/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/pgomez/.virtualenvs/cse/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/pgomez/.virtualenvs/cse/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/Users/pgomez/.virtualenvs/cse/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/Users/pgomez/vmware/container-service-extension/container_service_extension/cse.py", line 103, in run
service.run()
File "/Users/pgomez/vmware/container-service-extension/container_service_extension/service.py", line 44, in run
self.config = check_config(self.config_file)
File "/Users/pgomez/vmware/container-service-extension/container_service_extension/config.py", line 113, in check_config
connection = pika.BlockingConnection(parameters)
File "/Users/pgomez/.virtualenvs/cse/lib/python3.6/site-packages/pika-0.11.0-py3.6.egg/pika/adapters/blocking_connection.py", line 374, in __init__
File "/Users/pgomez/.virtualenvs/cse/lib/python3.6/site-packages/pika-0.11.0-py3.6.egg/pika/adapters/blocking_connection.py", line 414, in _process_io_for_connection_setup
File "/Users/pgomez/.virtualenvs/cse/lib/python3.6/site-packages/pika-0.11.0-py3.6.egg/pika/adapters/blocking_connection.py", line 468, in _flush_output
pika.exceptions.ConnectionClosed: Connection to 10.150.199.8:5672 failed: timeout
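pika reporting a timeout means the TCP connection to 10.150.199.8:5672 never completed. That usually points at routing, a firewall or a broker that isn't listening, rather than at credentials. A plain TCP check from the CSE host can separate those cases. Below is a minimal sketch using only the standard library. `tcp_reachable` is a hypothetical helper, and a successful connect says nothing about AMQP credentials or the vhost.

```python
import socket


def tcp_reachable(host: str, port: int = 5672, timeout: float = 3.0) -> bool:
    """Hypothetical helper: True if a TCP connect to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and socket timeouts
        return False
```

For example, `tcp_reachable("10.150.199.8")` run on the CSE host would show whether the broker port is reachable at all.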