
dcos-gce's Introduction

DCOS on Google Compute Engine

This repository contains scripts to configure a DC/OS cluster on Google Compute Engine.

A bootstrap node is required to run the scripts and to bootstrap the DC/OS cluster.

PLEASE READ THE ENTIRE DOCUMENT. YOU MUST MAKE CHANGES FOR THE SCRIPTS TO WORK IN YOUR GCE ENVIRONMENT.

Bootstrap node configuration

YOU MUST CREATE A PROJECT using the Google Cloud console. The author created a project called trek-trackr.

You can create the bootstrap node using the Google Cloud console. The author used an n1-standard-1 instance running CentOS 7 with a 10 GB persistent disk in zone europe-west1-c. The bootstrap node must have "Allow full access to all Cloud APIs" selected in the Identity and API access section. Also enable "Block project-wide SSH keys" in the SSH Keys section. Then create the instance.

After creating the bootstrap instance, run the following from the shell:

sudo yum update google-cloud-sdk
sudo yum update
sudo yum install epel-release
sudo yum install python-pip
sudo pip install -U pip
sudo pip install 'apache-libcloud==1.2.1'
sudo pip install 'docker-py==1.9.0'
sudo yum install git-1.8.3.1 ansible-2.1.1.0

You need to create an RSA public/private key pair to allow passwordless SSH logins to the nodes of the DC/OS cluster. Ansible requires this to create the cluster nodes and install DC/OS on them.

Run the following to generate the keys:

ssh-keygen -t rsa -f ~/.ssh/id_rsa -C ajazam

PLEASE REPLACE ajazam with your username. Do not enter a passphrase when prompted.

Make a backup copy of id_rsa.

Open the RSA public key:

sudo vi ~/.ssh/id_rsa.pub

The file contains a single line such as:

ssh-rsa abcdefghijklmaasnsknsdjfsdfjs;dfj;sdflkjsd ajazam

Prefix your username, followed by a colon, to the above line. Also replace ajazam at the end with your username.

ajazam:ssh-rsa abcdefghijklmaasnsknsdjfsdfjs;dfj;sdflkjsd ajazam

Save id_rsa.pub. Remember to use your own username in place of ajazam in both positions.
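
The edit to the key line can also be scripted. The following is a minimal sketch, not part of the dcos-gce scripts: the helper name prefix_pubkey and the placeholder key body are made up for illustration. It takes a username and one public key line and prints the line in the USERNAME:ssh-rsa KEY USERNAME form described above.

```shell
# Hypothetical helper (not part of dcos-gce): print a public key line as
# "USER:TYPE KEY USER".
prefix_pubkey() {
  # $1 = username, $2 = original line, e.g. "ssh-rsa AAAA... oldcomment"
  printf '%s\n' "$2" | awk -v user="$1" '{ print user ":" $1 " " $2 " " user }'
}

# Example with a placeholder key body (not a real key).
prefix_pubkey alice "ssh-rsa AAAAB3placeholder ajazam"
```

To apply it to the real file, you could capture the output and write it back to ~/.ssh/id_rsa.pub after making a backup.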

Add the rsa public key to your project

chmod 400 ~/.ssh/id_rsa
gcloud compute project-info add-metadata --metadata-from-file sshKeys=~/.ssh/id_rsa.pub

Disable SELinux so that Docker works.

Make the following change to /etc/selinux/config:

SELINUX=disabled

Then reboot the host.
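
The same change can be made non-interactively. This is a sketch under the assumption that the file already contains a SELINUX= line; the function name is made up, and the example runs against a temporary copy rather than /etc/selinux/config.

```shell
# Hypothetical helper: set SELINUX=disabled in the given config file.
set_selinux_disabled() {
  # Rewrite only the SELINUX= line; SELINUXTYPE= does not match the pattern.
  sed 's/^SELINUX=.*/SELINUX=disabled/' "$1" > "$1.new" &&
    cat "$1.new" > "$1" &&
    rm -f "$1.new"
}

# Try it on a temporary copy instead of the real file.
tmp=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$tmp"
set_selinux_disabled "$tmp"
cat "$tmp"
rm -f "$tmp"
```

On the bootstrap node you would run the function as root against /etc/selinux/config and then reboot, as described above.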

To install docker add the yum repo

sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF

install the docker package

sudo yum install docker-engine-1.11.2

Make the following change to /usr/lib/systemd/system/docker.service:

ExecStart=/usr/bin/docker daemon --storage-driver=overlay

reload systemd

sudo systemctl daemon-reload

Start docker

sudo systemctl start docker.service

Verify if docker works

sudo docker run hello-world

download the dcos-gce scripts

git clone https://github.com/dcos-labs/dcos-gce

change directory

cd dcos-gce

Please make appropriate changes to group_vars/all. You need to review project, subnet, login_name, bootstrap_public_ip and zone.

Insert the following into ~/.ansible.cfg to stop host key checking:

[defaults]
host_key_checking = False

[paramiko_connection]
record_host_keys = False

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null

Ensure the IP address for master0 in ./hosts is the next consecutive IP from bootstrap_public_ip.
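
The next consecutive address can be computed rather than worked out by hand. A small sketch follows; the function name next_ip is hypothetical and it handles plain dotted-quad IPv4 addresses only.

```shell
# Hypothetical helper: print the IPv4 address that follows the given one.
next_ip() {
  printf '%s\n' "$1" | awk -F. '{
    # Treat the four octets as one number, add one, and split it again.
    n = (($1 * 256 + $2) * 256 + $3) * 256 + $4 + 1
    printf "%d.%d.%d.%d\n", int(n / 16777216) % 256, int(n / 65536) % 256, int(n / 256) % 256, n % 256
  }'
}

# With the default bootstrap_public_ip this gives the address supplied for master0.
next_ip 10.132.0.2
```

The result is only a candidate: the address must still be unused in your subnet.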

To create and configure the master nodes run

ansible-playbook -i hosts install.yml

To create and configure the private nodes run

ansible-playbook -i hosts add_agents.yml --extra-vars "start_id=0001 end_id=0002 agent_type=private"

start_id=0001 and end_id=0002 specify the range of IDs that are appended to the hostname prefix "agent" to create unique agent names. If start_id is not specified, a default of 0001 is used. If end_id is not specified, a default of 0001 is used.

When specifying start_id or end_id on the command line, drop the leading zeroes for any agent ID higher than 7, or Ansible will throw a format error. For example:

ansible-playbook -i hosts add_agents.yml --extra-vars "start_id=0006 end_id=10 agent_type=private"
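
To see which hostnames a given range produces, the numbering can be sketched in shell. This is an illustration only, not how the playbook builds names internally: the function name is made up, and the four-digit zero padding is inferred from the agent0001 examples in this document. Leading zeroes are stripped first because a value such as 0008 is not a valid octal number in shell arithmetic; a similar interpretation may be what trips Ansible up, though the source does not say so.

```shell
# Hypothetical helper: list the agent hostnames for an ID range.
agent_names() {
  # Strip leading zeroes so values like 0008 are handled as decimal numbers.
  start=$(printf '%s' "$1" | sed 's/^0*//')
  end=$(printf '%s' "$2" | sed 's/^0*//')
  i=${start:-0}
  while [ "$i" -le "${end:-0}" ]; do
    # Pad back to four digits, matching names such as agent0001.
    printf 'agent%04d\n' "$i"
    i=$((i + 1))
  done
}

agent_names 0006 10
```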

The value of agent_type is either private or public. If agent_type is not specified, private is assumed.

To create public nodes, run:

ansible-playbook -i hosts add_agents.yml --extra-vars "start_id=0003 end_id=0004 agent_type=public"

Configurable parameters

File './hosts' is an Ansible inventory file. Text wrapped in [] is a group name, and the individual entries after the group name are the hosts in that group. The [masters] group contains node names and IP addresses for the master nodes. In the supplied file the host name is master0 and the IP address 10.132.0.3 is assigned to it. YOU MUST CHANGE the IP address for master0 to suit your network. You can create multiple entries, e.g. master1, master2 etc. Each node must have a unique IP address.

The [agents] group has one entry. It specifies the names of all the agents that can exist in the DC/OS cluster. The value allows agent0000 to agent9999, a total of 10,000 agents. This is an artificial limit and can easily be changed.

The [bootstrap] group has the name of the bootstrap node.

File './group_vars/all' contains miscellaneous parameters that will change the behaviour of the installation scripts. The parameters are split into two groups. Group 1 parameters must be changed to reflect your environment. Group 2 parameters can optionally be changed to change the behaviour of the scripts.

Group 1 parameters YOU MUST CHANGE for your environment

project

Your project id. Default: trek-trackr

subnet

Your network. Default: default

login_name

The login name used for accessing each GCE instance. Default: ajazam

bootstrap_public_ip

The bootstrap node's public IP. Default: 10.132.0.2

zone

You may change this to your preferred zone. Default: europe-west1-c

Group 2 parameters which optionally change the behaviour of the installation scripts

master_boot_disk_size

The size of the master node boot disk. Default: 10 GB

master_machine_type

The GCE instance type used for the master nodes. Default: n1-standard-2

master_boot_disk_type

The master boot disk type. Default: pd-standard

agent_boot_disk_size

The size of the agent boot disk. Default: 10 GB

agent_machine_type

The GCE instance type used for the agent nodes. Default: n1-standard-2

agent_boot_disk_type

The agent boot disk type. Default: pd-standard

agent_instance_type

Allows agents to be preemptible. If the value is "MIGRATE" then they are not preemptible. If the value is '"TERMINATE" --preemptible' then the instance is preemptible. Default: "MIGRATE"

agent_type

Can specify whether an agent is "public" or "private". Default: "private"

start_id

The number appended to the text "agent" to form the hostname of the first agent, e.g. agent0001. Intermediate agents between start_id and end_id are created as required. Default: 0001

end_id

The number appended to the text "agent" to form the hostname of the last agent, e.g. agent0001. Intermediate agents between start_id and end_id are created as required. Default: 0001

gcloudbin

The location of the gcloud binary. Default: /usr/local/bin/gcloud

image

The disk image used on the master and agent. Default: /centos-cloud/centos-7-v20161027

bootstrap_public_port

The port on the bootstrap node from which the master and agent nodes fetch the DC/OS installer. Default: 8080

cluster_name

The name of the DC/OS cluster. Default: cluster_name

scopes

Don't change this. Required by the google cloud SDK

dcos_installer_filename

The filename for the DC/OS installer. Default: dcos_generate_config.sh

dcos_installer_download_path

The location from which the DC/OS installer is downloaded from dcos.io. Default: https://downloads.dcos.io/dcos/stable/{{ dcos_installer_filename }} The value of {{ dcos_installer_filename }} is described above.

home_directory

The home directory for your logins. Default: /home/{{ login_name }} The value of {{ login_name }} is described above.

downloads_from_bootstrap

The number of concurrent downloads of the DC/OS installer to the master and agent nodes. You may need to experiment with this to get the best performance, which depends on the machine type used for the bootstrap node. Default: 2

dcos_bootstrap_container

Holds the name of the dcos bootstrap container running on the bootstrap node. Default: dcosinstaller

dcos-gce's People

Contributors

ajazam, ambakshi, andel7, azulinho, oghma, sbolel

dcos-gce's Issues

ansible playbook not able to connect to master0

I'm following all the steps, but somehow the playbook execution fails. When I try to connect to master0 locally, I'm able to succeed.

Command :
ansible-playbook -i hosts install.yml

Output:

changed: [localhost] => (item={'_ansible_no_log': False, u'ansible_job_id': u'914054982487.3088', u'started': 1, '_ansible_item_result': True, 'item': u'master0', u'finished': 0, u'results_file': u'/root/.ansible_async/914054982487.3088'})

TASK [pause] *******************************************************************
Pausing for 20 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [localhost]

PLAY [initialise master dcos nodes] ********************************************

TASK [setup] *******************************************************************
The authenticity of host 'master0 (10.132.0.3)' can't be established.
ECDSA key fingerprint is e5:af:27:6a:f3:af:81:46:a4:57:90:d2:5e:90:24:ae.
Are you sure you want to continue connecting (yes/no)? yes
fatal: [master0]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh.", "unreachable": true}
 [WARNING]: Could not create retry file 'install.retry'.         [Errno 2] No such file or directory: ''


PLAY RECAP *********************************************************************
localhost                  : ok=13   changed=8    unreachable=0    failed=0   
master0                    : ok=0    changed=0    unreachable=1    failed=0   

Failure talking to yum

During the master install step, the Ansible task [install dependencies] failed with the following error:

TASK [install dependencies] ****************************************************
failed: [master0] (item=[u'unzip', u'ipset', u'ntp']) => {"changed": false, "item": ["unzip", "ipset", "ntp"], "msg": "Failure talking to yum: failure: repodata/repomd.xml from google-cloud-compute: [Errno 256] No more mirrors to try.\nhttps://packages.cloud.google.com/yum/repos/google-cloud-compute-el7-x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for google-cloud-compute"}
to retry, use: --limit @/opt/dcos-gce/install.retry

To work around that error I customized the tasks/configure_master_dcos_nodes.yml file by adding:

- name: upgrade all packages
  yum:
    name: '*'
    state: latest

Now I have the same error with the add agent task.

TASK [install dependencies] ****************************************************
failed: [agent0001] (item=[u'unzip', u'ipset', u'ntp']) => {"changed": false, "item": ["unzip", "ipset", "ntp"], "msg": "Failure talking to yum: failure: repodata/repomd.xml from google-cloud-compute: [Errno 256] No more mirrors to try.\nhttps://packages.cloud.google.com/yum/repos/google-cloud-compute-el7-x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for google-cloud-compute"}
failed: [agent0002] (item=[u'unzip', u'ipset', u'ntp']) => {"changed": false, "item": ["unzip", "ipset", "ntp"], "msg": "Failure talking to yum: failure: repodata/repomd.xml from google-cloud-compute: [Errno 256] No more mirrors to try.\nhttps://packages.cloud.google.com/yum/repos/google-cloud-compute-el7-x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for google-cloud-compute"}
to retry, use: --limit @/opt/dcos-gce/add_agents.retry

Debugging continues...

Agents hostname is set as localhost in /etc/hostname

I am trying to use constraints for certain services, ideally based on the hostnames of the nodes. But I noticed that all the agents that are spawned have localhost set as their hostname in /etc/hostname.
Can someone tell me how I can change the hostnames, perhaps using Ansible? DC/OS shows IPs for the nodes in the hostname fields.

Requires consecutive IP

Why is this step necessary?

Ensure the IP address for master0 in ./hosts is the next consecutive IP from bootstrap_public_ip.

Does it have to be the consecutive ip? I'm getting

Could not fetch resource:", " - IP '10.142.0.13' is already being used by another resource

This seems incredibly fragile...

Ansible errors: master_discovery, docker-py and urlopen

I followed the instructions to deploy DC/OS to GCE and faced two issues:

ansible-playbook -i hosts install.yml yields the following error:

TASK [Generate customised build file] ******************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["/home/cko/dcos_generate_config.sh"], "delta": "0:00:00.824642", "end": "2016-09-30 08:37:09.038750", "failed": true, "rc": 1, "start": "2016-09-30 08:37:08.214108", "stderr": "\u001b[33m====> EXECUTING CONFIGURATION GENERATIONExecute the configuration generation (genconf).\u001b[0m\nGenerating configuration files...\n\u001b[1;31mmaster_discovery: Must set master_discovery, no way to calculate value.\u001b[0m", "stdout": "", "stdout_lines": [], "warnings": []}

I fixed this by adding master_discovery to the template/config.yaml file.

echo "master_discovery: static" >> template/config.yaml

After that, Ansible complained about docker-py:

TASK [start docker container] **************************************************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Error: docker-py version is 1.10.3. Minimum version required is 1.7.0."}

I had to downgrade docker-py manually.

sudo pip install docker-py==1.7.0

Might be related to ansible/ansible#17495
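
The error text reports 1.10.3 as lower than 1.7.0, which is what a plain string comparison would conclude. Whether that is the actual cause here is an assumption. The sketch below (helper names are hypothetical) contrasts a byte-wise string comparison with a numeric, field-by-field one for three-part versions.

```shell
# True if "$1" >= "$2" when compared as plain strings (byte order).
string_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)" = "$2" ]
}

# True if "$1" >= "$2" when each dot-separated field is compared as a number.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$2" ]
}

# The two comparisons disagree for this pair of versions.
string_ge 1.10.3 1.7.0 && echo "string: ok" || echo "string: too old"
version_ge 1.10.3 1.7.0 && echo "numeric: ok" || echo "numeric: too old"
```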

FYI I also get:

TASK [remove masters from known_hosts file] ************************************
failed: [localhost] (item=master0) => {"changed": true, "cmd": ["/usr/bin/ssh-keygen", "-R", "master0"], "delta": "0:00:00.008812", "end": "2016-09-30 11:15:02.139360", "failed": true, "item": "master0", "rc": 255, "start": "2016-09-30 11:15:02.130548", "stderr": "ssh-keygen: /root/.ssh/known_hosts: No such file or directory", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring

TASK [remove  masters IP from known_hosts file] ********************************
failed: [localhost] (item=master0) => {"changed": true, "cmd": ["/usr/bin/ssh-keygen", "-R", "10.132.0.3"], "delta": "0:00:00.004042", "end": "2016-09-30 11:15:02.463064", "failed": true, "item": "master0", "rc": 255, "start": "2016-09-30 11:15:02.459022", "stderr": "ssh-keygen: /root/.ssh/known_hosts: No such file or directory", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring

stuck at #ansible-playbook -i hosts install.yml

Hi,

I was trying to set up DC/OS on GCE but got stuck at creating and launching master0. The error is shown below. Can you please help us resolve it? I checked all the configuration two or three times and it matches the documentation.

...ignoring
failed: [localhost] (item=master0) => {"changed": true, "cmd": ["/usr/bin/ssh-keygen", "-R", "master0"], "delta": "0:00:00.003979", "end": "2016-12-02 15:35:50.125943", "failed": true, "item": "master0", "rc": 255, "start": "2016-12-02 15:35:50.121964", "stderr": "ssh-keygen: /root/.ssh/known_hosts: No such file or directory", "stdout": "", "stdout_lines": [], "warnings": []}

TASK [remove masters IP from known_hosts file] ********************************
...ignoring
failed: [localhost] (item=master0) => {"changed": true, "cmd": ["/usr/bin/ssh-keygen", "-R", "10.240.0.48"], "delta": "0:00:00.003790", "end": "2016-12-02 15:35:50.347317", "failed": true, "item": "master0", "rc": 255, "start": "2016-12-02 15:35:50.343527", "stderr": "ssh-keygen: /root/.ssh/known_hosts: No such file or directory", "stdout": "", "stdout_lines": [], "warnings": []}

TASK [Create and launch masters] ***********************************************
ok: [localhost] => (item=master0)

TASK [wait for master instance creation to complete] ***************************
failed: [localhost] (item={'_ansible_parsed': True, '_ansible_no_log': False, u'ansible_job_id': u'154463511040.2732', u'started': 1, '_ansible_item_result': True, 'item': u'master0', u'finished': 0, u'results_file': u'/root/.ansible_async/154463511040.2732'}) => {"ansible_job_id": "154463511040.2732", "attempts": 1, "changed": true, "cmd": ["/usr/bin/gcloud", "compute", "--project", "innoplexus-980", "instances", "create", "master0", "--zone", "us-east1-b", "--machine-type", "n1-standard-2", "--subnet", "obd-db", "--private-network-ip", "10.240.0.48", "--maintenance-policy", "MIGRATE", "--tags", "master", "--scopes", "default=https://www.googleapis.com/auth/cloud-platform", "--image", "/centos-cloud/centos-7-v20161027", "--boot-disk-size", "10", "--boot-disk-type", "pd-standard", "--boot-disk-device-name", "master0-boot"], "delta": "0:00:00.377401", "end": "2016-12-02 15:35:50.984846", "failed": true, "finished": 1, "item": {"ansible_job_id": "154463511040.2732", "finished": 0, "item": "master0", "results_file": "/root/.ansible_async/154463511040.2732", "started": 1}, "rc": 1, "start": "2016-12-02 15:35:50.607445", "stderr": "WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#pdperformance.\nWARNING: Resource ids like [/centos-cloud/centos-7-v20161027] (specifying multiple parameters separated by "/") are undocumented and deprecated, support for which will be removed in the near future.\nERROR: (gcloud.compute.instances.create) Some requests did not succeed:\n - Insufficient Permission", "stdout": "", "stdout_lines": [], "warnings": []}
to retry, use: --limit @/home/suyash/dcos-gce/install.retry

Many thanks in advance.

ansible-playbook -i hosts install.yml failed

Hi all,
I'm new to DC/OS on GCP. Below are some errors I saw.

I followed the README step by step to create the DC/OS cluster.
The steps where I had to substitute my own settings are:
1. ssh-keygen -t rsa -f ~/.ssh/id_rsa -C dcos18, and I prefixed the key with dcos18. (I could not figure out what my "username" is.)
2. I created a .ansible.cfg file in the root.
3. In ./hosts I set the IP to 10.140.0.3 (my zone is asia-east1 and the bootstrap IP is 10.140.0.2).

I would appreciate your help.

TASK [remove masters from known_hosts file] ***********************

failed: [localhost] (item=master0) => {"changed": true, "cmd": ["/
usr/bin/ssh-keygen", "-R", "master0"], "delta": "0:00:00.008467", 
"end": "2018-12-26 19:57:33.291838", "item": "master0", "msg": "no
n-zero return code", "rc": 255, "start": "2018-12-26 19:57:33.2833
71", "stderr": "do_known_hosts: hostkeys_foreach failed: No such f
ile or directory", "stderr_lines": ["do_known_hosts: hostkeys_fore
ach failed: No such file or directory"], "stdout": "", "stdout_lin
es": []}
...ignoring

TASK [remove masters IP from known_hosts file] ******************

failed: [localhost] (item=master0) => {"changed": true, "cmd": ["/
usr/bin/ssh-keygen", "-R", "10.140.0.3"], "delta": "0:00:00.006626
", "end": "2018-12-26 19:57:33.553798", "item": "master0", "msg": 
"non-zero return code", "rc": 255, "start": "2018-12-26 19:57:33.5
47172", "stderr": "do_known_hosts: hostkeys_foreach failed: No suc
h file or directory", "stderr_lines": ["do_known_hosts: hostkeys_f
oreach failed: No such file or directory"], "stdout": "", "stdout_
lines": []}
...ignoring

TASK [wait for master instance creation to complete] *************


FAILED - RETRYING: wait for master instance creation to complete (
300 retries left).
failed: [localhost] (item={'_ansible_parsed': True, '_ansible_item
_result': True, '_ansible_item_label': u'master0', u'ansible_job_i
d': u'568314799743.4865', 'failed': False, u'started': 1, 'changed
': True, 'item': u'master0', u'finished': 0, u'results_file': u'/r
oot/.ansible_async/568314799743.4865', '_ansible_ignore_errors': N
one, '_ansible_no_log': False}) => {"ansible_job_id": "56831479974
3.4865", "attempts": 2, "changed": true, "cmd": ["/usr/bin/gcloud"
, "compute", "--project", "dcos18", "instances", "create", "master
0", "--zone", "asia-east1-a", "--machine-type", "n1-standard-1", "
--subnet", "default", "--private-network-ip", "10.140.0.3", "--mai
ntenance-policy", "MIGRATE", "--tags", "master", "--scopes", "http
s://www.googleapis.com/auth/cloud-platform", "--image", "centos-7-
v20161027", "--image-project", "centos-cloud", "--boot-disk-size",
"10", "--boot-disk-type", "pd-standard", "--boot-disk-device-name
", "master0-boot", "--metadata", "hostname=master0"], "delta": "0:
00:01.562617", "end": "2018-12-26 19:57:35.421858", "finished": 1,
"item": {"ansible_job_id": "568314799743.4865", "changed": true,
"failed": false, "finished": 0, "item": "master0", "results_file":
"/root/.ansible_async/568314799743.4865", "started": 1}, "msg": "
non-zero return code", "rc": 1, "start": "2018-12-26 19:57:33.8592
41", "stderr": "WARNING: You have selected a disk size of under [2
00GB]. This may result in poor I/O performance. For more informati
on, see: https://developers.google.com/compute/docs/disks#performa
nce.\nERROR: (gcloud.compute.instances.create) Could not fetch res
ource:\n - The resource 'projects/dcos18/zones/asia-east1-a/instan
ces/master0' already exists", "stderr_lines": ["WARNING: You have
selected a disk size of under [200GB].This may result in poor I/O
performance. For more information, see: https://developers.google
.com/compute/docs/disks#performance.", "ERROR: (gcloud.compute.ins
tances.create) Could not fetch resource:", " - The resource 'proje
cts/dcos18/zones/asia-east1-a/instances/master0' already exists"],
"stdout": "", "stdout_lines": []}

Ubuntu image not supported

I was trying to configure a mesos cluster on GCE using these scripts with ubuntu. I changed the group_vars/all to have image: '/ubuntu-os-cloud/ubuntu-1604-xenial-v20160815'. The script uses yum to install dependencies rather than apt.

Here's what I see,

TASK [pause] *******************************************************************
Pausing for 20 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [localhost]

PLAY [initialise master dcos nodes] ********************************************

TASK [setup] *******************************************************************
ok: [master0]

TASK [create tmp directory] ****************************************************
changed: [master0]

TASK [install dependencies] ****************************************************
failed: [master0] (item=[u'unzip', u'ipset']) => {"failed": true, "item": ["unzip", "ipset"], "module_stderr": "", "module_stdout": "Traceback (most recent call last):\r\n  File \"/tmp/ansible_mL3__t/ansible_module_yum.py\", line 25, in <module>\r\n    import yum\r\nImportError: No module named yum\r\n", "msg": "MODULE FAILURE", "parsed": false}

NO MORE HOSTS LEFT *************************************************************

I see we use yum to configure master node,

$ cat configure_dcos.yml

---
- name: configure hosts
  hosts: tag_master,tag_privateagent,tag_publicagent
  become: yes
  gather_facts: no
  tasks:
    - name: create tmp directory
      file: dest=/tmp/dcos state=directory

    - name: install dependencies
      yum: name={{ item  }} state=present  <<<
      with_items:
        - unzip
        - ipset

    - name: add group nogroup
      group: name=nogroup state=present

    - name: disable selinux
      selinux: state=disabled

    - name: restart host
      shell: sleep 1;/usr/sbin/reboot
      async: 1
      poll: 0
      ignore_errors: true

    - name: waiting for host to come back online
      local_action: wait_for host={{ inventory_hostname }} search_regex=OpenSSH port=22 timeout=300 state=started

Unable to create agent nodes using the Ansible script

A new instance was created successfully, but the script stopped at the error below:

(excerpt from ansible output)
[user@bootstrap dcos-gce]$ ansible-playbook -i hosts add_agents.yml
...
things ran fine until
...
PLAY [start dcos bootstrap container] ******************************************

TASK [ensure dcos installer container is started] ******************************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Error retrieving container list: ('Connection aborted.', error(2, 'No such file or directory'))"}

Ansible script unable to create master instance - permission related

GCE returned an error when creating and configuring the master nodes with the Ansible script. This is likely a GCE configuration issue, but the documentation does not specify the required permission.

ansible-playbook -i hosts install.yml

ERROR: (gcloud.compute.instance.create) Some request did not succeed:

  • Required 'compute.subnetworks.use' permission for 'project/trek-trackr/regions/[region]/subnetworks/default'

Are you set-up for REX-Ray?

I'm getting "Error looking up volume plugin rexray: plugin not found." running DC/OS on Google Compute Engine?

I added this to template/config.yaml on the bootstrap node:

rexray:
  storageDrivers:
  - gce
gce:
  keyfile: /certdir/mobilecapitalgroup-dc-os-3bbd29c9ab02.json

I put my private key at the path above on master0.

I added this to my application configuration JSON:

    "volumes": [
      {
        "containerPath": "/var/jenkins_home",
        "external": {
          "name": "jenkins",
          "provider": "dvdi",
          "options": {
            "dvdi/driver": "rexray"
          }
        },
        "mode": "RW"
      },
      {
        "containerPath": "/opt/mesosphere",
        "hostPath": "/opt/mesosphere",
        "mode": "RO"
      }
    ]

But when I try to start Jenkins I get this error:

docker: Error response from daemon: create jenkins: create jenkins: Error looking up volume plugin rexray: plugin not found.
See 'docker run --help'.
W1203 18:46:44.996757 18633 logging.cpp:91] RAW: Received signal SIGTERM from process 2548 of user 0; exiting

Checking rexray on master0 I see that it isn't running:

# /opt/mesosphere/bin/rexray service status
REX-Ray is stopped

I tried adding gce to /etc/rexray/config.yml

rexray:
  loglevel: info
  storageDrivers:
  - gce
  gce:
    keyFile: /certdir/mobilecapitalgroup-dc-os-3bbd29c9ab02.json
  modules:
    default-admin:
      host: tcp://127.0.0.1:61003
    default-docker:
      disabled: true

But it still won't start:

[root@master0 rexray]# /opt/mesosphere/bin/rexray service start
INFO[0000] [linux]
INFO[0000] [docker]
INFO[0000] [gce]
INFO[0000] os driver initialized moduleName= provider=linux
INFO[0000] docker volume driver initialized availabilityZone= iops= moduleName= provider=docker size= volumeRootPath=/data volumeType=
FATA[0000] Could not read service account credentials file, => {open : no such file or directory} keyFile= moduleName= provider=gce

Any suggestions?

create GCE uses wrong URL for image

It looks like gcloud now requires the --image-project flag so that it uses the correct image when spinning up new instances.

   "cmd": [
        "/usr/bin/gcloud", 
        "compute", 
        "--project", 
        "my-project", 
        "instances", 
        "create", 
        "master0", 
        "--zone", 
        "europe-west1-a", 
        "--machine-type", 
        "n1-standard-2", 
        "--subnet", 
        "default", 
        "--private-network-ip", 
        "11.1.1.1", 
        "--maintenance-policy", 
        "MIGRATE", 
        "--tags", 
        "master", 
        "--scopes", 
        "default=https://www.googleapis.com/auth/cloud-platform", 
        "--image", 
        "centos-cloud/centos-7-v20161027", 
        "--boot-disk-size", 
        "10", 
        "--boot-disk-type", 
        "pd-standard", 
        "--boot-disk-device-name", 
        "master0-boot"
    ], 
    "delta": "0:00:00.475978", 
    "end": "2017-01-30 12:39:31.438059", 
    "failed": true, 
    "finished": 1, 
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/bin/gcloud compute --project myproject instances create master0 --zone europe-west1-a --machine-type n1-standard-2 --subnet default --private-network-ip 1.1.1.1 --maintenance-policy \"MIGRATE\" --tags \"master\" --scopes default=https://www.googleapis.com/auth/cloud-platform --image centos-cloud/centos-7-v20161027 --boot-disk-size 10 --boot-disk-type pd-standard --boot-disk-device-name master0-boot", 
            "_uses_shell": false, 
            "chdir": null, 
            "creates": null, 
            "executable": null, 
            "removes": null, 
            "warn": true
        }, 
        "module_name": "async_status"
    }, 
    "item": {
        "ansible_job_id": "54928646654.5347", 
        "finished": 0, 
        "item": "master0", 
        "results_file": "/root/.ansible_async/54928646654.5347", 
        "started": 1
    }, 
    "rc": 1, 
    "start": "2017-01-30 12:39:30.962081", 
    "stderr": "WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#pdperformance.\nWARNING: Flag format --scopes [ACCOUNT=]SCOPE, [[ACCOUNT=]SCOPE, ...] is deprecated and will be removed 24th Jan 2018. Use --scopes SCOPE[, SCOPE...] --service-account ACCOUNT instead.\nERROR: (gcloud.compute.instances.create) Some requests did not succeed:\n - Invalid value for field 'resource.disks[0].initializeParams.sourceImage': 'https://www.googleapis.com/compute/v1/projects/myproj/global/images/centos-cloud/centos-7-v20161027'. The URL is malformed.", 
    "stdout": "", 
    "stdout_lines": [], 
    "warnings": []
}
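
One way to adapt an image value of the PROJECT/IMAGE form to the newer flags is to split it into its two parts. This is a sketch with a made-up helper name, not a change taken from the repository; the --image and --image-project flags are the ones that appear in other gcloud invocations quoted in these issues.

```shell
# Hypothetical helper: turn "[/]PROJECT/IMAGE" into separate gcloud flags.
image_flags() {
  # Drop an optional leading slash, then split at the first remaining slash.
  spec=${1#/}
  printf '%s %s %s %s\n' --image "${spec#*/}" --image-project "${spec%%/*}"
}

image_flags /centos-cloud/centos-7-v20161027
```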

Installation fails if the bootstrap machine has Docker 1.12 installed

This only happens if the bootstrap machine has the latest Docker 1.12 installed.

TASK [start docker] ************************************************************
fatal: [evan-dcos-master]: FAILED! => {"changed": false, "failed": true, "msg": "Failed to start docker.service: Unit docker.socket failed to load: No such file or directory.\n"}
to retry, use: --limit @install.retry

ajazam -> dcos-user

This is pretty cool!

Now that this is documented on the dcos.io docs, it would probably look more official if the default username was dcos-user or something similar, rather than ajazam.

mesos-docker-executor: error while loading shared libraries: libssl.so.1.0.0

We've been using this project successfully for about four months now. This weekend I had to rebuild our cluster, following the same procedure we've been using. For some reason, Jenkins gets the error below, and OpenVPN won't deploy at all. I am still able to deploy MarathonLB.

mesos-docker-executor: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

Is anybody else having problems with Google Cloud Platform? I thought maybe it was because I was using the latest CentOS build, but when I rolled back to /centos-cloud/centos-7-v20161129 I still had the same problem.

use --scopes is not working in the current version on GCE

I've just tried to install this on GCE but getting the following error:

[wait for master instance creation to complete] ***********************************************************************************************************************************FAILED - RETRYING: wait for master instance creation to complete (300 retries left).failed: [localhost] (item={'_ansible_parsed': True, '_ansible_item_result': True, '_ansible_no_log': False, u'ansible_job_id': u'339686342942.2231', 'failed': False, u'started': 1, 'changed': True, 'item': u'master0', u'finished': 0, u'results_file': u'/root/.ansible_async/339686342942.2231', '_ansible_ignore_errors': None}) => {"ansible_job_id": "339686342942.2231", "attempts": 2, "changed": true, "cmd": ["/usr/bin/gcloud", "compute", "--project", "optimum-web-195718", "instances", "create", "master0", "--zone", "us-east1-b", "--machine-type", "n1-standard-2", "--subnet", "default-6f68d4d6fabcb680", "--private-network-ip", "10.142.0.3", "--maintenance-policy", "MIGRATE", "--tags", "master", "--scopes", "default=https://www.googleapis.com/auth/cloud-platform", "--image", "centos-7-v20161027", "--image-project", "centos-cloud", "--boot-disk-size", "10", "--boot-disk-type", "pd-standard", "--boot-disk-device-name", "master0-boot", "--metadata", "hostname=master0"], "delta": "0:00:03.795114", "end": "2018-02-19 19:35:43.643942", "finished": 1, "item": {"ansible_job_id": "339686342942.2231", "changed": true, "failed": false, "finished": 0, "item": "master0", "results_file": "/root/.ansible_async/339686342942.2231", "started": 1}, "msg": "non-zero return code", "rc": 1, "start": "2018-02-19 19:35:39.848828", "stderr": "WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.\nERROR: (gcloud.compute.instances.create) Invalid value for [--scopes]: Flag format --scopes [ACCOUNT=]SCOPE,[[ACCOUNT=]SCOPE, ...] is removed. Use --scopes [SCOPE,...] 
--service-account ACCOUNT instead.", "stderr_lines": ["WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.", "ERROR: (gcloud.compute.instances.create) Invalid value for [--scopes]: Flag format --scopes [ACCOUNT=]SCOPE,[[ACCOUNT=]SCOPE, ...] is removed. Use --scopes [SCOPE,...] --service-account ACCOUNT instead."], "stdout": "", "stdout_lines": []} to retry, use: --limit @/home/rodyhuibers/dcos-gce/install.retry
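
The error message asks for scopes without the ACCOUNT= prefix. Below is a sketch of stripping that prefix from a comma-separated scope list. The helper name is hypothetical, and how the account is then supplied through --service-account is not covered here.

```shell
# Hypothetical helper: remove a leading "ACCOUNT=" from each scope in a
# comma-separated list, leaving bare scope URLs.
strip_scope_account() {
  # The pattern stops at ":" or "/", so a bare https:// URL is left untouched.
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/^[^=:/]*=//' | paste -sd, -
}

strip_scope_account "default=https://www.googleapis.com/auth/cloud-platform"
```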
