
ec3's Introduction

Elastic Cloud Computing Cluster (EC3) is a tool to create elastic virtual clusters on top of Infrastructure as a Service (IaaS) providers, either public (such as Amazon Web Services, Google Cloud or Microsoft Azure) or on-premises (such as OpenNebula and OpenStack). We offer recipes to deploy TORQUE (optionally with MAUI), SLURM, SGE, HTCondor, Mesos, Nomad and Kubernetes clusters that can be self-managed with CLUES: the cluster starts as a single node, and working nodes are dynamically deployed and provisioned to fit the increasing load (i.e. the number of jobs at the LRMS), then undeployed when they become idle. This provides a cost-efficient approach to cluster-based computing.

Installation

Requisites

The program ec3 requires Python 2.6+, PLY, PyYAML, Requests, jsonschema and an IM server, which is used to launch the virtual machines.

PyYAML is usually available in distribution repositories (python-yaml in Debian; PyYAML in Red Hat; and PyYAML in pip).

PLY is usually available in distribution repositories (python-ply and ply in pip).

Requests is usually available in distribution repositories (python-requests and requests in pip).

jsonschema is usually available in distribution repositories (python-jsonschema and jsonschema in pip).

By default ec3 uses our public IM server in appsgrycap.i3m.upv.es. Optionally you can deploy a local IM server following the instructions of the IM manual.

The sshpass command is also required to provide the user with SSH access to the cluster.
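For example, sshpass can usually be installed from the standard distribution repositories (package names are assumptions based on the common repositories; on Red Hat systems the package may come from EPEL):

```shell
# Debian/Ubuntu
sudo apt install -y sshpass

# Red Hat / CentOS (EPEL repository)
sudo yum install -y sshpass
```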

Installing

As Python 2 is no longer supported, we recommend installing ec3 with Python 3.

First you need to install the pip tool. To install it on Debian- and Ubuntu-based distributions, run:

sudo apt update
sudo apt install -y python3-pip

On Red Hat-based distributions (RHEL, CentOS, Amazon Linux, Oracle Linux, Fedora, etc.), run:

sudo yum install -y epel-release
sudo yum install -y which python3-pip

Then simply install the ec3-cli package with pip:

sudo pip3 install ec3-cli

You can also download the latest ec3 version from the git repository:

git clone https://github.com/grycap/ec3

Then install it by pointing pip at the cloned ec3 directory:

sudo pip3 install ./ec3

Basic example with Amazon EC2

First create a file auth.txt with a single line like this:

id = provider ; type = EC2 ; username = <<Access Key ID>> ; password = <<Secret Access Key>>

Replace <<Access Key ID>> and <<Secret Access Key>> with the corresponding values for the AWS account where the cluster will be deployed. It is safer to use the credentials of an IAM user created within your AWS account.

This file is the authorization file (see Authorization file), and can have more than one set of credentials.
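For instance, an authorization file combining IM credentials with the EC2 line above might look like the following sketch (the `InfrastructureManager` entry is only needed when authenticating against a protected IM service, and all values are placeholders):

```
type = InfrastructureManager; username = <<IM user>>; password = <<IM password>>
id = provider; type = EC2; username = <<Access Key ID>>; password = <<Secret Access Key>>
```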

The next command deploys a TORQUE cluster based on an Ubuntu image:

$ ec3 launch mycluster torque ubuntu-ec2 -a auth.txt -y
WARNING: you are not using a secure connection and this can compromise the secrecy of the passwords and private keys available in the authorization file.
Creating infrastructure
Infrastructure successfully created with ID: 60
Front-end state: running, IP: 132.43.105.28

If you deployed a local IM server, use the next command instead:

$ ec3 launch mycluster torque ubuntu-ec2 -a auth.txt -u http://localhost:8899

This can take several minutes.

Bear in mind that you have to specify a resource manager (like torque in our example) in addition to the images that you want to deploy (e.g. ubuntu-ec2). For more information, check the templates documentation.

You can show basic information about the deployed clusters by executing:

$ ec3 list
    name       state          IP        nodes
 ---------------------------------------------
  mycluster  configured  132.43.105.28    0

Once the cluster has been deployed, open an SSH session to the front-end (you may need to install the sshpass tool):

$ ec3 ssh mycluster
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-24-generic x86_64)
Documentation:  https://help.ubuntu.com/
ubuntu@torqueserver:~$

You may use the cluster as usual, depending on the LRMS. For TORQUE, you can submit a couple of jobs with qsub to test the elasticity of the cluster:

$ for i in 1 2; do echo "/bin/sleep 50" | qsub; done

Notice that CLUES will intercept the jobs submitted to the LRMS and deploy additional working nodes if needed. This might introduce a configurable blocking delay (180 seconds by default) when jobs are submitted and no additional working nodes are available. This guarantees that jobs enter execution as soon as the working nodes are deployed and integrated in the cluster.

Working nodes will be provisioned and relinquished automatically to increase and decrease the cluster size according to the elasticity policies provided by CLUES.
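While the cluster scales, you can watch the process from the front-end with standard TORQUE commands (an illustrative session; node names and states will differ on your cluster):

```shell
qstat -a       # list the submitted jobs and their states (Q = queued, R = running)
pbsnodes -a    # show the working nodes registered in TORQUE and their state
```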

Enjoy your virtual elastic cluster!

EC3 in Docker Hub

EC3 has an official Docker container image, available both in Docker Hub and in the GitHub Container Registry, that can be used instead of installing the CLI. You can download it by typing:

$ sudo docker pull grycap/ec3
or
$ sudo docker pull ghcr.io/grycap/ec3

You can exploit all the potential of EC3 as if you had downloaded the CLI and run it on your computer:

$ sudo docker run grycap/ec3 list
$ sudo docker run grycap/ec3 templates

To launch a cluster, you can use the recipes that you have locally by mounting the folder as a volume. It is also recommended to keep the data of active clusters locally, by mounting a volume as follows:

$ sudo docker run -v /home/user/:/tmp/ -v /home/user/ec3/templates/:/etc/ec3/templates -v /home/user/.ec3/clusters:/root/.ec3/clusters grycap/ec3 launch mycluster torque ubuntu16 -a /tmp/auth.dat

Notice that you need to change the local paths to the paths where you store the auth file, the templates folder and the .ec3/clusters folder. Once the front-end is deployed and configured, you can connect to it by using:

$ sudo docker run -ti -v /home/user/.ec3/clusters:/root/.ec3/clusters grycap/ec3 ssh mycluster

Later on, when you need to destroy the cluster, you can type:

$ sudo docker run -ti -v /home/user/.ec3/clusters:/root/.ec3/clusters grycap/ec3 destroy mycluster

ec3's People

Contributors

alldaudinot, amcaar, eromero-vlc, gmolto, micafer, srisco


ec3's Issues

Error destroying clusters

Hi, today I updated ec3 (master branch 4fadb51) and, after creating a cluster with no problems, I've got this error message when I tried to delete it:

$ python2 ./ec3 destroy mycluster                                 
WARNING: you are going to delete the infrastructure (including frontend and nodes).
Continue [y/N]? y
Error destroying cluster 'mycluster': name 'y' is not defined

Using Python 2.7.15 on Arch Linux.

NTP

Hi,

I noticed that the Mesos installation from ec3 does not install ntp. I have not found it as a separate module either.

I imagine it would be convenient to install it by default, or at least to provide the option to do so.

Thanks

Error configuring Infra 404

Error while configuring the infrastructure: {"message": "Error Getting Inf. prop: Invalid infrastructure ID or access not granted.", "code": 404}

Unable to set VPC.

My AWS account does not have a default VPC. I don't see where I can specify the VPC to use.

Error parsing a password with special characters

In the auth file I used a password with the single quote ( ' ) character in it.
In the line 648 of the ec3 file there is a call to a radl parser: radl = parse_radl(info)
When this parser reads the password thinks that the single quote of the password is the end of the auth file, tries to execute the rest of the code (in this case the remaining password) and fails.

Deploy two or more front nodes at the same time

EC3 should allow deploying more than one front-end node at the same time.

Some recipes could benefit from this enhancement (e.g.):

  • Mesos cluster: creating more than one front-end node (for high availability) + load balancers
  • Monasca: creating the front + the devstack node

IM on front-end node unable to create worker nodes using OCCI

Firstly, I should say I'm not sure whether I should submit this issue to grycap/ec3, grycap/ansible-role-im or grycap/im, so I apologise if I made the wrong choice. I encountered the problem while testing EC3.

I've tried using the example templates for deploying a SLURM cluster, in my case on EGI FedCloud using the EGI CentOS 7 image. EC3 is using a local IM (running in a Docker container). The front-end node can be deployed successfully, however there are problems when the IM on the front-end node tries to create worker nodes:

Error querying the OCCI server: HTTPSConnectionPool(host='carach5.ics.muni.cz', port=11443): Max retries exceeded with url: /-/ (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_read_bytes', 'tlsv1 alert unknown ca')],)",),))

I assume this is due to the versions of python-requests/openssl etc installed on the front-end but I'm not exactly sure how to fix the problem. So far I've only found a slightly drastic workaround - I removed the grycap.im role from the frontend and added config to install Docker and run IM on the front-end using the indigodatacloud/im Docker image, as the IM Docker container doesn't have this problem.

Note that deploying a cluster using http://servproject.i3m.upv.es/ec3/ gives the same error when IM tries to create a worker node.

Error reconfiguring cluster

This error appears in the ec3 command when reconfiguring a cluster with the "reload" option but without the "auth_file" option:

  File "/usr/local/bin/ec3", line 1172, in run
    radl = CmdLaunch.generate_radl(templates, options.add if options.add else [], auth_content)
UnboundLocalError: local variable 'auth_content' referenced before assignment

Provide the capability of reconfiguring an active cluster with additional recipes

Sometimes I have a fully working cluster which I would like to extend with additional software packages. Or simply, I would like to work incrementally to identify issues.

It may be interesting to have a "reconfigure" command that installs a template module in the existing nodes, and (much more interesting, since this is easy to do already with IM) registers these changes in the template RADL used by CLUES for new nodes.

Thanks!

Templates docker-compose and jupyter

Using a fresh pull from EC3 (im-rest branch) and attempting to issue:

ec3 templates

throws the error:

Error processing 'docker-compose': Line 1: Parse error in: LexToken(VAR,'compose',1,19)

If you remove the template docker-compose.radl the error is:

Error processing 'jupyter': Line 1: Parse error in: LexToken(VAR,'notebook',1,20)

If you remove jupyter.radl then everything works as expected.

Error with python3

When I try to run the command:

./ec3 launch oscar ubuntu16 kubernetes_oscar_latest -a auth.dat -u $IM_ENDPOINT

using python3.6 I get the following error:

Error launching front-end: cannot represent an object: <map object at 0x7f94ed797a90>

With Python 2.7 there is no problem.
I'm using Python 3.6.6 on Ubuntu 18.04.

Support SSH commands in 'ec3 ssh'

When using ec3 ssh cluster-name to log in to a cluster it would be convenient to support passing parameters in order to remotely invoke commands via SSH (just as the ssh command supports).

Expected functionality:

As an example, issuing

ec3 ssh cluster-name ls -l /tmp

would result in listing the contents of the /tmp dir in the front-end node of the cluster identified by cluster-name

Error building docker image from "docker" directory.

Hi,

there is an error while building the ec3 docker image from the "docker" directory. It appears that the jsonschema package depends on the pyrsistent package, which now requires Python 3.

docker build -t grycap/ec3:latest .
Sending build context to Docker daemon 2.048kB
Step 1/6 : FROM alpine:3.8
3.8: Pulling from library/alpine
486039affc0a: Pull complete
Digest: sha256:2bb501e6173d9d006e56de5bce2720eb06396803300fe1687b58a7ff32bf4c14
Status: Downloaded newer image for alpine:3.8
---> c8bccc0af957
Step 2/6 : LABEL maintainer="Germán Moltó [email protected]"
---> Running in e360906a42e5
Removing intermediate container e360906a42e5
---> 53efbc6c4393
Step 3/6 : LABEL version="2.0"
---> Running in b52dc0ddbcba
Removing intermediate container b52dc0ddbcba
---> 519015180b1a
Step 4/6 : LABEL description="Elastic Cloud Computing Cluster (EC3) - http://www.grycap.upv.es/ec3"
---> Running in ca803648bdf0
Removing intermediate container ca803648bdf0
---> 1cc61967395d
Step 5/6 : RUN apk add --no-cache py-pip python sshpass openssh-client && pip --no-cache-dir install ec3-cli && apk del py-pip
---> Running in f4c0e604b04b
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/community/x86_64/APKINDEX.tar.gz
(1/15) Installing openssh-keygen (7.7_p1-r4)
(2/15) Installing openssh-client (7.7_p1-r4)
(3/15) Installing libbz2 (1.0.6-r7)
(4/15) Installing expat (2.2.8-r0)
(5/15) Installing libffi (3.2.1-r4)
(6/15) Installing gdbm (1.13-r1)
(7/15) Installing ncurses-terminfo-base (6.1_p20180818-r1)
(8/15) Installing ncurses-terminfo (6.1_p20180818-r1)
(9/15) Installing ncurses-libs (6.1_p20180818-r1)
(10/15) Installing readline (7.0.003-r0)
(11/15) Installing sqlite-libs (3.25.3-r4)
(12/15) Installing python2 (2.7.15-r3)
(13/15) Installing py-setuptools (39.1.0-r0)
(14/15) Installing py2-pip (10.0.1-r0)
(15/15) Installing sshpass (1.06-r0)
Executing busybox-1.28.4-r3.trigger
OK: 66 MiB in 28 packages
Collecting ec3-cli
Downloading https://files.pythonhosted.org/packages/0a/a6/0069fcd2c167c7932928ae13cc8d648b1ccb725a3588e456ccf474b2605c/ec3-cli-2.0.1.tar.gz (64kB)
Collecting ply (from ec3-cli)
Downloading https://files.pythonhosted.org/packages/a3/58/35da89ee790598a0700ea49b2a66594140f44dec458c07e8e3d4979137fc/ply-3.11-py2.py3-none-any.whl (49kB)
Collecting PyYAML (from ec3-cli)
Downloading https://files.pythonhosted.org/packages/64/c2/b80047c7ac2478f9501676c988a5411ed5572f35d1beff9cae07d321512c/PyYAML-5.3.1.tar.gz (269kB)
Collecting jsonschema (from ec3-cli)
Downloading https://files.pythonhosted.org/packages/c5/8f/51e89ce52a085483359217bc72cdbf6e75ee595d5b1d4b5ade40c7e018b8/jsonschema-3.2.0-py2.py3-none-any.whl (56kB)
Collecting requests (from ec3-cli)
Downloading https://files.pythonhosted.org/packages/45/1e/0c169c6a5381e241ba7404532c16a21d86ab872c9bed8bdcd4c423954103/requests-2.24.0-py2.py3-none-any.whl (61kB)
Requirement already satisfied: setuptools in /usr/lib/python2.7/site-packages (from jsonschema->ec3-cli) (39.1.0.post20180508)
Collecting importlib-metadata; python_version < "3.8" (from jsonschema->ec3-cli)
Downloading https://files.pythonhosted.org/packages/8e/58/cdea07eb51fc2b906db0968a94700866fc46249bdc75cac23f9d13168929/importlib_metadata-1.7.0-py2.py3-none-any.whl
Collecting pyrsistent>=0.14.0 (from jsonschema->ec3-cli)
Downloading https://files.pythonhosted.org/packages/7d/ae/90ddcf28fb8eee5d4990920586d2856342e42faa95f39223f0b9762ef264/pyrsistent-0.17.2.tar.gz (106kB)
pyrsistent requires Python '>=3.5' but the running Python is 2.7.15
You are using pip version 10.0.1, however version 20.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/sh -c apk add --no-cache py-pip python sshpass openssh-client && pip --no-cache-dir install ec3-cli && apk del py-pip' returned a non-zero code: 1

A workaround is to pin a compatible version of pyrsistent before installing ec3-cli in the Dockerfile:

RUN apk add --no-cache py-pip python sshpass openssh-client && \
     pip --no-cache-dir install pyrsistent==0.16.0 ec3-cli && \
     apk del py-pip

Add Transfer command

In case the ec3 client has failed at some point and the infrastructure has not been transferred into the internal IM, enable the client to transfer it with a command.

Unable to find template in ~/.ec3/templates

EC3 is meant to look for the templates in: ['./templates', '~/.ec3/templates', '/etc/ec3/templates']

I have checked that (at least on macOS) the folders './templates' and '/etc/ec3/templates' are searched, but not '~/.ec3/templates'.
