ckan-cloud-operator's Introduction

CKAN Cloud Operator

CKAN Cloud Operator manages, provisions and configures CKAN Cloud instances and related infrastructure.

Components

  • Terraform configurations for the first setup of a Kubernetes cluster and peripheral services for multiple cloud providers:
    • AWS
    • GCP
    • Azure
    • Minikube for local development
  • ckan-cloud-operator CLI will manage the cluster and any other services necessary for day-to-day operations
  • Management server that comes preinstalled with ckan-cloud-operator, the required tools (terraform, kubectl, helm, awscli, etc.) and a Jenkins server.

Quick Start

In order to start using ckan-cloud-operator, you need to

  1. Create a CKAN Cloud Operator working environment.

    You can choose to:

    • Use our pre-built Docker image
    • Run the AMI (on AWS)
    • Run the TBD (on GCP)
    • Run the TBD (on Azure)

    Note: While technically possible, we recommend not running ckan-cloud-operator directly on your machine, to avoid version incompatibilities between the various tools involved in the process. You should use one of our pre-built images or our Docker image instead.

  2. Create a Kubernetes cluster and provision it.

    • Instructions for AWS:

      • Create a cluster using terraform
      • Initialize the cluster using ckan-cloud-operator
    • Instructions for GCP:

      • Create a cluster using terraform
      • Initialize the cluster using ckan-cloud-operator
    • Instructions for Azure:

      • Create a cluster using terraform
      • Initialize the cluster using ckan-cloud-operator
    • Instructions for Minikube:

      • Initialize the cluster using ckan-cloud-operator
  3. Then you can create a CKAN Instance on the cluster:

    • Create a values file
    • Create the instance on the cluster
  4. (Optional) Set up Jenkins and the Provisioning UI

Reference

ckan-cloud-operator's People

Contributors

akariv, aluminiumgeek, amercader, jqnatividad, orihoch, pdelboca, pwalsh, shevron, woodt, zelima

ckan-cloud-operator's Issues

Running CCO on "minikube"

Job story

When I use and interact with CKAN Cloud Operator (CCO), I want to be able to run it in local development, so I can debug and develop the codebase and running CKANs in it without standing up a whole cluster on a major cloud provider.

Context

Currently, running CKAN Cloud requires a range of manual steps that are poorly documented or not documented at all, and the assumption is that one is deploying onto a public cloud (GCP, AWS, Azure). This creates friction both for new users (who want to run the thing and see if it addresses their needs) and for DevOps team members already running CCO to manage clusters of CKAN (who need an easier way to stand up clusters for testing, debugging, and development purposes).

Acceptance criteria

  • "Getting started with CKAN Cloud" documentation that takes me from a terminal to having CKAN Cloud Operator running on minikube on my local development machine
    • I can execute basic CCO commands
    • I can move on to next steps in documentation to deploy a CKAN instance into my setup
  • Any supporting code changes to run a working cluster on minikube

Note:

"Minikube" is not a strict requirement - any local kubernetes system could work.
We should choose one free local k8s flavor that is supported on most operating systems, and make sure the documentation also links to installation instructions for the selected solution.

[research] problems related to varnish error on uwsgi/varnish based instances

While looking into the connection limit, I investigated the old cluster as well, and I think the instances there are hitting the connection limit too.
I think there might be some old extensions that run daemons which place extra load and might cause a cascade of errors.

The logging setup seems to have some problems, which prevents getting logs in this case.

Inconsistency in instance storage names causes failure on create from-deis

Trying to migrate opendatadenmark, but its storage is named opendatadenmark-stage (instead of opendatadenmark-staging) and opendatadenmark-prod (instead of opendatadenmark), which causes create from-deis to fail:

Initializing storage: gs://ckan-cloud-staging-storage/ckan/opendatadenmark
Traceback (most recent call last):
  ...
  File "/home/zelima/viderum/ckan-cloud-operator/ckan_cloud_operator/deis_ckan/storage.py", line 35, in _update
    raise NotImplementedError('Storage initialization for new instances is not supported yet')
NotImplementedError: Storage initialization for new instances is not supported yet

CI/CD for Azure deployments

Requires #78

WIP

We need to be able to set up a k8s cluster + everything around it (nfs, db, permissions) using terraform so that cco cluster initialize passes correctly.

Acceptance Criteria

  • We are able to set up an Azure K8s cluster using terraform
  • We are able to do a helm-based deployment on that cluster
  • We can access the deployed, running CKAN instance from a browser

Tasks

  • Download and install terraform
  • Read about terraform, play around with it, maybe deploy hello world somewhere, somehow
  • Try to set up the K8s environment manually first
    • Initialize Cluster
    • Deploy CKAN instance
  • Try to set up the K8s environment with terraform

Walkthrough

Download and install terraform (linux 64bit)

# Install terraform
wget -O terraform.zip https://releases.hashicorp.com/terraform/0.12.18/terraform_0.12.18_linux_amd64.zip &&\
unzip terraform.zip &&\
sudo mv terraform /usr/local/bin/ &&\
terraform

# Install Azure CLI and log in
curl -L https://aka.ms/InstallAzureCli | bash
az login

# Create an Azure service principal (user with limited permissions)
SUBSCRIPTION_ID=your_subscription_id_from_login
az account set --subscription="${SUBSCRIPTION_ID}"
az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/${SUBSCRIPTION_ID}"

#!/bin/sh
echo "Setting environment variables for Terraform"
export ARM_SUBSCRIPTION_ID=your_subscription_id
export ARM_CLIENT_ID=your_appId
export ARM_CLIENT_SECRET=your_password
export ARM_TENANT_ID=your_tenant_id
export ARM_ENVIRONMENT=public
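
The mapping from the service principal output to the Terraform variables above can be sketched as follows. This is only a minimal sketch: it assumes that az ad sp create-for-rbac prints a JSON object with appId, password and tenant keys, and the function names terraform_env_from_sp and as_export_lines are hypothetical helpers, not part of CCO.

```python
import json
import shlex


def terraform_env_from_sp(sp_json, subscription_id):
    """Map service principal JSON (assumed keys: appId, password, tenant)
    to the ARM_* variables listed in the walkthrough."""
    sp = json.loads(sp_json)
    return {
        "ARM_SUBSCRIPTION_ID": subscription_id,
        "ARM_CLIENT_ID": sp["appId"],
        "ARM_CLIENT_SECRET": sp["password"],
        "ARM_TENANT_ID": sp["tenant"],
        "ARM_ENVIRONMENT": "public",
    }


def as_export_lines(env):
    """Render the variables as shell export lines, quoting each value."""
    return [f"export {name}={shlex.quote(value)}" for name, value in env.items()]


if __name__ == "__main__":
    # Placeholder values only; real values come from `az login` and the
    # service principal creation step.
    sample = '{"appId": "app-id", "password": "secret", "tenant": "tenant-id"}'
    for line in as_export_lines(terraform_env_from_sp(sample, "sub-id")):
        print(line)
```

The printed lines could be written to a file and sourced before running terraform.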

Analysis

QA

  • What are the manual steps?
  • What are semi-manual steps?
  • What information do we need before we can start? E.g. superuser credentials from Azure or similar.

create from-gcloud-envvars fails on creating datastore permission for some instances

I'm trying to migrate an instance with a database larger than 5GB, but it's failing with

ERROR:  relation "none" does not exist
CONTEXT:  SQL statement "SELECT coalesce(
            (SELECT string_agg(
                'CREATE TRIGGER zfulltext BEFORE INSERT OR UPDATE ON ' ||
                quote_ident(relname) || ' FOR EACH ROW EXECUTE PROCEDURE ' ||
                'populate_full_text_trigger();', ' ')
            FROM pg_class
            LEFT OUTER JOIN pg_trigger AS t
                ON t.tgrelid = relname::regclass AND t.tgname = 'zfulltext'
            WHERE relkind = 'r'::"char" AND t.tgname IS NULL
                AND relnamespace = (
                    SELECT oid FROM pg_namespace WHERE nspname='public')),
            'SELECT 1;')"
PL/pgSQL function inline_code_block line 3 at EXECUTE

Reproduce.

  • Try migrating the datahub instance from the old cluster by creating an instance: ckan-cloud-operator deis-instance create from-gcloud-envvars ....
  • Since the datastore dump is big, the import fails with a timeout, so add the following lines to db.py:
# Poll the Cloud SQL operations list until nothing is reported as RUNNING
# (requires `import subprocess` and `import time` at the top of db.py)
while True:
    imported = subprocess.check_output('gcloud sql operations list --instance=ckan-cloud-staging --project=ckan-cloud --sort-by=start --limit=2', shell=True).decode("utf-8")
    print(imported)
    if 'RUNNING' not in imported:
        break
    time.sleep(60)
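
The workaround above polls forever if an operation never leaves the RUNNING state. A bounded variant could look like the sketch below. The helper name wait_until and its parameters are hypothetical and not part of CCO; the check itself is injected as a callable so the loop can be exercised without gcloud.

```python
import time


def wait_until(is_done, interval=60, timeout=3600,
               sleep=time.sleep, clock=time.monotonic):
    """Call is_done() every `interval` seconds until it returns True.

    Returns True on success, or False once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Simulated output of the operations listing: two polls still report
    # RUNNING, the third one does not.
    outputs = iter(["RUNNING", "RUNNING", "DONE"])
    finished = wait_until(lambda: "RUNNING" not in next(outputs),
                          interval=0, timeout=5)
    print("import finished:", finished)
```

In db.py the callable would wrap the subprocess.check_output call shown above.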

CI/CD for AWS deployments

Related to #69 and #77

Description

Write a CI job that demonstrates the AWS deployment target for CCO.

  • [need credentials for a real sandbox account on AWS for these tests]
  • use terraform default configuration to set up environment required by CCO on AWS:
    • Create RDS instance
    • Create EKS cluster
    • Create IAM roles if necessary
  • ckan-cloud-operator cluster initialize should run cleanly
    • no extra step or configuration should be required out of terraform + cco cluster init
  • deploy a ckan instance with all required services
  • write tests that show CKAN is responding as expected for reads and writes
    • Same ones as the ones running in #77
  • teardown the environment on AWS

Acceptance criteria

  • Deployment of a CKAN instance into the cluster is green
  • CCO and its AWS dependencies are deployed into the GSA sandbox via CircleCI

Authentication problem with solr

Irakli Mchedlishvili [Today at 1:48 PM]
Getting this after migrating the first instance on the production

Traceback (most recent call last):
  File "/usr/bin/paster", line 11, in <module>
    sys.exit(run())
...
  File "/usr/lib/python2.7/httplib.py", line 736, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "/usr/lib/python2.7/httplib.py", line 777, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: '***@***.searchstax.com'

we need to set:

CKAN_SOLR_USER
CKAN_SOLR_PASSWORD

the change should be done here:
https://github.com/ViderumGlobal/ckan-cloud-operator/blob/master/ckan_cloud_operator/deis_ckan/envvars.py#L49

and in infra secret - add the username / password as well in separate keys
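
The traceback is consistent with credentials being embedded in the Solr URL (user:password@host), so that the old httplib takes everything after the first colon of that part as the port. A minimal sketch of separating the credentials from the URL, so they can be passed as CKAN_SOLR_USER and CKAN_SOLR_PASSWORD, is shown below. The function names are hypothetical, and this is not the change made in envvars.py.

```python
from urllib.parse import urlsplit, urlunsplit


def split_solr_credentials(solr_url):
    """Return (url_without_credentials, username, password)."""
    parts = urlsplit(solr_url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean_url, parts.username, parts.password


def solr_auth_envvars(solr_url):
    """Build the two envvars named in the issue from a URL with credentials."""
    _, user, password = split_solr_credentials(solr_url)
    return {"CKAN_SOLR_USER": user, "CKAN_SOLR_PASSWORD": password}


if __name__ == "__main__":
    # Made-up URL for illustration only.
    url = "https://user:s3cret@example.searchstax.com/solr/ckan"
    print(split_solr_credentials(url)[0])
    print(solr_auth_envvars(url))
```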

create from-gcloud-envvars should not expect ckan-init commands by default

Got the following error when trying to create instance from gcloud envvars:

...
  File "/usr/src/ckan-cloud-operator/ckan_cloud_operator/deis_ckan/ckan.py", line 12, in <lambda>
    self.instance.annotations.update_status('ckan', 'created', lambda: self._create())
  File "/usr/src/ckan-cloud-operator/ckan_cloud_operator/deis_ckan/ckan.py", line 45, in _create
    for cmd in self.instance.spec.spec.get('ckan', {}).get('init'):
TypeError: 'NoneType' object is not iterable

Looking at the create code, the spec returned by the function does not contain {ckan: {init: [commands...]}} (unlike the spec returned by from-gitlab).

On the other hand, ckan.py expects it to be iterable:

    def _create(self):
        for cmd in self.instance.spec.spec.get('ckan', {}).get('init'):
        ...

This should help:

    def _create(self):
        for cmd in self.instance.spec.spec.get('ckan', {}).get('init', []):
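
One caveat with a default argument: dict.get only falls back to the default when the key is missing, not when it is present with a None value. The sketch below, using a hypothetical helper init_commands on a plain dict standing in for the instance spec, covers both cases.

```python
def init_commands(spec):
    """Return the list of ckan init commands, or an empty list.

    `or` is used instead of a .get() default so that an explicit
    None value ('init': None or 'ckan': None) is also handled.
    """
    return (spec.get("ckan") or {}).get("init") or []


if __name__ == "__main__":
    print(init_commands({}))                                  # no ckan section
    print(init_commands({"ckan": {"init": None}}))            # explicit None
    print(init_commands({"ckan": {"init": ["paster db init"]}}))
```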

Helm-based deployments as canonical deployment method

Job story

When I use CCO, I want a clear method of deployment, supported by good documentation, so I can deploy apps according to CCO best practice.

Context

Currently, we have multiple deployment methods - a "deis" style deployment designed originally to support migration of legacy cloud software at Viderum, and two different helm-based deployments. This is confusing.

Acceptance criteria

  • Review all the code for helm-based deployments, refactor as needed so there is a single helm-based setup for CKAN and non-ckan instances deployed in a CKAN Cloud
  • Document helm-based deployments in necessary detail for newcomers
  • Delegate any "old style" deployment documentation to a "deprecated" section of the documentation
  • Demo example of deploying CKAN using this deployment method in minikube
  • Demo example of deploying any other app using this deployment method in minikube

Expose the state that CCO manages within a cluster

Job story

When I administer infrastructure that CCO manages, I want to be able to rebuild/reproduce the entire state of that cluster, so that I have the technical means to support disaster/recovery scenarios that require the establishment of an entire new infrastructure environment.

Context

CCO produces and manages a lot of state about the configuration of the cluster and related services, as well as the configuration of the multiple CKANs that run within. This state can be written into a format that can be imported and exported, which is desirable for scenarios such as the disaster recovery case described in the job story.

Acceptance criteria

  • An API to export CCO-managed cluster state
  • An API to import CCO-managed cluster state
  • Tests to demonstrate functionality, executed via CI/CD
  • Documentation

check_call() got an unexpected keyword argument 'gsutil'

ckan-cloud-operator deis-instance create from-gcloud-envvars ... fails with the following error

  File "/usr/src/ckan-cloud-operator/ckan_cloud_operator/deis_ckan/db.py", line 18, in <lambda>
    is_created = self.instance.annotations.update_status(self.db_type, 'created', lambda: self._create())
  File "/usr/src/ckan-cloud-operator/ckan_cloud_operator/deis_ckan/db.py", line 69, in _create
    self._import_gcloud_sql_db()
  File "/usr/src/ckan-cloud-operator/ckan_cloud_operator/deis_ckan/db.py", line 169, in _import_gcloud_sql_db
    self._set_gcloud_storage_sql_permissions(importUrl)
  File "/usr/src/ckan-cloud-operator/ckan_cloud_operator/deis_ckan/db.py", line 146, in _set_gcloud_storage_sql_permissions
    gsutil=True
TypeError: check_call() got an unexpected keyword argument 'gsutil'

Looking at check_call in gcloud.py, it does not accept a gsutil argument:

def check_call(cmd, project=None, with_activate=True):
    if with_activate: activate()
    ...

So I assume there should be with_activate=True instead of gsutil=True in db.py

Using cloud-native object storage for file storage

Job story

When I stand up a cluster of CKANs managed by CCO, I want to provide a configuration for all CKANs hosted therein to use object storage that is provided by the cloud platform, rather than the current method of using minio, so I can leverage the redundancy and availability promises of those services, and reduce the complexity of the software running within the cluster I manage.

Context

We currently use minio running inside the cluster for file storage that backs data in our CKAN instances. We want to use the object storage solutions provided by cloud providers (S3, Blob Storage, Google Cloud Storage), and this is a requirement for several customers. In such cases, we'd also like to ensure that minio is not deployed to the cluster (we don't want to deploy any dependencies that are not actively in use).

Acceptance criteria

  • Provide an option to not deploy minio to a cluster
  • Configure CKANs in the cluster (using cloudstorage extension or similar) to use the cloud-native object storage solution
  • Documentation for this feature

CI/CD for minikube deployments

Requires #68

Description

Write a CI job that demonstrates the minikube deployment target for CCO.

  • download and install minikube
  • deploy cco into minikube
  • configure cco as needed
  • deploy a ckan instance with all required services
  • write tests that show CKAN is responding as expected for reads and writes

Acceptance criteria

  • Once the deployment environment is available (minikube running), all other configuration steps are automated with commands from CCO
  • Deployment of a CKAN instance into the cluster is green

Create (versioned) image for CCO and its direct dependencies

Story

When I administer a CCO-managed cluster, I want to ensure that CCO is built correctly, with a known version, and deployed in a deterministic way, so that I can manage CCO changes and security updates and quickly incorporate them into a running cluster via immutable deployment.

As an operator, I want the CCO management server to be provisioned from an image using an automated build process so that any changes or security updates are quickly incorporated into the running instance via immutable deploy.

Context

We want to be able to deploy a known working bundle of CCO and its dependencies, tagged to a version, to any CKAN Cloud infrastructure.

Acceptance criteria

  • Docker image that bundles a tagged version of CCO and any direct dependencies (i.e.: Jenkins)
  • CI/CD for building this docker image
    • Ensure core linting and testing on each build
  • Integration with the terraform templates that stand up infrastructure, so that the "CCO management server" is deployed as part of that process
  • Documentation on preferred use of the CCO in this way

delete does not delete databases

ckan-cloud-operator deis-instance delete --force aridhia-staging fails to remove databases. Looking at the logs:

 ERROR:  must be owner of database aridhia-staging E 
 STATEMENT:  DROP DATABASE IF EXISTS "aridhia-staging" I 

create from-gcloud-envvars fails on setting permissions to cloud storage for import to sql

ckan-cloud-operator deis-instance create from-gcloud-envvars ... fails with

ERROR: (gcloud) Invalid choice: 'acl'.
Maybe you meant:
  gcloud compute health-checks
  gcloud compute http-health-checks
  gcloud compute https-health-checks
  gcloud sql backups

Happens when the following command is executed: gcloud --project=ckan-cloud acl ch -u [email protected]:R gs://my.backup.dump.sql

Looking at the gcloud docs, this command should actually be executed via gsutil. So it should be

gsutil --project=ckan-cloud acl ch -u [email protected]:R gs://my.backup.dump.sql

As a solution, we should probably add a gsutil flag instead of replacing it with with_activate as described in #4, and run the subcommand with gsutil if the flag is true. Something like:

def check_call(cmd, project=None, with_activate=True, gsutil=False):
    if with_activate: activate()
    if not project:
        infra = CkanInfra()
        project = infra.GCLOUD_AUTH_PROJECT
    cli_tool = 'gsutil' if gsutil else 'gcloud'
    return subprocess.check_call(f'{cli_tool} --project={project} {cmd}', shell=True)
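
A variant of the same idea separates building the command from running it, which makes the tool selection easy to test. The sketch below is an illustration, not the code that was merged: build_cli_command is a hypothetical name, and it omits the project flag for gsutil on the assumption (to be checked against the gsutil documentation) that gsutil does not accept a gcloud-style top-level --project option.

```python
import shlex


def build_cli_command(cmd, project, gsutil=False):
    """Return the argument list for a gcloud or gsutil invocation.

    Assumption: only gcloud takes the top-level --project flag.
    """
    args = shlex.split(cmd)
    if gsutil:
        return ["gsutil", *args]
    return ["gcloud", f"--project={project}", *args]


if __name__ == "__main__":
    # Placeholder account and bucket names for illustration only.
    print(build_cli_command(
        "acl ch -u user@example.com:R gs://bucket/dump.sql",
        project="ckan-cloud", gsutil=True))
    print(build_cli_command("sql operations list", project="ckan-cloud"))
```

The resulting list can be passed to subprocess.check_call without shell=True, which avoids quoting problems in the command string.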

Create values.yaml with CCO CLI

As a CCO user, I want to run one or a few cco commands (maybe with prompts) to create values.yaml, so that I don't have to write it myself or copy it from elsewhere.

As a CCO user, I want cco to create a default/general values.yaml when deploying via helm, so that I don't need to spend time finding out what it is and creating it myself.

Acceptance Criteria

  • cco has a command to create values.yaml
  • cco uses a default values.yaml if none is defined

Tasks

  • Do analysis
  • Refactor code

Not all envvars are set correctly when creating from-google-envvars

While this works fine for some instances, I've just got an error from migrating meerbusch

  File "/home/zelima/datopian/viderum/ckan-cloud-operator/ckan_cloud_operator/deis_ckan/envvars.py", line 68, in _update
    for k, v in envvars.items()}}
  File "/home/zelima/datopian/viderum/ckan-cloud-operator/ckan_cloud_operator/deis_ckan/envvars.py", line 68, in <dictcomp>
    for k, v in envvars.items()}}
AttributeError: 'NoneType' object has no attribute 'encode'

Looking at the values of the envvars I've got unexpected database URLs:

'CKAN_SQLALCHEMY_URL': 'postgresql://meerbusch:None@*.*.*.*:****/meerbusch',
'CKAN__DATAPUSHER__URL': None,
'CKAN__DATASTORE__READ_URL': 'postgresql://None:None@*.*.*.*:****/meerbusch-datastore',
'CKAN__DATASTORE__WRITE_URL': 'postgresql://meerbusch-datastore:None@*.*.*.*:****/meerbusch-datastore'
etc...

Note

  • CKAN_SQLALCHEMY_URL: the password is None
  • CKAN__DATAPUSHER__URL: is None
  • CKAN__DATASTORE__READ_URL: both the user and the password are None
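
A check like the following could surface these problems before the values are encoded, instead of failing with an AttributeError. It is a minimal sketch with a hypothetical helper name (find_unset_envvars) and made-up host and port values; it assumes every non-None value is a string.

```python
from urllib.parse import urlsplit


def find_unset_envvars(envvars):
    """Return {name: reason} for values that are None or that contain
    the literal string 'None' as URL user or password."""
    problems = {}
    for name, value in envvars.items():
        if value is None:
            problems[name] = "value is None"
            continue
        parts = urlsplit(value)
        if "None" in (parts.username, parts.password):
            problems[name] = "URL user or password is None"
    return problems


if __name__ == "__main__":
    sample = {
        "CKAN_SQLALCHEMY_URL": "postgresql://meerbusch:None@10.0.0.1:5432/meerbusch",
        "CKAN__DATAPUSHER__URL": None,
        "CKAN__DATASTORE__READ_URL": "postgresql://None:None@10.0.0.1:5432/meerbusch-datastore",
        "CKAN_SITE_URL": "https://example.org",
    }
    for name, reason in find_unset_envvars(sample).items():
        print(f"{name}: {reason}")
```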

Original CKAN__DATAPUSHER__URL looks like this https://datapusher-de.l3.ckan.io/. Usually, they look like this https://datapusher-1.ckan.io/

The script can't pass this line https://github.com/ViderumGlobal/ckan-cloud-operator/blob/master/ckan_cloud_operator/deis_ckan/datapusher.py#L16

routes = kubectl.get(
                f'CkanCloudRoute -l ckan-cloud/datapusher-name={datapusher_name},ckan-cloud/route-type=datapusher-subdomain',
                required=False
            )

routes look like this if I print

{'apiVersion': 'v1', 'items': [], 'kind': 'List', 'metadata': {'resourceVersion': '', 'selfLink': ''}}

Waiting for ready status fails

Trying to migrate one of the deis instances. It fails on the "Waiting for ready status" step with the following:

Traceback (most recent call last):
  ...
  File "/home/zelima/viderum/ckan-cloud-operator/ckan_cloud_operator/deis_ckan/instance.py", line 315, in <dictcomp>
    k: v for k, v in data if k not in ['ready'] and not v.get('ready')
ValueError: too many values to unpack (expected 2)

Failing after this commit 67b3e35#diff-25a7b8327c37a646d868ce662a190d78R312

k: v for k, v in data if k not in ['ready'] and not v.get('ready')

Looking at data, it is a dict, so I assume the line above is missing data.items().
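
To illustrate the difference: iterating over a dict directly yields only its keys, so unpacking each key string into k, v fails, while .items() yields key/value pairs. A small sketch with made-up status data (the function name not_ready_components is hypothetical):

```python
def not_ready_components(data):
    """Keep the per-component entries that are not ready yet,
    skipping the top-level 'ready' flag."""
    return {
        k: v for k, v in data.items()
        if k not in ['ready'] and not v.get('ready')
    }


if __name__ == "__main__":
    status = {
        'ready': False,
        'db': {'ready': True},
        'solr': {'ready': False},
    }
    print(not_ready_components(status))

    try:
        # Without .items() each key string is unpacked character by character
        {k: v for k, v in status}
    except ValueError as error:
        print("without .items():", error)
```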

Handle different versions of helm on different clusters

As a cco user trying to deploy with helm, I had to reinstall helm on my local machine every time I switched between clusters (azure, ckan-cloud, minikube), because a different version of it was installed on each of them.

As a cco user, I want that to be handled efficiently. E.g. cco checks the helm version on a cluster and asks me to run cco helm init or similar, and installs the proper version locally (or maybe re-installs the latest version on both sides).

Acceptance Criteria

  • I don't have to manually install helm every time there is a version mismatch

Tasks

  • Do analysis and decide whether we can handle this (is it worth spending time on at all?)
  • Propose solution if reasonable

Create storage classes with CCO

As a cco user who does not know much about storage classes, I want cco to take care of creating them, so that I don't need to spend time understanding what they are and how to configure them properly.

As a cco user, I want to be able to configure storage classes, e.g. choose which provisioner to use (azure, aws, google, etc.), so that I'm able to use the cloud provider of my choice.

As a cco user, I want to have a general understanding of what storage classes are and what they are for, so that I'm able to configure them properly.

Acceptance criteria

  • Have a small description of what storage classes are (maybe in the command help or readme)
  • cco creates storage classes for me if I don't specify them
  • cco gives me the ability to configure storage classes (e.g. prompts me with proper questions)

Tasks

  • Do analysis
  • Write a small description/doc about what it is for and where to read more
  • Refactor code to address user stories

Image build fails due to dependency installation failure

Due to an issue in the latest version of pip (19.0.1, see pypa/pip#6197), installation of some packages fails while building docker images for the deis instances.

Exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/pip/_internal/cli/base_command.py", line 176, in main
    status = self.run(options, args)
  File "/usr/lib/python2.7/site-packages/pip/_internal/commands/install.py", line 346, in run
    session=session, autobuilding=True
  File "/usr/lib/python2.7/site-packages/pip/_internal/wheel.py", line 886, in build
    assert have_directory_for_build
AssertionError

We are always installing the latest version of pip after dcf1038#diff-0e56af39d909bba5a78bd065247a8f63R70

2 options for solution:

Think the 2nd option is fine for now. Just tried manually and passed

Instance create fails complaining about config values

Tried to create an instance in minikube with cco (replicating this script)

ckan-cloud-operator ckan instance create helm values.yaml --instance-id=second --instance-name=second --exists-ok --wait-ready --update

Where values.yaml looks like this (from https://github.com/ViderumGlobal/ckan-cloud-helm/blob/master/minikube-values.yaml, but fails as well with https://github.com/ViderumGlobal/ckan-cloud-helm/blob/master/aws-values.yaml).

replicas: 1
nginxReplicas: 1
terminationGracePeriodSeconds: 1
datastoreDbTerminationGracePeriodSeconds: 1
dbTerminationGracePeriodSeconds: 1
ckanJobsTerminationGracePeriodSeconds: 1
ckanJobsDbTerminationGracePeriodSeconds: 1

# use the latest unstable version of the helm chart
ckanHelmChartVersion: v0.0.0
ckanHelmChartRepo: https://raw.githubusercontent.com/ViderumGlobal/ckan-cloud-helm/master/charts_repository

Getting assertion error:

2019-11-22 09:06 INFO (instance_id="second") Creating instance
Error from server (NotFound): configmaps "operator-conf" not found
Traceback (most recent call last):
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/bin/ckan-cloud-operator", line 11, in <module>
    load_entry_point('ckan-cloud-operator', 'console_scripts', 'ckan-cloud-operator')()
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/zelima/viderum/cco/ckan-cloud-operator/ckan_cloud_operator/providers/ckan/instance/cli.py", line 34, in create
    wait_ready=wait_ready, skip_deployment=skip_deployment, skip_route=skip_route, force=force)
  File "/home/zelima/viderum/cco/ckan-cloud-operator/ckan_cloud_operator/providers/ckan/instance/manager.py", line 46, in create
    spec=values
  File "/home/zelima/viderum/cco/ckan-cloud-operator/ckan_cloud_operator/crds/manager.py", line 108, in get_resource
    _, kind_suffix = _get_plural_kind_suffix(singular)
  File "/home/zelima/viderum/cco/ckan-cloud-operator/ckan_cloud_operator/crds/manager.py", line 210, in _get_plural_kind_suffix
    parts = config_manager.get(f'installed-crd-{singular}', required=True).split(',')
  File "/home/zelima/viderum/cco/ckan-cloud-operator/ckan_cloud_operator/config/manager.py", line 21, in get
    assert value or not required, f'config value is required for {cache_key}:{key}'
AssertionError: config value is required for configmap:ckan-cloud:operator-conf:installed-crd-ckaninstance

ckan-cloud-operator cluster initialize --interactive fails to get CCO image tag

I've tried to initialize a cluster for minikube, but it fails when trying to get the expected cco image tag from the config manager. That happens because the config-map is not initialized at that point at all (see Analysis).

Acceptance Criteria

  • I don't see this error
 assert '@' not in expected_image and ':' in expected_image, f'invalid expected image: {expected_image}'
TypeError: argument of type 'NoneType' is not iterable

Tasks

  • [ ] move print_info after the config-map is initialized (or remove it entirely)

Analysis

Traceback:

ckan-cloud-operator cluster initialize --interactive
2019-11-25 14:45 INFO Starting interactive initialization of the operator on the following cluster:
Error from server (NotFound): namespaces "ckan-cloud" not found
Traceback (most recent call last):
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/bin/ckan-cloud-operator", line 11, in <module>
    load_entry_point('ckan-cloud-operator', 'console_scripts', 'ckan-cloud-operator')()
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zelima/anaconda3/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/zelima/viderum/cco/ckan-cloud-operator/ckan_cloud_operator/providers/cluster/cli.py", line 27, in initialize
    manager.initialize(interactive=interactive, default_cluster_provider=cluster_provider, skip_to=skip_to)
  File "/home/zelima/viderum/cco/ckan-cloud-operator/ckan_cloud_operator/providers/cluster/manager.py", line 47, in initialize
    print_info(minimal=True)
  File "/home/zelima/viderum/cco/ckan-cloud-operator/ckan_cloud_operator/providers/cluster/manager.py", line 30, in print_info
    print(yaml.dump([dict(get_kubeconfig_info(), nodes=get_node_names(), operator_version=get_operator_version(verify=True))], default_flow_style=False))
  File "/home/zelima/viderum/cco/ckan-cloud-operator/ckan_cloud_operator/providers/cluster/manager.py", line 16, in get_operator_version
    expected_image_tag = _get_expected_operator_image_tag()
  File "/home/zelima/viderum/cco/ckan-cloud-operator/ckan_cloud_operator/providers/cluster/manager.py", line 274, in _get_expected_operator_image_tag
    assert '@' not in expected_image and ':' in expected_image, f'invalid expected image: {expected_image}'
TypeError: argument of type 'NoneType' is not iterable

The problem is that print_info(minimal=True) on line 47 tries to get the operator version from config-map:operator-conf, which is only created on line 54:

def initialize(log_kwargs=None, interactive=False, default_cluster_provider=None, skip_to=None):
    if interactive and not skip_to:
        logs.info('Starting interactive initialization of the operator on the following cluster:')
        print_info(minimal=True)
        input('Verify your are connected to the right cluster and press <RETURN> to continue')
        logs.info(f'Creating operator namespace: {OPERATOR_NAMESPACE}', **(log_kwargs or {}))
        subprocess.call(f'kubectl create ns {OPERATOR_NAMESPACE}', shell=True)
        assert default_cluster_provider in ['gcloud', 'aws'], f'invalid cluster provider: {default_cluster_provider}'
        subprocess.call(f'kubectl -n {OPERATOR_NAMESPACE} create secret generic ckan-cloud-provider-cluster-{default_cluster_provider}')
        subprocess.call(f'kubectl -n {OPERATOR_NAMESPACE} create configmap operator-conf --from-literal=ckan-cloud-operator-image=viderum/ckan-cloud-operator:latest --from-literal=label-prefix={OPERATOR_NAMESPACE}')

The fix would be to remove that print_info entirely or move it after the config-map has been created.
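
Independently of where print_info is called, the check itself could fail with a clearer message when the image has not been configured yet. The sketch below uses a hypothetical helper name (validate_expected_image) and mirrors the two conditions from the assert in the traceback; it is not the code in manager.py.

```python
def validate_expected_image(expected_image):
    """Return the image reference if it looks like name:tag, else raise.

    A missing value (None or empty) gets its own error instead of the
    TypeError raised by `'@' not in None`.
    """
    if not expected_image:
        raise ValueError(
            'operator image is not configured yet (operator-conf missing?)')
    if '@' in expected_image or ':' not in expected_image:
        raise ValueError(f'invalid expected image: {expected_image}')
    return expected_image


if __name__ == "__main__":
    print(validate_expected_image('viderum/ckan-cloud-operator:latest'))
    try:
        validate_expected_image(None)
    except ValueError as error:
        print("error:", error)
```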

solr_config names are not always returned

kubectl -n solr exec zk-0 zkCli.sh get /collections/{old_instance_id} 2>&1 does not always return the solr config name (see https://github.com/ViderumGlobal/ckan-cloud-operator/blob/master/ckan_cloud_operator/deis_ckan/instance.py#L307-L309), which causes create to fail. E.g.:

KUBECONFIG=~/.kube/config.deis kubectl -n solr exec zk-0 zkCli.sh get /collections/opendatadenmark-staging
Connecting to localhost:2181

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
Node does not exist: /collections/opendatadenmark-staging
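
A defensive way to read this output is sketched below. It assumes that a successful response contains a JSON fragment of the form "configName":"..." (which is how Solr usually stores the collection config in ZooKeeper, but is an assumption here), and returns None for the "Node does not exist" case shown above. The function name parse_solr_config_name and the config name in the example are hypothetical.

```python
import re


def parse_solr_config_name(zk_output):
    """Extract the Solr config name from zkCli.sh output.

    Returns None when the collection node is missing or when no
    configName entry can be found in the output.
    """
    if "Node does not exist" in zk_output:
        return None
    match = re.search(r'"configName"\s*:\s*"([^"]+)"', zk_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    missing = (
        "Connecting to localhost:2181\n"
        "WatchedEvent state:SyncConnected type:None path:null\n"
        "Node does not exist: /collections/opendatadenmark-staging\n"
    )
    print(parse_solr_config_name(missing))
    print(parse_solr_config_name('{"configName":"ckan_default"}'))
```

The caller can then decide whether to fall back to a default config or stop with a readable error.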

create from-gcloud-envvars errors with not enough values to unpack

ckan-cloud-operator deis-instance create from-gcloud-envvars needs exactly 6 arguments to run, but for some reason instance.py ignores the last 2, which leads to a ValueError:

ValueError: not enough values to unpack (expected 6, got 4)

Removing -2 from instance_env_yaml, image, solr_config, gcloud_db_url, gcloud_datastore_url, instance_id = args[1:-2] solves the issue.
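
The slice arithmetic can be checked in isolation. This is a small sketch that assumes args holds the sub-command name followed by the 6 values; the tuple contents are placeholders, not real inputs.

```python
# Hypothetical argument tuple: the sub-command name followed by the 6 values
args = (
    "from-gcloud-envvars",
    "env.yaml", "image", "solr_config", "db_url", "datastore_url", "instance_id",
)

too_short = args[1:-2]  # drops the sub-command AND the last two values
all_six = args[1:]      # drops only the sub-command

try:
    a, b, c, d, e, f = too_short
    unpack_error = None
except ValueError as error:
    unpack_error = str(error)

# With the -2 removed, all six names receive a value
instance_env_yaml, image, solr_config, gcloud_db_url, gcloud_datastore_url, instance_id = all_six
```

With seven items in the tuple, args[1:-2] keeps only four, which matches the "expected 6, got 4" in the reported error.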

ckan-cloud-operator deis-instance create from-gcloud-envvars results in TypeError

Trying to create an instance from gcloud envvars (with actual variables).

ckan-cloud-operator deis-instance create from-gcloud-envvars a b c d e f

This results in the following:

Traceback (most recent call last):
  File "/opt/conda/envs/ckan-cloud-operator/bin/ckan-cloud-operator", line 11, in <module>
    load_entry_point('ckan-cloud-operator', 'console_scripts', 'ckan-cloud-operator')()
  File "/opt/conda/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/ckan-cloud-operator/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
TypeError: deis_instance_create_from_gcloud_envvars() got an unexpected keyword argument 'path_to_instance_env_yaml'

Looking at the code, the function expects positional args, but it seems to be receiving kwargs:

@deis_instance_create.command('from-gcloud-envvars')
@click.argument('PATH_TO_INSTANCE_ENV_YAML')
@click.argument('IMAGE')
@click.argument('SOLR_CONFIG')
@click.argument('GCLOUD_DB_URL')
@click.argument('GCLOUD_DATASTORE_URL')
@click.argument('NEW_INSTANCE_ID')
def deis_instance_create_from_gcloud_envvars(*args):
    """Create and update an instance from existing DB dump stored in gcloud sql format on google cloud storage."""
    DeisCkanInstance.create('from-gcloud-envvars', *args).update()
    great_success()

An easy fix might be:

...
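def deis_instance_create_from_gcloud_envvars(path_to_instance_env_yaml, image, solr_config, gcloud_db_url, gcloud_datastore_url, new_instance_id):
    """Create and update an instance from existing DB dump stored in gcloud sql format on google cloud storage."""
    # click passes each argument by keyword, named after its lowercased declaration,
    # so the parameter names must match the @click.argument names above
    DeisCkanInstance.create('from-gcloud-envvars', path_to_instance_env_yaml, image, solr_config, gcloud_db_url, gcloud_datastore_url, new_instance_id).update()
    great_success()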
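
The traceback is consistent with how click invokes callbacks: each declared argument is passed as a keyword argument named after its lowercased declaration, so a callback that only accepts *args rejects it. The sketch below reproduces that behaviour in plain Python, without click; the function names and values are hypothetical.

```python
def callback_with_varargs(*args):
    """Accepts positional arguments only, like the original callback."""
    return args


def callback_with_named_params(path_to_instance_env_yaml, image):
    """Accepts the same values by name."""
    return (path_to_instance_env_yaml, image)


# Stand-in for the parameters a CLI framework would pass by keyword
params = {"path_to_instance_env_yaml": "env.yaml", "image": "example/image:tag"}

try:
    callback_with_varargs(**params)
    varargs_error = None
except TypeError as error:
    varargs_error = str(error)

named_result = callback_with_named_params(**params)
```

The parameter names matter as much as their number: a callback whose names differ from the declared arguments would fail with the same kind of TypeError.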

add solr metrics to prometheus

We have a prometheus + grafana installation for metrics (installed via https://github.com/GoogleCloudPlatform/click-to-deploy/tree/master/k8s/prometheus in the prometheus namespace).

This allows us to add solr-specific metrics. The general flow for adding metrics to prometheus is:

  1. run an exporter which exposes the metrics via HTTP
  2. edit the prometheus configmap and add a job that scrapes it
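
Step 1 can be sketched with the Python standard library alone. This is an illustration, not a recommended exporter: the metric name, the collect_solr_metrics placeholder and the port are all hypothetical, and a real exporter would query Solr for actual values.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(metrics):
    """Render a {name: value} mapping in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect_solr_metrics():
    # Placeholder value (assumption); a real exporter would ask Solr for this
    return {"solr_example_document_count": 0}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_metrics(collect_solr_metrics()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    """Serve the metrics over HTTP until interrupted (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. The scrape job added in step 2 would then point at the address where this process is reachable.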

options to expose the metrics:

Automating infrastructure deployments for CCO-managed clusters

Job story

When I stand up a cluster of CKANs managed by CCO, I want to be able to automate the required infrastructure deployment in a platform-agnostic way, so I have a consistent foundation for infrastructure resource management that is code-driven and does not require manual configuration steps.

Context

Currently, running CKAN Cloud involves a range of manual steps that are under- or un-documented, and many of these relate to standing up the k8s configuration that CCO runs within. Further, we now have CCO deployments on Azure, AWS and GCP, and each of these deployment targets has requirements for infrastructure configuration that we want to ensure are reproducible in code. A tool like Terraform may provide us with the necessary foundation to write platform-agnostic deployments for CCO infrastructure, at least targeting Azure, AWS, and GCP.

Acceptance criteria

  • A CLI and a method of configuration for standing up k8s-managed clusters in which to run CCO
    • Prefer an existing tool like Terraform that is designed for this
    • Documented support for Azure, AWS, GCP
  • Default values/configuration for this CLI as part of the CCO repository or else another repository for this specific purpose
  • CCO Documentation that walks the user through exactly how this tool automates configuration of the required infrastructure, and describes the various configuration points available

ckan-cloud-operator-env activate {my-env} fails

I've added the new env, but can't activate it. It fails with:

cp: cannot stat '/usr/local/bin/ckan-cloud-operator-': No such file or directory
Failed to create executable, try running with sudo

It's missing ENVIRONMENT_NAME="${2}" on this line

cluster backup and restore

High-level issue for all backup/restore requirements

backup

  • Heptio ARK for storage (handled in datopian/ckan-cloud-cluster#7)
  • Google Cloud SQL Automated backups for DB
  • Custom solution for individual DB SQL backups (migrated from old cluster)

restore

  • Restore storage server - using Heptio ARK
  • Restore DB instance - using Google Cloud SQL
  • Restore individual instance storage:
    • Create a new minio server from backup
    • Use Minio client to copy relevant storage data
  • Restore individual instance DB:
    • Restore from the DB sql backups, similar to migration process

research connections limit and suggest short and long-term solutions

Currently we are hitting a connection limit due to the large number of instances and the low limit imposed by Google Cloud SQL, which is not configurable: https://cloud.google.com/sql/docs/quotas

We added pgbouncer but are still hitting some limits

Connection Estimates

Per Instance:

  • 7-8 connections average
  • 24 connections max

Number of instances: ~50
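
For scale, the estimates above can be multiplied out. This is a back-of-the-envelope sketch that uses only the figures listed here and treats "~50" as exactly 50.

```python
instances = 50            # approximate number of instances
avg_low, avg_high = 7, 8  # average connections per instance
peak = 24                 # maximum connections per instance

typical_range = (instances * avg_low, instances * avg_high)
worst_case = instances * peak

print(f"typical: {typical_range[0]}-{typical_range[1]} connections, worst case: {worst_case}")
```

That is roughly 350 to 400 connections in the average case and 1200 if every instance peaks at once, which are the numbers to compare against the Cloud SQL quota and the pgbouncer max_client_conn setting.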

Errors

errors from cloud sql:

psql: FATAL:  remaining connection slots are reserved for non-replication superuser connections

after installing connection pooler (pgbouncer) we get the following error:

psql: ERROR:  no more connections allowed (max_client_conn)

DB configuration

https://github.com/ViderumGlobal/ckan-cloud-operator/blob/master/docs/PRODUCTION-GCLOUD-CLUSTER.md#create-the-db

PgBouncer configuration

https://github.com/ViderumGlobal/ckan-cloud-operator/blob/master/ckan_cloud_operator/providers/db_proxy/pgbouncer/manager.py#L94

Don't deploy dependencies that are not used in a given deployment

Job story

When I deploy CCO into a cluster to provide multi-tenant management of CKAN, I want to be able to control the services that are deployed, so that I do not run services I do not use and that services I do not use do not consume cluster resources.

Context

The code base has services running (ELK?, Minio) that may not be in use on a given CCO deployment.

Acceptance criteria

  • Run a command or a set of commands to see what services exactly are being deployed in a CCO managed cluster to support multi-tenant CKAN
  • Ability in code/configuration to enable/disable certain services
    • Monitoring of the cluster itself
    • Software that supports CCO actions (e.g: Jenkins)
    • Subservices to support CKANs deployed in cluster
  • Documentation on how to enable/disable deployment of certain services

minio storage migration fails on some instances

reproduction

Using Rancher, deploy a minio client image (docker image = minio/mc) with entrypoint /bin/sh -c "while true; do sleep 86400; done"

Execute a shell on the pod and run the following inside the minio client shell to setup the relevant hosts

mc config host add edge https://cc-e-minio.ckan.io MINIO_ACCESS_KEY MINIO_SECRET_KEY
mc config host add deis https://minio.l3.ckan.io MINIO_ACCESS_KEY MINIO_SECRET_KEY

Create the target bucket (if it does not exist)

mc mb edge/ckan

mirror the data

mc mirror --overwrite --watch -a deis/ckan edge/ckan

expected

all data migrated, no errors

actual

Getting errors from some instances, currently seen for bluedot-prod and ni:

mc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-07-03-135331.159778slika-1.jpg`. unexpected EOF
mc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-08-09-165121.974039DataPortalDemographicsIcon.svg`. unexpected EOF
...itamins-healthy-eating-52533.jpeg:  1.69 GB / 29.93 GB ┃▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░┃ 14.97 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-08-09-165441.372661DataPortalSocioEconomicIcon.svg`. unexpected EOF
mc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-08-09-165345.010611DataPortalEnvironmentIcon.svg`. unexpected EOF
mc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-08-09-165221.583565DataPortalDiseaseDataIcon.svg`. unexpected EOF
...26-095248.842674Governanceimg.jpg:  1.69 GB / 29.93 GB ┃▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░┃ 14.97 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-08-09-165845.556925DataPortalPlacesIcon.svg`. unexpected EOF
...28-101711.810223Governanceimg.jpg:  1.69 GB / 29.93 GB ┃▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░┃ 14.97 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-08-09-165604.751395DataPortalVectorsIcon.svg`. unexpected EOF
mc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-09-26-095248.842674Governanceimg.jpg`. unexpected EOF
...19-042415.998563images---free.png:  1.69 GB / 29.93 GB ┃▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░┃ 14.97 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-09-28-101752.052968orange-fruit-vitamins-healthy-eating-52533.jpeg`. unexpected EOF
...19-042451.217800images---free.png:  1.70 GB / 29.93 GB ┃▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░┃ 14.97 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-09-26-095217.375427Governanceimg.jpg`. unexpected EOF
mc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-09-28-101711.810223Governanceimg.jpg`. unexpected EOF
...9.797839ifAppliances-01976597.png:  1.70 GB / 29.93 GB ┃▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░┃ 14.98 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-08-09-165743.312159DataPortalMobilityIcon.svg`. unexpected EOF
mc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-08-15-084732.487761orange-fruit-vitamins-healthy-eating-52533.jpeg`. unexpected EOF
...ola-infrastructure-population.jpg:  1.71 GB / 29.94 GB ┃▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░┃ 15.00 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/storage/uploads/group/2017-08-16-132732.681336orange-fruit-vitamins-healthy-eating-52533.jpeg`. unexpected EOF
...-sanitation-hygiene-diarrhoea.csv:  1.90 GB / 31.18 GB ┃▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░┃ 15.99 MB/s
...64a2896c/globalriversnaturalearth:  3.02 GB / 31.89 GB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░┃ 20.39 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/bluedot-prod/resources/51d0a3c1-9912-432f-bf52-2a41805ab815/freshwatersnailranges.dms`. 500 Internal Server Error
...-412b-8e13-920d44732242/test7.dat:  133.75 GB / 159.69 GB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░┃ 33.51 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/ni/resources/08aed413-b895-43b9-b52a-d93dd9869283/draft-pfg-framework-indicators-senior-responsible-owners.csv`. unexpected EOF
...e6c01e897/strabanegullies.geojson:  134.05 GB / 160.04 GB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░┃ 33.50 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/ni/resources/175d2d90-a97f-4ecd-8295-9664f96ffb93/school-level-nursery-school-enrolment-data-2016-2017.csv`. unexpected EOF
...y-provider-15-16-daycase-fces.csv:  134.39 GB / 160.35 GB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░┃ 33.50 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/ni/resources/21b08856-ae65-4cdb-805b-6955b2808b64/invest-ni-support-m-to-edos-universities-at-council-level-2011-12-t....csv`. unexpected EOF
...ranslink-metro-bus-routes.geojson:  134.90 GB / 161.25 GB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░░┃ 33.51 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/ni/resources/2dce397e-1215-4d8a-be86-f93fb1ea9299/opendatani-june-september.csv`. unexpected EOF
...d-45d8-9023-9085b87349c8/2013.csv:  136.09 GB / 162.26 GB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░┃ 33.47 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/ni/resources/4dbfd74c-bf54-4e31-9934-add3a984e6df/by-provider-by-specialty-15-16-non-elective-long-stay-unit-costs.csv`. unexpected EOF
...b9f7-33233c0db629/vehicle2016.csv:  136.12 GB / 162.29 GB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░┃ 33.47 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/ni/resources/4ddf163e-3e9c-4fc2-a416-cae7f19e7a15/by-provider-13-14-elective-costs.csv`. unexpected EOF
...0b6/noids-report-2018-week-15.csv:  136.34 GB / 162.52 GB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░┃ 33.47 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/ni/resources/550d9079-e669-427e-8e57-2da3e30fb6a3/specialist-services-unit-cost-14-15.csv`. unexpected EOF
...ad-865c-1b29b4e81054/table-21.csv:  136.83 GB / 162.88 GB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░┃ 33.44 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/ni/resources/628f4401-afb8-4161-9d91-53a600271207/opendatani-suggetsed-datasets-nov-april.csv`. unexpected EOF
...ni-water-customer-tap-results.csv:  137.22 GB / 163.15 GB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓█░░░░░░░░░░░░░░░░░░░░░░┃ 33.43 MB/smc: <ERROR> Failed to copy `https://minio.l3.ckan.io/ckan/ni/resources/711b18d1-8c06-4468-9f67-a6776329e8af/by-provider-13-14-elective-fces.csv`. unexpected EOF
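
To retry only the failed objects, the failing URLs can be pulled out of output like the above. This is a minimal sketch based on the message format visible in this log; the helper name is hypothetical, and it is not an mc feature.

```python
import re

# Matches: Failed to copy `<url>`. <reason>
FAILED_COPY = re.compile(r"Failed to copy `(?P<url>[^`]+)`\. (?P<reason>.+?)\s*$")


def failed_copies(log_text):
    """Return (url, reason) pairs for every failed copy reported in the log text."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_COPY.search(line)
        if match:
            results.append((match.group("url"), match.group("reason")))
    return results
```

Because the pattern is searched anywhere in the line, it also works on lines where the progress bar and the error message have run together.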

[Docs] What is values.yaml file and how to properly write it

Helm-based deployment assumes that the user already has a values.yaml when deploying to k8s with helm.

As a newbie CCO user, I want to know what exactly values.yaml is, what it is for, if it is required and how to create it (and maybe have an example of it) so that I'm able to perform helm based deployments.

As a CCO user, I want to know the convention for writing values there, e.g. how ckan.storage_path should be written, so that I'm able to set CKAN configurations in it.

As a CCO user, I want to know what other values can be set and what they are for, e.g. replicas, terminationGracePeriodSeconds etc., so that I know what I'm doing when I'm creating values.yaml

Acceptance Criteria

  • We have a separate section in docs dedicated to values.yaml
  • I am (or somebody from the team is) able to create it and deploy a CKAN instance

Tasks

  • Gather the needed info and answer the user stories
  • Write the README

[task] ckan instance manual testing and load testing

This issue is a task which could be performed in parallel for multiple instances.

All testing should be done on production ckan-cloud environment.

Migrate an instance

This step should be run by a ckan-cloud developer who has ckan-cloud-operator installed and configured locally.

Start the gcloud sql proxy locally:

ckan-cloud-operator ckan-infra cloudsql-proxy

Keep it running in the background and, in a new terminal, set the env var to use the proxy and migrate an instance:

export CKAN_CLOUD_OPERATOR_USE_PROXY=yes

ckan-cloud-operator deis-instance migrate OLD_SITE_ID NEW_INSTANCE_ID ROUTER_NAME
  • OLD_SITE_ID - The old cluster site ID of the instance to migrate
  • NEW_INSTANCE_ID - The new cluster instance ID (usually the same as OLD_SITE_ID)
  • ROUTER_NAME - New cluster router name that will be used to route external traffic (usually production-1)

If the migration fails, re-run the migrate command until it works and be patient as it could take some time for instance to be fully migrated and working.
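
The re-run advice can be wrapped in a small retry helper. This sketch assumes ckan-cloud-operator is on the PATH and CKAN_CLOUD_OPERATOR_USE_PROXY is already exported; the helper names, attempt count and delay are hypothetical choices.

```python
import subprocess
import time


def build_migrate_command(old_site_id, new_instance_id, router_name):
    """Build the migrate command shown above as an argument list."""
    return ["ckan-cloud-operator", "deis-instance", "migrate",
            old_site_id, new_instance_id, router_name]


def migrate_with_retries(command, max_attempts=5, delay_seconds=60,
                         run=subprocess.call, sleep=time.sleep):
    """Run the command until it exits with 0; return the attempt number that succeeded."""
    for attempt in range(1, max_attempts + 1):
        if run(command) == 0:
            return attempt
        if attempt < max_attempts:
            sleep(delay_seconds)  # give the cluster time to settle before retrying
    raise RuntimeError(f"migration still failing after {max_attempts} attempts")
```

For example, migrate_with_retries(build_migrate_command("OLD_SITE_ID", "NEW_INSTANCE_ID", "production-1")) would retry up to five times. The run and sleep parameters exist only so the loop can be exercised without a cluster.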

Wait for the site to be online and make sure datasets are displayed:

https://cc-p-NEW_INSTANCE_ID.ckan.io/

Test an instance

This step could be done without prior knowledge about ckan-cloud or ckan-cloud-operator

Browse through the instance datasets, groups and organizations and verify that there are no broken images or error pages.

Log-in to the relevant CKAN instance with admin username/password

Important: check the browser URL after log-in to make sure you are logged in to the relevant testing instance

With a logged-in admin account:

  • Add a group, organization, private and public dataset, make sure they are displayed properly.
    • Fill-in as many fields as possible, upload an image, etc..
  • Look for additional modules / options and validate them
  • Search for the group/organization/datasets, ensure they are returned in search results.

Log-out and verify the site and created resources with an anonymous account:

  • Search for the group/organization/datasets, ensure they are returned in search results.
  • Look for additional modules / options and validate them

Log in and delete the created resources:

  • Delete the created group, organization and datasets
  • make sure they are deleted properly.

Load test an instance

  • Log-in to Octoperf
  • Add a new project: cc-p-INSTANCE_ID
    • Create by Website or Rest API
    • Static Website
    • Paste URLs to the instance homepage, groups, organizations, datasets
    • enable - parse response to extract subpages
  • Create VU
  • set scenario settings
  • Create Scenario
  • Run scenario

Fail to create package at datagov-theme instance

I ran the datagov instance with:

./create_secrets.py
docker-compose down -v
docker-compose -f docker-compose.yaml -f .docker-compose-db.yaml -f .docker-compose.datagov-theme.yaml pull
docker-compose -f docker-compose.yaml -f .docker-compose-db.yaml -f .docker-compose.datagov-theme.yaml build
docker-compose -f docker-compose.yaml -f .docker-compose-db.yaml -f .docker-compose.datagov-theme.yaml up -d

I get errors when creating a dataset:
ERROR creating CKAN package: http://nginx:8080/api/3/action/package_create Status code: 409 content:b'

{
	"help": "http://nginx:8080/api/3/action/help_show?name=package_create",
	"success": false,
	"error": {
		"__type": "Validation Error",
		"accrual_periodicity": ["The input is not valid"]
	}
}

The value sent is 'R/P1Y', and it looks valid internally.

Dataset

{
	'name': 'total-railroad-employment-by-state-and-county-2014',
	'title': 'Total Railroad Employment by State and County, 2014',
	'owner_org': '18c31c30-82ed-4277-bebf-1d3f33ef0152',
	'private': False,
	'notes': 'A breakdown of Railroad employees by State and County',
	'state': 'active',
	'resources': [{
		'url': 'http://www.rrb.gov/sites/default/files/2017-01/StateCounty2014.xls',
		'description': 'A breakdown of Railroad employees by State and County',
		'format': 'application/xls',
		'name': 'Total Railroad Employment by State and County, 2014',
		'mimetype': 'application/vnd.ms-excel',
		'describedBy': 'https://www.rrb.gov/FinancialReporting/FinancialActuarialStatistical/Annual'
	}],
	'tags': [{
		'name': 'county'
	}, {
		'name': 'demographic'
	}, {
		'name': 'railroad'
	}, {
		'name': 'railroad-employees'
	}],
	'extras': [{
		'key': 'resource-type',
		'value': 'Dataset'
	}, {
		'key': 'issued',
		'value': '2016-03-01'
	}, {
		'key': 'harvest_source_title',
		'value': 'rrb'
	}, {
		'key': 'source_schema_version',
		'value': '1.1'
	}, {
		'key': 'source_hash',
		'value': '10f8d1f8f7d01a2defc4eea7d31c304e49a5b905'
	}, {
		'key': 'source_datajson_identifier',
		'value': True
	}],
	'contact_name': 'Anna Salazar-Bartolon',
	'contact_email': '[email protected]',
	'modified': '2016-03-01',
	'publisher': 'Railroad Retirement Board',
	'public_access_level': 'public',
	'homepage_url': 'http://www.rrb.gov/pdf/act/StateCounty2014.xls',
	'unique_id': 'RRB-460',
	'spatial': 'US',
	'program_code': '000:000',
	'bureau_code': '446:00',
	'tag_string': 'County,Demographic,Railroad,Railroad Employees',
	'accrual_periodicity': 'R/P1Y'
}

I looked at the repo and it seems to be OK:
https://github.com/akariv/ckanext-datajson/blob/datagov/ckanext/datajson/helpers.py#L235

GitLab Private Token expired for cco initialize-gitlab

Getting exit status 22 when trying to initialize-gitlab.

ckan-cloud-operator initialize-gitlab me/cloud-my-new-instance

# Logs
subprocess.CalledProcessError: Command '['curl', '-f', '-s', '--header', 'PRIVATE-TOKEN: my-TOkEn', 'https://gitlab.com/api/v4/projects/me%2Fcloud-my-new-instance']' returned non-zero exit status 22.

Tried to run curl command manually:

curl -f -s --header "PRIVATE-TOKEN: my-Token" https://gitlab.com/api/v4/projects/me%2Fcloud-my-new-instance -v
...
> Host: gitlab.com
> User-Agent: curl/7.47.0
> Accept: */*
> PRIVATE-TOKEN: My-TOkEn
> 
* The requested URL returned error: 404 Not Found
* Closing connection 0

After generating a new personal token (with my account) the problem is gone. So we need to generate a new one for the production cluster and update the appropriate ENV
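
For reference, curl's -f flag makes it exit with status 22 when the server answers with an HTTP error (400 or above), which is how the 404 surfaced as exit status 22. The request URL itself can be rebuilt as below; this is a small sketch and the helper name is hypothetical.

```python
from urllib.parse import quote


def gitlab_project_url(project_path, api_base="https://gitlab.com/api/v4"):
    """Build the project API URL, encoding the path as a single URL segment."""
    # safe="" makes quote() encode "/" as %2F, as seen in the curl command above
    return f"{api_base}/projects/{quote(project_path, safe='')}"


url = gitlab_project_url("me/cloud-my-new-instance")
```

Requesting this URL by hand with a known-good token is a quick way to tell a token problem apart from a wrong project path.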

add support for registering SSL using http authentication for external domains

ckan-cloud-operator-edge routers create-deis-instance-external-route external-1 ni www.opendatani.gov.uk

To simplify, we might need a dedicated load balancer for external domains, with domain ownership verified over HTTP (instead of DNS, which is more reliable and is used for our own domains, but cannot be used for external domains)

ckan-cloud-operator initialize-gitlab sometimes fails

When it fails, it only happens on the very first run. The error comes from this line: https://github.com/ViderumGlobal/ckan-cloud-operator/blob/master/ckan_cloud_operator/gitlab.py#L113. If you just rerun the command, it gets through

  File "/home/zelima/viderum/ckan-cloud-operator/ckan_cloud_operator/gitlab.py", line 35, in initialize
    'content': self._get_gitlab_ci_yml(), 'commit_message': 'Add .gitlab-ci.yml'
  File "/home/zelima/viderum/ckan-cloud-operator/ckan_cloud_operator/gitlab.py", line 113, in _curl
    assert r.status_code == 200, r.text
AssertionError: {"file_path":".gitlab-ci.yml","branch":"master"}

review DB migration and permissions and suggest short and long-term improvements
