deploy's People

Contributors

aguilerapy, bikramtuladhar, bjwebb, dependabot[bot], dogsbody, dogsbody-ashley, dogsbody-josh, duncandewhurst, jpmckinney, kindly, michaelwood, odscjames, robhooper, robredpath, shakhanton, yolile

deploy's Issues

Comment out HTTP/HTTPS Fix in Docs to Cove proxy

https://github.com/open-contracting/deploy/blob/master/salt/apache/ocds-docs-live.conf.include#L249

    {# This solves a problem - "Convert to Spreadsheet" was working on HTTPS but not HTTP. Fix this by forcing header to be something the CSFR check likes. #}
    {# When we change HTTPS to force, there will be no HTTP traffic and this block should be removed. #}
    {% if testing  %}
        <Location /review>
                Header add referer "https://testing.live.standard.open-contracting.org"
                RequestHeader set referer "https://testing.live.standard.open-contracting.org"
        </Location>
        <Location /infrastructure/review>
                Header add referer "https://testing.live.standard.open-contracting.org"
                RequestHeader set referer "https://testing.live.standard.open-contracting.org"
        </Location>
    {% endif %}
    {% if not testing %}
        <Location /review>
                Header add referer "https://standard.open-contracting.org"
                RequestHeader set referer "https://standard.open-contracting.org"
        </Location>
        <Location /infrastructure/review>
                Header add referer "https://standard.open-contracting.org"
                RequestHeader set referer "https://standard.open-contracting.org"
        </Location>
    {% endif %}

Testing Salt changes

Document how to test changes to Salt against a virtual machine and in a separate branch, perhaps through a simple worked example.

Set up OCP Prometheus server

From #28

Question 1: Which box should this be on - a new box? A 1GB Bytemark box should be fine.

Question 2: For alerts, we need a way to send email. What shall we use? AWS has an email-sending service (Amazon SES); are there other options?

Nagios sends thousands of messages to /var/mail/root

It seems to send a new message every few minutes. I thought we had stopped using Icinga and Nagios? I have deleted all the messages received so far. Sample subjects on ocdskingfisher-new:

 U  16 [email protected] Thu Oct 17 01:17  25/889   [RECOVERY] disk / on process.kingfisher.open-contracting.org is OK!
 U  17 [email protected] Thu Oct 17 01:17  25/913   [RECOVERY] disk on process.kingfisher.open-contracting.org is OK!
 U  18 [email protected] Thu Oct 17 01:19  25/874   [PROBLEM] procs on process.kingfisher.open-contracting.org is WARNING!
 U  19 [email protected] Thu Oct 17 01:23  25/927   [PROBLEM] memory on process.kingfisher.open-contracting.org is UNKNOWN!
 U  20 [email protected] Thu Oct 17 01:34  25/885   [PROBLEM] load on process.kingfisher.open-contracting.org is CRITICAL!
 U  21 [email protected] Thu Oct 17 01:42  25/907   [PROBLEM] apt on process.kingfisher.open-contracting.org is WARNING!

Sample message:

Return-Path: <[email protected]>
X-Original-To: root@localhost
Delivered-To: root@localhost
Received: by process.kingfisher.open-contracting.org (Postfix, from userid 112)
	id 188B05D00CD7; Tue, 23 Jul 2019 06:25:46 +0200 (CEST)
Subject: [PROBLEM] load on process.kingfisher.open-contracting.org is CRITICAL!
To: <root@localhost>
X-Mailer: mail (GNU Mailutils 3.4)
Message-Id: <20190723042548.188B05D00CD7@process.kingfisher.open-contracting.org>
Date: Tue, 23 Jul 2019 06:25:46 +0200 (CEST)
From: [email protected]
X-IMAPbase: 1571862900 21517
Status: O
X-UID: 5721

***** Service Monitoring on process *****

load on process.kingfisher.open-contracting.org is CRITICAL!

Info:    CRITICAL - load average: 8.56, 7.87, 7.90

When:    2019-07-23 06:25:46 +0200
Service: load
Host:    process.kingfisher.open-contracting.org
IPv4:    127.0.0.1
IPv6:    ::1

Creating a new server

To document:

  • Document how to generate and change root password
  • Add host-specific steps for Hetzner
  • Expand Prometheus section after #31
  • ODS CRM pages could move to OCP CRM wiki
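For the first item, here is a minimal sketch of generating a strong root password with Python's standard-library `secrets` module. This is an assumption rather than current OCP or ODSC practice: the 32-character default and the letters-and-digits alphabet are arbitrary choices, and `generate_root_password` is a hypothetical helper. The output would then be set on the server with `passwd` and stored in the password manager.

```python
import secrets
import string

# Assumed alphabet: letters and digits only, to avoid shell-quoting problems
# when the password is pasted into a terminal.
ALPHABET = string.ascii_letters + string.digits


def generate_root_password(length: int = 32) -> str:
    """Return a random password drawn from ALPHABET (hypothetical helper)."""
    if length < 16:
        raise ValueError("length should be at least 16 characters")
    # secrets.choice draws from the operating system's secure random source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_root_password())
```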

To discuss and document:

  • Logging the IP address, hostname and root password in a password manager (a KeePass file could be created, following current OCP practice)

Other:

  • Check/Test whether python-msgpack and python-concurrent.futures are actually needed

The current process depends on making entries in Open Data Services Coop resources and on using the Open Data Services Coop deploy token. Either document that dependency here, or discuss a new process that lets OCP staff create new servers, and document that here instead.

Invalid HTTP_HOST header

Invalid HTTP_HOST header: '46.43.2.235'. You may need to add '46.43.2.235' to ALLOWED_HOSTS.
Invalid HTTP_HOST header: '46.43.2.235:443'. You may need to add '46.43.2.235' to ALLOWED_HOSTS.
Invalid HTTP_HOST header: 'live.standard-search.opencontracting.uk0.bigv.io'. You may need to add 'live.standard-search.opencontracting.uk0.bigv.io' to ALLOWED_HOSTS.

Let's add these to ALLOWED_HOSTS (the IP address is that of the standard-search server).
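The second error shows that Django strips the port from the Host header before comparing it against ALLOWED_HOSTS, so adding '46.43.2.235' covers both the first and second errors. The sketch below is a simplified, stdlib-only approximation of that matching (exact match, a leading '.' for subdomains, or '*'); it ignores IPv6 literals and Django's other validation, and is not Django's own code. The ALLOWED_HOSTS list contains only the entries from the errors above; the server's existing entries are not shown.

```python
# Hypothetical settings fragment: only the hosts from the errors above.
ALLOWED_HOSTS = [
    "46.43.2.235",
    "live.standard-search.opencontracting.uk0.bigv.io",
]


def split_domain_port(host: str) -> tuple[str, str]:
    """Split 'example.org:443' into ('example.org', '443'); IPv6 is not handled."""
    domain, _, port = host.lower().partition(":")
    # Like Django, drop a single trailing dot from a fully qualified name.
    if domain.endswith("."):
        domain = domain[:-1]
    return domain, port


def is_allowed(host_header: str, allowed_hosts: list[str]) -> bool:
    """Approximate Django's host check against ALLOWED_HOSTS patterns."""
    domain, _ = split_domain_port(host_header)
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*" or pattern == domain:
            return True
        # '.example.org' matches example.org and any of its subdomains.
        if pattern.startswith(".") and (domain.endswith(pattern) or domain == pattern[1:]):
            return True
    return False
```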

Document usage of https='certonly' to install new certs

It appears that you first need to deploy with https='certonly', and then deploy again with either 'yes' or 'force'.

Original issue title and description

{{ servername }}_acquire_certs will always fail if certs have not yet been created

This state runs:

/etc/init.d/apache2 reload; letsencrypt certonly --non-interactive --no-self-upgrade --expand --email [email protected] --agree-tos --webroot --webroot-path /var/www/html/ {{ domainargs }}

When changing servername, this outputs:

          stderr:
              Job for apache2.service failed because the control process exited with error code.
              See "systemctl status apache2.service" and "journalctl -xe" for details.
              Saving debug log to /var/log/letsencrypt/letsencrypt.log
              Plugins selected: Authenticator webroot, Installer None
              Obtaining a new certificate
              Performing the following challenges:
              http-01 challenge for cove-live.oc4ids.opencontracting.uk0.bigv.io
              http-01 challenge for master.cove-live.oc4ids.opencontracting.uk0.bigv.io
              Using the webroot path /var/www/html for all unmatched domains.
              Waiting for verification...
              Cleaning up challenges
          stdout:
              Reloading apache2 configuration (via systemctl): apache2.service failed!
              IMPORTANT NOTES:
               - Congratulations! Your certificate and chain have been saved at:
                 /etc/letsencrypt/live/cove-live.oc4ids.opencontracting.uk0.bigv.io/fullchain.pem
                 Your key file has been saved at:
                 /etc/letsencrypt/live/cove-live.oc4ids.opencontracting.uk0.bigv.io/privkey.pem
                 Your cert will expire on 2020-01-27. To obtain a new or tweaked
                 version of this certificate in the future, simply run certbot
                 again. To non-interactively renew *all* of your certificates, run
                 "certbot renew"
               - If you like Certbot, please consider supporting our work by:
              
                 Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
                 Donating to EFF:                    https://eff.org/donate-le

systemctl status apache2.service -n 10 shows:

Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io apachectl[32419]: AH00526: Syntax error on line 69 of /etc/apache2/sites-enabled/cove.conf:
Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io apachectl[32419]: SSLCertificateFile: file '/etc/letsencrypt/live/cove-live.oc4ids.opencontracting.uk0.bigv.io/cert.pem' does not exist or is empty
Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io apachectl[32419]: Action 'graceful' failed.
Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io apachectl[32419]: The Apache error log may have more information.
Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io systemd[1]: apache2.service: Control process exited, code=exited status=1
Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io systemd[1]: Reload failed for The Apache HTTP Server.

As I understand it, the cert.pem file won't exist until the letsencrypt certonly command has run. Should we instead reload Apache after running letsencrypt certonly?
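The proposed reordering can be sketched as follows: run `letsencrypt certonly` first, and reload Apache only if it succeeded, so Apache is never asked to load a config that points at a missing cert.pem. This is a hypothetical Python illustration of the ordering, not the Salt state itself. The flags are copied from the state above; the function names, the injectable `run` callable, the email placeholder, and the assumption that `{{ domainargs }}` expands to one `-d DOMAIN` pair per domain are all illustrative.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command and returns its exit status; injectable for testing.
Runner = Callable[[Sequence[str]], int]


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd), check=False).returncode


def certonly_command(email: str, domains: Sequence[str]) -> list[str]:
    """Build the letsencrypt command from the state above (assumed -d per domain)."""
    cmd = [
        "letsencrypt", "certonly",
        "--non-interactive", "--no-self-upgrade", "--expand",
        "--email", email, "--agree-tos",
        "--webroot", "--webroot-path", "/var/www/html/",
    ]
    for domain in domains:
        cmd += ["-d", domain]
    return cmd


def acquire_then_reload(email: str, domains: Sequence[str],
                        run: Runner = subprocess_runner) -> bool:
    """Obtain the certs first; reload Apache only once they exist."""
    if run(certonly_command(email, domains)) != 0:
        return False  # leave Apache's currently running config untouched
    return run(["/etc/init.d/apache2", "reload"]) == 0
```

Injecting `run` keeps the ordering testable without touching a real server; on a server, the default runner executes the commands.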

Docs: Document usage of removeapache

I assume these are one-off utilities for cases where Apache or uWSGI is removed from a server (though I don't know when that would happen).

Use a single deploy script

I think the only difference is the directory to which files are mirrored. This could be configured in .travis.yml (profiles already have partial control over the directory).

That way, we'll have only one deploy-docs.sh script, which will be easier to maintain.

The script can check that all necessary environment variables are set.

Use .gitmodules

Use .gitmodules for the private repos and add setup instructions to the README.

For updating the submodules, I prefer instructions in the README to a shell script, but if the shell script is kept, it should use --rebase to avoid extra merge commits.

Kingfisher - there is a failed state due to View changes

https://github.com/open-contracting/kingfisher-views/pull/32/files removes requirements.txt

This unfortunately means we have a failed state:


      ID: /home/ocdskfp/ocdskingfisherviews/.ve/
Function: virtualenv.managed
  Result: False
 Comment: An exception occurred in this state: Traceback (most recent call last):
           ..............
          FileNotFoundError: [Errno 2] No such file or directory: '/home/ocdskfp/ocdskingfisherviews/requirements.txt'
 Started: 09:53:07.347843
Duration: 369.936 ms
 Changes:   

Deploy Redash using Docker

When we wrote these scripts, Redash provided an Ubuntu install script.

Now it doesn't; it only provides a Docker version.

So this won't work on a fresh server (a step fails with a 404 error).

This also raises worries that we might run into difficulties on the next major upgrade, as the upgrade script we are using will no longer be the right approach. (We use our own version, for reasons explained at the top of https://github.com/open-contracting/deploy/blob/master/salt/redash/upgrade-nointeraction )

Use Prometheus for monitoring

We would like to switch to Prometheus for OCP server and service monitoring. This is a very popular, fully open-source project that we now use on our own servers; see https://prometheus.io/

Its model is that a small agent runs on the target as a service and makes an HTTP endpoint available. The server component then regularly “pulls” data from that endpoint. The endpoint returns a plain-text file of keys and values, and the keys can be fully defined by you. (This contrasts with the “push” model that other tools use, though you can push data if you really want to.)
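To make the pull model concrete: the endpoint returns Prometheus's text exposition format, with optional `# HELP` and `# TYPE` comment lines followed by one `name{labels} value` sample per line. The sketch below parses a small sample in the style of node_exporter output, using the load averages from the Nagios alert quoted earlier on this page as values. It handles only simple cases (no timestamps, no `}` or escaped quotes inside label values) and is not the Prometheus server's own parser.

```python
import re

# One sample per line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)$"
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 8.56
# HELP node_load5 5m load average.
# TYPE node_load5 gauge
node_load5 7.87
# HELP node_load15 15m load average.
# TYPE node_load15 gauge
node_load15 7.90
"""


def parse_metrics(text: str) -> dict:
    """Map (metric name, sorted label pairs) to a float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        labels = tuple(sorted(LABEL_RE.findall(match.group("labels") or "")))
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples


if __name__ == "__main__":
    print(parse_metrics(SAMPLE))
```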

Data is then available in a nice web UI, with current status, historical graphs and alarms. Other dashboards can be hooked up if you want.

The node exporter (https://github.com/prometheus/node_exporter) exports stats such as CPU, RAM and disk use. Historical data from it lets us answer questions about how much load a machine typically carries; these questions have come up before in server planning. We run it under its own user on each server, and in our experience it consumes minimal resources.

We also use the blackbox exporter (https://github.com/prometheus/blackbox_exporter), which lets you probe websites to check that they are responding properly. (I know we have UptimeRobot, but a little more doesn’t hurt.)
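The blackbox exporter is queried once per target: Prometheus requests its `/probe` endpoint with `target` and `module` query parameters. A minimal sketch of building that URL, assuming the exporter's default port (9115) and the `http_2xx` module from its example configuration. The target is one of the OCP sites mentioned on this page, and `probe_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def probe_url(target: str, module: str = "http_2xx",
              exporter: str = "http://localhost:9115") -> str:
    """Build a blackbox exporter probe URL; the target is percent-encoded."""
    return f"{exporter}/probe?{urlencode({'target': target, 'module': module})}"


if __name__ == "__main__":
    print(probe_url("https://standard.open-contracting.org"))
```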

We haven’t used these agents before, but https://prometheus.io/docs/instrumenting/exporters/ lists agents for both Redis and Postgres - good for Kingfisher.

The pull rather than push model makes for a much simpler setup. You can even have more than one server pull data from the same endpoint at the same time. This would allow us to set up all the monitors and alerts in ODSC’s infrastructure for now, and if OCP ever wanted the data to appear in its own server later, this would be easy to do. It also lets you run a server on your laptop to test things while still pulling real data, which is quite nice.

You can write custom exporters, because the HTTP endpoint format is so simple. There is even an official Python client library (https://prometheus.io/docs/instrumenting/clientlibs/) for measuring metrics inside your own app. We could use this in things like Kingfisher, for instance to monitor the length of any queues or the progress of views.
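To show how little a custom exporter needs, here is a sketch that serves a single gauge on `/metrics` using only the Python standard library, rather than the official client library linked above. `kingfisher_queue_length` and `get_queue_length()` are hypothetical placeholders for whatever Kingfisher would actually measure, and port 8000 is an arbitrary choice. In practice the client library would handle this (and more) for you.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_queue_length() -> int:
    """Hypothetical placeholder: replace with a real measurement."""
    return 0


def render_metrics(queue_length: int) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    return (
        "# HELP kingfisher_queue_length Items waiting in the queue (hypothetical).\n"
        "# TYPE kingfisher_queue_length gauge\n"
        f"kingfisher_queue_length {queue_length}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Measure on every request, so each scrape sees a fresh value.
        body = render_metrics(get_queue_length()).encode("utf-8")
        self.send_response(200)
        # Content type of the Prometheus text format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

A Prometheus server would then list this host and port as a scrape target, just as it would for node_exporter.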

All configuration is done through files on disk, mostly YAML. This means that Salt can set up a fully working system simply, immediately and with no human intervention. (This contrasts nicely with our current system, where Salt installs some things but you then have to go on the machine by hand and add it to the monitoring network through some awkward steps.)

We’d set up the basics initially and then talk with you about what else to add once it's up and running, but first we wanted to check in quickly: what do you think of Prometheus?

Switch from letsencrypt to certbot

(Copied from letsencrypt.sls)

The version of letsencrypt in the 16.04 repo is tragically old (0.4.1) and predates the renaming to certbot, good Apache support, etc. The version in the 18.04 repo is just an alias for certbot.

When we get rid of our last 16.04 servers, we can just switch to certbot.

salt-ssh 'ocds-redash' state.apply fails

ocds-redash:
- Detected conflicting IDs, SLS IDs need to be globally unique.
The conflicting ID is 'restart-nignx' and is found in SLS 'base:ocds-redash' and SLS 'base:prometheus-client-nginx'
