open-contracting / deploy
Deployment configuration and scripts
Home Page: https://ocdsdeploy.readthedocs.io/en/latest/
License: Apache License 2.0
@robredpath has written in CRM-4925:
The images for Hetzner servers can be downloaded from https://download.hetzner.de/bootimages/ (user: hetzner, password: download), so, if we wanted, I think we could set up some kind of Vagrant+Salt server testing setup.
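For illustration, a minimal sketch of what such a Vagrant+Salt setup could look like. This is a hypothetical Vagrantfile: the box name, folder layout and masterless mode are assumptions, not our actual configuration.

```ruby
# Hypothetical Vagrantfile sketch for testing Salt states locally.
Vagrant.configure("2") do |config|
  config.vm.box = "debian/buster64"  # assumed box, not necessarily what we run

  # Share this repository's states and pillars with the guest.
  config.vm.synced_folder "salt/", "/srv/salt"
  config.vm.synced_folder "pillar/", "/srv/pillar"

  # Apply the highstate without a Salt master. A minion config file setting
  # file_client: local would also be needed for a real masterless run.
  config.vm.provision :salt do |salt|
    salt.masterless = true
    salt.run_highstate = true
  end
end
```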
https://github.com/open-contracting/deploy/blob/master/salt/apache/ocds-docs-live.conf.include#L249
{# This solves a problem: "Convert to Spreadsheet" was working on HTTPS but not HTTP. Fix this by forcing the header to be something the CSRF check likes. #}
{# When we change HTTPS to force, there will be no HTTP traffic and this block should be removed. #}
{% if testing %}
<Location /review>
Header add referer "https://testing.live.standard.open-contracting.org"
RequestHeader set referer "https://testing.live.standard.open-contracting.org"
</Location>
<Location /infrastructure/review>
Header add referer "https://testing.live.standard.open-contracting.org"
RequestHeader set referer "https://testing.live.standard.open-contracting.org"
</Location>
{% endif %}
{% if not testing %}
<Location /review>
Header add referer "https://standard.open-contracting.org"
RequestHeader set referer "https://standard.open-contracting.org"
</Location>
<Location /infrastructure/review>
Header add referer "https://standard.open-contracting.org"
RequestHeader set referer "https://standard.open-contracting.org"
</Location>
{% endif %}
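As the comment above notes, this block can go once there is no HTTP traffic. A hedged sketch of what forcing HTTPS might look like (the VirtualHost details are illustrative, not the live config):

```apache
<VirtualHost *:80>
    ServerName standard.open-contracting.org
    # Redirect all plain-HTTP traffic to HTTPS, after which the Referer
    # workaround for the CSRF check is no longer needed and can be removed.
    Redirect permanent / https://standard.open-contracting.org/
</VirtualHost>
```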
Document how to test changes to Salt against a virtual machine and in a separate branch, perhaps through a simple worked example.
https://standard.open-contracting.org/infrastructure/review/
This doesn't occur with the OCDS DRT: https://standard.open-contracting.org/review/
It would be helpful for analysts to know where the hosted Kingfisher views logs are.
pip==8.1.2
I figure we can at minimum use > instead of ==.
From #28
Question 1: Which box should this be on - a new box? A 1GB Bytemark box should be fine.
Question 2: For alerts, we need a way to send email. What shall we use? AWS has an email sending service, or are there other options?
https://ocdsdeploy.readthedocs.io/en/latest/server-monitoring.html
Document how to fully set up the agent on each server so that others can do this. Document how to set up a server, or more likely, link to existing documentation.
It seems to send a new message every few minutes. I thought we weren't using Icinga and Nagios anymore? I deleted all the mail messages up to now. Sample subjects on ocdskingfisher-new:
U 16 [email protected] Thu Oct 17 01:17 25/889 [RECOVERY] disk / on process.kingfisher.open-contracting.org is OK!
U 17 [email protected] Thu Oct 17 01:17 25/913 [RECOVERY] disk on process.kingfisher.open-contracting.org is OK!
U 18 [email protected] Thu Oct 17 01:19 25/874 [PROBLEM] procs on process.kingfisher.open-contracting.org is WARNING!
U 19 [email protected] Thu Oct 17 01:23 25/927 [PROBLEM] memory on process.kingfisher.open-contracting.org is UNKNOWN!
U 20 [email protected] Thu Oct 17 01:34 25/885 [PROBLEM] load on process.kingfisher.open-contracting.org is CRITICAL!
U 21 [email protected] Thu Oct 17 01:42 25/907 [PROBLEM] apt on process.kingfisher.open-contracting.org is WARNING!
Sample message:
Return-Path: <[email protected]>
X-Original-To: root@localhost
Delivered-To: root@localhost
Received: by process.kingfisher.open-contracting.org (Postfix, from userid 112)
id 188B05D00CD7; Tue, 23 Jul 2019 06:25:46 +0200 (CEST)
Subject: [PROBLEM] load on process.kingfisher.open-contracting.org is CRITICAL!
To: <root@localhost>
X-Mailer: mail (GNU Mailutils 3.4)
Message-Id: <20190723042548.188B05D00CD7@process.kingfisher.open-contracting.org>
Date: Tue, 23 Jul 2019 06:25:46 +0200 (CEST)
From: [email protected]
X-IMAPbase: 1571862900 21517
Status: O
X-UID: 5721
***** Service Monitoring on process *****
load on process.kingfisher.open-contracting.org is CRITICAL!
Info: CRITICAL - load average: 8.56, 7.87, 7.90
When: 2019-07-23 06:25:46 +0200
Service: load
Host: process.kingfisher.open-contracting.org
IPv4: 127.0.0.1
IPv6: ::1
robredpath (re: stopping and starting Scrapyd): under what circumstances might I want to do this? [from open-contracting/kingfisher-collect#103]
Add content to explain
To document:
To discuss and document:
Other:
The current process depends on making entries in Open Data Services Coop resources and using the Open Data Services Coop deploy token. Document that fact here, or discuss a new process so that OCP staff can make new servers and document that here.
https://ocdsdeploy.readthedocs.io/en/latest/making-changes.html
Discuss testing against a virtual machine or making changes against live servers. Discuss procedures for people who cannot deploy, for whatever reason, to make changes (e.g. always via pull request?). Have clearer guidelines about when it is and is not appropriate to commit straight to master. Document.
https://ocdsdeploy.readthedocs.io/en/latest/deploying.html
Work out how we’re going to make a deploy token work between Open Data Services and OCP, so that OCP staff can deploy. Document.
Invalid HTTP_HOST header: '46.43.2.235'. You may need to add '46.43.2.235' to ALLOWED_HOSTS.
Invalid HTTP_HOST header: '46.43.2.235:443'. You may need to add '46.43.2.235' to ALLOWED_HOSTS.
Invalid HTTP_HOST header: 'live.standard-search.opencontracting.uk0.bigv.io'. You may need to add 'live.standard-search.opencontracting.uk0.bigv.io' to ALLOWED_HOSTS.
Let's add these to the allowed hosts (the IP is for the standard-search server).
salt-ssh '*' pkg.autoremove list_only=True
returns a lot of packages that can be removed (some of which were installed for icinga2/nagios #45 (comment) but not all).
We tried to move this to the ocds-docs-live server, but something hadn't gone totally right: when we decommissioned an old server today, http://ocds.opendataservices.coop/standard/r/1__0__RC/en/standard/intro/ went offline.
But a message was received saying that's not an important site any more? Is that right?
If so, we can remove ocds-legacy from deploy repository to keep things clean.
If not, we should be able to put it back online.
I see that you need to first deploy with https='certonly', then with either 'yes' or 'force'.
Original issue title and description
{{ servername }}_acquire_certs will always error if certs not yet created
This state runs:
/etc/init.d/apache2 reload; letsencrypt certonly --non-interactive --no-self-upgrade --expand --email [email protected] --agree-tos --webroot --webroot-path /var/www/html/ {{ domainargs }}
When changing servername, this outputs:
stderr:
Job for apache2.service failed because the control process exited with error code.
See "systemctl status apache2.service" and "journalctl -xe" for details.
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for cove-live.oc4ids.opencontracting.uk0.bigv.io
http-01 challenge for master.cove-live.oc4ids.opencontracting.uk0.bigv.io
Using the webroot path /var/www/html for all unmatched domains.
Waiting for verification...
Cleaning up challenges
stdout:
Reloading apache2 configuration (via systemctl): apache2.service failed!
IMPORTANT NOTES:
- Congratulations! Your certificate and chain have been saved at:
/etc/letsencrypt/live/cove-live.oc4ids.opencontracting.uk0.bigv.io/fullchain.pem
Your key file has been saved at:
/etc/letsencrypt/live/cove-live.oc4ids.opencontracting.uk0.bigv.io/privkey.pem
Your cert will expire on 2020-01-27. To obtain a new or tweaked
version of this certificate in the future, simply run certbot
again. To non-interactively renew *all* of your certificates, run
"certbot renew"
- If you like Certbot, please consider supporting our work by:
Donating to ISRG / Let's Encrypt: https://letsencrypt.org/donate
Donating to EFF: https://eff.org/donate-le
systemctl status apache2.service -n 10
shows:
Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io apachectl[32419]: AH00526: Syntax error on line 69 of /etc/apache2/sites-enabled/cove.conf:
Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io apachectl[32419]: SSLCertificateFile: file '/etc/letsencrypt/live/cove-live.oc4ids.opencontracting.uk0.bigv.io/cert.pem' does not exist or is empty
Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io apachectl[32419]: Action 'graceful' failed.
Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io apachectl[32419]: The Apache error log may have more information.
Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io systemd[1]: apache2.service: Control process exited, code=exited status=1
Oct 29 21:53:08 cove-live.oc4ids.opencontracting.uk0.bigv.io systemd[1]: Reload failed for The Apache HTTP Server.
As I understand, the cert.pem file won't exist until the letsencrypt certonly command is run. Should we instead reload Apache after letsencrypt certonly?
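One possible fix, sketched as a hypothetical Salt state (the IDs and arguments here are illustrative, not the repository's actual state): run letsencrypt certonly first, and make the Apache reload depend on it, so the certificate files exist before Apache re-reads its config.

```yaml
# Hypothetical sketch: acquire the certificate first, then reload Apache.
acquire_certs:
  cmd.run:
    - name: letsencrypt certonly --non-interactive --webroot --webroot-path /var/www/html/ {{ domainargs }}

reload_apache:
  service.running:
    - name: apache2
    - reload: True
    # Reload only after the certificate command has run, so that
    # SSLCertificateFile points at files that actually exist.
    - watch:
      - cmd: acquire_certs
```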
https://ocdsdeploy.readthedocs.io/en/latest/making-changes.html
Discuss procedures and document such that all staff can make changes to these.
OpenDataServices/opendataservices-deploy#81
Check if we need this fix in this repo (I suspect we do), and apply it if so.
I assume these are one-time-use utilities in cases where apache or uwsgi are removed from a server (though I don't know in what case that would occur).
Though we should be able to reload almost everything* from files on disk, it would take a while.
This came up in conversation with @kindly.
P.S. (*) The tiny exception is explained in open-contracting/kingfisher-process#122
To reduce duplication between Toucan, etc.
Please use @romifz's @cds.com.py address.
I think the only difference is the directory to which files are mirrored. This can be a configuration in .travis.yml (profiles already have partial control over the directory).
That way, we'll have only one deploy-docs.sh script which will be easier to maintain.
The script can check that all necessary environment variables are set.
After #8 is merged.
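The environment-variable check could be as simple as this sketch (the variable name and message are made up for illustration, not taken from the existing deploy-docs.sh):

```shell
# Hypothetical fragment for a shared deploy-docs.sh: fail fast if a required
# variable is unset, then use it as the mirror directory.
MIRROR_DIR="docs/en"  # in practice this would come from .travis.yml
: "${MIRROR_DIR:?MIRROR_DIR must be set}"
echo "Mirroring built docs into ${MIRROR_DIR}"
```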
"ocdsdeploy" seems free as a project URL?
The DSN for open-contracting-validator isn't the same as those in this repo. So, I assume there is another project that is the 'real' one for the OCDS and OC4IDS Data Review Tools.
Use .gitmodules for the private repos and add setup instructions to readme.
For updating the submodules, I prefer instructions in the readme to a shell script, but if the shell script is kept, it should use --rebase to avoid extra merge commits.
Nothing reports to it. Its DSN doesn't occur in this repository.
Discuss if this will be the same or different in future. Document the procedure here more fully.
Important for server maintenance reasons
From #28
The agents are at https://prometheus.io/docs/instrumenting/exporters/
I think we should be PR’ing changes to this repo.
'ocdskit-web' will become confusing over time, as we (and new users) will forget that it was the old name for Toucan.
https://github.com/open-contracting/kingfisher-views/pull/32/files removes requirements.txt
This unfortunately means we have a failed state:
ID: /home/ocdskfp/ocdskingfisherviews/.ve/
Function: virtualenv.managed
Result: False
Comment: An exception occurred in this state: Traceback (most recent call last):
..............
FileNotFoundError: [Errno 2] No such file or directory: '/home/ocdskfp/ocdskingfisherviews/requirements.txt'
Started: 09:53:07.347843
Duration: 369.936 ms
Changes:
When we wrote these scripts, Redash provided an Ubuntu install script.
Now they don't: they only provide a Docker version.
So this won't work on a fresh server (a step fails with a 404 error).
This also raises worries that on the next major upgrade we might get into difficulties, as the upgrade script we are using won't be the right thing to do any more. (We use our own version, for reasons explained at the top of https://github.com/open-contracting/deploy/blob/master/salt/redash/upgrade-nointeraction )
We would like to switch to Prometheus for OCP server and service monitoring. This is a very popular fully open source project that we now use on our servers - see https://prometheus.io/
Its model is that a small agent runs on the target as a service and makes an HTTP endpoint available. The server component then regularly "pulls" data from that endpoint. The endpoint returns a plain text file with a bunch of keys; keys can be fully defined by you. (This contrasts with a "push" model that others use, though you can do "push" stuff if you really want.)
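For illustration, a made-up fragment of what such an endpoint returns. The metric names are modelled on node_exporter's output, not captured from our servers:

```
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 7.90
# HELP node_filesystem_avail_bytes Filesystem space available.
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{mountpoint="/"} 1.2e+10
```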
Data is then available in a nice web UI, with current status, historical graphs and alarms. Other dashboards can be hooked up if you want.
The machine exporter ( https://github.com/prometheus/node_exporter ) exports stats such as CPU, RAM and disk use. Historical data on this lets us answer questions about how much load a machine typically has; these questions have come up before in server planning. We set this up under its own user on each server, and in our experience it consumes minimal resources.
We also use https://github.com/prometheus/blackbox_exporter, which lets you monitor websites for good service. (I know we have UptimeRobot, but a little more doesn't hurt.)
We haven’t used these agents before, but https://prometheus.io/docs/instrumenting/exporters/ lists agents for both Redis and Postgres, which is good for Kingfisher.
The pull rather than push model makes for a much simpler setup. You can even have more than one server at a time pull data from an endpoint. This would allow us to set up all the monitors and alerts in ODSC’s infrastructure for now, but if OCP ever wanted the data to appear in their own server later, this would be easy to do. It also allows you to run a server on your laptop to test things, but still pull real data, which is quite nice.
You can write custom exporters, because the HTTP endpoint format is so simple. They even provide an official Python library so you can easily measure metrics inside your app: https://prometheus.io/docs/instrumenting/clientlibs/ We could use this in things like Kingfisher, to monitor the length of any queues, for instance, or to monitor views’ progress.
All configuration is done by files on disk, mostly YAML. This means that Salt can simply, immediately, and with no human intervention set up a fully working system. (This contrasts nicely with our current system, where Salt installs some stuff but then you have to go on by hand to set up the machine in the monitoring network with some awkward steps.)
We’d do the basics initially and then talk with you to see what else we wanted to add once it's up and running. But first we wanted to check in quickly to see what you thought of Prometheus?
(Copied from letsencrypt.sls)
The version of letsencrypt in the 16.04 repo is tragically old (0.4.1) and predates the renaming to certbot, the nice Apache support, etc. The version in the 18.04 repo is just an alias for certbot.
When we get rid of our last 16.04 servers, we can just switch to certbot.
https://ocdsdeploy.readthedocs.io/en/latest/salt.html
Work out the minimum version of Salt required and document here for people who want to install a suitable version. Link to install pages for common operating systems.
ocds-redash:
- Detected conflicting IDs, SLS IDs need to be globally unique.
The conflicting ID is 'restart-nignx' and is found in SLS 'base:ocds-redash' and SLS 'base:prometheus-client-nginx'
https://standard.open-contracting.org/profiles/eu/master/en/
https://standard.open-contracting.org/profiles/gpa/master/en/
Is this because they are on master? Can we make an exception?
Currently the standard site proxies traffic to the data tool via the FQDN "oc4ids.cove.live.opendataservices.coop"
https://github.com/open-contracting/deploy/blob/master/pillar/live_pillar.sls#L3
We should just change that to "cove.cove-live.oc4ids.opencontracting.uk0.bigv.io", like the other server. That will need to be changed on the Data Tool server too, and we should make sure the SSL cert is obtained correctly.
Process and Views (if open-contracting/kingfisher-summarize#34 is pursued) only interact at the level of the database, i.e. Views can work with any database whose schema is the same as Process.
As such, I don't see why both should always be deployed at the same time, as is currently the case.
If one of the two has a broken deployment, it shouldn't cause the other to fail, like in #24.