
ocf-infrastructure's Introduction

OCF Infrastructure

Terraform infrastructure-as-code for cloud environments.

All Contributors


A repository for managing the cloud infrastructure for the Open Climate Fix organisation. Contains terraform code for defining services and describing environments. Each contextual domain and each deployment environment are specified in folders within the terraform directory, along with reusable modules and unittests.

Repository Structure

ocf-infrastructure:
  terraform: # Contains all the terraform code for OCF's cloud infrastructure
    modules: # Portable terraform modules defining specific cloud infrastructure blocks
    nowcasting: # Specific code for the nowcasting domain's cloud infrastructure
    pvsite: # Specific code for the pvsite domain's cloud infrastructure
    unittests: # Infrastructure code for an environment used to test the modules
  local-stack: # Code to run the terraform stack locally for local testing/development
  .github: # Contains github-specific code for automated CI workflows

See the READMEs in the domain folders for more information on their architecture:

Terraform Overview

Terraform is a declarative language used to specify and build cloud environments. To install the CLI locally, ensure Homebrew is installed, then run

$ brew install terraform

If you aren't on Mac or don't want to use Homebrew, check out the official terraform installation instructions.

Pre-Commit

This repository implements a pre-commit config that enables automatic fixes to code when you create a commit. This helps to maintain consistency in the main repo. To enable this, follow the installation instructions on the pre-commit website.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

Peter Dudfield 💻
Flo 👀
Shanmukh 💻
lordsonfernando 💻
gmlyth 💻
Keenan Johnson 📖
devsjc 💻 🎨
wsharpe41 💻
Pedro Garcia Rodriguez 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

ocf-infrastructure's People

Contributors

peterdudfield, devsjc, jacobbieker, pre-commit-ci[bot], flowirtz, allcontributors[bot], braddf, aryanbhosale, simlmx, breakingpitt, gmlyth, sukh-p, keenanjohnson, lordsonfernando, vnshanmukh, wsharpe41

Stargazers

Megawattz, Florian Kotthoff

Watchers

Lucian, James Cloos

ocf-infrastructure's Issues

PV consumer (ECS)

Create a PV consumer that runs every 5 mins on ECS

Can copy the forecast ECS module to start with.

Production stack

  • terraform stack
  • setup terraform cloud
  • setup aws account
  • setup auth0
  • setup consumer secrets


  • set up s3 buckets

  • Add capacities to PV and GSP systems

  • add new FE

  • Check everything works

  • switch over web address

  • ec2 bastion

s3: remove >30 day data

Detailed Description

Automatically remove data older than 30 days from the s3 bucket.

Context

We don't want to store the data that long; it is just for prediction, not for training.

Possible Implementation

This can likely be done with lifecycle settings on the s3 bucket.
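
A minimal sketch of such a lifecycle rule, assuming the AWS provider v4+ resource layout; the bucket and rule names here are illustrative, not the repo's actual names:

```hcl
# Sketch: expire objects 30 days after creation.
# "forecast" is an illustrative bucket resource name.
resource "aws_s3_bucket_lifecycle_configuration" "expire_old_data" {
  bucket = aws_s3_bucket.forecast.id

  rule {
    id     = "expire-old-data"
    status = "Enabled"

    expiration {
      days = 30
    }
  }
}
```

With this in place, S3 deletes expired objects itself and no consumer code needs to change.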

add local stack test for satellite consumer

Detailed Description

Would be good to have a test that checks a file has been created once the satellite consumer has run in the local stack.
The file will be in the docker volume 'sat_data', in a folder called 'data'.

Context

good to test

Possible Implementation

Unsure how to check docker volumes using python / pytest.
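
One possible approach (a sketch, assuming Docker is available on the test host and the `sat_data` volume name from above) is to shell out from pytest and list the volume's contents via a throwaway container. The helper names and the example filename below are hypothetical:

```python
import subprocess


def volume_ls_command(volume: str, path: str = "/data") -> list[str]:
    """Build a docker command that lists the files inside a named volume."""
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:{path}",  # mount the named volume into a throwaway container
        "alpine", "ls", path,
    ]


def volume_contains_file(volume: str, filename: str, path: str = "/data") -> bool:
    """Return True if `filename` appears in the volume's directory listing."""
    result = subprocess.run(
        volume_ls_command(volume, path),
        capture_output=True, text=True, check=True,
    )
    return filename in result.stdout.split()
```

A pytest test could then be as simple as `assert volume_contains_file("sat_data", "latest.zarr.zip")` (filename hypothetical), run after the local stack has finished.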

Terraform API

Detailed Description

Use terraform to run nowcasting API on AWS elastic beanstalk

version of docker file to use
tf: elastic beanstalk
tf: IAM independent role
put inside a VPC
stack_name, so multiple of these can be set up
tf: ec2: t2.medium

Context

Nowcasting API
Good to use terraform, so it can easily be rebuilt

For nowcasting project

Add pv database

Detailed Description

Create a database for pv data
secret password
iam policy to read the password

similar to the forecast database

Add version to docker tags

Detailed Description

Don't load the latest image; load tagged docker versions. These versions should be set at the environment level.

This would be for

Versions could go in the development vars and be passed through.

Context

Good to load a tagged version, in case a new 'latest' version comes out which is broken.

Satellite consumer app

Detailed Description

Create a satellite app that runs in ECS every 5 mins.
Create the IAM roles and policies.

Possible Implementation

Copy the NWP consumer, just change docker image and

Setup s3 bucket for NWP data

Detailed Description

Add terraform code for creating

  • s3 bucket. The bucket should be private.
  • iam policy to read from the bucket
  • iam policy to write to the bucket

Context

Screenshot 2022-01-11 at 11 05 48

Might have to delete old bucket in s3 to make this work
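
A sketch of those three pieces, with illustrative bucket and policy names (the real names live in the module):

```hcl
# Private bucket for NWP data (name illustrative).
resource "aws_s3_bucket" "nwp" {
  bucket = "nowcasting-nwp-data"
}

resource "aws_s3_bucket_public_access_block" "nwp" {
  bucket                  = aws_s3_bucket.nwp.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Read-only access: list the bucket, get objects.
resource "aws_iam_policy" "nwp_read" {
  name = "nwp-s3-read"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:ListBucket"]
      Resource = [aws_s3_bucket.nwp.arn, "${aws_s3_bucket.nwp.arn}/*"]
    }]
  })
}

# Write access: put objects only.
resource "aws_iam_policy" "nwp_write" {
  name = "nwp-s3-write"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:PutObject"]
      Resource = ["${aws_s3_bucket.nwp.arn}/*"]
    }]
  })
}
```

Splitting read and write into separate policies lets the consumer role attach only the write policy and the forecast role only the read policy.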

Research Dagster

Detailed Description

Run dagster locally and see what it can do. Would be interesting to see

  • how it kicks off jobs - can it kick off AWS tasks on ECS?
  • how to make jobs dependent on each other?
  • how it deals with error messages
  • what information does the user get?
  • how can it be deployed somewhere other than locally?

This will be useful for nowcasting project

add vpn tunnel

Detailed Description

Add a method so that a local user can log in to the VPC in the cloud.

Context

This is useful for debugging

Add GSP Consumer to terraform code

Detailed Description

Add GSP Consumer to terraform code.
Should have two scheduled tasks

  • run every 30 mins, collecting today's data
  • run once a day at, say, 9, collecting yesterday's data

Possible Implementation

Can mainly copy PVConsumer code

Add Status Dashboard

We want to have an internal status dashboard for Nowcasting that lets us see the service health at a glance.

Architecture

To not reinvent the wheel, we can try to use Prometheus for that.
There is a hosted Prometheus service on AWS that takes care of storing and aggregating the metrics (backend). We will still need a Prometheus instance to scrape the metrics, but that could live as a docker container in our cluster.

There are helper libraries for almost all major languages, so we can use those libraries to expose metrics from our individual services. Once all our metrics are in Prometheus, we can make use of existing tooling like the Alertmanager for alerting and Grafana for visualising these metrics.

  graph TD;
      A[Data Source 1]-->B[Prometheus Ingester on Docker];
      E[Data Source 2]-->B[Prometheus Ingester on Docker];
      B[Prometheus Ingester on Docker]-->C[Prometheus Managed Service on AWS];
      C[Prometheus Managed Service on AWS]-->D[Grafana];


Upgrade to postgres 14.1

Detailed Description

Would be good to use the latest postgres version -> 14.1.


Possible Implementation

change code here

Local stack CI tests fail

Describe the bug
Currently the CI local stack tests fail. Would be good to run the local stack locally with logs and run the tests.

To Reproduce
See CI

Expected behavior
tests to pass

Additional context
Hard to see logs in CI

Status dashboard on Fargate --> ECS

Currently the status dashboard is on Fargate.

Context

For continuous services, Fargate is probably more expensive than ECS on EC2, but we should check this.

Possible Implementation

Move to ECS.

Add route53 to eb application

Detailed Description

Add route53, i.e. an internet address, to the elastic beanstalk application.

Context

Useful for people to access the api, and not give them a funny elastic beanstalk address.
Might have to ask @flowirtz for help

S3 access for data vis

Detailed Description

Give data vis module access to nwp and satellite s3 buckets

Context

So data vis can plot latest data

Possible Implementation

Similar to how the forecast module has access to these buckets. Needs IAM read access.

Trigger GSP consumer again,

Currently the National results are in, but not the GSP results, when it gets triggered.
Might be worth triggering it a few minutes later as well.

Adjust the cron job
to run at minutes 9, 11, 39 and 41

Complete if:
See if data is available on the data-vis app after 11 minutes past the hour.
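
Assuming the consumer is triggered by an EventBridge (CloudWatch Events) rule, the adjusted schedule could be sketched like this (rule name illustrative):

```hcl
# Fire at minutes 9, 11, 39 and 41 of every hour.
resource "aws_cloudwatch_event_rule" "gsp_consumer_schedule" {
  name                = "gsp-consumer-schedule"
  schedule_expression = "cron(9,11,39,41 * * * ? *)"
}
```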

add auth to API

Detailed Description

Add the auth env vars to the API.
Need to add:

  • AUTH0_DOMAIN
  • AUTH0_API_AUDIENCE

Context

This is how the api can be authenticated

Possible Implementation

Add the values to an AWS secret and then load them in as secrets to the API.
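
A sketch of that implementation, assuming an ECS-style container definition; the secret name and resource names are illustrative:

```hcl
# Illustrative secret holding both Auth0 values as JSON keys.
resource "aws_secretsmanager_secret" "api_auth" {
  name = "nowcasting/api/auth0"
}

# Entries for the container definition's "secrets" list; ECS injects
# these as environment variables at task start-up. The ":KEY::" suffix
# selects a single JSON key from the secret.
locals {
  api_auth_secrets = [
    { name = "AUTH0_DOMAIN",       valueFrom = "${aws_secretsmanager_secret.api_auth.arn}:AUTH0_DOMAIN::" },
    { name = "AUTH0_API_AUDIENCE", valueFrom = "${aws_secretsmanager_secret.api_auth.arn}:AUTH0_API_AUDIENCE::" },
  ]
}
```

The task's execution role also needs `secretsmanager:GetSecretValue` on the secret's ARN for this to work.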

Setup ECS cluster

Detailed Description

  • Use terraform to set up a simple ECS cluster. Perhaps start with one t3.small machine; we can always change it easily.
  • no auto scaling

Context

Need this to set up the NWP consumer.

Automatically expand rds storage

Detailed Description

Let the database expand its storage automatically. An upper limit of 500 MB could be set.

Context

The less things we have to maintain the better

Possible Implementation

The code could be added using this.
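
RDS storage autoscaling is enabled by setting `max_allocated_storage` above `allocated_storage`. A sketch (most arguments elided; note these arguments are measured in GiB, so the cap below is 500 GiB rather than the 500 MB figure above, since an MB-scale value cannot be expressed):

```hcl
resource "aws_db_instance" "db" {
  # ... engine, instance_class, credentials etc. elided ...

  allocated_storage     = 100 # starting size, GiB
  max_allocated_storage = 500 # > allocated_storage enables storage autoscaling, cap in GiB
}
```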

terraform: Security

Detailed Description

  • Move all security to a separate module. This makes it easy to review, and to check we know what's going on.

Context

Security is hard; best to get a very simple view of it, so we know what's going on.

Add Tags

Detailed Description

Add tags to all (if possible) components
The tags could be

  • 'nowcasting'
  • give the environment
  • give the module

Context

Useful to filter on tags and see what is going on
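
One way to get those tags onto (almost) every component is the AWS provider's `default_tags` block; the region and values below are illustrative and assume an `environment` variable exists:

```hcl
provider "aws" {
  region = "eu-west-1" # illustrative

  # Applied automatically to every taggable resource this provider creates.
  default_tags {
    tags = {
      project     = "nowcasting"
      environment = var.environment # e.g. "development" or "production"
      module      = "forecast"      # illustrative; set per module
    }
  }
}
```

Resource-level `tags` can still be added where a component needs something more specific; they are merged with the defaults.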

Prometheus Ingester can't write to Prometheus Remote Backend

Describe the bug
As per #49, our current status dashboard test setup consists of an AWS-managed service for prometheus and a prometheus ingestion container running on Docker (ECS). The Ingester uses the managed-service as a "remote backend", meaning it just fetches metrics data and sends it to the managed-service.

There seems to be a permissions issue, as the ECS container keeps logging the following error message:

ts=2022-03-01T17:34:55.724Z
caller=dedupe.go:112
component=remote
level=error
remote_name=f3f5a2
url=https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/ws-dcdefd46-7af3-483d-a430-24811887ff36/api/v1/remote_write
msg="non-recoverable error while sending metadata"
count=29
err="server returned HTTP status 403 Forbidden: {\"Message\":\"User: arn:aws:sts::008129123253:assumed-role/statusdash-iam-role/b7997edbd1a342ef8c3d8d57cd70a6e2 is not authorized to perform: aps:RemoteWrite on resource: arn:aws:aps:eu-west-1:008129123253:workspace/ws-dcdefd46-7af3-483d-a430-24811887ff36 with an explicit deny in an identity-based policy\"}"

The offending role statusdash-iam-role should already have this permission though, according to
https://github.com/openclimatefix/nowcasting_infrastructure/blob/16e71e768bf9802ec1d25188609a5306872b59e4/terraform/modules/statusdash/prometheus.tf#L86-L107

Update pv consumer filename

Detailed Description

Make sure the PV consumer doesn't go and get all 1000 systems' metadata; just use the 10 in the test file.

Context

This is causing a rate-limit effect on the production service.

Possible Implementation

set FILENAME to
filename = os.path.dirname(pvconsumer.__file__) + "/data/pv_systems_10.csv"

Fix failing local stack

The current local stack test is failing.

test docker is here

Need to

  • Make sure the PV Consumer is limited to, say, 10 PV systems; this stops our API limit being reached
  • update to the latest versions
  • Add satellite - #59
  • adjust wait time for all things to run - here

Setup VPC

Detailed Description

Set up a simple VPC
using terraform

Context

for Nowcasting project

Add ec2 bastion

Detailed Description

Add an ec2 instance that we can use as a bastion.
Add ssh keys for me, @JackKelly and @jacobbieker on this machine. These could be loaded from an AWS secret.
This means we could set up ssh tunneling on this machine.

Context

Good to have a backdoor into our system

Might be nice to have a variable controlling whether to create this or not, as most of the time we won't need it.

s3 bucket for models

Detailed Description

  • Create s3 bucket for models
  • iam read policy
  • add policy to forecast role

Context

The forecast module should be able to download s3 items from this bucket.

Create shared Load Balancer for API

We want to have a shared load balancer for our API that doesn't change with each deployment but rather is persistent.

Current Situation
The load balancer gets recreated (with a different URL) on every deploy.

What we want
One persistent load balancer that always points at the most recent deployment(s).

How?
This guide in the AWS docs explains how to create a shared load balancer.

Basically:

  1. Create a standalone Application Load Balancer (ALB)
  2. Reference that standalone ALB in the EB environment config
  3. Done!

Would maybe also be good to export "aws_lb.dns_name" so we can easily find it for adding it as a CNAME to Cloudflare.
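
A sketch of steps 1-3 plus the suggested output, with illustrative names and assuming the subnets are passed in as a variable:

```hcl
# 1. Standalone ALB, created outside Elastic Beanstalk.
resource "aws_lb" "shared" {
  name               = "api-shared-alb" # illustrative
  load_balancer_type = "application"
  subnets            = var.public_subnet_ids
}

# 2. Reference it from the EB environment config.
resource "aws_elastic_beanstalk_environment" "api" {
  # ... name, application, solution_stack_name etc. elided ...

  setting {
    namespace = "aws:elasticbeanstalk:environment"
    name      = "LoadBalancerIsShared"
    value     = "true"
  }

  setting {
    namespace = "aws:elbv2:loadbalancer"
    name      = "SharedLoadBalancer"
    value     = aws_lb.shared.arn
  }
}

# 3. Export the DNS name for the Cloudflare CNAME.
output "shared_alb_dns_name" {
  value = aws_lb.shared.dns_name
}
```

Because the ALB lives outside the EB environment, redeploys replace the environment's instances behind it without touching its URL.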
