
carbonawarescheduler's Introduction

A Proof of Concept for scheduling a GitHub Actions workflow based on the output of the CarbonAwareSDK

How it works

Prerequisite: create a personal access token, stored as a repository secret named WORKFLOW_TOKEN, that has workflow write access.
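The PAT matters because pushes made with the default GITHUB_TOKEN are rejected when they touch files under .github/workflows/. A minimal checkout step using the secret might look like this:

```yaml
# Check out with the PAT so later commits can modify .github/workflows/;
# the default GITHUB_TOKEN cannot push changes to workflow files.
- uses: actions/checkout@v4
  with:
    token: ${{ secrets.WORKFLOW_TOKEN }}
```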

  1. On a PR merge, the workflow runs:

    1. sets the correct permissions on the bash script

    2. copies/overwrites scheduledWorkflow_template.yml into the workflows/ folder

    3. calls scheduleWorkflowUsingHostedCarbonAwareSDK.bash, which:

      1. calls the hosted version of the CarbonAwareSDK
      2. extracts the first optimal time to run a 10-minute job within the next 24 hours in the 'ukwest' region (these parameters can be changed; they are hard-coded only for the PoC)
      3. outputs the timestamp reformatted into the 'cron' syntax expected by the workflow's schedule trigger (a sketch of this script appears after the list)
    4. takes that output and replaces the templated schedule in the newly copied/overwritten scheduledWorkflow.yml workflow

    5. commits the updated workflow

    6. the scheduledWorkflow workflow is now scheduled to run at the time forecasted to have the lowest carbon intensity

    7. scheduledWorkflow.yml ends by deleting itself, since its cron entry pins a specific day and month and would otherwise fire again every year
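A minimal sketch of what scheduleWorkflowUsingHostedCarbonAwareSDK.bash does. The host URL is an assumption (the PoC used a hosted test instance), and the query parameters and JSON field names follow our reading of the CarbonAwareSDK's forecast endpoint, so treat this as illustrative rather than the exact script:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed host for the hosted test instance of the CarbonAwareSDK.
SDK_HOST="https://carbon-aware-api.example.com"

# Ask for the best 10-minute window in 'ukwest' (the PoC's hard-coded parameters).
FORECAST=$(curl -s "${SDK_HOST}/emissions/forecasts/current?location=ukwest&windowSize=10")

# Take the first optimal data point's timestamp, e.g. 2023-06-01T13:45:00+00:00.
OPTIMAL_TIME=$(echo "${FORECAST}" | jq -r '.[0].optimalDataPoints[0].timestamp')

# Reformat into the five-field cron expression GitHub's schedule trigger uses:
# minute hour day-of-month month day-of-week (UTC; GNU date on the runner).
CRON=$(date -u -d "${OPTIMAL_TIME}" '+%M %H %d %m *')

# Substitute into the copied workflow's templated schedule placeholder.
sed -i "s|\${SCHEDULE}|${CRON}|" .github/workflows/scheduledWorkflow.yml

echo "scheduledWorkflow.yml scheduled for: ${CRON}"
```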

Notes:

  • You can still manually trigger the scheduledWorkflow workflow using 'workflow_dispatch:' (see the snippet below)
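For reference, the trigger block in the generated scheduledWorkflow.yml looks something like this (the placeholder name matches the template; the rest is standard GitHub Actions syntax):

```yaml
on:
  schedule:
    - cron: '${SCHEDULE}'  # placeholder replaced by the bash script
  workflow_dispatch:       # keeps manual triggering available
```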

Problems:

  • Requires a paid subscription to either the ElectricityMaps API or the WattTime API before it can forecast emissions data
  • Instead of the hosted test version, it will have to clone the carbon-aware-sdk locally and then configure it, inside the delayCalculator.yml workflow, with the credentials from the account(s) above


carbonawarescheduler's Issues

Use CLI version of SDK

Currently a hosted version of the CarbonAwareSDK is being used; however, this isn't owned by us, only returns test data, is an old version of the API, and could be taken down at any minute.

This is going to require access to the pro tier of either WattTime or ElectricityMaps, a free month if they offer one, or the test data returned by one of the data providers (WattTime does this if you give it a specific region).

  • Let's keep the existing method of using the hosted SDK in place so we can reference it if we find the CLI no good.

TODO

Create a new delayCalculator.yml workflow

  • clone the carbon-aware-sdk repo to get the CLI source code
  • add configuration values (username & password) for the data source (WattTime or ElectricityMaps), which must be kept in this repo's secrets or passed as input to the GitHub action using a secret
  • install/build whatever is needed to use the CLI version
  • use the CLI to get the same output as the hosted version gives (a rough workflow sketch follows)
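A rough sketch of how delayCalculator.yml could wire this together. The carbon-aware-sdk is a .NET project, but the project path, CLI command syntax, and .NET version below are assumptions; check the SDK's docs before copying any of it:

```yaml
name: delayCalculator
on:
  pull_request:
    types: [closed]

jobs:
  calculate-delay:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Fetch the SDK source so the CLI can be built locally.
      - name: Clone carbon-aware-sdk
        run: git clone https://github.com/Green-Software-Foundation/carbon-aware-sdk.git

      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '8.x'  # assumption; match whatever the SDK targets

      # Credentials come from this repo's secrets; the exact variable names the
      # SDK expects must be taken from its configuration docs.
      - name: Run the CLI (command syntax is an assumption)
        env:
          WATTTIME_USERNAME: ${{ secrets.WATTTIME_USERNAME }}
          WATTTIME_PASSWORD: ${{ secrets.WATTTIME_PASSWORD }}
        run: |
          cd carbon-aware-sdk/src/CarbonAware.CLI
          dotnet run -- emissions-forecasts --location ukwest
```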


Turn into a GitHub action

To make this reusable we want to turn this repository into a GitHub Action that other repositories can then reference.

It would be used to schedule an existing workflow in the repository to run at the time of forecasted lowest carbon intensity.

It would:

  1. use inputs to calculate the optimal time and region using the CarbonAwareSDK
  2. copy the template workflow into the correct directory and add the necessary parts to it:
    • replace the schedule placeholder ${SCHEDULE} with the correct time
    • replace the region string placeholder ${REGION} (optional for now, as it can currently only cater for one region)
    • add a git config step
    • add a self-deletion step at the end (both sketched after this list)
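The git config and self-deletion steps appended to the template might look like this in the generated workflow (a sketch; the bot identity and commit message are illustrative):

```yaml
steps:
  # ... the workflow's real work runs here at the low-carbon time ...

  # Self-deletion: remove this workflow so its cron entry, which pins a
  # specific day and month, cannot fire again next year.
  - name: Delete this workflow
    run: |
      git config user.name "github-actions[bot]"
      git config user.email "github-actions[bot]@users.noreply.github.com"
      git rm .github/workflows/scheduledWorkflow.yml
      git commit -m "Remove one-shot scheduled workflow"
      git push
```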

Possible Inputs:

  • regions String[] (the first iteration should just handle a one-element array, or a plain string, but we want the future capability of handling multiple regions)
  • job length in minutes Int
  • max waiting time in minutes Int
  • template workflow to copy and modify
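If this became a composite action, its action.yml inputs could mirror that list; every name and default below is hypothetical:

```yaml
# action.yml (hypothetical input names and defaults)
name: carbon-aware-scheduler
inputs:
  regions:
    description: 'Region(s) to query; a single region string for the first iteration'
    required: true
  job-length-minutes:
    description: 'Expected job duration in minutes'
    required: false
    default: '10'
  max-wait-minutes:
    description: 'Maximum acceptable delay before the job must run'
    required: false
    default: '1440'
  template-workflow:
    description: 'Path to the template workflow to copy and modify'
    required: true
runs:
  using: composite
  steps:
    - run: echo "scheduling logic goes here"
      shell: bash
```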

Smart estimation for GitHub-hosted runners

GitHub-hosted runners do not currently reveal their exact location. GitHub uses the following Azure regions (https://github.com/orgs/community/discussions/24969):

East US (eastus)
East US 2 (eastus2)
West US 2 (westus2)
Central US (centralus)
South Central US (southcentralus)

Can we estimate the most likely best time?

Ideas:

  1. sum up each region's forecasted carbon intensity at each timestamp and select the lowest total (my current favourite; prototyped below)
  2. look at the optimalDataPoints array for each region and select the timestamp with the highest count, though the counts could all be 1 if every region differs

Open to ideas here on what the best method could be.
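Idea 1 can be prototyped in a few lines of jq, assuming the forecast response is an array of per-region objects each carrying a forecastData array of {timestamp, value} entries (field names are our reading of the SDK's forecast shape):

```bash
# Sum each timestamp's forecast value across all regions and pick the lowest
# total. Only meaningful if all regions report on the same timestamp grid.
jq -r '
  [ .[].forecastData[] ]
  | group_by(.timestamp)
  | map({timestamp: .[0].timestamp, total: (map(.value) | add)})
  | min_by(.total)
  | .timestamp
' forecasts.json
```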

Once this is completed we can add a new boolean input to the action for GitHub-hosted runners: if true, use this new method and ignore the existing regions input; if false, use the existing method.

Requires completion of #17 first.

Design Decision: CarbonAwareSDK vs directly using data sources

For forecast data, only WattTime gives this information out for free; the other data source, ElectricityMaps, only provides it on a paid tier.

Instead of cloning and installing the CarbonAwareSDK on every workflow run, which isn't ideal, should we not just call the WattTime API directly? (A hedged sketch follows.)
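For comparison, a direct call is only two requests. The endpoints below follow WattTime's v2 API as we understand it at the time of writing; verify against their current docs before building on this:

```bash
# Exchange basic-auth credentials for a short-lived token (WattTime API v2).
TOKEN=$(curl -s -u "${WATTTIME_USERNAME}:${WATTTIME_PASSWORD}" \
  "https://api2.watttime.org/v2/login" | jq -r '.token')

# Fetch the marginal-emissions forecast for a balancing authority.
curl -s -H "Authorization: Bearer ${TOKEN}" \
  "https://api2.watttime.org/v2/forecast?ba=CAISO_NORTH"
```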

Do we want to be able to connect to the other data sources provided by CarbonAwareSDK?

You can only set up one data source at a time, so supporting multiple is of limited use.

Handle multiple regions

Currently the simple bash script handles only one region. This is because it retrieves the first region's optimalDataPoints object, even though every region in the response generates one.

Ideally we want to be able to pass in an array of regions and have the most optimal region and time output.

This will require moving from bash scripts to something more sophisticated that does proper JSON object parsing (wait for #18 to be completed).
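That said, jq can already handle the multi-region selection from bash in the meantime, assuming each element of the response carries its region's optimalDataPoints with location, timestamp, and value fields (per the SDK's forecast shape):

```bash
# Across every region in the response, pick the optimal data point with the
# lowest forecast value and report where and when to run.
jq -r '
  [ .[].optimalDataPoints[] ]
  | min_by(.value)
  | "\(.location) \(.timestamp)"
' forecasts.json
```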

Come up with proposals for how we integrate this action for the blog publishing

There are different possibilities for how we could use this action for the blog publishing, including:

  • SDK hosting - we host the SDK vs. cloning and building the CLI in the workflow
  • runner hosting - self-hosted vs. GitHub-hosted (worse estimation for GitHub-hosted, as there are five possible regions and none are great for us, being US based)
  • skip the SDK and build our own predictive AI model to circumvent the paid-for endpoints of the data sources

I think we need to come up with some proposals that explain the work and effort required.

Use more advanced language

Currently all the logic is commands contained in workflow files. However, for better JSON parsing, moving to an alternative language will be best going forward.
