custom-metric-reporter-gcp's Introduction

Report GCP Custom Metrics using Go

This Go application reports custom metrics to GCP Monitoring. It reads a list of key-value pairs from an input file and, for each pair, creates the corresponding custom metric or, if the metric already exists, appends the data to it.

For each pair in the file, the key becomes the name of the custom metric, and the value becomes part of the metric's label rather than the metric's actual value. If you need different behavior, you can change this in metrics.go.
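As a rough illustration of this mapping, the shell sketch below reads a key-value file and prints, for each pair, the metric name and label value that would result. It makes no calls to GCP Monitoring. The function name preview_metrics is hypothetical, the key=value line format is taken from the commands later in this README, and treating PROJECT_ID, INSTANCE_ID, and ZONE_ID as metadata rather than metrics is an assumption, not a description of what metrics.go does.

```shell
# preview_metrics FILE
# Hypothetical helper: prints how each key=value pair would map to a
# custom metric name and a label value. Makes no API calls.
preview_metrics() {
  local file="$1" key value
  # The "|| [ -n ... ]" part handles a last line without a trailing newline.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      # Skip blank lines and comment lines.
      ''|'#'*) continue ;;
    esac
    case "$key" in
      # Assumption: these keys describe the instance and are not metrics.
      PROJECT_ID|INSTANCE_ID|ZONE_ID)
        printf 'metadata %s=%s\n' "$key" "$value" ;;
      *)
        printf 'metric name=%s label=%s\n' "$key" "$value" ;;
    esac
  done < "$file"
}
```

Calling preview_metrics ~/batch_info.txt on a file like the one built below would print one metadata line per ID and one metric line per batch number.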

Before Starting

  1. Install Golang and Git on your instance

    sudo apt install -y git-all
    wget -c https://go.dev/dl/go1.19.linux-amd64.tar.gz
    sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.19.linux-amd64.tar.gz
    export PATH=$PATH:/usr/local/go/bin
    
  2. Install the Google Cloud Ops Agent

     # install gcp ops agent 
     curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
     sudo bash add-google-cloud-ops-agent-repo.sh --also-install
    
     # for GPU monitoring
     sudo mkdir -p /opt/google 
     cd /opt/google
     sudo git clone https://github.com/GoogleCloudPlatform/compute-gpu-monitoring.git 
     cd /opt/google/compute-gpu-monitoring/linux
     sudo python3 -m venv venv
     sudo venv/bin/pip install wheel
     sudo venv/bin/pip install -Ur requirements.txt
     sudo cp /opt/google/compute-gpu-monitoring/linux/systemd/google_gpu_monitoring_agent_venv.service /lib/systemd/system
     sudo systemctl daemon-reload
     sudo systemctl --no-reload --now enable /lib/systemd/system/google_gpu_monitoring_agent_venv.service
     cd ~
    
  3. Clone and build the repository

     git clone https://github.com/garg02/custom-metric-reporter-gcp.git
     cd custom-metric-reporter-gcp
     go mod init metrics
     go mod tidy
     go build metrics.go
     chmod +x metrics
     sudo cp metrics /usr/local/bin/
     cd ~
    

Add custom metrics and required metadata to file

echo "PROJECT_ID=$(gcloud info --format='value(config.project)')" > ~/batch_info.txt
echo "INSTANCE_ID=$(curl http://metadata.google.internal/computeMetadata/v1/instance/id -H Metadata-Flavor:Google)" >> ~/batch_info.txt
echo "ZONE_ID=$(curl http://metadata.google.internal/computeMetadata/v1/instance/zone -H Metadata-Flavor:Google | rev | cut -d/ -f1 | rev)" >> ~/batch_info.txt
        
# in this case the Unix timestamp is the batch_number
echo "cpu_batch_num=$(date +%s)" >> ~/batch_info.txt
echo "gpu_batch_num=$(date +%s)" >> ~/batch_info.txt
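Before scheduling the reporter, it can help to confirm that the file contains the metadata keys written above, since a command that prints nothing leaves an empty value behind. The following is a minimal sketch. The name check_batch_info is hypothetical, and the assumption that all three of PROJECT_ID, INSTANCE_ID, and ZONE_ID must be present and non-empty is inferred from the commands above rather than from metrics.go.

```shell
# check_batch_info FILE
# Hypothetical helper: returns 0 if every assumed-required metadata key
# has a non-empty value in FILE; otherwise reports the key and returns 1.
check_batch_info() {
  local file="$1" key missing=0
  for key in PROJECT_ID INSTANCE_ID ZONE_ID; do
    # Match "KEY=" followed by at least one character.
    if ! grep -q "^${key}=." "$file"; then
      echo "missing or empty: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_batch_info ~/batch_info.txt && echo "batch_info.txt looks complete"
```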

Add a cron job that reports the custom metrics every minute

echo -e "* * * * * /usr/local/bin/metrics -f batch_info.txt >> metrics.out 2>> metrics.err" | crontab -u $USER -
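Note that piping into "crontab -" installs the piped text as the user's whole crontab, so any existing entries are replaced. If the user already has other cron jobs, a sketch like the following keeps them and adds the reporter line only once. The function name merge_cron_entry is hypothetical. It only transforms text, so its output still has to be piped to "crontab -".

```shell
# merge_cron_entry ENTRY
# Hypothetical helper: reads an existing crontab on stdin and writes it to
# stdout with ENTRY appended, unless an identical line is already present.
merge_cron_entry() {
  local entry="$1" existing
  existing="$(cat)"
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  # -F: fixed string, -x: whole-line match, so "*" is not a pattern.
  if ! printf '%s\n' "$existing" | grep -qFx -- "$entry"; then
    printf '%s\n' "$entry"
  fi
}

# Example (same schedule and command as above):
#   ENTRY='* * * * * /usr/local/bin/metrics -f batch_info.txt >> metrics.out 2>> metrics.err'
#   crontab -l 2>/dev/null | merge_cron_entry "$ENTRY" | crontab -
```

Running the example twice leaves a single copy of the entry, because the second run finds the identical line and appends nothing.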

Edit the text file to change the values of the custom metrics

sed -i "s/^cpu_batch_num=.*/cpu_batch_num=$(date +%s)/" ~/batch_info.txt
sed -i "s/^gpu_batch_num=.*/gpu_batch_num=$(date +%s)/" ~/batch_info.txt
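The sed commands above only change a key that already exists in the file. As a sketch, a small helper (hypothetically named set_pair) can update the key if it is present and append it otherwise. It assumes GNU sed, keys without characters that are special in sed patterns, and values without "/", "&", or backslashes.

```shell
# set_pair FILE KEY VALUE
# Hypothetical helper: replaces the line "KEY=..." in FILE, or appends
# "KEY=VALUE" if no such line exists yet.
set_pair() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    sed -i "s/^${key}=.*/${key}=${value}/" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example:
#   set_pair ~/batch_info.txt cpu_batch_num "$(date +%s)"
```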

Note: The cron job requires the key-value pairs to be stored in a file on persistent storage, because environment variables are reset on each run.

Appendix:

  1. FinalMonitoring.json is included as an example of aligning and joining metrics with MQL.
  2. stress.sh and minimap.sh are wrapper scripts that use the metrics binary to report the batch currently being run.
