missioncontrol-v2

An alternate view of crash and stability

Installation instructions

This code is designed to be easy to install and to produce repeatable results: if the underlying data doesn't change, the output should not either.

This code should work either inside GCP or on a local computer. In either case, we recommend a powerful machine with at least 4 cores and 8 GB or more of memory. If you are using Docker, increase the resources available to containers if you haven't already: the defaults are likely to be insufficient.

You will also need a GCP service account with permission to read from the datasets in fx-data-shared-prod, to write the temporary BigQuery tables used during the run, and to write to a Cloud Storage bucket. Typically this is set up in a sandbox project.

After setting up the service account, download its credentials (a JSON key file) into a file called gcloud.json in the root of your checkout.
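Before starting a long run, it can help to confirm that gcloud.json really is a service account key. The sketch below assumes the standard GCP key-file layout (fields such as "type", "project_id", "client_email" and "private_key"); the function name check_service_account_key is hypothetical and not part of this repository. It only inspects the file and does not verify the account's permissions.

```python
import json
from pathlib import Path

# Fields present in a standard GCP service account JSON key file.
REQUIRED_KEYS = {"type", "project_id", "client_email", "private_key"}


def check_service_account_key(path: str = "gcloud.json") -> str:
    """Return the service account email from a key file, or raise ValueError.

    Hypothetical helper: a sanity check of the file's shape only. It does not
    confirm that the account can read BigQuery or write to Cloud Storage.
    """
    key_path = Path(path)
    if not key_path.is_file():
        raise ValueError(f"{key_path} not found; download the key into the checkout root")
    data = json.loads(key_path.read_text())
    if not isinstance(data, dict):
        raise ValueError(f"{key_path} does not contain a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"{key_path} is missing fields: {sorted(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"{key_path} is not a service account key (type={data['type']!r})")
    return data["client_email"]
```

Running `print(check_service_account_key())` from the root of the checkout should print the account's email address if the file looks right.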

Development instructions

The ETL pipeline runs a series of scripts in succession, performing the following operations:

  • Download the latest crash and usage data for a recent set of versions, and upload the results to a temporary table in BigQuery.
  • Build a statistical model from the downloaded data together with historical data we have seen before.
  • Generate an R Markdown-based report from the output of the model and upload it to Google Cloud Storage.
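In this repository the stages are driven by complete.runner.sh. As a rough illustration of the "run in succession, stop at the first failure" pattern described above, here is a minimal Python sketch. The stage commands in PIPELINE are hypothetical placeholders, not the repository's actual script names.

```python
import subprocess
import sys
from typing import List, Sequence

# Hypothetical placeholder stages; each just prints what the real step would do.
PIPELINE: List[List[str]] = [
    [sys.executable, "-c", "print('download crash/usage data and upload to BigQuery')"],
    [sys.executable, "-c", "print('fit the statistical model')"],
    [sys.executable, "-c", "print('render the R Markdown report and upload to GCS')"],
]


def run_pipeline(stages: Sequence[Sequence[str]]) -> int:
    """Run each stage in order, stopping at the first non-zero exit code.

    Returns the number of stages that completed successfully.
    """
    completed = 0
    for cmd in stages:
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            print(f"stage {completed + 1} failed with exit code {result.returncode}",
                  file=sys.stderr)
            break
        completed += 1
    return completed
```

Stopping early matters here because each stage consumes the previous stage's output: fitting a model on a partially uploaded table would silently produce a misleading report.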

Option 1: Use the Docker container

This is the most deterministic approach and closest to what we are using in production, though it is likely to be slower on non-Linux hosts. These instructions assume that you have Docker and a basic set of developer tools installed on your machine.

First, build the container:

make build

Then, create a shell session inside it:

make shell

Then skip ahead to the Running section to run the code.

Option 2: Use Conda

This approach runs directly on your machine rather than in a container, and should be much faster on macOS. These instructions assume you have either conda or Miniconda installed, as well as the Google Cloud SDK.

From the root of your checkout, create and activate a conda environment in two steps:

conda env create -n mc2 -f environment.yml
conda activate mc2

Running

Once you have a shell (either in the docker container or activated conda environment), set some environment variables corresponding to your GCP settings:

export GOOGLE_APPLICATION_CREDENTIALS=$PWD/gcloud.json
export RAW_OUTPUT_TABLE=missioncontrol_v2_test_raw
export MODEL_OUTPUT_TABLE=missioncontrol_v2_test_model
export GCP_PROJECT_ID=my-gcp-project-id
export GCS_OUTPUT_PREFIX=gs://my-cloud-storage-bucket

The RAW_OUTPUT_TABLE and MODEL_OUTPUT_TABLE settings specify the names of the BigQuery tables that hold temporary data written during the run.
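A missing or mistyped variable is easier to catch before the run starts than halfway through it. The sketch below gathers the five variables above into one object and fails fast; the RunConfig class and load_config function are hypothetical, and the only format check it applies (a gs:// prefix on GCS_OUTPUT_PREFIX) is an assumption based on the example value.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RunConfig:
    credentials_path: str
    raw_output_table: str
    model_output_table: str
    project_id: str
    gcs_output_prefix: str


# Maps each environment variable from the export block above to a RunConfig field.
ENV_FIELDS = {
    "GOOGLE_APPLICATION_CREDENTIALS": "credentials_path",
    "RAW_OUTPUT_TABLE": "raw_output_table",
    "MODEL_OUTPUT_TABLE": "model_output_table",
    "GCP_PROJECT_ID": "project_id",
    "GCS_OUTPUT_PREFIX": "gcs_output_prefix",
}


def load_config(env: Mapping[str, str] = os.environ) -> RunConfig:
    """Build a RunConfig from the environment, raising ValueError on problems."""
    missing = [name for name in ENV_FIELDS if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    prefix = env["GCS_OUTPUT_PREFIX"]
    if not prefix.startswith("gs://"):
        raise ValueError(f"GCS_OUTPUT_PREFIX must start with gs://, got {prefix!r}")
    return RunConfig(**{field: env[name] for name, field in ENV_FIELDS.items()})
```

Accepting a plain mapping instead of reading os.environ directly keeps the function easy to test with a dictionary.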

Then run the model:

./complete.runner.sh

If you are running on an underpowered machine, or just want results more quickly, you can enable "simple" mode, which (as the name implies) speeds up model generation significantly by using a simplified statistical model:

SIMPLE=1 ./complete.runner.sh

Gotchas

If you run the data-pulling code shortly after a new release without having pulled data on the preceding days, data for those days may be missing for the previous major release versions.
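One way to spot this before modeling is to compare the days that actually have rows for a version against the range you expect. The sketch below assumes you have already queried the distinct dates present for one version (that query is not shown); the missing_days helper is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List


def missing_days(present: Iterable[date], start: date, end: date) -> List[date]:
    """Return the days in the inclusive range [start, end] with no data."""
    have = set(present)
    expected = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [day for day in expected if day not in have]
```

For example, `missing_days([date(2024, 1, 1), date(2024, 1, 3)], date(2024, 1, 1), date(2024, 1, 4))` returns January 2 and January 4.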

To avoid this missing-data problem, you can copy the BigQuery table used in production (moz-fx-data-derived-datasets.analysis.missioncontrol_v2_raw_data) to your own GCP project.
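If you prefer to do the copy from Python, the google-cloud-bigquery client provides `Client.copy_table`. The sketch below assumes the library is installed, that your credentials can read the production table, and that the destination dataset already exists in your project; the dataset name and the helper functions are hypothetical, and keeping the production table name is just one possible choice.

```python
SOURCE_TABLE = "moz-fx-data-derived-datasets.analysis.missioncontrol_v2_raw_data"


def destination_table_id(project_id: str, dataset: str,
                         table: str = "missioncontrol_v2_raw_data") -> str:
    """Build a fully qualified BigQuery table ID in your own project."""
    return f"{project_id}.{dataset}.{table}"


def copy_production_raw_table(project_id: str, dataset: str) -> str:
    """Copy the production raw-data table into `dataset` and return its ID."""
    # Imported lazily so destination_table_id works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    destination = destination_table_id(project_id, dataset)
    job = client.copy_table(SOURCE_TABLE, destination)
    job.result()  # Block until the copy job finishes (raises on failure).
    return destination
```

For example, `copy_production_raw_table("my-gcp-project-id", "my_sandbox_dataset")` would create my-gcp-project-id.my_sandbox_dataset.missioncontrol_v2_raw_data.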

Contributors

saptarshiguha, wcbeard, wlach
