
Service Capacity Modeling

A generic toolkit for modeling capacity requirements in the cloud. All pricing information included in this repository reflects public prices.

NOTE: Netflix confidential information should never enter this repo. Please consider this repository public when making changes to it.

Trying it out

Run the tests:

# Test the capacity planner on included netflix models
$ tox -e py38

# Run a single test with a debugger attached if the test fails
$ .tox/py38/bin/pytest -n0 -k test_java_heap_heavy --pdb --pdbcls=IPython.terminal.debugger:Pdb

# Verify all type contracts
$ tox -e mypy

Run IPython to use the library interactively:

tox -e dev -- ipython

Example of Provisioning a Database

Fire up ipython and let's capacity plan a Tier 1 (important to the product aka "prod") Cassandra database.

from service_capacity_modeling.interface import CapacityDesires
from service_capacity_modeling.interface import FixedInterval, Interval
from service_capacity_modeling.interface import QueryPattern, DataShape

db_desires = CapacityDesires(
    # This service is important to the business, not critical (tier 0)
    service_tier=1,
    query_pattern=QueryPattern(
        # Not sure exactly how much QPS we will do, but we think around
        # 10,000 reads and 10,000 writes per second.
        estimated_read_per_second=Interval(
            low=1000, mid=10000, high=100000, confidence=0.9
        ),
        estimated_write_per_second=Interval(
            low=1000, mid=10000, high=100000, confidence=0.9
        ),
    ),
    # Not sure how much data, but we think it'll be below 1 TiB
    data_shape=DataShape(
        estimated_state_size_gib=Interval(low=100, mid=100, high=1000, confidence=0.9),
    ),
)

Now we can load up some models and do some capacity planning:

from service_capacity_modeling.capacity_planner import planner
from service_capacity_modeling.models.org import netflix
import pprint

# Load up the Netflix capacity models
planner.register_group(netflix.models)

cap_plan = planner.plan(
    model_name="org.netflix.cassandra",
    region="us-east-1",
    desires=db_desires,
    # Simulate the possible requirements 512 times
    simulations=512,
    # Request 3 diverse hardware families to be returned
    num_results=3,
)

# The range of requirements in hardware resources (CPU, RAM, Disk, etc ...)
requirements = cap_plan.requirements

# The ordered list of least regretful choices for the requirement
least_regret = cap_plan.least_regret

# Show the range of requirements for a single zone
pprint.pprint(requirements.zonal[0].dict(exclude_unset=True))

# Show our least regretful choices of hardware in least-regret order.
# For example, if we can buy the first set of computers we would
# prefer to do that, but we might not have availability in that
# family, in which case we'd buy the second one.
for choice in range(3):
    num_clusters = len(least_regret[choice].candidate_clusters.zonal)
    print(f"Our #{choice + 1} choice is {num_clusters} zones of:")
    pprint.pprint(least_regret[choice].candidate_clusters.zonal[0].dict(exclude_unset=True))

Note that we can supply more information when we know more about the use case, but each model (e.g. Cassandra) provides reasonable defaults.

For example, we can specify a lot more information:

db_desires = CapacityDesires(
    # This service is important to the business, not critical (tier 0)
    service_tier=1,
    query_pattern=QueryPattern(
        # Not sure exactly how much QPS we will do, but we think around
        # 50,000 reads and 45,000 writes per second with a rather narrow
        # bound
        estimated_read_per_second=Interval(
            low=40_000, mid=50_000, high=60_000, confidence=0.9
        ),
        estimated_write_per_second=Interval(
            low=42_000, mid=45_000, high=50_000, confidence=0.9
        ),
        # This use case might do some partition scan queries that are
        # somewhat expensive, so we hint at a rather expensive on-CPU
        # time that a read will consume across the entire cluster.
        estimated_mean_read_latency_ms=Interval(
            low=0.1, mid=4, high=20, confidence=0.9
        ),
        # Writes at LOCAL_ONE are pretty cheap
        estimated_mean_write_latency_ms=Interval(
            low=0.1, mid=0.4, high=0.8, confidence=0.9
        ),
        # We want single digit millisecond latency. Note that this is
        # not a p99 of 10ms; it defines the interval within which 98%
        # of latencies fall: between 0.4 and 10 milliseconds. Think of:
        #   low = "the minimum reasonable latency"
        #   high = "the maximum reasonable latency"
        #   mid = "value between low and high such that I want my distribution
        #          to skew left or right"
        read_latency_slo_ms=FixedInterval(
            low=0.4, mid=4, high=10, confidence=0.98
        ),
        write_latency_slo_ms=FixedInterval(
            low=0.4, mid=4, high=10, confidence=0.98
        )
    ),
    # Not sure how much data, but we think it'll be below 1 TiB
    data_shape=DataShape(
        estimated_state_size_gib=Interval(low=100, mid=500, high=1000, confidence=0.9),
    ),
)
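
Planning with these refined desires is the same call as before; only the inputs change:

cap_plan = planner.plan(
    model_name="org.netflix.cassandra",
    region="us-east-1",
    desires=db_desires,
)

# Inspect the top choice exactly as before
pprint.pprint(cap_plan.least_regret[0].candidate_clusters.zonal[0].dict(exclude_unset=True))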

Example of Provisioning a Caching Cluster

In this example we tweak the QPS up, the on-CPU time of operations down, and the SLOs down. This more closely approximates a caching workload:

cache_desires = CapacityDesires(
    service_tier=1,
    query_pattern=QueryPattern(
        # Not sure exactly how much QPS we will do, but we think around
        # 100,000 reads and 20,000 writes per second.
        estimated_read_per_second=Interval(
            low=10_000, mid=100_000, high=1_000_000, confidence=0.9
        ),
        estimated_write_per_second=Interval(
            low=1_000, mid=20_000, high=100_000, confidence=0.9
        ),
        # Memcache is consistently fast at queries
        estimated_mean_read_latency_ms=Interval(
            low=0.05, mid=0.2, high=0.4, confidence=0.9
        ),
        estimated_mean_write_latency_ms=Interval(
            low=0.05, mid=0.2, high=0.4, confidence=0.9
        ),
        # Caches usually have tighter SLOs
        read_latency_slo_ms=FixedInterval(
            low=0.4, mid=0.5, high=5, confidence=0.98
        ),
        write_latency_slo_ms=FixedInterval(
            low=0.4, mid=0.5, high=5, confidence=0.98
        )
    ),
    # Not sure how much data, but we think it'll be below 500 GiB
    data_shape=DataShape(
        estimated_state_size_gib=Interval(low=100, mid=200, high=500, confidence=0.9),
    ),
)

cache_cap_plan = planner.plan(
    model_name="org.netflix.cassandra",
    region="us-east-1",
    desires=cache_desires,
    allow_gp2=True,
)

requirements = cache_cap_plan.requirements
least_regret = cache_cap_plan.least_regret
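
As before, we can inspect the least regretful choice:

pprint.pprint(least_regret[0].candidate_clusters.zonal[0].dict(exclude_unset=True))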

Notebooks

We have a demo notebook in the notebooks directory that you can use to experiment. Start it with:

tox -e notebook -- jupyter notebook notebooks/demo.ipynb

Development

To contribute to this project:

  1. Make your change in a branch. Consider making a new model if you are making significant changes, and register it under a different name.
  2. Write a unit test using pytest in the tests folder.
  3. Ensure your tests pass (or debug them) with:
tox -e py38 -- -k test_<your_functionality> --pdb --pdbcls=IPython.terminal.debugger:Pdb
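
For example, a minimal test might look like the following sketch (the model name and desires here are placeholders; assert whatever behavior your change is supposed to guarantee):

from service_capacity_modeling.capacity_planner import planner
from service_capacity_modeling.interface import CapacityDesires
from service_capacity_modeling.interface import Interval, QueryPattern
from service_capacity_modeling.models.org import netflix

planner.register_group(netflix.models)


def test_my_functionality():
    desires = CapacityDesires(
        service_tier=1,
        query_pattern=QueryPattern(
            estimated_read_per_second=Interval(
                low=1_000, mid=10_000, high=100_000, confidence=0.9
            ),
            estimated_write_per_second=Interval(
                low=1_000, mid=10_000, high=100_000, confidence=0.9
            ),
        ),
    )
    plan = planner.plan(
        model_name="org.netflix.cassandra",
        region="us-east-1",
        desires=desires,
    )
    # We should get back at least one least-regret choice
    assert len(plan.least_regret) > 0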

Release

TODO


Issues

For rightsizing, ask for existing compute usage

Right now most models are split into two parts:

  1. Try to determine the resources you need for a desire using math on the desire (CPU, RAM, disk, network, etc.).
  2. Size and price clusters based on that particular service's deployment mode (e.g. C* has to scale by factors of 2, and deploys to zones).

This makes sense for provisioning, where we are trying to guess CPU time from e.g. payload sizes and RPS. For rightsizing it might make more sense to just provide existing choices in the desire along with utilization, and then the model can produce ideal hardware for that specific requirement. Perhaps modify CapacityDesires to have an additional field called existing_deployment that takes either a Requirements or a Clusters. Instead of supplying a frequency to the requirements, it could carry a hardware shape and count (the cpu_count would be cpu * utilization, for example).

Then models can short-circuit the requirements generation, or at least use the provided numbers as good defaults. RAM is the only one that seems tricky to me and might require merging.
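
A rough sketch of what that could look like; every name here is hypothetical and not part of the current interface:

from pydantic import BaseModel

# Hypothetical only: neither this class nor the existing_deployment
# field exists in the library today.
class ExistingDeployment(BaseModel):
    instance_name: str        # e.g. "m5d.2xlarge"
    count: int                # instances currently deployed
    cpu_utilization: float    # observed fraction, e.g. 0.45 for 45%
    memory_utilization: float
    disk_utilization: float

# CapacityDesires would then grow an optional field such as
#   existing_deployment: Optional[ExistingDeployment] = None
# which models could use to short-circuit requirement generation
# (e.g. effective cpu_count = cpu * cpu_utilization).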

Adding a new model

Greetings,

I was asked to add a new model to your capacity planner. Are there general directions or documentation on what it takes to add a new model? I see that the existing models vary significantly in how they are implemented. Since the model is provided to the capacity planner, there must be a protocol that models follow so the planner understands them, but if there is, I can't find it.

Thank you,

Improve C* scaling logic when including EVCache in KV plan

In our current logic (https://github.com/Netflix-Skunkworks/service-capacity-modeling/blob/main/service_capacity_modeling/models/org/netflix/key_value.py#L85), we scale the C* cluster by a factor of 1 - estimated_kv_cache_hit_rate, where estimated_kv_cache_hit_rate is configurable (default 0.8).

Per a previous convo with @jolynch and @szimmer1, we discussed possibly tying the read/write ratio from the user desires into this calculation.

One toy example:

estimated_cache_hit_rate = extra_model_arguments.get("estimated_cache_hit_rate", 0.8)
estimated_cache_miss_rate = 1 - estimated_cache_hit_rate
rps_interval.scale(min(estimated_cache_miss_rate, max(0.1, 1 - read_write_ratio)))
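
Filling in the names the toy example leaves undefined, the read/write ratio could come straight from the user desires. A sketch only, assuming Interval.scale behaves as the snippet above implies:

# Derive the read/write ratio from the desires themselves
reads = desires.query_pattern.estimated_read_per_second.mid
writes = desires.query_pattern.estimated_write_per_second.mid
read_write_ratio = reads / max(reads + writes, 1)

estimated_cache_hit_rate = extra_model_arguments.get("estimated_cache_hit_rate", 0.8)
estimated_cache_miss_rate = 1 - estimated_cache_hit_rate

# Scale reads by the cache miss rate, but never below 10% of the original
rps_interval = desires.query_pattern.estimated_read_per_second
rps_interval.scale(min(estimated_cache_miss_rate, max(0.1, 1 - read_write_ratio)))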

Unclear repetition

I'm working on summarizing the cost, CPU, and disk (local & attached) for both regional and zonal clusters. I want there to be more consistency in the way repetition is represented (see the normalization sketch below).

us-east-1: # trimmed
us-west-2:
  least_regret:
    - candidate_clusters:
        total_annual_cost: # redacted
        zonal:
          - cluster_type: cassandra # trimmed
          - cluster_type: cassandra # trimmed
          - cluster_type: cassandra # trimmed
        regional:
          - cluster_type: dgwkv
            total_annual_cost: # redacted
            count: 3
            instance:
              total_annual_cost: # redacted
              name: r5.large
            attached_drives:
              - name: gp2
                size_gib: 20
                annual_cost_per_gib: # redacted
                annual_cost_per_read_io: # redacted
                annual_cost_per_write_io: # redacted

In the sample above there are:

  • an enumerated list of regions
  • zonal clusters duplicated once per entry, with no count
  • regional clusters that instead carry a count property
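
One way to normalize this, sketched against the structure above (attribute names in the actual library may differ), is to collapse the duplicated zonal entries into (cluster, count) pairs, mirroring the count that regional clusters already carry:

from collections import Counter

def dedupe_zonal(zonal_clusters):
    # Use the pydantic JSON form as a stable identity key so identical
    # zonal cluster definitions collapse into a single counted entry
    counted = Counter(c.json(exclude_unset=True) for c in zonal_clusters)
    return [(cluster, count) for cluster, count in counted.items()]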

Capacity plans should return recommended autoscaling policies

Right now we just make a recommendation like "12 m5d.2xlarge", but for software that can autoscale (stateless Java apps, Elasticsearch, etc.) it would be nice if we could return a hint of the autoscaling policy.

Step 1: Define how we will represent a scaling policy (e.g. how to represent various metrics like CPU utilization; a sketch follows below)
Step 2: Make the models return them
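
A possible starting point for Step 1, purely illustrative (none of these names exist in the library today):

from enum import Enum
from pydantic import BaseModel

class ScalingMetric(str, Enum):
    cpu_utilization = "cpu_utilization"
    network_bytes_per_second = "network_bytes_per_second"

# Hypothetical representation of a target-tracking style policy
class ScalingPolicy(BaseModel):
    metric: ScalingMetric
    target_value: float   # e.g. 0.5 for 50% average CPU utilization
    min_capacity: int     # floor on instance count
    max_capacity: int     # ceiling on instance count

# For Step 2, models could then attach an Optional[ScalingPolicy] to the
# clusters they return alongside the existing sizing recommendation.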
