service-capacity-modeling's Issues

Improve C* scaling logic when including EVCache in KV plan

In our current logic (https://github.com/Netflix-Skunkworks/service-capacity-modeling/blob/main/service_capacity_modeling/models/org/netflix/key_value.py#L85), we scale the C* cluster by a factor of 1 - estimated_kv_cache_hit_rate, where estimated_kv_cache_hit_rate is configurable (default 0.8).

Per a previous convo with @jolynch and @szimmer1, we discussed possibly tying in the read/write ratio from the user desires into this calculation.

One toy example:

estimated_cache_hit_rate = extra_model_arguments.get("estimated_cache_hit_rate", 0.8)
estimated_cache_miss_rate = 1 - estimated_cache_hit_rate
rps_interval.scale(min(estimated_cache_miss_rate, max(0.1, 1 - read_write_ratio)))
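The toy example above can be pulled out into a standalone function for experimentation. This is only a sketch: the function name `cassandra_scale_factor` and the `floor` parameter are hypothetical, and `read_write_ratio` is assumed to be the fraction of requests that are reads (between 0 and 1), which the issue does not define.

```python
def cassandra_scale_factor(
    estimated_cache_hit_rate: float = 0.8,
    read_write_ratio: float = 0.5,
    floor: float = 0.1,
) -> float:
    """Return the factor to scale the C* RPS interval by.

    Assumption: read_write_ratio is the fraction of traffic that is reads.
    The floor (0.1 in the toy example) keeps the cluster from being
    scaled down to nothing for almost purely read workloads.
    """
    estimated_cache_miss_rate = 1 - estimated_cache_hit_rate
    return min(estimated_cache_miss_rate, max(floor, 1 - read_write_ratio))


# Balanced workload: the cache miss rate is the limiting term.
balanced = cassandra_scale_factor(0.8, read_write_ratio=0.5)
# Read-heavy workload: the floor is the limiting term.
read_heavy = cassandra_scale_factor(0.8, read_write_ratio=0.95)
```

With these inputs the balanced case stays at roughly the miss rate (0.2), while the read-heavy case drops to the floor (0.1).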

Capacity plans should return recommended autoscaling policies

Right now we just make a recommendation like "12 m5d.2xlarge", but for software that can autoscale (stateless Java apps, Elasticsearch, etc.) it would be nice if we could also return a hint of the autoscaling policy.

Step 1: Define how we will represent a scaling policy (e.g. how to represent various metrics like CPU utilization etc ...)
Step 2: Make the models return them
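One possible shape for Step 1 is sketched here. Every name in it (`ScalingMetric`, `AutoscalingPolicy`, `ClusterRecommendation`, and their fields) is hypothetical and not part of the existing library; plain dataclasses are used only to keep the example self-contained.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ScalingMetric(str, Enum):
    # Hypothetical set of metrics a policy could target.
    cpu_utilization = "cpu_utilization"
    network_utilization = "network_utilization"
    requests_per_second = "requests_per_second"


@dataclass(frozen=True)
class AutoscalingPolicy:
    metric: ScalingMetric
    target_value: float  # e.g. 40.0 for "40% CPU"
    min_count: int
    max_count: int

    def __post_init__(self) -> None:
        # Reject policies whose bounds cannot be satisfied.
        if not 0 < self.min_count <= self.max_count:
            raise ValueError("expected 0 < min_count <= max_count")


@dataclass(frozen=True)
class ClusterRecommendation:
    instance_name: str
    count: int
    # None for software that cannot autoscale.
    autoscaling: Optional[AutoscalingPolicy] = None


rec = ClusterRecommendation(
    instance_name="m5d.2xlarge",
    count=12,
    autoscaling=AutoscalingPolicy(
        ScalingMetric.cpu_utilization, target_value=40.0,
        min_count=6, max_count=24,
    ),
)
```

Keeping the policy optional would let stateful models keep returning a fixed count while stateless ones attach a hint.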

Unclear repetition

I'm working on summarizing the cost, cpu, disk (local & attached) for both regional and zonal clusters. I want there to be more consistency in the way repetition is represented.

us-east-1: # trimmed
us-west-2:
  least_regret:
    - candidate_clusters:
        total_annual_cost: # redacted
        zonal:
          - cluster_type: cassandra # trimmed
          - cluster_type: cassandra # trimmed
          - cluster_type: cassandra # trimmed
        regional:
          - cluster_type: dgwkv
            total_annual_cost: # redacted
            count: 3
            instance:
              total_annual_cost: # redacted
              name: r5.large
            attached_drives:
              - name: gp2
                size_gib: 20
                annual_cost_per_gib: # redacted
                annual_cost_per_read_io: # redacted
                annual_cost_per_write_io: # redacted

In the sample above there are:

  • enumerated list of regions
  • duplicated zonals
  • instance with a count property
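As one illustration of a more uniform representation, duplicated entries such as the three zonal clusters could be collapsed into a single entry with an explicit multiplicity. This is a sketch of one option, not a proposal from the library; the helper `collapse_duplicates` and the `num_clusters` field are hypothetical, and `num_clusters` is deliberately kept separate from the existing per-cluster `count`.

```python
import json
from typing import Any, Dict, List


def collapse_duplicates(clusters: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge identical cluster entries into one entry with a num_clusters field."""
    merged: Dict[str, Dict[str, Any]] = {}
    for cluster in clusters:
        # Serialize with sorted keys so key order does not affect equality.
        key = json.dumps(cluster, sort_keys=True)
        if key not in merged:
            merged[key] = {**cluster, "num_clusters": 0}
        merged[key]["num_clusters"] += 1
    return list(merged.values())


zonal = [
    {"cluster_type": "cassandra"},
    {"cluster_type": "cassandra"},
    {"cluster_type": "cassandra"},
]
collapsed = collapse_duplicates(zonal)
```

Here the three identical zonal entries become one entry with `num_clusters` set to 3, in the same way the regional entry already carries a `count`.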

For rightsizing ask for existing compute usage

Right now most models are split into two parts:

  1. Try to determine the resources you need for a desire using math on the desire (CPU, RAM, Disk, Network, etc ...). Example
  2. Size and price clusters based on that particular service deployment mode (e.g. C* has to scale by factors of 2, and deploys to zones). Example

This makes sense for provisioning, where we are trying to guess CPU time from e.g. payload sizes and RPS. For rightsizing it might make more sense to just provide the existing choices in the desire along with utilization, so that the model can produce the ideal hardware for that specific requirement. Perhaps modify CapacityDesires to have an additional field called existing_deployment that takes either a Requirements or a Clusters, maybe with the modification that instead of supplying a frequency to the requirements, you supply a hardware shape/count (the cpu_count would be cpu * utilization, for example).

Then models can short circuit the requirements generation or at least use the provided numbers as good defaults. RAM is the only one that seems tricky to me that might require merging.
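A minimal sketch of the idea, assuming a hypothetical `ExistingDeployment` type (the issue only names the `existing_deployment` field, not its shape) and a hypothetical `required_cpu_cores` helper that short-circuits the usual estimate when existing usage is supplied:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExistingDeployment:
    # All fields are illustrative assumptions.
    instance_name: str
    instance_cpu: int        # CPU cores per instance
    count: int               # number of instances currently deployed
    cpu_utilization: float   # observed utilization, between 0 and 1

    @property
    def used_cpu_cores(self) -> float:
        # cpu * utilization, summed over every instance in the deployment.
        return self.instance_cpu * self.count * self.cpu_utilization


def required_cpu_cores(
    estimated_from_desires: float,
    existing: Optional[ExistingDeployment] = None,
) -> float:
    """Prefer observed usage when present, else fall back to the estimate."""
    if existing is not None:
        return existing.used_cpu_cores
    return estimated_from_desires


current = ExistingDeployment("m5d.2xlarge", instance_cpu=8, count=12, cpu_utilization=0.25)
needed = required_cpu_cores(estimated_from_desires=60.0, existing=current)
```

In this example twelve 8-core instances at 25% utilization yield 24 used cores, which replaces the estimate of 60. RAM is left out because, as noted above, it may need merging with the modeled requirement instead of a straight substitution.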

Adding a new model

Greetings,

I was asked to add a new model to your capacity planner. Are there general directions or documentation on what it takes to add a new model? I see that the existing models vary significantly on how they are implemented. Since the model is provided to the capacity planner, there must be a protocol somewhere that must be followed so the planner understands the model, but if there is, I can't find it.

Thank you,
