netflix-skunkworks / service-capacity-modeling
License: Apache License 2.0
Now that gp3 is a thing, let's add it to the hardware descriptions so we can use it.
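As a sketch of what such an entry could look like: the field names below mirror the gp2-style `attached_drives` fields seen in the cost summary sample later in this page and are assumptions, not the actual schema; the numbers are published AWS us-east-1 gp3 list prices and baselines at the time of writing.

```python
# Hypothetical gp3 hardware description entry. Field names are
# assumptions modeled on the gp2-style attached_drives fields; the
# schema in the repo may differ.
gp3_drive = {
    "name": "gp3",
    # $0.08 per GB-month -> roughly $0.96 per GiB-year
    "annual_cost_per_gib": 0.08 * 12,
    # gp3 includes a 3000 IOPS / 125 MiB/s baseline at no extra cost,
    # so baseline IO is effectively free (extra provisioned IOPS cost more)
    "annual_cost_per_read_io": 0.0,
    "annual_cost_per_write_io": 0.0,
    "baseline_iops": 3000,
    "baseline_throughput_mib_s": 125,
}
```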
In our current logic (https://github.com/Netflix-Skunkworks/service-capacity-modeling/blob/main/service_capacity_modeling/models/org/netflix/key_value.py#L85), we scale the C* cluster by a factor of `1 - estimated_kv_cache_hit_rate`, where `estimated_kv_cache_hit_rate` is configurable (default 0.8).
In a previous conversation with @jolynch and @szimmer1, we discussed tying the read/write ratio from the user desires into this calculation.
One toy example:
```python
estimated_cache_hit_rate = extra_model_arguments.get("estimated_cache_hit_rate", 0.8)
estimated_cache_miss_rate = 1 - estimated_cache_hit_rate
rps_interval.scale(min(estimated_cache_miss_rate, max(0.1, 1 - read_write_ratio)))
```
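Pulled out as a standalone function so the behavior is easy to check. Two assumptions here for illustration: `read_write_ratio` is taken to mean reads / (reads + writes), and the 0.1 floor keeps the model from scaling the read RPS below 10% regardless of the workload mix.

```python
def kv_read_scale_factor(
    estimated_cache_hit_rate: float = 0.8,
    read_write_ratio: float = 0.5,
    floor: float = 0.1,
) -> float:
    """Factor to scale the backing C* read RPS by, per the toy example.

    Assumes read_write_ratio = reads / (reads + writes); the floor
    prevents scaling below `floor` of the original read rate.
    """
    estimated_cache_miss_rate = 1 - estimated_cache_hit_rate
    return min(estimated_cache_miss_rate, max(floor, 1 - read_write_ratio))

# Defaults: 20% miss rate, 50% writes -> scale by 0.2
# Read-heavy (95% reads): the 0.1 floor wins -> scale by 0.1
```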
Right now we just make a recommendation like "12 m5d.2xlarge", but for software that can autoscale (stateless Java apps, Elasticsearch, etc.) it would be nice if we could also return a hint of the autoscaling policy.
Step 1: Define how we will represent a scaling policy (e.g. how to represent various metrics like CPU utilization etc ...)
Step 2: Make the models return them
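For Step 1, one option is a small typed structure for a target-tracking policy on a named metric. Every name below (`ScalingMetric`, `TargetTrackingPolicy`, `AutoScalingHint`, their fields) is a hypothetical sketch, not an existing API in this repo:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ScalingMetric(str, Enum):
    cpu_utilization = "cpu_utilization"
    network_in = "network_in"
    request_rate = "request_rate"


@dataclass
class TargetTrackingPolicy:
    metric: ScalingMetric
    target_value: float      # e.g. 0.5 -> scale to hold ~50% CPU
    min_count: int
    max_count: int
    cooldown_seconds: int = 300


@dataclass
class AutoScalingHint:
    policies: List[TargetTrackingPolicy] = field(default_factory=list)


# A model could then return, alongside "12 m5d.2xlarge":
hint = AutoScalingHint(
    policies=[
        TargetTrackingPolicy(
            metric=ScalingMetric.cpu_utilization,
            target_value=0.5,
            min_count=8,
            max_count=24,
        )
    ]
)
```

Target tracking keeps the representation declarative (a target, bounds, and a cooldown), which maps onto most cloud autoscalers without encoding provider-specific step policies.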
I'm working on summarizing the cost, CPU, and disk (local & attached) for both regional and zonal clusters. I want more consistency in the way repetition is represented.
```yaml
us-east-1: # trimmed
us-west-2:
  least_regret:
    - candidate_clusters:
        total_annual_cost: # redacted
        zonal:
          - cluster_type: cassandra # trimmed
          - cluster_type: cassandra # trimmed
          - cluster_type: cassandra # trimmed
        regional:
          - cluster_type: dgwkv
            total_annual_cost: # redacted
            count: 3
            instance:
              total_annual_cost: # redacted
              name: r5.large
            attached_drives:
              - name: gp2
                size_gib: 20
                annual_cost_per_gib: # redacted
                annual_cost_per_read_io: # redacted
                annual_cost_per_write_io: # redacted
```
In the sample above there are:
Right now most models are split into two parts:
This makes sense for provisioning, where we are trying to guess CPU time from e.g. payload sizes and RPS. For rightsizing it might make more sense to just provide the existing choices in the desires along with utilization, and then the model can produce an ideal hardware recommendation for that specific requirement. Perhaps modify `CapacityDesires` to have an additional field called `existing_deployment` that takes either a `Requirements` or a `Clusters`. Maybe, instead of supplying a frequency to requirements, have a hardware shape/count (the cpu_count would be cpu * utilization, for example).
Then models can short-circuit the requirements generation, or at least use the provided numbers as good defaults. RAM is the only one that seems tricky to me and might require merging.
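A rough sketch of what that field could carry. Everything here is hypothetical: `ExistingCluster`/`ExistingDeployment` stand in for whatever the real `Requirements`/`Clusters` types would be, and `effective_cpu_count` applies the "cpu_count would be cpu * utilization" convention suggested above.

```python
from dataclasses import dataclass


@dataclass
class ExistingCluster:
    # Hypothetical stand-in for the repo's real cluster type
    instance_name: str       # e.g. "m5d.2xlarge"
    count: int
    cpu_utilization: float   # observed average, 0..1


@dataclass
class ExistingDeployment:
    cluster: ExistingCluster

    def effective_cpu_count(self, cpus_per_instance: int) -> float:
        # cpu_count = cpus * utilization, per the proposal above
        return (
            self.cluster.count * cpus_per_instance * self.cluster.cpu_utilization
        )


deployment = ExistingDeployment(
    cluster=ExistingCluster(
        instance_name="m5d.2xlarge", count=12, cpu_utilization=0.4
    )
)
# 12 instances x 8 vCPUs x 40% utilization = 38.4 effective cores
```

A model that receives such a field could seed its requirements from `effective_cpu_count` instead of estimating CPU time from payload sizes and RPS.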
Greetings,
I was asked to add a new model to your capacity planner. Are there general directions or documentation on what it takes to add a new model? I see that the existing models vary significantly in how they are implemented. Since the model is provided to the capacity planner, there must be a protocol somewhere that the planner relies on to understand the model, but if there is, I can't find it.
Thank you,
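To illustrate what such a "protocol" usually means in Python: the planner would depend on a structural interface rather than a concrete base class. The sketch below is purely hypothetical and is not the repo's actual interface; all names (`CapacityModelProtocol`, `capacity_plan`, `plan`) are invented for illustration.

```python
from typing import Any, Dict, Protocol


class CapacityModelProtocol(Protocol):
    # Hypothetical: any object with this method shape would satisfy
    # the planner, no inheritance required.
    def capacity_plan(
        self, desires: Dict[str, Any], extra_model_arguments: Dict[str, Any]
    ) -> Dict[str, Any]:
        """Return a proposed cluster layout for the given desires."""
        ...


class ToyModel:
    def capacity_plan(self, desires, extra_model_arguments):
        # Trivially echo a fixed recommendation
        return {"instance": "m5d.2xlarge", "count": 12}


def plan(model: CapacityModelProtocol, desires: Dict[str, Any]) -> Dict[str, Any]:
    # The planner only relies on the protocol, not a concrete class
    return model.capacity_plan(desires, {})
```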