Comments (3)
We used to keep track of the last poll time, and we had a special case for missed polls, but we decided to simplify the behavior:
29d77a4#diff-2b45014017c4137b6080dc2c186ca63f
Even before, we returned NaN rather than a rate based on the deltas, though; the major motivation for the change was the significant reduction in the memory footprint of the class.
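To make the trade-off concrete, here is a minimal Java sketch of a step-based counter that keeps no poll history: it only remembers the current step and the step before it, so a step that is not polled in time is simply dropped. This is not Servo's implementation. The class name `StepCounterSketch`, the explicit `nowMillis` parameters (used instead of a real clock so the behavior is easy to follow) and reporting a per-second rate are all assumptions for illustration.

```java
/**
 * Hypothetical sketch of a step-based counter; this is NOT Servo's code.
 * Increments accumulate in the current step. A poll reports the count of the
 * step that just ended, as a per-second rate. A step whose value is not read
 * before the following step ends is discarded, so only a few fields are kept
 * per counter and no poll timestamps are remembered.
 */
final class StepCounterSketch {
    private final long stepMillis;
    private long currentStep; // index of the step that `current` belongs to
    private long current;     // increments seen during currentStep
    private long previous;    // final count of step (currentStep - 1)

    StepCounterSketch(long stepMillis) {
        if (stepMillis <= 0) {
            throw new IllegalArgumentException("stepMillis must be > 0");
        }
        this.stepMillis = stepMillis;
    }

    /** Moves the counter forward to the step containing nowMillis. */
    private void roll(long nowMillis) {
        long step = nowMillis / stepMillis;
        if (step <= currentStep) {
            return; // same step (or clock went backwards): nothing to do
        }
        // Exactly one boundary crossed: the current count becomes "previous".
        // More than one crossed: the step just before `step` saw no activity,
        // so it reports 0, and the count held for currentStep is discarded
        // without ever having been polled.
        previous = (step == currentStep + 1) ? current : 0L;
        current = 0L;
        currentStep = step;
    }

    synchronized void increment(long nowMillis) {
        roll(nowMillis);
        current++;
    }

    /** Per-second rate of the last completed step, as seen at nowMillis. */
    synchronized double poll(long nowMillis) {
        roll(nowMillis);
        return previous * 1000.0 / stepMillis;
    }
}
```

With a 60-second step, two increments in step 0 are reported by a poll during step 1. If three more increments land in step 1 but no poll runs before step 2 ends, a poll during step 3 reports 0 for step 2, and the step-1 increments are never seen.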
from servo.
It seems that the implementation assumes polling will never really lag, and does not try to handle stragglers gracefully.
Generally with telemetry at Netflix, we distinguish between operational and business use-cases. For operational use-cases, the key question we are trying to answer is what is going on right now, and we have SLAs around when data must be actionable. That is, we have alerts and automated remediation based on the incoming data, and we have to know when the data is complete and can be trusted so we can act on it. There has also been a push over time to reduce the mean time to detect, which means we need to keep the SLA as tight as possible. So generally we do not want operational data to lag or to hide the fact that it is failing to meet the publishing SLA, even though that makes some data loss more likely.
For business intelligence use-cases, avoiding data loss is a big concern, but our team is not focused on those use-cases.
Thank you both for your input. It makes sense.
Generally with telemetry at Netflix, we distinguish between operational and business use-cases. [...] So generally we do not want the operational data to lag or cover up that it is failing to meet the publishing SLA, but that does mean that it is more likely there will be some data loss. For business intelligence use-cases the no data loss is a big concern
My use of "straggler" was perhaps misleading. My use-case is not a BI one. What led me to file this issue was a feature team questioning me about the following graph (CW: 5-minute period):
They were puzzled to see only 0s: "Why would a Counter be reactivated if there is no increment?" By zooming to a 1-minute period and troubleshooting their service, I was able to explain to them what was going on (the thread pool is saturated, some steps are not polled, that data is lost, and you will never know what the error rate was).
However, they then asked: "Why would Servo discard missed steps rather than gracefully handle them by computing the rate over the observed polling period?" I wanted to give them a first-hand answer.
BTW, if Servo expects that polling must not be too delayed, which is reasonable, wouldn't it be useful to warn when late polling is observed? (IIRC, HikariCP does something like that.) I know how to fix their issue, but I am afraid another feature team will make the same mistake. How do you prevent such a broken config from reaching production?
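The alternative the feature team asked about can be sketched the same way: keep a cumulative total and divide the delta since the previous poll by the time that actually elapsed. This is an illustration under assumptions (hypothetical class `IntervalRateSketch`, a single poller, millisecond timestamps passed in explicitly), not a description of anything Servo does.

```java
/**
 * Hypothetical alternative, NOT Servo's behavior: keep a cumulative count and,
 * on each poll, divide the delta since the previous poll by the time actually
 * elapsed. A late poll loses no increments, but it averages them over a longer
 * window and needs extra per-poller state (last poll time and last total).
 */
final class IntervalRateSketch {
    private long total;          // cumulative increments, never reset
    private long lastPollMillis; // time of the previous poll (or creation)
    private long lastPollTotal;  // value of `total` at the previous poll

    IntervalRateSketch(long startMillis) {
        this.lastPollMillis = startMillis;
    }

    synchronized void increment() {
        total++;
    }

    /** Per-second rate over the interval since the previous poll. */
    synchronized double poll(long nowMillis) {
        long elapsed = nowMillis - lastPollMillis;
        long delta = total - lastPollTotal;
        lastPollMillis = nowMillis;
        lastPollTotal = total;
        // No time elapsed: the rate is undefined.
        return elapsed > 0 ? delta * 1000.0 / elapsed : Double.NaN;
    }
}
```

A poll that arrives 120 seconds after the previous one reports the average over those 120 seconds, so nothing is lost. The cost is a remembered timestamp and total per poller, and a value that describes a longer, older window than expected, which is the kind of lag the operational use-case described earlier tries to avoid.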
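The warning suggested above could look roughly like the following hypothetical hook, called each time a poll runs. The class `LatePollDetector`, the tolerance parameter and the use of `java.util.logging` are assumptions for illustration; Servo is not claimed to ship anything like it.

```java
import java.util.logging.Logger;

/**
 * Hypothetical late-poll check; not part of Servo. It remembers when the
 * previous poll ran and warns when the gap exceeds one step plus a tolerance,
 * i.e. when a step was probably never reported.
 */
final class LatePollDetector {
    private static final Logger LOG = Logger.getLogger(LatePollDetector.class.getName());

    private final long stepMillis;
    private final long toleranceMillis;
    private long lastPollMillis;
    private boolean hasPolled; // false until the first poll is recorded

    LatePollDetector(long stepMillis, long toleranceMillis) {
        this.stepMillis = stepMillis;
        this.toleranceMillis = toleranceMillis;
    }

    /** Records a poll at nowMillis; returns true (and logs) if it was late. */
    synchronized boolean onPoll(long nowMillis) {
        long gap = nowMillis - lastPollMillis;
        boolean late = hasPolled && gap > stepMillis + toleranceMillis;
        if (late) {
            LOG.warning(String.format(
                "Poll ran %d ms late (step is %d ms); values for skipped steps may have been dropped",
                gap - stepMillis, stepMillis));
        }
        lastPollMillis = nowMillis;
        hasPolled = true;
        return late;
    }
}
```

Counting these warnings, or exporting them as a metric of their own, would give teams a signal that their poller is saturated before someone has to puzzle over a graph of zeros.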