Comments (3)
Operations requirements for taking 10% of production users.
- QA sign-off on supported Sync clients' interactions with Sync-RS.
- QA sign-off on server component with particular emphasis on load testing.
- Operational readiness:
  - Add Sync-RS monitoring.
  - Review of the __heartbeat__ endpoint.
  - Basic service documentation. This exists but should be updated.
- Description of operational level agreement:
  - How slow is too slow, as measured by average upstream_response time or by network time?
  - What is the errors-per-second threshold before the service is considered broken?
  - Uptime requirements: if GCP's Spanner service is down for > X minutes, we're down.
  - Any other availability or performance requirements.
- A documented understanding that this phase of rollout is a live operational acceptance test.
- To be careful about the use vs. mention distinction: although this service is named "durable sync", it is not yet proven durable. For the purposes of this test, Operations will consider Sync-RS equivalent to the non-durable sync nodes and will use the same toolset, including the data-loss-inducing user migrations, to deal with any problems encountered.
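
To make the operational level agreement questions above concrete, here is a minimal sketch of how one observation window could be checked against a latency threshold and an errors-per-second threshold. Everything in it is an assumption for illustration: the names `OlaThresholds` and `evaluate_window` are hypothetical, and the default threshold values are placeholders, since the actual numbers are still open questions in this thread.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class OlaThresholds:
    # Placeholder values only. The real thresholds have not been agreed.
    max_avg_upstream_response_s: float = 0.5
    max_errors_per_second: float = 1.0


def evaluate_window(
    upstream_response_times_s: Sequence[float],
    error_count: int,
    window_seconds: float,
    thresholds: OlaThresholds = OlaThresholds(),
) -> dict:
    """Summarize one observation window against the thresholds."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    # Average upstream response time over the window (0.0 if no requests).
    if upstream_response_times_s:
        avg = sum(upstream_response_times_s) / len(upstream_response_times_s)
    else:
        avg = 0.0

    errors_per_second = error_count / window_seconds

    return {
        "avg_upstream_response_s": avg,
        "errors_per_second": errors_per_second,
        "too_slow": avg > thresholds.max_avg_upstream_response_s,
        "broken": errors_per_second > thresholds.max_errors_per_second,
    }
```

For example, `evaluate_window([0.1, 0.2, 0.3], error_count=120, window_seconds=60)` reports the window as broken under the placeholder thresholds (2 errors per second) but not too slow. Whether latency should come from upstream response time or network time would only change which samples are passed in.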
from services-engineering.
Operations requirements for graduating the service to durable status:
- QA sign-off on supported Sync clients' migration capabilities.
- Updated acceptable use policy.
- Documented methods for enforcing acceptable use policy.
  - Will we switch to enabling quotas for this service?
  - What are the requests per second, or maximum hourly/daily limits?
- Automated encrypted backup plan for disaster recovery.
- QA sign-off on DR, with particular emphasis on client behavior when syncing to old data.
- Load testing + predictable Spanner maintenance tasks:
  - Multi-part schema changes.
  - Spanner node increase.
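
As one illustration of what enforcing hourly or daily limits might look like, here is a rough sketch of a per-user fixed-window quota. It is not a description of how Sync-RS does or will enforce quotas. The class name `FixedWindowQuota`, its parameters, and the in-memory storage are all hypothetical, and the actual limits remain the open question raised above.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowQuota:
    """Per-user request quota over a fixed time window (hypothetical helper)."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if limit < 0 or window_seconds <= 0:
            raise ValueError("limit must be >= 0 and window_seconds > 0")
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        # user id -> (window index, requests counted in that window)
        self._counts: Dict[str, Tuple[int, int]] = {}

    def allow(self, user_id: str) -> bool:
        """Return True and count the request if the user is under quota."""
        window = int(self._clock() // self.window_seconds)
        seen_window, count = self._counts.get(user_id, (window, 0))
        if seen_window != window:
            # A new window has started, so the counter resets.
            count = 0
        if count >= self.limit:
            self._counts[user_id] = (window, count)
            return False
        self._counts[user_id] = (window, count + 1)
        return True
```

An hourly limit would use `window_seconds=3600` and a daily one `window_seconds=86400`. A fixed window is the simplest option, but it allows bursts at window boundaries; a token bucket or sliding window would be a reasonable alternative if that matters.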
Thank you so much for the details here, @Micheletto, this is incredibly helpful. I'm going to close this out and we can shift over to linked tasks from here.
Operations requirements for taking 10% of production users.
QA here is underway. I'll leave monitoring to you (unless there's anything else you need from us there, in which case let me know).
With regard to the heartbeat endpoint, I've opened this issue, which we'll tackle next week.
With regard to the operational agreement, I'll take a stab at it next week; I opened this issue to track status there.
In terms of documenting expectations around "durability", I've added a final "graduating to durable status" item to our rollout plan, and I do not plan to refer to Sync externally as "durable" until that happens.
Operations requirements for graduating the service to durable status:
Thanks for this bit as well. Since we're looking at ~3 months until we get to this stage, I'm going to hold off on creating specific tasks here. As we get closer, we'll revisit it, and this won't be lost as it's clearly linked in the last phase of the plan to enable for new Sync users.