
Comments (3)

Micheletto commented on June 22, 2024

Operations requirements for taking 10% of production users.

  • QA sign-off on supported Sync clients' interactions with Sync-RS.
  • QA sign-off on server component with particular emphasis on load testing.
  • Operational readiness:
    • Add Sync-RS monitoring.
    • Review of __heartbeat__ endpoint.
    • Basic service documentation; this exists but should be updated.
  • Description of operational level agreement:
    • How slow is too slow? As measured by average upstream_response time, or network time?
    • What is the errors-per-second threshold before considering the service broken?
    • Uptime requirements: If GCP's Spanner service is down for > X minutes, we're down.
    • Any other availability or performance requirements.
  • A documented understanding that this phase of rollout is a live operational acceptance test.
    • To keep the use vs. mention distinction clear: although this service is named "durable sync", it is not yet proven durable. For the purposes of this test, Operations will treat Sync-RS as equivalent to the non-durable sync nodes and use the same toolset, including the data-loss-inducing user migrations, to deal with any problems encountered.
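
The open questions in the operational level agreement (how slow is too slow, the errors-per-second threshold, the tolerated outage length) could eventually be expressed as a small check run over each measurement window. The following is only a sketch under assumptions: the names `OlaThresholds` and `evaluate_window` are hypothetical, and no threshold values are implied, since the thread leaves all of them undecided.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass(frozen=True)
class OlaThresholds:
    """Hypothetical container; the actual values are still to be agreed."""
    max_avg_response_seconds: float
    max_errors_per_second: float
    max_outage_minutes: float


def evaluate_window(
    response_times: Sequence[float],
    error_count: int,
    window_seconds: float,
    outage_minutes: float,
    thresholds: OlaThresholds,
) -> List[str]:
    """Return the names of the criteria breached in one measurement window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    breaches = []
    # "How slow is too slow?" measured here as the average response time.
    if response_times and mean(response_times) > thresholds.max_avg_response_seconds:
        breaches.append("too_slow")
    # Errors per second over the window.
    if error_count / window_seconds > thresholds.max_errors_per_second:
        breaches.append("error_rate")
    # Length of the current dependency outage (for example, Spanner).
    if outage_minutes > thresholds.max_outage_minutes:
        breaches.append("outage")
    return breaches
```

Whether latency is taken from upstream response time or network time only changes what is passed in as `response_times`; the comparison itself stays the same.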

from services-engineering.

Micheletto commented on June 22, 2024

Operations requirements for graduating the service to durable status:

  • QA sign-off on supported Sync clients' migration capabilities.
  • Updated acceptable use policy.
  • Documented methods for enforcing acceptable use policy.
    • Will we switch to enabling quotas for this service?
    • What are the requests per second, or maximum hourly/daily limits?
  • Automated encrypted backup plan for disaster recovery.
  • QA sign-off on disaster recovery, with particular emphasis on client behavior when syncing to old data.
  • Load testing and predictable Spanner maintenance tasks:
    • Multi-part schema changes.
    • Spanner node increase.
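
If quotas are enabled, one simple way to enforce an hourly or daily limit is a fixed-window counter per user. The sketch below is illustrative only: `FixedWindowQuota` is a hypothetical name, it keeps counts in process memory rather than in any shared store, and the limit and window length are parameters because the thread does not settle them.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class FixedWindowQuota:
    """Hypothetical per-user request quota over fixed time windows."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # Maps (user_id, window index) to the number of requests allowed so far.
        self._counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, user_id: str, now: float) -> bool:
        """Return True and count the request if the user is under the limit."""
        window = int(now // self.window_seconds)
        key = (user_id, window)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

For example, `FixedWindowQuota(limit=2, window_seconds=3600)` allows two requests per user in each hour-long window and rejects the third. Counts for past windows are never evicted here, so a real deployment would need expiry and a store shared across nodes.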


tublitzed commented on June 22, 2024

Thank you so much for the details here, @Micheletto, this is incredibly helpful. I'm going to close this out and we can shift over to linked tasks from here.

Operations requirements for taking 10% of production users.

QA here is underway. I'll leave monitoring to you (unless there's anything else you need from us there, in which case let me know).

With regards to the heartbeat endpoint, I've opened this issue which we'll tackle next week.

With regards to the operational agreement, I'll take a stab at it next week: I opened this issue to track status there.

In terms of documenting expectations around "durability", I've added a final "graduating to durable status" item to our rollout plan, and we won't refer to Sync externally as "durable" until that happens.

Operations requirements for graduating the service to durable status:

Thanks for this bit as well. Since we're looking at ~3 months until we get to this stage, I'm going to hold off on creating specific tasks here. As we get closer, we'll revisit it. This won't be lost, as it's clearly linked in the last phase of the plan to enable for new Sync users.

