Hi Guys! I'm looking for information in your solution about failover


Failover and recovery about postgres-operator HOT 18 CLOSED

crunchydata commented on May 20, 2024

Failover and recovery

from postgres-operator.

Comments (18)

paunin commented on May 20, 2024 2

Well, @jmccormick2001, I was asking because I'm looking for an alternative solution and want to understand its benefits compared to mine. Please let me know whenever you have something around this... I'm all into this topic! Thanks :)

jmccormick2001 commented on May 20, 2024 1

Right now there is only a full backup (using pg_basebackup). The other forms of backup and a scheduling capability are in the works too, but they will likely lag behind the failover and policy releases.

jmccormick2001 commented on May 20, 2024

It's something we are definitely going to add. The containers we use under the hood support a watch/failover capability, and we will leverage some of that in what the operator will allow.
Current thinking is that something like pgo create watch mycluster would let a user set up a failover watch on a cluster. Ideally we would like to support different failover strategies, similar to the way we support different cluster strategies. But for sure, the operator would let you trigger a recovery. So stay tuned, it will happen in an upcoming release of the operator.

thekalinga commented on May 20, 2024

@jmccormick2001

Assuming that master went down

Isn't failover supposed to be automatic, based on which replica has the most of the master's data, with that replica promoted to master?
(or) automatically mounting the PVC of the (previous) master on one of the replicas and promoting it as the new master?

My assumption was that the operator is supposed to watch over the instances and take care of master selection (on bootstrap and on master failure) and coordination among the participating pods.

Please correct me if I am missing anything.

jmccormick2001 commented on May 20, 2024

This is a good topic for sure and one I'm thinking about. Some design ideas in my head include:

allow cluster 'watching', but don't mandate it
allow for a user-initiated failover, where the user can trigger the failover to whatever replica they choose
allow for a failover to a 'certain' replica using a configurable/programmable selection algorithm (based on metadata of a replica, or replication status, other)
allow for a pre-hook and post-hook script to be executed when a failover is triggered
allow for continuous cluster watching after a failover has occurred (updating labels as required to enable proper client routing)
allow for killing off stale replicas after a failover
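
To make the "configurable/programmable selection algorithm" idea above concrete, here is a minimal sketch, not the operator's actual implementation. The `Replica` fields (`replayed_lsn`, `priority`, `healthy`) and the function name are hypothetical, chosen only for illustration. It assumes each replica reports how much WAL it has replayed and prefers the most up-to-date healthy replica, breaking ties by priority.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Replica:
    name: str
    replayed_lsn: int      # hypothetical: amount of WAL replayed so far, as an integer
    priority: int = 0      # hypothetical: user-assigned preference, higher wins ties
    healthy: bool = True   # hypothetical: result of the latest health check


def pick_failover_target(replicas: Sequence[Replica]) -> Optional[Replica]:
    """Return the healthy replica with the most replayed WAL.

    Ties are broken by priority, then by name so the result is deterministic.
    Returns None when no healthy replica exists.
    """
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.replayed_lsn, r.priority, r.name))
```

A different strategy (for example, always preferring a designated replica) would only need a different sort key, which is one way the selection could stay pluggable.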

paunin commented on May 20, 2024

allow cluster 'watching', but don't mandate it

Usually you don't want a cluster without failover logic.

allow for a user initiated failover, where the user can trigger the failover to whatever replica they choose

Switchover is the correct term here :P as we are not failing... and yeah, it's something I have not developed.

allow for a failover to a 'certain' replica using a configurable/programmable selection algorithm (based on metadata of a replica, or replication status, other)

That might be solved by replica priority, plus quorum election of course to avoid split-brain... so your priority should not mean much in case of network issues, but it will in case of bad master health.

allow for a pre-hook and post-hook script to be executed when a failover is triggered

Any practical use for this one?

allow for continuous cluster watching after a failover has occurred (updating labels as required to enable proper client routing)

For that one I use pgpool ;) It does its job, excluding nodes from the list of backends based on different conditions and health checks against the postgres servers.

allow for killing off stale replicas after a failover

That I don't understand at all :(

Sorry for going through the list of your ideas, but it looks like I was in that same state half a year ago. I can help you with some of them if you need help, of course :)
I like the idea of implementing a special API for DB objects in k8s, but I also think you might want to segregate responsibilities:

reliable dockerized postgres cluster
management of the cluster using k8s stuff and facilities

PS
I would invest some time in the first one (build or adapt my images) while you could focus on the wrapper you are developing right now. Let me know if you are interested and want to build something a bit cooler than what we've done separately. Cheers!
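
The "replica priority plus quorum election" point in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the method used by either project: the function names, the vote map, and the notion of a fixed `cluster_size` of voting members are all hypothetical. The idea is that a new master is chosen only when a strict majority of all voters agree the master is down, so an isolated minority partition never promotes anyone; priority only decides among the agreeing replicas.

```python
from typing import Mapping, Optional


def has_quorum(votes_for_failover: int, cluster_size: int) -> bool:
    """A strict majority of all voting members (reachable or not) must agree."""
    return votes_for_failover > cluster_size // 2


def elect_new_master(
    master_down_votes: Mapping[str, bool],
    priorities: Mapping[str, int],
    cluster_size: int,
) -> Optional[str]:
    """Pick a new master only if a quorum sees the old master as down.

    master_down_votes maps each reachable replica name to whether it
    currently considers the master down (hypothetical input format).
    """
    agreeing = [name for name, down in master_down_votes.items() if down]
    if not has_quorum(len(agreeing), cluster_size):
        return None  # minority view: do nothing, which avoids split-brain
    # Among agreeing replicas, the highest priority wins; name breaks ties.
    return max(agreeing, key=lambda name: (priorities.get(name, 0), name))
```

With five voters, three agreeing replicas are enough to elect the highest-priority one, while two are not, which matches the remark that priority should not matter much during a network partition.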

hartmut-pq commented on May 20, 2024

Hi everyone!
@jmccormick2001 the postgres-operator is looking very exciting, nice work and overall concept.
I was looking into failover/HA capabilities too, but couldn't really find out what the current status is.
Currently, if the cluster master dies, is the cluster simply down?

Keep up the good work!

hartmut-pq commented on May 20, 2024

Was also looking at an alternative using stolon / etcd:
https://medium.com/@SergeyNuzhdin/how-to-deploy-ha-postgresql-cluster-on-kubernetes-3bf9ed60c64f

jmccormick2001 commented on May 20, 2024

Thanks. Currently the master runs in a Deployment, which 'should' get rescheduled by Kube if the Kube node dies; Kube 'should' also restart the pod if it dies, to keep the Deployment consistent. However, the gotcha here is: what if the master's data is somehow corrupted? In that case Kube will just restart the bad database over and over. What I'm considering is a more formal way to specify that "I want to fail over to this replica". I have some ideas on how this will work but want to give it some extra thought before I do the implementation. I also want a means of specifying a 'sync' replica that a user could specifically target as the failover target. This is definitely a high priority for an upcoming release, so stay tuned. There is also the case where a user might want an 'automated failover'; I'm thinking about that use case as well.
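
One way to picture the "restarted over and over" gotcha and the 'automated failover' case is a watcher that stops waiting for restarts after a number of consecutive failed health probes. The sketch below is an assumption for illustration only; the class name, the threshold, and the returned strings are hypothetical and do not describe the operator's design.

```python
from dataclasses import dataclass


@dataclass
class FailoverWatch:
    max_failures: int = 3            # hypothetical threshold before giving up on restarts
    consecutive_failures: int = 0    # failed probes seen since the last success

    def observe(self, probe_ok: bool) -> str:
        """Record one master health probe and return 'ok', 'wait', or 'failover'."""
        if probe_ok:
            # Any successful probe resets the counter.
            self.consecutive_failures = 0
            return "ok"
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            return "failover"        # restarts are not helping; promote a replica
        return "wait"                # give Kube a chance to restart the pod
```

A caller would feed it the result of each periodic probe and trigger whatever promotion step is configured once it returns "failover".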

hartmut-pq commented on May 20, 2024

Thanks for the immediate reply!
Simple use case: HA on AWS...
e.g. you're running a Multi-AZ k8s cluster on eu-west-1a, eu-west-1b, eu-west-1c
and you want your pgcluster running on k8s to be HA in case one AZ goes down. If the node in the AZ hosting the pgcluster master goes down, the EBS volume, being AZ-restricted, isn't available to mount on any other node/pod (the AZ is down), so one of the replicas running on pods on nodes in another AZ would immediately and automatically take over.

hartmut-pq commented on May 20, 2024

That's not AWS specific though... just in general, if the db is the critical point of failure, which it is for most applications in some way, and you are aiming for 100% uptime...

So in case of a sudden node death or downtime, 1-2 min for k8s to free the PVC, mount it on another node, and restart the pod somewhere else is quite a long downtime when there are replicas available that could take over...

jmccormick2001 commented on May 20, 2024

Understood. There is definitely the case where users will want to orchestrate a failover onto a specific replica regardless of what Kube might do.

hartmut-pq commented on May 20, 2024

Many thanks for your input! I don't want to rush you or anything, but do you have a rough timescale / idea of when some sort of automatic failover may get realised? May I help in any way?
I need to choose an option going forward for how to build/manage a pg cluster, and this operator seems like a pretty good candidate... :-)

jmccormick2001 commented on May 20, 2024

No worries. The current roadmap/schedule is to release the 'policy' mechanism in the next week or so; this is a new feature that lets you apply SQL policies against clusters. Right after that is the failover work, so I'm shooting to have some form of failover feature in about the 4-5 week time frame, hopefully earlier. Once I have some early work done on this, I'll reach out and see if you could do a sanity check on it.

hartmut-pq commented on May 20, 2024

👍 Will try to keep track, happy to help with a sanity check.
One more question - is there any way to have automated backups? It's always full backups, right? What about incremental backups?

hartmut-pq commented on May 20, 2024

That all sounds highly promising 👏 !

jmccormick2001 commented on May 20, 2024

Manual failover is coded and will land in the upcoming 2.6 release.

jmccormick2001 commented on May 20, 2024

Auto failover (first cut) will land in operator 3.1, soon to be released.
