k8gb-io / k8gb
A cloud native Kubernetes Global Balancer
Home Page: https://www.k8gb.io
License: Apache License 2.0
Makefile targets are dependent on full.sh and on environment variables declared in the same file.
Use GNU make call functions to get rid of the dependencies on full.sh everywhere full.sh is used (i.e. the terratest build pipe).
What we have seen in Infoblox: we have a zone ohmyglb.test.good.org.
If we have a delegated zone mynewglb ("mynewglb.ohmyglb.test.good.org") that gets created with the solution, it creates duplicate zones: one appears in the parent zone "test.good.org" as a full record (mynewglb.ohmyglb inside test.good.org) and not below ohmyglb.test.good.org as a new delegated zone.
As per the supported load balancing strategies in the initial design a failover strategy should be implemented to ensure the guarantees stated:
Failover - Pinned to a specified primary cluster until that cluster has no available Pods, upon which the next available cluster's Ingress node IPs will be resolved. When Pods are again available on the primary cluster, the primary cluster will once again be the only eligible cluster for which cluster Ingress node IPs will be resolved
Scenario 1:
A Deployment with a backend Service called app, and that backend service exposed with a Gslb resource on both clusters as:
apiVersion: ohmyglb.absa.oss/v1beta1
kind: Gslb
metadata:
  name: app-gslb
  namespace: test-gslb
spec:
  ingress:
    rules:
    - host: app.cloud.example.com
      http:
        paths:
        - backend:
            serviceName: app
            servicePort: http
          path: /
  strategy: failover
  primary: cluster-x
With the following worker node IPs on each cluster:
cluster-x-worker-1: 10.0.1.10
cluster-y-worker-1: 10.1.1.11
When issuing the following command, curl -v http://app.cloud.example.com, I would expect the IPs resolved to reflect as follows (if this command was executed 3 times consecutively):
$ curl -v http://app.cloud.example.com # execution 1
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 2
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 3
* Trying 10.0.1.10...
...
The resolved node IPs that ingress traffic will be sent to should be "pinned" to the primary cluster named explicitly in the Gslb resource above. Even though there is a healthy Deployment in cluster Y, the Ingress node IPs for cluster Y would not be resolved.
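To make the expected resolution behaviour concrete, here is a minimal sketch in Go of failover target selection, assuming hypothetical inputs (a primary name, per-cluster targets and per-cluster health); it is not the actual operator code:
package main

import "fmt"

// failoverTargets returns the primary cluster's ingress node IPs while the
// primary has healthy Pods, otherwise the IPs of the remaining healthy clusters.
func failoverTargets(primary string, targets map[string][]string, healthy map[string]bool) []string {
	if healthy[primary] {
		return targets[primary]
	}
	var fallback []string
	for cluster, ips := range targets {
		if cluster != primary && healthy[cluster] {
			fallback = append(fallback, ips...)
		}
	}
	return fallback
}

func main() {
	targets := map[string][]string{"cluster-x": {"10.0.1.10"}, "cluster-y": {"10.1.1.11"}}
	// Scenario 1: primary healthy -> always cluster-x
	fmt.Println(failoverTargets("cluster-x", targets, map[string]bool{"cluster-x": true, "cluster-y": true}))
	// Scenario 2: primary unhealthy -> fall back to cluster-y
	fmt.Println(failoverTargets("cluster-x", targets, map[string]bool{"cluster-x": false, "cluster-y": true}))
}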
Scenario 2:
The Deployment only has healthy Pods on one cluster, cluster Y; i.e. the Deployment on cluster X has no healthy Pods.
When issuing the following command, curl -v http://app.cloud.example.com, I would expect the IPs resolved to reflect as follows (if this command was executed 3 times consecutively):
$ curl -v http://app.cloud.example.com # execution 1
* Trying 10.1.1.11...
...
$ curl -v http://app.cloud.example.com # execution 2
* Trying 10.1.1.11...
...
$ curl -v http://app.cloud.example.com # execution 3
* Trying 10.1.1.11...
...
In this scenario, only the Ingress node IPs for cluster Y are resolved, given that there is not a healthy Deployment for the Gslb host on the primary cluster, cluster X. Therefore, the "failover" cluster(s) are resolved instead (cluster Y in this scenario).
Now, given that the Deployment on cluster X (the primary cluster) becomes healthy once again, I would expect the IPs resolved to reflect as follows (if this command was executed 2 times consecutively):
$ curl -v http://app.cloud.example.com # execution 1
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 2
* Trying 10.0.1.10...
...
The primary cluster's Ingress node IPs are now resolved exclusively once again.
NOTE:
There is a number of internal timeouts that are currently provided as default hardcoded values within Gslb.
We need to make them configurable via the CR spec (DNS TTL would probably go there) and via the operator deployment configuration (reconcile loop and ext-dns values), as sketched below.
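As an illustration of what making these values configurable could look like, a minimal Go sketch of a config struct populated from the operator deployment environment; the field names, env var names and defaults are assumptions for illustration, not the real ohmyglb configuration keys:
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config is a hypothetical set of the currently hardcoded timeouts.
type Config struct {
	ReconcileRequeue time.Duration // how often the Gslb is re-reconciled
	DNSTTL           int           // TTL for generated DNS records (could live in the CR spec)
	ExtDNSSync       time.Duration // external-dns sync interval
}

// envSecondsOr reads an integer number of seconds from an env var, falling back to a default.
func envSecondsOr(key string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(key)); err == nil {
		return v
	}
	return def
}

func loadConfig() Config {
	return Config{
		ReconcileRequeue: time.Duration(envSecondsOr("RECONCILE_REQUEUE_SECONDS", 30)) * time.Second,
		DNSTTL:           envSecondsOr("DNS_TTL_SECONDS", 30),
		ExtDNSSync:       time.Duration(envSecondsOr("EXT_DNS_SYNC_SECONDS", 20)) * time.Second,
	}
}

func main() {
	fmt.Printf("%+v\n", loadConfig())
}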
Currently k8gb fully relies on the underlying Ingress that it controls. This has the limitation of exclusive http/https support, so services on other ports and protocols are not covered by Gslb operations.
We can change that by enabling k8gb to control an underlying Service of type LoadBalancer. That would create a mechanism to expose any tcp/udp service on any port and protocol. We should definitely keep the current Ingress operations for L7 http/https.
A Service-enabled spec might look like:
apiVersion: k8gb.absa.oss/v1beta1
kind: Gslb
metadata:
  name: test-gslb
  namespace: test-gslb
spec:
  loadBalancer:
    host: app1.cloud.example.com
    serviceName: app # Service with type: LoadBalancer
  strategy:
    type: roundRobin
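In the Service-based mode, the exposed addresses would presumably come from the Service's load balancer status rather than from Ingress nodes; a small sketch using the upstream corev1 types, where serviceTargets is a hypothetical helper:
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// serviceTargets extracts the externally reachable addresses of a Service of
// type LoadBalancer from its status, preferring IPs and falling back to hostnames.
func serviceTargets(svc *corev1.Service) []string {
	var targets []string
	for _, ing := range svc.Status.LoadBalancer.Ingress {
		if ing.IP != "" {
			targets = append(targets, ing.IP)
		} else if ing.Hostname != "" {
			targets = append(targets, ing.Hostname)
		}
	}
	return targets
}

func main() {
	svc := &corev1.Service{
		Status: corev1.ServiceStatus{
			LoadBalancer: corev1.LoadBalancerStatus{
				Ingress: []corev1.LoadBalancerIngress{{IP: "10.0.1.10"}},
			},
		},
	}
	fmt.Println(serviceTargets(svc)) // [10.0.1.10]
}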
We should implement the necessary tooling and infrastructure to be able to publish the Helm chart to various chart repositories when a release is tagged.
As a start we should support the following repositories:
GitHub Pages
JFrog Artifactory
Given the five phases of the Operator Maturity Model:
Consider all phases and implement the necessary tasks to achieve the desired maturity phase.
Originally the ohmyglb operator was generated with operator-sdk version v0.12.0.
There is no urgent need, but eventually we might want to upgrade to the latest version, v0.16.0 (at the time of writing).
Upgrade path doc upstream: https://github.com/operator-framework/operator-sdk/blob/master/doc/migration/version-upgrade-guide.md
I would suggest we do it after implementing a strong e2e test suite, to avoid wasting time on manual e2e regression testing.
Great work guys... looking very promising. Give the team a hug and a donut from me.
When we have a mismatch between https://github.com/AbsaOSS/k8gb/blob/master/chart/k8gb/values.yaml#L11 and the Gslb Ingress host, like https://github.com/AbsaOSS/k8gb/blob/master/deploy/crds/k8gb.absa.oss_v1beta1_gslb_cr_failover.yaml#L9, we are going to send invalid configuration to Infoblox/EdgeDNS.
We probably want to:
Steps to reproduce:
$ make deploy-full-local-setup
Check the localtargets.* dnsendpoint conf:
$ kubectl -n test-gslb get dnsendpoints test-gslb -o yaml
...
spec:
  endpoints:
  - dnsName: localtargets.app3.cloud.example.com
    recordTTL: 30
    recordType: A
    targets:
    - 172.17.0.2
    - 172.17.0.3
    - 172.17.0.4
...
dig +short @localhost localtargets.app3.cloud.example.com
172.17.0.2
172.17.0.4
172.17.0.3
This is the expected result. After some time, localtargets.* can 'lose' one of the records in the following way.
The localtargets.* dnsendpoint conf is always consistent:
$ kubectl -n test-gslb get dnsendpoints test-gslb -o yaml
...
spec:
  endpoints:
  - dnsName: localtargets.app3.cloud.example.com
    recordTTL: 30
    recordType: A
    targets:
    - 172.17.0.2
    - 172.17.0.3
    - 172.17.0.4
...
dig +short @localhost localtargets.app3.cloud.example.com
172.17.0.2
172.17.0.4
The issue is not really deterministic in its behaviour; meanwhile we have faced it several times over multiple deployments.
In the case of a 2-cluster setup only a single cluster is affected, effectively exposing through coredns only 5 out of 6 k8s workers.
DNSEndpoint CR generation always looks correct, so the problem is somewhere in the etcd coredns backend area.
make debug-test-etcd can help in debugging this issue at runtime.
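Besides make debug-test-etcd, the skydns keys can be inspected straight from the etcd backend; a sketch using the upstream etcd clientv3 package, assuming the backend is reachable on localhost:2379 (e.g. via a port-forward):
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Assumes the coredns etcd backend is reachable locally, e.g. via kubectl port-forward.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// List every skydns record so missing localtargets.* entries become visible.
	resp, err := cli.Get(ctx, "/skydns/", clientv3.WithPrefix())
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s -> %s\n", kv.Key, kv.Value)
	}
}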
Refactor controller tests
Follow standard template as shown below:
func TestReflectGeoTagInTheStatus(t *testing.T) {
	defer cleanup()
	setup()
	//arrange
	//act
	//assert
}

func TestSomethingElse(t *testing.T) {
	defer cleanup()
	setup()
	//arrange
	//act
	//assert
}

//this will be executed for each test that requires it
func cleanup() {
	//cleaning env vars, resources etc.
}

//this will be executed for each test that requires it
//(named setup here because init() cannot be called explicitly in Go)
func setup() {
	// init reconcilers, clients etc.. whatever
}

//TestMain is a reserved Go func: it is executed only once for all tests. Don't use it if we don't need it.
func TestMain(m *testing.M) {
	//cleaning resources for all tests, i.e. remove cluster or cloud resources etc...
	defer TearDown()
	//prepare shared resources for all tests, i.e. create test cluster, queue etc..
	InitForAllTests()
	m.Run()
}
The default round robin load balancing strategy should be implemented in a way that gives consistent results when resolving a Gslb
host.
ℹ️ Load balancing in the context of this feature corresponds to the distribution of resolved IPs and does not refer in any way to the actual balancing of network traffic.
Scenario:
A Deployment with a backend Service called app, and that backend service exposed with a Gslb resource on all 3 clusters as:
apiVersion: ohmyglb.absa.oss/v1beta1
kind: Gslb
metadata:
  name: app-gslb
  namespace: test-gslb
spec:
  ingress:
    rules:
    - host: app.cloud.example.com
      http:
        paths:
        - backend:
            serviceName: app
            servicePort: http
          path: /
  strategy: roundRobin
With the following worker node IPs on each cluster:
cluster-x-worker-1: 10.0.1.10
cluster-y-worker-1: 10.1.1.11
cluster-z-worker-1: 10.2.1.12
When issuing the following command, curl -v http://app.cloud.example.com, I would expect the IPs resolved to reflect as follows (if this command was executed 6 times consecutively):
$ curl -v http://app.cloud.example.com # execution 1
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 2
* Trying 10.1.1.11...
...
$ curl -v http://app.cloud.example.com # execution 3
* Trying 10.2.1.12...
...
$ curl -v http://app.cloud.example.com # execution 4
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 5
* Trying 10.1.1.11...
...
$ curl -v http://app.cloud.example.com # execution 6
* Trying 10.2.1.12...
...
As above, the resolved node IPs that ingress traffic will be sent to should be evenly "load balanced" between the clusters.
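A minimal Go sketch of the intent, not the operator implementation: all healthy clusters' targets form one pool, and consecutive resolutions rotate through it evenly; the helper and its inputs are hypothetical:
package main

import "fmt"

// roundRobinTargets collects the ingress node IPs of every healthy cluster;
// consecutive DNS answers then rotate through this list.
func roundRobinTargets(targets map[string][]string, healthy []string) []string {
	var all []string
	for _, cluster := range healthy {
		all = append(all, targets[cluster]...)
	}
	return all
}

func main() {
	targets := map[string][]string{
		"cluster-x": {"10.0.1.10"},
		"cluster-y": {"10.1.1.11"},
		"cluster-z": {"10.2.1.12"},
	}
	all := roundRobinTargets(targets, []string{"cluster-x", "cluster-y", "cluster-z"})
	// Simulate 6 consecutive resolutions rotating evenly over the pool.
	for i := 0; i < 6; i++ {
		fmt.Printf("execution %d -> %s\n", i+1, all[i%len(all)])
	}
}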
NOTE:
Most of the current use cases, automated tests and functional test scenarios were performed in the context of 2 clusters only, as it is the most common and the easiest setup to test.
Meanwhile, we need to be ready for and test the >2 cluster scenario.
This series of testing can reveal potential edge cases in the current LB strategies implementation and can lead to extending the LB logic to handle more than 2 clusters.
In addition to the default Infoblox support, Route 53 should be added as an additional edge DNS option.
This should include examples and documentation on the usage and configuration of this option.
Related to this note on #46:
The existence of multiple "secondary" failover clusters should also be considered. For example, if there were 3 clusters (X, Y and Z) in the scenario 2 above, could the Ingress node IPs for both clusters (X and Z) be resolved and if so, how (in terms of "load balancing") would the Ingress node IPs across both those secondary/failover clusters be resolved? Would they use the default round robin strategy, if any strategy at all?
We need to validate that the IPs resolved from the secondary/failover clusters are handled correctly and consistently, in line with a load balancing strategy of their own.
#91 implements end-to-end testing of most of the ohmyglb functionality, with the exception of the EdgeDNS part, which includes:
As it is a very special part of the codebase and very environment specific (Infoblox, Route53, ...), we need to address it separately from the rest of the terratest e2e pipeline.
Currently we have Gslb Status content as shown in example below:
status:
  healthyRecords:
    app3.cloud.example.com:
    - 172.17.0.4
    - 172.17.0.5
    - 172.17.0.6
  managedHosts:
  - app1.cloud.example.com
  - app2.cloud.example.com
  - app3.cloud.example.com
  serviceHealth:
    app1.cloud.example.com: NotFound
    app2.cloud.example.com: Unhealthy
    app3.cloud.example.com: Healthy
It is easily visible that we have duplication between managedHosts and serviceHealth.
It happened historically during development of the project. Initially serviceHealth looked like:
serviceHealth:
  unhealthyServiceName: NotFound
  backend: Unhealthy
  frontend: Healthy
so it basically referenced the serviceName that is referenced in the associated gslb ingress.
Later on it was changed to ingressHost: Status, as it became apparent that it is practical to have them in a single data structure.
So some questions to discuss:
- Do we want serviceName exposed somewhere in Status?
- Do we keep serviceHealth as it is, or place it somewhere else?
- Do we still need managedHosts?
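For reference while discussing the duplication, a sketch of Go types mirroring the status example above; the actual type and field names in the operator may differ:
package main

import "fmt"

// GslbStatus mirrors the example status above: healthyRecords and serviceHealth
// are both keyed by ingress host, which is where the overlap with managedHosts comes from.
type GslbStatus struct {
	HealthyRecords map[string][]string `json:"healthyRecords"`
	ManagedHosts   []string            `json:"managedHosts"`
	ServiceHealth  map[string]string   `json:"serviceHealth"`
}

func main() {
	s := GslbStatus{
		HealthyRecords: map[string][]string{"app3.cloud.example.com": {"172.17.0.4", "172.17.0.5", "172.17.0.6"}},
		ManagedHosts:   []string{"app1.cloud.example.com", "app2.cloud.example.com", "app3.cloud.example.com"},
		ServiceHealth: map[string]string{
			"app1.cloud.example.com": "NotFound",
			"app2.cloud.example.com": "Unhealthy",
			"app3.cloud.example.com": "Healthy",
		},
	}
	// managedHosts duplicates the keys of serviceHealth.
	fmt.Println(len(s.ManagedHosts), len(s.ServiceHealth))
}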
High level feature to add support for all major public clouds:
Every public cloud above represents tasks related to:
The problem statement is as follows: there are several places in the code where we directly read input configuration (i.e. strategy.type, ...).
Exclude GeoTag from the refactoring, please...
Move all inputs into the depresolver (a sketch of the idea follows below).
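A rough sketch of the depresolver idea: one place that reads spec and environment inputs, applies defaults and validation, and hands a resolved config to the rest of the code. The names, env var and defaults are hypothetical:
package main

import (
	"fmt"
	"os"
)

// ResolvedConfig holds every input the controller needs, resolved in one place
// instead of being read ad hoc (spec fields, env vars) throughout the code.
type ResolvedConfig struct {
	StrategyType            string
	EdgeDNSServer           string
	ReconcileRequeueSeconds int
}

// resolve merges the Gslb spec strategy with operator environment configuration
// and applies defaults/validation once. GeoTag is intentionally left out of scope.
func resolve(specStrategyType string) (ResolvedConfig, error) {
	c := ResolvedConfig{
		StrategyType:            specStrategyType,
		EdgeDNSServer:           os.Getenv("EDGE_DNS_SERVER"),
		ReconcileRequeueSeconds: 30,
	}
	if c.StrategyType == "" {
		c.StrategyType = "roundRobin" // default strategy
	}
	if c.EdgeDNSServer == "" {
		return c, fmt.Errorf("EDGE_DNS_SERVER must be set")
	}
	return c, nil
}

func main() {
	cfg, err := resolve("failover")
	fmt.Println(cfg, err)
}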
Implement relevant metrics that provide insight into:
Metrics should be expressed in the Prometheus text exposition format; a small example follows below.
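As a reference for the exposition format, a small sketch using the standard Prometheus Go client; the metric name, label and port are made up for illustration and are not actual ohmyglb metrics:
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// healthyRecords is an illustrative gauge: number of healthy records per Gslb host.
var healthyRecords = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "gslb_healthy_records",
		Help: "Number of healthy records per Gslb host.",
	},
	[]string{"host"},
)

func main() {
	prometheus.MustRegister(healthyRecords)
	healthyRecords.WithLabelValues("app3.cloud.example.com").Set(3)

	// Serves the metrics in Prometheus text exposition format on /metrics.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8383", nil))
}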
Consider the following CoreDNS plugins as reference:
Existing build pipelines should be enhanced with full end to end integration tests running in managed Kubernetes clusters.
The goal of these tests would be to simulate:
These tests should be triggered at strategic and meaningful points in the development flow, considering that this test suite could take considerable time to run to completion.
It's somewhat hard to reproduce, but we are periodically facing:
time="2020-01-29T17:08:08Z" level=debug msg="Skipping endpoint localtargets.app3.cloud.example.com 30 IN A 172.17.0.2;172.17.0.4;172.17.0.5 [] because owner id does not match, found: \"\", required: \"\"ohmyglb\"\""
time="2020-01-29T17:08:08Z" level=debug msg="Skipping endpoint localtargets.app3.cloud.example.com 30 IN A 172.17.0.5 [] because owner id does not match, found: \"\", required: \"\"ohmyglb\"\""
time="2020-01-29T17:08:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/6904e7b4 to Host=172.17.0.2, Text=\"heritage=external-dns,external-dns/owner=\"ohmyglb\",external-dns/resource=crd/test-gslb/test-gslb\", TTL=30"
time="2020-01-29T17:08:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/6904e7b4 to Host=172.17.0.4, Text=, TTL=30"
time="2020-01-29T17:08:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/6904e7b4 to Host=172.17.0.5, Text=, TTL=30"
time="2020-01-29T17:09:08Z" level=debug msg="Skipping endpoint localtargets.app3.cloud.example.com 30 IN A 172.17.0.2;172.17.0.4;172.17.0.5 [] because owner id does not match, found: \"\", required: \"\"ohmyglb\"\""
time="2020-01-29T17:09:08Z" level=debug msg="Skipping endpoint localtargets.app3.cloud.example.com 30 IN A 172.17.0.5 [] because owner id does not match, found: \"\", required: \"\"ohmyglb\"\""
time="2020-01-29T17:09:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/6904e7b4 to Host=172.17.0.2, Text=\"heritage=external-dns,external-dns/owner=\"ohmyglb\",external-dns/resource=crd/test-gslb/test-gslb\", TTL=30"
time="2020-01-29T17:09:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/6904e7b4 to Host=172.17.0.4, Text=, TTL=30"
time="2020-01-29T17:09:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/6904e7b4 to Host=172.17.0.5, Text=, TTL=30"
time="2020-01-29T17:10:08Z" level=debug msg="Skipping endpoint localtargets.app3.cloud.example.com 30 IN A 172.17.0.5 [] because owner id does not match, found: \"\", required: \"\"ohmyglb\"\""
Recreation of gslb CR fixes the issue
time="2020-01-29T17:10:08Z" level=info msg="Delete key /skydns/com/example/cloud/app3/6904e7b4"
time="2020-01-29T17:10:08Z" level=info msg="Delete key /skydns/com/example/cloud/app3"
time="2020-01-29T17:13:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/6d89b98f to Host=172.17.0.2, Text=\"heritage=external-dns,external-dns/owner=\"ohmyglb\",external-dns/resource=crd/test-gslb/test-gslb\", TTL=30"
time="2020-01-29T17:13:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/483cc44e to Host=172.17.0.4, Text=, TTL=30"
time="2020-01-29T17:13:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/732d34f7 to Host=172.17.0.5, Text=, TTL=30"
time="2020-01-29T17:13:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/localtargets/03acc020 to Host=172.17.0.2, Text=\"heritage=external-dns,external-dns/owner=\"ohmyglb\",external-dns/resource=crd/test-gslb/test-gslb\", TTL=30"
time="2020-01-29T17:13:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/localtargets/4b25072e to Host=172.17.0.4, Text=, TTL=30"
time="2020-01-29T17:13:08Z" level=info msg="Add/set key /skydns/com/example/cloud/app3/localtargets/4fe76c50 to Host=172.17.0.5, Text=, TTL=30"
Looks like a race condition somewhere between external-dns and the etcd backend population of the local coredns.
We already have great architecture design docs in place; now we need to amend them with specific implementation details depicting the open source components we are using to solve specific tasks, e.g.:
Fancy diagrams are welcome.
gosec started to fail with
[/github/workspace/pkg/apis/ohmyglb/v1beta1/zz_generated.deepcopy.go:113] - G601 (CWE-): Implicit memory aliasing in for loop. (Confidence: MEDIUM, Severity: MEDIUM)
> &val
Summary:
Files: 20
Lines: 1969
Nosec: 0
Issues: 1
on generated code.
Does not look super critical but worth tracking.
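For context, G601 flags taking the address of a range loop variable; a minimal illustration of the pattern and the usual fix (copying the loop variable per iteration), unrelated to the generated code itself:
package main

import "fmt"

func main() {
	items := []int{1, 2, 3}

	// Pattern gosec G601 warns about: &item aliases the single loop variable,
	// so every stored pointer ends up referring to the same memory.
	var bad []*int
	for _, item := range items {
		bad = append(bad, &item)
	}

	// Common fix: copy the loop variable into a new variable per iteration.
	var good []*int
	for _, item := range items {
		item := item
		good = append(good, &item)
	}

	fmt.Println(*bad[0], *bad[1], *bad[2])    // typically 3 3 3 with pre-Go-1.22 loop semantics
	fmt.Println(*good[0], *good[1], *good[2]) // 1 2 3
}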
I've unblocked pipeline by sticking to stable v2.2.0 release at #114
As per the supported load balancing strategies in the initial design a manual strategy should be implemented to ensure the guarantees stated:
Manual - Eligibility is manually specified as to which cluster(s) are eligible. If there are no available Pods in the specified clusters, then no cluster Ingress node IPs will be resolved and the client will get a NXDOMAIN response
Scenario 1:
A Deployment with a backend Service called app, and that backend service exposed with a Gslb resource on both clusters as:
apiVersion: ohmyglb.absa.oss/v1beta1
kind: Gslb
metadata:
  name: app-gslb
  namespace: test-gslb
spec:
  ingress:
    rules:
    - host: app.cloud.example.com
      http:
        paths:
        - backend:
            serviceName: app
            servicePort: http
          path: /
  strategy: manual
  clusters:
  - cluster-x
With the following worker node IPs on each cluster:
cluster-x-worker-1: 10.0.1.10
cluster-y-worker-1: 10.1.1.11
When issuing the following command, curl -v http://app.cloud.example.com, I would expect the IPs resolved to reflect as follows (if this command was executed 3 times consecutively):
$ curl -v http://app.cloud.example.com # execution 1
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 2
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 3
* Trying 10.0.1.10...
...
The resolved node IPs will always be those of cluster X. Even if cluster X has no healthy Deployments and cluster Y does, the NXDOMAIN response will be returned regardless. This strategy allows full manual control over which cluster's Ingress node IPs can be resolved.
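A minimal sketch of the manual strategy semantics described above, with hypothetical inputs; an empty result corresponds to answering NXDOMAIN:
package main

import "fmt"

// manualTargets resolves ingress node IPs only for the clusters explicitly
// listed in the Gslb spec; if none of them has healthy Pods the result is
// empty and the DNS answer becomes NXDOMAIN.
func manualTargets(eligible []string, targets map[string][]string, healthy map[string]bool) []string {
	var out []string
	for _, cluster := range eligible {
		if healthy[cluster] {
			out = append(out, targets[cluster]...)
		}
	}
	return out
}

func main() {
	targets := map[string][]string{"cluster-x": {"10.0.1.10"}, "cluster-y": {"10.1.1.11"}}
	// Only cluster-x is eligible; cluster-y is ignored even when healthy.
	fmt.Println(manualTargets([]string{"cluster-x"}, targets, map[string]bool{"cluster-x": true, "cluster-y": true}))
	// cluster-x unhealthy -> empty result -> NXDOMAIN, regardless of cluster-y.
	fmt.Println(manualTargets([]string{"cluster-x"}, targets, map[string]bool{"cluster-x": false, "cluster-y": true}))
}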
NOTE:
There is a possible issue that we need to have separate DNS zones for different datacenters, to cater for people who can only run active-passive and then also have the GLB.
An example is DC1-us.k8gb.gb and DC2-us.k8gb.gb, and then also DC.k8gb.gb.
The request is to add support for hosting multiple DNS zones in K8GB, not just one zone.
This would also allow the use case where you may have more than one DNS server to query (for redundancy):
edgeDNSServer: &edgeDNSServer
- 1.1.1.1
- 2.2.2.2
As a follow-up to
#97 (comment),
let's create a sample Grafana dashboard showing off the exposed Prometheus metrics.
The dashboard can be uploaded with some sample data here:
https://grafana.com/grafana/dashboards?orderBy=name&direction=asc
We lint the project from the /pkg/controller folder only, due to the invalid /pkg/apis/ohmyglb/v1beta1/zz_generated.openapi.go. The generated file is obsolete and the functions within it are not used:
zz_generated.openapi.go
Currently we are solving the potential splitbrain situation of zone delegation configuration by putting a timestamp as a TXT heartbeat dns record in the edgeDNS zone. Implementation details are in #44.
During e2e tests we realized that the TTL for this TXT record is implicitly inherited from the dns zone configuration of Infoblox.
When the TTL is longer, we have an obvious problem with liveness check incorrectness.
We need to find a reliable way to control the TTL of the edgeDNS splitbrain TXT records.
The current hypothesis is to use a special API call for the TTL update.
It will also require another upstream modification of https://github.com/infobloxopen/infoblox-go-client, as it does not support TTL handling.
The issue can still be worked around by setting a low TTL in the edgeDNS zone configuration in Infoblox.
In preparation for the 1.0
release, complete the following items:
All documentation and publishing items above should have the corresponding infrastructure/pipelines implemented for automated publishing etc.
I've been trying to install this chart following the instructions in the helm 3 section of the readme, but I'm stuck waiting on etcd pods that won't get past the "pending" status.
Any ideas?
Here are the steps I took:
helm install --debug ohmyglb ohmyglb/ohmyglb
At this point most of the pods are deployed, but the following etcd pods will get stuck in "pending".
Kubectl get pods output:
NAMESPACE NAME READY STATUS RESTARTS AGE
default ohmyglb-coredns-6b7c98d545-hh8kh 1/1 Running 0 13m
default ohmyglb-etcd-operator-etcd-backup-operator-7b7765f455-28m5j 1/1 Running 0 13m
default ohmyglb-etcd-operator-etcd-operator-5d69b5449d-vktx9 0/1 Pending 0 13m
default ohmyglb-etcd-operator-etcd-restore-operator-5ccc8fccf5-8fqgg 0/1 Pending 0 13m
ohmyglb external-dns-5844f58797-pdshk 1/1 Running 0 13m
ohmyglb ohmyglb-7c5d899559-zbwlp 1/1 Running 0 13m
ohmyglb-7c5d899559-zbwlp logs:
{"level":"info","ts":1588015395.9359705,"logger":"cmd","msg":"Operator Version: 0.5.6"}
{"level":"info","ts":1588015395.9365296,"logger":"cmd","msg":"Go Version: go1.14"}
{"level":"info","ts":1588015395.9368103,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1588015395.937081,"logger":"cmd","msg":"Version of operator-sdk: v0.16.0"}
{"level":"info","ts":1588015395.9375331,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1588015396.6210973,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1588015396.6304686,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1588015397.2903323,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1588015397.2911112,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1588015399.4246273,"logger":"metrics","msg":"Metrics Service object created","Service.Name":"ohmyglb-metrics","Service.Namespace":"ohmyglb"}
{"level":"info","ts":1588015400.085832,"logger":"cmd","msg":"Could not create ServiceMonitor object","error":"an empty namespace may not be set during creation"}
{"level":"info","ts":1588015400.0861387,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1588015400.0868874,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"gslb-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1588015400.0877016,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"gslb-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1588015400.088064,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"gslb-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1588015400.0884078,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"gslb-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1588015400.0889385,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1588015400.1968875,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"gslb-controller"}
{"level":"info","ts":1588015400.199356,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"gslb-controller","worker count":1}
Some info that might be useful:
Running in a MacBook Pro
helm v3.2.0
docker desktop v2.2.0.5 (engine v19.03.8 and k8s v1.15.5)
We have a non-deterministic test case within the terratest pipeline which fails from time to time:
2020-04-27T22:57:56.9670279Z TestOhmyglbBasicAppExample 2020-04-27T22:57:56Z command.go:158: 'map[app1.cloud.example.com:NotFound app2.cloud.example.com:Unhealthy app3.cloud.example.com:Unhealthy]'
2020-04-27T22:57:56.9670560Z TestOhmyglbBasicAppExample: ohmyglb_basic_app_test.go:100:
2020-04-27T22:57:56.9670722Z Error Trace: ohmyglb_basic_app_test.go:100
2020-04-27T22:57:56.9670841Z Error: Not equal:
2020-04-27T22:57:56.9671273Z expected: "'map[app1.cloud.example.com:NotFound app2.cloud.example.com:Unhealthy app3.cloud.example.com:Unhealthy]'"
2020-04-27T22:57:56.9671659Z actual : "'map[app1.cloud.example.com:NotFound app2.cloud.example.com:Unhealthy app3.cloud.example.com:Healthy]'"
2020-04-27T22:57:56.9671778Z
2020-04-27T22:57:56.9671897Z Diff:
2020-04-27T22:57:56.9672149Z --- Expected
2020-04-27T22:57:56.9672263Z +++ Actual
2020-04-27T22:57:56.9672512Z @@ -1 +1 @@
2020-04-27T22:57:56.9672864Z -'map[app1.cloud.example.com:NotFound app2.cloud.example.com:Unhealthy app3.cloud.example.com:Unhealthy]'
2020-04-27T22:57:56.9673213Z +'map[app1.cloud.example.com:NotFound app2.cloud.example.com:Unhealthy app3.cloud.example.com:Healthy]'
2020-04-27T22:57:56.9673353Z Test: TestOhmyglbBasicAppExample
2020-04-27T22:57:56.9673806Z TestOhmyglbBasicAppExample 2020-04-27T22:57:56Z command.go:87: Running command kubectl with args [--namespace ohmyglb-test-n68arr delete -f
We encountered an issue with Gslb.Status updates after scaling down a sample workload and bringing it back. It took an unusual amount of time for ohmyglb to pick up the recent service health status correctly, and we also observed a json unmarshal error in the logs:
Cannot unmarshall to check empty value '', err: 'unexpected end of JSON input'
It is worth mentioning that actual ohmyglb operations were not affected: DNSEndpoints were updated and the associated DNS responses were constructed properly, as expected.
Main use case is TLS enabled ingress and cert-manager integration
Spin off from #47
OhMyGLB supports integration with Prometheus Operator thanks to Operator SDK native support.
However, the actual procedure of the Prometheus Operator setup for proper scraping is not clear, so it makes sense to reflect it in the OhMyGLB documentation.
We haven't faced any problems with the current setup so far, but helm3 provides better crd installation support than just putting them under templates.
This issue is a spin off from #47
Implement relevant tracing that provides insight into:
The current consideration is to use the OpenTracing Go API: https://opentracing.io/guides/golang/.
This implies manual code instrumentation unless a less intrusive solution is found.
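A small sketch of manual instrumentation with the OpenTracing Go API; the span and tag names are illustrative, and with no tracer registered the global tracer is a no-op, so this runs without a backend:
package main

import (
	"context"

	"github.com/opentracing/opentracing-go"
)

// reconcileGslb shows the manual instrumentation pattern: one span per
// reconcile loop, with child spans created from the context for sub-steps.
func reconcileGslb(ctx context.Context, name string) {
	span, ctx := opentracing.StartSpanFromContext(ctx, "reconcile-gslb")
	defer span.Finish()
	span.SetTag("gslb.name", name)

	updateDNSEndpoint(ctx)
}

func updateDNSEndpoint(ctx context.Context) {
	span, _ := opentracing.StartSpanFromContext(ctx, "update-dnsendpoint")
	defer span.Finish()
}

func main() {
	// A real setup would call opentracing.SetGlobalTracer with e.g. a Jaeger tracer;
	// with the default no-op tracer this still compiles and runs without a backend.
	reconcileGslb(context.Background(), "test-gslb")
}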
We need to create a contribution guide which would capture development topics like the make targets.
This chunk of documentation will be useful both for us and for external contributors.
As per the supported load balancing strategies in the initial design a weighted round robin strategy should be implemented to ensure the guarantees stated:
Weighted round robin - Specialisation of the above (default round robin #45) strategy but where a percentage weighting is applied to determine which cluster's Ingress node IPs to resolve. E.g. 80% cluster X and 20% cluster Y
Scenario 1:
A Deployment with a backend Service called app, and that backend service exposed with a Gslb resource on cluster X as:
apiVersion: ohmyglb.absa.oss/v1beta1
kind: Gslb
metadata:
  name: app-gslb
  namespace: test-gslb
spec:
  ingress:
    rules:
    - host: app.cloud.example.com
      http:
        paths:
        - backend:
            serviceName: app
            servicePort: http
          path: /
  strategy: roundRobin
  weight: 80%
and a Gslb resource on cluster Y as:
apiVersion: ohmyglb.absa.oss/v1beta1
kind: Gslb
metadata:
  name: app-gslb
  namespace: test-gslb
spec:
  ingress:
    rules:
    - host: app.cloud.example.com
      http:
        paths:
        - backend:
            serviceName: app
            servicePort: http
          path: /
  strategy: roundRobin
  weight: 20%
With the following worker node IPs on each cluster:
cluster-x-worker-1: 10.0.1.10
cluster-y-worker-1: 10.1.1.11
When issuing the following command, curl -v http://app.cloud.example.com, I would expect the IPs resolved to reflect as follows (if this command was executed 6 times consecutively):
$ curl -v http://app.cloud.example.com # execution 1
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 2
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 3
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 4
* Trying 10.0.1.10...
...
$ curl -v http://app.cloud.example.com # execution 5
* Trying 10.1.1.11...
...
$ curl -v http://app.cloud.example.com # execution 6
* Trying 10.1.1.11...
...
The resolved node IPs that ingress traffic will be sent to should be spread approximately according to the weighting configured on the Gslb resources. In this scenario that would be 80% (4 out of 6) resolved to cluster X and 20% (2 out of 6) resolved to cluster Y.
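A minimal sketch of weight-based selection consistent with the distribution described above; the weights, ordering and pick function are illustrative only:
package main

import "fmt"

// pickWeighted walks the cumulative weights and returns the cluster whose
// bucket contains the given roll in [0, totalWeight).
func pickWeighted(weights map[string]int, order []string, roll int) string {
	acc := 0
	for _, cluster := range order {
		acc += weights[cluster]
		if roll < acc {
			return cluster
		}
	}
	return order[len(order)-1]
}

func main() {
	weights := map[string]int{"cluster-x": 80, "cluster-y": 20}
	order := []string{"cluster-x", "cluster-y"}
	counts := map[string]int{}
	// Spread 100 evenly distributed rolls: 80 land on cluster-x, 20 on cluster-y.
	for roll := 0; roll < 100; roll++ {
		counts[pickWeighted(weights, order, roll)]++
	}
	fmt.Println(counts) // map[cluster-x:80 cluster-y:20]
}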
NOTE:
- Gslb resources in 2 clusters have a weight specified (that might or might not add up to 100%). How does that affect the distribution over 3 clusters?
- If Deployments become unhealthy on a cluster, then the weighting should be adjusted to honour the weighting across the remaining clusters with healthy Deployments.
sDuring on-prem tests we encountered a situation during EtcdCluster
creation with etcd-operator
is not picking up the overridden private registry image reference and still tries to pull quay.io/coreos/etcd
during cluster creation.
In one cluster it 'fixed' itself automatically by redeployment in another cluster the issue persisted so we had to manually patch EtcdCluster
resource.
Definitely worth investigation as it can complicate on-prem installations.
Also we need to figure out if the problem affects only 1.14 cluster deployment workaround ( see https://github.com/AbsaOSS/ohmyglb/blob/master/Makefile#L114 ) or it is also a case for standard stable deployment target ( https://github.com/AbsaOSS/ohmyglb/blob/master/Makefile#L104 )
Given someone who is looking for a cloud native, Kubernetes native GSLB solution and who finds this project, the README should concisely and simply answer the following questions:
The README should leave no doubt as to what this project does and how it does it, and should provide an easy, quick way to try it out.
We already have all the mechanics and automation to deploy ohmyglb locally (and also remotely) with some globally load-balanced application on top. For the local testing scenario we are using https://github.com/stefanprodan/podinfo as a sample application.
In the context of this issue we want to extend the README with an end-to-end howto which would include:
We also might want to configure/modify podinfo to expose the ohmyglb GeoTag right in the user interface, so it will be highly illustrative when demonstrating the load balancing strategies in action.
We have already made some experiments with k3d, which looked very good in terms of speed and performance; the only visible blocker was the inability to have the test setup on the same network, as in k3d-io/k3d#111.
Looks like the issue is solved in a recent k3d version, and we might give it another shot, as we definitely want a faster test cycle both locally and within ci/cd pipelines.
OhMyGLB was a silly name chosen for no particular reason when this project started.
Should we rename it now that it's maturing?