I have 2 environments, both running OpenShift 3.11. The first is a simple sandbox environment consisting of 1 master and 1 compute node; the other is an HA environment consisting of 3 masters and 6 compute nodes.
I attempted multiple deploys on both, and this is what I noticed.
The simple sandbox environment had a 100% success rate over 6 attempts.
The HA environment had 1 success over roughly 25-30 attempts. The 2 failures I see consistently in the HA environment are:
2019/09/20 18:40:37 [crit] 261#0: *1 connect() to unix:/tmp/gunicorn_web.sock failed (2: No such file or directory) while connecting to upstream, client: 10.244.14.1, server: _, request: "GET /health/instance HTTP/2.0", upstream: "http://unix:/tmp/gunicorn_web.sock:/health/instance", host: "10.244.14.10:8443"
2019/09/20 18:40:37 [crit] 261#0: *1 connect() to unix:/tmp/gunicorn_web.sock failed (2: No such file or directory) while connecting to upstream, client: 10.244.14.1, server: _, request: "GET /health/instance HTTP/2.0", upstream: "http://unix:/tmp/gunicorn_web.sock:/quay-registry/static/502.html", host: "10.244.14.10:8443"
10.244.14.1 () - - [20/Sep/2019:18:40:37 +0000] "GET /health/instance HTTP/2.0" 502 173 "-" "kube-probe/1.11+" (0.000 49 0.000 : 0.000)
nginx stdout | 2019/09/20 18:40:37 [crit] 261#0: *1 connect() to unix:/tmp/gunicorn_web.sock failed (2: No such file or directory) while connecting to upstream, client: 10.244.14.1, server: _, request: "GET /health/instance HTTP/2.0", upstream: "http://unix:/tmp/gunicorn_web.sock:/health/instance", host: "10.244.14.10:8443"
2019/09/20 18:40:37 [crit] 261#0: *1 connect() to unix:/tmp/gunicorn_web.sock failed (2: No such file or directory) while connecting to upstream, client: 10.244.14.1, server: _, request: "GET /health/instance HTTP/2.0", upstream: "http://unix:/tmp/gunicorn_web.sock:/quay-registry/static/502.html", host: "10.244.14.10:8443"
10.244.14.1 () - - [20/Sep/2019:18:40:37 +0000] "GET /health/instance HTTP/2.0" 502 173 "-" "kube-probe/1.11+" (0.000 49 0.000 : 0.000)
2019/09/20 18:40:47 [crit] 259#0: *4 connect() to unix:/tmp/gunicorn_web.sock failed (2: No such file or directory) while connecting to upstream, client: 10.244.14.1, server: _, request: "GET /health/instance HTTP/2.0", upstream: "http://unix:/tmp/gunicorn_web.sock:/health/instance", host: "10.244.14.10:8443"
2019/09/20 18:40:47 [crit] 259#0: *4 connect() to unix:/tmp/gunicorn_web.sock failed (2: No such file or directory) while connecting to upstream, client: 10.244.14.1, server: _, request: "GET /health/instance HTTP/2.0", upstream: "http://unix:/tmp/gunicorn_web.sock:/quay-registry/static/502.html", host: "10.244.14.10:8443"
10.244.14.1 () - - [20/Sep/2019:18:40:47 +0000] "GET /health/instance HTTP/2.0" 502 173 "-" "kube-probe/1.11+" (0.000 49 0.000 : 0.000)
nginx stdout | 2019/09/20 18:40:47 [crit] 259#0: *4 connect() to unix:/tmp/gunicorn_web.sock failed (2: No such file or directory) while connecting to upstream, client: 10.244.14.1, server: _, request: "GET /health/instance HTTP/2.0", upstream: "http://unix:/tmp/gunicorn_web.sock:/health/instance", host: "10.244.14.10:8443"
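Reading the log, nginx is proxying the kube-probe health check to unix:/tmp/gunicorn_web.sock, and the socket file does not exist at that moment, so the probe gets a 502. Below is a minimal sketch of a check that could be run inside a failing pod to see whether the socket ever appears. It assumes Python 3 is available in the container. The function name check_unix_socket and its status strings are hypothetical, made up for illustration, and are not part of the product.

```python
import os
import socket
import stat

# Path taken from the nginx error lines above.
SOCKET_PATH = "/tmp/gunicorn_web.sock"


def check_unix_socket(path, timeout=2.0):
    """Return a short status string describing a Unix socket path.

    Hypothetical helper for illustration only:
      "missing"       - nothing exists at the path (matches errno 2 in the log)
      "not-a-socket"  - something exists there, but it is not a socket
      "not-accepting" - the socket file exists but nothing accepts connections
      "listening"     - a connection was accepted
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except OSError:
        return "not-accepting"
    finally:
        client.close()
    return "listening"


if __name__ == "__main__":
    print(SOCKET_PATH, check_unix_socket(SOCKET_PATH))
```

A result of "missing" several minutes after the pod starts would suggest the gunicorn web worker never came up, rather than the probe simply firing too early.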
If you need more log data or can suggest a workaround, please let me know.
If you would like the full log files, let me know where I can send or drop them.