Luap99 commented on June 19, 2024

Do you only see this for public domain names or also when container names are resolved?

The first thing to do is check the journal for any errors logged by aardvark-dns.

from aardvark-dns.

matejzero commented on June 19, 2024

This is only for public domain names, I don't use any container names in my tests.

I managed to set up journald logging.

As far as errors go, there were two:

  • aardvark-dns[129509]: [25183] fail response: ProtoError { kind: Msg("mpsc::SendError send failed because receiver is gone") }
  • aardvark-dns[215345]: 14284 dns request got empty response

Apart from that, there are a lot of "Received SIGHUP will refresh servers: 1" messages. That is probably because I run 40 containers that are spun up and torn down quickly (my tests last between 20 s and 2 min).

Sometimes I also see:

aardvark-dns[129509]: No configuration found stopping the sever
systemd[1084]: Started /usr/libexec/podman/aardvark-dns --config /run/user/988/containers/networks/aardvark-dns -p 53 run.

Example of logs:

Sep 29 12:11:26 host.example.tld aardvark-dns[72066]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:27 host.example.tld aardvark-dns[72066]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:27 host.example.tld aardvark-dns[72066]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:28 host.example.tld aardvark-dns[72066]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:28 host.example.tld aardvark-dns[72066]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:28 host.example.tld aardvark-dns[72066]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:29 host.example.tld aardvark-dns[72066]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:30 host.example.tld aardvark-dns[72066]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:32 host.example.tld aardvark-dns[72066]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:32 host.example.tld aardvark-dns[72066]: No configuration found stopping the sever
Sep 29 12:11:33 host.example.tld systemd[1084]: Started /usr/libexec/podman/aardvark-dns --config /run/user/988/containers/networks/aardvark-dns -p 53 run.
Sep 29 12:11:34 host.example.tld aardvark-dns[84196]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:34 host.example.tld aardvark-dns[84196]: No configuration found stopping the sever
Sep 29 12:11:39 host.example.tld systemd[1084]: Started /usr/libexec/podman/aardvark-dns --config /run/user/988/containers/networks/aardvark-dns -p 53 run.
Sep 29 12:11:40 host.example.tld aardvark-dns[84336]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:41 host.example.tld aardvark-dns[84336]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:41 host.example.tld aardvark-dns[84336]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:41 host.example.tld aardvark-dns[84336]: No configuration found stopping the sever
Sep 29 12:11:41 host.example.tld systemd[1084]: Started /usr/libexec/podman/aardvark-dns --config /run/user/988/containers/networks/aardvark-dns -p 53 run.
Sep 29 12:11:41 host.example.tld aardvark-dns[84649]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:42 host.example.tld aardvark-dns[84649]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:42 host.example.tld aardvark-dns[84649]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:43 host.example.tld aardvark-dns[84649]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:43 host.example.tld aardvark-dns[84649]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:43 host.example.tld aardvark-dns[84649]: No configuration found stopping the sever
Sep 29 12:11:44 host.example.tld systemd[1084]: Started /usr/libexec/podman/aardvark-dns --config /run/user/988/containers/networks/aardvark-dns -p 53 run.
Sep 29 12:11:45 host.example.tld aardvark-dns[85160]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:45 host.example.tld aardvark-dns[85160]: No configuration found stopping the sever
Sep 29 12:11:46 host.example.tld systemd[1084]: Started /usr/libexec/podman/aardvark-dns --config /run/user/988/containers/networks/aardvark-dns -p 53 run.
Sep 29 12:11:48 host.example.tld aardvark-dns[85294]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:48 host.example.tld aardvark-dns[85294]: No configuration found stopping the sever
Sep 29 12:11:49 host.example.tld systemd[1084]: Started /usr/libexec/podman/aardvark-dns --config /run/user/988/containers/networks/aardvark-dns -p 53 run.
Sep 29 12:11:49 host.example.tld aardvark-dns[85439]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:50 host.example.tld aardvark-dns[85439]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:50 host.example.tld aardvark-dns[85439]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:51 host.example.tld aardvark-dns[85439]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:51 host.example.tld aardvark-dns[85439]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:51 host.example.tld aardvark-dns[85439]: No configuration found stopping the sever
Sep 29 12:11:51 host.example.tld systemd[1084]: Started /usr/libexec/podman/aardvark-dns --config /run/user/988/containers/networks/aardvark-dns -p 53 run.
Sep 29 12:11:51 host.example.tld aardvark-dns[85951]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:52 host.example.tld aardvark-dns[85951]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:52 host.example.tld aardvark-dns[85951]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:52 host.example.tld aardvark-dns[85951]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:53 host.example.tld aardvark-dns[85951]: Received SIGHUP will refresh servers: 1
Sep 29 12:11:53 host.example.tld aardvark-dns[85951]: Received SIGHUP will refresh servers: 1

Luap99 commented on June 19, 2024

No configuration found stopping the sever

This is normal, assuming all containers are stopped at that moment. The next container start will respawn the process.

As far as errors go, there were two:

* `aardvark-dns[129509]: [25183] fail response: ProtoError { kind: Msg("mpsc::SendError send failed because receiver is gone") }`

* `aardvark-dns[215345]: 14284 dns request got empty response`

These definitely look relevant. @flouthoc, any idea?

matejzero commented on June 19, 2024

The problem is that out of 150 containers, between 10 and 20 fail due to DNS errors. Could the SIGHUPs be the reason, if a request arrives while the server is reloading?

Luap99 commented on June 19, 2024

That is a possibility. Are these containers all on the same network, or are multiple networks in use?

matejzero commented on June 19, 2024

Every container is in its own network (I'm using the FF_NETWORK_PER_BUILD flag with gitlab-runner).

flouthoc commented on June 19, 2024

Would it be possible to run aardvark-dns in debug mode and share the logs?

matejzero commented on June 19, 2024

I think I could, yes. How do I run aardvark in debug mode?

flouthoc commented on June 19, 2024

@Luap99 Does --log-level get propagated to aardvark-dns and netavark?

Luap99 commented on June 19, 2024

Yes, podman --log-level debug ... gets passed down to netavark and aardvark-dns. Keep in mind, though, that it has to be set on the command that starts the aardvark-dns server, i.e. on the first podman command that starts a container with DNS enabled.

Luap99 commented on June 19, 2024

Anyway, the code looks pretty clear to me: we tear everything down on each SIGHUP and then set it up again. That definitely looks wrong and is likely responsible for the packet loss. We must keep the sockets active and only add/remove the ones affected by the changed configs. Looking at it, this whole section would need to be rewritten to handle this much better.
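
The approach described above amounts to a set difference: on each SIGHUP, compare the addresses currently being served with the addresses the new configuration requires, then open only the new listeners and close only the stale ones, so sockets for unchanged networks never go away. This Rust sketch is hypothetical and not taken from aardvark-dns; `ReloadPlan` and `plan_reload` are illustrative names, and it assumes one listener per network gateway IP. The `shutdown` flag models the "No configuration found stopping the sever" case, where no network configuration is left.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Hypothetical result of a SIGHUP-triggered reload: which listeners to
/// open and which to close. Addresses present in both sets stay untouched.
#[derive(Debug, PartialEq)]
struct ReloadPlan {
    to_start: Vec<IpAddr>,
    to_stop: Vec<IpAddr>,
    /// True when no network configuration is left, i.e. the server should exit.
    shutdown: bool,
}

/// Compares the currently served addresses with the ones the new config wants.
fn plan_reload(active: &HashSet<IpAddr>, desired: &HashSet<IpAddr>) -> ReloadPlan {
    let mut to_start: Vec<IpAddr> = desired.difference(active).copied().collect();
    let mut to_stop: Vec<IpAddr> = active.difference(desired).copied().collect();
    // Sort so the plan does not depend on HashSet iteration order.
    to_start.sort();
    to_stop.sort();
    ReloadPlan {
        to_start,
        to_stop,
        shutdown: desired.is_empty(),
    }
}

fn main() {
    let ip = |s: &str| -> IpAddr { s.parse().unwrap() };
    // Example gateway addresses (assumed): one job's network goes away,
    // another job's network appears, one network is unchanged.
    let active = HashSet::from([ip("10.89.0.1"), ip("10.89.1.1")]);
    let desired = HashSet::from([ip("10.89.1.1"), ip("10.89.2.1")]);
    let plan = plan_reload(&active, &desired);
    // Only 10.89.2.1 is opened and only 10.89.0.1 is closed;
    // the listener on 10.89.1.1 keeps running throughout.
    println!("{:?}", plan);
}
```

With FF_NETWORK_PER_BUILD, each job start or finish adds or removes a single network, so under this approach a reload would typically touch only that job's listener instead of recreating all of them.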

matejzero commented on June 19, 2024

I have to figure out a way to do that with gitlab-runner, if it's even possible. I wasn't able to replicate this issue without gitlab-runner jobs.

Luap99 commented on June 19, 2024

I would assume that just spamming the aardvark-dns process with SIGHUP signals works as a reproducer for some packet loss, since the sockets are closed and reopened each time. It is then just a question of hitting that window.
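
The reproducer suggested above can be scripted. This is a minimal sketch assuming a Linux host with `sh` on PATH; `spam_sighup` is a hypothetical helper (not part of podman or aardvark-dns) that calls the shell's `kill` builtin, so no extra crates are needed. Pass it the aardvark-dns PID (for example from `pgrep aardvark-dns`) and run name lookups from a container at the same time; the 1000 signals and 10 ms interval are arbitrary choices.

```rust
use std::env;
use std::io;
use std::process::Command;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: sends SIGHUP to `pid` up to `count` times, sleeping
/// `interval` between signals. Returns how many signals were delivered.
fn spam_sighup(pid: u32, count: u32, interval: Duration) -> io::Result<u32> {
    let mut sent = 0;
    for _ in 0..count {
        // Use the shell's `kill` builtin so no extra crates are needed.
        let status = Command::new("sh")
            .arg("-c")
            .arg(format!("kill -s HUP {}", pid))
            .status()?;
        if !status.success() {
            // The process is gone (or we lack permission): stop early.
            break;
        }
        sent += 1;
        thread::sleep(interval);
    }
    Ok(sent)
}

fn main() -> io::Result<()> {
    let pid = match env::args().nth(1).and_then(|a| a.parse::<u32>().ok()) {
        Some(pid) => pid,
        None => {
            eprintln!("usage: spam-sighup <aardvark-dns pid>");
            return Ok(());
        }
    };
    let sent = spam_sighup(pid, 1000, Duration::from_millis(10))?;
    println!("sent {} SIGHUP signals to {}", sent, pid);
    Ok(())
}
```

If tearing down and rebuilding the sockets is the cause, some lookups issued while this runs would be expected to go unanswered.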

michaelfranzl commented on June 19, 2024

Hi,

I am experiencing the same symptoms described by @matejzero under similar conditions: I use one GitLab Runner with the Docker executor (configured to use a rootless, unprivileged Podman socket). I see many DNS resolution failures; an estimated half of all CI jobs fail because of this issue.

Troubleshooting steps:

  • First I tried to mitigate this by running a local caching DNS server on the host, which did not help.
  • By running a special GitLab CI job that performs hundreds of unique DNS lookups, I could confirm that whenever a lookup failed, the caching DNS server on the host never received the query.
  • Then I suspected packet loss due to high network load between the host and the containers; but even when I limited the host's incoming network bandwidth, the issue did not improve.
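
The lookup test described in the bullets above (hundreds of unique DNS lookups) can be approximated with the Rust standard library alone. This is a sketch under assumptions: it resolves names through the system resolver (inside a container on a DNS-enabled Podman network, that is aardvark-dns), `failed_lookups` is a hypothetical helper, and the generated names only resolve if you pass a domain with a wildcard DNS record that you control; the count of 500 is arbitrary. Unique names keep any cache along the way from hiding dropped queries.

```rust
use std::env;
use std::net::ToSocketAddrs;
use std::process;

/// Hypothetical helper: resolves each host once through the system resolver
/// and returns the names that produced an error or no addresses.
fn failed_lookups(hosts: &[String]) -> Vec<String> {
    hosts
        .iter()
        .filter(|host| {
            // ToSocketAddrs needs a port; 80 is arbitrary and never contacted.
            match (host.as_str(), 80).to_socket_addrs() {
                Ok(mut addrs) => addrs.next().is_none(),
                Err(_) => true,
            }
        })
        .cloned()
        .collect()
}

fn main() {
    let base = match env::args().nth(1) {
        Some(base) => base,
        None => {
            eprintln!("usage: dns-probe <wildcard-domain>");
            return;
        }
    };
    // Unique names so no cache can answer on the resolver's behalf.
    let hosts: Vec<String> = (0..500)
        .map(|i| format!("probe-{}-{}.{}", process::id(), i, base))
        .collect();
    let failed = failed_lookups(&hosts);
    println!("{} of {} lookups failed", failed.len(), hosts.len());
    for host in &failed {
        println!("failed: {}", host);
    }
}
```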

This pretty much leaves only the container network stack as the potential cause.

When GitLab Runner jobs are started and stopped, I can see bursts of the following log lines in journald:

Feb 19 08:29:18 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:19 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:19 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:19 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:19 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:19 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:19 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:19 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:20 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:21 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:21 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:21 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:21 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:22 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:22 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:25 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1
Feb 19 08:29:25 myhostname aardvark-dns[11348]: Received SIGHUP will refresh servers: 1

I can reproduce these messages using a custom test job; however, I found that the presence of one of these messages is not sufficient to cause a DNS resolution failure.

matejzero commented on June 19, 2024

I was hoping to have more time to debug this, but unfortunately I had to switch back to Docker for performance reasons, so no new info from my side :/

Luap99 commented on June 19, 2024

#389 (comment) is still valid and must be fixed, and seeing all the "Received SIGHUP will refresh servers" messages makes me confident that this is the problem you are seeing.
