
Comments (23)

bfjelds commented on August 22, 2024

what would happen if you took hostNetwork: true out of the agent template (or added it to your test Pod)? i don't remember what all we needed that for, but i'm thinking udev was the primary reason. maybe that is causing the issue?

from akri.

bfjelds commented on August 22, 2024

(your documentation is awesome, by the way)

DazWilkin commented on August 22, 2024

OK, I replaced the Configuration's value for the discovery endpoint with the Cluster IP and it is then able to GET it:

kubectl get service/discovery --output=jsonpath="{.spec.clusterIP}"
10.152.183.188

Revised Configuration:

apiVersion: akri.sh/v0
kind: Configuration
metadata:
  name: http
spec:
  protocol:
    http:
      discoveryEndpoint: http://10.152.183.188:9999 # http://discovery:9999
  capacity: 1
  brokerPodSpec:
    imagePullSecrets: # GitHub Container Registry secret
    - name: ghcr
    containers:
      - name: http-broker
        image: "ghcr.io/dazwilkin/http@sha256:2c0738c5053761f738576912400921208d204a763904e54308783d1e48a14a4d"
        resources:
          limits:
            "{{PLACEHOLDER}}": "1"
  instanceServiceSpec:
    ports:
      - name: grpc
        port: 80
        targetPort: 8084 # HTTP uses 8084
  configurationServiceSpec:
    ports:
      - name: grpc
        port: 80
        targetPort: 8084

Yields:

[http:discover] Entered
[http:discover] url: http://10.152.183.188:9999
[http:discover] Response: Ok(Response { url: "http://10.152.183.188:9999/", status: 200, headers: {"date": "Wed, 04 Nov 2020 20:37:44 GMT", "content-length": "130", "content-type": "text/plain; charset=utf-8"} })

DazWilkin commented on August 22, 2024

Created a branch in my repo.

I've added some documentation for it too.

bfjelds commented on August 22, 2024

Here's a comparison link that might help look at just the changes in your branch

bfjelds commented on August 22, 2024

is devices.yaml included in the branch? i'm not finding it.

bfjelds commented on August 22, 2024

i'd be curious to see what the output looks like for the discovery service: kubectl get service discovery -o yaml

(i haven't used kubectl expose before, so i'm not sure what exactly it produces)

DazWilkin commented on August 22, 2024

is devices.yaml included in the branch? i'm not finding it.

My mistake... I moved the YAMLs from the Broker to the Devices repo and forgot to push them... doing so now.

YAMLs

DazWilkin commented on August 22, 2024

i'd be curious to see what the output looks like for the discovery service: kubectl get service discovery -o yaml

(i haven't used kubectl expose before, so i'm not sure what exactly it produces)

You caught me ;-)

kubectl expose deployment/X ... is a quick way to create a service and avoid writing the spec:

kubectl get service/discovery --output=yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: discovery
    broker: http
    project: akri
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:app: {}
          f:broker: {}
          f:project: {}
      f:spec:
        f:ports:
          .: {}
          k:{"port":9999,"protocol":"TCP"}:
            .: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:selector:
          .: {}
          f:app: {}
          f:protocol: {}
        f:sessionAffinity: {}
        f:type: {}
    manager: kubectl
    operation: Update
  name: discovery
  namespace: default
spec:
  clusterIP: 10.152.183.191
  ports:
  - port: 9999
    protocol: TCP
    targetPort: 9999
  selector:
    app: akri
    protocol: http
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

I often cheat by using --output=yaml and munging the result into, e.g., discovery.service.yaml
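
The munging step can be partly automated. A minimal sketch, assuming the two-space indentation shown in the output above; strip_managed_fields is a hypothetical helper, not a kubectl feature, and it is not a general YAML parser:

```shell
# Drop the metadata.managedFields block from Service YAML read on stdin.
# Assumes the key sits at two spaces of indentation, as in the output above.
strip_managed_fields() {
  awk '
    /^  managedFields:/ { skip = 1; next }   # start of the block: drop the key itself
    skip && /^  ( |- )/ { next }             # deeper indent or list item: still inside
    { skip = 0; print }                      # anything else ends the block and is kept
  '
}

# Usage against a live cluster (not run here):
#   kubectl get service/discovery --output=yaml | strip_managed_fields > discovery.service.yaml
```

Fields such as clusterIP and status would still need removing by hand before reusing the file.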

bfjelds commented on August 22, 2024

out of curiosity, what happens if you specify the FQDN for the discovery service (something like discovery.default.svc.cluster.local)?

bfjelds commented on August 22, 2024

another interesting data point might be adding some dns/ip tools to the agent container and then using kubectl exec to see if the agent pod successfully resolves the service name?
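
A related check that needs no extra tools beyond cat is to read the agent Pod's /etc/resolv.conf and see whether it lists the cluster DNS service. A sketch, assuming the cluster DNS Service is named kube-dns in the kube-system namespace (common, but not guaranteed), that cat exists in the agent image, and using a made-up Pod name; both helper functions are hypothetical:

```shell
# Print the nameserver addresses found in resolv.conf text read from stdin.
nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}

# Succeed if the resolv.conf text on stdin lists the given DNS server IP.
uses_dns_server() {
  nameservers | grep -xF -- "$1" > /dev/null
}

# Usage against a live cluster (assumed names, not run here):
#   DNS_IP=$(kubectl get service/kube-dns --namespace=kube-system --output=jsonpath="{.spec.clusterIP}")
#   kubectl exec akri-agent-daemonset-xxxxx -- cat /etc/resolv.conf | uses_dns_server "${DNS_IP}" \
#     && echo "agent uses cluster DNS" || echo "agent uses some other resolver"
```

If the agent Pod reports a different resolver than an ordinary test Pod, that would explain why the same URL works in one and not the other.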

DazWilkin commented on August 22, 2024

out of curiosity, what happens if you specify the FQDN for the discovery service (something like discovery.default.svc.cluster.local)?

I just deleted the cluster for today to walk my dog.... I think I tried that yesterday and it doesn't work.

I tried discovery.default and discovery.default.svc.cluster.local IIRC... Will try again tomorrow

DazWilkin commented on August 22, 2024

another interesting data point might be adding some dns/ip tools to the agent container and then using kubectl exec to see if the agent pod successfully resolves the service name?

Yes, that I have done.

I added dnsutils and I can nslookup [discovery|discovery.default|discovery.default.svc.cluster.local]

I added curl and, from that Pod, I can curl the discovery endpoint.

I even went as far as refactoring the GET code from the discovery handler and deploying that as a separate Pod; it worked too!

bfjelds commented on August 22, 2024

I tried to create a simpler version of the problem (https://github.com/bfjelds/reqwest-test) but it seems like I'm not quite repro'ing what you're seeing.

It seems like DNS name resolution is working with this code ... and i can manufacture the error you are seeing.

I'm not sure what is different yet, but maybe you see something helpful in the code?

DazWilkin commented on August 22, 2024

I'm confident that I'm doing something stupid.

I am able to copy and paste the discovery URL from the error into a Pod running curl and have this succeed.

I was also able to use a repro container with the same (!?) code and URL and have that succeed.

I will spend more time focused on this tomorrow.

Thank you for looking into this!

DazWilkin commented on August 22, 2024

It's curious, I'm running curl in a pod and here's the result:

for PRE in "http://" ""
do
  for POST in "" "/"
  do
    for SVC in "discovery" "discovery.default" "discovery.default.svc" "discovery.default.svc.cluster.local"
    do
      URL="${PRE}${SVC}:9999${POST}"
      # Print the HTTP status code (000 if no response was received) next to the URL
      printf "%s:\t%s\n" \
       "$(curl --silent --write-out '%{response_code}' "${URL}" --output /dev/null)" \
       "${URL}"
    done
  done
done | sort

Yields:

NOTE 000 means curl received no HTTP response; here that corresponds to a failure to resolve the name

000:	discovery.default.svc:9999
000:	discovery.default.svc:9999/
000:	http://discovery.default.svc:9999
000:	http://discovery.default.svc:9999/

NOTE http:// and terminating / make no difference.

And:

200:	discovery.default.svc.cluster.local:9999
200:	discovery.default.svc.cluster.local:9999/
200:	discovery.default:9999
200:	discovery.default:9999/
200:	discovery:9999
200:	discovery:9999/
200:	http://discovery.default.svc.cluster.local:9999
200:	http://discovery.default.svc.cluster.local:9999/
200:	http://discovery.default:9999
200:	http://discovery.default:9999/
200:	http://discovery:9999
200:	http://discovery:9999/
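
A 000 status only says that curl received no HTTP response; curl's exit status narrows down why. A small sketch (describe_curl_exit is a hypothetical helper; the exit codes listed are curl's documented ones):

```shell
# Map a curl exit status to a short human-readable reason.
describe_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect" ;;
    28) echo "timed out" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Usage from inside a Pod (not run here):
#   curl --silent --output /dev/null "http://discovery.default.svc:9999"
#   describe_curl_exit $?
```

Separating "could not resolve host" from "failed to connect" shows whether a 000 line is a DNS problem or a networking problem.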

Yet, using known good DNS names for the service in the agent doesn't work:

# http://discovery:9999

[http:new] Entered
[http:discover] Entered
[http:discover] url: http://discovery:9999
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorMessage { msg: "Failed to connect to discovery endpoint results: reqwest::Error { kind: Request, url: \"http://discovery:9999/\", source: hyper::Error(Connect, ConnectError(\"dns error\", Custom { kind: Other, error: \"failed to lookup address information: Temporary failure in name resolution\" })) }" }', agent/src/util/config_action.rs:146:64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[http:discover] Failed to connect to discovery endpoint: http://discovery:9999

# http://discovery.default:9999

[http:new] Entered
[http:discover] Entered
[http:discover] url: http://discovery.default:9999
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorMessage { msg: "Failed to connect to discovery endpoint results: reqwest::Error { kind: Request, url: \"http://discovery.default:9999/\", source: hyper::Error(Connect, ConnectError(\"dns error\", Custom { kind: Other, error: \"failed to lookup address information: Name or service not known\" })) }" }', agent/src/util/config_action.rs:146:64
[http:discover] Failed to connect to discovery endpoint: http://discovery.default:9999
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

# http://discovery.default.svc.cluster.local:9999

[http:new] Entered
[http:discover] Entered
[http:discover] url: http://discovery.default.svc.cluster.local:9999
[http:discover] Failed to connect to discovery endpoint: http://discovery.default.svc.cluster.local:9999
[http:discover] Error: error sending request for url (http://discovery.default.svc.cluster.local:9999/): error trying to connect: dns error: failed to lookup address information: Temporary failure in name resolution
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorMessage { msg: "Failed to connect to discovery endpoint results: reqwest::Error { kind: Request, url: \"http://discovery.default.svc.cluster.local:9999/\", source: hyper::Error(Connect, ConnectError(\"dns error\", Custom { kind: Other, error: \"failed to lookup address information: Temporary failure in name resolution\" })) }" }', agent/src/util/config_action.rs:146:64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

But the IP does:

[http:discover] Entered
[http:discover] url: http://10.152.183.106:9999
[2020-11-06T20:07:13Z TRACE agent::protocols::http::discovery_handler] [http:discover] Connected to discovery endpoint: "http://10.152.183.106:9999" => Response { url: "http://10.152.183.106:9999/", status: 200, headers: {"date": "Fri, 06 Nov 2020 20:07:13 GMT", "content-length": "189", "content-type": "text/plain; charset=utf-8"} }
[2020-11-06T20:07:13Z TRACE agent::protocols::http::discovery_handler] [protocol:http] Result: [DiscoveryResult { digest: "d62266", properties: {"AKRI_HTTP": "http", "AKRI_HTTP_DEVICE_ENDPOINT": "http://device-1:8080"} }, DiscoveryResult { digest: "703d61", properties: {"AKRI_HTTP_DEVICE_ENDPOINT": "http://device-2:8080", "AKRI_HTTP": "http"} }, DiscoveryResult { digest: "8f52a9", properties: {"AKRI_HTTP": "http", "AKRI_HTTP_DEVICE_ENDPOINT": "http://device-3:8080"} }, DiscoveryResult { digest: "8bb408", properties: {"AKRI_HTTP": "http", "AKRI_HTTP_DEVICE_ENDPOINT": "http://device-4:8080"} }, DiscoveryResult { digest: "ccb080", properties: {"AKRI_HTTP_DEVICE_ENDPOINT": "http://device-5:8080", "AKRI_HTTP": "http"} }, DiscoveryResult { digest: "231bfa", properties: {"AKRI_HTTP_DEVICE_ENDPOINT": "http://device-6:8080", "AKRI_HTTP": "http"} }, DiscoveryResult { digest: "8c07de", properties: {"AKRI_HTTP_DEVICE_ENDPOINT": "http://device-7:8080", "AKRI_HTTP": "http"} }, DiscoveryResult { digest: "38a57c", properties: {"AKRI_HTTP_DEVICE_ENDPOINT": "http://device-8:8080", "AKRI_HTTP": "http"} }, DiscoveryResult { digest: "4a70c3", properties: {"AKRI_HTTP": "http", "AKRI_HTTP_DEVICE_ENDPOINT": "http://device-9:8080"} }]

And interestingly the broker uses reqwest and cluster service names and succeeds:

# http://device-X:8080

[http:main] Entered
[http:main] Device: http://device-1:8080
[http:main:loop] Sleep
[http:main:loop] read_sensor(http://device-1:8080)
[http:read_sensor] Entered
[main:read_sensor] Response status: 200
[main:read_sensor] Response body: Ok("0.026863285639930047")
[http:main:loop] Sleep
[http:main:loop] read_sensor(http://device-1:8080)
[http:read_sensor] Entered
[main:read_sensor] Response status: 200
[main:read_sensor] Response body: Ok("0.3173084282870506")
[http:main:loop] Sleep

DazWilkin commented on August 22, 2024

As long as I replace the service's DNS with its IP, it works. So, this is currently my only issue.

DazWilkin commented on August 22, 2024

I wondered that and promptly forgot to try it!

I'll do so next week.

Have a good weekend.

DazWilkin commented on August 22, 2024

Well done @bfjelds ...

Removing hostNetwork: true from the agent spec resolves the issue.

Thank you!

bfjelds commented on August 22, 2024

@DazWilkin, would something like dnsPolicy: ClusterFirstWithHostNet work as well (leaving the agent free to use hostNetwork: true)? Info found here: kubernetes/dns#316 (comment)

Seems like a fairly common combination: https://github.com/search?l=YAML&q=%22hostNetwork%3A+true%22+%22dnsPolicy+ClusterFirstWithHostNet%22&type=Code
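
For context, the Kubernetes DNS documentation says that a Pod with hostNetwork: true and the default dnsPolicy: ClusterFirst falls back to the Default policy, meaning it inherits the node's resolver, which knows nothing about cluster service names. A minimal sketch of that rule (effective_dns_policy is a hypothetical function and a simplification, not kubelet's actual code):

```shell
# effective_dns_policy HOST_NETWORK [DNS_POLICY]
# Print the DNS policy a Pod ends up with, per the documented fallback rule.
effective_dns_policy() {
  host_network="$1"
  dns_policy="${2:-ClusterFirst}"   # ClusterFirst is the default when unset
  if [ "${host_network}" = "true" ] && [ "${dns_policy}" = "ClusterFirst" ]; then
    echo "Default"                  # node's resolv.conf: no cluster service names
  else
    echo "${dns_policy}"
  fi
}

# Example calls:
#   effective_dns_policy true                            # agent as originally deployed
#   effective_dns_policy true ClusterFirstWithHostNet    # the combination suggested above
```

Under that rule, both removing hostNetwork: true and adding ClusterFirstWithHostNet should give the agent the cluster's resolver.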

DazWilkin commented on August 22, 2024

Will give it a whirl

DazWilkin commented on August 22, 2024

Works!

more ./akri/deployment/helm/templates/agent.yaml
{{- if .Values.agent.enabled }}
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: akri-agent-daemonset
spec:
  selector:
    matchLabels:
      name: akri-agent
  template:
    metadata:
      labels:
        name: akri-agent
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
...

bfjelds commented on August 22, 2024

awesome! dnsPolicy: ClusterFirstWithHostNet seems like something we should add to akri:main

