Comments (8)

jhuckaby avatar jhuckaby commented on May 26, 2024

Wow, this is very odd. I've been staring at that bit of code for an hour, and I swear I am not trimming the domain for this URL. The hostname is pulled straight out of the job object, and represents how the server is identified in the cluster. By all rights this should be the fully-qualified hostname of the target server. The fact that it isn't really puzzles me.

Could it be that your server thinks its own hostname is truncated? Can you SSH to your myhost server and tell me what this command outputs:

node -e 'console.log(require("os").hostname());'

Does that output a fully-qualified hostname, or a truncated one?
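As a rough way to interpret that output, the sketch below applies a simple heuristic: a name counts as fully qualified if it has at least two non-empty dot-separated labels. The helper name `looksFullyQualified` is made up for illustration and is not part of Cronicle.

```javascript
const os = require("os");

// Hypothetical helper (not part of Cronicle): treats a name as fully
// qualified if it has at least two non-empty dot-separated labels,
// e.g. "myhost.example.com". A single trailing dot is ignored.
function looksFullyQualified(hostname) {
  const labels = hostname.replace(/\.$/, "").split(".");
  return labels.length >= 2 && labels.every((label) => label.length > 0);
}

const name = os.hostname();
console.log(name, looksFullyQualified(name) ? "(looks fully qualified)" : "(looks truncated)");
```

This only inspects the shape of the name. It says nothing about whether other machines can resolve it.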

Regardless, I think you did indeed find a bug here. This URL is constructed blindly using the server hostname from the job object, without regard for the web_socket_use_hostnames configuration parameter. Meaning, it is behaving as if this param is always set to true.
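To illustrate the intended behavior, here is a minimal sketch of a URL builder that honors web_socket_use_hostnames. The function name `buildServerBaseUrl`, the `config.port` field, and the `{ hostname, ip }` server shape are assumptions for the example. This is not the code from the actual commit.

```javascript
// Hypothetical sketch (not Cronicle's actual code): pick the address a
// browser should use to reach a server, based on web_socket_use_hostnames.
// `server` is assumed to look like { hostname: "...", ip: "..." }.
// `config.port` is an assumed field name for the web server port.
function buildServerBaseUrl(server, config) {
  const host = config.web_socket_use_hostnames ? server.hostname : server.ip;
  return "http://" + host + ":" + config.port;
}

const server = { hostname: "myhost", ip: "123.123.123.123" };

// Flag off: the IP address is used.
console.log(buildServerBaseUrl(server, { web_socket_use_hostnames: 0, port: 3012 }));
// prints "http://123.123.123.123:3012"

// Flag on: the hostname is used, so it must be resolvable by the client.
console.log(buildServerBaseUrl(server, { web_socket_use_hostnames: 1, port: 3012 }));
// prints "http://myhost:3012"
```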

I have fixed that in this commit:
a05ed7e

If your server has a correct, fully-qualified hostname, and you aren't using web_socket_use_hostnames, please let me know if the new HEAD revision fixes this.

from cronicle.

aldanor avatar aldanor commented on May 26, 2024

Thanks for looking into this!

Hmm, I have web_socket_use_hostnames set to zero, but I think this is a different issue. When I add a new server via the UI and type in myhost.example.com (where example.com is my domain; all boxes belong to it), global/servers/00 stores the following entry:

{"hostname":"myhost","ip":"123.123.123.123"}

-- thus the information about .example.com seems to be lost for good at that point?

node -e 'console.log(require("os").hostname());'

-- this one outputs the truncated one if I SSH to the host (i.e., myhost).

/* By the way, if I remove a server from admin console, I can't then add it back until I restart the master scheduler, it complains about it still being "the member of the cluster". Is that a bug? */

aldanor avatar aldanor commented on May 26, 2024

(Haven't checked it with HEAD though, but will do tomorrow - looks like it may address that, since the aforesaid flag would no longer be ignored so it should use IPs in log urls)

jhuckaby avatar jhuckaby commented on May 26, 2024

> node -e 'console.log(require("os").hostname());'
> -- this one outputs the truncated one if I SSH to the host (i.e., myhost).

Ah ha, that explains it. So, the issue here is that Cronicle relies on all local server hostnames being "correct", where correct is defined as "resolvable by DNS on any servers or clients using the system". Fully-qualified hostnames are best, but it CAN work with partial short hostnames as long as your own DNS setup, and your servers, can all resolve the short hostname. In your case you have a server that has its local hostname set to "myhost" instead of "myhost.example.com". Cronicle uses that internally as THE definitive server hostname for that server.
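One way to test the "resolvable" requirement is to run a lookup for the server's local hostname on each machine that needs to reach it. The sketch below uses Node's built-in `dns.promises.lookup`, which goes through the operating system resolver (so local hosts-file entries count too). The helper name `canResolve` is made up for illustration.

```javascript
const dns = require("dns");
const os = require("os");

// Hypothetical helper (not part of Cronicle): reports whether this machine
// can resolve the given hostname through the OS resolver.
async function canResolve(hostname) {
  try {
    const { address } = await dns.promises.lookup(hostname);
    return { hostname, ok: true, address };
  } catch (err) {
    // err.code is typically "ENOTFOUND" when the name does not resolve.
    return { hostname, ok: false, error: err.code };
  }
}

// Check the local hostname; pass another server's hostname to test that one.
canResolve(os.hostname()).then((result) => console.log(result));
```

A short name such as "myhost" passing this check on one machine does not guarantee it passes on the others, so the check is only meaningful when repeated wherever the name will be used.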

However, there is indeed another, separate bug here, which is now fixed in HEAD. The client-side JS code that was constructing the URL to the remote server's active job log was using the hostname instead of the IP address. That is now fixed, so this should theoretically start working for you, if you upgrade to HEAD.

jhuckaby avatar jhuckaby commented on May 26, 2024

> By the way, if I remove a server from admin console, I can't then add it back until I restart the master scheduler, it complains about it still being "the member of the cluster". Is that a bug?

I have never heard of that bug before. I remove and add servers all the time, and I have never encountered this, so it is very strange that you are seeing it. Something is funky in your setup I think. Perhaps you have two servers in your cluster with the same internal local hostname, e.g. "myhost"?

aldanor avatar aldanor commented on May 26, 2024

> Something is funky in your setup I think. Perhaps you have two servers in your cluster with the same internal local hostname, e.g. "myhost"?

Don't think so, it's quite trivial.

Just tried it again: removing the server first (it was only part of the 'all servers' group, if that matters), then adding it back as testhost.example.com (seen as testhost), and then trying to add it back -- it fails (see the log below).

To note, the global/servers/0 file was correctly updated right away -- but I have to restart the master node to be able to re-add the server.

[1511743783.32][2017-11-27 00:49:43][scheduler][WebServer][debug][8][New incoming HTTP connection: c39][{"ip":"::ffff:1.2.3.4","num_conns":2}]
[1511743783.321][2017-11-27 00:49:43][scheduler][WebServer][debug][8][New HTTP request: POST /api/app/add_server (::ffff:1.2.3.4)][{"socket":"c39","version":"1.1"}]
[1511743783.322][2017-11-27 00:49:43][scheduler][WebServer][debug][9][Incoming HTTP Headers:][{"host":"3.4.5.6:3012","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:59.0) Gecko/20100101 Firefox/59.0","accept":"text/plain, */*; q=0.01","accept-language":"en-US,en;q=0.5","accept-encoding":"gzip, deflate","referer":"http://scheduler.example.com/","content-type":"text/plain","content-length":"114","origin":"http://scheduler.example.com","connection":"keep-alive"}]
[1511743783.322][2017-11-27 00:49:43][scheduler][WebServer][debug][6][Invoking handler for request: POST /api/app/add_server: API][]

==> /opt/cronicle/logs/API.log <==
[1511743783.323][2017-11-27 00:49:43][scheduler][API][debug][6][Handling API request: POST /api/app/add_server][{}]
[1511743783.323][2017-11-27 00:49:43][scheduler][API][debug][9][API Params][{"hostname":"testhost.example.com","session_id":"a207a90dce1fbdfa70b4079121be28ecaa28a0c237bdb3b2932280e73cc09ab7"}]
[1511743783.323][2017-11-27 00:49:43][scheduler][API][debug][9][Activating namespaced API handler: app/api_add_server for URI: /api/app/add_server][]

==> /opt/cronicle/logs/S3.log <==
[1511743783.323][2017-11-27 00:49:43][scheduler][S3][debug][9][Fetching S3 Object: sessions/a207a90dce1fbdfa70b4079121be28ecaa28a0c237bdb3b2932280e73cc09ab7][]
[1511743783.342][2017-11-27 00:49:43][scheduler][S3][debug][9][JSON fetch complete: sessions/a207a90dce1fbdfa70b4079121be28ecaa28a0c237bdb3b2932280e73cc09ab7][]
[1511743783.342][2017-11-27 00:49:43][scheduler][S3][debug][9][Fetching S3 Object: users/aldanor][]
[1511743783.358][2017-11-27 00:49:43][scheduler][S3][debug][9][JSON fetch complete: users/aldanor][]

==> /opt/cronicle/logs/Cronicle.log <==
[1511743783.358][2017-11-27 00:49:43][scheduler][Cronicle][debug][9][Sending API request to remote server: http://testhost.example.com:3012/api/app/check_add_server][]

==> /opt/cronicle/logs/User.log <==
[1511743783.371][2017-11-27 00:49:43][scheduler][User][error][server][Failed to add server to cluster: testhost.example.com: Server is already a member of a cluster (Master: scheduler)][]

==> /opt/cronicle/logs/WebServer.log <==
[1511743783.372][2017-11-27 00:49:43][scheduler][WebServer][debug][9][Compressed text output with gzip: 148 bytes down to: 134 bytes][]
[1511743783.373][2017-11-27 00:49:43][scheduler][WebServer][debug][9][Sending HTTP response: 200 OK][{"Content-Type":"application/json","Access-Control-Allow-Origin":"*","Server":"Cronicle 1.0","Content-Length":134,"Content-Encoding":"gzip"}]
[1511743783.373][2017-11-27 00:49:43][scheduler][WebServer][debug][9][Request complete][]
[1511743783.373][2017-11-27 00:49:43][scheduler][WebServer][debug][9][Response finished writing to socket][]
[1511743783.374][2017-11-27 00:49:43][scheduler][WebServer][debug][9][Request performance metrics:][{"scale":1000,"perf":{"total":51.685,"read":0.437,"process":49.111,"write":1.997},"counters":{"bytes_in":520,"bytes_out":282,"num_requests":1}}]
[1511743783.374][2017-11-27 00:49:43][scheduler][WebServer][debug][9][Keeping socket open for keep-alives: c39][]
[1511743788.374][2017-11-27 00:49:48][scheduler][WebServer][debug][8][HTTP connection has closed: c39][{"ip":"::ffff:1.2.3.4","total_elapsed":5053,"num_requests":1,"bytes_in":520,"bytes_out":282}]

jhuckaby avatar jhuckaby commented on May 26, 2024

Ah, thank you for the log snippets. I see what is going on here now. When you "remove" a server from the cluster, the master doesn't actually tell the slave server it was removed. Instead, that happens naturally when the slave realizes it is no longer receiving pings from the master. That process usually takes around 60 seconds (see master_ping_timeout).

So I think your cycling of the master scheduler checkbox was a red herring. You were just killing time, and after you were done doing that, the remote server finally realized it lost its master, making it available again to be re-added.
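The timing described above can be modeled with a small sketch. It is a simplified, hypothetical model and not Cronicle's implementation: the class name `WorkerState` and its methods are invented for illustration, and the 60 second value stands in for master_ping_timeout.

```javascript
// Hypothetical model (not Cronicle's code) of a server that only forgets
// its master after pings have stopped for a timeout period.
class WorkerState {
  constructor(masterPingTimeoutSec) {
    this.timeoutMs = masterPingTimeoutSec * 1000;
    this.master = null;
    this.lastPingMs = 0;
  }

  // Record a ping from the master at the given time (milliseconds).
  receivePing(masterHostname, nowMs) {
    this.master = masterHostname;
    this.lastPingMs = nowMs;
  }

  // Called periodically; drops the master once the timeout has elapsed.
  tick(nowMs) {
    if (this.master && nowMs - this.lastPingMs >= this.timeoutMs) {
      this.master = null;
    }
  }

  // Loosely models the "already a member of a cluster" check on re-add.
  canBeAdded() {
    return this.master === null;
  }
}

const worker = new WorkerState(60);
worker.receivePing("scheduler", 0); // last ping before removal
worker.tick(30000);
console.log(worker.canBeAdded()); // prints false (30s after the last ping)
worker.tick(60000);
console.log(worker.canBeAdded()); // prints true (timeout reached)
```

Under this model, a re-add attempted shortly after removal is rejected, while the same attempt after the timeout succeeds without any restart.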

I'll add this to the TODO list, thanks!

jhuckaby avatar jhuckaby commented on May 26, 2024

Should be fixed in HEAD revision. Commit: 01998dd
