Giter VIP home page Giter VIP logo

Comments (4)

grondo avatar grondo commented on September 9, 2024

More info: the nesting works if done in two steps:

grondo@pi3:~$ flux alloc -N1
grondo@pi0:~$ flux alloc -N1
grondo@pi0:~$ flux getattr instance-level
2

from flux-core.

grondo avatar grondo commented on September 9, 2024

Hm, this "fixes" the issue, but I need to understand why flux-job isn't in the foreground process group in the failing case.

diff --git a/src/common/libterminus/client.c b/src/common/libterminus/client.c
index 3bccd55ad..8e3d743da 100644
--- a/src/common/libterminus/client.c
+++ b/src/common/libterminus/client.c
@@ -432,6 +432,12 @@ static void pty_read_cb (flux_reactor_t *r,
             llog_fatal (c, "read: %s", strerror (errno));
             pty_client_exit (c, NULL);
         }
+        else if (errno == EIO) {
+            /*  Force ourselves in foreground
+            */
+            if (tcgetpgrp (STDIN_FILENO) != getpgrp ())
+                tcsetpgrp (STDIN_FILENO, getpgrp());
+        }
         return;
     }
     if (len == 0) {

from flux-core.

grondo avatar grondo commented on September 9, 2024

Ah, the difference in behavior here is described by this comment:

/* For shell commands run the target cmdline in a separate process
* group so that any processes spawned by the shell will be signaled
* in runat_abort(). This does not work for an interactive shell
* (seems to disable access to the pty), so this flag is not set in
* runat_push_shell().
*/
cmd->flags |= FLUX_SUBPROCESS_FLAGS_SETPGRP;

When a new instance is launched with -o pty.interactive, then the flux-broker process becomes the process group leader of the foreground process group. When it launches a non-interactive-shell initial program (in the failing case here flux-alloc->flux job attach), it puts the initial program into a new process group, which therefore cannot access the tty (a background process group).

Possibly, a solution here is to disable setpgrp whenever stdin is a tty, but I wonder if that will have any other fallout. Another solution might be to use the pre-exec subprocess hook to force the initial program as the foreground process group before it is executed. I'll try both and see which works out (probably the first will be simplest)

from flux-core.

grondo avatar grondo commented on September 9, 2024

To sum up more succinctly, the problem here is that the broker wants to create a new process group for the initial program (except when it launches a plain interactive shell) so that it can signal any processes spawned from rc2 as a group. However, this disables access to the tty for those processes.

For example, even this simple test fails:

$ flux start vim
Dec 16 08:18:31.079432 broker.err[0]: rc2: Killing stopped non-interactive process
Dec 16 08:18:31.080101 broker.err[0]: rc2.0: vim Killed (rc=137) 0.0s

Here vim is stopped with SIGTTIN when it tries to write to the terminal, and to avoid a hang this workaround is implemented in the broker currently:

case FLUX_SUBPROCESS_STOPPED:
/* If this process is non-interactive, and stdin was a tty,
* then likely the subprocess was stopped due to SIGTTIN/TTOU.
* To avoid what would be a permanent hang, log an error and
* kill the child process.
*/
if (!entry->interactive && isatty (STDIN_FILENO)) {
flux_log (r->h, LOG_ERR,
"%s: Killing stopped non-interactive process",
entry->name);
flux_subprocess_kill (p, SIGKILL);
}
break;

(flux job attach blocks SIGTTIN for another issue, which is why it gets EIO when reading from the terminal instead of SIGTTIN like vim)

There's probably two approaches to fix this problem:

  • Do not run processes under a separate process group when stdin is a tty. This could affect cleanup of initial programs that are shell scripts or that invoke multiple processes, but is the simplest.
  • Have the broker act more like a shell, since it is trying to run groups of processes and manage them like a shell does. Run rc2 in its own process group all the time, and make the process group the foreground process group with tcsetpgrp(3) if the broker is currently in the foreground process group. This one would need more thought and some experimentation to see if there is any fallout.

from flux-core.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.