Comments (4)
More info: the nesting works if done in two steps:
grondo@pi3:~$ flux alloc -N1
grondo@pi0:~$ flux alloc -N1
grondo@pi0:~$ flux getattr instance-level
2
from flux-core.
Hm, this "fixes" the issue, but I need to understand why flux-job
isn't in the foreground process group in the failing case.
diff --git a/src/common/libterminus/client.c b/src/common/libterminus/client.c
index 3bccd55ad..8e3d743da 100644
--- a/src/common/libterminus/client.c
+++ b/src/common/libterminus/client.c
@@ -432,6 +432,12 @@ static void pty_read_cb (flux_reactor_t *r,
             llog_fatal (c, "read: %s", strerror (errno));
             pty_client_exit (c, NULL);
         }
+        else if (errno == EIO) {
+            /* Force ourselves in foreground
+             */
+            if (tcgetpgrp (STDIN_FILENO) != getpgrp ())
+                tcsetpgrp (STDIN_FILENO, getpgrp());
+        }
         return;
     }
     if (len == 0) {
Ah, the difference in behavior here is described by this comment:
Lines 468 to 474 in b0c565f
When a new instance is launched with -o pty.interactive, then the flux-broker process becomes the process group leader of the foreground process group. When it launches a non-interactive-shell initial program (in the failing case here flux-alloc -> flux job attach), it puts the initial program into a new process group, which therefore cannot access the tty (a background process group).
Possibly, a solution here is to disable setpgrp whenever stdin is a tty, but I wonder if that will have any other fallout. Another solution might be to use the pre-exec subprocess hook to force the initial program as the foreground process group before it is executed. I'll try both and see which works out (probably the first will be simplest).
To sum up more succinctly, the problem here is that the broker wants to create a new process group for the initial program (except when it launches a plain interactive shell) so that it can signal any processes spawned from rc2 as a group. However, this disables access to the tty for those processes.
For example, even this simple test fails:
$ flux start vim
Dec 16 08:18:31.079432 broker.err[0]: rc2: Killing stopped non-interactive process
Dec 16 08:18:31.080101 broker.err[0]: rc2.0: vim Killed (rc=137) 0.0s
Here vim is stopped with SIGTTIN when it tries to read from the terminal, and to avoid a hang this workaround is implemented in the broker currently:
Lines 196 to 208 in b0c565f
(flux job attach blocks SIGTTIN for another issue, which is why it gets EIO when reading from the terminal instead of SIGTTIN like vim.)
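The EIO versus SIGTTIN distinction follows from POSIX terminal rules: a read from the controlling terminal by a background process group normally raises SIGTTIN, but if the reader blocks or ignores that signal the read fails with EIO instead. A minimal sketch of a check for that condition is below. The helper name is hypothetical and not part of flux-core.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper, not part of flux-core.
 * Report whether a read from the controlling terminal, made while this
 * process is in a background process group, would fail with EIO rather
 * than stop the process with SIGTTIN. That is the case when SIGTTIN is
 * blocked or ignored. (An orphaned process group also gets EIO; that
 * case is not checked here.)
 * Returns 1 if EIO is expected, 0 if SIGTTIN is expected, -1 on error.
 */
int background_read_gives_eio (void)
{
    sigset_t mask;
    struct sigaction sa;

    /* A NULL set leaves the mask unchanged and only fetches it */
    if (sigprocmask (SIG_BLOCK, NULL, &mask) < 0)
        return -1;
    /* A NULL new action only fetches the current disposition */
    if (sigaction (SIGTTIN, NULL, &sa) < 0)
        return -1;
    return sigismember (&mask, SIGTTIN) == 1 || sa.sa_handler == SIG_IGN;
}
```

Under this reading, flux job attach would report 1 (SIGTTIN blocked, so EIO), while vim with default signal handling would report 0 and be stopped.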
There are probably two approaches to fix this problem:
- Do not run processes under a separate process group when stdin is a tty. This could affect cleanup of initial programs that are shell scripts or that invoke multiple processes, but is the simplest.
- Have the broker act more like a shell, since it is trying to run groups of processes and manage them like a shell does. Run rc2 in its own process group all the time, and make the process group the foreground process group with tcsetpgrp(3) if the broker is currently in the foreground process group. This one would need more thought and some experimentation to see if there is any fallout.
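The second approach can be sketched roughly as follows. This is not the broker's code: run_in_own_pgrp and set_foreground are hypothetical names, and the sketch assumes a plain fork/exec rather than the broker's subprocess machinery. It only shows the order of the setpgid(2) and tcsetpgrp(3) calls, and that SIGTTOU has to be blocked while a background process group changes the terminal's foreground group.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: make pgid the foreground process group of the
 * terminal on fd. SIGTTOU is blocked around the call because
 * tcsetpgrp() from a background process group would otherwise stop
 * the caller.
 */
static int set_foreground (int fd, pid_t pgid)
{
    sigset_t block, saved;
    int rc;

    sigemptyset (&block);
    sigaddset (&block, SIGTTOU);
    sigprocmask (SIG_BLOCK, &block, &saved);
    rc = tcsetpgrp (fd, pgid);
    sigprocmask (SIG_SETMASK, &saved, NULL);
    return rc;
}

/* Hypothetical sketch: run argv in its own process group. If the
 * caller currently owns the terminal on tty_fd, hand the terminal to
 * the child's group, then take it back once the child has exited.
 * Returns the wait status, or -1 on error.
 */
int run_in_own_pgrp (char *const argv[], int tty_fd)
{
    int status;
    /* Decide before forking whether we are the foreground group */
    int owns_tty = isatty (tty_fd) && tcgetpgrp (tty_fd) == getpgrp ();
    pid_t pid = fork ();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: lead a new process group, then claim the terminal */
        setpgid (0, 0);
        if (owns_tty)
            set_foreground (tty_fd, getpgrp ());
        execvp (argv[0], argv);
        _exit (127);
    }
    /* Parent: repeat setpgid() so the group exists whichever side runs
     * first; a failure after the child has exec'd is harmless.
     */
    setpgid (pid, pid);
    while (waitpid (pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    /* Reclaim the terminal for the caller's own process group */
    if (owns_tty)
        set_foreground (tty_fd, getpgrp ());
    return status;
}
```

A caller would pass STDIN_FILENO as tty_fd. When stdin is not a tty, or the caller is not in the foreground, the child still gets its own process group and the terminal is left alone. The sketch does not handle a child that stops (for example on SIGTSTP), which is one of the places where experimentation would be needed.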