
torque's Introduction

TORQUE Resource Manager

This is the master branch README. Major changes and updates should go into this branch.

Overview

TORQUE (Terascale Open-source Resource and Queue Manager) is an open source project based on the original PBS* resource manager developed by NASA, LLNL, and MRJ. It possesses a large number of enhancements contributed by organizations such as OSC, NCSA, TeraGrid, the U.S. Dept. of Energy, USC, and many, many others. It continues to incorporate significant advancements in the areas of scalability, fault tolerance, usability, functionality, and security, with development contributed by the community and vendor supporters. It may be utilized, modified, and distributed subject to the constraints of the license located in the PBS_License.txt file. If you would like to contribute to this project, or have patches, enhancements, or associated projects you would like to have included in it, please send an email to our mailing list.

Installation Instructions

Installation directions are available on the documentation website.

Additional information concerning TORQUE is available in the PBS Administrator's Guide (admin_guide.ps in the 'doc' subdirectory).
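For orientation, a from-source build follows the usual autotools pattern. This is only a sketch (the prefix is illustrative); consult the documentation website for the authoritative steps:

./configure --prefix=/usr/local
make
sudo make install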

Additional Notes

  • TORQUE is not endorsed by nor affiliated with Altair.

torque's People

Contributors

ac-autogit, actorquedeveloper, acvizi, alawrence-ac, braddaw, bringhurst, craigprescott, dhh1128, dhill12, djhaskin987, dvandok, elicha, gardnerbyu, jbarber, jbristow0, jovandeginste, knielson, lnikulin, loz-hurst, mattaezell, necrolyte2, scottm1ll, spuder, ssather, stdweird, tabaer, thedavidwhiteside, vladimirstarostenkov, wmpauli, wpoely86


torque's Issues

4.2.x : Add --enable-mic (Intel Phi) flag to torque.spec.in

Hi,
I hope this is correct :) We needed to add Phi support to our test cluster running torque-4.2.x, but the option was not being passed through via the rpmbuild command. The following very simple patch to buildutils/torque.spec.in for torque-4.2.1 appears to work for us:

--- buildutils/torque.spec.orig 2013-03-10 14:53:25.532000584 +0000
+++ buildutils/torque.spec.in   2013-03-10 14:59:06.667998157 +0000
@@ -25,6 +25,7 @@
 %bcond_with    numa
 %bcond_with    pam
 %bcond_with    top
+%bcond_with    mic

 ### Features enabled by default
 %bcond_without scp
@@ -46,6 +47,7 @@
 %define ac_with_scp        --with-rcp=%{?with_scp:scp}%{!?with_scp:pbs_rcp}
 %define ac_with_spool      --%{?with_spool:en}%{!?with_spool:dis}able-spool
 %define ac_with_syslog     --%{?with_syslog:en}%{!?with_syslog:dis}able-syslog
+%define ac_with_mic        --%{?with_mic:en}%{!?with_mic:dis}able-mic

 ### Build Requirements
 %define breq_gui   %{?with_gui:tcl-devel tk-devel tclx}
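For context, %bcond_with means the feature stays off unless requested at build time, so with this patch the MIC build could presumably be toggled like this (illustrative invocation; adjust the tarball name to your version):

rpmbuild -ta torque-4.2.1.tar.gz --with mic   # enables the new conditional, i.e. --enable-mic
rpmbuild -ta torque-4.2.1.tar.gz              # default build keeps MIC support disabled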

I haven't coded up --with-mic-path because we use the default location, so I will add that later if needed.

Best Regards

Mark Roberts

header file cleanup / missing code?

Two issues with the current header files:

a. pbs_ifl.h defines a struct pbs_statdest and a function pbs_stageout; however, the code seems to be missing (and git log -S doesn't show it, which is really odd). If the code does not exist, please clean up the headers.

b. log.h declares extern int LOGLEVEL; three times.

huge address space usage of torque server

pbs_server uses about 900GB of address space on a system with about 2000 jobs (1600 running, 400 queued):
S USER PID PPID NI RSS VSZ STIME %CPU TIME COMMAND
S root 31996 1 0 766520 912800900 Sep23 10.7 17:44:10 /usr/local/torque/sbin/pbs_server -d /var/spool/torque -H b0
After a restart, VM usage almost instantly jumps to 500GB and then grows to 700GB within a minute or so. After that it grows more gradually to the value shown above, and from there on the VM usage remains constant. Even on our development system with a single running job, pbs_server uses about 28GB of address space:
S USER PID PPID NI RSS VSZ STIME %CPU TIME COMMAND
S root 20223 1 0 39252 28951952 Sep22 0.0 00:02:58 /usr/local/torque/sbin/pbs_server -d /var/spool/torque
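For anyone reproducing the measurement, the columns above correspond to a ps invocation along these lines (RSS and VSZ are reported by ps in KiB):

ps -o stat,user,pid,ppid,ni,rss,vsz,stime,pcpu,time,cmd -C pbs_server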

Travis-ci initial configuration and parallel build

@mattaezell writes:

Is the intention to just do builds or builds and unit tests? As currently
configured, the 'make check' doesn't actually do anything. You have to
configure --with-check for it to actually execute the unit tests.

Also, I noticed that the Travis VMs are configured with 1.5 cores (what's
half a core??). Anyway, it would probably make sense to specify make's
parallel option '-j' of at least 2. I've noticed that oversubscribing
cores can sometimes yield improvement, when some are doing disk I/O and
others are using the CPU. On my desktop with 2 cores, a very simple
non-scientific test resulted in 'make -j1 && make -j1 check' taking
5:02.014, -j2 taking 3:27.915 and -j4 taking 3:14.099.

No builds have run yet - I think someone from Adaptive has to login to
Travis to enable the builds.
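If the intent is builds plus unit tests, the build steps would presumably need to look something like this (a sketch based on the comment above, not the actual Travis configuration):

./configure --with-check
make -j2
make -j2 check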

pbs_sched stopped working

This issue was reported by multiple community members and Matt Ezell points out that commit 062443f broke pbs_sched as it writes back on the same port.

Probably the best solution would be to add a variable called using_pbs_sched or something like that which specifies that the server not close this socket if pbs_sched is being used. This socket gets closed by default because the most popular schedulers that interface with TORQUE never write replies on this socket, making this pure overhead for most users.

pbs_server_attributes man page not up to date

I have noticed that the pbs_server_attributes man page does not contain all of the defined pbs_server attributes. For example, none of the attributes relating to logging job information (record_job_info, record_job_script, etc) are documented in the man page. I believe these attributes were added a number of years ago in Torque 2.5. I'm sure there are other missing attributes, and there are probably queue attributes missing as well.
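A quick way to see the gap is to compare what a live server reports against the man page (illustrative commands; record_job_info and record_job_script are the examples mentioned above):

qmgr -c 'list server'       # attributes currently set on the server, including record_job_* if enabled
man pbs_server_attributes   # the documented list to compare against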

4.1.x: Cannot delete Slot-Limited Array Jobs

x=$(qsub -t 0-1000%100 sleeper.sh)
[wait]
qdel $x

In 4.1.4:
Torque now deletes the array job; however, the not-yet-running jobs still get started. They no longer show in qstat as queued, but reappear when they are started.
In 4.1.5:
I tried this once after the update and the qdel command blocked with no action taken.

trqauthd crash

This is a build from the latest 4.2.4 branch (i.e. just before 4.2.4.1).
I get the following trqauthd crash once in a while.

I managed to start trqauthd -D in gdb on SL6.4; in another session on the same host I started pbs_server and grabbed the following after Ctrl-C. Maybe this can help?

[root@master16 server_logs]# gdb /usr/sbin/trqauthd 
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/trqauthd...(no debugging symbols found)...done.
(gdb) run -D
Starting program: /usr/sbin/trqauthd -D
[Thread debugging using libthread_db enabled]
hostname: master16.delcatty.os
Active server name: master16.delcatty.os  pbs_server port is: 15001
trqauthd port: 15005
*** glibc detected *** /usr/sbin/trqauthd: free(): invalid next size (fast): 0x0000000000624a00 ***
*** glibc detected *** /usr/sbin/trqauthd: malloc(): memory corruption: 0x0000000000624a20 ***

^C
Program received signal SIGINT, Interrupt.
0x00007ffff5f78b1b in pthread_once () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install torque-4.2.4-ug.1.5c186cb47cb41351272ce7f552f864d9b0aff610.x86_64
(gdb) 
(gdb) bt
#0  0x00007ffff5f78b1b in pthread_once () from /lib64/libpthread.so.0
#1  0x00007ffff532d954 in backtrace () from /lib64/libc.so.6
#2  0x00007ffff529f7cb in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff52a50e6 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ffff52a8b64 in _int_malloc () from /lib64/libc.so.6
#5  0x00007ffff52a9911 in malloc () from /lib64/libc.so.6
#6  0x00007ffff7de9b9d in _dl_map_object_deps () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7defa91 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff7deb1a6 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7def4ea in _dl_open () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff5355300 in do_dlopen () from /lib64/libc.so.6
#11 0x00007ffff7deb1a6 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#12 0x00007ffff5355457 in __libc_dlopen_mode () from /lib64/libc.so.6
#13 0x00007ffff532d855 in init () from /lib64/libc.so.6
#14 0x00007ffff5f78b23 in pthread_once () from /lib64/libpthread.so.0
#15 0x00007ffff532d954 in backtrace () from /lib64/libc.so.6
#16 0x00007ffff529f7cb in __libc_message () from /lib64/libc.so.6
#17 0x00007ffff52a50e6 in malloc_printerr () from /lib64/libc.so.6
#18 0x00007ffff52a7c13 in _int_free () from /lib64/libc.so.6
#19 0x00007ffff712482d in process_svr_conn (sock=0x6249e0) at ../Libifl/trq_auth.c:901
#20 0x00007ffff7132803 in start_domainsocket_listener (socket_name=0x7fffffffd5c0 "/tmp/trqauthd-unix", process_meth=0x4012c8 <_Z16process_svr_connPv@plt>)
    at ../Libnet/server_core.c:264
#21 0x0000000000401b3c in daemonize_trqauthd(char const*, int, void* (*)(void*)) ()
#22 0x0000000000401e1d in trq_main ()
#23 0x0000000000401e7c in main ()
(gdb) 

PBS_NODEFILE is missing for interactive jobs on 4.2.1

I just made a quick stab at torque 4.2.1 for my work on torque-roll for Rocks and found that the nodefile is missing on the nodes when you run an interactive job. Non-interactive jobs seem to work fine. Here is a quick run-through:

Regular jobs are fine

[marve@hpc1 ~]$ echo cat \$PBS_NODEFILE | qsub -lnodes=2:ppn=2,walltime=1000
8.hpc1.local
[marve@hpc1 ~]$ cat STDIN.o8
compute-0-2
compute-0-2
compute-0-1
compute-0-1

Interactive jobs are not

[marve@hpc1 ~]$ qsub -lnodes=2:ppn=2,walltime=1000 -I
qsub: waiting for job 9.hpc1.local to start
qsub: job 9.hpc1.local ready

[marve@compute-0-2 ~]$ echo $PBS_NODEFILE
/var/spool/torque/aux//9.hpc1.local
[marve@compute-0-2 ~]$ ll /var/spool/torque/aux/
total 0

This is kind of serious for the torque-roll setup because I have a hack that integrates the standard openmpi (which does not contain libtm support) with torque, and it depends on having the nodefile available.

[marve@compute-0-2 ~]$ cat /opt/torque/etc/openmpi-setup.sh 
export OMPI_MCA_orte_rsh_agent="pbsdshwrapper"
export OMPI_MCA_orte_default_hostfile=$PBS_NODEFILE
export OMPI_MCA_orte_leave_session_attached=1

And it barfs on me when the nodefile is missing

[marve@compute-0-2 ~]$ source /opt/torque/etc/openmpi-setup.sh 
[marve@compute-0-2 ~]$ mpirun ./hello.x
--------------------------------------------------------------------------
Open RTE was unable to open the hostfile:
    /var/spool/torque/aux//9.hpc1.local
Check to make sure the path and filename are correct.
--------------------------------------------------------------------------
[compute-0-2.local:02832] [[3496,0],0] ORTE_ERROR_LOG: Not found in file base/ras_base_allocate.c at line 247
[compute-0-2.local:02832] [[3496,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 99
[compute-0-2.local:02832] [[3496,0],0] ORTE_ERROR_LOG: Not found in file plm_rsh_module.c at line 1167

procs no longer parsed by Torque

When the procs resource was originally introduced it was pass-through to the scheduler only. Torque simply recorded the value for procs in Resources_List. In Torque 2.5.0, Torque would allocate x processors packed on any available node which allowed pbs_sched, Maui, and qrun to work as expected with jobs that request the procs resource.

It appears that there has been a regression: procs is again pass-through only in Torque 4.2.

torque and openssl

Torque's configure script requires OpenSSL: it terminates if either -lssl or -lcrypto is not found, with:
error: TORQUE needs lib openssl-devel in order to build
However, as far as I can tell no openssl routines are used in Torque:

  • the string "openssl" does not appear in any file other than configure. I.e., none of the openssl include files are used in torque since that would require a line
    #include <openssl/...h>
  • I was able to relink libtorque.so.2.0.0 without -lcrypto -lssl and then was able to relink pbs_server, pbs_mom and trqauthd with that library and without -lcrypto -lssl. No unresolved symbols were found and the resulting binaries run without problem.
    Thus, apparently openssl is not used at all.
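A quick way to double-check that claim on an installed copy (the library path below is an assumption; adjust it to your install prefix):

ldd /usr/local/lib/libtorque.so.2.0.0 | grep -E 'libssl|libcrypto'
nm -D /usr/local/lib/libtorque.so.2.0.0 | grep -E '(SSL|EVP|CRYPTO)_' || echo "no OpenSSL symbols referenced"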

Incorrect qstat output

Hi,

We're in the process of configuring our new Torque (4.2.7) and have come up with an issue.

Our previous install (2.5.1) has output as below:

[root@dev1 ~]# qstat -a

mgt-mgmt.basc.dpi.vic.gov.au:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time


2218246.mgt-mgmt ss1b batch MIRA_4-2014 -- 1 8 800gb 1200: Q --
2500686.mgt-mgmt sk00 batch Trinity.fasta.ca 6105 1 6 22gb 170:0 R 141:0
2502414.mgt-mgmt im18 batch FG_BN_Job25 13862 1 1 18gb 45:00 R 25:26
2502516.mgt-mgmt jp24 asreml RFI_validation 16526 -- -- 13800m 40:00 R 22:49

qstat -f on one of these jobs (2502414) shows:
Resource_List.mem = 18gb
Resource_List.nodect = 1
Resource_List.nodes = 1
Resource_List.pmem = 3800mb

The new (4.2.7) shows:

[root@dev3 NCBI]# qstat -a

mgt.basc.science.depi.vic.gov.au:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time


187.mgt.basc.science.d pb26 batch samtools_4 43945 1 1 3800m 02:00:00 R 00:34:08
188.mgt.basc.science.d pb26 batch samtools_5 43906 1 1 3800m 02:00:00 R 00:34:07
189.mgt.basc.science.d pb26 batch samtools_6 43792 1 1 3800m 02:00:00 R 00:34:07
190.mgt.basc.science.d pb26 batch samtools_7 43880 1 1 3800m 02:00:00 R 00:34:06
192.mgt.basc.science.d pb26 batch samtools_9 43180 1 1 3800m 02:00:00 R 00:34:05
193.mgt.basc.science.d pb26 batch samtools_10 43945 1 1 3800m 02:00:00 R 00:34:04
226.mgt.basc.science.d pb26 batch memstress_33 44137 1 1 3800m 06:00:00 R 00:04:26
227.mgt.basc.science.d pb26 batch memstress_34 44209 1 1 3800m 06:00:00 R 00:01:37
228.mgt.basc.science.d pb26 batch memstress_35 44178 1 1 3800m 06:00:00 R 00:01:06
229.mgt.basc.science.d pb26 batch memstress_36 44192 1 1 3800m 06:00:00 R 00:01:06
230.mgt.basc.science.d pb26 batch memstress_37 44234 1 1 3800m 06:00:00 R 00:01:06
231.mgt.basc.science.d pb26 batch memstress_38 46662 1 1 3800m 06:00:00 R 00:00:35

qstat -f on one of these jobs (226) shows:
Resource_List.mem = 376gb
Resource_List.nodect = 1
Resource_List.nodes = 1:ppn=1
Resource_List.pmem = 3800mb
Resource_List.walltime = 06:00:00

It seems that the newer version is showing the value of the Resource_List.pmem field (which is the default value for the queue) in the Memory column, rather than Resource_List.mem.
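A simple way to confirm which resource ends up in the Memory column (hypothetical test job; the values are arbitrary):

jobid=$(echo sleep 60 | qsub -l mem=4gb,walltime=00:05:00)
qstat -a "$jobid"                                        # 2.5.1 shows 4gb here; 4.2.7 reportedly shows the queue's pmem default
qstat -f "$jobid" | grep -E 'Resource_List\.(mem|pmem)'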

2.5.x: -DUSESAVEDRESOURCES

Dear All,

While trying to fix pbs_mom behaviour to preserve resource usage information when the mom gets restarted with the -p option, I came across code guarded by USESAVEDRESOURCES define blocks.

My question to you is why not make it a default build option?
Was there any reason behind keeping it disabled by default?
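For what it's worth, the define can presumably be switched on today, without a dedicated configure option, by injecting it into the preprocessor flags (untested sketch):

./configure CPPFLAGS="-DUSESAVEDRESOURCES"
make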

Cheers

LKF

/dev/null as stdout/stderr destination not always handled properly

I have found a problem with how TORQUE handles /dev/null as a destination for a stderr/stdout spool file under certain circumstances.

submit_host:/dev/null (remote) works fine -- pbs_mom does not attempt to scp the file if the destination is /dev/null on the remote system, and everything works as expected

/dev/null (local destination) does not: pbs_mom seems to send the output to /dev/null (the file is never created in server_priv/spool), but then it still tries to copy the .ER file from the spool directory to /dev/null when the job finishes. We get emails with these error messages:

Unable to copy file /var/spool/torque/spool/277828.host.domain.ER to /dev/null
*** error from copy
/bin/cp: cannot stat `/var/spool/torque/spool/277828.host.domain.ER': No such file or directory
*** end error output

We never noticed this with qsub, because if I do "qsub -e /dev/null ..." qsub will prepend the submit host, so what is actually passed to TORQUE is submit_host:/dev/null. However, we also use the pbs_python API and do not bother prepending our submit host to the e and o job parameters, because we want to do all local copies anyway.
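To make the two cases concrete (illustrative commands; the job script name is made up):

qsub -e /dev/null -o /dev/null job.sh   # qsub rewrites these to submit_host:/dev/null, so the copy is skipped
# via the pbs_python API we pass the bare string "/dev/null" for e/o, which is treated as a local
# destination and triggers the failing cp of the spooled .ER/.OU file at job end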

This is with TORQUE 4.2.0

pbs_mom requires libraries even without a MIC

It seems I have to install some of the MIC packages on all the nodes even without a MIC card installed.
In particular, I need to install:
intel-mic-2.1.6720-16.2.6.32-358.el6.x86_64.rpm
for dependency: intel-mic-kmod-2.1.6720-16.2.6.32.358.el6.x86_64.rpm
for dependency: intel-mic-flash-2.1.386-3.2.6.32-358.el6.x86_64.rpm
intel-mic-gpl-2.1.6720-16.el6.x86_64.rpm

That is just to get the particular libraries libcoi_host.so.0 and libscif.so.0.
I also need to symlink libcoi_host.so.0 into /lib64 as well.

If I don't do the above steps, pbs_mom will not start.

I suggest pbs_mom contain code similar to the GPU code: if the kernel driver is not loaded, pbs_mom should not bother with the MIC code.

From the mom log on a host without a GPU:
20130808:08/08/2013 08:42:52;0001; pbs_mom.3417;Svr;pbs_mom;LOG_DEBUG::main, Not using Nvidia gpu support even though built with --enable-nvidia-gpus
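The requested check would presumably mirror that log line, i.e. only touch the MIC libraries when the driver is present, roughly (the module name 'mic' is an assumption for the Intel MPSS stack):

if lsmod | awk '{print $1}' | grep -qx mic; then
    echo "MIC kernel driver loaded; MIC support can be initialised"
else
    echo "no MIC kernel driver; skip loading libcoi_host.so.0 / libscif.so.0"
fi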

Potential Shadowed Variable Bugs

From compiling with -Wshadow:

../Libnet/net_server.c:625:11: warning: declaration of 'i' shadows a previous local [-Wshadow]
node_func.c:198:17: warning: declaration of 'status' shadows a global declaration [-Wshadow]
node_func.c:3548:19: warning: declaration of 'name' shadows a global declaration [-Wshadow]
node_func.c:3659:23: warning: declaration of 'name' shadows a global declaration [-Wshadow]
node_func.c:3772:29: warning: declaration of 'node_name' shadows a global declaration [-Wshadow]
node_func.c:3790:29: warning: declaration of 'node_name' shadows a global declaration [-Wshadow]
node_func.c:3818:25: warning: declaration of 'node_name' shadows a global declaration [-Wshadow]
node_func.c:3908:29: warning: declaration of 'node_name' shadows a global declaration [-Wshadow]
node_manager.c:661:17: warning: declaration of 'node_name' shadows a global declaration [-Wshadow]
node_manager.c:905:25: warning: declaration of 'node_id' shadows a global declaration [-Wshadow]
node_manager.c:2530:16: warning: declaration of 'node_id' shadows a global declaration [-Wshadow]
node_manager.c:3739:16: warning: declaration of 'name' shadows a global declaration [-Wshadow]
node_manager.c:4148:13: warning: declaration of 'node_name' shadows a global declaration [-Wshadow]
node_manager.c:5098:18: warning: declaration of 'node_name' shadows a global declaration [-Wshadow]
node_manager.c:5132:13: warning: declaration of 'name' shadows a global declaration [-Wshadow]
pbsd_init.c:1757:12: warning: declaration of 'pjob' shadows a previous local [-Wshadow]
pbsd_main.c:2978:11: warning: declaration of 'sid' shadows a global declaration [-Wshadow]
process_request.c:473:27: warning: declaration of 'tmpLine' shadows a previous local [-Wshadow]
req_modify.c:1307:13: warning: declaration of 'comp_resc_lt' shadows a global declaration [-Wshadow]
req_stat.c:651:10: warning: declaration of 'iter' shadows a previous local [-Wshadow]
svr_jobfunc.c:1189:17: warning: declaration of 'comp_resc_lt' shadows a global declaration [-Wshadow]
pbs_sched.c:374:19: warning: declaration of 'saddr' shadows a global declaration [-Wshadow]
pbs_sched.c:423:19: warning: declaration of 'saddr' shadows a global declaration [-Wshadow]
mom_mach.c:4221:8: warning: declaration of 'pdir' shadows a global declaration [-Wshadow]
mom_mach.c:4500:15: warning: declaration of 'job' shadows a global declaration [-Wshadow]
mom_comm.c:5514:55: warning: declaration of 'log_buffer' shadows a global declaration [-Wshadow]
mom_main.c:1323:29: warning: declaration of 'ret_string' shadows a global declaration [-Wshadow]
mom_main.c:8273:21: warning: declaration of 'time_now' shadows a global declaration [-Wshadow]
requests.c:1588:24: warning: declaration of 'tmpLine' shadows a previous local [-Wshadow]
start_exec.c:612:41: warning: declaration of 'log_buffer' shadows a global declaration [-Wshadow]
start_exec.c:1724:27: warning: declaration of 'vtable' shadows a global declaration [-Wshadow]
start_exec.c:2027:23: warning: declaration of 'time_now' shadows a global declaration [-Wshadow]
start_exec.c:3367:26: warning: declaration of 'writerpid' shadows a global declaration [-Wshadow]
start_exec.c:3368:26: warning: declaration of 'shellpid' shadows a global declaration [-Wshadow]
start_exec.c:4440:17: warning: declaration of 'buf' shadows a previous local [-Wshadow]
start_exec.c:5478:12: warning: declaration of 'i' shadows a previous local [-Wshadow]
start_exec.c:6174:41: warning: declaration of 'log_buffer' shadows a global declaration [-Wshadow]
start_exec.c:7510:26: warning: declaration of 'vtable' shadows a global declaration [-Wshadow]
start_exec.c:7616:27: warning: declaration of 'vtable' shadows a global declaration [-Wshadow]
start_exec.c:7683:34: warning: declaration of 'vtable' shadows a global declaration [-Wshadow]
start_exec.c:8721:39: warning: declaration of 'log_buffer' shadows a global declaration [-Wshadow]
start_exec.c:8816:46: warning: declaration of 'log_buffer' shadows a global declaration [-Wshadow]
qselect.c:100:58: warning: declaration of 'optarg' shadows a global declaration [-Wshadow]
qselect.c:133:104: warning: declaration of 'optarg' shadows a global declaration [-Wshadow]
qsub_functions.c:2553:15: warning: declaration of 'alloc_len' shadows a previous local [-Wshadow]
qsub_functions.c:2567:21: warning: declaration of 'err_msg' shadows a previous local [-Wshadow]
qsub_functions.c:2589:21: warning: declaration of 'err_msg' shadows a previous local [-Wshadow]
qsub_functions.c:2939:25: warning: declaration of 'err_msg' shadows a previous local [-Wshadow]
qrun.c:45:27: warning: declaration of 'job' shadows a global declaration [-Wshadow]
qrun.c:145:13: warning: declaration of 'job' shadows a global declaration [-Wshadow]
qmgr.c:664:7: warning: declaration of 'connection' shadows a global declaration [-Wshadow]
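The list was produced with a build along these lines (assumed reproduction steps; a different compiler version may flag a slightly different set):

./configure CFLAGS="-g -O2 -Wshadow"
make 2>&1 | grep -- '-Wshadow'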

Problem with array since "Backport pull request 154" on 4.1.6.h2

Hi,

I have a strange issue with the last commit concerning "Backport pull request 154"

  • On my test server (one virtual node with same os/version), everything is ok
  • On my production server, all array jobs show up in qstat as "number[]" in the Q state, and they are not displayed by qstat -ant1.

The only traces obtained are -->
08/14/2013 10:42:56;0008;PBS_Server.29421;Job;req_commit;job_id: 220548[].torque.xxx.fr
And after a restart -->
08/14/2013 10:44:34;0001;PBS_Server.29663;Svr;PBS_Server;LOG_ERROR::No such file or directory (2) in job_save, cannot open file '/var/spool/torque/server_priv/jobs/220548.torque.xxx.fr.JB' for job 220548[].torque.xxx.fr in state SUBSTATE55 (quick)

(after checking, there are only .TA files)

A new package built without the last commit works fine.

cannot start job - RM failure, rc: 15043 MSG=connection to mom timed out

When, for some reason, the job's .SC spool file does not exist but the .JB files are present, the scheduler tries to schedule the job, but the mom returns the error
"child reported failure for job after 2 seconds (dest-????), rc=-2"
and the scheduler returns the error:
"Reject reply code=15043(Execution server rejected request MSG=connection to mom timed out"

torque version is 4.2.6.h1

4.1.5.1 Crash in attrlist handling

Torque sometimes crashes when handling attribute lists. The standard glibc reports this as a double free.

(gdb) bt
#0  0x00007f44d3e34b23 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /usr/lib/libtcmalloc.so
#1  0x00007f44d3e34f67 in tcmalloc::ThreadCache::Scavenge() () from /usr/lib/libtcmalloc.so
#2  0x00007f44d3e41685 in tc_free () from /usr/lib/libtcmalloc.so
#3  0x000000000046ef6c in free_attrlist (pattrlisthead=0xaa2df38) at attr_func.c:422
#4  0x0000000000431542 in reply_free (prep=0x8802e88) at reply_send.c:300
#5  0x000000000042f269 in free_br (preq=0x8802a00) at process_request.c:1080
#6  0x0000000000431378 in reply_send_svr (request=0x8802a00) at reply_send.c:197
#7  0x00000000004504a4 in sel_step3 (cntl=0xb1e0d80) at req_select.c:670
#8  0x000000000044fbe7 in req_selectjobs (preq=0x8802a00) at req_select.c:351
#9  0x000000000042ee51 in dispatch_request (sfds=7, request=0x8802a00) at process_request.c:869
#10 0x000000000042e942 in process_request (chan=0x7a48ba0) at process_request.c:662
#11 0x0000000000429f54 in process_pbs_server_port (sock=7, is_scheduler_port=0) at pbsd_main.c:402
#12 0x000000000042a1b3 in start_process_pbs_server_port (new_sock=0x6a1fbc0) at pbsd_main.c:533
#13 0x000000000047373e in work_thread (a=0x7fff7d872480) at u_threadpool.c:307
#14 0x00007f44d29e48ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#15 0x00007f44d2543b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#16 0x0000000000000000 in ?? ()

torque.setup fails in OpenStack environments

To get torque to work in OpenStack VMs, torque.setup must be run with the FQDN:

torque.setup root foo.localhost

It is not clear from torque.setup that two parameters are accepted.
The 'usage' message should indicate this.
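In other words, the script takes an admin user and, optionally, the server host name (the second-argument semantics are inferred from the example above):

./torque.setup root                  # uses the local hostname, which may not resolve inside an OpenStack VM
./torque.setup root $(hostname -f)   # explicit FQDN, equivalent to the foo.localhost example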

remove - "do you wish to continue" for automated installs

When running torque.setup, the admin is prompted y/n to create the server database.

Enhancement: Provide a mechanism for torque.setup to accept default values and not prompt administrator. (aka --defaults or --silent)

Story points: the fact that torque waits for user input limits the ability to quickly deploy torque in OpenStack.

pbs_server port is: 15001
trqauthd daemonized - port 15005
trqauthd successfully started
initializing TORQUE (admin: root@trq-1857)

You have selected to start pbs_server in create mode.
If the server database exists it will be overwritten.
do you wish to continue y/(n)?
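Until a --defaults/--silent option exists, automated deployments typically have to feed the prompt from stdin, for example (workaround sketch, not a supported flag):

yes y | ./torque.setup root $(hostname -f)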

torque 4.2 run-time option to disable cpuset

Provide a pbs_mom run-time option so that pbs_mom can be started without setting up cpusets, if so desired by the system admin.

  In past releases /dev/cpuset would have to be mounted before the mom
  start, otherwise torque would report an error and not use it:

  torque.2.4.11
  pbs_mom: LOG_ERROR::initialize_root_cpuset, cannot locate
  /dev/cpuset/cpus - cpusets not configured/enabled on host


  Unfortunately, torque 4.2.4 does indeed mount it itself, so that's
  not an option any longer.

inconsistent used time reported in 'qstat -a'

I'm seeing some strange used CPU/walltime reports when using qstat -a with Torque v4.1.6.

gengar1:~$ qstat 4100326
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
4100326.master             Po2_Diss_CCSD    username        152:53:3 R long           
$ qstat 4100326 -a

master.gengar.gent.vsc: 
                                                                               Req'd    Req'd       Elap
Job ID               Username    Queue    Jobname          SessID NDS   TSK    Memory   Time    S   Time
-------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
4100326.master.g     username    long     myjob           17727     1      8    --   71:59:00  R  00:33:05

$ qstat -f 4100326 | grep walltime
    resources_used.walltime = 41:32:11
    Resource_List.walltime = 71:59:00

The "Elapsed time" of ~33m makes no sense at all, since the job has been running full speed for 41+ hours, as indicated by the CPU time (it's a multi-threaded job).

What is the "Elapsed time" supposed to be representing? It's not walltime or CPU time (the documentation isn't exactly clear what is supposed to be showed with -a), but then what does it show?

pbs_mom leftover processes hanging

On worker nodes, pbs_mom spawns a copy of itself during job start, and sometimes these processes do not get cleaned up.
The hanging processes seem harmless, but their number keeps growing until someone restarts the pbs_mom service.

The code tested is the latest 4.1.X branch, but we have observed this with many releases.

ps axf output

 8003 ?        SLsl   0:09 /usr/sbin/pbs_mom -H somehost -p -d /var/spool/pbs
 8927 ?        Ss     0:00  \_ /usr/sbin/pbs_mom -H somehost -p -d /var/spool/pbs

backtrace from process 8927

(gdb) bt
#0  0x00007f0273b2c54d in read () from /lib64/libpthread.so.0
#1  0x00007f0274b35c39 in read_nonblocking_socket (fd=15, buf=0x7fff30ffb1f0, count=16) at ../Libifl/nonblock.c:129
#2  0x000000000043a025 in starter_return (upfds=18, downfds=15, code=-2, sjrtn=0x7fff30ffb390) at start_exec.c:6500
#3  0x00000000004341f9 in handle_prologs (pjob=0x1970030, sjr=0x7fff30ffb390, TJE=0xac8280) at start_exec.c:2944
#4  0x0000000000434fe1 in setup_batch_job (pjob=0x1970030, sjr=0x7fff30ffb390, TJE=0xac8280, pts_ptr=0x7fff30ffb3ac, qsub_sock_ptr=0x7fff30ffb3a8) at start_exec.c:3421
#5  0x0000000000435fa2 in TMomFinalizeChild (TJE=0xac8280) at start_exec.c:3979
#6  0x0000000000433464 in TMomFinalizeJob2 (TJE=0xac8280, SC=0x7fff30ffea9c) at start_exec.c:2302
#7  0x000000000043c39c in exec_job_on_ms (pjob=0x1970030) at start_exec.c:8560
#8  0x0000000000439d5b in start_exec (pjob=0x1970030) at start_exec.c:6343
#9  0x0000000000441831 in req_commit (preq=0x196e700) at mom_req_quejob.c:1047
#10 0x00000000004433d8 in mom_dispatch_request (sfds=9, request=0x196e700) at mom_process_request.c:342
#11 0x00000000004432e0 in mom_process_request (sock_num=0x7fff31000520) at mom_process_request.c:280
#12 0x00007f0274b54284 in wait_request (waittime=1, SState=0x0) at ../Libnet/net_server.c:669
#13 0x0000000000424bf7 in main_loop () at mom_main.c:8595
#14 0x0000000000424e7a in main (argc=6, argv=0x7fff31000b28) at mom_main.c:9022

fd 15 is a pipe

pbs_mom 8927 root   15r  FIFO                0,8      0t0 29891117 pipe

and is the same as

pbs_mom 8003 root   16w  FIFO      0,8      0t0 29891117 pipe
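A quick way to spot these leftovers on a node (one-liner sketch; it assumes a single main pbs_mom per host):

main=$(pgrep -o -x pbs_mom)   # PID of the oldest (main) pbs_mom
pgrep -x pbs_mom -P "$main"   # child pbs_mom processes that never exited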

Job end emails from TORQUE 4.2.0 no longer contain resources_used

I didn't have this problem with 4.2.0-EA; however, with the latest 4.2.0 release the body of the email sent on a job end event no longer contains resources_used information.

Here is a current example of an email body:

PBS Job Id: 263290.rockhopper.jax.org
Job Name: STDIN
Exec host: n8/0
Execution terminated
Exit_status=0

Here is an example from 4.2.0-EA:

PBS Job Id: 262025.rockhopper.jax.org
Job Name: align_1423_stack_grey.tif
Exec host: n9/1+n9/0
Execution terminated
Exit_status=0
resources_used.cput=03:31:07
resources_used.mem=9056144kb
resources_used.vmem=18435000kb
resources_used.walltime=03:08:55

Array job dependencies failing silently

i=$(qsub sleeper1.sh)
qsub -t 0-1 -W depend=afterany:$i sleeper2.sh

This fails (silently, from the user's point of view) after sleeper1.sh finishes, with the log message:
1/11/2013 19:43:50;0080;PBS_Server.697;Req;req_reject;Reject reply
code=15004(Invalid request MSG=Arrays may only be given array
dependencies), aux=0, type=RegisterDependency, from @cluster

It should be possible for an array job to depend on a single job.

custom message in mail sent after job completion

Is there any way to include a custom message in the body of the mail that is sent after a job completes?

E.g., I would imagine that setting an environment variable that reports the result of some test run after the job's task completes would be very useful:

export PBS_MAIL_MSG="Job done, results have been verified."

spec file cleanup

Torque daemons should not be enabled automatically in the spec file, so you can drop quite some code from the spec file.

best regards,

Florian La Roche

--- torque.spec
+++ torque.spec
@@ -218,15 +220,12 @@
 /sbin/ldconfig
 if [ $1 -eq 1 ]; then
     chkconfig --add trqauthd >/dev/null 2>&1 || :
-    chkconfig trqauthd on >/dev/null 2>&1 || :
-    service trqauthd start >/dev/null 2>&1 || :
 else
     service trqauthd condrestart >/dev/null 2>&1 || :
 fi

 %preun
 if [ $1 -eq 0 ]; then
-    chkconfig trqauthd off >/dev/null 2>&1 || :
     service trqauthd stop >/dev/null 2>&1 || :
     chkconfig --del trqauthd >/dev/null 2>&1 || :
 fi
@@ -249,39 +248,13 @@
 trqauthd 15005/tcp # authorization daemon
 trqauthd 15005/udp # authorization daemon
 EOF
-if [ ! -e %{torque_home}/server_priv/serverdb ]; then
-    if [ "%{torque_server}" = "localhost" ]; then
-        TORQUE_SERVER=`hostname`
-        perl -pi -e "s/localhost/$TORQUE_SERVER/g" %{torque_home}/server_name %{torque_home}/server_priv/nodes 2>/dev/null || :
-    else
-        TORQUE_SERVER="%{torque_server}"
-    fi
-    pbs_server -t create -f >/dev/null 2>&1 || :
-    sleep 1
-    qmgr -c "set server scheduling = true" >/dev/null 2>&1 || :
-    qmgr -c "set server managers += root@$TORQUE_SERVER" >/dev/null 2>&1 || :
-    qmgr -c "set server managers += %{torque_user}@$TORQUE_SERVER" >/dev/null 2>&1 || :
-    qmgr -c "create queue batch queue_type = execution" >/dev/null 2>&1 || :
-    qmgr -c "set queue batch started = true" >/dev/null 2>&1 || :
-    qmgr -c "set queue batch enabled = true" >/dev/null 2>&1 || :
-    qmgr -c "set queue batch resources_default.walltime = 1:00:00" >/dev/null 2>&1 || :
-    qmgr -c "set queue batch resources_default.nodes = 1" >/dev/null 2>&1 || :
-    qmgr -c "set server default_queue = batch" >/dev/null 2>&1 || :
-    qmgr -c "set node $TORQUE_SERVER state = free" >/dev/null 2>&1 || :
-    killall -TERM pbs_server >/dev/null 2>&1 || :
-fi
 chkconfig --add pbs_server >/dev/null 2>&1 || :
-chkconfig pbs_server on >/dev/null 2>&1 || :
-service pbs_server start >/dev/null 2>&1 || :
 else
     service pbs_server condrestart >/dev/null 2>&1 || :
 fi

 %preun server
 if [ $1 -eq 0 ]; then
-    chkconfig pbs_server off >/dev/null 2>&1 || :
     service pbs_server stop >/dev/null 2>&1 || :
     chkconfig --del pbs_server >/dev/null 2>&1 || :
 fi
@@ -301,20 +274,13 @@
 trqauthd 15005/tcp # authorization daemon
 trqauthd 15005/udp # authorization daemon
 EOF
-if [ "%{torque_server}" = "localhost" ]; then
-    TORQUE_SERVER=`hostname`
-    perl -pi -e "s/localhost/$TORQUE_SERVER/g" %{torque_home}/mom_priv/config 2>/dev/null || :
-fi
 chkconfig --add pbs_mom >/dev/null 2>&1 || :
-chkconfig pbs_mom on >/dev/null 2>&1 || :
-service pbs_mom start >/dev/null 2>&1 || :
 else
     service pbs_mom condrestart >/dev/null 2>&1 || :
 fi

 %preun client
 if [ $1 -eq 0 ]; then
-    chkconfig pbs_mom off >/dev/null 2>&1 || :
     service pbs_mom stop >/dev/null 2>&1 || :
     chkconfig --del pbs_mom >/dev/null 2>&1 || :
 fi
@@ -322,15 +288,12 @@
 %post scheduler
 if [ $1 -eq 1 ]; then
     chkconfig --add pbs_sched >/dev/null 2>&1 || :
-    chkconfig pbs_sched on >/dev/null 2>&1 || :
-    service pbs_sched start >/dev/null 2>&1 || :
 else
     service pbs_sched condrestart >/dev/null 2>&1 || :
 fi

 %preun scheduler
 if [ $1 -eq 0 ]; then
-    chkconfig pbs_sched off >/dev/null 2>&1 || :
     service pbs_sched stop >/dev/null 2>&1 || :
     chkconfig --del pbs_sched >/dev/null 2>&1 || :
 fi

incorrect parsing in get_port in pbsd_main.c

get_port in pbsd_main.c checks if (isdigit(*arg)). If you specify an IP address for the host name, or if the first character of the host name is a digit, it will attempt to set the port number from the host's IP address instead.

Torque 4.1.4 Multiple deadlocks

I found multiple deadlocks which can be triggered by using large array jobs and/or job dependencies.

The first one has a chance to occur when a large array job is submitted and is still in the process of being created, and a second job is submitted with a dependency on the first before its creation is finished.

(gdb) info threads
  19 Thread 16091  0x00007fbdc2350c5d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  18 Thread 16120  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
  17 Thread 16119  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
  16 Thread 16118  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
  15 Thread 16117  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  14 Thread 16116  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  13 Thread 16115  0x00007fbdc2374c13 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=120000) at ../sysdeps/unix/sysv/linux/poll.c:87
  12 Thread 16114  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
  11 Thread 16113  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
  10 Thread 16112  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  9 Thread 16111  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  8 Thread 16110  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  7 Thread 16109  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
  6 Thread 16108  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  5 Thread 16107  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
  4 Thread 16104  0x00007fbdc2350c5d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  3 Thread 16103  0x00007fbdc2350c5d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  2 Thread 16102  0x00007fbdc282838d in accept () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 16099  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136

Threads 15 and 8 are in a deadlock. This occurs when an array job is in
the process of being generated and a second job with a dependency on that
array job is submitted.
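Roughly how the race can be provoked from the command line (illustrative script names; the array must be large enough that cloning is still in progress when the second qsub lands):

arr=$(qsub -t 0-9999 array_job.sh)
qsub -W depend=afterany:$arr dependent_job.sh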

(gdb) thread 15
[Switching to thread 15 (Thread 16117)]#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136 in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007fbdc2823179 in _L_lock_953 () from /lib/libpthread.so.0
#2  0x00007fbdc2822f9b in __pthread_mutex_lock (mutex=0x70cd800) at pthread_mutex_lock.c:61
#3  0x000000000040a6b4 in is_array (id=Unhandled dwarf expression opcode 0xf3
) at array_func.c:137
#4  0x0000000000425e3c in dispatch_request (sfds=65535, request=0x7fbd94048ff0) at process_request.c:927
#5  0x000000000040eaac in issue_to_svr (servern=Unhandled dwarf expression opcode 0xf3
) at issue_request.c:324
#6  0x000000000043a713 in send_depend_req (pjob=0x7c876d0, pparent=0x7fbd94043160, type=12, op=1, schedhint=0, postfunc=0x439d40 <post_doq>) at req_register.c:2262
#7  0x000000000043bbb9 in depend_on_que (pattr=Unhandled dwarf expression opcode 0xf3
) at req_register.c:1237
#8  0x000000000044a5c5 in svr_enquejob (pjob=0x7c876d0, has_sv_qs_mutex=0, prev_job_index=65) at svr_jobfunc.c:575
#9  0x00000000004121fc in job_clone_wt (cloned_id=Unhandled dwarf expression opcode 0xf3
) at job_func.c:1153
#10 0x000000000045acd2 in work_thread (a=0x7fff7fa38120) at u_threadpool.c:307
#11 0x00007fbdc28208ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#12 0x00007fbdc237fb6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#13 0x0000000000000000 in ?? ()
(gdb) print *(pthread_mutex_t*)0x70cd800
$2 = {__data = {__lock = 2, __count = 0, __owner = 16110, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000\356>\000\000\001", '\000' <repeats 26 times>, __align = 2}
The lock is held by thread 8.

(gdb) thread 8
[Switching to thread 8 (Thread 16110)]#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136 in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007fbdc2823179 in _L_lock_953 () from /lib/libpthread.so.0
#2  0x00007fbdc2822f9b in __pthread_mutex_lock (mutex=0x7fbdb02858b0) at pthread_mutex_lock.c:61
#3  0x000000000044a61d in lock_ai_mutex (pa=0x7fbdb02103f0, id=Unhandled dwarf expression opcode 0xf3
) at svr_jobfunc.c:2940
#4  0x000000000040a77c in get_array (id=Unhandled dwarf expression opcode 0xf3
) at array_func.c:164
#5  0x000000000043a342 in req_registerarray (preq=0x7fbdb82e0170) at req_register.c:718
#6  0x0000000000425e4c in dispatch_request (sfds=65535, request=0x7fbdb82e0170) at process_request.c:929
#7  0x000000000040eaac in issue_to_svr (servern=Unhandled dwarf expression opcode 0xf3
) at issue_request.c:324
#8  0x000000000043a713 in send_depend_req (pjob=0x7fbdb82e32e0, pparent=0x7fbdb80a5ed0, type=12, op=1, schedhint=0, postfunc=0x439d40 <post_doq>) at req_register.c:2262
#9  0x000000000043bbb9 in depend_on_que (pattr=Unhandled dwarf expression opcode 0xf3
) at req_register.c:1237
#10 0x000000000044a5c5 in svr_enquejob (pjob=0x7fbdb82e32e0, has_sv_qs_mutex=0, prev_job_index=-1) at svr_jobfunc.c:575
#11 0x0000000000438cf4 in req_commit (preq=0x7fbdb81bfb80) at req_quejob.c:2141
#12 0x0000000000426119 in dispatch_request (sfds=11, request=0x7fbdb81bfb80) at process_request.c:741
#13 0x000000000042687e in process_request (chan=Unhandled dwarf expression opcode 0xf3
) at process_request.c:662
#14 0x0000000000422da8 in process_pbs_server_port (sock=11, is_scheduler_port=0) at pbsd_main.c:410
#15 0x0000000000422e9a in start_process_pbs_server_port (new_sock=Unhandled dwarf expression opcode 0xf3
) at pbsd_main.c:541
#16 0x000000000045acd2 in work_thread (a=0x7fff7fa38120) at u_threadpool.c:307
#17 0x00007fbdc28208ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#18 0x00007fbdc237fb6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#19 0x0000000000000000 in ?? ()
(gdb)  print *(pthread_mutex_t*)0x7fbdb02858b0
$3 = {__data = {__lock = 2, __count = 0, __owner = 16117, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000\365>\000\000\001", '\000' <repeats 26 times>, __align = 2}
The lock is held by thread 15.

Thread 15 holds the ai lock. I do not know where the lock is acquired,
but it is released in job_clone_wt shortly after all jobs are enqueued.

(gdb) thread 8
[Switching to thread 8 (Thread 16110)]#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136 in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) frame 4
#4  0x000000000040a77c in get_array (id=Unhandled dwarf expression opcode 0xf3
) at array_func.c:164
164     lock_ai_mutex(pa, __func__, NULL, LOGLEVEL);
Current language:  auto
The current source language is "auto; currently c".
(gdb) print pa
$4 = (job_array *) 0x7fbdb02103f0

(gdb) thread 15
[Switching to thread 15 (Thread 16117)]#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
    in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
Current language:  auto
The current source language is "auto; currently asm".
(gdb) frame 9
#9  0x00000000004121fc in job_clone_wt (cloned_id=Unhandled dwarf expression opcode 0xf3
) at job_func.c:1153
1153          if ((rc = svr_enquejob(pjobclone, FALSE, prev_index)))
Current language:  auto
The current source language is "auto; currently c".
(gdb) print pa
$5 = (job_array *) 0x7fbdb02103f0

(gdb) info threads
  19 Thread 13950  0x00007f3ef94cdc5d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  18 Thread 14161  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  17 Thread 14160  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  16 Thread 14159  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  15 Thread 14158  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  14 Thread 14157  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  13 Thread 14156  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  12 Thread 14155  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  11 Thread 14154  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  10 Thread 14153  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  9 Thread 14152  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  8 Thread 14151  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  7 Thread 14150  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  6 Thread 14149  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  5 Thread 14148  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  4 Thread 14145  0x00007f3ef94cdc5d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  3 Thread 14144  0x00007f3ef94cdc5d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  2 Thread 14143  0x00007f3ef99a538d in accept () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 14142  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136

(gdb) thread 17
[Switching to thread 17 (Thread 14160)]#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136 in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f3ef99a0179 in _L_lock_953 () from /lib/libpthread.so.0
#2  0x00007f3ef999ff9b in __pthread_mutex_lock (mutex=0x5729cb0) at pthread_mutex_lock.c:61
#3  0x0000000000448760 in lock_ji_mutex (pjob=0x5731100, id=Unhandled dwarf expression opcode 0xf3
) at svr_jobfunc.c:2863
#4  0x0000000000411370 in remove_job (aj=0xa972c0, pjob=0x5731100) at job_func.c:2562
#5  0x0000000000449aac in svr_dequejob (pjob=0x5731100, parent_queue_mutex_held=0) at svr_jobfunc.c:758
#6  0x0000000000411e17 in svr_job_purge (pjob=0x5731100) at job_func.c:1776
#7  0x000000000042c5c8 in handle_complete_second_time (ptask=Unhandled dwarf expression opcode 0xf3
) at req_jobobit.c:1800
#8  0x000000000045acd2 in work_thread (a=0x7fff14ff7180) at u_threadpool.c:307
#9  0x00007f3ef999d8ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#10 0x00007f3ef94fcb6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
(gdb) print *(pthread_mutex_t*)0x5729cb0
$4 = {__data = {__lock = 2, __count = 0, __owner = 14154, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000J7", '\000' <repeats 29 times>, __align = 2}

(gdb) thread 11
[Switching to thread 11 (Thread 14154)]#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136 in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f3ef99a0179 in _L_lock_953 () from /lib/libpthread.so.0
#2  0x00007f3ef999ff9b in __pthread_mutex_lock (mutex=0x51cde90) at pthread_mutex_lock.c:61
#3  0x000000000044a864 in lock_alljobs_mutex (aj=0xa972c0, id=Unhandled dwarf expression opcode 0xf3
) at svr_jobfunc.c:3017
#4  0x0000000000410aee in find_job_by_array (aj=0xa972c0, job_id=0x7f3ee41f99a0 "30979[34].glorim-1.cluster", get_subjob=1) at job_func.c:2140
#5  0x0000000000410e16 in svr_find_job (jobid=0x7f3ee41f99a0 "30979[34].glorim-1.cluster", get_subjob=1) at job_func.c:2245
#6  0x000000000042c53a in handle_complete_second_time (ptask=0x7f3ee40361d0) at req_jobobit.c:1765
#7  0x000000000045acd2 in work_thread (a=0x7fff14ff7180) at u_threadpool.c:307
#8  0x00007f3ef999d8ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#9  0x00007f3ef94fcb6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#10 0x0000000000000000 in ?? ()
(gdb) print *(pthread_mutex_t*)0x51cde90
$5 = {__data = {__lock = 2, __count = 0, __owner = 14160, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000P7\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) info threads
  19 Thread 23839  0x00007f18f4083c5d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  18 Thread 23882  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  17 Thread 23881  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
  16 Thread 23880  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  15 Thread 23879  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  14 Thread 23878  0x00007f18f40a7c13 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=120000) at ../sysdeps/unix/sysv/linux/poll.c:87
  13 Thread 23877  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
  12 Thread 23876  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  11 Thread 23875  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  10 Thread 23874  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
  9 Thread 23873  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  8 Thread 23872  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  7 Thread 23871  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  6 Thread 23870  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  5 Thread 23869  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  4 Thread 23868  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  3 Thread 23867  0x00007f18f4083c5d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  2 Thread 23866  0x00007f18f4083c5d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 23865  0x00007f18f455b38d in accept () at ../sysdeps/unix/syscall-template.S:82
Current language:  auto
The current source language is "auto; currently asm".
(gdb) thread 18
[Switching to thread 18 (Thread 23882)]#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
    in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f18f4556179 in _L_lock_953 () from /lib/libpthread.so.0
#2  0x00007f18f4555f9b in __pthread_mutex_lock (mutex=0x5eaf850) at pthread_mutex_lock.c:61
#3  0x000000000040a7d3 in get_array (id=0x7f18e1ff5490 "31008[].glorim-1.cluster") at array_func.c:163
#4  0x00000000004118fa in get_jobs_array (pjob_ptr=0x7f18e1ff6cd8) at job_func.c:2760
#5  0x0000000000411d1e in svr_job_purge (pjob=0x7f18d405c260) at job_func.c:1730
#6  0x000000000042c6f8 in handle_complete_second_time (ptask=Unhandled dwarf expression opcode 0xf3
) at req_jobobit.c:1800
#7  0x000000000045b0d2 in work_thread (a=0x7fff26c6a610) at u_threadpool.c:307
#8  0x00007f18f45538ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#9  0x00007f18f40b2b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#10 0x0000000000000000 in ?? ()
(gdb) print *(pthread_mutex_t*)0x5eaf850
$1 = {__data = {__lock = 2, __count = 0, __owner = 23870, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000>]\000\000\001", '\000' <repeats 26 times>, __align = 2}
(gdb) thread 6
[Switching to thread 6 (Thread 23870)]#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136 in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f18f4556179 in _L_lock_953 () from /lib/libpthread.so.0
#2  0x00007f18f4555f9b in __pthread_mutex_lock (mutex=0x7f18ec05a5f0) at pthread_mutex_lock.c:61
#3  0x000000000044a8bd in lock_ai_mutex (pa=0x7f18ec0598f0, id=Unhandled dwarf expression opcode 0xf3
) at svr_jobfunc.c:2986
#4  0x000000000040cb3c in next_array (iter=Unhandled dwarf expression opcode 0xf3
) at array_func.c:1895
#5  0x000000000040cbbe in update_array_statuses () at array_func.c:1706
#6  0x00000000004425f1 in req_stat_job_step2 (cntl=0x7f18ec05d940) at req_stat.c:586
#7  0x0000000000442875 in req_stat_job (preq=0x7f18ec05ed40) at req_stat.c:331
#8  0x0000000000426038 in dispatch_request (sfds=12, request=0x7f18ec05ed40) at process_request.c:963
#9  0x00000000004269ae in process_request (chan=Unhandled dwarf expression opcode 0xf3
) at process_request.c:662
#10 0x0000000000422ed8 in process_pbs_server_port (sock=12, is_scheduler_port=0) at pbsd_main.c:410
#11 0x0000000000422fca in start_process_pbs_server_port (new_sock=Unhandled dwarf expression opcode 0xf3
) at pbsd_main.c:541
#12 0x000000000045b0d2 in work_thread (a=0x7fff26c6a610) at u_threadpool.c:307
#13 0x00007f18f45538ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#14 0x00007f18f40b2b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#15 0x0000000000000000 in ?? ()

X11 forwarding (in 4.1.3)

Hi,

When I specify '-X' with qsub, $DISPLAY is not being set on the compute node I get sent to. Using SSH (with -X) directly to the node works, so xauth, etc. seem to work fine.

I took the time to instrument that part of src/resmom/start_exec.c and build a pbs_mom that would give me some more output:

[0907][cable@cadgrid:~]$ qsub -X -I -l nodes=1:ppn=12,vmem=200g
qsub: waiting for job 183.cadgrid.asdf to start
qsub: job 183.cadgrid.asdf ready

pbs_mom: LOG_DEBUG::TMomFinalizeChild, beginning x11 fwd
pbs_mom: LOG_DEBUG::setup_x11_forwarding, in function
value of TJE->is_interactive is 1 value of pjob->ji_wattr[JOB_ATR_forwardx11].at_val.at_str is (null)
pbs_mom: LOG_DEBUG::TMomFinalizeChild, continuing on

Because pjob->ji_wattr[JOB_ATR_forwardx11].at_val.at_str is null, it's not going on to run x11_create_display.

My guess is that pbs_mom is not getting the forwardx11 flag - but I'm not that familiar with the codebase. I know I had this working in 4.0 when I was testing implementing TORQUE on our network a while ago - did this change in 4.1?

"printjob 183.cadgrid.asdf.JB" shows:

---------------------------------------------------
jobid:  183.cadgrid.asdf
---------------------------------------------------
ji_qs version:  0x00020302
state:      0x4
substate:   0x2a (42)
svrflgs:    0x3001 (12289)
ordering:   0
inter prior:    0
stime:      1357740507
file base:  183.cadgrid.asdf
queue:      main
destin:     cadgrid002
union type exec:
    momaddr 2887332710
    exits   0
--attributes--
Job_Name = STDIN
Job_Owner = cable@cadgrid
job_state = R
queue = main
server = cadgrid.asdf
Checkpoint = u
ctime = 1357740503
Error_Path = cadgrid.asdf:/home/cable/STDIN.e183
exec_host = cadgrid002/11+cadgrid002/10+cadgrid002/9+cadgrid002/8+cadgrid002/7+cadgrid002/6+cadgrid002/5+cadgrid002/4+cadgrid002/3+cadgrid002/2+cadgrid002/1+cadgrid002/0
exec_port = 15003+15003+15003+15003+15003+15003+15003+15003+15003+15003+15003+15003
Hold_Types = n
interactive = 35475
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = 1357740507
Output_Path = cadgrid.asdf:/home/cable/STDIN.o183
Priority = 0
qtime = 1357740503
Rerunable = False
Resource_List.neednodes = 1:ppn=12
Resource_List.nodect = 1
Resource_List.nodes = 1:ppn=12
Resource_List.vmem = 200gb
substate = 40
Variable_List = PBS_O_QUEUE=main
PBS_O_HOME=/
PBS_O_WORKDIR=/home/cable
PBS_O_HOST=cadgrid.asdf
PBS_O_SERVER=cadgrid
PBS_O_WORKDIR=/home/cable

euser = cable
egroup = wheel
hashname = 183.cadgrid.asdf
hop_count = 1
queue_rank = 93
queue_type = E
etime = 1357740503
submit_args = -X -I -l nodes=1:ppn=12,vmem=200g
start_time = 1357740507
start_count = 1
fault_tolerant = False
job_radix = 0
submit_host = cadgrid.asdf
init_work_dir = /home/cable
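For reference, the check the instrumentation above is hitting can be modelled in a few lines of C. This is only a paraphrase of the behaviour described in the debug output (the names follow that output), not the actual code in src/resmom/start_exec.c:

/* Stand-alone model of the guard described above; not the real start_exec.c. */
#include <stdio.h>

struct fake_attr { char *at_str; };   /* models ji_wattr[JOB_ATR_forwardx11].at_val */

static void maybe_setup_x11(int is_interactive, struct fake_attr *forwardx11)
  {
  if (is_interactive && forwardx11->at_str != NULL)
    puts("would call x11_create_display() and export DISPLAY");
  else
    puts("forwardx11 attribute is NULL -> X11 setup skipped, DISPLAY stays unset");
  }

int main(void)
  {
  struct fake_attr a = { NULL };   /* what the instrumented pbs_mom printed */
  maybe_setup_x11(1, &a);          /* is_interactive == 1, at_str == (null) */
  return 0;
  }

So whatever is supposed to set the forwardx11 attribute on the MOM side (or forward it from the server) is the place to look.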

torque default runlevels

To have torque registered correctly on all RHEL releases, please use a two-digit
stop priority in the chkconfig header (05 instead of 5).

best regards,

Florian La Roche

--- torque-4.2.1/contrib/init.d/pbs_mom.in.lr 2013-03-06 02:04:28.000000000 +0100
+++ torque-4.2.1/contrib/init.d/pbs_mom.in 2013-03-16 12:07:44.890520177 +0100
@@ -2,7 +2,7 @@
 #
 # pbs_mom   This script will start and stop the PBS Mom
 #
-# chkconfig: 345 95 5
+# chkconfig: 345 95 05
 #
 # description: TORQUE/PBS is a versatile batch system for SMPs and clusters
 #
 ulimit -n 32768

--- torque-4.2.1/contrib/init.d/pbs_sched.in.lr 2013-03-06 02:04:28.000000000 +0100
+++ torque-4.2.1/contrib/init.d/pbs_sched.in 2013-03-16 12:07:44.890520177 +0100
@@ -2,7 +2,7 @@
 #
 # pbs_sched   This script will start and stop the PBS Scheduler
 #
-# chkconfig: 345 95 5
+# chkconfig: 345 95 05
 #
 # description: PBS is a batch versatile batch system for SMPs and clusters
 #
 # Source the library functions

--- torque-4.2.1/contrib/init.d/pbs_server.in.lr 2013-03-06 02:04:28.000000000 +0100
+++ torque-4.2.1/contrib/init.d/pbs_server.in 2013-03-16 12:09:29.082520269 +0100
@@ -2,7 +2,7 @@
 #
 # pbs_server   This script will start and stop the PBS Server
 #
-# chkconfig: 345 95 5
+# chkconfig: 345 95 05
 #
 # description: PBS is a versatile batch system for SMPs and clusters
 #
 # Source the library functions

--- torque-4.2.1/contrib/init.d/trqauthd.in.lr 2013-03-06 02:04:28.000000000 +0100
+++ torque-4.2.1/contrib/init.d/trqauthd.in 2013-03-16 12:07:44.891520177 +0100
@@ -2,7 +2,7 @@
 #
 # trqauthd   This script will start and stop the Torque Authorization Daemon
 #
-# chkconfig: 345 95 5
+# chkconfig: 345 95 05
 #
 # description: PBS is a batch versatile batch system for SMPs and clusters
 #
 # Source the library functions

Resurrect accounting information at job end records

USESAVEDRESOURCES governs whether consumed resources are taken from the
information saved inside the job itself, but the consumed resources carried
in the job obituary request must be used unconditionally.

This is a regression from
dbbe2f9
where blindly fixing GCC warnings resulted in functionality loss.

Patch is available,
http://computing.kiae.ru/~rea/torque-2.5.13-resurrect-accounting-information-in-job-end-records.patch

It is now being tested on our Torque farm and has shown no problems to date, while the resources_used.* records have reappeared in the accounting and server logs.

pbs_server -f is not documented in man page

It is possible to run pbs_server -t create -f and bypass the "do you wish to continue" prompt.

pbs_server -t create
You have selected to start pbs_server in create mode.
If the server database exists it will be overwritten.
do you wish to continue y/(n)?

The pbs_server man page makes no reference to the '-f' flag; this should be added.
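A possible wording for the missing entry, modelled on the existing option descriptions (only a suggestion, not current documentation):

-f        Force an overwrite.  When combined with -t create, the existing
          server database is overwritten without the interactive
          "do you wish to continue y/(n)?" prompt, which is useful for
          scripted or unattended setups.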

torque incorrectly reports memory usage

torque does not report the memory usage of processes that use setsid and therefore run in a session different from the one running the submission script. Since anybody can run setsid, it is more or less trivial to circumvent any memory usage enforcement (e.g., we use RESOURCELIMITPOLICY in Moab to enforce memory limits). The patch below expands the injob() function so that, if the session id of a process is not that of the job script, a) it checks whether the session id is in the cpuset of the job, or b) (if cpusets are not enabled) it checks whether one of the parent processes has a session id equal to that of the job script.
I have tested the patch under torque-4.2.5 (even though the issue probably exists in all torque versions). In both scenarios (with and without cpusets), mem and vmem are now reported correctly, i.e., they include the memory usage of processes that run in a new session.

  • Martin

<torque-4.2.5-injob.patch>
--- torque-4.2.5/src/resmom/linux/mom_mach.c.orig 2013-09-05 15:09:24.000000000 -0700
+++ torque-4.2.5/src/resmom/linux/mom_mach.c 2013-10-25 11:24:12.707659182 -0700
@@ -915,6 +915,13 @@
 
   {
   task *ptask;
+  pid_t pid;
+#ifdef PENABLE_LINUX26_CPUSETS
+  struct pidl *pids = NULL;
+  struct pidl *pp;
+#else
+  proc_stat_t *ps;
+#endif /* PENABLE_LINUX26_CPUSETS */
 
   for (ptask = (task *)GET_NEXT(pjob->ji_tasks);
        ptask != NULL;
@@ -929,6 +936,65 @@
       }
     }
 
+  /* processes with a different sessionid are not necessarily not part of the
+     job: the job can call setsid; need to check whether one of the parent
+     processes has a sessionid that is in the job */
+
+#ifdef PENABLE_LINUX26_CPUSETS
+
+  /* check whether the sid is in the job's cpuset */
+  pids = get_cpuset_pidlist(pjob->ji_qs.ji_jobid, pids);
+  pp = pids;
+  while (pp != NULL)
+    {
+    pid = pp->pid;
+    pp = pp->next;
+    if (pid == sid)
+      {
+      free_pidlist(pids);
+      return(TRUE);
+      }
+    }
+  free_pidlist(pids);
+#else
+  /* get the parent process id of the sid and check whether it is part of
+     the job; iterate */
+
+  pid = sid;
+  while (pid > 1)
+    {
+    if ((ps = get_proc_stat(pid)) == NULL)
+      {
+      if (errno != ENOENT)
+        {
+        sprintf(log_buffer, "%d: get_proc_stat", pid);
+        log_err(errno, __func__, log_buffer);
+        }
+      continue;
+      }
+    pid = getsid(ps->ppid);
+    if (pid <= 1)
+      break;
+    for (ptask = (task *)GET_NEXT(pjob->ji_tasks);
+         ptask != NULL;
+         ptask = (task *)GET_NEXT(ptask->ti_jobtask))
+      {
+      if (ptask->ti_qs.ti_sid <= 1)
+        continue;
+      if (ptask->ti_qs.ti_sid == pid)
+        {
+        return(TRUE);
+        }
+      }
+    }
+#endif /* PENABLE_LINUX26_CPUSETS */
 
   return(FALSE);
   }  /* END injob() */

</torque-4.2.5-injob.patch>

pbs_mom networking split into multiple IPs

Hi,
On one compute node (node1, Ubuntu precise) I have the following IP setup:
eth0 (static): 10.2.68.7 and eth1 (dhcp): 10.2.68.153. I downloaded torque.4.2.7.tar.gz from the adaptivecomputing.com homepage, then compiled and installed pbs_mom and the torque client programs on this compute node.
Because node1's status is down, I have spent many hours digging to try to bring it up.
Here are some log:

  1. On node1:
sudo momctl -d3

Host: node1/node1.localdomain  Version: 4.2.7   PID: 735
Server[0]: masternode (10.2.69.63:15001)
  Last Msg From Server:   24756 seconds (CLUSTER_ADDRS)
  Last Msg To Server:     24786 seconds
HomeDirectory:          /var/spool/torque/mom_priv
stdout/stderr spool directory: '/var/spool/torque/spool/' (11028613blocks available)
NOTE:  syslog enabled
MOM active:             25081 seconds
Check Poll Time:        45 seconds
Server Update Interval: 45 seconds
LogLevel:               0 (use SIGUSR1/SIGUSR2 to adjust)
Communication Model:    TCP
MemLocked:              TRUE  (mlock)
TCP Timeout:            60 seconds
Prolog:                 /var/spool/torque/mom_priv/prologue (disabled)
Alarm Time:             0 of 10 seconds
Trusted Client List:  10.2.68.153:0,10.2.68.153:15003,10.2.69.63:0,10.2.69.63:15003,127.0.0.1:0:  0
Copy Command:           /usr/bin/scp -rpB
NOTE:  no local jobs detected

diagnostics complete
Apr 14 11:02:46 node1 pbs_mom: LOG_ERROR::read_tcp_reply, Mismatching protocols. Expected protocol 4 but read reply for 0
Apr 14 11:02:46 node1 pbs_mom: LOG_ERROR::read_tcp_reply, Could not read reply for protocol 4 command 4: End of File
Apr 14 11:02:46 node1 pbs_mom: LOG_ERROR::mom_server_update_stat, Couldn't read a reply from the server
Apr 14 11:02:46 node1 pbs_mom: LOG_ERROR::send_update_to_a_server, Could not contact any of the servers to send an update
Apr 14 11:02:46 node1 pbs_mom: LOG_ERROR::send_update_to_a_server, Status not successfully updated for 570 MOM status update intervals

So it seems pbs_mom is using 10.2.68.153 for its communication to the masternode. However on the master node I see:

  2. On the master node:
    Many lines like this:
    Apr 14 11:04:40 masternode PBS_Server: LOG_ERROR::svr_is_request, bad attempt to connect from 10.2.68.7:190 (address not trusted - check entry in server_priv/nodes)

I don't know what pbs_mom is using 10.2.68.7:190 for. How do I force pbs_mom to use only one of the two IPs?

install log.h

log.h is not installed as part of the include files, which makes pbs_log.h from Liblog useless (log.h needs to be moved from noinst_HEADERS to include_HEADERS in src/include/Makefile.am).

Spread job polling more uniformly

This is an update of the patch I submitted back in 2012:
http://www.supercluster.org/pipermail/torquedev/2012-January/003963.html
Compared to the initial one, it contains a fix from Lukasz Flis (thanks!) that enables the 'else' branch which spreads jobs when (running_count < JobStatRate). Everything else is unmodified from the original version.

It was tested at our Tier-2 site (1.5K job slots), at the JINR Tier-2 site (2.5K job slots) and on Lukasz's resources: no issues were found (apart from the one Lukasz helped to fix).

Current patch version: http://computing.kiae.ru/~rea/torque-2.5.10-spread-polls-uniformly.patch

Allow mail_from to also use a name

I notice that the display name used for 'mail_from' seems to be 'root' no matter what the parameter is set to. I suspect it pulls the name from the user pbs_server is running under.

By this I mean I can set mail_from to '[email protected]' and it will show as from 'root' (although the from address is indeed [email protected]).

I know torque is using a sendmail command line of this nature:
/usr/sbin/sendmail -f [email protected] job_owner@mail_domain.com

What I am looking for is a setting for the '-F' parameter from sendmail:
/usr/sbin/sendmail -F 'Cluster Administrator' -f [email protected] job_owner@mail_domain.com

Could this be added? Users get confused when their status emails come from 'root'.
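Roughly, the change amounts to inserting a -F argument whenever a display name is configured. The sketch below is not TORQUE's actual mail code, and the mail_from_name parameter is invented purely for illustration:

/* Hypothetical sketch: build the sendmail command line with an optional
 * display name.  "mail_from_name" is an invented server parameter; the real
 * TORQUE code also pipes the message body into the command, which is not
 * shown here. */
#include <stdio.h>

static void build_mail_cmd(char *cmd, size_t len,
                           const char *mail_from,       /* e.g. [email protected] */
                           const char *mail_from_name,  /* e.g. "Cluster Administrator" */
                           const char *recipient)
  {
  if (mail_from_name != NULL && *mail_from_name != '\0')
    snprintf(cmd, len, "/usr/sbin/sendmail -F '%s' -f %s %s",
             mail_from_name, mail_from, recipient);
  else
    snprintf(cmd, len, "/usr/sbin/sendmail -f %s %s", mail_from, recipient);
  }

int main(void)
  {
  char cmd[1024];
  build_mail_cmd(cmd, sizeof(cmd), "[email protected]",
                 "Cluster Administrator", "[email protected]");
  puts(cmd);   /* the real code would popen() this and write the message body */
  return 0;
  }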

node_check_script with node_check_interval=jobstart only executed on first node

we have in our mom_priv/config

 $node_check_script /var/spool/torque/mom_priv/health-check.sh
 $node_check_interval 0,jobstart

I would expect the jobstart node check to be run at or before job start on EVERY node of the job. However, the health-check script is only executed on the first node of a multi-node job, i.e. only on the node with the Mother Superior. This problem exists in both torque-2.5.12 and torque-4.2.5/4.2.6.

4.1.6.h2 snprintf not enough arguments

In SHA bef0163, src/server/node_manager.c line 659 has a snprintf statement with two format specifiers but no additional arguments. This causes pbs_server to crash when this log message is printed.
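For anyone hitting this, the bug class looks like the following generic example (the message text here is made up, not the actual string at node_manager.c:659); calling snprintf with fewer arguments than conversion specifiers is undefined behaviour and typically reads garbage off the stack or crashes:

#include <stdio.h>

int main(void)
  {
  char buf[256];
  const char *node = "node01";
  const char *job  = "123.server";

  /* Broken: two %s specifiers but no matching arguments -> undefined
   * behaviour when the message is formatted.
   * snprintf(buf, sizeof(buf), "node %s is busy with job %s");
   */

  /* Fixed: every conversion specifier has an argument. */
  snprintf(buf, sizeof(buf), "node %s is busy with job %s", node, job);
  puts(buf);
  return 0;
  }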

autogen.sh gives confusing error message

If you do not have openssl-devel installed, you get a very confusing error that is difficult to troubleshoot:

configure.ac:50: error: possibly undefined macro: AC_MSG_ERROR
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.

Steps to reproduce

Cent 6.4 - 64bit

Run ./autogen.sh

./autogen.sh
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `buildutils'.
libtoolize: copying file `buildutils/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `buildutils'.
libtoolize: copying file `buildutils/libtool.m4'
libtoolize: copying file `buildutils/ltoptions.m4'
libtoolize: copying file `buildutils/ltsugar.m4'
libtoolize: copying file `buildutils/ltversion.m4'
libtoolize: copying file `buildutils/lt~obsolete.m4'
configure.ac:42: warning: macro `AM_PROG_AR' not found in library
configure.ac:50: error: possibly undefined macro: AC_MSG_ERROR
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.

[root@cent torque]# make
make: *** No targets specified and no makefile found. Stop.

Solution

yum install openssl-devel

Proposed Improvement

The error should include a statement that openssl-devel is required. This would make it much easier to troubleshoot.

pbsnodes not showing state per docs or listings in pbsnodes.c

Just moving this one over from Bugzilla; bug 120 from a while ago...

Brian Andrus 2011-04-11 11:12:09 MDT
From the pbsnodes man page:

   -l             List  node  names  and  their  state.  If no state is

specified, only nodes in the DOWN, OFFLINE, or UNKNOWN states are
listed. Specifying a state string acts as an output
filter. Valid state strings are "free", "offline", "down",
"reserve", "job-exclusive", "job-sharing", "busy",
"time-shared", or "state-unknown".

But:

[root@hamming2 log]# pbsnodes -l reserve
pbsnodes: Unknown node MSG=cannot locate specified node
[root@hamming2 log]# pbsnodes -l state-unknown
pbsnodes: Unknown node MSG=cannot locate specified node
[root@hamming2 log]# pbsnodes -l job-exclusive
pbsnodes: Unknown node MSG=cannot locate specified node
[root@hamming2 log]# pbsnodes -l job-sharing
pbsnodes: Unknown node MSG=cannot locate specified node
[root@hamming2 log]# pbsnodes -l time-shared
pbsnodes: Unknown node MSG=cannot locate specified node

Seems to work for "free", "offline", "down", "busy"

I also see why, from pbsnodes.c:

const char *NState[] =
{
"NONE",
"active",
"all",
"busy",
"down",
"free",
"offline",
"unknown",
"up",
NULL
};

Maybe add the missing states or update the man page?
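If the decision is to make the code match the man page, one possible direction (only a sketch, untested; the rest of the -l handling may also need to recognise the longer state names) is to extend the table with the documented states:

/* Sketch only: NState[] from pbsnodes.c extended with the states the man
 * page advertises.  Whether the surrounding -l filter logic accepts these
 * strings has not been verified. */
const char *NState[] =
  {
  "NONE",
  "active",
  "all",
  "busy",
  "down",
  "free",
  "offline",
  "unknown",
  "up",
  "reserve",
  "job-exclusive",
  "job-sharing",
  "time-shared",
  "state-unknown",
  NULL
  };

The alternative, of course, is to trim the man page down to the states the code actually supports.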

MUNGE Authentication for 4.2

Munge authentication in 4.2 still relies on popen/exec calls.

I would recommend switching to the new libmunge-based implementation, like in 2.5-dev:

#68

Is anyone interested in this functionality?
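For reference, the libmunge side of this is small. A minimal sketch (link with -lmunge, assumes a local munged is running; how the credential is then carried over the TORQUE wire protocol is not shown):

/* Minimal libmunge sketch replacing popen()/exec of the munge binaries.
 * Error handling is trimmed and the TORQUE-specific transport is omitted. */
#include <munge.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
  {
  char       *cred = NULL;
  uid_t       uid;
  gid_t       gid;
  munge_err_t err;

  /* client side: create a credential (no payload needed for plain auth) */
  err = munge_encode(&cred, NULL, NULL, 0);
  if (err != EMUNGE_SUCCESS)
    {
    fprintf(stderr, "munge_encode: %s\n", munge_strerror(err));
    return 1;
    }

  /* server side: validate the credential and recover the caller's uid/gid */
  err = munge_decode(cred, NULL, NULL, NULL, &uid, &gid);
  if (err != EMUNGE_SUCCESS)
    fprintf(stderr, "munge_decode: %s\n", munge_strerror(err));
  else
    printf("credential valid for uid=%d gid=%d\n", (int)uid, (int)gid);

  free(cred);
  return 0;
  }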

ghost jobs in torque 2.5.x

Dear All,

We have observed strange behaviour when using jobs with stagein options.

It happens when:

  1. moab wants to start job on host X
  2. moab issues AsyncJobStart request to pbs_server with X as exec_host
  3. pbs_server sets exec_host for a job and tries to start it
  4. the connection to the MOM on port 15002 fails (due to a connection refused error)
  5. jobs stay in Q state but have exec_host set
  6. pbsnodes shows that job has been allocated cpu on the host X but it is not running
  7. moab considers the job as not running and tries to start it again

As a result, it may happen that pbsnodes -a X indicates two CPUs allocated for the job while the job is really running on host Y.

A well-known symptom you may all have seen looks like this:

06/11/2013 00:00:13;0080;PBS_Server;Req;req_reject;Reject reply code=15046(Resource temporarily unavailable REJHOST=n1074-amd.zeus MSG=cannot allocate node 'n1074-amd.zeus' to job - node not currently available (nps needed/free: 1/0, gpus needed/free: 0/0, joblist: 32022181.batch... type=AsyncRunJob, from root@batch

As a result we see many job slots blocked by this kind of ghost job.

Log extract for the first failure:
06/11/2013 00:13:59;0004;PBS_Server;Svr;svr_connect;attempting connect to host 10.10.4.83 port 15002
06/11/2013 00:13:59;0100;PBS_Server;Req;;Type AsyncRunJob request received from root@batch, sock=13
06/11/2013 00:13:59;0040;PBS_Server;Req;set_nodes;allocating nodes for job 32026672.batch.grid.cyf-kr.edu.pl with node expression 'n1103-amd.zeus'
06/11/2013 00:13:59;0040;PBS_Server;Req;set_nodes;job 32026672.batch allocated 1 nodes (nodelist=n1103-amd.zeus/52)
06/11/2013 00:13:59;0008;PBS_Server;Job;32026672.batch;Job Run at request of root@batch
06/11/2013 00:13:59;0004;PBS_Server;Svr;svr_connect;attempting connect to host 10.10.4.83 port 15002
06/11/2013 00:13:59;0004;PBS_Server;Svr;svr_connect;cannot connect to host port 15002 - cannot establish connection () - time=0 seconds
06/11/2013 00:13:59;0001;PBS_Server;Req;;Server could not connect to MOM

I am trying to diagnose this, but any help is greatly appreciated.

Best Regards

Lukasz Flis
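Until the root cause is found, the shape of the server-side fix is presumably to roll the allocation back whenever the MOM cannot be contacted after the nodes have been assigned. The following is only a self-contained model of that idea; the types, helpers and field names are invented for illustration and do not correspond to real TORQUE structures:

/* Self-contained model of the missing rollback (NOT TORQUE code). */
#include <stdio.h>
#include <string.h>

typedef struct
  {
  int  slots_allocated;   /* stands in for what set_nodes() records    */
  char exec_host[128];    /* stands in for the exec_host job attribute */
  } fake_job;

static int connect_to_mom_stub(void) { return -1; }  /* simulate "connection refused" */

static int run_job(fake_job *j)
  {
  /* set_nodes() succeeded: slot reserved, exec_host filled in */
  j->slots_allocated = 1;
  snprintf(j->exec_host, sizeof(j->exec_host), "n1103-amd.zeus/52");

  if (connect_to_mom_stub() < 0)
    {
    /* the step that appears to be missing: undo the allocation so the job
     * does not sit queued with exec_host set while blocking the slot */
    j->slots_allocated = 0;
    j->exec_host[0] = '\0';
    return -1;
    }
  return 0;
  }

int main(void)
  {
  fake_job j;
  memset(&j, 0, sizeof(j));
  if (run_job(&j) < 0)
    printf("MOM unreachable: allocation rolled back (slots=%d, exec_host='%s')\n",
           j.slots_allocated, j.exec_host);
  return 0;
  }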
