
pycondor's Introduction

PyCondor


PyCondor (Python HTCondor) is a tool for building and submitting workflows to HTCondor in a straightforward manner with minimal hassle.

Documentation

The documentation for PyCondor can be found at https://pycondor.github.io/pycondor.

Installation

PyCondor can be installed using pip

pip install pycondor

or conda

conda install -c conda-forge pycondor

For more information see the installation instructions in the documentation.

Contributing

To contribute to PyCondor, please see the contributing guide in the documentation.

License

MIT License

Copyright (c) 2018 James Bourbeau

pycondor's People

Contributors

duncanmmacleod, gregoryashton, jrbourbeau, khornlund, mattmeehan, mmatera, zdgriffith


pycondor's Issues

Make object methods return self

It would be nice if the Job and Dagman object methods that currently return nothing returned their own instances instead. This would allow for the following syntax

job = job.add_parent(parentjob)

while still allowing for the currently accepted syntax

job.add_parent(parentjob)
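
A minimal sketch of the idea, using a stripped-down stand-in class rather than PyCondor's actual Job implementation:

class Job:
    def __init__(self, name):
        self.name = name
        self.parents = []

    def add_parent(self, parent_job):
        # Returning self keeps the existing call style working while also
        # enabling assignment and chaining.
        self.parents.append(parent_job)
        return self

parentjob = Job('parent')
job = Job('child').add_parent(parentjob)   # chained style
job.add_parent(parentjob)                  # existing style still works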

Run tests on Windows

I'd like to start running our tests on Windows (by adding AppVeyor CI). It looks like there are currently some issues with running pycondor on Windows (ref: #105, #106).

Add additional functionality to get_queue

I would like additional features added to get_queue in base.py. For example, get_queue(user='username') returning the output of condor_q -submitter username would be a nice addition.
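
A rough sketch of what such a wrapper could look like (the user parameter is the proposal above, not an existing pycondor API):

import subprocess

def get_queue(user=None):
    # Hypothetical extension: forward a -submitter option to condor_q.
    cmd = ['condor_q']
    if user is not None:
        cmd.extend(['-submitter', user])
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return out.decode('utf-8')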

Add subdag functionality

Currently only Job objects can be added to a Dagman, but it would be great if other dags could be added to a dag
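
For reference, HTCondor's dag file syntax already supports nested dags via SUBDAG EXTERNAL entries, so a hypothetical Dagman.add_subdag method (not an existing PyCondor API) could emit a line like the following in the parent dag file:

# HTCondor dag file syntax for a nested dag
SUBDAG EXTERNAL inner_dag /path/to/inner_dag.submit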

Command line job submission

Description

It would be nice to quickly submit an executable from the command line. Something like

$ pycondor_submit --request_memory 3GB --executable my_script.py

This could be useful for testing / debugging
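
A rough sketch of how such a command could be wired up with click (which pycondor's test suite already uses). The command and option names simply mirror the proposal above, and the memory request is passed through extra_lines since that is the existing mechanism for arbitrary submit-file lines:

import click

from pycondor import Job

@click.command()
@click.option('--executable', required=True, help='Path to the executable to submit')
@click.option('--request_memory', default=None, help="Memory request, e.g. '3GB'")
def main(executable, request_memory):
    extra_lines = []
    if request_memory is not None:
        extra_lines.append('request_memory = {}'.format(request_memory))
    job = Job('cli_job', executable, extra_lines=extra_lines)
    job.build_submit()

if __name__ == '__main__':
    main()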

Add unique ID functionality for job error, out, and log files

It was brought to my attention (thanks @mattmeehan) that some users would like to have individual error, out, and log files for each node submitted by a pycondor Job and/or Dagman. Currently, if a job has multiple arguments, all the corresponding nodes that get submitted will write to a single error file, a single out file, and a single log file. While I think this consolidation of files is desirable, I can imagine situations in which you might want each node to have its own files.
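
One possible approach (a sketch, not a committed design): HTCondor already expands macros such as $(Cluster) and $(Process) inside submit files, so a hypothetical per-node option could inject them into the file names, e.g.:

# Sketch of what per-node file entries in a submit file might look like;
# $(Cluster) and $(Process) are expanded by HTCondor for each submitted job.
output = /path/to/out/examplejob_$(Cluster)_$(Process).out
error = /path/to/error/examplejob_$(Cluster)_$(Process).error
log = /path/to/log/examplejob_$(Cluster)_$(Process).log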

Windows requires escaped backslashes in submit files

Description

Windows requires backslashes in paths to be escaped, in order for them to be read correctly by HTCondor.

For example the contents of job.submit need to be:

log = condor\\log\\$(job_name).log
output = condor\\output\\$(job_name).output
error = condor\\error\\$(job_name).error

Due to the use of os.path.join in Job and Dagman, the paths are printed to job.submit without being escaped.

This causes all of the error and output files to be transferred back to the submit machine at the root directory.

Suggestion:

It would be useful to have an optional parameter for build()/build_submit() that escapes all backslashes printed to submit/dag files.
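
A minimal sketch of the escaping itself (the helper name and where it would be called from are assumptions, not pycondor's actual code):

def escape_backslashes(path):
    # Double every backslash so Windows paths survive being written to a
    # submit/dag file: the path condor\log\job.log is written as condor\\log\\job.log.
    return path.replace('\\', '\\\\')

print(escape_backslashes(r'condor\log\$(job_name).log'))  # condor\\log\\$(job_name).log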

Thoughts?

Add code coverage

Description

I'd like to add code coverage (coverage.py) to the tests

Add click to the test requirements file.

Description

Click is required to run the tests but is not in the requirements file. It should be added.

Actual results

ImportError while importing test module '/home/zgriffith/pycondor/pycondor/tests/test_cli.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
pycondor/tests/test_cli.py:5: in <module>
    from click.testing import CliRunner
E   ModuleNotFoundError: No module named 'click'

pycondor.Job: condor_submit command built/submitted incorrectly

Description

Bug in pycondor.Job submit_job method

Steps/code to reproduce issue

See below for an example trying to submit a basic job.

from pycondor import Job, Dagman
import os

os.environ['PYCONDOR_SUBMIT_DIR'] = os.path.abspath('condor\\submit')
os.environ['PYCONDOR_ERROR_DIR']  = os.path.abspath('condor\\error')
os.environ['PYCONDOR_LOG_DIR']    = os.path.abspath('condor\\log')
os.environ['PYCONDOR_OUTPUT_DIR'] = os.path.abspath('condor\\output')

exe = os.path.abspath('DummyExe.exe')

job = Job(
    name='job01',
    executable=exe,
)

job.add_arg(arg='arg01 arg02')

print('building...')
job.build()

print('submitting...')
job.submit_job()

Expected results

building... submitting... Submitting job(s). 1 job(s) submitted to cluster 24.

Actual results

building... submitting... The filename, directory name, or volume label syntax is incorrect.

dagman Rescue file not using unique job names

Description

The rescue file does not use the unique names given via add_arg and instead uses generic names such as JOBNAME_arg_###.

Steps/code to reproduce issue

import pycondor

# Declare the error, output, log, and submit directories for the Condor Job
error = 'condor/error'
output = 'condor/output'
log = 'condor/log'
submit = 'condor/submit'

# Set up a PyCondor Job
job = pycondor.Job('examplejob', 'savelist.py',
                   error=error, output=output,
                   log=log, submit=submit, verbose=2)

# Add arguments to the job
job.add_arg('--length 50', name='50')
job.add_arg('--length 100', name='100')
job.add_arg('--length 200', name='200')

# Set up a PyCondor Dagman
dagman = pycondor.Dagman('exampledagman', submit=submit, verbose=2)

# Add the job to the dagman
dagman.add_job(job)

# Write all necessary submit files and submit the job to Condor
dagman.build_submit()

The example script then fails, creating a rescue file.

Expected results

Names in the rescue file to be 'examplejob_50', 'examplejob_100', 'examplejob_200'.

Actual results

Names in the rescue file are 'examplejob_arg_00', 'examplejob_arg_01', 'examplejob_arg_02'.

Add issue and pull request templates

I'd like to add a .github/ directory that has a PULL_REQUEST_TEMPLATE.md and ISSUE_TEMPLATE.md file to make formatting new issues and pull requests a little easier.

Failing tests on Windows due to hard-coded '/' separator

Description

pycondor/tests/test_dagman.py test_job_arg_name_files fails on Windows.


Steps/code to reproduce issue

Run on Windows:
pytest pycondor

Expected results

============== 76 passed, 1 skipped, 57 warnings in 0.69 seconds ==============

Actual results

========= 2 failed, 74 passed, 1 skipped, 57 warnings in 0.82 seconds =========

Add job monitoring utility

Description

I'd like to have some kind of dagman monitoring functionality; a progress bar would be a good start!

Job and Dagman as context managers

Description

It would be nice (and I think relatively easy to implement) to allow Job and Dagman objects to be used as context managers. It would be natural for their __exit__ method to call build_submit().

For example,

import pycondor

with pycondor.Dagman('dag_name') as dagman:
    job = pycondor.Job('job_name', 'my_script.py', dag=dagman)
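
A bare-bones sketch of what the context-manager protocol could look like, using a stand-in class rather than PyCondor's actual Dagman:

class Dagman:
    def __init__(self, name):
        self.name = name
        self.jobs = []

    def build_submit(self):
        print('building and submitting {}'.format(self.name))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Only build/submit if the with-block exited cleanly.
        if exc_type is None:
            self.build_submit()
        return False  # never suppress exceptions

with Dagman('dag_name') as dagman:
    pass  # add jobs here; build_submit() runs on exit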

Dagman build raises error depending on Job order

Description

In #53, _iter_job_args was added so that the JOB nodes in a Dagman submit file would use argument names (if provided). However, I've run into a bug that occurs when parent/child Jobs are added in child-then-parent order. That is,

dagman.add_job(parent_job)
dagman.add_job(child_job)

works fine, but

dagman.add_job(child_job)
dagman.add_job(parent_job)

raises the ValueError

ValueError: Job parentjob must be built before adding it to a Dagman

This should be fixed, and a test should be added to check that the order in which a Job is added doesn't affect the corresponding Dagman submit file that is built.

GPU Allocation

Description

pycondor currently cannot handle GPU allocation.
Trying it with the extra_lines option does not work either.

Steps/code to reproduce issue

Submit a job with extra_lines=['request_gpus=1'].
The GPU will not be allocated.

Add request_cpu attribute to Job object

Right now, request_cpu can be added to a job submit file via the extra_lines attribute. However, because it's a commonly used feature, it would be nice to add it as its own attribute that can be set when the Job is declared.
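
For reference, the current workaround versus the proposed keyword (the request_cpus argument below is the proposal, not an existing parameter):

from pycondor import Job

# Current workaround: pass the submit-file line through extra_lines
job = Job('examplejob', 'script.py', extra_lines=['request_cpus = 2'])

# Proposed keyword argument (hypothetical):
# job = Job('examplejob', 'script.py', request_cpus=2)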

Add dag parameter to Job class

Description

Currently, a Job must be explicitly added to a Dagman like

from pycondor import Job, Dagman

dag = Dagman('example_dag')
job = Job('example_job', 'script.py')
dag.add_job(job)

However, a dag parameter could be added to the Job initialization to specify if you would like this Job to be added to a Dagman (similar to operators in Airflow). For example,

from pycondor import Job, Dagman

dag = Dagman('example_dag')
job = Job('example_job', 'script.py', dag=dag)

This dag option makes adding a Job to a Dag a little cleaner in my opinion.

Add environment variables for submit, log, out, and error directories

Description

It would be nice if pycondor supported setting environment variables for Job and Dagman submit, log, out, and error directories. So something like

>>> import pycondor
>>> job = pycondor.Job('myjob', '/path/to/executable.py')
>>> job.build()

would automatically check for environment variables like

PYCONDOR_SUBMIT_DIR=/path/to/submit
PYCONDOR_LOG_DIR=/path/to/log
PYCONDOR_OUT_DIR=/path/to/out
PYCONDOR_ERROR_DIR=/path/to/error

and set the directories in the built submit file appropriately. E.g.

# submit file at /path/to/submit/myjob.submit
output = /path/to/out/myjob.out   
error = /path/to/error/myjob.error  
log = /path/to/log/myjob.log                                           
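
A sketch of how the lookup could work, assuming a simple "explicit argument wins, environment variable is the fallback" rule (the helper below is illustrative, not pycondor's implementation):

import os

def resolve_dir(explicit_value, env_var):
    # Prefer the value passed to Job/Dagman; otherwise fall back to the
    # environment variable; return None if neither is set.
    if explicit_value is not None:
        return explicit_value
    return os.environ.get(env_var)

submit_dir = resolve_dir(None, 'PYCONDOR_SUBMIT_DIR')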

Update Job defaults to None

Description

I'd like to change the default values of the getenv, universe, and notification parameters for Job objects to be None.

This way, if not specified by the user, then the default value for these parameters on the submitting machine is used.

For instance, currently if getenv=False by default on the submitting machine, then

from pycondor import Job
job = Job(name='sleep', executable='/usr/bin/sleep')
job.build_submit()

will submit a job with getenv=True. I'd like to move more towards making non-default behavior an opt-in procedure here (i.e. the user should specify explicitly if they want to use non-default settings).

We will need to add a FutureWarning and eventually change the default values in a future release.
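
A sketch of the usual deprecation pattern this could follow (the sentinel and helper are illustrative only):

import warnings

_DEFAULT = object()  # sentinel meaning "not specified by the user"

def _resolve_getenv(getenv=_DEFAULT):
    if getenv is _DEFAULT:
        warnings.warn(
            'The default value of getenv will change from True to None '
            '(i.e. use the submit machine default) in a future release; '
            'pass getenv explicitly to keep the current behavior.',
            FutureWarning,
        )
        return True  # keep the current default during the deprecation period
    return getenv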

Job fancyname not working when submit not given

Description

When building a Job submission file with fancyname=True and not specifying a submit directory, the resulting submit file overwrites existing submit files instead of getting a unique ID.

Steps/code to reproduce issue

Only produces one submit file, examplejob_20180214_01.submit:

from pycondor import Job

job_1 = Job('examplejob', 'savelist.py')
job_1.build(fancyname=True)

job_2 = Job('examplejob', 'savelist.py')
job_2.build(fancyname=True)

If submit is given explicitly, however, two submit files are produced (as expected):

import os
from pycondor import Job

job_1 = Job('examplejob', 'savelist.py', submit=os.getcwd())
job_1.build(fancyname=True)

job_2 = Job('examplejob', 'savelist.py', submit=os.getcwd())
job_2.build(fancyname=True)

Expected results

Should have two submit files, examplejob_20180214_01.submit and examplejob_20180214_02.submit.

Actual results

Only have one submit file, examplejob_20180214_01.submit.

Logging outdated

Logging makes reference to the old dagmanager project. This must not have been updated when dagmanager was switched over to pycondor.

Number for error/log/output files starts at 02

Description

The naming scheme for job output files is inconsistent with the submit files.

Steps/code to reproduce issue

import pycondor

error = '/home/error'
output = '/home/output'
log = '/home/log'
submit = '/home/submit'

job = pycondor.Job('test', '/home/test.py', error=error, output=output, log=log, submit=submit, verbose=2)

dagman = pycondor.Dagman('tauMaster_dagman', submit=submit)
dagman.add_job(job)

for i in range(10):
    job.add_arg(' -i ' + str(i), name=str(i))

dagman.build_submit()

Expected results

Files in the output/log/error folders to be named test_DATE_01_NAME to match the submit files in the submit folders

Actual results

Files in the output/log/error folders are named test_DATE_02_NAME

Bytes error in printed output of `submit_dag`

Description

When submitting a dag via pycondor, I get the following garbled output

b'Submitting job(s).\n1 job(s) submitted to cluster 49593987.\n\n-----------------------------------------------------------------------\nFile for submitting this DAG to HTCondor           : GW150914/submit/main_GW150914_20181017_03.submit.condor.sub\nLog of DAGMan debugging messages                 : GW150914/submit/main_GW150914_20181017_03.submit.dagman.out\nLog of HTCondor library output                     : GW150914/submit/main_GW150914_20181017_03.submit.lib.out\nLog of HTCondor library error messages             : GW150914/submit/main_GW150914_20181017_03.submit.lib.err\nLog of the life of condor_dagman itself          : GW150914/submit/main_GW150914_20181017_03.submit.dagman.log\n\n-----------------------------------------------------------------------\n'

This is due to the print(out) statement, where out is the first element of the tuple returned by submit_dag.communicate().

This can be readily solved by something like

try:                                                                        
    out = out.decode('utf-8')                       
except AttributeError:                                                      
    pass

ensuring that if it is a bytes array, it is converted before calling print(out).

Extra info

Issue found on
Python version: 3.6 (anaconda-installed)
CondorVersion: 8.6.12 Jul 31 2018 BuildID: 446077
CondorPlatform: x86_64_RedHat7

Also, thanks for building this awesome tool. It is making my life a hell of a lot easier!

dagman_progress is rounding the % done up

Description

The percentage of completed jobs output by dagman_progress is rounded up. For example, I've run into the following:

[############################# ] 100% Done | 1039 done, 1 queued, 0 ready, 1 unready, 0 failed | 91.6m

dagman_progress shouldn't print 100% Done when there are still jobs running. It should instead round the percentage done down to 99% Done.
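
A minimal sketch of the flooring behavior being asked for (the function name is illustrative):

import math

def percent_done(n_done, n_total):
    # Floor instead of rounding so a nearly finished dag reports 99, not 100.
    return int(math.floor(100.0 * n_done / n_total))

percent_done(1039, 1041)   # -> 99, even though 1039/1041 is ~99.8%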

Queue parameter in pycondor.Job not added to submit file when built by dagman

When a pycondor.Job that includes the queue parameter is built by a dagman object, the resulting submit file for the job does not contain the input queue parameter.

We do get the desired behavior when just building the job:

job = pycondor.Job('examplejob', 'examplejob.py',
                          error=error, output=output,
                          log=log, submit=submit, verbose=2,
                          queue=5)
job.build()

However, when the job is built by a dagman object, we do not get the desired behavior:

job = pycondor.Job('examplejob', 'examplejob.py',
                          error=error, output=output,
                          log=log, submit=submit, verbose=2,
                          queue=5)
dagman = pycondor.Dagman('exampledagman', submit=submit, verbose=2)
dagman.add_job(job)
dagman.build()

It seems to be an issue with the dagman implementation.

Bytes error in `get_condor_version()`

Description

When building a dag with the submit=True flag, I get the following error

Traceback (most recent call last):
  File "/home/gregory.ashton/anaconda3/bin/bilby_pipe", line 11, in <module>
    load_entry_point('bilby-pipe', 'console_scripts', 'bilby_pipe')()
  File "/home/gregory.ashton/bilby_pipe/bilby_pipe/main.py", line 370, in main
    dag = Dag(inputs)
  File "/home/gregory.ashton/bilby_pipe/bilby_pipe/main.py", line 215, in __init__
    self.build_submit()
  File "/home/gregory.ashton/bilby_pipe/bilby_pipe/main.py", line 319, in build_submit
    self.dag.build_submit()
  File "/home/gregory.ashton/pycondor/pycondor/utils.py", line 133, in wrapper
    return func(*args, **kwargs)
  File "/home/gregory.ashton/pycondor/pycondor/dagman.py", line 412, in build_submit
    self.submit_dag(submit_options=submit_options)
  File "/home/gregory.ashton/pycondor/pycondor/utils.py", line 133, in wrapper
    return func(*args, **kwargs)
  File "/home/gregory.ashton/pycondor/pycondor/dagman.py", line 368, in submit_dag
    condor_version = get_condor_version()
  File "/home/gregory.ashton/pycondor/pycondor/utils.py", line 161, in get_condor_version
    condor_info_str).group(1)
  File "/home/gregory.ashton/anaconda3/lib/python3.6/re.py", line 182, in search
    return _compile(pattern, flags).search(string)
TypeError: cannot use a string pattern on a bytes-like object

This is caused by subprocess.Popen returning a bytes array.

Suggested solution

Adding the following before the re.search resolves the issue. I haven't opened a PR since it is pretty minor. I think this should be safe across Python 2 as well.

try:                                                                        
    condor_info_str = condor_info_str.decode('utf-8')                       
except AttributeError:                                                      
    pass

Extra info

Issue found on
Python version: 3.6 (anaconda-installed)
CondorVersion: 8.6.12 Jul 31 2018 BuildID: 446077
CondorPlatform: x86_64_RedHat7

Replace submit kwargs with submit_options parameter

Description

I'd like to replace the kwargs in submit_job and submit_dag with a submit_options parameter. This allows users to directly type whatever options they want passed to condor_submit or condor_submit_dag.

Current syntax:

kwargs = {'-maxjobs': 1000, '-batch-name': 'mybatch'}
job.submit_job(**kwargs)

New syntax:

job.submit_job(submit_options='-maxjobs 1000 -batch-name mybatch')

Also, this allows for options that aren't key-value pairs (e.g. -interactive, -debug, -verbose, etc.)

job.submit_job(submit_options='-interactive')

which currently aren't supported in pycondor.

Job argument names with automatic variables

Description

Currently, Job argument names can be any string. However, when they include automatic variables used by condor (e.g. '$(Process)', '$(Cluster)', etc.), unexpected behavior can occur.

Steps/code to reproduce issue

import pycondor

error = 'condor/error'
output = 'condor/output'
log = 'condor/log'
submit = 'condor/submit'

dag = pycondor.Dagman('exampledag', submit=submit)
job = pycondor.Job('examplejob',
                   executable='/usr/bin/echo',
                   error=error,
                   output=output,
                   log=log,
                   submit=submit,
                   queue=3,
                   dag=dag,
                   )
job.add_arg('hello world!', name='$(Cluster)_$(Process)')

dag.build_submit()

Expected results

In the above example, I would expect "echo hello world!" to run three times, resulting in three output files with "hello world!" in each file.

Actual results

Many jobs are submitted to condor and run, each producing its own output file containing "hello world!".

Looking at the dagman output file (details below), it appears jobs are submitted to condor, they fail, and then condor enters "recovery mode" where the jobs are submitted and run successfully; this process then repeats itself. The result is that jobs keep getting submitted to condor.

Dagman dagman.out file details
11/21/18 17:15:24 Setting maximum file descriptors to 80000.
11/21/18 17:15:24 ******************************************************
11/21/18 17:15:24 ** condor_scheduniv_exec.9964452.0 (CONDOR_DAGMAN) STARTING UP
11/21/18 17:15:24 ** /usr/bin/condor_dagman
11/21/18 17:15:24 ** SubsystemInfo: name=DAGMAN type=DAGMAN(10) class=DAEMON(1)
11/21/18 17:15:24 ** Configuration: subsystem:DAGMAN local:<NONE> class:DAEMON
11/21/18 17:15:24 ** $CondorVersion: 8.7.7 Mar 13 2018 BuildID: 435313 $
11/21/18 17:15:24 ** $CondorPlatform: x86_64_RedHat7 $
11/21/18 17:15:24 ** PID = 81396
11/21/18 17:15:24 ** Log last touched time unavailable (No such file or directory)
11/21/18 17:15:24 ******************************************************
11/21/18 17:15:24 Using config source: /etc/condor/condor_config
11/21/18 17:15:24 Using local config sources: 
11/21/18 17:15:24    /etc/condor/config.d/00_logging
11/21/18 17:15:24    /etc/condor/config.d/00_security
11/21/18 17:15:24    /etc/condor/config.d/20_ganglia
11/21/18 17:15:24    /etc/condor/config.d/20_master_daemons
11/21/18 17:15:24    /etc/condor/config.d/20_schedd
11/21/18 17:15:24    /etc/condor/config.d/20_schedd_limit_num_jobs
11/21/18 17:15:24    /etc/condor/config.d/20_schedd_stats
11/21/18 17:15:24    /etc/condor/config.d/20_schedd_stats_rss
11/21/18 17:15:24    /etc/condor/config.d/20_shared_port
11/21/18 17:15:24    /etc/condor/config.d/20_submit
11/21/18 17:15:24    /etc/condor/config.d/21_submit_singularity
11/21/18 17:15:24    /etc/condor/config.d/60_npx_submit
11/21/18 17:15:24    /etc/condor/config.d/60_npx_submit_sane_reqs
11/21/18 17:15:24    /etc/condor/config.d/99_block_submissions_group_long
11/21/18 17:15:24    /etc/condor/condor_config.local
11/21/18 17:15:24 config Macros = 134, Sorted = 134, StringBytes = 5412, TablesBytes = 4984
11/21/18 17:15:24 CLASSAD_CACHING is ENABLED
11/21/18 17:15:24 Daemon Log is logging: D_ALWAYS D_ERROR
11/21/18 17:15:24 DaemonCore: No command port requested.
11/21/18 17:15:24 DAGMAN_USE_STRICT setting: 1
11/21/18 17:15:24 DAGMAN_VERBOSITY setting: 3
11/21/18 17:15:24 DAGMAN_DEBUG_CACHE_SIZE setting: 5242880
11/21/18 17:15:24 DAGMAN_DEBUG_CACHE_ENABLE setting: False
11/21/18 17:15:24 DAGMAN_SUBMIT_DELAY setting: 1
11/21/18 17:15:24 DAGMAN_MAX_SUBMIT_ATTEMPTS setting: 6
11/21/18 17:15:24 DAGMAN_STARTUP_CYCLE_DETECT setting: False
11/21/18 17:15:24 DAGMAN_MAX_SUBMITS_PER_INTERVAL setting: 5
11/21/18 17:15:24 DAGMAN_AGGRESSIVE_SUBMIT setting: False
11/21/18 17:15:24 DAGMAN_USER_LOG_SCAN_INTERVAL setting: 5
11/21/18 17:15:24 DAGMAN_QUEUE_UPDATE_INTERVAL setting: 300
11/21/18 17:15:24 DAGMAN_DEFAULT_PRIORITY setting: 0
11/21/18 17:15:24 DAGMAN_SUPPRESS_NOTIFICATION setting: True
11/21/18 17:15:24 allow_events (DAGMAN_ALLOW_EVENTS) setting: 114
11/21/18 17:15:24 DAGMAN_RETRY_SUBMIT_FIRST setting: True
11/21/18 17:15:24 DAGMAN_RETRY_NODE_FIRST setting: False
11/21/18 17:15:24 DAGMAN_MAX_JOBS_IDLE setting: 1000
11/21/18 17:15:24 DAGMAN_MAX_JOBS_SUBMITTED setting: 0
11/21/18 17:15:24 DAGMAN_MAX_PRE_SCRIPTS setting: 20
11/21/18 17:15:24 DAGMAN_MAX_POST_SCRIPTS setting: 20
11/21/18 17:15:24 DAGMAN_MUNGE_NODE_NAMES setting: True
11/21/18 17:15:24 DAGMAN_PROHIBIT_MULTI_JOBS setting: False
11/21/18 17:15:24 DAGMAN_SUBMIT_DEPTH_FIRST setting: False
11/21/18 17:15:24 DAGMAN_ALWAYS_RUN_POST setting: False
11/21/18 17:15:24 DAGMAN_ABORT_DUPLICATES setting: True
11/21/18 17:15:24 DAGMAN_ABORT_ON_SCARY_SUBMIT setting: True
11/21/18 17:15:24 DAGMAN_PENDING_REPORT_INTERVAL setting: 600
11/21/18 17:15:24 DAGMAN_AUTO_RESCUE setting: True
11/21/18 17:15:24 DAGMAN_MAX_RESCUE_NUM setting: 100
11/21/18 17:15:24 DAGMAN_WRITE_PARTIAL_RESCUE setting: True
11/21/18 17:15:24 DAGMAN_DEFAULT_NODE_LOG setting: @(DAG_DIR)/@(DAG_FILE).nodes.log
11/21/18 17:15:24 DAGMAN_GENERATE_SUBDAG_SUBMITS setting: True
11/21/18 17:15:24 DAGMAN_MAX_JOB_HOLDS setting: 100
11/21/18 17:15:24 DAGMAN_HOLD_CLAIM_TIME setting: 20
11/21/18 17:15:24 ALL_DEBUG setting: 
11/21/18 17:15:24 DAGMAN_DEBUG setting: 
11/21/18 17:15:24 DAGMAN_SUPPRESS_JOB_LOGS setting: False
11/21/18 17:15:24 DAGMAN_REMOVE_NODE_JOBS setting: True
11/21/18 17:15:24 argv[0] == "condor_scheduniv_exec.9964452.0"
11/21/18 17:15:24 argv[1] == "-Lockfile"
11/21/18 17:15:24 argv[2] == "condor/submit/exampledag.submit.lock"
11/21/18 17:15:24 argv[3] == "-AutoRescue"
11/21/18 17:15:24 argv[4] == "1"
11/21/18 17:15:24 argv[5] == "-DoRescueFrom"
11/21/18 17:15:24 argv[6] == "0"
11/21/18 17:15:24 argv[7] == "-Dag"
11/21/18 17:15:24 argv[8] == "condor/submit/exampledag.submit"
11/21/18 17:15:24 argv[9] == "-Suppress_notification"
11/21/18 17:15:24 argv[10] == "-CsdVersion"
11/21/18 17:15:24 argv[11] == "$CondorVersion: 8.7.7 Mar 13 2018 BuildID: 435313 $"
11/21/18 17:15:24 argv[12] == "-Dagman"
11/21/18 17:15:24 argv[13] == "/usr/bin/condor_dagman"
11/21/18 17:15:24 Workflow batch-name: <exampledag.submit+9964452>
11/21/18 17:15:24 Workflow accounting_group: <>
11/21/18 17:15:24 Workflow accounting_group_user: <>
11/21/18 17:15:24 Warning: failed to get attribute DAGNodeName
11/21/18 17:15:24 DAGMAN_LOG_ON_NFS_IS_ERROR setting: False
11/21/18 17:15:24 Default node log file is: </home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log>
11/21/18 17:15:24 DAG Lockfile will be written to condor/submit/exampledag.submit.lock
11/21/18 17:15:24 DAG Input file is condor/submit/exampledag.submit
11/21/18 17:15:24 Parsing 1 dagfiles
11/21/18 17:15:24 Parsing condor/submit/exampledag.submit ...
11/21/18 17:15:24 Dag contains 1 total jobs
11/21/18 17:15:24 Sleeping for 3 seconds to ensure ProcessId uniqueness
11/21/18 17:15:27 Bootstrapping...
11/21/18 17:15:27 Number of pre-completed nodes: 0
11/21/18 17:15:27 MultiLogFiles: truncating log file /home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log
11/21/18 17:15:27 DAG status: 0 (DAG_STATUS_OK)
11/21/18 17:15:27 Of 1 nodes total:
11/21/18 17:15:27  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/21/18 17:15:27   ===     ===      ===     ===     ===        ===      ===
11/21/18 17:15:27     0       0        0       0       1          0        0
11/21/18 17:15:27 0 job proc(s) currently held
11/21/18 17:15:27 DAGMan Runtime Statistics: [ EventCycleTimeSum = 0.0; EventCycleTimeCount = 0.0; SleepCycleTimeCount = 0.0; SubmitCycleTimeSum = 0.0; SubmitCycleTimeCount = 0.0; LogProcessCycleTimeSum = 0.0; SleepCycleTimeSum = 0.0; LogProcessCycleTimeCount = 0.0; ]
11/21/18 17:15:27 Registering condor_event_timer...
11/21/18 17:15:28 Sleeping for 1 s (DAGMAN_SUBMIT_DELAY) to throttle submissions...
11/21/18 17:15:29 Submitting HTCondor Node examplejob_$(Cluster)_$(Process) job(s)...
11/21/18 17:15:29 Adding a DAGMan workflow log /home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log
11/21/18 17:15:29 Masking the events recorded in the DAGMAN workflow log
11/21/18 17:15:29 Mask for workflow log is 0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36
11/21/18 17:15:29 submitting: /usr/bin/condor_submit -a dag_node_name' '=' 'examplejob_$(Cluster)_$(Process) -a +DAGManJobId' '=' '9964452 -a DAGManJobId' '=' '9964452 -batch-name exampledag.submit+9964452 -a submit_event_notes' '=' 'DAG' 'Node:' 'examplejob_$(Cluster)_$(Process) -a dagman_log' '=' '/home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log -a +DAGManNodesMask' '=' '"0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36" -a ARGS' '=' 'hello' 'world! -a job_name' '=' 'examplejob_$(Cluster)_$(Process) -a DAG_STATUS' '=' '0 -a FAILED_COUNT' '=' '0 -a notification' '=' 'never -a +DAGParentNodeNames' '=' '"" condor/submit/examplejob.submit
11/21/18 17:15:29 From submit: Submitting job(s)...
11/21/18 17:15:29 From submit: 3 job(s) submitted to cluster 9964472.
11/21/18 17:15:29 	assigned HTCondor ID (9964472.0.0)
11/21/18 17:15:29 Just submitted 1 job this cycle...
11/21/18 17:15:29 Currently monitoring 1 HTCondor log file(s)
11/21/18 17:15:29 ERROR: job examplejob_9964472_0 not found!
11/21/18 17:15:29 Event: ULOG_SUBMIT for unknown Node (9964472.0.0) {11/21/18 17:15:29}: ignoring...
11/21/18 17:15:29 ERROR: job examplejob_9964472_1 not found!
11/21/18 17:15:29 Event: ULOG_SUBMIT for unknown Node (9964472.1.0) {11/21/18 17:15:29}: ignoring...
11/21/18 17:15:29 ERROR: job examplejob_9964472_2 not found!
11/21/18 17:15:29 Event: ULOG_SUBMIT for unknown Node (9964472.2.0) {11/21/18 17:15:29}: ignoring...
11/21/18 17:15:29 DAG status: 0 (DAG_STATUS_OK)
11/21/18 17:15:29 Of 1 nodes total:
11/21/18 17:15:29  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/21/18 17:15:29   ===     ===      ===     ===     ===        ===      ===
11/21/18 17:15:29     0       0        1       0       0          0        0
11/21/18 17:15:29 0 job proc(s) currently held
11/21/18 17:15:29 DAGMan Runtime Statistics: [ EventCycleTimeCount = 0.0; SleepCycleTimeSum = 0.0; EventCycleTimeSum = 0.0; LogProcessCycleTimeCount = 1.0; LogProcessCycleTimeSum = 0.0001499652862548828; LogProcessCycleTimeMin = 0.0001499652862548828; LogProcessCycleTimeMax = 0.0001499652862548828; LogProcessCycleTimeAvg = 0.0001499652862548828; LogProcessCycleTimeStd = 0.0001499652862548828; SubmitCycleTimeMax = 1.090536117553711; SubmitCycleTimeCount = 1.0; SleepCycleTimeCount = 0.0; SubmitCycleTimeSum = 1.090536117553711; SubmitCycleTimeStd = 1.090536117553711; SubmitCycleTimeAvg = 1.090536117553711; SubmitCycleTimeMin = 1.090536117553711; ]
11/21/18 17:21:24 Currently monitoring 1 HTCondor log file(s)
11/21/18 17:21:24 Event: ULOG_EXECUTE for HTCondor Node examplejob_$(Cluster)_$(Process) (9964472.0.0) {11/21/18 17:21:23}
11/21/18 17:21:24 BAD EVENT: job (9964472.0.0) executing, submit count < 1 (0)
11/21/18 17:21:24 BAD EVENT is warning only
11/21/18 17:21:24 Number of idle job procs: 0
11/21/18 17:21:24 Event: ULOG_EXECUTE for HTCondor Node examplejob_$(Cluster)_$(Process) (9964472.1.0) {11/21/18 17:21:24}
11/21/18 17:21:24 BAD EVENT: job (9964472.1.0) executing, submit count < 1 (0)
11/21/18 17:21:24 BAD EVENT is warning only
11/21/18 17:21:24 Number of idle job procs: 0
11/21/18 17:21:24 Event: ULOG_JOB_TERMINATED for HTCondor Node examplejob_$(Cluster)_$(Process) (9964472.0.0) {11/21/18 17:21:24}
11/21/18 17:21:24 BAD EVENT: job (9964472.0.0) ended, submit count < 1 (0)
11/21/18 17:21:24 BAD EVENT is warning only
11/21/18 17:21:24 Number of idle job procs: 0
11/21/18 17:21:24 ERROR "Assertion ERROR on (node->_queuedNodeJobProcs >= 0)" at line 4261 in file /slots/10/dir_2238096/userdir/.tmp4gngbv/BUILD/condor-8.7.7/src/condor_dagman/dag.cpp
11/21/18 17:21:29 Setting maximum file descriptors to 80000.
11/21/18 17:21:29 ******************************************************
11/21/18 17:21:29 ** condor_scheduniv_exec.9964452.0 (CONDOR_DAGMAN) STARTING UP
11/21/18 17:21:29 ** /usr/bin/condor_dagman
11/21/18 17:21:29 ** SubsystemInfo: name=DAGMAN type=DAGMAN(10) class=DAEMON(1)
11/21/18 17:21:29 ** Configuration: subsystem:DAGMAN local:<NONE> class:DAEMON
11/21/18 17:21:29 ** $CondorVersion: 8.7.7 Mar 13 2018 BuildID: 435313 $
11/21/18 17:21:29 ** $CondorPlatform: x86_64_RedHat7 $
11/21/18 17:21:29 ** PID = 92774
11/21/18 17:21:29 ** Log last touched 11/21 17:21:24
11/21/18 17:21:29 ******************************************************
11/21/18 17:21:29 Using config source: /etc/condor/condor_config
11/21/18 17:21:29 Using local config sources: 
11/21/18 17:21:29    /etc/condor/config.d/00_logging
11/21/18 17:21:29    /etc/condor/config.d/00_security
11/21/18 17:21:29    /etc/condor/config.d/20_ganglia
11/21/18 17:21:29    /etc/condor/config.d/20_master_daemons
11/21/18 17:21:29    /etc/condor/config.d/20_schedd
11/21/18 17:21:29    /etc/condor/config.d/20_schedd_limit_num_jobs
11/21/18 17:21:29    /etc/condor/config.d/20_schedd_stats
11/21/18 17:21:29    /etc/condor/config.d/20_schedd_stats_rss
11/21/18 17:21:29    /etc/condor/config.d/20_shared_port
11/21/18 17:21:29    /etc/condor/config.d/20_submit
11/21/18 17:21:29    /etc/condor/config.d/21_submit_singularity
11/21/18 17:21:29    /etc/condor/config.d/60_npx_submit
11/21/18 17:21:29    /etc/condor/config.d/60_npx_submit_sane_reqs
11/21/18 17:21:29    /etc/condor/config.d/99_block_submissions_group_long
11/21/18 17:21:29    /etc/condor/condor_config.local
11/21/18 17:21:29 config Macros = 134, Sorted = 134, StringBytes = 5412, TablesBytes = 4984
11/21/18 17:21:29 CLASSAD_CACHING is ENABLED
11/21/18 17:21:29 Daemon Log is logging: D_ALWAYS D_ERROR
11/21/18 17:21:29 DaemonCore: No command port requested.
11/21/18 17:21:29 DAGMAN_USE_STRICT setting: 1
11/21/18 17:21:29 DAGMAN_VERBOSITY setting: 3
11/21/18 17:21:29 DAGMAN_DEBUG_CACHE_SIZE setting: 5242880
11/21/18 17:21:29 DAGMAN_DEBUG_CACHE_ENABLE setting: False
11/21/18 17:21:29 DAGMAN_SUBMIT_DELAY setting: 1
11/21/18 17:21:29 DAGMAN_MAX_SUBMIT_ATTEMPTS setting: 6
11/21/18 17:21:29 DAGMAN_STARTUP_CYCLE_DETECT setting: False
11/21/18 17:21:29 DAGMAN_MAX_SUBMITS_PER_INTERVAL setting: 5
11/21/18 17:21:29 DAGMAN_AGGRESSIVE_SUBMIT setting: False
11/21/18 17:21:29 DAGMAN_USER_LOG_SCAN_INTERVAL setting: 5
11/21/18 17:21:29 DAGMAN_QUEUE_UPDATE_INTERVAL setting: 300
11/21/18 17:21:29 DAGMAN_DEFAULT_PRIORITY setting: 0
11/21/18 17:21:29 DAGMAN_SUPPRESS_NOTIFICATION setting: True
11/21/18 17:21:29 allow_events (DAGMAN_ALLOW_EVENTS) setting: 114
11/21/18 17:21:29 DAGMAN_RETRY_SUBMIT_FIRST setting: True
11/21/18 17:21:29 DAGMAN_RETRY_NODE_FIRST setting: False
11/21/18 17:21:29 DAGMAN_MAX_JOBS_IDLE setting: 1000
11/21/18 17:21:29 DAGMAN_MAX_JOBS_SUBMITTED setting: 0
11/21/18 17:21:29 DAGMAN_MAX_PRE_SCRIPTS setting: 20
11/21/18 17:21:29 DAGMAN_MAX_POST_SCRIPTS setting: 20
11/21/18 17:21:29 DAGMAN_MUNGE_NODE_NAMES setting: True
11/21/18 17:21:29 DAGMAN_PROHIBIT_MULTI_JOBS setting: False
11/21/18 17:21:29 DAGMAN_SUBMIT_DEPTH_FIRST setting: False
11/21/18 17:21:29 DAGMAN_ALWAYS_RUN_POST setting: False
11/21/18 17:21:29 DAGMAN_ABORT_DUPLICATES setting: True
11/21/18 17:21:29 DAGMAN_ABORT_ON_SCARY_SUBMIT setting: True
11/21/18 17:21:29 DAGMAN_PENDING_REPORT_INTERVAL setting: 600
11/21/18 17:21:29 DAGMAN_AUTO_RESCUE setting: True
11/21/18 17:21:29 DAGMAN_MAX_RESCUE_NUM setting: 100
11/21/18 17:21:29 DAGMAN_WRITE_PARTIAL_RESCUE setting: True
11/21/18 17:21:29 DAGMAN_DEFAULT_NODE_LOG setting: @(DAG_DIR)/@(DAG_FILE).nodes.log
11/21/18 17:21:29 DAGMAN_GENERATE_SUBDAG_SUBMITS setting: True
11/21/18 17:21:29 DAGMAN_MAX_JOB_HOLDS setting: 100
11/21/18 17:21:29 DAGMAN_HOLD_CLAIM_TIME setting: 20
11/21/18 17:21:29 ALL_DEBUG setting: 
11/21/18 17:21:29 DAGMAN_DEBUG setting: 
11/21/18 17:21:29 DAGMAN_SUPPRESS_JOB_LOGS setting: False
11/21/18 17:21:29 DAGMAN_REMOVE_NODE_JOBS setting: True
11/21/18 17:21:29 argv[0] == "condor_scheduniv_exec.9964452.0"
11/21/18 17:21:29 argv[1] == "-Lockfile"
11/21/18 17:21:29 argv[2] == "condor/submit/exampledag.submit.lock"
11/21/18 17:21:29 argv[3] == "-AutoRescue"
11/21/18 17:21:29 argv[4] == "1"
11/21/18 17:21:29 argv[5] == "-DoRescueFrom"
11/21/18 17:21:29 argv[6] == "0"
11/21/18 17:21:29 argv[7] == "-Dag"
11/21/18 17:21:29 argv[8] == "condor/submit/exampledag.submit"
11/21/18 17:21:29 argv[9] == "-Suppress_notification"
11/21/18 17:21:29 argv[10] == "-CsdVersion"
11/21/18 17:21:29 argv[11] == "$CondorVersion: 8.7.7 Mar 13 2018 BuildID: 435313 $"
11/21/18 17:21:29 argv[12] == "-Dagman"
11/21/18 17:21:29 argv[13] == "/usr/bin/condor_dagman"
11/21/18 17:21:29 Workflow batch-name: <exampledag.submit+9964452>
11/21/18 17:21:29 Workflow accounting_group: <>
11/21/18 17:21:29 Workflow accounting_group_user: <>
11/21/18 17:21:29 Warning: failed to get attribute DAGNodeName
11/21/18 17:21:29 DAGMAN_LOG_ON_NFS_IS_ERROR setting: False
11/21/18 17:21:29 Default node log file is: </home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log>
11/21/18 17:21:29 DAG Lockfile will be written to condor/submit/exampledag.submit.lock
11/21/18 17:21:29 DAG Input file is condor/submit/exampledag.submit
11/21/18 17:21:29 Parsing 1 dagfiles
11/21/18 17:21:29 Parsing condor/submit/exampledag.submit ...
11/21/18 17:21:29 Dag contains 1 total jobs
11/21/18 17:21:29 Lock file condor/submit/exampledag.submit.lock detected, 
11/21/18 17:21:29 Duplicate DAGMan PID 81396 is no longer alive; this DAGMan should continue.
11/21/18 17:21:29 Using default node job log file
11/21/18 17:21:29 Sleeping for 3 seconds to ensure ProcessId uniqueness
11/21/18 17:21:32 Bootstrapping...
11/21/18 17:21:32 Number of pre-completed nodes: 0
11/21/18 17:21:32 Running in RECOVERY mode... >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
11/21/18 17:21:32 Currently monitoring 1 HTCondor log file(s)
11/21/18 17:21:32 ERROR: job examplejob_9964472_0 not found!
11/21/18 17:21:32 Event: ULOG_SUBMIT for unknown Node (9964472.0.0) {11/21/18 17:15:29}: ignoring... [recovery mode]
11/21/18 17:21:32 ERROR: job examplejob_9964472_1 not found!
11/21/18 17:21:32 Event: ULOG_SUBMIT for unknown Node (9964472.1.0) {11/21/18 17:15:29}: ignoring... [recovery mode]
11/21/18 17:21:32 ERROR: job examplejob_9964472_2 not found!
11/21/18 17:21:32 Event: ULOG_SUBMIT for unknown Node (9964472.2.0) {11/21/18 17:15:29}: ignoring... [recovery mode]
11/21/18 17:21:32 ERROR: node for condor ID 9964472.0.0 not found! (might be because of node retries)
11/21/18 17:21:32 Event: ULOG_EXECUTE for unknown Node (9964472.0.0) {11/21/18 17:21:23}: ignoring... [recovery mode]
11/21/18 17:21:32 ERROR: node for condor ID 9964472.1.0 not found! (might be because of node retries)
11/21/18 17:21:32 Event: ULOG_EXECUTE for unknown Node (9964472.1.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:21:32 ERROR: node for condor ID 9964472.0.0 not found! (might be because of node retries)
11/21/18 17:21:32 Event: ULOG_JOB_TERMINATED for unknown Node (9964472.0.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:21:32 ERROR: node for condor ID 9964472.2.0 not found! (might be because of node retries)
11/21/18 17:21:32 Event: ULOG_EXECUTE for unknown Node (9964472.2.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:21:32 ERROR: node for condor ID 9964472.1.0 not found! (might be because of node retries)
11/21/18 17:21:32 Event: ULOG_JOB_TERMINATED for unknown Node (9964472.1.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:21:32 ERROR: node for condor ID 9964472.2.0 not found! (might be because of node retries)
11/21/18 17:21:32 Event: ULOG_JOB_TERMINATED for unknown Node (9964472.2.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:21:32     ------------------------------
11/21/18 17:21:32        HTCondor Recovery Complete
11/21/18 17:21:32     ------------------------------
11/21/18 17:21:32 ...done with RECOVERY mode <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
11/21/18 17:21:32 DAG status: 0 (DAG_STATUS_OK)
11/21/18 17:21:32 Of 1 nodes total:
11/21/18 17:21:32  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/21/18 17:21:32   ===     ===      ===     ===     ===        ===      ===
11/21/18 17:21:32     0       0        0       0       0          1        0
11/21/18 17:21:32 0 job proc(s) currently held
11/21/18 17:21:32 DAGMan Runtime Statistics: [ EventCycleTimeSum = 0.0; EventCycleTimeCount = 0.0; SleepCycleTimeCount = 0.0; SubmitCycleTimeSum = 0.0; SubmitCycleTimeCount = 0.0; LogProcessCycleTimeSum = 0.0; SleepCycleTimeSum = 0.0; LogProcessCycleTimeCount = 0.0; ]
11/21/18 17:21:32 DAG status: 0 (DAG_STATUS_OK)
11/21/18 17:21:32 Of 1 nodes total:
11/21/18 17:21:32  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/21/18 17:21:32   ===     ===      ===     ===     ===        ===      ===
11/21/18 17:21:32     0       0        0       0       1          0        0
11/21/18 17:21:32 0 job proc(s) currently held
11/21/18 17:21:32 DAGMan Runtime Statistics: [ EventCycleTimeSum = 0.0; EventCycleTimeCount = 0.0; SleepCycleTimeCount = 0.0; SubmitCycleTimeSum = 0.0; SubmitCycleTimeCount = 0.0; LogProcessCycleTimeSum = 0.0; SleepCycleTimeSum = 0.0; LogProcessCycleTimeCount = 0.0; ]
11/21/18 17:21:32 Registering condor_event_timer...
11/21/18 17:21:33 Sleeping for 1 s (DAGMAN_SUBMIT_DELAY) to throttle submissions...
11/21/18 17:21:34 Submitting HTCondor Node examplejob_$(Cluster)_$(Process) job(s)...
11/21/18 17:21:34 Adding a DAGMan workflow log /home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log
11/21/18 17:21:34 Masking the events recorded in the DAGMAN workflow log
11/21/18 17:21:34 Mask for workflow log is 0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36
11/21/18 17:21:34 submitting: /usr/bin/condor_submit -a dag_node_name' '=' 'examplejob_$(Cluster)_$(Process) -a +DAGManJobId' '=' '9964452 -a DAGManJobId' '=' '9964452 -batch-name exampledag.submit+9964452 -a submit_event_notes' '=' 'DAG' 'Node:' 'examplejob_$(Cluster)_$(Process) -a dagman_log' '=' '/home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log -a +DAGManNodesMask' '=' '"0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36" -a ARGS' '=' 'hello' 'world! -a job_name' '=' 'examplejob_$(Cluster)_$(Process) -a DAG_STATUS' '=' '0 -a FAILED_COUNT' '=' '0 -a notification' '=' 'never -a +DAGParentNodeNames' '=' '"" condor/submit/examplejob.submit
11/21/18 17:21:34 From submit: Submitting job(s)...
11/21/18 17:21:34 From submit: 3 job(s) submitted to cluster 9964867.
11/21/18 17:21:34 	assigned HTCondor ID (9964867.0.0)
11/21/18 17:21:34 Just submitted 1 job this cycle...
11/21/18 17:21:34 Currently monitoring 1 HTCondor log file(s)
11/21/18 17:21:34 ERROR: job examplejob_9964867_0 not found!
11/21/18 17:21:34 Event: ULOG_SUBMIT for unknown Node (9964867.0.0) {11/21/18 17:21:34}: ignoring...
11/21/18 17:21:34 ERROR: job examplejob_9964867_1 not found!
11/21/18 17:21:34 Event: ULOG_SUBMIT for unknown Node (9964867.1.0) {11/21/18 17:21:34}: ignoring...
11/21/18 17:21:34 ERROR: job examplejob_9964867_2 not found!
11/21/18 17:21:34 Event: ULOG_SUBMIT for unknown Node (9964867.2.0) {11/21/18 17:21:34}: ignoring...
11/21/18 17:21:34 DAG status: 0 (DAG_STATUS_OK)
11/21/18 17:21:34 Of 1 nodes total:
11/21/18 17:21:34  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/21/18 17:21:34   ===     ===      ===     ===     ===        ===      ===
11/21/18 17:21:34     0       0        1       0       0          0        0
11/21/18 17:21:34 0 job proc(s) currently held
11/21/18 17:21:34 DAGMan Runtime Statistics: [ EventCycleTimeCount = 0.0; SleepCycleTimeSum = 0.0; EventCycleTimeSum = 0.0; LogProcessCycleTimeCount = 1.0; LogProcessCycleTimeSum = 9.989738464355469E-05; LogProcessCycleTimeMin = 9.989738464355469E-05; LogProcessCycleTimeMax = 9.989738464355469E-05; LogProcessCycleTimeAvg = 9.989738464355469E-05; LogProcessCycleTimeStd = 9.989738464355469E-05; SubmitCycleTimeMax = 1.084985971450806; SubmitCycleTimeCount = 1.0; SleepCycleTimeCount = 0.0; SubmitCycleTimeSum = 1.084985971450806; SubmitCycleTimeStd = 1.084985971450806; SubmitCycleTimeAvg = 1.084985971450806; SubmitCycleTimeMin = 1.084985971450806; ]
11/21/18 17:23:24 Currently monitoring 1 HTCondor log file(s)
11/21/18 17:23:24 Event: ULOG_EXECUTE for HTCondor Node examplejob_$(Cluster)_$(Process) (9964867.0.0) {11/21/18 17:23:24}
11/21/18 17:23:24 BAD EVENT: job (9964867.0.0) executing, submit count < 1 (0)
11/21/18 17:23:24 BAD EVENT is warning only
11/21/18 17:23:24 Number of idle job procs: 0
11/21/18 17:23:24 Event: ULOG_EXECUTE for HTCondor Node examplejob_$(Cluster)_$(Process) (9964867.1.0) {11/21/18 17:23:24}
11/21/18 17:23:24 BAD EVENT: job (9964867.1.0) executing, submit count < 1 (0)
11/21/18 17:23:24 BAD EVENT is warning only
11/21/18 17:23:24 Number of idle job procs: 0
11/21/18 17:23:24 Event: ULOG_EXECUTE for HTCondor Node examplejob_$(Cluster)_$(Process) (9964867.2.0) {11/21/18 17:23:24}
11/21/18 17:23:24 BAD EVENT: job (9964867.2.0) executing, submit count < 1 (0)
11/21/18 17:23:24 BAD EVENT is warning only
11/21/18 17:23:24 Number of idle job procs: 0
11/21/18 17:23:24 Event: ULOG_JOB_TERMINATED for HTCondor Node examplejob_$(Cluster)_$(Process) (9964867.0.0) {11/21/18 17:23:24}
11/21/18 17:23:24 BAD EVENT: job (9964867.0.0) ended, submit count < 1 (0)
11/21/18 17:23:24 BAD EVENT is warning only
11/21/18 17:23:24 Number of idle job procs: 0
11/21/18 17:23:24 ERROR "Assertion ERROR on (node->_queuedNodeJobProcs >= 0)" at line 4261 in file /slots/10/dir_2238096/userdir/.tmp4gngbv/BUILD/condor-8.7.7/src/condor_dagman/dag.cpp
11/21/18 17:24:23 Setting maximum file descriptors to 80000.
11/21/18 17:24:23 ******************************************************
11/21/18 17:24:23 ** condor_scheduniv_exec.9964452.0 (CONDOR_DAGMAN) STARTING UP
11/21/18 17:24:23 ** /usr/bin/condor_dagman
11/21/18 17:24:23 ** SubsystemInfo: name=DAGMAN type=DAGMAN(10) class=DAEMON(1)
11/21/18 17:24:23 ** Configuration: subsystem:DAGMAN local:<NONE> class:DAEMON
11/21/18 17:24:23 ** $CondorVersion: 8.7.7 Mar 13 2018 BuildID: 435313 $
11/21/18 17:24:23 ** $CondorPlatform: x86_64_RedHat7 $
11/21/18 17:24:23 ** PID = 97750
11/21/18 17:24:23 ** Log last touched 11/21 17:23:24
11/21/18 17:24:23 ******************************************************
11/21/18 17:24:23 Using config source: /etc/condor/condor_config
11/21/18 17:24:23 Using local config sources: 
11/21/18 17:24:23    /etc/condor/config.d/00_logging
11/21/18 17:24:23    /etc/condor/config.d/00_security
11/21/18 17:24:23    /etc/condor/config.d/20_ganglia
11/21/18 17:24:23    /etc/condor/config.d/20_master_daemons
11/21/18 17:24:23    /etc/condor/config.d/20_schedd
11/21/18 17:24:23    /etc/condor/config.d/20_schedd_limit_num_jobs
11/21/18 17:24:23    /etc/condor/config.d/20_schedd_stats
11/21/18 17:24:23    /etc/condor/config.d/20_schedd_stats_rss
11/21/18 17:24:23    /etc/condor/config.d/20_shared_port
11/21/18 17:24:23    /etc/condor/config.d/20_submit
11/21/18 17:24:23    /etc/condor/config.d/21_submit_singularity
11/21/18 17:24:23    /etc/condor/config.d/60_npx_submit
11/21/18 17:24:23    /etc/condor/config.d/60_npx_submit_sane_reqs
11/21/18 17:24:23    /etc/condor/config.d/99_block_submissions_group_long
11/21/18 17:24:23    /etc/condor/condor_config.local
11/21/18 17:24:23 config Macros = 134, Sorted = 134, StringBytes = 5412, TablesBytes = 4984
11/21/18 17:24:23 CLASSAD_CACHING is ENABLED
11/21/18 17:24:23 Daemon Log is logging: D_ALWAYS D_ERROR
11/21/18 17:24:23 DaemonCore: No command port requested.
11/21/18 17:24:23 DAGMAN_USE_STRICT setting: 1
11/21/18 17:24:23 DAGMAN_VERBOSITY setting: 3
11/21/18 17:24:23 DAGMAN_DEBUG_CACHE_SIZE setting: 5242880
11/21/18 17:24:23 DAGMAN_DEBUG_CACHE_ENABLE setting: False
11/21/18 17:24:23 DAGMAN_SUBMIT_DELAY setting: 1
11/21/18 17:24:23 DAGMAN_MAX_SUBMIT_ATTEMPTS setting: 6
11/21/18 17:24:23 DAGMAN_STARTUP_CYCLE_DETECT setting: False
11/21/18 17:24:23 DAGMAN_MAX_SUBMITS_PER_INTERVAL setting: 5
11/21/18 17:24:23 DAGMAN_AGGRESSIVE_SUBMIT setting: False
11/21/18 17:24:23 DAGMAN_USER_LOG_SCAN_INTERVAL setting: 5
11/21/18 17:24:23 DAGMAN_QUEUE_UPDATE_INTERVAL setting: 300
11/21/18 17:24:23 DAGMAN_DEFAULT_PRIORITY setting: 0
11/21/18 17:24:23 DAGMAN_SUPPRESS_NOTIFICATION setting: True
11/21/18 17:24:23 allow_events (DAGMAN_ALLOW_EVENTS) setting: 114
11/21/18 17:24:23 DAGMAN_RETRY_SUBMIT_FIRST setting: True
11/21/18 17:24:23 DAGMAN_RETRY_NODE_FIRST setting: False
11/21/18 17:24:23 DAGMAN_MAX_JOBS_IDLE setting: 1000
11/21/18 17:24:23 DAGMAN_MAX_JOBS_SUBMITTED setting: 0
11/21/18 17:24:23 DAGMAN_MAX_PRE_SCRIPTS setting: 20
11/21/18 17:24:23 DAGMAN_MAX_POST_SCRIPTS setting: 20
11/21/18 17:24:23 DAGMAN_MUNGE_NODE_NAMES setting: True
11/21/18 17:24:23 DAGMAN_PROHIBIT_MULTI_JOBS setting: False
11/21/18 17:24:23 DAGMAN_SUBMIT_DEPTH_FIRST setting: False
11/21/18 17:24:23 DAGMAN_ALWAYS_RUN_POST setting: False
11/21/18 17:24:23 DAGMAN_ABORT_DUPLICATES setting: True
11/21/18 17:24:23 DAGMAN_ABORT_ON_SCARY_SUBMIT setting: True
11/21/18 17:24:23 DAGMAN_PENDING_REPORT_INTERVAL setting: 600
11/21/18 17:24:23 DAGMAN_AUTO_RESCUE setting: True
11/21/18 17:24:23 DAGMAN_MAX_RESCUE_NUM setting: 100
11/21/18 17:24:23 DAGMAN_WRITE_PARTIAL_RESCUE setting: True
11/21/18 17:24:23 DAGMAN_DEFAULT_NODE_LOG setting: @(DAG_DIR)/@(DAG_FILE).nodes.log
11/21/18 17:24:23 DAGMAN_GENERATE_SUBDAG_SUBMITS setting: True
11/21/18 17:24:23 DAGMAN_MAX_JOB_HOLDS setting: 100
11/21/18 17:24:23 DAGMAN_HOLD_CLAIM_TIME setting: 20
11/21/18 17:24:23 ALL_DEBUG setting: 
11/21/18 17:24:23 DAGMAN_DEBUG setting: 
11/21/18 17:24:23 DAGMAN_SUPPRESS_JOB_LOGS setting: False
11/21/18 17:24:23 DAGMAN_REMOVE_NODE_JOBS setting: True
11/21/18 17:24:23 argv[0] == "condor_scheduniv_exec.9964452.0"
11/21/18 17:24:23 argv[1] == "-Lockfile"
11/21/18 17:24:23 argv[2] == "condor/submit/exampledag.submit.lock"
11/21/18 17:24:23 argv[3] == "-AutoRescue"
11/21/18 17:24:23 argv[4] == "1"
11/21/18 17:24:23 argv[5] == "-DoRescueFrom"
11/21/18 17:24:23 argv[6] == "0"
11/21/18 17:24:23 argv[7] == "-Dag"
11/21/18 17:24:23 argv[8] == "condor/submit/exampledag.submit"
11/21/18 17:24:23 argv[9] == "-Suppress_notification"
11/21/18 17:24:23 argv[10] == "-CsdVersion"
11/21/18 17:24:23 argv[11] == "$CondorVersion: 8.7.7 Mar 13 2018 BuildID: 435313 $"
11/21/18 17:24:23 argv[12] == "-Dagman"
11/21/18 17:24:23 argv[13] == "/usr/bin/condor_dagman"
11/21/18 17:24:23 Workflow batch-name: <exampledag.submit+9964452>
11/21/18 17:24:23 Workflow accounting_group: <>
11/21/18 17:24:23 Workflow accounting_group_user: <>
11/21/18 17:24:23 Warning: failed to get attribute DAGNodeName
11/21/18 17:24:23 DAGMAN_LOG_ON_NFS_IS_ERROR setting: False
11/21/18 17:24:23 Default node log file is: </home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log>
11/21/18 17:24:23 DAG Lockfile will be written to condor/submit/exampledag.submit.lock
11/21/18 17:24:23 DAG Input file is condor/submit/exampledag.submit
11/21/18 17:24:23 Parsing 1 dagfiles
11/21/18 17:24:23 Parsing condor/submit/exampledag.submit ...
11/21/18 17:24:23 Dag contains 1 total jobs
11/21/18 17:24:23 Lock file condor/submit/exampledag.submit.lock detected, 
11/21/18 17:24:23 Duplicate DAGMan PID 92774 is no longer alive; this DAGMan should continue.
11/21/18 17:24:23 Using default node job log file
11/21/18 17:24:23 Sleeping for 3 seconds to ensure ProcessId uniqueness
11/21/18 17:24:26 Bootstrapping...
11/21/18 17:24:26 Number of pre-completed nodes: 0
11/21/18 17:24:26 Running in RECOVERY mode... >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
11/21/18 17:24:26 Currently monitoring 1 HTCondor log file(s)
11/21/18 17:24:26 ERROR: job examplejob_9964472_0 not found!
11/21/18 17:24:26 Event: ULOG_SUBMIT for unknown Node (9964472.0.0) {11/21/18 17:15:29}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: job examplejob_9964472_1 not found!
11/21/18 17:24:26 Event: ULOG_SUBMIT for unknown Node (9964472.1.0) {11/21/18 17:15:29}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: job examplejob_9964472_2 not found!
11/21/18 17:24:26 Event: ULOG_SUBMIT for unknown Node (9964472.2.0) {11/21/18 17:15:29}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964472.0.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_EXECUTE for unknown Node (9964472.0.0) {11/21/18 17:21:23}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964472.1.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_EXECUTE for unknown Node (9964472.1.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964472.0.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964472.0.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964472.2.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_EXECUTE for unknown Node (9964472.2.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964472.1.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964472.1.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964472.2.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964472.2.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: job examplejob_9964867_0 not found!
11/21/18 17:24:26 Event: ULOG_SUBMIT for unknown Node (9964867.0.0) {11/21/18 17:21:34}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: job examplejob_9964867_1 not found!
11/21/18 17:24:26 Event: ULOG_SUBMIT for unknown Node (9964867.1.0) {11/21/18 17:21:34}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: job examplejob_9964867_2 not found!
11/21/18 17:24:26 Event: ULOG_SUBMIT for unknown Node (9964867.2.0) {11/21/18 17:21:34}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964867.0.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_EXECUTE for unknown Node (9964867.0.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964867.1.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_EXECUTE for unknown Node (9964867.1.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964867.2.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_EXECUTE for unknown Node (9964867.2.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964867.0.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964867.0.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964867.1.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964867.1.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:24:26 ERROR: node for condor ID 9964867.2.0 not found! (might be because of node retries)
11/21/18 17:24:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964867.2.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:24:26     ------------------------------
11/21/18 17:24:26        HTCondor Recovery Complete
11/21/18 17:24:26     ------------------------------
11/21/18 17:24:26 ...done with RECOVERY mode <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
11/21/18 17:24:26 DAG status: 0 (DAG_STATUS_OK)
11/21/18 17:24:26 Of 1 nodes total:
11/21/18 17:24:26  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/21/18 17:24:26   ===     ===      ===     ===     ===        ===      ===
11/21/18 17:24:26     0       0        0       0       0          1        0
11/21/18 17:24:26 0 job proc(s) currently held
11/21/18 17:24:26 DAGMan Runtime Statistics: [ EventCycleTimeSum = 0.0; EventCycleTimeCount = 0.0; SleepCycleTimeCount = 0.0; SubmitCycleTimeSum = 0.0; SubmitCycleTimeCount = 0.0; LogProcessCycleTimeSum = 0.0; SleepCycleTimeSum = 0.0; LogProcessCycleTimeCount = 0.0; ]
11/21/18 17:24:26 DAG status: 0 (DAG_STATUS_OK)
11/21/18 17:24:26 Of 1 nodes total:
11/21/18 17:24:26  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/21/18 17:24:26   ===     ===      ===     ===     ===        ===      ===
11/21/18 17:24:26     0       0        0       0       1          0        0
11/21/18 17:24:26 0 job proc(s) currently held
11/21/18 17:24:26 DAGMan Runtime Statistics: [ EventCycleTimeSum = 0.0; EventCycleTimeCount = 0.0; SleepCycleTimeCount = 0.0; SubmitCycleTimeSum = 0.0; SubmitCycleTimeCount = 0.0; LogProcessCycleTimeSum = 0.0; SleepCycleTimeSum = 0.0; LogProcessCycleTimeCount = 0.0; ]
11/21/18 17:24:26 Registering condor_event_timer...
11/21/18 17:24:27 Sleeping for 1 s (DAGMAN_SUBMIT_DELAY) to throttle submissions...
11/21/18 17:24:28 Submitting HTCondor Node examplejob_$(Cluster)_$(Process) job(s)...
11/21/18 17:24:28 Adding a DAGMan workflow log /home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log
11/21/18 17:24:28 Masking the events recorded in the DAGMAN workflow log
11/21/18 17:24:28 Mask for workflow log is 0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36
11/21/18 17:24:28 submitting: /usr/bin/condor_submit -a dag_node_name' '=' 'examplejob_$(Cluster)_$(Process) -a +DAGManJobId' '=' '9964452 -a DAGManJobId' '=' '9964452 -batch-name exampledag.submit+9964452 -a submit_event_notes' '=' 'DAG' 'Node:' 'examplejob_$(Cluster)_$(Process) -a dagman_log' '=' '/home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log -a +DAGManNodesMask' '=' '"0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36" -a ARGS' '=' 'hello' 'world! -a job_name' '=' 'examplejob_$(Cluster)_$(Process) -a DAG_STATUS' '=' '0 -a FAILED_COUNT' '=' '0 -a notification' '=' 'never -a +DAGParentNodeNames' '=' '"" condor/submit/examplejob.submit
11/21/18 17:24:29 From submit: Submitting job(s)...
11/21/18 17:24:29 From submit: 3 job(s) submitted to cluster 9965000.
11/21/18 17:24:29 	assigned HTCondor ID (9965000.0.0)
11/21/18 17:24:29 Just submitted 1 job this cycle...
11/21/18 17:24:29 Currently monitoring 1 HTCondor log file(s)
11/21/18 17:24:29 ERROR: job examplejob_9965000_0 not found!
11/21/18 17:24:29 Event: ULOG_SUBMIT for unknown Node (9965000.0.0) {11/21/18 17:24:29}: ignoring...
11/21/18 17:24:29 ERROR: job examplejob_9965000_1 not found!
11/21/18 17:24:29 Event: ULOG_SUBMIT for unknown Node (9965000.1.0) {11/21/18 17:24:29}: ignoring...
11/21/18 17:24:29 ERROR: job examplejob_9965000_2 not found!
11/21/18 17:24:29 Event: ULOG_SUBMIT for unknown Node (9965000.2.0) {11/21/18 17:24:29}: ignoring...
11/21/18 17:24:29 DAG status: 0 (DAG_STATUS_OK)
11/21/18 17:24:29 Of 1 nodes total:
11/21/18 17:24:29  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/21/18 17:24:29   ===     ===      ===     ===     ===        ===      ===
11/21/18 17:24:29     0       0        1       0       0          0        0
11/21/18 17:24:29 0 job proc(s) currently held
11/21/18 17:24:29 DAGMan Runtime Statistics: [ EventCycleTimeCount = 0.0; SleepCycleTimeSum = 0.0; EventCycleTimeSum = 0.0; LogProcessCycleTimeCount = 1.0; LogProcessCycleTimeSum = 0.000102996826171875; LogProcessCycleTimeMin = 0.000102996826171875; LogProcessCycleTimeMax = 0.000102996826171875; LogProcessCycleTimeAvg = 0.000102996826171875; LogProcessCycleTimeStd = 0.000102996826171875; SubmitCycleTimeMax = 1.086297035217285; SubmitCycleTimeCount = 1.0; SleepCycleTimeCount = 0.0; SubmitCycleTimeSum = 1.086297035217285; SubmitCycleTimeStd = 1.086297035217285; SubmitCycleTimeAvg = 1.086297035217285; SubmitCycleTimeMin = 1.086297035217285; ]
11/21/18 17:26:29 Currently monitoring 1 HTCondor log file(s)
11/21/18 17:26:29 Event: ULOG_EXECUTE for HTCondor Node examplejob_$(Cluster)_$(Process) (9965000.2.0) {11/21/18 17:26:24}
11/21/18 17:26:29 BAD EVENT: job (9965000.2.0) executing, submit count < 1 (0)
11/21/18 17:26:29 BAD EVENT is warning only
11/21/18 17:26:29 Number of idle job procs: 0
11/21/18 17:26:29 Event: ULOG_EXECUTE for HTCondor Node examplejob_$(Cluster)_$(Process) (9965000.0.0) {11/21/18 17:26:24}
11/21/18 17:26:29 BAD EVENT: job (9965000.0.0) executing, submit count < 1 (0)
11/21/18 17:26:29 BAD EVENT is warning only
11/21/18 17:26:29 Number of idle job procs: 0
11/21/18 17:26:29 Event: ULOG_EXECUTE for HTCondor Node examplejob_$(Cluster)_$(Process) (9965000.1.0) {11/21/18 17:26:24}
11/21/18 17:26:29 BAD EVENT: job (9965000.1.0) executing, submit count < 1 (0)
11/21/18 17:26:29 BAD EVENT is warning only
11/21/18 17:26:29 Number of idle job procs: 0
11/21/18 17:26:29 Event: ULOG_JOB_TERMINATED for HTCondor Node examplejob_$(Cluster)_$(Process) (9965000.2.0) {11/21/18 17:26:24}
11/21/18 17:26:29 BAD EVENT: job (9965000.2.0) ended, submit count < 1 (0)
11/21/18 17:26:29 BAD EVENT is warning only
11/21/18 17:26:29 Number of idle job procs: 0
11/21/18 17:26:29 ERROR "Assertion ERROR on (node->_queuedNodeJobProcs >= 0)" at line 4261 in file /slots/10/dir_2238096/userdir/.tmp4gngbv/BUILD/condor-8.7.7/src/condor_dagman/dag.cpp
11/21/18 17:27:23 Setting maximum file descriptors to 80000.
11/21/18 17:27:23 ******************************************************
11/21/18 17:27:23 ** condor_scheduniv_exec.9964452.0 (CONDOR_DAGMAN) STARTING UP
11/21/18 17:27:23 ** /usr/bin/condor_dagman
11/21/18 17:27:23 ** SubsystemInfo: name=DAGMAN type=DAGMAN(10) class=DAEMON(1)
11/21/18 17:27:23 ** Configuration: subsystem:DAGMAN local:<NONE> class:DAEMON
11/21/18 17:27:23 ** $CondorVersion: 8.7.7 Mar 13 2018 BuildID: 435313 $
11/21/18 17:27:23 ** $CondorPlatform: x86_64_RedHat7 $
11/21/18 17:27:23 ** PID = 103976
11/21/18 17:27:23 ** Log last touched 11/21 17:26:29
11/21/18 17:27:23 ******************************************************
11/21/18 17:27:23 Using config source: /etc/condor/condor_config
11/21/18 17:27:23 Using local config sources: 
11/21/18 17:27:23    /etc/condor/config.d/00_logging
11/21/18 17:27:23    /etc/condor/config.d/00_security
11/21/18 17:27:23    /etc/condor/config.d/20_ganglia
11/21/18 17:27:23    /etc/condor/config.d/20_master_daemons
11/21/18 17:27:23    /etc/condor/config.d/20_schedd
11/21/18 17:27:23    /etc/condor/config.d/20_schedd_limit_num_jobs
11/21/18 17:27:23    /etc/condor/config.d/20_schedd_stats
11/21/18 17:27:23    /etc/condor/config.d/20_schedd_stats_rss
11/21/18 17:27:23    /etc/condor/config.d/20_shared_port
11/21/18 17:27:23    /etc/condor/config.d/20_submit
11/21/18 17:27:23    /etc/condor/config.d/21_submit_singularity
11/21/18 17:27:23    /etc/condor/config.d/60_npx_submit
11/21/18 17:27:23    /etc/condor/config.d/60_npx_submit_sane_reqs
11/21/18 17:27:23    /etc/condor/config.d/99_block_submissions_group_long
11/21/18 17:27:23    /etc/condor/condor_config.local
11/21/18 17:27:23 config Macros = 134, Sorted = 134, StringBytes = 5414, TablesBytes = 4984
11/21/18 17:27:23 CLASSAD_CACHING is ENABLED
11/21/18 17:27:23 Daemon Log is logging: D_ALWAYS D_ERROR
11/21/18 17:27:23 DaemonCore: No command port requested.
11/21/18 17:27:23 DAGMAN_USE_STRICT setting: 1
11/21/18 17:27:23 DAGMAN_VERBOSITY setting: 3
11/21/18 17:27:23 DAGMAN_DEBUG_CACHE_SIZE setting: 5242880
11/21/18 17:27:23 DAGMAN_DEBUG_CACHE_ENABLE setting: False
11/21/18 17:27:23 DAGMAN_SUBMIT_DELAY setting: 1
11/21/18 17:27:23 DAGMAN_MAX_SUBMIT_ATTEMPTS setting: 6
11/21/18 17:27:23 DAGMAN_STARTUP_CYCLE_DETECT setting: False
11/21/18 17:27:23 DAGMAN_MAX_SUBMITS_PER_INTERVAL setting: 5
11/21/18 17:27:23 DAGMAN_AGGRESSIVE_SUBMIT setting: False
11/21/18 17:27:23 DAGMAN_USER_LOG_SCAN_INTERVAL setting: 5
11/21/18 17:27:23 DAGMAN_QUEUE_UPDATE_INTERVAL setting: 300
11/21/18 17:27:23 DAGMAN_DEFAULT_PRIORITY setting: 0
11/21/18 17:27:23 DAGMAN_SUPPRESS_NOTIFICATION setting: True
11/21/18 17:27:23 allow_events (DAGMAN_ALLOW_EVENTS) setting: 114
11/21/18 17:27:23 DAGMAN_RETRY_SUBMIT_FIRST setting: True
11/21/18 17:27:23 DAGMAN_RETRY_NODE_FIRST setting: False
11/21/18 17:27:23 DAGMAN_MAX_JOBS_IDLE setting: 1000
11/21/18 17:27:23 DAGMAN_MAX_JOBS_SUBMITTED setting: 0
11/21/18 17:27:23 DAGMAN_MAX_PRE_SCRIPTS setting: 20
11/21/18 17:27:23 DAGMAN_MAX_POST_SCRIPTS setting: 20
11/21/18 17:27:23 DAGMAN_MUNGE_NODE_NAMES setting: True
11/21/18 17:27:23 DAGMAN_PROHIBIT_MULTI_JOBS setting: False
11/21/18 17:27:23 DAGMAN_SUBMIT_DEPTH_FIRST setting: False
11/21/18 17:27:23 DAGMAN_ALWAYS_RUN_POST setting: False
11/21/18 17:27:23 DAGMAN_ABORT_DUPLICATES setting: True
11/21/18 17:27:23 DAGMAN_ABORT_ON_SCARY_SUBMIT setting: True
11/21/18 17:27:23 DAGMAN_PENDING_REPORT_INTERVAL setting: 600
11/21/18 17:27:23 DAGMAN_AUTO_RESCUE setting: True
11/21/18 17:27:23 DAGMAN_MAX_RESCUE_NUM setting: 100
11/21/18 17:27:23 DAGMAN_WRITE_PARTIAL_RESCUE setting: True
11/21/18 17:27:23 DAGMAN_DEFAULT_NODE_LOG setting: @(DAG_DIR)/@(DAG_FILE).nodes.log
11/21/18 17:27:23 DAGMAN_GENERATE_SUBDAG_SUBMITS setting: True
11/21/18 17:27:23 DAGMAN_MAX_JOB_HOLDS setting: 100
11/21/18 17:27:23 DAGMAN_HOLD_CLAIM_TIME setting: 20
11/21/18 17:27:23 ALL_DEBUG setting: 
11/21/18 17:27:23 DAGMAN_DEBUG setting: 
11/21/18 17:27:23 DAGMAN_SUPPRESS_JOB_LOGS setting: False
11/21/18 17:27:23 DAGMAN_REMOVE_NODE_JOBS setting: True
11/21/18 17:27:23 argv[0] == "condor_scheduniv_exec.9964452.0"
11/21/18 17:27:23 argv[1] == "-Lockfile"
11/21/18 17:27:23 argv[2] == "condor/submit/exampledag.submit.lock"
11/21/18 17:27:23 argv[3] == "-AutoRescue"
11/21/18 17:27:23 argv[4] == "1"
11/21/18 17:27:23 argv[5] == "-DoRescueFrom"
11/21/18 17:27:23 argv[6] == "0"
11/21/18 17:27:23 argv[7] == "-Dag"
11/21/18 17:27:23 argv[8] == "condor/submit/exampledag.submit"
11/21/18 17:27:23 argv[9] == "-Suppress_notification"
11/21/18 17:27:23 argv[10] == "-CsdVersion"
11/21/18 17:27:23 argv[11] == "$CondorVersion: 8.7.7 Mar 13 2018 BuildID: 435313 $"
11/21/18 17:27:23 argv[12] == "-Dagman"
11/21/18 17:27:23 argv[13] == "/usr/bin/condor_dagman"
11/21/18 17:27:23 Workflow batch-name: <exampledag.submit+9964452>
11/21/18 17:27:23 Workflow accounting_group: <>
11/21/18 17:27:23 Workflow accounting_group_user: <>
11/21/18 17:27:23 Warning: failed to get attribute DAGNodeName
11/21/18 17:27:23 DAGMAN_LOG_ON_NFS_IS_ERROR setting: False
11/21/18 17:27:23 Default node log file is: </home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log>
11/21/18 17:27:23 DAG Lockfile will be written to condor/submit/exampledag.submit.lock
11/21/18 17:27:23 DAG Input file is condor/submit/exampledag.submit
11/21/18 17:27:23 Parsing 1 dagfiles
11/21/18 17:27:23 Parsing condor/submit/exampledag.submit ...
11/21/18 17:27:23 Dag contains 1 total jobs
11/21/18 17:27:23 Lock file condor/submit/exampledag.submit.lock detected, 
11/21/18 17:27:23 Duplicate DAGMan PID 97750 is no longer alive; this DAGMan should continue.
11/21/18 17:27:23 Using default node job log file
11/21/18 17:27:23 Sleeping for 3 seconds to ensure ProcessId uniqueness
11/21/18 17:27:26 Bootstrapping...
11/21/18 17:27:26 Number of pre-completed nodes: 0
11/21/18 17:27:26 Running in RECOVERY mode... >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
11/21/18 17:27:26 Currently monitoring 1 HTCondor log file(s)
11/21/18 17:27:26 ERROR: job examplejob_9964472_0 not found!
11/21/18 17:27:26 Event: ULOG_SUBMIT for unknown Node (9964472.0.0) {11/21/18 17:15:29}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: job examplejob_9964472_1 not found!
11/21/18 17:27:26 Event: ULOG_SUBMIT for unknown Node (9964472.1.0) {11/21/18 17:15:29}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: job examplejob_9964472_2 not found!
11/21/18 17:27:26 Event: ULOG_SUBMIT for unknown Node (9964472.2.0) {11/21/18 17:15:29}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964472.0.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_EXECUTE for unknown Node (9964472.0.0) {11/21/18 17:21:23}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964472.1.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_EXECUTE for unknown Node (9964472.1.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964472.0.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964472.0.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964472.2.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_EXECUTE for unknown Node (9964472.2.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964472.1.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964472.1.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964472.2.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964472.2.0) {11/21/18 17:21:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: job examplejob_9964867_0 not found!
11/21/18 17:27:26 Event: ULOG_SUBMIT for unknown Node (9964867.0.0) {11/21/18 17:21:34}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: job examplejob_9964867_1 not found!
11/21/18 17:27:26 Event: ULOG_SUBMIT for unknown Node (9964867.1.0) {11/21/18 17:21:34}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: job examplejob_9964867_2 not found!
11/21/18 17:27:26 Event: ULOG_SUBMIT for unknown Node (9964867.2.0) {11/21/18 17:21:34}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964867.0.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_EXECUTE for unknown Node (9964867.0.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964867.1.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_EXECUTE for unknown Node (9964867.1.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964867.2.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_EXECUTE for unknown Node (9964867.2.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964867.0.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964867.0.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964867.1.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964867.1.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9964867.2.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_JOB_TERMINATED for unknown Node (9964867.2.0) {11/21/18 17:23:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: job examplejob_9965000_0 not found!
11/21/18 17:27:26 Event: ULOG_SUBMIT for unknown Node (9965000.0.0) {11/21/18 17:24:29}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: job examplejob_9965000_1 not found!
11/21/18 17:27:26 Event: ULOG_SUBMIT for unknown Node (9965000.1.0) {11/21/18 17:24:29}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: job examplejob_9965000_2 not found!
11/21/18 17:27:26 Event: ULOG_SUBMIT for unknown Node (9965000.2.0) {11/21/18 17:24:29}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9965000.2.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_EXECUTE for unknown Node (9965000.2.0) {11/21/18 17:26:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9965000.0.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_EXECUTE for unknown Node (9965000.0.0) {11/21/18 17:26:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9965000.1.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_EXECUTE for unknown Node (9965000.1.0) {11/21/18 17:26:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9965000.2.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_JOB_TERMINATED for unknown Node (9965000.2.0) {11/21/18 17:26:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9965000.0.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_JOB_TERMINATED for unknown Node (9965000.0.0) {11/21/18 17:26:24}: ignoring... [recovery mode]
11/21/18 17:27:26 ERROR: node for condor ID 9965000.1.0 not found! (might be because of node retries)
11/21/18 17:27:26 Event: ULOG_JOB_TERMINATED for unknown Node (9965000.1.0) {11/21/18 17:26:24}: ignoring... [recovery mode]
11/21/18 17:27:26     ------------------------------
11/21/18 17:27:26        HTCondor Recovery Complete
11/21/18 17:27:26     ------------------------------
11/21/18 17:27:26 ...done with RECOVERY mode <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
11/21/18 17:27:26 DAG status: 0 (DAG_STATUS_OK)
11/21/18 17:27:26 Of 1 nodes total:
11/21/18 17:27:26  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/21/18 17:27:26   ===     ===      ===     ===     ===        ===      ===
11/21/18 17:27:26     0       0        0       0       0          1        0
11/21/18 17:27:26 0 job proc(s) currently held
11/21/18 17:27:26 DAGMan Runtime Statistics: [ EventCycleTimeSum = 0.0; EventCycleTimeCount = 0.0; SleepCycleTimeCount = 0.0; SubmitCycleTimeSum = 0.0; SubmitCycleTimeCount = 0.0; LogProcessCycleTimeSum = 0.0; SleepCycleTimeSum = 0.0; LogProcessCycleTimeCount = 0.0; ]
11/21/18 17:27:26 DAG status: 0 (DAG_STATUS_OK)
11/21/18 17:27:26 Of 1 nodes total:
11/21/18 17:27:26  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/21/18 17:27:26   ===     ===      ===     ===     ===        ===      ===
11/21/18 17:27:26     0       0        0       0       1          0        0
11/21/18 17:27:26 0 job proc(s) currently held
11/21/18 17:27:26 DAGMan Runtime Statistics: [ EventCycleTimeSum = 0.0; EventCycleTimeCount = 0.0; SleepCycleTimeCount = 0.0; SubmitCycleTimeSum = 0.0; SubmitCycleTimeCount = 0.0; LogProcessCycleTimeSum = 0.0; SleepCycleTimeSum = 0.0; LogProcessCycleTimeCount = 0.0; ]
11/21/18 17:27:26 Registering condor_event_timer...
11/21/18 17:27:27 Sleeping for 1 s (DAGMAN_SUBMIT_DELAY) to throttle submissions...
11/21/18 17:27:28 Submitting HTCondor Node examplejob_$(Cluster)_$(Process) job(s)...
11/21/18 17:27:28 Adding a DAGMan workflow log /home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log
11/21/18 17:27:28 Masking the events recorded in the DAGMAN workflow log
11/21/18 17:27:28 Mask for workflow log is 0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36
11/21/18 17:27:28 submitting: /usr/bin/condor_submit -a dag_node_name' '=' 'examplejob_$(Cluster)_$(Process) -a +DAGManJobId' '=' '9964452 -a DAGManJobId' '=' '9964452 -batch-name exampledag.submit+9964452 -a submit_event_notes' '=' 'DAG' 'Node:' 'examplejob_$(Cluster)_$(Process) -a dagman_log' '=' '/home/jbourbeau/software/pycondor/examples_untracked/condor/submit/exampledag.submit.nodes.log -a +DAGManNodesMask' '=' '"0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36" -a ARGS' '=' 'hello' 'world! -a job_name' '=' 'examplejob_$(Cluster)_$(Process) -a DAG_STATUS' '=' '0 -a FAILED_COUNT' '=' '0 -a notification' '=' 'never -a +DAGParentNodeNames' '=' '"" condor/submit/examplejob.submit
11/21/18 17:27:28 From submit: Submitting job(s)...
11/21/18 17:27:28 From submit: 3 job(s) submitted to cluster 9965294.
11/21/18 17:27:28 	assigned HTCondor ID (9965294.0.0)
11/21/18 17:27:28 Just submitted 1 job this cycle...
11/21/18 17:27:28 Currently monitoring 1 HTCondor log file(s)
11/21/18 17:27:28 ERROR: job examplejob_9965294_0 not found!
11/21/18 17:27:28 Event: ULOG_SUBMIT for unknown Node (9965294.0.0) {11/21/18 17:27:28}: ignoring...
11/21/18 17:27:28 ERROR: job examplejob_9965294_1 not found!
11/21/18 17:27:28 Event: ULOG_SUBMIT for unknown Node (9965294.1.0) {11/21/18 17:27:28}: ignoring...
11/21/18 17:27:28 ERROR: job examplejob_9965294_2 not found!
11/21/18 17:27:28 Event: ULOG_SUBMIT for unknown Node (9965294.2.0) {11/21/18 17:27:28}: ignoring...
11/21/18 17:27:28 DAG status: 0 (DAG_STATUS_OK)
11/21/18 17:27:28 Of 1 nodes total:
11/21/18 17:27:28  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
11/21/18 17:27:28   ===     ===      ===     ===     ===        ===      ===
11/21/18 17:27:28     0       0        1       0       0          0        0
11/21/18 17:27:28 0 job proc(s) currently held
11/21/18 17:27:28 DAGMan Runtime Statistics: [ EventCycleTimeCount = 0.0; SleepCycleTimeSum = 0.0; EventCycleTimeSum = 0.0; LogProcessCycleTimeCount = 1.0; LogProcessCycleTimeSum = 0.0001089572906494141; LogProcessCycleTimeMin = 0.0001089572906494141; LogProcessCycleTimeMax = 0.0001089572906494141; LogProcessCycleTimeAvg = 0.0001089572906494141; LogProcessCycleTimeStd = 0.0001089572906494141; SubmitCycleTimeMax = 1.104439973831177; SubmitCycleTimeCount = 1.0; SleepCycleTimeCount = 0.0; SubmitCycleTimeSum = 1.104439973831177; SubmitCycleTimeStd = 1.104439973831177; SubmitCycleTimeAvg = 1.104439973831177; SubmitCycleTimeMin = 1.104439973831177; ]

Add tests

Adding tests would be a nice enhancement
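
As a starting point, a pytest-based suite could exercise the file-writing behavior of Job. The sketch below is only illustrative, not existing code: it assumes that Job.build(fancyname=False) writes a <name>.submit file into the submit directory, and the test name and fixture usage are my own.

```python
# Illustrative pytest sketch (not existing code): checks that building a Job
# writes a submit file containing the executable path. Assumes
# Job.build(fancyname=False) names the file '<name>.submit'.
import os

from pycondor import Job


def test_job_build_writes_submit_file(tmpdir):
    submit_dir = str(tmpdir)
    job = Job(name='testjob',
              executable='/bin/sleep',
              submit=submit_dir)
    job.build(fancyname=False)

    submit_file = os.path.join(submit_dir, 'testjob.submit')
    assert os.path.exists(submit_file)
    with open(submit_file) as f:
        contents = f.read()
    assert '/bin/sleep' in contents
```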

Add extra_lines option to Dagman object

Currently, the Job object has an extra_lines option that lets the user specify any additional lines they would like added to the Job submit file. I would like to add the same feature to the Dagman object so that users can also add custom lines to the dag submit file.
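
For reference, a minimal sketch of how the proposed API might look, mirroring the existing Job usage. The extra_lines keyword on Dagman below is hypothetical until this feature is implemented; everything else follows the current Job interface.

```python
# Existing behavior: extra_lines entries are written verbatim into the
# Job submit file.
from pycondor import Job, Dagman

job = Job(name='examplejob',
          executable='my_script.py',
          submit='condor/submit',
          extra_lines=['request_memory = 3GB'])

# Proposed behavior (hypothetical): the same keyword on Dagman, with the
# lines written into the dag submit file.
dagman = Dagman(name='exampledag',
                submit='condor/submit',
                extra_lines=['DOT exampledag.dot'])  # hypothetical keyword
dagman.add_job(job)
dagman.build()
```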
