geos-chem-schedule's Introduction

geos-chem-schedule

Reads the input file (input.geos) of a GEOS-Chem (http://geos-chem.org/) run directory and uses it to split a job into multiple smaller jobs. These jobs can be submitted as part of this process or submitted manually at a later point.

This script intends to make it easier to split up GEOS-Chem jobs into smaller parts that can better fit into queues available on a high-performance computing (HPC) facility. Currently, this script is compatible with PBS and SLURM.
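As an illustration of the splitting idea only (not the script's actual implementation), a run period can be broken into chunks that end on month boundaries. The helper name `split_by_month` and the example dates are hypothetical; the real script takes its dates from input.geos.

```python
from datetime import date


def split_by_month(start, end):
    """Split the period [start, end) into consecutive (chunk_start, chunk_end)
    pairs, breaking on the first day of each month."""
    chunks = []
    current = start
    while current < end:
        # Work out the first day of the month after `current`
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # The last chunk may stop before the month boundary
        chunk_end = min(next_month, end)
        chunks.append((current, chunk_end))
        current = chunk_end
    return chunks


chunks = split_by_month(date(2016, 11, 15), date(2017, 2, 1))
for chunk_start, chunk_end in chunks:
    print(chunk_start.strftime("%Y%m%d"), "->", chunk_end.strftime("%Y%m%d"))
```

Each pair could then become one queued job, with the end date of one job used as the start date of the next.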

Install

To install, download this repository with git clone. The recommended location is $HOME/src/geos-chem-schedule. Once downloaded, you can add a symbolic link to geos-chem-schedule.py in a run directory and call it directly from that directory (e.g. python geos-chem-schedule.py). Alternatively, you can set the script up to be run from bash with the following steps:

  1. Download the script and run the setup with the following commands.
mkdir -p $HOME/src
cd $HOME/src
git clone https://github.com/wacl-york/geos-chem-schedule.git
python geos-chem-schedule/geos-chem-schedule.py --setup

OR

If you have AC_tools downloaded (https://github.com/tsherwen/AC_tools), go to your AC_tools directory and run the following commands.

git submodule update --recursive --remote
python Scripts/geos-chem-schedule/geos-chem-schedule.py --setup

Copy and paste the commands printed by the setup into the terminal. This will allow you to use the command "geos-chem-schedule.py" from any folder.

Edit your settings.json file for options like default memory requirements, default run queue, default job name, and add your email address.

Use

Via Python

Create a symbolic link in the GEOS-Chem run directory, then call the script and follow the steps on screen.

ln -s <path to geos-chem-schedule>/geos-chem-schedule/geos-chem-schedule.py .
python geos-chem-schedule.py

Via bash

Go to your GEOS-Chem run directory and confirm that your input.geos file is correct. If you have completed the setup, type the following command:

geos-chem-schedule

Follow the UI instructions on screen.

The final option lets you submit the job immediately. If you would rather run it later, the script creates a file, run_geos.sh, which can be executed by typing:

bash run_geos.sh

The script has a UI to choose the job name, queue name, and priority, whether to start the jobs outside of work hours, and whether the script should submit the job to the queue.

The script can also take command line arguments. Type geos-chem-schedule.py --help for more info. This can be useful if you have lots of simulations you want to send off via a script.

For example:

geos-chem-schedule.py --job-name=bob --step=month --queue-name=run --queue-priority=100 --out-of-hours=yes --submit=yes

This will give the job the name 'bob', place it in the 'run' queue, split the run into monthly jobs, and run them with a priority of 100 (currently only available for PBS jobs). The jobs will only start out of hours: if a job starts during working hours, it will resubmit itself with a command to wait until 1800. The job will be submitted at the end of the script.
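The out-of-hours check can be pictured with a minimal sketch, assuming working hours of 0800-1800, Monday to Friday (as listed in the History section). The function name `is_working_hours` and the treatment of the boundaries (0800 counted as working time, 1800 as out of hours) are assumptions made for illustration, not taken from the script.

```python
from datetime import datetime


def is_working_hours(now):
    """Return True between 08:00 and 18:00 on Monday to Friday."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and 8 <= now.hour < 18


friday_morning = datetime(2020, 7, 31, 9, 30)
saturday_morning = datetime(2020, 8, 1, 9, 30)
print(is_working_hours(friday_morning))
print(is_working_hours(saturday_morning))
```

A job that finds itself starting while this returns True would be the case where it resubmits itself to wait until 1800.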

WARNINGS:

If using bpch output for GEOS-Chem instead of the default NetCDF (v11+), then note this script forces bpch output to be produced (setting=3) for the end of simulation date and replaces all other days with a 0. If you want every day to run with a 3 then use --step=daily.

History

  • 2020-07-31

Updated to allow use of the script with the SLURM scheduler. General rewriting of functions for clarity and addition of documentation. Various fixes applied (inc. enabling email functionality for SLURM jobs). Repository name updated to geos-chem-schedule to reflect functionality.

  • 2016-11-14

Allow options for --step=week,day,month. Allow setup via geos-chem-schedule.py --setup. Code changes to make it more readable.

  • 2015-04-01

Updated the naming scheme of the log to YYYYMMDD.geos.log. A job name over 9 characters is now truncated to 9 characters instead of raising an error.

  • 2015-02-18

Added options to submit arguments from the command line instead of the UI. Added an option to only start jobs out of work hours (0800-1800 Monday - Friday). Changed the naming scheme of the logs.

  • 2015-01-14

Option to send the run script to the queue straight away. Option to name the job (up to 9 characters). Option to name the queue you wish to run on, with error checking. Option to choose the priority of the jobs. Now only sends one job to the queue at a time, reducing mess in qstat. Now calls the next month upon completion of the current month.

geos-chem-schedule's People

Contributors

bennewsome, kilicomu, tsherwen

Forkers

tsherwen

geos-chem-schedule's Issues

Restore ability to call from a terminal command

It used to be possible to call the scheduler script from the command line (e.g. with the command below).

python geos-chem-schedule.py --job-name=hello_world --step=day --queue-name=test --submit=yes

However, this now results in the error message below:

Invalid argument --submit=yes
                     Try --help for more info.

Overhaul command line argument parsing

geos-chem-schedule/core.py

Lines 80 to 145 in f543a09

def get_arguments(inputs, debug=False):
    """
    Get the arguments supplied from command line
    Parameters
    -------
    inputs (GC_Job class): Class containing various inputs like a dictionary
    debug (bool): Print debugging output to the screen
    Returns
    -------
    (GC_Job class)
    """
    # If there are no arguments then run the GUI
    if len(sys.argv) > 1:
        for arg in sys.argv:
            if "geos-chem-schedule" in arg:
                continue
            if arg.startswith("--setup"):
                setup_script()
            elif arg.startswith("--job-name="):
                inputs.job_name = (arg[11:].strip())[:9]
            elif arg.startswith("--step="):
                inputs.step = arg[7:].strip()
            elif arg.startswith("--queue-name="):
                inputs.queue_name = arg[13:].strip()
            elif arg.startswith("--queue-priority="):
                inputs.queue_priority = arg[17:].strip()
            elif arg.startswith("--submit="):
                inputs.run_script_string = arg[9:].strip()
            elif arg.startswith("--out-of-hours="):
                inputs.out_of_hours_string = arg[15:].strip()
            elif arg.startswith("--wall-time="):
                inputs.wall_time = arg[12:].strip()
            elif arg.startswith("--cpus-need="):
                inputs.cpus_need = arg[12:].strip()
            elif arg.startswith("--submit_jobs_together="):
                inputs.cpus_need = arg[23:].strip()
            elif arg.startswith("--memory-need="):
                inputs.memory_need = arg[14:].strip()
            elif arg.startswith("--help"):
                print("""
            geos-chem-schedule.py
            For UI run without arguments
            Arguments are:
            --job-name=
            --step=
            --queue-name=
            --queue-priority=
            --submit=
            --out-of-hours=
            --wall-time=
            --submit_jobs_together=
            --memory-need=
            --cpus-need=
            e.g. to set the queue name to 'bob' write --queue-name=bob
            """)
            else:
                print("""Invalid argument {arg}
                     Try --help for more info.""".format(arg=arg)
                      )
                sys.exit(2)
    else:
        inputs = get_variables_from_cli(inputs)
    return inputs

Python provides a nice way to do this via argparse. Custom sys.argv handling is messy and will inevitably result in unforeseen errors!
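A minimal sketch of what an argparse-based replacement might look like is given below. The option names are the ones listed in the help text above; the function name `build_parser`, the use of `allow_abbrev=False`, and the 9-character truncation applied through `type=` are illustrative assumptions rather than the project's actual design.

```python
import argparse


def build_parser():
    """Build a parser for the options listed in the current help text."""
    parser = argparse.ArgumentParser(
        prog="geos-chem-schedule.py",
        description="For the UI, run without arguments.",
        # Avoid "--submit" being treated as an abbreviation of another option
        allow_abbrev=False,
    )
    parser.add_argument("--setup", action="store_true")
    # Job names are stripped and truncated to 9 characters, as in the existing code
    parser.add_argument("--job-name", type=lambda s: s.strip()[:9])
    parser.add_argument("--step")
    parser.add_argument("--queue-name")
    parser.add_argument("--queue-priority")
    parser.add_argument("--submit")
    parser.add_argument("--out-of-hours")
    parser.add_argument("--wall-time")
    parser.add_argument("--cpus-need")
    parser.add_argument("--submit_jobs_together")
    parser.add_argument("--memory-need")
    return parser


args = build_parser().parse_args(
    ["--job-name=hello_world", "--step=day", "--queue-name=test", "--submit=yes"]
)
print(args.job_name, args.step, args.queue_name, args.submit)
```

With this approach, --help output and the rejection of unknown arguments come from argparse itself, so the hand-written help string and the index arithmetic on each argument would no longer be needed.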

Add option to loop a single year a number of times

For its 1-year benchmarking, GCST runs a single year twice, using the output of the first run as the starting input for the second. This would be useful functionality for comparisons with benchmarks. It may also be a better approach in future for avoiding the "is the atmosphere in equilibrium" question, as a single repeated year rather than contiguous years is used for spin-up/analysis.

Ensure split jobs stop if a preceding job fails

Currently, split jobs can be submitted with a dependency so that each job proceeds only if the preceding job completed successfully. However, when a job exits with a model crash, the remaining queued jobs that were submitted together still proceed.

The relevant function is [create_SLURM_run_script2submit_together](https://github.com/wacl-york/geos-chem-schedule/blob/main/core.py#L1033-L1065), which uses the SLURM option --dependency=afterok.

TODO: work out how to capture all the job/model fail codes via SLURM and then abort the following model runs in the queue.

import os
import stat


def create_SLURM_run_script2submit_together(times):
    """
    Create the script that can set the 1st scheduled job running
    Parameters
    -------
    times (list): string times to run job scripts for, in the format YYYYMMDD
    Returns
    -------
    (None)
    """
    print(times)
    FileName = 'run_geos_SLURM_queue_all_jobs.sh'
    Line0 = "#!/bin/bash \n"
    Line1 = """job_num_{time}=$(sbatch --parsable SLURM_queue_files/{time}.sbatch) \n"""
    Line2 = """echo "$job_num_{time}" \n"""
    Line3 = """job_num_{time2}=$(sbatch --parsable --dependency=afterok:"$job_num_{time1}" SLURM_queue_files/{time2}.sbatch) \n"""
    with open(FileName, 'w') as run_script:
        # The final entry of times is not submitted here (times[:-1])
        for n_time, time in enumerate(times[:-1]):
            if time == times[0]:
                # The first job is submitted without a dependency
                run_script.write(Line0)
                run_script.write(Line1.format(time=time))
                run_script.write(Line2.format(time=time))
            else:
                # Later jobs only start if the previous job exited with code 0
                run_script.write(Line3.format(time1=times[n_time-1], time2=time))
                run_script.write(Line2.format(time=time))
    # Change the permissions so it is executable
    st = os.stat(FileName)
    os.chmod(FileName, st.st_mode | stat.S_IEXEC)
    return

An example of the emailed job status is shown in (1), and an example of the output from an aborted model run in (2).

(1) Slurm Job_id=17908287 Name=Iso.UnlimAll.2 Ended, Run time 1-09:03:48, COMPLETED, ExitCode 0

(2)

---> DATE: 2018/06/05  UTC: 09:30  X-HRS:   3729.500000
===============================================================================
WETDEP: ERROR at   42  23  71 for species  128 in area RESUSPENSION in middle levels
 LS          :  T
 PDOWN       :   0.000000000000000E+000
 QQ          :   0.000000000000000E+000
 ALPHA       :   0.000000000000000E+000
 ALPHA2      :   0.000000000000000E+000
 RAINFRAC    :   0.000000000000000E+000
 WASHFRAC    :   0.000000000000000E+000
 MASS_WASH   :   0.000000000000000E+000
 MASS_NOWASH :   0.000000000000000E+000
 WETLOSS     :   0.000000000000000E+000
 GAINED      :   0.000000000000000E+000
 LOST        :   0.000000000000000E+000
 DSpc(NW,:)  :   0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000
 Spc(I,J,:N) :   0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000  0.000000000000000E+000
  0.000000000000000E+000  0.000000000000000E+000 -1.198463418495579E-013
 -3.378856226737475E-014 -2.314320910996734E-015 -3.019816125666191E-017
 -2.520870547180549E-019 -2.822882195877689E-020 -1.445442763521664E-020
 -6.900126364332953E-022 -8.767709715535021E-024 -4.400381547465612E-025
 -1.720033035554861E-026 -4.865030418189842E-028 -1.209928241506788E-029
 -2.093866091759672E-031 -5.970450354644719E-033 -5.162779587092464E-035
 -3.748736158226715E-037 -6.734895733485428E-040 -1.499747125810236E-039
 -1.842168830893617E-038 -1.148225302626263E-037 -8.781446960823125E-037
 -5.560642728382409E-034 -3.794567594400997E-031 -1.821084089043769E-029
 -4.488053629617312E-028 -1.680343276761365E-026 -1.155873919999522E-024
 -3.060986958260714E-022 -1.313308483959904E-021 -1.574449846171001E-021
 -1.338100164273038E-021 -6.956426779262325E-022 -3.506180178581743E-022
 -6.266069886329827E-022 -1.353555184532673E-021 -4.106777266211684E-021
 -1.014640714706731E-020 -1.134631269064776E-020 -9.833174509975247E-021
 -7.707150917708346E-021
===============================================================================
===============================================================================
GEOS-Chem ERROR: Error encountered in wet deposition!
 -> at SAFETY (in module GeosCore/wetscav_mod.F90)
===============================================================================

===============================================================================
GEOS-Chem ERROR: Error encountered in "Safety"!
 -> at Do_Complete_Reevap (in module GeosCore/wetscav_mod.F90)
===============================================================================

===============================================================================
GEOS-Chem ERROR:
 -> at WetDep (in module GeosCore/wetscav_mod.F90)
===============================================================================

===============================================================================
GEOS-Chem ERROR: Error encountered in "Wetdep"!
 -> at Do_WetDep (in module GeosCore/wetscav_mod.F90)
===============================================================================

===============================================================================
GEOS-CHEM ERROR: Error encountered in "Do_WetDep"!
STOP at  -> at GEOS-Chem (in GeosCore/main.F90)
===============================================================================
srun: error: node112: task 0: Exited with exit code 159
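One possible direction, sketched under assumptions: --dependency=afterok only releases the next job when the previous one exits with code 0, so the job script itself would need to exit non-zero after a model crash. The sketch below assumes that the "E N D   O F   G E O S -- C H E M" banner is only written to the log when a run completes, and that the job script can run a check like this after the model step. The function names `run_completed` and `exit_code_for_log_text` are hypothetical.

```python
# Banner marking the end of a GEOS-Chem log, with whitespace removed so that
# the comparison does not depend on the exact spacing in the log
END_BANNER = "ENDOFGEOS--CHEM"


def run_completed(log_text):
    """Return True if the end-of-run banner appears in the log text."""
    squeezed = "".join(log_text.split())
    return END_BANNER in squeezed


def exit_code_for_log_text(log_text):
    """Return 0 for a completed run and 1 otherwise."""
    return 0 if run_completed(log_text) else 1


good_log = "**************   E N D   O F   G E O S -- C H E M   **************\n"
bad_log = 'GEOS-CHEM ERROR: Error encountered in "Do_WetDep"!\n'
print(exit_code_for_log_text(good_log), exit_code_for_log_text(bad_log))
```

If the job script finished by passing the resulting code to sys.exit, a crashed run would leave the afterok dependency unsatisfied and the following job would not start.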

Refactor repo (e.g. split off testing suite)

Refactor the code from the current setup, where all functionality is in a single file, so that functionality is split across appropriately named files (e.g. testing, core, scripts etc).

Rename repository to reflect job submission is no longer linked to months?

Currently, the functionality is to split runs by various numbers of days, weeks, and months. Originally this code was only used to split up runs by months. The name of the repository no longer reflects the main functionality/use so a new name may be better.

An example could be "GC_job_split".

Improve installation process

import os
import sys


def setup_script():
    """
    Prints the commands needed to create a symbolic link that allows running
    "geos-chem-schedule" from any directory
    """
    print("\n",
          "geos-chem-schedule setup complete. Change your default settings in settings.json\n",
          "To run the script from anywhere with the geos-chem-schedule command,",
          "copy the following code into your terminal. \n")
    script_location = os.path.realpath(__file__)
    # Make sure the script is executable
    print("chmod 755 {script}".format(script=script_location))
    # Make sure there is a ~/bin directory
    print("mkdir -p $HOME/bin")
    # Create a symlink from the file to the bin
    print("ln -s {script} $HOME/bin/geos-chem-schedule".format(script=script_location))
    # Make sure the ~/bin is in the bashrc
    # with open('$HOME/.bashrc','a') as bashrc:
    #     bashrc.write('## Written by geos-chem-schedule')
    #     bashrc.write('export PATH=$PATH:$HOME/bin')
    print('echo "## Written by geos-chem-schedule " >> $HOME/.bashrc')
    # Raw string so that the backslashes before $ are printed literally
    print(r'echo "export PATH=\$PATH:\$HOME/bin" >> $HOME/.bashrc')
    # Source the bashrc
    print("source $HOME/.bashrc")
    print("\n")
    sys.exit()

The current installation process for this package is quite invasive (see above). A rework to follow a simple package structure (see Packaging Python Projects) would go a long way to smoothing this out.

Update README to reflect current state of project

The README appears to be written for running this on earth0, so doesn't make for easy reading when thinking about how to use this on Viking, e.g.

The script has a UI to chose job name, queue name, priority, if you want to start the jobs outside of work hours, and if you want to have the script submit the job to the queue.

Remove lines of scheduler info at end of geos.log file?

An example of the lines currently written to the end of geos.log is pasted below. To aid reading of the GEOS-Chem information, could these lines be output to a separate file?

Also, the job submission template should be updated so that the "command not found" SLURM messages are not produced when reading bash variable strings.

**************   E N D   O F   G E O S -- C H E M   **************
/var/spool/slurmdspool/job8817675/slurm_script: line 123: last_line: command not found
/var/spool/slurmdspool/job8817675/slurm_script: line 124: complete_last_line: command not found
Submitted batch job 8856961

============================
 Job utilisation efficiency
============================

Job ID: 8817675
Cluster: viking
User/Group: ts551/clusterusers
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 20
CPU Utilized: 27-00:30:01
CPU Efficiency: 98.91% of 27-07:40:00 core-walltime
Job Wall-clock time: 1-08:47:00
Memory Utilized: 8.82 GB
Memory Efficiency: 22.04% of 40.00 GB
 Requested wall clock time: 2-00:00:00
    Actual wall clock time: 1-08:47:00
Wall clock time efficiency: 68.3%
           Job queued time: 00:00:01
