Cron services
Cheyenne crontab archives
Users who previously had cron entries on Cheyenne can locate their old crontab files under /glade/u/ssg/ch/archived_crontabs/.
Occasionally users may want to automate a common recurring task. Typical use cases are to initiate batch jobs, transfer input data from an external site, or run some automated testing. The UNIX cron service allows users to schedule scripts to be run based on a recurrence rule. As of December 2023 we have deployed a high-availability cron service, independent of the individual HPC systems, to support these workflows. This separate, HA solution allows us to perform maintenance on the HPC resources without interrupting cron workflows that can tolerate such downtime.
Logging in¶
Once you have an HPC systems account, you can log in to cron.hpc.ucar.edu via ssh.
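For example (where <username> is a placeholder for your HPC username):

```shell
ssh <username>@cron.hpc.ucar.edu
```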
This will connect you to a cron server.
Cron server IP address¶
For certain automation workflows external sites may need to "allow" access from NCAR's systems based on IP address.
cron.hpc.ucar.edu: 128.117.211.234
If your automated connections to remote sites encounter access issues, it may be necessary to work with the remote site's administrators to add this IP address to their trusted-connections configuration (the details are site- and process-specific; work with your remote site support team).
Appropriate usage, user environment, and resource restrictions¶
The primary use case for this resource is to initiate routine, scheduled work that is primarily performed elsewhere, such as in the HPC batch environment on either Derecho or Casper. As a result, the user software environment is intentionally sparse, and each user is placed into a control group limited to 1GB of system memory to protect system resource utilization. The typical GLADE file systems are accessible; however, no default software environment is provided.
Typical usage of this cron resource includes:
- Interacting with PBS,
- Performing small, automated file processing activities, and
- Connecting to the HPC systems directly through ssh to perform additional processing tasks.
Accessing PBS commands¶
The typical PBS commands qsub, qstat, etc... are available by default, and users can access both Derecho and Casper PBS queues from the cron system, provided that PBS server names are appended to the usual queue specifications (similar to the usual PBS cross-submission described here):
Command-line specification of a Derecho queue: qsub -q main@desched1 <script.pbs>
PBS script specification of a Derecho queue: #PBS -q main@desched1
Connecting to other NCAR HPC resources¶
The cron
servers are trusted by other HPC resources, allowing users to ssh
to other systems without additional two-factor authentication. A common workflow then is for a small, lightweight script to be initiated on the cron
servers which in turn runs additional commands on Derecho or Casper.
Installing and editing crontab
entries¶
To schedule a process with cron
, a user must establish a crontab
entry. This can be done interactively by running the command crontab -e
to edit your crontab directly, or by creating a file and "installing" it with crontab <filename>
. Additionally, you can list your current crontab entries via crontab -l
. (See man crontab
for more details.)
In either case, the crontab
entry has a very particular, fixed format.
crontab
syntax¶
# ┌───────────── minute (0–59)
# │ ┌───────────── hour (0–23)
# │ │ ┌───────────── day of the month (1–31)
# │ │ │ ┌───────────── month (1–12)
# │ │ │ │ ┌───────────── day of the week (0–6) (Sunday to Saturday);
# │ │ │ │ │
# │ │ │ │ │
# │ │ │ │ │
* * * * * <command to execute>
# run every 15 minutes:
*/15 * * * * <my rapid command>
# run every night at 23:04 (11:04 PM):
4 23 * * * <my daily command>
# 09:23 on every day-of-week from Monday through Friday.
23 9 * * 1-5 <my weekday commands>
# run at midnight on the first day of every other month
0 0 1 */2 * <my infrequent command>
crontab
commands¶
Keep your crontab
commands as simple as possible, and do not make any assumptions regarding the execution environment (initial working directories, environment variables, etc...). We also recommend redirecting script output to aid in monitoring and debugging. The command can be a short sequence of commands chained together with the shell operator &&
if desired, for example:
# run every night at 23:04 (11:04 PM):
4 23 * * * cd /glade/work/<username>/my_cron_stuff/ && ./run_nightly.sh &>> ./run_nightly.log
This entry runs the script run_nightly.sh from within the directory /glade/work/<username>/my_cron_stuff/, appending both standard output and standard error to the file run_nightly.log.
Best practices for cron jobs¶
Locking for exclusive execution¶
Many shell scripts are not designed to be run concurrently, and even for those that are, concurrent execution is often not the user's intent. In such scenarios the user should make provisions so that only one occurrence of the script is running at a time.
File locking is a convenient mechanism to implement this behavior, whereby a process will only run after it has gained exclusive access to a resource - in this case a "lock file". This approach is particularly useful under cron - if a particular instance of a script is running slowly, cron could re-launch the same script potentially many times. File locking prevents such script "pile-up."
The utility lockfile
can be incorporated into shell scripts as follows:
LOCK="${HOME}/.my_cron_job.lock"
remove_lock()
{
rm -f "${LOCK}"
}
another_instance()
{
echo "Cannot acquire lock on ${LOCK}"
echo "There is another instance running, exiting"
exit 1
}
lockfile -r 5 -l 3600 "${LOCK}" || another_instance
trap remove_lock EXIT
We declare two utility functions: remove_lock
and another_instance
. The command lockfile
will attempt to gain exclusive access to our ${LOCK}
file, retrying 5 times (-r 5
). If we cannot acquire the desired lock, another_instance
prints some information and exits the script. Note that when the script exits - cleanly or not - we want to remove our lock file. We use the bash trap
mechanism to accomplish this by calling remove_lock
at EXIT
.
In this case we forcibly remove any "stale" lock files when more than an hour old (3,600 seconds, -l 3600) as a defensive measure. This would be appropriate for a short-running script, when we can assume any leftover lock files beyond some threshold age are invalid. See man lockfile
for additional options.
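Note that lockfile ships with procmail and may not be available on every system. The same pattern can be sketched with flock(1) from util-linux instead; the lock path below is only an illustration:

```shell
#!/bin/bash
# Acquire an exclusive, non-blocking lock via flock(1) instead of lockfile.
LOCK="${TMPDIR:-/tmp}/my_cron_job.lock"

# Open the lock file on file descriptor 9 for the life of this script;
# the kernel releases the lock automatically when the script exits.
exec 9>"${LOCK}" || exit 1

if ! flock -n 9; then
    echo "Cannot acquire lock on ${LOCK}"
    echo "There is another instance running, exiting"
    exit 1
fi

echo "lock acquired"
# ... do the real work here ...
```

Unlike lockfile, flock ties the lock to an open file descriptor, so no explicit cleanup trap is needed: the lock disappears when the script exits, cleanly or not.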
Pedantic error checking¶
It is always a good idea to perform error checking inside shell scripts, but especially so when running under cron. Consider, for example, changing directories inside a script.
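A minimal sketch of a guarded cd (the directory here is only a stand-in; the sample script later on this page uses the same pattern with a real GLADE path):

```shell
#!/bin/bash
# Change into the working directory, aborting the script if the cd fails
# rather than continuing to run from the wrong location.
workdir="${TMPDIR:-/tmp}"   # stand-in for e.g. /glade/work/<username>/...
cd "${workdir}" || exit 1
echo "working in: $(pwd)"
```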
Guarding the cd with || exit 1 will abort the job if the cd fails. (The || exit 1 construct is executed only if the first command exits with a failure status.) Without this type of pedantic error checking the remainder of the script would continue to run, but in the wrong directory - especially dangerous if later steps remove files!
Logging¶
In addition to any typical logging of expected output from your automated processes, it is beneficial to capture some information from the system as well.
For example,
timestamp="$(date +%F@%H:%M)"
cron_logdir="${HOME}/.my_cron_logs/"
scriptdir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
mkdir -p ${cron_logdir} || exit 1
echo "[${timestamp}]: Running ${0} on $(hostname) from $(pwd); scriptdir=${scriptdir}" \
| tee -a ${cron_logdir}/particular_tool.log
This type of logging can be particularly useful if cron stops working for you and you have forgotten where your cron scripts were located.
Sample Cron script and crontab installation processes¶
Our cron_driver.sh script will submit two PBS batch jobs: one for preprocessing to run on Casper, and another for model execution to run on Derecho.
#!/bin/bash -l
#--------------------------------------------------------------
LOCK="${HOME}/.my_cron_job.lock"
remove_lock()
{
rm -f "${LOCK}"
}
another_instance()
{
echo "Cannot acquire lock on ${LOCK}"
echo "There is another instance running, exiting"
exit 1
}
# acquire an exclusive lock on our ${LOCK} file to make sure
# only one copy of this script is running at a time.
lockfile -r 5 -l 3600 "${LOCK}" || another_instance
trap remove_lock EXIT
#--------------------------------------------------------------
# logging: Print status to standard out, and redirect also to our
# specified cron_logdir.
timestamp="$(date +%F@%H:%M)"
cron_logdir="${HOME}/.my_cron_logs/"
scriptdir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
mkdir -p ${cron_logdir} || exit 1
echo -e "[${timestamp}]: Running ${0} on $(hostname)\n\tfrom $(pwd)\n\tscriptdir=${scriptdir}" \
| tee -a ${cron_logdir}/sample_job.log
#--------------------------------------------------------------
# go to desired directory, exit on failure:
cd /glade/work/${USER}/my_cron_job/ || { echo "cannot cd to the desired directory!!"; exit 1; }
# clean up any old INPUT_DATA
rm -f ./INPUT_DATA
# launch preprocessing job, capturing its job ID
PREREQ=$(qsub -q casper@casper-pbs ./prep_job.pbs) || { echo "cannot connect to Casper PBS"; exit 1; }
echo ${PREREQ}
# launch model job
qsub -q main@desched1 -W depend=afterok:${PREREQ} ./run_model.pbs || { echo "cannot connect to Derecho PBS"; exit 1; }
The first job (prep_job.pbs) will create a file called INPUT_DATA
which is used by the second job (run_model.pbs
). Because the second job depends on the first, we submit the second using a PBS job dependency. We also remove the INPUT_DATA
at the beginning of the script - since it should be created by prep_job.pbs
, we want to make sure that step functioned as intended. In this way if INPUT_DATA
exists when we execute run_model.pbs
it can only be because the preceding step ran successfully - and our INPUT_DATA
is not stale.
Sample crontab
entries
First we prepare a simple text file that contains the entries for all our desired cron processes:
# run my frequent cron job every 15 minutes
*/15 * * * * cd /glade/u/home/benkirk/my_cron/ && mkdir -p logs && ./frequent.sh &>> logs/frequent.log
# run my hourly cron jobs at 10 minutes past every hour
10 * * * * cd /glade/u/home/benkirk/my_cron/ && mkdir -p logs && ./hourly.sh &>> logs/hourly.log
# run my daily cron jobs at 00:45 am.
45 0 * * * cd /glade/u/home/benkirk/my_cron/ && mkdir -p logs && ./daily.sh &>> logs/daily.log
# run my daily cron jobs at 01:30 am.
30 1 * * * cd /glade/work/benkirk/my_cron_job/ && ./cron_driver.sh &>> cron.log
Installing crontab
entries
We can then install, inspect, and edit our entries with the crontab
command.
# Inspect installed crontab:
cron$ crontab -l
# run my frequent cron job every 15 minutes
*/15 * * * * cd /glade/u/home/benkirk/my_cron/ && mkdir -p logs && ./frequent.sh &>> logs/frequent.log
# run my hourly cron jobs at 10 minutes past every hour
10 * * * * cd /glade/u/home/benkirk/my_cron/ && mkdir -p logs && ./hourly.sh &>> logs/hourly.log
# run my daily cron jobs at 00:45 am.
45 0 * * * cd /glade/u/home/benkirk/my_cron/ && mkdir -p logs && ./daily.sh &>> logs/daily.log
Editing crontab
entries