Sample Containerized Workflows¶
Warning
This page contains a sample of containerized workflows that demonstrate various techniques built up in practice, often from resolving user issues. We do not necessarily endorse or support each use case; rather, these examples are provided in the hope that they may be useful in demonstrating (i) sample containerized workflows, and (ii) solutions to various problems you may encounter.
NVIDIA's NGC containers¶
NVIDIA's NGC is a catalog of software optimized for GPUs. NGC containers allow you to run data science projects "out of the box" without installing, configuring, or integrating the infrastructure.
NVIDIA's Modulus physics-ML framework¶
NVIDIA Modulus is an open source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods. NVIDIA provides a frequently updated Docker image with a containerized PyTorch installation that can be run under Apptainer, albeit with some effort. Because the container is designed for Docker, some additional steps are required as discussed below.
Running containerized NVIDIA-Modulus on a single Casper GPU
1.  Rather than pull the container and run it as-is, we will create a derived container that allows us to encapsulate our desired changes. The primary reason for this is that the Modulus container assumes it is writable and makes changes during execution. Since we will run under Apptainer using a compressed, read-only image, this fails. Therefore we will make our own derived image and make the requisite changes during the build process.

    This is accomplished first by creating a simple Apptainer definition file, `my_modulus.def`:

    ```
    Bootstrap: docker
    From: nvcr.io/nvidia/modulus/modulus:24.07

    %post
        # update pip
        python -m pip install --upgrade pip

        # use pip to install additional packages needed for examples later
        pip install warp-lang mlflow

        # Remove cuda compat layer (https://github.com/NVIDIA/nvidia-docker/issues/1256)
        # note that the source container attempts to do this at run-time, but that will
        # fail when launched read-only.  So we do that here instead.
        # (This issue will likely be resolved with newer versions of nvidia-modulus)
        rm -rf /usr/local/cuda/compat/lib
    ```

    The definition file begins by pulling a specified version of the Modulus container, then modifying it in our `%post` step. In `%post` we update the `pip` Python package installer, use `pip` to install some additional Python packages not in the base image but required for the examples run later, and finally remove a conflicting path from the source image.

    Using the `my_modulus.def` file we now create our derived container and store it as a SIF (a hedged sketch of the build command follows this list). Note in this step we have explicitly set `TMPDIR` to a local file system, as occasionally containers fail to build on the large parallel file systems usually used for `TMPDIR` within NCAR. (The failure symptoms are usually fatal error messages related to `xattrs`.)
2.  Fetch some examples so we can test our installation (see the sketch after this list).
3.  Run the container in an interactive session on a single Casper GPU. We will launch an interactive session, then run the container interactively with the `singularity shell` command:

    ```
    # Interactive PBS submission from a login node:
    qsub -I -A <ACCOUNT> -q casper -l select=1:ncpus=4:mpiprocs=4:ngpus=1 -l gpu_type=v100 -l walltime=1:00:00

    # Then on the GPU node:
    module load apptainer
    singularity shell \
        --nv --cleanenv \
        --bind /glade/work \
        --bind /glade/campaign \
        --bind /glade/derecho/scratch \
        ./my_modulus.sif
    ```

    Note the command line arguments to `singularity shell`:

    - `--nv`: enable NVIDIA GPU support.
    - `--cleanenv`: clean the environment before running the container, causing the container to be launched with no knowledge of environment variables set on the host. This is default behavior for Docker, and is required in this case to prevent conflicting `CUDA_*` and other environment variable settings from confusing the containerized PyTorch.
    - `--bind /glade/work` etc.: binds host file systems into the container, allowing us to read and write from GLADE.
4.  Now we are inside the container, as evidenced by the `Apptainer>` command line prompt in the final step of this example. We will run one of the sample problems checked out in step 2:

    ```
    Apptainer> cd modulus/examples/cfd/darcy_fno/
    Apptainer> python ./train_fno_darcy.py
    Warp 0.10.1 initialized:
       CUDA Toolkit: 11.5, Driver: 12.3
       Devices:
         "cpu"    | x86_64
         "cuda:0" | Tesla V100-SXM2-32GB (sm_70)
       Kernel cache: /glade/u/home/benkirk/.cache/warp/0.10.1
    [21:04:13 - mlflow - WARNING] Checking MLFlow logging location is working (if this hangs its not)
    [21:04:13 - mlflow - INFO] MLFlow logging location is working
    [21:04:13 - mlflow - INFO] No Darcy_FNO experiment found, creating...
    [21:04:13 - checkpoint - WARNING] Provided checkpoint directory ./checkpoints does not exist, skipping load
    [21:04:13 - darcy_fno - WARNING] Model FourierNeuralOperator does not support AMP on GPUs, turning off
    [21:04:13 - darcy_fno - WARNING] Model FourierNeuralOperator does not support AMP on GPUs, turning off
    [21:04:13 - darcy_fno - INFO] Training started...
    Module modulus.datapipes.benchmarks.kernels.initialization load on device 'cuda:0' took 205.84 ms
    Module modulus.datapipes.benchmarks.kernels.utils load on device 'cuda:0' took 212.94 ms
    Module modulus.datapipes.benchmarks.kernels.finite_difference load on device 'cuda:0' took 670.44 ms
    [21:04:46 - train - INFO] Epoch 1 Metrics: Learning Rate = 1.000e-03, loss = 6.553e-01
    [21:04:46 - train - INFO] Epoch Execution Time: 3.241e+01s, Time/Iter: 1.013e+03ms
    [21:05:14 - train - INFO] Epoch 2 Metrics: Learning Rate = 1.000e-03, loss = 4.255e-02
    [21:05:14 - train - INFO] Epoch Execution Time: 2.812e+01s, Time/Iter: 8.786e+02ms
    [...]
    ```
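A sketch of the build (step 1) and example-fetch (step 2) commands might look like the following, assuming a node-local `/var/tmp` scratch space for `TMPDIR` and the upstream NVIDIA `modulus` GitHub repository as the source of the examples; adjust both to suit your workflow:

```
# step 1 (continued): build the derived SIF, pointing TMPDIR at a local file
#                     system to avoid the xattrs-related failures noted above
module load apptainer
export TMPDIR=/var/tmp/${USER}
mkdir -p ${TMPDIR}
singularity build ./my_modulus.sif ./my_modulus.def

# step 2: fetch the Modulus examples exercised later in this demonstration
git clone https://github.com/NVIDIA/modulus.git
ls modulus/examples/cfd/darcy_fno/
```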
While this example demonstrated running the container interactively, steps 3 and 4 can alternatively be combined and run inside a PBS batch job.
Popular AI/ML tools¶
Optimized TensorFlow and PyTorch models are available directly from the NGC.
Running AI/ML tools from NGC containers
Building an image with Apptainer
Anticipating that we may want to make additions to the container, we will build our own derived Apptainer image using a Definition file.
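As a minimal sketch, a derived definition file - call it `my_image.def`, matching the `my_image.sif` used below - might base itself on an NGC PyTorch container (the image tag and extra `pip` packages here are illustrative only):

```
Bootstrap: docker
From: nvcr.io/nvidia/pytorch:24.01-py3

%post
    # add any extra Python packages our workflow needs on top of the NGC image
    python -m pip install --upgrade pip
    pip install xarray netCDF4

%environment
    # example of a customization baked into the derived image
    export PYTHONNOUSERSITE=1
```

The SIF is then created with `singularity build ./my_image.sif ./my_image.def`, analogous to the Modulus example above.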
Run the image
module load apptainer
singularity shell \
--nv --cleanenv \
--bind /glade/work \
--bind /glade/campaign \
--bind /glade/derecho/scratch \
./my_image.sif
[...]
Apptainer>
Note the command line arguments to `singularity shell`: `--nv --cleanenv` enables NVIDIA GPU support with a clean environment, and `--bind /glade/work` etc. binds host file systems into the container, allowing us to read and write from GLADE.

Building and running containerized WRF under MPI¶
Warning
While we strongly encourage users to keep up with the latest WRF releases, we recognize some users may have customized older versions of WRF for particular purposes, and porting these changes can pose a significant burden.
In such cases containerization offers a viable (if unpalatable) option for running old code that may be difficult or impossible to compile unchanged on Derecho.
This example demonstrates building general purpose containers to facilitate compiling old versions of WRF.
Containerization approach¶
The container is built off-premises with docker
from three related images, each providing a foundation for the next. We begin with an
- OpenSUSE version 15 operating system (chosen to maximize Derecho interoperability) with a number of WRF dependencies installed,
- then add relevant compilers, MPI, and NetCDF,
- then compile various versions of WRF and WPS to demonstrate functionality.
The full set of `Dockerfile`s and associated resources can be found on GitHub.
The base layer¶
The OpenSUSE 15 base layer
The base layer is the common foundation of components independent of the compiler suite ultimately used for WRF/WPS. It includes the operating system image along with relevant packages installed from package repositories.
- The image begins with a minimal OpenSUSE v15 image, and adds a utility script `docker-clean` copied from the build host.
- The first `RUN` instruction updates the OS image, creates the `/container` workspace, and installs a minimal development environment (compilers, file utilities, etc.).
- The second `RUN` instruction installs specific packages required by the WRF & WPS build system and build dependencies. We elect here to install HDF5 from the OpenSUSE package repository as a matter of convenience - it is required later to build NetCDF from source, and the packaged version is entirely adequate for that purpose. (Should the user want advanced capabilities within HDF5 it may be necessary instead to compile HDF5 from source.)
- The remaining `RUN` instructions modify the image to expose our customizations through a "source-able" environment configuration file (`/container/config_env.sh`) and also to comply with the (occasionally overly restrictive) expectations of the WRF build system. For example, the base OS image supplies the PNG library as `-L/usr/lib64 -lpng16`, whereas WRF expects `-L/usr/lib64 -lpng`. We also install an old version of the Jasper library from source using the system `gcc` compiler. While a newer version of Jasper is available directly from the OpenSUSE package repository, that version is too new for certain older WPS releases.
- Finally, we set several environment variables used by the WRF/WPS build systems later on through `ENV` instructions.
Discussion
- Notice that each `RUN` step is finalized with a `docker-clean` command. This utility script removes temporary files and cached data to minimize the size of the resulting image layers. One consequence is that the first `zypper` package manager interaction in a `RUN` statement will re-cache these data. Since cached data are not relevant in the final image - especially when run much later on - we recommend removing it to reduce image bloat.
- We choose `/container` as the base path for all 3rd-party and compiled software so that when running containers from this image it is obvious what files come from the image vs. the host file system.
- We generally choose to add the search paths for compiled libraries to the "system" (container) default search paths rather than rely on `LD_LIBRARY_PATH` (see the sketch after this list). Since we are installing single versions of the necessary libraries this approach is viable, and makes the resulting development environment less fragile.
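As an illustration of the last point, registering library directories with the dynamic loader inside a Dockerfile `RUN` step might look like the following sketch (the exact directories registered in the real image may differ; the paths shown follow the `/container` convention above):

```
# make container-installed libraries visible to the loader without LD_LIBRARY_PATH
echo "/container/netcdf/lib"         >  /etc/ld.so.conf.d/container.conf
echo "/container/jasper/1.900.1/lib" >> /etc/ld.so.conf.d/container.conf
ldconfig
```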
The compiler + dependencies layer¶
Next we will extend the base layer to include 3rd-party compilers, a particular MPI (configured for maximum compatibility with Derecho), and NetCDF. We proceed on three parallel paths:
- Installing an old version of the Intel "classic" compilers compatible with WRF versions 3 and 4;
- Using the OpenSUSE-provided `gcc` version 7.5.0; and
- Installing a recent version of the `nvhpc` compilers, which provide the legacy Portland Group `pgf90` compiler supported by WRF/WPS.
Testing has shown the Intel variant of the following recipes to be the most performant, while the `gcc` version results in the smallest container image. Our intent in showing all three options is primarily educational, and they may provide solutions to issues encountered in related workflows.
Adding compilers, MPI, and dependencies
Note
Note that the Intel compiler is licensed software and usage is subject to terms of the End User License Agreements (EULA). As indicated in the Dockerfile
below, usage of this software is contingent on accepting the terms of the license agreement.
`NCAR/derecho/WRF/intel-build-environment` Dockerfile (see the full listing on GitHub)
Here we install the Intel C/C++ and Fortran compilers, and also clean up the installation tree by removing unnecessary components (lines 13-15). By combining these steps into a single `RUN` instruction we reduce the size of the "layer" produced, and thus the resulting image size. We can remove these components because they are not required to support WRF/WPS later. This step may need to be adapted if used to support other codes.

For demonstration purposes only

Testing has shown the `nvhpc` variant does not offer any performance benefits vs. the Intel or GCC variants, and therefore is not recommended for production runs. It is provided for demonstration purposes only, and in hopes it may prove useful for other projects that have a critical dependency on this particular compiler suite.
Note
Note that the NVHPC compiler is licensed software and usage is subject to terms of the End User License Agreements (EULA).
`NCAR/derecho/WRF/nvhpc-build-environment` Dockerfile (see the full listing on GitHub)
Here we install the NVHPC C/C++ and Fortran compilers, and also clean up the installation tree by removing unnecessary components (lines 20-21). By combining these steps into a single `RUN` instruction we reduce the size of the "layer" produced, and thus the resulting image size. We can remove these components because they are not required to support WRF/WPS later. This step may need to be adapted if used to support other codes.

Dockerfile Steps
- We begin by installing the desired compiler suite (Intel & NVHPC cases). For GCC, the compilers already exist from the base layer.
- We then install NetCDF using the chosen compilers. We need to provide the Fortran interface to NetCDF, which is why we install from source here using our chosen compiler rather than selecting available versions from the package repository (as was the case with HDF5).
    - The options `--disable-byterange`, `--disable-dap`, and `--disable-libxml2` are specified to prevent NetCDF from requiring additional dependencies (unnecessary for our WRF/WPS use case) we chose not to install in the base layer.
    - The option `--disable-dependency-tracking` is common to all GNU Automake packages and allows us to speed up one-time-only builds by skipping Automake's automated dependency generation.
- Finally, we install MPICH using the chosen compilers.
Discussion
Note that MPICH is necessary within the container in order to support building WRF. At runtime we will ultimately "replace" the container's MPICH with `cray-mpich` on Derecho. For this to work properly, it is important that the two implementations be ABI compatible, and that we are able to replace any container MPI shared libraries with their host counterparts. Since `cray-mpich` is derived from the open-source MPICH-3.x, the version choice and configuration options here are very deliberate; this is also why we do NOT add this MPI to the system library search path.
The WRF/WPS layer¶
The final step is to use the development environment built up in the previous two steps to compile WRF and WPS. As an exercise to assess the completeness of the environment, we choose to install the latest versions of WRF/WPS (4.x) as well as the most recent release of the 3.x series, including both default WRF and WRF-Chem compilations. Inside the container we build WRF, WRF-Chem, and WPS according to compiler-version-specific recipes (listed below).
Adding WRF & WPS
For demonstration purposes only
Testing has shown the nvhpc
variant does not offer any performance benefits vs. the Intel or GCC variants, and therefore is not recommended for production runs. It is provided for demonstration purposes only, and in hopes it may prove useful for other projects that have a critical dependency on this particular compiler suite.
Dockerfile Steps
- We begin by cloning specific versions of WRF/WPS. To reduce image size, we clone only the relevant branch and not the full `git` repository history (see the sketch after this list).
- Next we install the build recipes for the specific compiler suite.
- Finally, we compile WRF, then WRF-Chem, and WPS, version 3.x then 4.x - taking care at each step to clean intermediate files to limit the size of the container layers.
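A hedged sketch of the shallow-clone pattern used in the first step (the exact versions and branches built by the real recipes are defined in the Dockerfiles on GitHub; the tags below are illustrative):

```
# clone only the tagged releases we need, without the full repository history
git clone --branch v4.5.2 --depth 1 https://github.com/wrf-model/WRF.git wrf-4.5.2
git clone --branch v4.5   --depth 1 https://github.com/wrf-model/WPS.git wps-4.5
```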
The completed images are then pushed to DockerHub.
Deploying the container on NCAR's HPC resources with Apptainer¶
All the previous steps were performed off-premises with a Docker installation. We can now deploy the resulting container images with Apptainer. We construct a Singularity Image File (`.sif`) from the DockerHub image, along with a convenience shell script that allows us to launch the container for interactive use.
Constructing SIF images & using the container interactively as a development environment
The definition file below simply pulls the desired image from DockerHub and performs shell initialization for interactive use. We also print some information on the configuration that is displayed when the user launches the container via `singularity run`.
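A minimal sketch of such a definition file, assuming the Intel variant image pushed to DockerHub (the repository name and the details printed at run-time are illustrative):

```
Bootstrap: docker
From: benjaminkirk/ncar-derecho-wrf-intel:latest

%post
    # make the container build environment available to interactive shells
    echo "source /container/config_env.sh" >> /etc/bash.bashrc

%runscript
    # print some basic configuration information when launched via "singularity run"
    source /container/config_env.sh
    mpicc --version
    env | grep -E "NETCDF|JASPER"
```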
We build `ncar-derecho-wrf-intel.sif` from `ncar-derecho-wrf-intel.def` as described here. The simple utility script `wrf_intel_env` allows us to easily interact with the container.
derecho$ ./wrf_intel_env
Welcome to "ncar-derecho-wrf-intel"
#----------------------------------------------------------
# MPI Compilers & Version Details:
#----------------------------------------------------------
/container/mpich/bin/mpicc
icc (ICC) 2021.7.1 20221019
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.
/container/mpich/bin/mpif90
ifort (IFORT) 2021.7.1 20221019
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.
MPICH Version: 3.4.3
MPICH Release date: Thu Dec 16 11:20:57 CST 2021
MPICH Device: ch4:ofi
MPICH CC: /container/intel/compiler/2022.2.1/linux/bin/intel64/icc -O2
MPICH CXX: /container/intel/compiler/2022.2.1/linux/bin/intel64/icpc -O2
MPICH F77: /container/intel/compiler/2022.2.1/linux/bin/intel64/ifort -O2
MPICH FC: /container/intel/compiler/2022.2.1/linux/bin/intel64/ifort -O2
MPICH Custom Information:
#----------------------------------------------------------
# WRF/WPS-Centric Environment:
#----------------------------------------------------------
JASPERINC=/container/jasper/1.900.1/include
JASPERLIB=/container/jasper/1.900.1/lib
HDF5=
NETCDF=/container/netcdf
FLEX_LIB_DIR=/usr/lib64
YACC=/usr/bin/byacc -d
LIB_LOCAL=
#----------------------------------------------------------
#----------------------------------------------------------
# Pre-compiled executables:
#----------------------------------------------------------
/container/wps-3.9.1/avg_tsfc.exe
/container/wps-3.9.1/calc_ecmwf_p.exe
/container/wps-3.9.1/g1print.exe
/container/wps-3.9.1/g2print.exe
/container/wps-3.9.1/geogrid.exe
/container/wps-3.9.1/height_ukmo.exe
/container/wps-3.9.1/int2nc.exe
/container/wps-3.9.1/metgrid.exe
/container/wps-3.9.1/mod_levs.exe
/container/wps-3.9.1/rd_intermediate.exe
/container/wps-3.9.1/ungrib.exe
/container/wps-4.5/avg_tsfc.exe
/container/wps-4.5/calc_ecmwf_p.exe
/container/wps-4.5/g1print.exe
/container/wps-4.5/g2print.exe
/container/wps-4.5/geogrid.exe
/container/wps-4.5/height_ukmo.exe
/container/wps-4.5/int2nc.exe
/container/wps-4.5/metgrid.exe
/container/wps-4.5/mod_levs.exe
/container/wps-4.5/rd_intermediate.exe
/container/wps-4.5/ungrib.exe
/container/wrf-3.9.1.1/ndown.exe
/container/wrf-3.9.1.1/real.exe
/container/wrf-3.9.1.1/tc.exe
/container/wrf-3.9.1.1/wrf.exe
/container/wrf-4.5.2/ndown.exe
/container/wrf-4.5.2/real.exe
/container/wrf-4.5.2/tc.exe
/container/wrf-4.5.2/wrf.exe
/container/wrf-chem-3.9.1.1/ndown.exe
/container/wrf-chem-3.9.1.1/real.exe
/container/wrf-chem-3.9.1.1/tc.exe
/container/wrf-chem-3.9.1.1/wrf.exe
/container/wrf-chem-4.5.2/ndown.exe
/container/wrf-chem-4.5.2/real.exe
/container/wrf-chem-4.5.2/tc.exe
/container/wrf-chem-4.5.2/wrf.exe
#----------------------------------------------------------
WRF-intel-dev>
We build `ncar-derecho-wrf-gcc.sif` from `ncar-derecho-wrf-gcc.def` as described here. The simple utility script `wrf_gcc_env` allows us to easily interact with the container.
Welcome to "ncar-derecho-wrf-gcc"
#----------------------------------------------------------
# MPI Compilers & Version Details:
#----------------------------------------------------------
/container/mpich/bin/mpicc
gcc (SUSE Linux) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
/container/mpich/bin/mpif90
GNU Fortran (SUSE Linux) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
MPICH Version: 3.4.3
MPICH Release date: Thu Dec 16 11:20:57 CST 2021
MPICH Device: ch4:ofi
MPICH CC: /usr/bin/gcc -O2
MPICH CXX: /usr/bin/g++ -O2
MPICH F77: /usr/bin/gfortran -O2
MPICH FC: /usr/bin/gfortran -O2
MPICH Custom Information:
#----------------------------------------------------------
# WRF/WPS-Centric Environment:
#----------------------------------------------------------
JASPERINC=/container/jasper/1.900.1/include
JASPERLIB=/container/jasper/1.900.1/lib
HDF5=
NETCDF=/container/netcdf
FLEX_LIB_DIR=/usr/lib64
YACC=/usr/bin/byacc -d
LIB_LOCAL=
#----------------------------------------------------------
#----------------------------------------------------------
# Pre-compiled executables:
#----------------------------------------------------------
/container/wps-3.9.1/avg_tsfc.exe
/container/wps-3.9.1/calc_ecmwf_p.exe
/container/wps-3.9.1/g1print.exe
/container/wps-3.9.1/g2print.exe
/container/wps-3.9.1/geogrid.exe
/container/wps-3.9.1/height_ukmo.exe
/container/wps-3.9.1/int2nc.exe
/container/wps-3.9.1/metgrid.exe
/container/wps-3.9.1/mod_levs.exe
/container/wps-3.9.1/rd_intermediate.exe
/container/wps-3.9.1/ungrib.exe
/container/wps-4.5/avg_tsfc.exe
/container/wps-4.5/calc_ecmwf_p.exe
/container/wps-4.5/g1print.exe
/container/wps-4.5/g2print.exe
/container/wps-4.5/geogrid.exe
/container/wps-4.5/height_ukmo.exe
/container/wps-4.5/int2nc.exe
/container/wps-4.5/metgrid.exe
/container/wps-4.5/mod_levs.exe
/container/wps-4.5/rd_intermediate.exe
/container/wps-4.5/ungrib.exe
/container/wrf-3.9.1.1/ndown.exe
/container/wrf-3.9.1.1/real.exe
/container/wrf-3.9.1.1/tc.exe
/container/wrf-3.9.1.1/wrf.exe
/container/wrf-4.5.2/ndown.exe
/container/wrf-4.5.2/real.exe
/container/wrf-4.5.2/tc.exe
/container/wrf-4.5.2/wrf.exe
/container/wrf-chem-3.9.1.1/ndown.exe
/container/wrf-chem-3.9.1.1/real.exe
/container/wrf-chem-3.9.1.1/tc.exe
/container/wrf-chem-3.9.1.1/wrf.exe
/container/wrf-chem-4.5.2/ndown.exe
/container/wrf-chem-4.5.2/real.exe
/container/wrf-chem-4.5.2/tc.exe
/container/wrf-chem-4.5.2/wrf.exe
#----------------------------------------------------------
WRF-gcc-dev>
For demonstration purposes only
Testing has shown the nvhpc
variant does not offer any performance benefits vs. the Intel or GCC variants, and therefore is not recommended for production runs. It is provided for demonstration purposes only, and in hopes it may prove useful for other projects that have a critical dependency on this particular compiler suite.
We build `ncar-derecho-wrf-nvhpc.sif` from `ncar-derecho-wrf-nvhpc.def` as described here. The simple utility script `wrf_nvhpc_env` allows us to easily interact with the container.
Welcome to "ncar-derecho-wrf-nvhpc"
#----------------------------------------------------------
# MPI Compilers & Version Details:
#----------------------------------------------------------
/container/mpich/bin/mpicc
gcc (SUSE Linux) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
/container/mpich/bin/mpif90
nvfortran 23.9-0 64-bit target on x86-64 Linux -tp znver3
NVIDIA Compilers and Tools
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
MPICH Version: 3.4.3
MPICH Release date: Thu Dec 16 11:20:57 CST 2021
MPICH Device: ch4:ofi
MPICH CC: /usr/bin/gcc -fPIC -O2
MPICH CXX: /usr/bin/g++ -fPIC -O2
MPICH F77: /container/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/bin/nvfortran -fPIC
MPICH FC: /container/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/bin/nvfortran -fPIC
MPICH Custom Information:
#----------------------------------------------------------
# WRF/WPS-Centric Environment:
#----------------------------------------------------------
JASPERINC=/container/jasper/1.900.1/include
JASPERLIB=/container/jasper/1.900.1/lib
HDF5=
NETCDF=/container/netcdf
FLEX_LIB_DIR=/usr/lib64
YACC=/usr/bin/byacc -d
LIB_LOCAL=
#----------------------------------------------------------
#----------------------------------------------------------
# Pre-compiled executables:
#----------------------------------------------------------
/container/wps-3.9.1/avg_tsfc.exe
/container/wps-3.9.1/calc_ecmwf_p.exe
/container/wps-3.9.1/g1print.exe
/container/wps-3.9.1/g2print.exe
/container/wps-3.9.1/geogrid.exe
/container/wps-3.9.1/height_ukmo.exe
/container/wps-3.9.1/int2nc.exe
/container/wps-3.9.1/metgrid.exe
/container/wps-3.9.1/mod_levs.exe
/container/wps-3.9.1/rd_intermediate.exe
/container/wps-3.9.1/ungrib.exe
/container/wps-4.5/avg_tsfc.exe
/container/wps-4.5/calc_ecmwf_p.exe
/container/wps-4.5/g1print.exe
/container/wps-4.5/g2print.exe
/container/wps-4.5/geogrid.exe
/container/wps-4.5/height_ukmo.exe
/container/wps-4.5/int2nc.exe
/container/wps-4.5/metgrid.exe
/container/wps-4.5/mod_levs.exe
/container/wps-4.5/rd_intermediate.exe
/container/wps-4.5/ungrib.exe
/container/wrf-3.9.1.1/ndown.exe
/container/wrf-3.9.1.1/real.exe
/container/wrf-3.9.1.1/tc.exe
/container/wrf-3.9.1.1/wrf.exe
/container/wrf-4.5.2/ndown.exe
/container/wrf-4.5.2/real.exe
/container/wrf-4.5.2/tc.exe
/container/wrf-4.5.2/wrf.exe
/container/wrf-chem-3.9.1.1/ndown.exe
/container/wrf-chem-3.9.1.1/real.exe
/container/wrf-chem-3.9.1.1/tc.exe
/container/wrf-chem-3.9.1.1/wrf.exe
/container/wrf-chem-4.5.2/ndown.exe
/container/wrf-chem-4.5.2/real.exe
/container/wrf-chem-4.5.2/tc.exe
/container/wrf-chem-4.5.2/wrf.exe
#----------------------------------------------------------
WRF-nvhpc-dev>
Running the container on Derecho¶
The PBS job script listed below shows the steps required to "bind" the host MPI into the container and launch an executable in a batch environment.
Containerized WRF PBS Script
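A representative sketch of such a job script is shown below; it mirrors the containerized FastEddy PBS script later on this page, minus the GPU-specific pieces. The account, resource selection, and choice of `wrf.exe` are illustrative, and the line numbers referenced in the Discussion refer to the full production script rather than this sketch.

```
#!/bin/bash
#PBS -A <ACCOUNT>
#PBS -q main
#PBS -j oe
#PBS -l walltime=01:00:00
#PBS -l select=2:ncpus=128:mpiprocs=128

module load ncarenv apptainer || exit 1

container_image="./ncar-derecho-wrf-intel.sif"
nranks=$(cat ${PBS_NODEFILE} | wc -l)

# examine the shared library dependencies of the containerized executable
singularity exec ${container_image} ldd /container/wrf-4.5.2/wrf.exe

# launch the containerized wrf.exe with the host MPI injected
# (additional --env settings, e.g. LD_PRELOAD, may be required as described below)
mpiexec --np ${nranks} \
    singularity exec \
        --bind ${SCRATCH} --bind ${WORK} --pwd $(pwd) \
        --bind /run \
        --bind /opt/cray \
        --bind /usr/lib64:/host/lib64 \
        --env LD_LIBRARY_PATH=${CRAY_MPICH_DIR}/lib-abi-mpich:/opt/cray/pe/lib64:${LD_LIBRARY_PATH}:/host/lib64 \
        --env MPICH_SMP_SINGLE_COPY_MODE=NONE \
        ${container_image} \
        /container/wrf-4.5.2/wrf.exe
```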
Discussion
The PBS script examines the shared library dependencies of the executable using `ldd`, first within the container and then with the host MPI "injected" (as described here). This process is often tedious, iterative, and error-prone. As constructed, the PBS script can be executed directly (without `qsub`) to inspect these results before waiting for batch resources.

The `mpiexec` command is fairly standard. Note that we are using it to launch `singularity`, which in turn will start up the WRF executable specified on line 20. Note in this case the executable is built into the container; however, it could also be resident on GLADE, provided it was compiled from this same container development environment.
The `singularity exec` command lines are complex, so let's deconstruct them here:

- We make use of the `--bind` argument first to mount familiar GLADE file systems within the container,
- and again to "inject" the host MPI into the container. The `/run` directory necessity is not immediately obvious, but it is used by Cray-MPICH as part of the launching process.
- We also need to use `--env` to set the `LD_LIBRARY_PATH` inside the image so that the application can find the proper host libraries. Recall when we built the WRF executables in the containerized environment they had no knowledge of these host-specific paths. Similarly, we use `--env` to set the `LD_PRELOAD` environment variable inside the container. This will cause a particular Cray-MPICH library to be loaded prior to application initialization. This step is not required for "bare metal" execution.
- We set the `MPICH_SMP_SINGLE_COPY_MODE` environment variable to work around an MPI run-time error that would otherwise appear.
- Finally, a note on the `--bind /usr/lib64:/host/lib64` argument. Injecting the host MPI requires that some shared libraries from the host's `/usr/lib64` directory be visible inside the image. However, this path also exists inside the image and contains other libraries needed by the application. We cannot simply bind the host's directory into the same path, as doing so would mask these other libraries. So we bind the host's `/usr/lib64` into the container image at `/host/lib64`, and make sure this path is set in the `LD_LIBRARY_PATH` variable as well. Because we want these particular host libraries found as a last resort (not taking precedence over similar libraries in the container), we append `/host/lib64` to the `LD_LIBRARY_PATH` search path.
The arguments above were determined iteratively through trial and error. Such is the reality of containerized MPI applications and proprietary host MPI integration. Feel free to experiment with the PBS file, omitting some of the --bind
and --env
arguments and observing the resulting error message.
Building and running containerized FastEddy under MPI on GPUs¶
Warning
While the result of this demonstration is a functional application, we recommend against using this container for production FastEddy workflows!
It is much easier to simply build FastEddy "bare metal" when operating inside the NCAR HPC environment!
This example demonstrates building a containerized version of FastEddy from the open-source variant hosted on GitHub. It is provided because it illustrates several common issues encountered when running GPU-aware MPI applications inside containers across multiple nodes - particularly when binding the host MPI into the container - and because the source code is open for any interested user to follow along and adapt.
About FastEddy¶
FastEddy is a large-eddy simulation (LES) model developed by the Research Applications Laboratory (RAL) here at NCAR. The fundamental premise of FastEddy model development is to leverage the accelerated and more power-efficient computing capacity of graphics processing units (GPUs) to enable not only more widespread use of LES in research activities, but also to pursue the adoption of microscale and multiscale, turbulence-resolving, atmospheric boundary layer modeling into local-scale weather prediction or actionable science and engineering applications.
Containerization approach¶
The container is built off-premises with docker
from three related images, each providing a foundation for the next. We begin with a
- Rockylinux version 8 operating system with OpenHPC version 2 installed, then add
- a CUDA development environment and a CUDA-aware MPICH installation on top, and finally add
- the FastEddy source and compiled program.
A benefit of this layered approach is that the intermediate images created in steps 1 and 2 can be beneficial in their own right, providing base layers for other projects with similar needs. Additionally, by building the image externally with Docker we are able to switch user IDs within the process (discussed further below), which has some benefits when using containers to enable development workflows.
Building the image¶
Build framework
For complete details of the build process, see the Docker-based container build framework described here.
The image was built external to the HPC environment and then pushed to Docker Hub. (For users only interested in the details of running such a container, see instructions for running the container below.)
In this case a simple Mac laptop with git
, GNU make
, and docker
all installed locally was used and the process takes about an hour; any similarly configured system should suffice. No GPU devices are required to build the image.
The base layer¶
The Rockylinux 8 + OpenHPC base layer

For the base layer we deploy an OpenHPC v2 installation on top of a Rockylinux v8 base image. OpenHPC provides access to many pre-compiled scientific libraries and applications, and supports a matrix of compiler and MPI permutations, from which we will select one that works well with Derecho. Notably, at present OpenHPC does not natively support CUDA installations; however, we will address this limitation in the subsequent steps.
- The image begins with a minimal Rockylinux v8 image, and adds a utility script `docker-clean` copied from the build host.
- We parameterize several variables with build arguments using the `ARG` instructions. (Build arguments are available within the image build process as environment variables, but not when running the resulting container image; rather, `ENV` instructions can be used for those purposes - see the short sketch after this list. For a full discussion of `Dockerfiles` and supported instructions see here.)
- We then perform a number of `RUN` steps. When running `docker`, each `RUN` step creates a subsequent layer in the image. (We follow general Docker guidance and also strive to combine related commands inside a handful of `RUN` instructions.)
    - The first `RUN` instruction takes us from the very basic Rockylinux 8 source image to a full OpenHPC installation. We add a non-privileged user `plainuser` to leverage later, update the OS image with any available security patches, and then generally follow an OpenHPC installation recipe to add compilers, MPI, and other useful development tools.
    - The second `RUN` step works around an issue we would find later when attempting to run the image on Derecho. Specifically, the OpenHPC `mpich-ofi` package provides support for the long-deprecated MPI C++ interface. This is not present on Derecho with the `cray-mpich` implementation we will ultimately use to run the container. Since we do not need this support, here we hack the `mpicxx` wrapper so that it does not link in `-lmpicxx`, the problematic library.
    - The third and following `RUN` instructions create a directory space `/opt/local` we can use from our unprivileged `plainuser` account, copy in some more files, and then switch to `plainuser` to test the development environment by installing some common MPI benchmarks.
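A short, hypothetical illustration of the `ARG` vs. `ENV` distinction mentioned above (the names and values are not taken from the actual Dockerfile):

```
# ARG values exist only while the image is being built...
ARG OHPC_RELEASE=2
RUN echo "building against OpenHPC release ${OHPC_RELEASE}"

# ...whereas ENV values are also visible inside running containers
ENV CONTAINER_BUILD_ROOT=/opt/local
```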
Discussion
- OpenHPC v2 supports both OpenSUSE and Rockylinux 8 as its base OS. It would be natural to choose OpenSUSE for similarity to Casper and Derecho; however, by choosing Rockylinux instead we gain access to a different build environment, which has benefits for developers looking to improve portability. The process followed here can also be thought of as a "roadmap" for deploying the application at similarly configured external sites.
- OpenHPC supports the `openmpi` and `mpich` MPI implementations, with the latter in two forms: `mpich-ucx` and `mpich-ofi`. In this example we intentionally choose `mpich-ofi` with prior knowledge of the target execution environment. On Derecho the primary MPI implementation is `cray-mpich` (itself forked from `mpich`), which uses an HPE-proprietary `libfabric` interface to the Slingshot v11 high-speed communication fabric.
- Notice that each `RUN` step is finalized with a `docker-clean` command. This utility script removes temporary files and cached data to minimize the size of the resulting image layers. One consequence is that the first `dnf` package manager interaction in a `RUN` statement will re-cache these data. Since cached data are not relevant in the final image - especially when run much later on - we recommend removing it to reduce image bloat.
- In this example we are intentionally switching between `root` (the default user in the build process) and our unprivileged `plainuser` account. Particularly in development workflows, we want to be sure compilation and installation steps work properly as an unprivileged user, and tools such as the `lmod` module system and `mpiexec` often are intended not to be used as `root`.
- Since MPI container runtime integration can be a pain point at execution, we install OSU's and Intel's MPI benchmark suites to aid in deployment testing, independent of any user application.
Building the image
Adding CUDA & CUDA-aware MPICH¶
Adding CUDA + CUDA-aware MPICH
Next we add CUDA and a CUDA-aware MPI installation. We choose a specific version of the open-source MPICH library (both to closely match what is provided by OpenHPC and for Derecho compatibility) and configure it to use the pre-existing OpenHPC artifacts (`hwloc`, `libfabric`) as dependencies. For both `cuda` and the new `mpich` we also install "modulefiles" so the new additions are available in the typical module environment. Finally, we re-install one of the MPI benchmark applications, this time with CUDA support.
Dockerfile Steps
- We switch back to the `root` user so we can modify the operating system installation within the image.
- The first `RUN` instruction installs a full CUDA development environment and some additional development packages required to build MPI later.
- The next `RUN` instructions install modulefiles into the image so we can access the CUDA and (upcoming) MPICH installations, and clean up file permissions. The remaining steps are executed again as our unprivileged `plainuser`.
- The fourth `RUN` instruction downloads, configures, and installs MPICH. The version is chosen to closely match the baseline MPICH already installed in the image and uses some of its dependencies, and we also enable CUDA support (a configuration sketch follows this list).
- In the final `RUN` instruction we re-install one of the MPI benchmark applications, this time with CUDA support.
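A hedged sketch of the kind of MPICH configuration this step performs (the version, installation prefix, and dependency paths are illustrative; consult the actual Dockerfile for the precise options used):

```
# build a CUDA-aware MPICH closely matching the OpenHPC/Derecho MPICH family
wget https://www.mpich.org/static/downloads/3.4.3/mpich-3.4.3.tar.gz
tar xf mpich-3.4.3.tar.gz && cd mpich-3.4.3

./configure --prefix=/opt/local/mpich-3.4.3 \
            --with-device=ch4:ofi \
            --with-libfabric=/opt/ohpc/pub/mpi/libfabric/1.13.0 \
            --with-hwloc=/opt/ohpc/pub/libs/hwloc \
            --with-cuda=/usr/local/cuda
make -j 8 install
```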
Discussion
- There are several ways to install CUDA; here we choose a "local repo" installation because it allows us to control versions, but we are careful also to remove the downloaded packages after installation, freeing up 3GB+ in the image.
- The CUDA development environment is very large and it is difficult to separate unnecessary components, so this step increases the size of the image from ~1.2GB to 8.8GB. We leave all components in the development image, including tools we will very likely not need inside a container such as `nsight-systems` and `nsight-compute`. For applications built on top of this image, a user could optionally remove these components later to decrease their final image size (demonstrated next, and sketched below).
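For example, a derived image might drop the profiling tools with something like the following (package names vary between CUDA releases, so treat this purely as an illustration):

```
# remove the Nsight profiling tools, which are not needed inside the container
dnf -y remove "nsight-compute-*" "nsight-systems-*"
docker-clean
```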
Building the image
Building FastEddy¶
Adding FastEddy
- Again we switch back to `root` to perform operating-system-level tasks, as our base image left us as `plainuser`.
- The first `RUN` instruction installs the development package for NetCDF - an additional application dependency not already satisfied. We also remove some particularly large CUDA components from the development image not required in the final application image.
- Then, again as `plainuser`, the next `RUN` instruction downloads the FastEddy open-source variant. We make some changes to the definition of a few hard-coded `make` variables so that we can specify installation paths during linking later.
- The final `RUN` instruction then builds FastEddy. We build up and use custom `INCLUDE` and `LIBS` variables, specifying some unique paths for the particular build environment.
Discussion
- When building the image locally with Docker, the space savings from step (2) are not immediately apparent. This is a result of the Docker "layer" approach: the content still exists in the base layer and is only "logically" removed by the commands listed above. The space savings are realized on the HPC system when we "pull" the image with `singularity`.
- If an even smaller container image is desired, even more components could be stripped: CUDA numerical libraries the application does not need, or even the containerized MPIs after we are done with them. As we will see next, we replace the container MPI with the host MPI at run-time, so technically no MPI is required inside the container when we are done using it for compilation.
Building the image
Pushing the image to Docker Hub
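The general tag/push pattern looks like the following, with `<dockerhub_username>` substituted appropriately (the local image tag is assumed to match the name used throughout this example):

```
# on the build host: tag the local image and push it to Docker Hub
docker tag rocky8-openhpc-fasteddy:latest <dockerhub_username>/rocky8-openhpc-fasteddy:latest
docker login
docker push <dockerhub_username>/rocky8-openhpc-fasteddy:latest
```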
Running the container on Derecho¶
With the container built from the steps above (or simply pulling the resulting image from Docker Hub), we are now ready to run a sample test case on Derecho. We choose Example02_CBL.in
from the FastEddy Tutorial and modify it to run on 24 GPUs (full steps listed here). The PBS job script listed below shows the steps required to "bind" the host MPI into the container.
Containerized FastEddy PBS Script
#!/bin/bash
#PBS -q main
#PBS -j oe
#PBS -o fasteddy_job.log
#PBS -l walltime=02:00:00
#PBS -l select=6:ncpus=64:mpiprocs=4:ngpus=4
module load ncarenv/23.09
module load apptainer gcc cuda || exit 1
module list
nnodes=$(cat ${PBS_NODEFILE} | sort | uniq | wc -l)
nranks=$(cat ${PBS_NODEFILE} | sort | wc -l)
nranks_per_node=$((${nranks} / ${nnodes}))
container_image="./rocky8-openhpc-fasteddy.sif"
singularity \
--quiet \
exec \
${container_image} \
ldd /opt/local/FastEddy-model/SRC/FEMAIN/FastEddy
singularity \
--quiet \
exec \
--bind ${SCRATCH} \
--bind ${WORK} \
--pwd $(pwd) \
--bind /run \
--bind /opt/cray \
--bind /usr/lib64:/host/lib64 \
--env LD_LIBRARY_PATH=${CRAY_MPICH_DIR}/lib-abi-mpich:/opt/cray/pe/lib64:${LD_LIBRARY_PATH}:/host/lib64 \
--env LD_PRELOAD=/opt/cray/pe/mpich/${CRAY_MPICH_VERSION}/gtl/lib/libmpi_gtl_cuda.so.0 \
${container_image} \
ldd /opt/local/FastEddy-model/SRC/FEMAIN/FastEddy
echo "# --> BEGIN execution"; tstart=$(date +%s)
mpiexec \
--np ${nranks} --ppn ${nranks_per_node} --no-transfer \
set_gpu_rank \
singularity \
--quiet \
exec \
--bind ${SCRATCH} \
--bind ${WORK} \
--pwd $(pwd) \
--bind /run \
--bind /opt/cray \
--bind /usr/lib64:/host/lib64 \
--env LD_LIBRARY_PATH=${CRAY_MPICH_DIR}/lib-abi-mpich:/opt/cray/pe/lib64:${LD_LIBRARY_PATH}:/host/lib64 \
--env LD_PRELOAD=/opt/cray/pe/mpich/${CRAY_MPICH_VERSION}/gtl/lib/libmpi_gtl_cuda.so.0 \
--env MPICH_GPU_SUPPORT_ENABLED=1 \
--env MPICH_GPU_MANAGED_MEMORY_SUPPORT_ENABLED=1 \
--env MPICH_SMP_SINGLE_COPY_MODE=NONE \
${container_image} \
/opt/local/FastEddy-model/SRC/FEMAIN/FastEddy \
./Example02_CBL.in
echo "# --> END execution"
echo $(($(date +%s)-${tstart})) " elapsed seconds; $(date)"
Discussion
The mpiexec
command is fairly standard. Note that we are using it to launch singularity
, which in turn will start up the containerized FastEddy
executable.
The singularity exec
command line is complex, so let's deconstruct it here:
- We make use of the `--bind` argument first to mount familiar GLADE file systems within the container,
- and again to "inject" the host MPI into the container (as described here). The `/run` directory necessity is not immediately obvious, but it is used by Cray-MPICH as part of the launching process.
- We also need to use `--env` to set the `LD_LIBRARY_PATH` inside the image so that the application can find the proper host libraries. Recall when we built the FastEddy executable in the containerized environment it had no knowledge of these host-specific paths. Similarly, we use `--env` to set the `LD_PRELOAD` environment variable inside the container. This will cause a particular Cray-MPICH library to be loaded prior to application initialization. This step is not required for "bare metal" execution.
- We set some important Cray-MPICH-specific `MPICH_*` environment variables as well to enable CUDA-awareness (`MPICH_GPU_*`) and work around an MPI run-time error (`MPICH_SMP_SINGLE_COPY_MODE`) that would otherwise appear.
- Finally, a note on the `--bind /usr/lib64:/host/lib64` argument. Injecting the host MPI requires that some shared libraries from the host's `/usr/lib64` directory be visible inside the image. However, this path also exists inside the image and contains other libraries needed by the application. We cannot simply bind the host's directory into the same path, as doing so would mask these other libraries. So we bind the host's `/usr/lib64` into the container image at `/host/lib64`, and make sure this path is set in the `LD_LIBRARY_PATH` variable as well. Because we want these particular host libraries found as a last resort (not taking precedence over similar libraries in the container), we append `/host/lib64` to the `LD_LIBRARY_PATH` search path.
The arguments above were determined iteratively through trial and error. Such is the reality of containerized MPI applications and proprietary host MPI integration. Feel free to experiment with the PBS file, omitting some of the `--bind` and `--env` arguments and observing the resulting error messages; however, do NOT modify the `MPICH_GPU_*` variables - doing so may trigger a very unfortunate kernel driver bug and render the GPU compute nodes unusable.
Pulling the image
We begin with pulling the image from Docke Hub and constructing a SIF. (If you want to test your own built/pushed image, replace benjaminkirk
with your own <dockerhub_username>
as specified in the tag/push operations listed above.)
derecho$ singularity pull rocky8-openhpc-fasteddy.sif docker://benjaminkirk/rocky8-openhpc-fasteddy:latest
[...]
derecho$ ls -lh rocky8-openhpc-fasteddy.sif
-rwxr-xr-x 1 someuser ncar 3.1G Dec 5 17:08 rocky8-openhpc-fasteddy.sif
Running the job
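Assuming the PBS script above is saved as `fasteddy_container.pbs` (a hypothetical file name) and the modified `Example02_CBL.in` is present in the working directory, submission and monitoring are then simply:

```
derecho$ qsub fasteddy_container.pbs
derecho$ tail -f fasteddy_job.log
```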
"Faking" a native installation of containerized applications¶
Occasionally it can be beneficial to "hide" the fact that a particular application is containerized, typically to simplify the user interface and usage experience. In this section we follow a clever approach deployed by the NIH Biowulf team and outlined here to enable users to interact transparently with containerized applications without needing to know any details of the run-time (singularity
, ch-run
, etc...).
The basic idea is to create a wrapper.sh
shell script that
- Infers the name of the containerized command to run,
- Invokes the chosen run-time transparently to the user, and
- Passes along any command-line arguments to the containerized application.
Consider the following directory tree structure, taken from a production deployment:
/glade/u/apps/opt/leap-container/15/
├── bin/
│ ├── eog -> ../libexec/wrap_singularity.sh
│ ├── evince -> ../libexec/wrap_singularity.sh
│ ├── gedit -> ../libexec/wrap_singularity.sh
│ ├── geeqie -> ../libexec/wrap_singularity.sh
│ ├── gimp -> ../libexec/wrap_singularity.sh
│ ├── gv -> ../libexec/wrap_singularity.sh
│ ├── smplayer -> ../libexec/wrap_singularity.sh
│ ├── vlc -> ../libexec/wrap_singularity.sh
│ └── xfig -> ../libexec/wrap_singularity.sh
└── libexec/
├── Makefile
├── ncar-casper-gui_tools.sif
└── wrap_singularity.sh
At the top level, we simply have two directories: ./bin/
(which likely will go into the user's PATH
) and ./libexec/
(where we will hide implementation details).
Constructing the `bin` directory

The `./bin/` directory contains symbolic links to the `wrap_singularity.sh` script, where the name of the symbolic link is the containerized application to run. In the example above, when a user runs `./bin/gv`, it will invoke `wrap_singularity.sh` "behind the scenes." In general there can be many application symbolic links in the `./bin/` directory, so long as the desired application exists within the wrapped container image.
The `wrap_singularity.sh` wrapper script

The `wrap_singularity.sh` script is written such that whatever symbolic links you create to it will run inside of the container, inferring the application name from that of the symbolic link.
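A minimal sketch of such a wrapper script is shown below (the line numbers cited in the list that follows refer to the production version of the script, which differs in detail from this sketch):

```
#!/bin/bash

# the requested command is simply the name this script was invoked as,
# i.e. the name of the symbolic link in ../bin
topdir="$(pwd)"
requested_command="$(basename ${0})"
selfdir="$(cd $(dirname ${0}) && pwd)"
container_image="${selfdir}/../libexec/ncar-casper-gui_tools.sif"

# make sure the module command is available, then load apptainer
# (the module init file location may differ on other systems)
type module >/dev/null 2>&1 || source /etc/profile.d/modules.sh
module load apptainer

# run the requested command inside the container, binding the usual GLADE
# file systems and passing any command-line arguments through unchanged
singularity exec \
    --bind /glade/u \
    --bind /glade/work \
    --bind /glade/campaign \
    --bind /glade/derecho/scratch \
    --pwd "${topdir}" \
    "${container_image}" \
    "${requested_command}" "${@}"
```

Creating an additional wrapped application is then just a matter of adding another symbolic link, e.g. `ln -s ../libexec/wrap_singularity.sh bin/gv` from the top-level directory.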
Specifically:
- The command to execute is inferred from the shell argument `${0}` - the name of the script being executed. Here is where the symbolic links from `./bin` are important: if the symbolic link `./bin/gv` is invoked, for example, the script above will execute with the name `gv`. This is accessible within the script as `${0}`, and is stored in the `requested_command` variable on line 7.
- Any command-line arguments passed to the executable are captured in the `${@}` shell parameter, and are passed directly through as command-line arguments to the containerized application (line 27).
- We bind-mount the usual GLADE file systems so that expected data are accessible (lines 22-25).
- In this example we execute all commands in the same base container `ncar-casper-gui_tools.sif` (specified on line 8). This is the simplest approach, however it is strictly not required. (A more complex treatment could "choose" different base containers for different commands using a `bash` `case` statement, for example, if desired.)
- The container is launched with the user's directory `topdir` as the working directory. This is required so that any relative paths specified are handled properly.
- In order to robustly access the required `apptainer` module, we first check to see if the `module` command is recognized and if not initialize the module environment (line 11), then load the `apptainer` module (line 12). This allows the script to function properly even when the user does not have the module system initialized in their environment - a rare but occasional issue.
While the example above wraps the Apptainer run-time, a similar approach works for Charliecloud and Podman as well if desired.