Derecho¶
Installed in 2023, Derecho is NCAR's latest supercomputer. Derecho features 2,488 compute nodes with 128 AMD Milan cores per node and 82 nodes with four NVIDIA A100 GPUs each. The HPE Cray EX cluster is a 19.87-petaflops system that is expected to deliver about 3.5 times the scientific throughput of the Cheyenne system. Additional hardware details are available below.
Derecho Monthly Maintenance Windows
The first Tuesday of each month is generally be reserved for Derecho system maintenance, as required to incorporate updates to Derecho's configuration and software during the early phase of the system lifespan. Scheduled outages may occasionally be extended (or canceled) as necessary on a case by case basis.
NCAR/CISL works closely with the system vendor to determine the availability and criticality of software updates, and will re-evaluate the necessity of these monthly outages as the system and supporting software matures.
Estimating Derecho Allocation Needs
Derecho users can expect to see a 1.3x improvement over the Cheyenne system's performance on a core-for-core basis. Therefore, to estimate how many CPU core-hours will be needed for a project on Derecho, multiply the total for a Cheyenne project by 0.77.
When requesting an allocation for Derecho GPU nodes, please make your request in terms of GPU-hours (number of GPUs used x wallclock hours). We encourage researchers to estimate GPU-hour needs by making test/benchmark runs on Casper GPUs, but will accept estimates based on runs on comparable non-NCAR, GPU-based systems.
Quick Start¶
Logging in¶
Once you have an account, have reviewed the Derecho Use Policies, and have a Derecho resource allocation you can log in and run jobs on the Derecho data analysis and visualization cluster.
To log in, start your terminal or Secure Shell client and run an ssh command as shown here:
Some users (particularly on Macs) need to use -Y
instead of -X
when calling SSH to enable X11 forwarding.
You can omit username
in the command above if your Derecho username is the same as your username on your local computer.
After running the ssh
command, you will be asked to authenticate to finish logging in.
Derecho has full access to NCAR storage resources, including GLADE. Users can transfer data to and from Derecho.
To run data analysis and visualization jobs on the Derecho system's nodes, follow the procedures described here. There is no need to transfer output files from Derecho for this since Derecho and Casper mount the same GLADE
file systems.
Don’t run sudo
on NCAR systems!
If you need help with tasks that you think require sudo
privileges, or if you aren’t sure, please contact HPC User Support before trying to run sudo
yourself. The command fails when unauthorized users run it and sends a security alert to system administrators.
Environment¶
The Derecho HPC system uses a Cray variant of SUSE Enterprise Linux and supports widely used shells on its login and compute nodes. Users also have several compiler and MPI library choices.
Shells¶
The default login shell for new Derecho users is bash
. You can change the default after logging in to the Systems Accounting Manager (SAM). It may take several hours for a change you make to take effect. You can confirm which shell is set as your default by entering echo $SHELL
on your Derecho command line.
Environment modules¶
The Derecho module
utility enables users to easily load and unload compilers and compatible software packages as needed, and to create multiple customized environments for various tasks. See the Environment modules page for a general discussion of module
usage. Derecho's default module environment is listed here.
Accessing software and compiling code¶
Derecho users have access to Intel, NVIDIA, and GNU compilers. The Intel compiler and OpenMPI modules are loaded by default and provide access to pre-compiled HPC Software and Data Analysis and Visualization Resources.
See this page for a full discussion of compiling on Derecho.
Many Derecho data analysis and AI/ML workflows benefit instead from using Conda, especially NCAR's Python Library (NPL) or to gain access to several Machine Learning Frameworks.
Running jobs on Derecho¶
Users can run a variety of types of jobs on Derecho, including both traditional batch jobs submitted through PBS and also interactive and/or graphics-intensive analysis, often through remote desktops on Derecho.
Job scripts¶
Job scripts are discussed broadly here. Users already familiar with PBS and batch submission may find Derecho-specific PBS job scripts helpful in porting their work.
Derecho Hardware¶
323,712 processor cores | 3rd Gen AMD EPYC™ 7763 Milan processors |
---|---|
2,488 CPU-only computation nodes | Dual-socket nodes, 64 cores per socket 256 GB DDR4 memory per node |
82 GPU nodes | Single-socket nodes, 64 cores per socket 512 GB DDR4 memory per node 4 NVIDIA 1.41 GHz A100 Tensor Core GPUs per node 600 GB/s NVIDIA NVLink GPU interconnect |
328 total A100 GPUs | 40GB HBM2 memory per GPU 600 GB/s NVIDIA NVLink GPU interconnect |
6 CPU login nodes | Dual-socket nodes with AMD EPYC™ 7763 Milan CPUs 64 cores per socket 512 GB DDR4-3200 memory |
2 GPU development and testing nodes | Dual-socket nodes with AMD EPYC™ 7543 Milan CPUs 32 cores per socket 2 NVIDIA 1.41 GHz A100 Tensor Core GPUs per node 512 GB DDR4-3200 memory |
692 TB total system memory | 637 TB DDR4 memory on 2,488 CPU nodes 42 TB DDR4 memory on 82 GPU nodes 13 TB HBM2 memory on 82 GPU nodes |
HPE Slingshot v11 high-speed interconnect | Dragonfly topology, 200 Gb/sec per port per direction 1.7-2.6 usec MPI latency CPU-only nodes - one Slingshot injection port GPU nodes - 4 Slingshot injection ports per node |
~3.5 times Cheyenne computational capacity | Comparison based on the relative performance of CISL's High Performance Computing Benchmarks run on each system. |
> 3.5 times Cheyenne peak performance | 19.87 peak petaflops (vs 5.34) |