Casper cluster

The Casper cluster is a system of specialized data analysis and visualization resources; large-memory, multi-GPU nodes; and high-throughput computing nodes.

Casper is composed of over 100 nodes featuring a mixture of Intel and AMD processors, with a variety of NVIDIA general-purpose graphics processing units (GPUs).

Please refer to the hardware summary table below for detailed specifications.


Quick Start

Logging in

Once you have an account, have reviewed the Casper Use Policies, and have a Casper resource allocation, you can log in and run jobs on the Casper data analysis and visualization cluster.

To log in, start your terminal or Secure Shell client and run an ssh command as shown here:

ssh -X username@casper.hpc.ucar.edu

Some users (particularly on Macs) need to use -Y instead of -X when calling SSH to enable X11 forwarding.

You can omit username in the command above if your Casper username is the same as your username on your local computer.

After running the ssh command, you will be asked to authenticate to finish logging in.

Casper has full access to NSF NCAR storage resources, including GLADE. Users can transfer data to and from Casper.
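For example, a small file can be copied from a local machine with scp; the GLADE destination path shown is hypothetical, so substitute your own directory:

scp results.nc username@casper.hpc.ucar.edu:/glade/work/username/analysis/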

To run data analysis and visualization jobs on the Casper system's nodes, follow the procedures described here. There is no need to transfer output files from Derecho for this since Derecho and Casper mount the same GLADE file systems.

Don’t run sudo on NSF NCAR systems!

If you need help with tasks that you think require sudo privileges, or if you aren’t sure, please contact HPC User Support before trying to run sudo yourself. When unauthorized users run the command, it fails and sends a security alert to system administrators.


Environment

The Casper HPC system uses OpenSUSE Linux Version 15 and supports widely used shells on its login and compute nodes. Users also have several compiler and MPI library choices.

Shells

The default login shell for new Casper users is bash. You can change the default by logging in to the Systems Accounting Manager (SAM); it may take several hours for the change to take effect. You can confirm which shell is set as your default by entering echo $SHELL on your Casper command line.

Environment modules

The Casper module utility enables users to easily load and unload compilers and compatible software packages as needed, and to create multiple customized environments for various tasks. See the Environment modules page for a general discussion of module usage. Casper's default module environment is listed here.
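For example, a typical session might look like the following sketch (the package and compiler module names are illustrative; run module avail to see what is actually installed):

module list              # show the modules loaded in the current environment
module avail             # list modules compatible with the loaded compiler/MPI stack
module load netcdf       # illustrative: load a library compatible with the current compiler
module swap intel gcc    # illustrative: switch from the Intel to the GNU compiler environment
module save myenv        # save the current set of modules as a named collection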


Accessing software and compiling code

Casper users have access to Intel, NVIDIA, and GNU compilers. The Intel compiler and OpenMPI modules are loaded by default and provide access to pre-compiled HPC Software and Data Analysis and Visualization Resources.

See this page for a full discussion of compiling on Casper.
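As a brief sketch under the default Intel and Open MPI environment (source and output file names are placeholders; see the compiling page for recommended flags):

mpicc -o hello_mpi hello_mpi.c      # compile a C MPI program with the Open MPI wrapper
mpif90 -o hello_mpi hello_mpi.f90   # compile a Fortran MPI program with the Open MPI wrapper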

Many Casper data analysis and AI/ML workflows benefit instead from using Conda, especially NSF NCAR's Python Library (NPL), or from the several Machine Learning Frameworks provided as Conda environments.
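A minimal sketch of activating NPL, assuming the conda module and environment names commonly used on NSF NCAR systems:

module load conda        # make the conda command available
conda env list           # list the provided environments, including NPL and ML frameworks
conda activate npl       # activate the NCAR Python Library environment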


Running jobs on Casper

Users can run a variety of jobs on Casper, including traditional batch jobs submitted through PBS as well as interactive and graphics-intensive analyses, often through remote desktops on Casper.
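For example, an interactive session can be requested through PBS as shown below; the project code is a placeholder, and the resource request should be adjusted to your workload:

qsub -I -q casper -A PROJECT0001 -l select=1:ncpus=4:mem=20GB -l walltime=02:00:00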

Job scripts

Job scripts are discussed broadly here. Users already familiar with PBS and batch submission may find Casper-specific PBS job scripts helpful in porting their work.
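A minimal sketch of a Casper batch script, assuming the casper queue and using a placeholder project code and analysis script:

#!/bin/bash
#PBS -N analysis_job
#PBS -A PROJECT0001                 # placeholder project code
#PBS -q casper                      # Casper execution queue
#PBS -l select=1:ncpus=1:mem=10GB   # one core and 10 GB of memory
#PBS -l walltime=01:00:00
#PBS -j oe                          # merge standard output and error

module load conda
conda activate npl
python analyze.py                   # hypothetical analysis script

Submit the script with qsub and check its status with qstat -u $USER.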


Casper hardware

Data Analysis & Visualization nodes

22 Supermicro 7049GP-TRT SuperWorkstation nodes
Up to 384 GB DDR4-2666 memory per node
2 18-core 2.3-GHz Intel Xeon Gold 6140 (Skylake) processors per node
2 TB local NVMe Solid State Disk
1 Mellanox ConnectX-4 100Gb Ethernet connection (GLADE, Campaign Storage, external connectivity)
1 Mellanox ConnectX-6 HDR100 InfiniBand link
1 NVIDIA Quadro GP100 GPU 16 GB PCIe on each of 9 nodes
1 NVIDIA Ampere A100 GPU 40 GB PCIe on each of 3 nodes

Machine Learning/Deep Learning & General Purpose GPU (GPGPU) nodes

4 Supermicro SuperServer nodes with 4 V100 GPUs
768 GB DDR4-2666 memory per node
2 18-core 2.6-GHz Intel Xeon Gold 6240 (Cascade Lake) processors per node
2 TB local NVMe Solid State Disk
1 Mellanox ConnectX-4 100Gb Ethernet connection (GLADE, Campaign Storage, external connectivity)
2 Mellanox ConnectX-6 HDR200 InfiniBand adapters, HDR100 link on each CPU socket
4 NVIDIA Tesla V100 32 GB SXM2 GPUs with NVLink

6 Supermicro SuperServer nodes with 8 V100 GPUs
1152 GB DDR4-2666 memory per node
2 18-core 2.6-GHz Intel Xeon Gold 6240 (Cascade Lake) processors per node
2 TB local NVMe Solid State Disk
1 Mellanox ConnectX-4 100Gb Ethernet connection (GLADE, Campaign Storage, external connectivity)
2 Mellanox ConnectX-6 HDR200 InfiniBand adapters, HDR100 link on each CPU socket
8 NVIDIA Tesla V100 32 GB SXM2 GPUs with NVLink

8 Supermicro nodes with 4 A100 GPUs
1024 GB memory per node
2 64-core 2.45-GHz AMD EPYC Milan 7763 processors per node
1.5 TB local NVMe Solid State Disk
4 Mellanox ConnectX-6 network adapters
4 NVIDIA Ampere A100 80 GB SXM4 GPUs with NVLink

High-Throughput Computing nodes

62 small-memory workstation nodes
384 GB DDR4-2666 memory per node
2 18-core 2.6-GHz Intel Xeon Gold 6240 (Cascade Lake) processors per node
1.6 TB local NVMe Solid State Disk
1 Mellanox ConnectX-5 100Gb Ethernet VPI adapter (GLADE, Campaign Storage, external connectivity)
1 Mellanox ConnectX-6 HDR200 InfiniBand VPI adapter, HDR100 link on each CPU socket

2 large-memory workstation nodes
1.5 TB DDR4-2666 memory per node
2 18-core 2.3-GHz Intel Xeon Gold 6240 (Cascade Lake) processors per node
1.6 TB local NVMe Solid State Disk
1 Mellanox ConnectX-5 100Gb Ethernet VPI adapter (GLADE, Campaign Storage, external connectivity)
1 Mellanox ConnectX-6 HDR200 InfiniBand VPI adapter, HDR100 link on each CPU socket

Research Data Archive nodes (reserved for RDA use)

4 Supermicro Workstation nodes
94 GB DDR4-2666 memory per node
2 16-core 2.3-GHz Intel Xeon Gold 5218 (Cascade Lake) processors per node
1.92 TB local Solid State Disk
1 Mellanox ConnectX-6 VPI 100Gb Ethernet connection (GLADE, Campaign Storage, internal connectivity)

Status

Status information is available for Nodes, GPU Usage, and Queues.