
ARL DSRC Introductory Guide

1. Introduction

This document provides a system overview and an introduction to using the HPCMP unclassified and classified HPC systems at the ARL DSRC. One unclassified system, Pershing, is available for user access, along with two classified systems, MRAP and Hercules. Unclassified and classified Utility Servers are also available.

2. Accessing ARL DSRC Systems

The ARL DSRC unclassified systems are accessible through the DREN to all active customers via standard Kerberos commands. Customers may access any of the interactive login nodes on Pershing and the Utility Server using Kerberized versions of rlogin and ssh. The login nodes are available for editing and submitting jobs and reviewing completed job output.
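
For example, a typical session for obtaining a Kerberos ticket and logging in might look like the following (the username and Kerberos realm shown are placeholders; use the values provided with your account):

kinit username@HPCMP.HPC.MIL          # obtain a Kerberos ticket (enter your passcode when prompted)
ssh username@pershing.arl.hpc.mil     # log into a Pershing login node via Kerberized ssh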

File transfers between local and remote systems can be accomplished via the scp or mpscp commands.
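
For example (the file names and remote directory are illustrative; mpscp accepts the same command-line syntax as scp):

scp input.dat username@pershing.arl.hpc.mil:/usr/var/tmp/username/            # push an input file to the scratch area
mpscp username@pershing.arl.hpc.mil:/usr/var/tmp/username/results.tar .       # pull results back to the local system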

Kerberos binaries can be downloaded from HPC Centers: Kerberos & Authentication.

3. Obtaining an Account

The process of getting an account on the HPC systems at any of the DSRCs begins with getting an account on the HPCMP Portal to the Information Environment, commonly called a "pIE User Account". If you do not yet have a pIE User Account, please visit HPC Centers: Obtaining An Account and follow the instructions there. Once you have an active pIE User Account, visit the ARL accounts page for instructions on how to request accounts on the ARL DSRC HPC systems. If you need assistance with any part of this process, please contact CCAC at accounts@ccac.hpc.mil.

4. System Overviews

4.1. Unclassified Systems

Pershing

pershing.arl.hpc.mil
IBM iDataPlex - 420 TFLOPS
                       | Login Nodes               | Compute Nodes (Standard Memory)             | Compute Nodes (Large Memory)
Total Nodes            | 8                         | 1092                                        | 168
Operating System       | RedHat Linux              | RedHat Linux                                | RedHat Linux
Cores/Node             | 16                        | 16                                          | 16
Core Type              | Intel 8-core Sandy Bridge | Intel 8-core Sandy Bridge                   | Intel 8-core Sandy Bridge
Core Speed             | 2.6 GHz                   | 2.6 GHz                                     | 2.6 GHz
Memory/Node            | 32 GBytes                 | 32 GBytes                                   | 64 GBytes
Accessible Memory/Node | 2 GBytes                  | 29 GBytes                                   | 56 GBytes
Memory Model           | Shared on node.           | Shared on node. Distributed across cluster. | Shared on node. Distributed across cluster.
Interconnect Type      | 10 Gigabit Ethernet       | FDR-10 Infiniband                           | FDR-10 Infiniband

Utility Server

us.arl.hpc.mil
Appro Xtreme-X Series - 27 TFLOPS
                       | Login Nodes                       | Compute Nodes
Total Nodes            | 2                                 | 44
Operating System       | RHEL 6.4                          | RHEL 6.4
Cores/Node             | 16                                | 16
Core Type              | AMD Opteron 6134 Magny-Cours (x2) | AMD Opteron 6134 Magny-Cours (x2)
Core Speed             | 2.3 GHz                           | 2.3 GHz
GPU Type               | N/A                               | N/A
Memory/Node            | 64 GBytes                         | 128 GBytes
Accessible Memory/Node | 62 GBytes                         | 126 GBytes
Memory Model           | Shared on node.                   | Shared on node. Distributed across cluster.
Interconnect Type      | QDR Infiniband                    | QDR Infiniband

                       | Graphics Nodes                    | Large Memory Nodes
Total Nodes            | 22                                | 22
Operating System       | RHEL 6.4                          | RHEL 6.4
Cores/Node             | 16                                | 32
Core Type              | AMD Opteron 6134 Magny-Cours (x2) | AMD Opteron 6134 Magny-Cours (x4)
Core Speed             | 2.3 GHz                           | 2.3 GHz
GPU Type               | NVIDIA Tesla M2050                | N/A
Memory/Node            | 128 GBytes                        | 256 GBytes
Accessible Memory/Node | 126 GBytes                        | 250 GBytes
Memory Model           | Shared on node. Distributed across cluster. | Shared on node. Distributed across cluster.
Interconnect Type      | QDR Infiniband                    | QDR Infiniband

Utility Server 2

us2
Appro Xtreme-X Series
                       | Login Nodes                       | Compute Nodes
Total Nodes            | 2                                 | 44
Operating System       | RHEL 6.4                          | RHEL 6.4
Cores/Node             | 16                                | 16
Core Type              | AMD Opteron 6134 Magny-Cours (x2) | AMD Opteron 6134 Magny-Cours (x2)
Core Speed             | 2.3 GHz                           | 2.3 GHz
GPU Type               | N/A                               | N/A
Memory/Node            | 64 GBytes                         | 128 GBytes
Accessible Memory/Node | 62 GBytes                         | 123 GBytes
Memory Model           | Shared on node.                   | Shared on node. Distributed across cluster.
Interconnect Type      | QDR Infiniband                    | QDR Infiniband

                       | Graphics Nodes                    | Large Memory Nodes
Total Nodes            | 8                                 | 4
Operating System       | RHEL 6.4                          | RHEL 6.4
Cores/Node             | 16                                | 32
Core Type              | AMD Opteron 6134 Magny-Cours (x2) | AMD Opteron 6134 Magny-Cours (x4)
Core Speed             | 2.3 GHz                           | 2.3 GHz
GPU Type               | NVIDIA Tesla M2050                | N/A
Memory/Node            | 128 GBytes                        | 256 GBytes
Accessible Memory/Node | 123 GBytes                        | 250 GBytes
Memory Model           | Shared on node. Distributed across cluster. | Shared on node. Distributed across cluster.
Interconnect Type      | QDR Infiniband                    | QDR Infiniband

HTL

htutil.arl.hpc.mil
Appro Xtreme-X Series
                       | Login Nodes            | Compute Nodes
Total Nodes            | 1                      | 6
Operating System       | RHEL 6.4               | RHEL 6.4
Cores/Node             | 16                     | 16
Core Type              | AMD Opteron Interlagos | AMD Opteron Interlagos
Core Speed             | 2.3 GHz                | 2.3 GHz
GPU Type               | N/A                    | N/A
Memory/Node            | 64 GBytes              | 128 GBytes
Accessible Memory/Node | 62 GBytes              | 126 GBytes
Memory Model           | Shared on node.        | Shared on node. Distributed across cluster.
Interconnect Type      | QDR Infiniband         | QDR Infiniband

                       | Graphics Nodes         | Large Memory Nodes
Total Nodes            | 3                      | 3
Operating System       | RHEL 6.4               | RHEL 6.4
Cores/Node             | 16                     | 32
Core Type              | AMD Opteron Interlagos | AMD Opteron Interlagos
Core Speed             | 2.3 GHz                | 2.3 GHz
GPU Type               | NVIDIA Tesla M2050     | N/A
Memory/Node            | 128 GBytes             | 256 GBytes
Accessible Memory/Node | 126 GBytes             | 250 GBytes
Memory Model           | Shared on node. Distributed across cluster. | Shared on node. Distributed across cluster.
Interconnect Type      | QDR Infiniband         | QDR Infiniband

4.2. Classified Systems

MRAP

mrap
Cray XT5 - 162 TFLOPS
                       | Login Nodes                        | esLogin Nodes                       | Compute Nodes
Total Nodes            | 8                                  | 4                                   | 1300
Operating System       | SLES 10.3                          | SLES 10.3                           | CLE 2.2.02
Cores/Node             | 2                                  | 16                                  | 12
Core Type              | One 64-bit AMD Opteron (Barcelona) | Quad 64-bit AMD Opteron (Barcelona) | Six-core 64-bit AMD Opteron (Istanbul)
Core Speed             | 2.6 GHz                            | 2.4 GHz                             | 2.6 GHz
Memory/Node            | 4 GBytes                           | 128 GBytes                          | 32 GBytes
Accessible Memory/Node | 4 GBytes                           | 120 GBytes                          | 25 GBytes
Memory Model           | Shared on node.                    | Shared on node.                     | Shared on node. Distributed across cluster.
Interconnect Type      | 100/1000 Ethernet                  | 100/1000 Ethernet                   | SeaStar

Hercules

hercules
IBM iDataPlex - 360 TFLOPS
                       | Login Nodes               | Compute Nodes
Total Nodes            | 8                         | 1092
Operating System       | RedHat Enterprise Linux   | RedHat Enterprise Linux
Cores/Node             | 16                        | 16
Core Type              | Intel 8-core Sandy Bridge | Intel 8-core Sandy Bridge
Core Speed             | 2.6 GHz                   | 2.6 GHz
Memory/Node            | 64 GBytes                 | 64 GBytes
Accessible Memory/Node | 2 GBytes                  | 60 GBytes
Memory Model           | Shared on node.           | Shared on node. Distributed across cluster.
Interconnect Type      | 10 Gigabit Ethernet       | FDR-10 Infiniband

5. Login Files

When an account is created at the ARL DSRC, a default .cshrc and/or .profile file is placed in your home directory. These files contain the default setup for modules, PBS, and other system settings. We suggest placing your customizations (paths, aliases, or libraries you need to load) in a .cshrc.pers or .profile.pers file appropriate to your shell, and sourcing that file at the end of your .cshrc and/or .profile. For example:

if (-f $HOME/.cshrc.pers) then
source $HOME/.cshrc.pers
endif
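
An equivalent snippet for sh/ksh/bash users, placed at the end of your .profile, might be:

if [ -f $HOME/.profile.pers ]; then
    . $HOME/.profile.pers
fi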

If you need to connect to other Kerberized systems within the program, you should use krlogin or /usr/brl/bin/ssh. If you use Kerberized ssh often, you may want to add an alias to your .cshrc.pers or .profile.pers file in $HOME, as follows:

alias ssh /usr/brl/bin/ssh # .cshrc.pers - csh/tcsh
alias ssh=/usr/brl/bin/ssh # .profile.pers - sh/ksh/bash

6. File Systems

All users will be given a new home directory on the login and compute nodes, named /usr/people/username. When you login, you will automatically be placed in your local /usr/people home directory. In addition, all users are given accounts on the archive storage system, /home/service/username on the classified systems and /archive/service/username on the unclassified systems. Users are also given space on the center-wide file system, /p/cwfs/username.

While the login nodes of ARL DSRC systems have the same connectivity (over NFS) to the /home, /archive, and /p/cwfs file systems, the compute nodes do not. When your job script runs on a compute node, it will not be able to access your /home, /archive, or /p/cwfs directories. Therefore, you will need to pre-stage your input files from a login node to the scratch area, /usr/var/tmp ($WORKDIR), before submitting your jobs. Similarly, output files will need to be staged back to the archive system and center-wide file system. This may be done manually, or through the "transfer" PBS queue which runs serial jobs on a login node. It is recommended that all important files in your /usr/people home area be copied to /archive or /home as well.
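
As an illustration, a job submitted to the "transfer" queue might stage results from $WORKDIR back to the archive system as follows (the job name, Project ID, walltime, and file and directory names are placeholders; consult the system user guide for the exact PBS options required on each system):

#!/bin/csh
#PBS -N stage_out
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -l walltime=01:00:00
#PBS -j oe

# Copy results from the scratch area to the archive file system
cd $WORKDIR/my_case
cp -r results /archive/service/username/my_case/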

The scratch file system, /usr/var/tmp, should be used for active temporary data storage and batch processing. A system "scrubber" monitors utilization of the scratch space; files not accessed within fifteen days on the unclassified scratch file system (or thirty days on the classified scratch file system) are subject to removal, though they may remain longer if space permits. There are no exceptions to this policy. Customers who wish to keep files for long-term storage should copy them back into their /home or /archive directories to avoid data loss by the "scrubber." Customers are responsible for archiving files from the scratch file systems. These file systems are considered volatile working storage, and no automated backups are performed.

Please do not use /tmp or /var/tmp for temporary storage!

7. Software

For a complete list of the application, programming, system-tool, and scientific visualization software available at the ARL DSRC, see our Software List.

8. Batch Processing

Batch queuing systems are used to control access to the compute nodes of large-scale clusters, such as the systems deployed at the ARL DSRC. Without a queuing system, users could overload systems, resulting in tremendous performance degradation. It is the job of a queuing system to regulate processing on each system to maximize job throughput while not overloading the system. The queuing system will run your job as soon as it can while:

  • Meeting your resource requests
  • Not overloading systems
  • Running higher-priority jobs first

Batch jobs for all HPC systems at the ARL DSRC are submitted utilizing the PBS Professional queuing system. The PBS module should be automatically loaded at startup/login, allowing you access to the PBS commands.
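
As a minimal sketch, a simple parallel batch job on Pershing might look like the following (the queue name, Project ID, node count, and MPI launch command are placeholders; consult the system user guide or PBS guide for the exact directives and launch command on each system):

#!/bin/csh
#PBS -N my_job
#PBS -q standard
#PBS -A Project_ID
#PBS -l select=4:ncpus=16:mpiprocs=16
#PBS -l walltime=04:00:00
#PBS -j oe

# Run from the scratch area; input files should already be staged here
cd $WORKDIR/my_case
mpirun ./my_app > my_app.out

The script would then be submitted from a login node with "qsub my_job.pbs".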

For information on using PBS, please see the appropriate system user guide or system PBS guide listed at the end of this page.

9. Advance Reservation Service (ARS)

In addition to normal batch processing, users may request date/time-specific access to a dedicated number of nodes through the Advance Reservation Service (ARS). The ARS provides a web-based interface to the batch schedulers on computational resources across the HPCMP. It allows allocated users to reserve resources for later use, at specific times and for specific durations, and works in tandem with selected schedulers to restrict access to those reserved resources. Once you are logged in, an ARS User's Guide is available on the ARS home page to assist you in using the system.

10. Contacting the Consolidated Customer Assistance Center

Questions, comments, and suggestions are always welcome. If you have questions about this guide or any of the ARL DSRC's assets, please contact the DoD HPCMP's Consolidated Customer Assistance Center in any of the following ways: