IBM iDataPlex (Pershing)
User Guide

Table of Contents

1. Introduction

1.1. Document Scope and Assumptions

This document provides an overview and introduction to the use of the IBM iDataPlex (Pershing) located at the ARL DSRC, along with a description of the specific computing environment on Pershing. The intent of this guide is to provide information that will enable the average user to perform computational tasks on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:

  • Use of the UNIX operating system
  • Use of an editor (e.g., vi or emacs)
  • Remote usage of computer systems via network or modem access
  • A selected programming language and its related tools and libraries

1.2. Policies to Review

Users are expected to be aware of the following policies for working on Pershing.

1.2.1. Login Node Abuse Policy

Memory or CPU intensive programs running on the login nodes can significantly affect all users of the system. Therefore, only small applications requiring less than 10 minutes of runtime and less than 2 GBytes of memory are allowed on the login nodes. Any job running on the login nodes that exceeds these limits may be unilaterally terminated.

1.2.2. Workspace Purge Policy

The /usr/var/tmp directory is subject to a fifteen-day purge policy. A system "scrubber" monitors scratch space utilization, and if available space becomes low, files not accessed within fifteen days are subject to removal, although files may remain longer if space permits. There are no exceptions to this policy.
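
Since the scrubber keys on access time, the standard find command can list files that have not been accessed within fifteen days. The sketch below is illustrative only: it creates a temporary demonstration directory and a file with an artificially old access time, standing in for your actual scratch directory on Pershing.

```shell
# Illustrative sketch only: the demo directory and file stand in for your
# real scratch directory, which you would pass to find on Pershing.
DEMO_DIR=$(mktemp -d)
touch -a -d "20 days ago" "$DEMO_DIR/old_file.dat"   # simulate a stale file
find "$DEMO_DIR" -type f -atime +15                  # candidates for removal
```

Running the same find against your work directory shows which files you should archive before the scrubber reaches them.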

1.3. Obtaining an Account

The process of getting an account on the HPC systems at any of the DSRCs begins with getting an account on the HPCMP Portal to the Information Environment, commonly called a "pIE User Account". If you do not yet have a pIE User Account, please visit HPC Centers: Obtaining An Account and follow the instructions there. Once you have an active pIE User Account, visit the ARL accounts page for instructions on how to request accounts on the ARL DSRC HPC systems. If you need assistance with any part of this process, please contact CCAC (see Requesting Assistance below).

1.4. Requesting Assistance

The Consolidated Customer Assistance Center (CCAC) is available to help users with unclassified problems, issues, or questions. Analysts are on duty 8:00 a.m. - 11:00 p.m. Eastern, Monday - Friday (excluding Federal holidays).

For support services not provided by CCAC, you may contact the ARL DSRC directly. For more detailed contact information, please see our Contact Page.

2. System Configuration

2.1. System Summary

Pershing is an IBM iDataPlex. The login and compute nodes are each populated with two Intel Sandy Bridge 8-core processors. Pershing uses an FDR-10 InfiniBand interconnect in a Fat Tree configuration as its high-speed network for MPI messages and I/O traffic. Pershing uses IBM's General Parallel File System (GPFS) to manage its parallel file system, which targets IBM's IS4600 (Infinite Storage) RAID arrays. Pershing has 1,260 compute nodes that share memory only on the node; memory is not shared across nodes. Each compute node has two 8-core processors (16 cores) running its own Red Hat Enterprise Linux OS and sharing 32 GBytes of memory (64 GBytes on the large-memory nodes), with no user-accessible swap space. Pershing is rated at 420 peak TFLOPS and has 2.5 PBytes (formatted) of disk storage.

Pershing is intended to be used as a batch-scheduled HPC system. Its login nodes are not to be used for large computational (memory, IO, long executions) work. All executions that require large amounts of system resources must be sent to the compute nodes by batch job submission.
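
A minimal PBS batch script illustrating such a submission is sketched below. The script name, executable name, and Project_ID are placeholders, and the launcher command depends on which MPI module you load; the select statement requests one 16-core node in the standard queue.

```shell
#!/bin/csh
# Sketch of a minimal PBS batch script for Pershing.
# Placeholders: Project_ID, my_code.x; the mpirun launcher shown is an
# assumption and depends on the MPI suite you have loaded.
#PBS -N my_job
#PBS -q standard
#PBS -A Project_ID
#PBS -l walltime=01:00:00
#PBS -l select=1:ncpus=16:mpiprocs=16
#PBS -j oe
cd $WORKDIR
mpirun ./my_code.x
```

Submit the script from a login node with "qsub my_job.pbs"; the job then runs entirely on the compute nodes.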

Node Configuration
                         Login Nodes            Compute Nodes
                                                Standard Memory   Large Memory
Total Nodes              8                      1,092             168
Operating System         RedHat Linux           RedHat Linux      RedHat Linux
Cores/Node               16                     16                16
Core Type                Intel 8-core Sandy Bridge (all node types)
Core Speed               2.6 GHz (all node types)
Memory/Node              32 GBytes              32 GBytes         64 GBytes
Accessible Memory/Node   2 GBytes               29 GBytes         56 GBytes
Memory Model             Shared on node         Shared on node; distributed across cluster
Interconnect Type        10 Gigabit Ethernet    FDR-10 InfiniBand

File Systems on Pershing
Path           Capacity     Type
/usr/var/tmp   2.2 PBytes   GPFS
/usr/people    102 TBytes   GPFS
/usr/cta       15 TBytes    GPFS
/archive       160 TBytes   NFS

2.2. Processors

Pershing uses 2.6-GHz Intel Sandy Bridge processors on its login and compute nodes. There are 2 processors per node, each with 8 cores, for a total of 16 cores per node. In addition, these processors have 8x256 KBytes of L2 cache, and 12x6 MBytes of L3 cache.

2.3. Memory

Pershing uses both shared and distributed memory models. Memory is shared among all the cores on a node, but is not shared among the nodes across the cluster.

Each login node contains 32 GBytes of main memory. All memory and cores on the node are shared among all users who are logged in. Therefore, users should not use more than 2 GBytes of memory at any one time.

1,092 compute nodes contain 29 GBytes of user-accessible shared memory, and 168 compute nodes contain 56 GBytes of user-accessible shared memory.

2.4. Operating System

The operating system on Pershing is RedHat Linux. The operating system supports 64-bit software.

2.5. File Systems

Pershing has the following file systems available for user storage:

2.5.1. /usr/people

This file system is locally mounted from Pershing's GPFS file system. It has a formatted capacity of 102 TBytes. All users have a home directory located on this file system which can be referenced by the environment variable $HOME.

2.5.2. /usr/var/tmp and /usr/cta

These directories share Pershing's locally mounted GPFS file system, which has a formatted capacity of 2.2 PBytes. All users have a work directory located on /usr/var/tmp which can be referenced by the environment variable $WORKDIR. All center-managed COTS packages are stored in /usr/cta. In addition, users may request space under /usr/cta/unsupported to store user-managed software packages that they wish to make available to other owner-designated users. To have space allocated in /usr/cta/unsupported, submit a request to the ARL DSRC helpdesk, either through CCAC or by contacting the ARL DSRC directly. Requests are processed on a first-come, first-served basis.

2.5.3. /archive

This NFS-mounted file system is accessible from the login nodes on Pershing. Files in this file system are subject to migration to tape, and access may be slower due to the overhead of retrieving files from tape. The disk portion has a formatted capacity of 160 TBytes, is backed by a petascale archival tape storage system, and is automatically backed up. Users should migrate all large input and output files to this area for long-term storage. Users should also migrate all important smaller files from their home directory area in /usr/people to this area for long-term storage. All users have a directory located on this file system which can be referenced by the environment variable $ARCHIVE_HOME.

2.5.4. /tmp or /var/tmp

Never use /tmp or /var/tmp for temporary storage! These directories are not intended for temporary storage of user data, and abuse of these directories could adversely affect the entire system.

2.5.5. /p/cwfs

This path is directed to the Center-Wide File System (CWFS), which is meant for short-term storage (no longer than 30 days). All users have a directory defined in this file system, referenced by the environment variable $CENTER. The CWFS is accessible from the login and compute nodes of the Utility Server and from the login nodes of the HPC systems. It has a formatted capacity of 800 TBytes and is managed by Panasas PanFS.

2.6. Peak Performance

Pershing is rated at 420 peak TFLOPS.

3. Accessing the System

3.1. Kerberos

A Kerberos client kit must be installed on your desktop to enable you to get a Kerberos ticket. Kerberos is a network authentication tool that provides secure communication by using secret cryptographic keys. Only users with a valid HPCMP Kerberos authentication can gain access to Pershing. More information about installing Kerberos clients on your desktop can be found at HPC Centers: Kerberos & Authentication.

3.2. Logging In

  • Kerberized SSH
    % ssh username@pershing.arl.hpc.mil
  • Kerberized rlogin and telnet are also allowed.

3.3. File Transfers

File transfers to DSRC systems (except for those to the local archive server) must be performed using Kerberized versions of the following tools: scp, mpscp, sftp, ftp, and kftp. Before using any Kerberized tool, you must use a Kerberos client to obtain a Kerberos ticket. Information about installing and using a Kerberos client can be found at HPC Centers: Kerberos & Authentication.

The command below uses secure copy (scp) to copy a single local file into a destination directory on a Pershing login node. The mpscp command is similar to the scp command, but has a different underlying means of data transfer and may enable a greater transfer rate. The mpscp command has the same syntax as scp.

% scp local_file user@pershing#.arl.hpc.mil:/target_dir (# = 1 to 7)

Both scp and mpscp can be used to send multiple files. This command transfers all files with the .txt extension to the same destination directory. More information about mpscp can be found on the mpscp man page.

% scp *.txt user@pershing#.arl.hpc.mil:/target_dir (# = 1 to 7)

The example below uses the secure file transfer protocol (sftp) to connect to Pershing, then uses the sftp cd and put commands to change to the destination directory and copy a local file there. The sftp quit command ends the sftp session. Use the sftp help command to see a list of all sftp commands.

% sftp user@pershing#.arl.hpc.mil (# = 1 to 7)

sftp> cd target_dir
sftp> put local_file
sftp> quit

The Kerberized file transfer protocol (kftp) command differs from sftp in that your username is not specified on the command line, but given later when prompted. The kftp command may not be available in all environments.

% kftp pershing#.arl.hpc.mil (# = 1 to 7)

username> user
kftp> cd target_dir
kftp> put local_file
kftp> quit

Windows users may use a graphical file transfer protocol (ftp) client such as FileZilla.

4. User Environment

4.1. User Directories

4.1.1. Home Directory

When you log on to Pershing, you will be placed in your home directory, /usr/people/username. The environment variable $HOME is automatically set for you and refers to this directory. $HOME is visible to both the login and compute nodes and may be used to store small user files. However, it has limited capacity and is not backed up daily, so it should not be used for long-term storage.

4.1.2. Work Directory

The path for your working directory on Pershing's scratch file system is /usr/var/tmp/username. The environment variable $WORKDIR is automatically set for you and refers to this directory. $WORKDIR is visible to both the login and compute nodes, and should be used for temporary storage of active data related to your batch jobs.

Note: Although the $WORKDIR environment variable is automatically set for you, the directory itself is not created. You can create your $WORKDIR directory as follows:

mkdir $WORKDIR

The scratch file system provides 2.2 PBytes of formatted disk space. This space is not backed up, however, and is subject to a purge policy.

REMEMBER: This file system is considered volatile working space. You are responsible for archiving any data you wish to preserve. To prevent your data from being "scrubbed," you should copy files that you want to keep into your /archive directory (see below) for long-term storage.
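
For example, the sketch below copies a job's results from the scratch area to the archive area. It is self-contained for illustration: temporary directories stand in for $WORKDIR and $ARCHIVE_HOME, which are already set for you on Pershing, and the run and file names are placeholders.

```shell
# Self-contained sketch: temporary stand-in directories replace the real
# $WORKDIR and $ARCHIVE_HOME, which are preset on Pershing.
WORKDIR=$(mktemp -d)
ARCHIVE_HOME=$(mktemp -d)
mkdir -p "$WORKDIR/my_run"
echo "demo output" > "$WORKDIR/my_run/results.dat"   # stand-in for job output
# Copy results out of volatile scratch space for long-term storage:
mkdir -p "$ARCHIVE_HOME/my_run"
cp "$WORKDIR/my_run/results.dat" "$ARCHIVE_HOME/my_run/"
```

On Pershing you would run the mkdir and cp steps against the real $WORKDIR and $ARCHIVE_HOME from a login node.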

4.1.3. /archive Directory

In addition to $HOME and $WORKDIR, each user is also given a directory on the /archive file system. This file system is visible to the login nodes (not the compute nodes) and is the preferred location for long-term file storage. All users have an area defined in /archive for their use, accessible via the $ARCHIVE_HOME environment variable. We recommend that you keep large computational files and less frequently accessed files in the $ARCHIVE_HOME directory. We also recommend that any important files located in $HOME be copied into $ARCHIVE_HOME as well.

Because the compute nodes are unable to see $ARCHIVE_HOME, you will need to pre-stage your input files to your $WORKDIR from a login node before submitting jobs. After jobs complete, you will need to transfer output files from $WORKDIR to $ARCHIVE_HOME from a login node. This may be done manually or through the transfer queue, which executes serial jobs on login nodes.
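
A transfer-queue job for the post-job archiving step might look like the following sketch. The queue name "transfer", the run directory, and Project_ID are placeholders to adapt to your site and job.

```shell
#!/bin/csh
# Sketch of a transfer-queue job that archives results after a compute job
# completes. Placeholders: Project_ID, my_run; the queue name "transfer"
# is an assumption to verify against your site's queue list.
#PBS -N archive_results
#PBS -q transfer
#PBS -A Project_ID
#PBS -l walltime=02:00:00
#PBS -l select=1:ncpus=1
#PBS -j oe
mkdir -p $ARCHIVE_HOME/my_run
cp $WORKDIR/my_run/*.dat $ARCHIVE_HOME/my_run/
```

Because transfer-queue jobs execute on login nodes, they can see both $WORKDIR and $ARCHIVE_HOME.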

4.1.4. Center-Wide File System Directory

The path for your working directory on the Center-Wide file system is /p/cwfs/username. The environment variable $CENTER is automatically set to point to this directory. The main purpose of this area is as a staging area for production system output files that require post-processing using the Utility Server.

Because the compute nodes are unable to see /p/cwfs on Pershing, you will need to transfer output files from $WORKDIR to /p/cwfs from a login node. This may be done manually or through the transfer queue, which executes serial jobs on login nodes.

4.2. Shells

The following shells are available on Pershing: csh, bash, ksh, tcsh, zsh, and sh. To request a change of your default shell, contact the Consolidated Customer Assistance Center.

4.3. Environment Variables

A number of environment variables are provided by default on all HPCMP HPC systems. We encourage you to use these variables in your scripts where possible. Doing so will help to simplify your scripts and reduce portability issues if you ever need to run those scripts on other systems.

4.3.1. Login Environment Variables

The following environment variables are common to both the login and batch environments:

Common Environment Variables
Variable Description
$ARCHIVE_HOME Your directory on the archive server
$ARCHIVE_HOST The host name of the archive server
$BC_HOST The generic (not node specific) name of the system.
$CC The currently selected C compiler. This variable is automatically updated when a new compiler environment is loaded.
$CENTER Your directory on the Center-Wide File System (CWFS)
$COST_HOME This variable contains the path to the base directory of the default installation of the Common Open Source Tools (COST) installed on a particular compute platform. (See BC policy FY13-01 for COST details.)
$CSI_HOME The directory containing the following list of heavily used application packages: ABAQUS, Accelrys, ANSYS, CFD++, Cobalt, EnSight, Fluent, GASP, Gaussian, LS-DYNA, MATLAB, and TotalView, formerly known as the Consolidated Software Initiative (CSI) list. Other application software may also be installed here by our staff.
$CXX The currently selected C++ compiler. This variable is automatically updated when a new compiler environment is loaded.
$DAAC_HOME The directory containing the ezViz visualization software
$F77 The currently selected Fortran 77 compiler. This variable is automatically updated when a new compiler environment is loaded.
$F90 The currently selected Fortran 90 compiler. This variable is automatically updated when a new compiler environment is loaded.
$HOME Your home directory on the system
$JAVA_HOME The directory containing the default installation of JAVA
$KRB5_HOME The directory containing the Kerberos utilities
$PET_HOME The directory containing the tools formerly installed and maintained by the PETTT staff. This variable is deprecated and will be removed from the system in the future. Certain tools will be migrated to $COST_HOME, as appropriate.
$PROJECTS_HOME A common directory where group-owned and supported applications and codes may be maintained for use by members of a group. Any project may request a group directory under $PROJECTS_HOME.
$SAMPLES_HOME The Sample Code Repository. This is a collection of sample scripts and codes provided and maintained by our staff to help users learn to write their own scripts. There are a number of ready-to-use scripts for a variety of applications.
$WORKDIR Your work directory on the local temporary file system (i.e., local high-speed disk).

4.3.2. Batch-Only Environment Variables

In addition to the variables listed above, the following variables are automatically set only in your batch environment. That is, your batch scripts will be able to see them when they run. These variables are supplied for your convenience and are intended for use inside your batch scripts.

Batch-Only Environment Variables
Variable Description
$BC_CORES_PER_NODE The number of cores per node for the compute node on which a job is running.
$BC_MEM_PER_NODE The approximate maximum user-accessible memory per node (in integer MBytes) for the compute node on which a job is running.
$BC_MPI_TASKS_ALLOC The number of MPI tasks allocated for a job.
$BC_NODE_ALLOC The number of nodes allocated for a job.
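
Inside a batch script, these variables let you avoid hard-coding core and task counts. The fragment below is a sketch: the mpirun launcher and executable name are placeholders for whichever MPI suite and program you use.

```shell
# Batch-script fragment using the batch-only variables. The mpirun launcher
# and my_code.x are placeholders, not a specific Pershing command line.
echo "Running on $BC_NODE_ALLOC node(s), $BC_CORES_PER_NODE cores each"
mpirun -np $BC_MPI_TASKS_ALLOC ./my_code.x
```

A script written this way runs unchanged if you later request a different node count.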

4.4. Modules

Software modules are a very convenient way to set needed environment variables and include necessary directories in your path so commands for particular applications can be found. We strongly encourage you to use modules. For more information on using modules, see the Modules User Guide.

4.4.1. Large Memory Modules

Codes that run into out-of-memory (OOM) conditions while utilizing Intel-MPI should switch to the large Intel-MPI modules to handle this OOM case. There are two modules that can handle this: mpi/intelmpi/4.1.0.large and mpi/intelmpi/4.0.3.large.

4.5. Archive Usage

Archive storage is provided through the /archive NFS-mounted file system. All users are automatically provided a directory under this file system. However, it is only accessible from the login nodes. Since space in a user's login home area in /usr/people is limited, all large data files requiring permanent storage should be placed in /archive. Also, it is recommended that all important smaller files in /usr/people for which a user requires long-term access be copied to /archive as well. For more information on using the archive system, see the Archive System User Guide.

4.6. Login Files

When an account is created on Pershing, a default .cshrc and/or .profile file is placed in your home directory. These files contain the default setup for modules, PBS, and other system defaults. We suggest you place your customizations (paths, aliases, or libraries you may need to load) in a .cshrc.pers or .profile.pers file, as appropriate for your shell, and source that file at the end of your .cshrc and/or .profile. For example:

if ( -f $HOME/.cshrc.pers ) then
    source $HOME/.cshrc.pers
endif

If you need to connect to other Kerberized systems within the program, you should use krlogin or /usr/brl/bin/ssh. If you use Kerberized ssh often, you may want to add an alias in your .cshrc.pers, or .profile.pers files in $HOME, as follows:

alias ssh /usr/brl/bin/ssh # .cshrc.pers - csh/tcsh
alias ssh=/usr/brl/bin/ssh # .profile.pers - sh/ksh/bash

5. Program Development

5.1. Programming Models

Pershing supports two programming models: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). A hybrid (MPI/OpenMP) programming model is also supported. MPI is a message- or data-passing model. OpenMP uses shared memory on a node by spawning threads. The hybrid model combines the two.

5.1.1. Message Passing Interface (MPI)

Pershing has four MPI-2.0 standard library suites: IntelMPI, OpenMPI, MPICH2, and IBM PE. The modules for these MPI libraries are mpi/intelmpi/x.x.x, mpi/openmpi/x.x.x, mpi/mpich2/x.x.x, and mpi/ibmpe/x.x.x.x.

5.1.2. Open Multi-Processing (OpenMP)

OpenMP is available in Intel's Software Development suite for C, C++, and Fortran. Use the "-openmp" compiler flag.
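
At run time, the thread count is controlled by the OMP_NUM_THREADS environment variable. The fragment below is a sketch showing a compile-and-run sequence with one thread per core on a Pershing node; my_code.c is a placeholder.

```shell
# Compile with OpenMP support, then run with one thread per core (16 on
# Pershing). my_code.c is a placeholder source file.
icc -O3 -openmp my_code.c -o my_code.x
setenv OMP_NUM_THREADS 16   # csh/tcsh; use "export OMP_NUM_THREADS=16" in sh/ksh/bash
./my_code.x
```

Set OMP_NUM_THREADS inside your batch script so the setting travels with the job.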

5.1.3. Hybrid Processing (MPI/OpenMP)

In hybrid processing, all intranode parallelization is accomplished using OpenMP, while all internode parallelization is accomplished using MPI. Typically, there is one MPI task assigned per node, with the number of OpenMP threads assigned to each node set at the number of cores available on the node.
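
As a sketch, a hybrid run on two standard nodes would launch two MPI tasks (one per node), each spawning 16 OpenMP threads. The exact launcher options below are assumptions that vary by MPI suite; "-ppn 1" (processes per node) is Intel MPI/MPICH2 syntax.

```shell
# Hybrid MPI/OpenMP launch sketch: 2 MPI tasks, one per node, 16 threads each.
# The -ppn option is Intel MPI/MPICH2 syntax; other suites differ.
setenv OMP_NUM_THREADS 16
mpirun -np 2 -ppn 1 ./my_hybrid_code.x
```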

5.2. Available Compilers

Pershing has two compiler suites:

  • Intel
  • GNU

All versions of MPI share a common base set of compilers that are available on both the login and compute nodes:

Common Compiler Commands
Compiler     Intel   GNU        Serial/Parallel
C            icc     gcc        Serial/Parallel
C++          icc     g++        Serial/Parallel
Fortran 77   ifort   gfortran   Serial/Parallel
Fortran 90   ifort   gfortran   Serial/Parallel

The following additional compiler wrapper scripts are available under Intel MPI, OpenMPI, and MPICH2:

Intel MPI, OpenMPI, and MPICH2 Compiler Wrapper Scripts
Compiler   Intel    GNU      Serial/Parallel
MPI C      mpicc    mpicc    Parallel
MPI C++    mpicxx   mpicxx   Parallel
MPI F77    mpif77   mpif77   Parallel
MPI F90    mpif90   mpif90   Parallel

The following compiler wrapper scripts are available under IBM PE on both the login and compute nodes.

IBM PE Compiler Wrapper Scripts
Compiler   Intel    GNU      Serial/Parallel
MPI C      mpcc     mpcc     Parallel
MPI C++    mpCC     mpCC     Parallel
MPI F77    mpfort   mpfort   Parallel
MPI F90    mpfort   mpfort   Parallel

To select one of these compilers for use, load its associated module. See Relevant Modules (below) for more details.

5.2.1. Intel C, C++, and Fortran Compiler

Intel's latest compiler suite improves performance for large-memory and Fortran 90 applications over the previous version of this product. Intel's latest Fortran compiler, ifort, combines the code-generation and optimization power of the Intel compiler with the language features of the Compaq Visual Fortran front-end. The standard Intel Fortran compiler tools continue to be available as well. The latest Intel C++ compiler has full binary mix-and-match compatibility with gcc 3.2 and greater. The compiler also includes support for the GNU Standard Template Library (libstdc++) and allows precompiled headers for Linux compilation.

Several optimizations and tuning options are available for code developed with the Intel compilers on the Sandy Bridge processor. For more information, see Code Profiling and Optimization. The table below shows some compiler options that may help with optimization.

Useful Intel Compiler Options
Option   Description
-O0      disable optimization
-g       create symbols for tracing and debugging
-O1      optimize for speed, with no loop unrolling and no increase in code size
-O2      default optimization; optimize for speed with inline intrinsics and loop unrolling
-O3      level -O2 optimization plus memory optimization (allows compiler to alter code)
-ipo     interprocedural optimization: inline functions in separate files, partial inlining, dead code elimination, etc.

The following tables contain examples of serial, MPI, IBM PE, and OpenMP compile commands for C, C++ and Fortran.

Example C Compile Commands
Programming Model   Compile Command
Serial              icc -O3 my_code.c -o my_code.x
IntelMPI            mpicc -O3 my_code.c -o my_code.x
OpenMPI             mpicc -O3 my_code.c -o my_code.x
MPICH2              mpicc -O3 my_code.c -o my_code.x
IBM PE              mpcc -O3 my_code.c -o my_code.x
OpenMP              icc -O3 my_code.c -o my_code.x -openmp

Example C++ Compile Commands
Programming Model   Compile Command
Serial              icc -O3 my_code.C -o my_code.x
IntelMPI            mpicxx -O3 my_code.C -o my_code.x
OpenMPI             mpicxx -O3 my_code.C -o my_code.x
MPICH2              mpicxx -O3 my_code.C -o my_code.x
IBM PE              mpCC -O3 my_code.C -o my_code.x
OpenMP              icc -O3 my_code.C -o my_code.x -openmp

Example Fortran Compile Commands
Programming Model   Compile Command
Serial              ifort -O3 my_code.f90 -o my_code.x
IntelMPI            mpif90 -O3 my_code.f90 -o my_code.x
OpenMPI             mpif90 -O3 my_code.f90 -o my_code.x
MPICH2              mpif90 -O3 my_code.f90 -o my_code.x
IBM PE              mpfort -O3 my_code.f90 -o my_code.x
OpenMP              ifort -O3 my_code.f90 -o my_code.x -openmp

For more information on the Intel compilers, please consult Intel's Software Documentation Library.

5.2.2. GNU Compiler

The default GNU compilers are good for compiling utility programs, but are probably not appropriate for computationally intensive applications. The primary selling point of using GNU compilers is the compatibility between different architectures. The GNU compilers are available when the compiler/gcc/4.4 module is loaded. Once the module is loaded, they can be executed using the commands in the table above. For GNU compilers, the "-O" flag is the basic optimization setting.

More GNU compiler information can be found in the GNU gcc 4.4.2 manual.

5.2.3. Pershing Default Compiler Environment

By default, all users will have the default Intel compiler module loaded in their environment at login. Users who wish to have the gcc compiler or a non-default Intel compiler loaded by default upon login should add the following to their .cshrc.pers or .profile.pers files:

module unload compiler/intel/x.x.x
module load compiler/gcc/x.x

or

module load compiler/intel/x.x.x

where x.x.x is a version different from the default.

5.3. Relevant Modules

If you compile your own codes, you will need to select which compiler and MPI version you want to use. If you are using your environment's default compiler, then you need only load the desired MPI module. If you require a compiler module other than the one loaded by default, you must unload the default compiler module and load the new compiler module, followed by the MPI module before compiling. For example:

module load mpi/intelmpi/4.0.3

or, to use a non-default compiler:

module unload compiler/intel/12.1.0
module load compiler/gcc/4.4
module load mpi/intelmpi/4.0.3

These same module commands should be executed in your batch script before executing your program.

Pershing provides individual modules for each compiler and MPI version. To see the list of currently available modules use the "module avail" command. You can use any of the available MPI versions with each compiler by pairing them together when you load the modules.

The table below shows the naming convention used for various modules.

Module Naming Conventions
Module              Module Name
Intel Compilers     compiler/intel/##.#
GNU Compilers       compiler/gcc/#.#
IBM MPI Library     mpi/ibmpe/#.#.#.#
Intel MPI Library   mpi/intelmpi/#.#
OpenMPI Library     mpi/openmpi/#.#
MPICH2 Library      mpi/mpich2/#.#.#

For more information on using modules, see the Modules User Guide.

5.4. Libraries

5.4.1. BLAS

The Basic Linear Algebra Subprogram (BLAS) library is a set of high quality routines for performing basic vector and matrix operations. There are three levels of BLAS operations:

  • BLAS Level 1: vector-vector operations
  • BLAS Level 2: matrix-vector operations
  • BLAS Level 3: matrix-matrix operations

More information on the BLAS library can be found on the Netlib web site.

5.4.2. Intel Math Kernel Library (MKL)

The Intel Math Kernel Library (MKL) is a library of numerical processing functions that have been optimized for math, scientific and engineering applications. The MKL includes the following:

  • LAPACK plus BLAS (Levels 1, 2, 3)
  • Discrete Fourier Transforms (DFTs)
  • Vector Statistical Library functions (VSL)
  • Vector Transcendental Math functions (VML)

The MKL can be loaded into your path using the following command:

module load compiler/intel/12.1.0

Add "-L $MKLPATH -l library_name" to your compile command to use these libraries. The $MKLPATH environment variable is set when the module is loaded. More information on Intel's MKL can be found in Intel's Software Documentation Library.
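
For example, a link line against MKL's single dynamic library might look like the sketch below. The mkl_rt library name is an assumption; which MKL libraries are available depends on the MKL version installed with your compiler module, so check your installation before relying on it.

```shell
# Hedged example: link against MKL's single dynamic runtime library, mkl_rt.
# Library availability depends on the installed MKL version; my_code.f90 is
# a placeholder source file.
ifort -O3 my_code.f90 -L$MKLPATH -lmkl_rt -o my_code.x
```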

5.4.3. Additional Math Libraries

There is also an extensive set of Math libraries available in the $PET_HOME/MATH directory on Pershing. Information about these libraries may be found on the Baseline Configuration Web site at BC policy FY06-01.

5.5. Debuggers

5.5.1. gdb

The GNU Project Debugger (gdb) is a debugger that works similarly to dbx. It can be invoked on an executable, on an executable together with a core file, or attached to a running process by process id. To use gdb to examine a program and the core file it produced, use:

gdb a.out corefile

To debug a process that is currently executing on this node, use:

gdb a.out pid

For more information, the GDB manual can be found on the GNU Project web site.

5.5.2. idb

The Intel Debugger (idb) is a symbolic debugger that implements a stop-and-examine model to help locate run-time errors in code. It can also attach to running processes to perform kernel debugging, and it can manage several processes at once as well as multi-threaded applications. To use idb, the code to be debugged must be compiled and linked with the "-Od", "-Oy" and "-Zi" options. By default, idb begins in dbx mode, but it can be run in gdb mode by specifying the "-gdb" option. A graphical version of idb can be invoked using the "-gui" option. The Intel Debugger Manual can be found in Intel's Software Documentation Library.

Note: you must first load the Intel compiler module to access idb.

5.5.3. TotalView

TotalView is a debugger that supports threads, MPI, OpenMP, C/C++, Fortran, and mixed-language codes. It offers advanced features like on-demand memory leak detection, other heap allocation debugging features, and the Standard Template Library Viewer (STLView). Unique features like dive, a wide variety of breakpoints, the Message Queue Graph/Visualizer, powerful data analysis, and control at the thread level are also available.

Currently on Pershing, to display the source code, you must limit your debug job to 1 node (16 cores). Debug jobs using multiple nodes will display only assembler instructions.

Follow these steps to use TotalView on Pershing via a UNIX X-Windows interface:

  1. Open a console window to Pershing. "ssh pershing".
  2. Start a 16-core, interactive session with X-Windows forwarding enabled:

    qsub -V -I -X -N iPershing -q standard -l walltime=07:00:00 -l select=1:ncpus=16:mpiprocs=16 -l place=scatter:excl -A Project_ID -r n -j oe

    Remember to provide a valid project id in the line above. Wait for a compute node to be given to you. You'll see a new prompt similar to "pershingn1234". The last four numbers after the n will be different.

  3. To make sure you can open an X-window, type "xclock". A clock should be displayed in your window. It may take a second or two, so be patient. If you do not see a clock, verify that you used the "-X" qsub option as shown above.
  4. Load the modules that you used to compile your code and load the TotalView module.

    module load compiler/intel/12.1.0 mpi/intelmpi/4.0.3 totalview

  5. Now start TotalView: type "totalview" and wait a minute or so for the TotalView windows to pop up.
  6. Under the TotalView Window named "New Program" select "Browse" button and select your program executable.
  7. Click the "Parallel" tab and select the appropriate MPI suite from the drop down list.
  8. In the same tab, click the "up" arrow to set "Tasks" to 16. This will allow a 16-MPI-task job. Also, increase the "Nodes" setting from "0" to "1".
  9. Click OK. Your source code should pop up, allowing you to enter stop points, watch points, etc.

If you are using Cygwin, please log onto Pershing and cd to /usr/cta/SCR/MPI_Totalview to view the appropriate document for your system.

For more information on using TotalView, see the TotalView Documentation page.

5.5.4. DDT

DDT is an intuitive, scalable, graphical debugger capable of debugging a wide variety of scenarios found in today's development environments. With DDT, it is possible to debug:

  • Single-process and multithreaded software
  • OpenMP
  • Parallel (MPI) software
  • Heterogeneous software such as that written to use GPUs
  • Hybrid codes mixing paradigms such as MPI + OpenMP, or MPI + CUDA
  • Multi-process software of any form, including client-server applications.

The tool can do many tasks beyond the normal capabilities of a debugger. For example, the memory debugging feature can detect some errors before they cause a program crash by verifying usage of the system allocator functions, and the message queue integration with MPI can show the current state of communication between processes in the system. DDT supports all of the compiled languages found in mainstream and high-performance computing, including:

  • C, C++, and all derivatives of Fortran, including F90.
  • Parallel languages/models including MPI, UPC, and Fortran 2008 Coarrays.
  • GPU languages such as HMPP, OpenMP Accelerators, CUDA and CUDA Fortran.
  • Follow these steps to use DDT on Pershing via a UNIX X-Windows interface:

    1. Open a console window to Pershing. "ssh pershing".
    2. Start a 16-core, interactive session:

      qsub -X -V -I -N ipershing -q standard -l walltime=07:00:00 -l select=2:ncpus=16:mpiprocs=16 -l place=scatter:excl -A ARLAP96090ARL -r n -j oe

      and wait for a compute node to be given to you.

    3. Load the same modules that you used to compile your code and the ddt module:

      module load compiler/intel/12.1.0 mpi/intelmpi/4.0.3
      module load ddt

    4. Now start DDT: type "ddt" and wait a minute or so for the DDT windows to pop up.

      Follow this section ONLY if running DDT for the first time!

      --FIRST TIME---> If this is your FIRST TIME running DDT, a window will pop up titled "DDT Configuration Wizard"

      --FIRST TIME---> Select "Create a New Configuration File", then select the appropriate MPI suite from the pull down list. If your code is compiled with Intel-MPI, then select "intel-mpi".

      --FIRST TIME---> Click Next. You'll get an error about a mismatched machine name, followed by a prompt about "Do these host names refer to the same machine". Click "YES"

      --FIRST TIME---> On the next window, click Skip this step

      --FIRST TIME---> AGAIN. On the next window, click Skip this step

      --FIRST TIME---> Click Finish.

    5. Under the "DDT - Welcome" window, click the "Run and Debug a Program" button and then select your program. Your source code should pop up, allowing you to enter stop points, watchpoints, etc.

    5.6. Code Profiling and Optimization

    Profiling is the process of analyzing the execution flow and characteristics of your program to identify sections of code that are likely candidates for optimization. Optimization increases the performance of a program by modifying those sections for greater efficiency.

    We provide two profiling tools, gprof and codecov, to assist you in the profiling process. A basic overview of optimization methods, with information about how they may improve the performance of your code, can be found in Performance Optimization Methods (below).

    5.6.1. gprof

    The GNU Project Profiler (gprof) shows how your program is spending its time and which function calls are made. To profile code using gprof, use the "-pg" option during compilation. For more information, the gprof manual can be found at

    5.6.2. codecov

    The Intel Code Coverage Tool (codecov) can be used in numerous ways to improve code efficiency and increase application performance. The tool leverages Profile-Guided Optimization technology (discussed below). Coverage can be specified in the tool as file-level, function-level, or block-level. Another benefit of this tool is the ability to compare the profiles of two application runs to find where the optimizations are making a difference. More detailed information on this tool can be found at
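    A typical workflow looks like the following sketch (file and project names are illustrative; consult the Intel documentation for the exact flags supported by your installed compiler version):

```shell
# 1. Compile with source-position profiling instrumentation.
#    This also writes the static profile file pgopti.spi.
ifort -prof-gen=srcpos -o myapp myapp.f90
# 2. Run with a representative data set; this writes .dyn profile files.
./myapp
# 3. Merge the .dyn files into the dynamic profile file pgopti.dpi.
profmerge
# 4. Generate the HTML coverage report.
codecov -prj myapp -spi pgopti.spi -dpi pgopti.dpi
```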

    5.6.3. Program Development Reminders

    If an application is not programmed for distributed memory, then only the cores on a single node can be used. This is limited to 16 cores on Pershing.

    Check the utilization of the nodes your application is running on to see if it is taking advantage of all the resources available to it. This can be done by finding the nodes assigned to your job by executing "qstat -f", logging into one of the nodes using the ssh command, and then executing the top command to see how many copies of your executable are being executed on the node.
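    The check described above can be sketched as follows (the job ID and node name are illustrative):

```shell
qstat -f 123456.pbsserver | grep exec_host   # find the nodes assigned to your job
ssh n0123                                    # log into one of those nodes
top                                          # count running copies of your executable
```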

    Keep the system architecture in mind during code development. For instance, if your program requires more memory than is available on a single node, then you will need to parallelize your code so that it can function across multiple nodes.

    5.6.4. Performance Optimization Methods

    Optimization generally increases compilation time and executable size, and may make debugging difficult. However, it usually produces code that runs significantly faster. The optimizations that you can use will vary depending on your code and the system on which you are running.

    Note: Before considering optimization, you should always ensure that your code runs correctly and produces valid output.

    In general, there are five main categories of optimization:

    • Global Optimization
    • Loop Optimization
    • Interprocedural Analysis and Optimization (IPA)
    • Function Inlining
    • Profile-Guided Optimizations

    Global Optimization

    A technique that looks at the program as a whole and may perform any of the following actions:

    • Performs optimization over all of a program's basic blocks
    • Performs control-flow and data-flow analysis for an entire program
    • Detects all loops, including those formed by IF and GOTO statements, and performs general optimization
    • Constant propagation
    • Copy propagation
    • Dead store elimination
    • Global register allocation
    • Invariant code motion
    • Induction variable elimination

    Loop Optimization

    A technique that focuses on loops (for, while, etc.) in your code and looks for ways to reduce loop iterations or parallelize the loop operations. The following types of actions may be performed:

    • Vectorization - rewrites loops to improve memory access performance. With the Intel compilers, loops can be automatically converted to utilize the MMX/SSE/SSE2/SSE3 instructions and registers if they meet certain criteria.
    • Loop unrolling - (also known as "unwinding") replicates the body of loops to reduce loop branching overhead and provide better opportunities for local optimization.
    • Parallelization - divides loop operations over multiple processors where possible.
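    For example, with the Intel compilers you can ask for a report of which loops were vectorized (the flag shown is a sketch based on the Intel 12 compilers; check the man page of your installed version):

```shell
ifort -O3 -vec-report2 -c mycode.f90   # reports, per loop, whether it was vectorized
```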

    Interprocedural Analysis and Optimization (IPA)

    A technique that allows the use of information across function call boundaries to perform optimizations that would otherwise be unavailable.

    Function Inlining

    A technique that seeks to reduce function call and return overhead.

    • Used with functions that are called numerous times from relatively few locations.
    • Allows a function call to be replaced by a copy of the body of that function.
    • May create opportunities for other types of optimization
    • May not be beneficial. Improper use may increase code size and actually result in less efficient code.

    Profile-Guided Optimizations

    Profile-Guided Optimizations allow the compiler to make data-driven decisions during compilation on branch predictions, increased parallelism, block ordering, register allocation, function ordering, and more. The build takes three steps and uses a representative data set to determine the optimizations.

    For example:

    • Step 1: Instrumentation, Compilation, and Linking

      ifort -prof-gen -prof-dir ${HOME}/profdata -O2 -c a1.f a2.f a3.f
      ifort -o a1 a1.o a2.o a3.o

    • Step 2: Instrumented Execution - run the program with a representative data set

      ./a1

    • Step 3: Feedback Compilation

      ifort -prof-use -prof-dir ${HOME}/profdata -ipo a1.f a2.f a3.f

    6. Batch Scheduling

    6.1. Scheduler

    The Portable Batch System (PBS) is currently running on Pershing. It schedules jobs and manages resources and job queues, and can be accessed through the interactive batch environment or by submitting a batch request. PBS is able to manage both single-processor and multiprocessor jobs. The PBS module is automatically loaded by the Master module on Pershing at login.

    6.2. Queue Information

    The following table describes the PBS queues available on Pershing:

    Queue Descriptions and Limits (in order of decreasing priority)

    Queue Name    | Job Class  | Max Wall Clock Time | Max Cores Per Job | Description
    debug         | Debug      | 1 Hour              | 512               | User diagnostic jobs
    transfer      | N/A        | 24 Hours            | 1                 | Data transfer for user jobs
    urgent        | Urgent     | 96 Hours            | N/A               | Designated urgent jobs by DoD HPCMP
    staff         | N/A        | 368 Hours           | N/A               | ARL DSRC staff testing only. System testing and user support.
    high          | High       | 96 Hours            | N/A               | Designated high-priority jobs by DoD HPCMP
    challenge     | Challenge  | 168 Hours           | N/A               | Challenge projects only
    cots          | Standard   | 96 Hours            | N/A               | Abaqus, Fluent, and Cobalt jobs
    interactive   | Standard   | 12 Hours            | N/A               | Interactive jobs
    standard-long | Standard   | 200 Hours           | N/A               | ARL DSRC permission required
    standard      | Standard   | 168 Hours           | N/A               | Non-Challenge user jobs
    background    | Background | 24 Hours            | N/A               | User jobs that will not be charged against the project allocation

    6.3. Interactive Logins

    When you log in to Pershing, you will be running in an interactive shell on a login node. The login nodes provide login access for Pershing and support such activities as compiling, editing, and general interactive use by all users. Please note the Login Node Abuse policy. The preferred method to run resource intensive executions is to use an interactive batch session.

    6.4. Interactive Batch Sessions

    An interactive session on a compute node is possible using the proper PBS command-line syntax from a login node. Once PBS has scheduled your request on the compute pool, you will be logged directly into a compute node, and this session can last as long as your requested wall time.

    To submit an interactive batch job, use the following submission format:

    qsub -I -X -l walltime=HH:MM:SS -l select=#_of_nodes:ncpus=16:mpiprocs=16 \
         -l place=scatter:excl -A proj_id -q interactive -V

    Your batch shell request will be placed in the interactive queue and scheduled for execution. This may take anywhere from a few minutes to a long time, depending on system load. Once your shell starts, you will be logged into the first of the compute nodes assigned to your interactive batch job. At this point, you can run or debug applications interactively, execute job scripts, or start executions on the compute nodes you were assigned. The "-X" option enables X-Windows access; it may be omitted if that functionality is not required for the interactive job.

    6.5. Batch Request Submission

    PBS batch jobs are submitted via the qsub command. The format of this command is:

    qsub [ options ] batch_script_file

    qsub options may be specified on the command line or embedded in the batch script file by lines beginning with "#PBS".
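    For example, the two invocations below are equivalent (script name and option values are illustrative):

```shell
# Option A: options given on the command line
qsub -N myjob -q standard -l walltime=01:00:00 my_job_script

# Option B: the same options embedded at the top of my_job_script
#PBS -N myjob
#PBS -q standard
#PBS -l walltime=01:00:00
```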

    For a more thorough discussion of PBS Batch Submission, see the Pershing PBS Guide.

    6.6. Batch Resource Directives

    A listing of the most common batch Resource Directives is available in the Pershing PBS Guide.

    6.7. Launch Commands

    There are different commands for launching MPI executables from within a batch job depending on which MPI implementation your script uses.

    To launch an IntelMPI executable, use the mpirun command as follows:

    mpirun ./mympijob.exe

    To launch an OpenMPI executable, use the openmpirun.pbs command as follows:

    openmpirun.pbs ./mympijob.exe

    To launch an MPICH2 executable, use the mpiexec command as follows:

    mpiexec -launcher ssh -n #_of_MPI_tasks -f $PBS_NODEFILE ./mympijob.exe

    To launch an IBM PE MPI executable, use the mpiexec command as follows:

    mpiexec ./mympijob.exe

    For OpenMP executables, no launch command is needed.
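    For example, in a batch script an OpenMP executable is started directly, with the thread count taken from the environment (the executable name is illustrative; Bourne-shell syntax shown):

```shell
export OMP_NUM_THREADS=16   # one thread per core on a Pershing compute node
./myompjob.exe
```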

    6.8. Sample Script

    The following script is a basic example. More thorough examples are available in the Pershing PBS Guide and in the Sample Code Repository ($SAMPLES_HOME) on Pershing.

    #  Specify job name.
    #PBS -N myjob
    #  Specify queue name.
    #PBS -q standard
    # select = # of nodes
    # ncpus is ALWAYS set to 16!
    # mpiprocs is the number of cores on each node to use
    # This run will use (select)x(mpiprocs) cores = 8*16=128 cores
    #PBS -l select=8:ncpus=16:mpiprocs=16
    #  Specify how MPI processes should be distributed across nodes.
    #PBS -l place=scatter:excl
    #  Specify maximum wall clock time.
    #PBS -l walltime=24:00:00
    #  Specify Project ID to use. ID may have the form ARLAP96090RAY.
    #PBS -A ARLAP96090RAY
    #  Uncomment the below PBS option to request the job be run on
    #  Large-memory nodes
    ##PBS -l bigmem=1
    #  Specify that environment variables should be passed to master MPI process.
    #PBS -V
    # Uncomment the below PBS option if you want your job to be rerunnable
    ##PBS -r y
    set JOBID=`echo $PBS_JOBID | cut -f1 -d.`
    #  Create a temporary working directory within $WORKDIR for this job run.
    set TMPD=${WORKDIR}/${JOBID}
    mkdir -p $TMPD
    # Change directory to submit directory
    # and copy executable and input file to scratch space
    cd $PBS_O_WORKDIR
    cp mpicode.x $TMPD
    cp input.dat $TMPD
    cd $TMPD
    # The following two lines provide an example of setting up and running
    #  an IBM PE MPI parallel code built with the default compiler.
    module load mpi/ibmpempi
    mpiexec ./mpicode.x > out.dat
    # The following two lines provide an example of setting up and running
    #  an IntelMPI MPI parallel code built with the default compiler.
    module load mpi/intelmpi/4.0.3
    mpirun ./mpicode.x > out.dat
    # The following two lines provide an example of setting up and running
    #  an openMPI MPI parallel code built with the non-default gcc compiler.
    module unload compiler/intel/12.1.0
    module load compiler/gcc/4.4 mpi/openmpi/1.6.0
    openmpirun.pbs ./mpicode.x > out.dat
    # The following two lines provide an example of setting up and running
    #  an MPICH2 MPI parallel code built with the default Intel compiler.
    module unload compiler/gcc/4.4
    module load compiler/intel/12.1.0 mpi/mpich2/1.4.1
    mpiexec -launcher ssh -n 128 -f $PBS_NODEFILE ./mpicode.x > out.dat
    cp out.dat $PBS_O_WORKDIR

    6.9. PBS Commands

    The following commands provide the basic functionality for using the PBS batch system:

    qsub: Used to submit jobs for batch processing.
    qsub [ options ] my_job_script

    qstat: Used to check the status of submitted jobs.
    qstat PBS_JOBID ## check one job
    qstat -u my_user_name ## check all of user's jobs

    qdel: Used to kill queued or running jobs.
    qdel PBS_JOBID

    A more complete list of PBS commands is available in the Pershing PBS Guide.

    6.10. Advance Reservations

    An Advance Reservation Service (ARS) is available on Pershing for reserving cores for use, starting at a specific date/time and lasting for a specific number of hours. The specific number of reservable cores changes frequently, but it is displayed on the reservation page for each system in the ARS. The ARS is accessible via most modern web browsers; authenticated access is required. An ARS User's Guide is available online once you have logged in.

    7. Software Resources

    7.1. Application Software

    All Commercial Off The Shelf (COTS) software packages can be found in the $CSI_HOME (/usr/cta) directory. A complete listing of software on Pershing with installed versions can be found on our software page. The general rule for all COTS software packages is that the two latest versions will be maintained on our systems. For convenience, modules are also available for most COTS software packages.

    7.2. Useful Utilities

    The following utilities are available on Pershing:

    Useful Utilities
    check_license - Checks the status of ten HPCMP shared applications grouped into two distinct categories: Software License Buffer (SLB) applications and non-SLB applications. Example: check_license package
    node_use - Displays memory-use and load-average information for all login nodes of the system on which it is executed. Example: node_use -a
    qpeek - Returns the standard output (STDOUT) and standard error (STDERR) messages for any submitted PBS job from the start of execution. Example: qpeek PBS_JOB_ID
    qview - Lists the status and current usage of all PBS queues on Pershing. "qview -h" shows all the qview options available.
    show_queues - Lists the status and current usage of all PBS queues on Pershing. Example: show_queues
    show_storage - Provides quota and usage information for the storage areas in which the user owns data on the current system. Example: show_storage
    show_usage - Lists the project ID and total hours allocated/used in the current FY for each project you have on Pershing. Example: show_usage

    7.3. Sample Code Repository

    The Sample Code Repository is a directory that contains examples for COTS batch scripts, building and using serial and parallel programs, data management, and accessing and using serial and parallel math libraries. The $SAMPLES_HOME environment variable contains the path to this area, and is automatically defined in your login environment. Below is a listing of the examples provided in the Sample Code Repository on Pershing.

    Sample Code Repository on Pershing
    Application-specific examples; interactive job submit scripts; use of the application name resource; software license use.
    abaqus - Basic batch script and input deck for an Abaqus application.
    abinit - Basic batch script for an ABINIT application.
    adf - Basic batch script and input deck for an ADF application.
    ale3d - Basic batch script and input deck for an ALE3D application.
    amber - Basic batch script for an AMBER9 application.
    ansys - Basic batch script for an ANSYS application.
    castep - Basic batch script and input deck for a CASTEP application.
    cfd++ - Basic batch script and input deck for a CFD++ application.
    cobalt - Basic batch script and input deck for a COBALT application.
    comsol - Basic batch script and input deck for a COMSOL application.
    cth - Basic batch script and input deck for a CTH application.
    discover - Basic batch script and input deck for a DISCOVER application.
    dmol3 - Basic batch script and input deck for a DMOL3 application.
    epic - Basic batch script and input deck for an EPIC application.
    espresso - Basic batch script for an ESPRESSO application.
    fluent - Basic batch script and input deck for a FLUENT (now ACFD) application.
    GAMESS - auto_submit script and input deck for a GAMESS application.
    gasp - Basic batch script for a GASP application.
    gaussian - Input deck for a GAUSSIAN application and automatic submission script for submitting a Gaussian job.
    gromacs - Basic batch script and input deck for a GROMACS application.
    gulp - Basic batch script and input deck for a GULP application.
    lammps - Basic batch script and input deck for a LAMMPS application.
    ls-dyna - Basic batch script and input deck for an LS-DYNA application.
    lsopt - Basic batch script and input deck for using LSOPT to optimize an LS-DYNA application.
    matlab - Basic batch script and sample m file for a MATLAB application.
    mcnpx - Basic batch script for an MCNPX application.
    mesodyn - Basic batch script for a MESODYN application.
    MOLPRO - Basic batch script and input deck for a MOLPRO application.
    namd - Basic batch script for a NAMD application.
    nwchem - Basic batch script and input deck for an NWCHEM application.
    OPENFOAM - Basic batch script for an OPENFOAM application.
    overflow - Basic batch script for an OVERFLOW application.
    picalc - Basic PBS example batch script.
    STARCCM+ - Basic batch script and input deck for a STARCCM+ application.
    velodyne - Basic batch script and input deck for a VELODYNE application.
    Xpatch - Basic batch script and input deck for an Xpatch application.
    Archiving and retrieving files; Lustre striping; file searching; $WORKDIR use.
    pre_post_Example - Sample batch script showing how to stage data out after a job executes using the prepost queue.
    transfer_Example - Sample batch script showing how to stage data out after a job executes using the transfer queue.
    Transfer_Queue_with_Archive_Commands - Sample directory containing sample batch scripts demonstrating how to use the transfer queue to retrieve input data for a job, chain a job that uses that data to run a parallel computation, then chain that job to another that uses the transfer queue to put the data back in archive or long-term storage.
    MPI, OpenMP, and hybrid examples; single-core jobs; large memory jobs; running multiple applications within a single batch job.
    Hybrid - Simple MPI/OpenMP hybrid example and batch script.
    MPI_PBS_sample - Simple MPI examples and batch scripts for IntelMPI, OpenMPI, and IBM/PE.
    MPI_picalc - Simple MPI example and batch script.
    OpenMP - Simple OpenMP example and batch script.
    Serial_run - Simple batch script to run a single-core job.
    Basic code compilation; debugging; use of library files; static vs. dynamic linking; Makefiles; Endian conversion.
    BLACS_Example - Sample ScaLAPACK Fortran program, compile script, and PBS submission scripts.
    Core_Files - Provides examples of three core file viewers.
    DSL_Example - Simple example for compiling and running a DSL (Dynamic Shared Library) executable on the compute nodes. Also allows static arrays to exceed 2 GBytes.
    Endian_Conversion - Instructions on how to manage data created on a machine with a different Endian format.
    Memory_Usage - Sample build and script that shows how to determine the amount of memory being used by a process.
    MPI_ddt - Instructions on how to use the DDT debugger to debug MPI code.
    MPI_Compilation - Instructions and sample scripts for using the versions of MPI.
    MPI_Examples - Instructions on how to build parallel codes with each compiler/MPI suite combination available on the system.
    MPI_Totalview - Instructions on how to use the TotalView debugger to debug MPI code.
    ScaLAPACK_Example - Sample ScaLAPACK Fortran program, compile script, and PBS submission scripts.
    Serial_Totalview - Instructions on how to use the TotalView debugger to debug serial code.
    SO_Compile - Simple example of creating an SO (Shared Object) library, compiling against it, and running against it on the compute nodes.
    Timers_Fortran - Serial timers using Fortran intrinsics for f77 and f90/95.
    Use of modules; customizing the login environment.
    Module_Swap_Example - Instructions for using the module swap command.
    Basic batch scripting; use of the transfer queue; job arrays; job dependencies; Secure Remote Desktop; job monitoring.
    BatchScript_Example - Basic PBS batch script example.
    Hybrid_Examples - Simple MPI/OpenMP hybrid example and batch script.
    Interactive_Example - Instructions on how to submit an interactive PBS job.
    Job_Array_Example - Instructions and example job script for using job arrays.
    MPI_Example - Sample scripts for running MPI jobs under the C and Bash shells.
    OpenMP_Example - Sample script for running OpenMP jobs.
    PBSDocumentation - Microsoft Word version of the PBS User's Guide.
    Serial_Example - Sample script for running multiple sequential jobs.
    Transfer_Queue - PBS batch script example for data transfer.

    8. Links to Vendor Documentation

    IBM Home:
    IBM iDataPlex:

    RedHat Home:

    GNU Home:
    GNU Compiler:

    Intel Home:
    Intel Sandy Bridge Processor:
    Intel Software Document Library:

    Linux High Performance Technical Computing: