Archival System User Guide

Table of Contents

1. Introduction
2. System Configuration
3. Accessing the Archive Systems
4. Environment Variables (unclassified systems only)
5. Sample PBS Script with Transfer Job Submission

1. Introduction

This document provides an overview of the archival storage systems at the ARL DSRC and describes how to use them.

2. System Configuration

2.1. Unclassified Archive Systems

The unclassified Mass Storage Server system consists of seven Sun Fire X4600 servers and one Sun Fire X4600-M2 server configured to provide high-availability clustered failover service to the tape subsystems on the MSAS. Each system is configured with four 10-core Xeon E7-4860 processors and 128 GBytes of RAM. These systems have access to 160 TBytes of online disk storage for recently accessed user data. The Mass Storage Library system consists of multiple Sun StorageTek SL8500 tape libraries configured to provide over 2.8 PBytes of available data storage. All data located in the /archive file system is archived to tape and eventually to a remote disaster recovery center.

2.2. Classified Archive Systems

The classified Mass Storage Server system consists of six Sun Fire X4600-M2 servers configured to provide high-availability clustered failover service to the tape subsystems on the MSAS. Each system is configured with either two dual-core 3.0 GHz or two quad-core AMD Opteron processors and 32 GBytes of RAM. These systems have access to 170 TBytes of online disk storage for recently accessed user data. The Mass Storage Library system consists of multiple Sun StorageTek SL8500 tape libraries configured to provide over 2.5 PBytes of available data storage. All data located in the /home file system is archived to tape and eventually to a remote disaster recovery center.

3. Accessing the Archive Systems

While the login nodes of the ARL unclassified (Harold and Pershing) and classified (MRAP and Hercules) production systems have connectivity (over NFS) to the /archive and /home file systems, respectively, the compute nodes do not. All users are given a home directory on the login and compute nodes, named /usr/people/username. When you log in, you are automatically placed in your local /usr/people home directory, but you can still access the NFS /home and /archive file systems from the login nodes. When your job script runs on a compute node, it cannot access your /home or /archive directories. Therefore, you must pre-stage your input files to the scratch file system (/usr/var/tmp) from a login node before submitting your jobs. After the job completes, you must transfer output files from /usr/var/tmp to /home or /archive, again from a login node. This may be done manually, as sketched below, or through the transfer queue, which executes serial jobs on login nodes.
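For example, a manual transfer from a login node might look like the following minimal sketch, which uses the $ARCHIVE_HOME variable described in Section 4 (unclassified systems). The my_case directory and the file names are placeholders for your own data.

# On a login node, before submitting your job:
mkdir -p /usr/var/tmp/$LOGNAME/my_case
cp ${ARCHIVE_HOME}/my_case/input.dat /usr/var/tmp/$LOGNAME/my_case

# On a login node, after the job completes:
cp /usr/var/tmp/$LOGNAME/my_case/output.dat ${ARCHIVE_HOME}/my_case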

[Figure: Unclassified System Connectivity to Local and Remote Storage]

[Figure: Classified System Connectivity to Local and Remote Storage]

4. Environment Variables (unclassified systems only)

The following environment variables are automatically set in your login environment:

$ARCHIVE_HOME

This is an individual user's directory on the permanent file system that serves a given compute platform.

$ARCHIVE_HOST

This is the hostname of the archival system serving a particular compute platform.
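
For example, from a login node you can inspect these variables and use $ARCHIVE_HOME as a copy target. A minimal sketch (results.tar is a placeholder file name):

echo "Archive host: $ARCHIVE_HOST"
echo "Archive directory: $ARCHIVE_HOME"

# Copy a results file into your archive directory:
cp results.tar ${ARCHIVE_HOME}/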

5. Sample PBS Script with Transfer Job Submission


#!/bin/csh

#  Request maximum wallclock time for job
#PBS -l walltime=01:00:00

#  select=number_nodes,ncpus=cores/node,mpiprocs=MPI procs/node
#  Total cores requested = number of nodes X MPI procs/node
#  Uncomment the resource directives for your system; only one set
#  should be active at a time.

#  For Harold
#PBS -l select=2:ncpus=8:mpiprocs=8

#  For Pershing and Hercules
##PBS -l select=2:ncpus=16:mpiprocs=16

#  For MRAP
##PBS -l mppwidth=24
##PBS -l mppnppn=12


# Specify how MPI processes are distributed on nodes
# (scatter = one chunk per node, excl = exclusive use of each node)
#PBS -l place=scatter:excl

#  Request job name
#PBS -N linux_transfer

#  Request PBS job queue for job
#PBS -q debug     

# Indicate Project ID
#PBS -A ARLAP96090ARL

#  Request environment variables be exported from script
#PBS -V

set cdir=`pwd`                              # directory where the job starts
set JOBID=`echo $PBS_JOBID | cut -f1 -d.`   # numeric part of the PBS job ID
set TMPD=/usr/var/tmp/$LOGNAME/$JOBID       # per-job scratch directory
if (! -e $TMPD ) mkdir -p $TMPD

# change directory to temporary directory on /usr/var/tmp

cd $TMPD

#  For Harold, Pershing and Hercules
module load compiler/intel/intel12.1 mpi/intelmpi/4.0.3

echo Job $JOBID starting at `date` on `hostname`
echo starting in `pwd`

# Copy any files you need from /home/userid to TMPD
cp ${cdir}/picalc.exe $TMPD || echo "***ERROR*** Problem with copy ${cdir}/picalc.exe $TMPD !"

#  The following line runs your program.
#  For Harold, Pershing and Hercules
mpirun ./picalc.exe >& picalc.out

#  For MRAP, comment out the mpirun line above and use aprun instead:
#aprun -n 24 -N 12 ./picalc.exe

set st=$status

echo "Execution ended at `date` with status $st"
if ( $st != 0 ) then
   echo exiting with bad status
   exit $st
endif

#############################################################################
#
# YOU CAN USE THE SAMPLE POST PROCESSING PHASE CODE BELOW IN YOUR OWN SCRIPT
#  JUST BE SURE ARCHIVE_HOME IS set (usually /archive/armyc/your_user_id)
#   and TMPD is set to your /usr/var/tmp/${LOGNAME}/${JOBID}
#    area containing your run data.
#
#############################################################################
echo "ENTERING POST-PROCESSING PHASE...."
echo "JOBID=${JOBID}"
echo "ARCHIVE_HOME=${ARCHIVE_HOME}"
set REQUEST=picalc
set MAIN_JOBID=$JOBID

#------------Post Processing Phase------------------------------
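
# Submit a serial job to the transfer queue, which runs on a login node
# with access to /archive.  The -W depend=afterany:${MAIN_JOBID} option
# holds the transfer job until the main job finishes, and the here-document
# (everything through the final EOF line) forms the transfer job's script.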

qsub -l walltime=00:15:00 -N transfer-${JOBID} -q transfer -l select=1:ncpus=1:mpiprocs=1:clustertype=transfer -l place=scatter:excl -A ARLAP96090ARL -W depend=afterany:${MAIN_JOBID} -r n  << EOF
#!/bin/csh

cd $TMPD
ls -l
if ( ! -d ${ARCHIVE_HOME} ) then
   echo "****WARNING!**** The archive directory ${ARCHIVE_HOME} cannot be located!"
endif

echo post job submitted on host `hostname` at `date`
cd ..
tar cvf ${JOBID}.tar ${JOBID} || echo "ERROR: Tar of files failed, look in $TMPD for output files!"
cp ${JOBID}.tar $ARCHIVE_HOME
set st=\$status
if ( \$st != 0 ) echo "ERROR: Copy of ${JOBID}.tar to $ARCHIVE_HOME failed...."
echo "Archive copy for job $JOBID ended at \`date\` with status \$st"
exit \$st
EOF
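
To use this script, save it to a file (the name picalc.pbs below is arbitrary) and submit it from a login node:

qsub picalc.pbs

The transfer job is submitted automatically near the end of the main job and runs after the main job completes, copying the tar file of results into your archive directory.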