Archival System User Guide
1. Introduction
This document provides an overview of the archival storage systems at the ARL DSRC and explains how to use them.
2. System Configuration
2.1. Unclassified Archive Systems
The unclassified Mass Storage Server system consists of seven Sun Fire X4600 servers and one Sun Fire X4600-M2 server that have been configured to provide high availability clustered failover service to the tape subsystems on the MSAS. Each of the systems is configured with four 10-core Xeon E7-4860 processors and 128 GBytes of RAM. These systems have access to 160 TBytes of online disk storage for recently-accessed user data. The Mass Storage Library system consists of multiple Sun StorageTek SL8500 tape libraries that have been configured to provide over 2.8 PBytes of available data storage. All data located in the /archive file system is archived to tape and eventually to a remote disaster recovery center.
2.2. Classified Archive Systems
The classified Mass Storage Server system consists of six Sun Fire X4600-M2 servers that have been configured to provide high availability clustered failover service to the tape subsystems on the MSAS. Each of the systems is configured with either two dual-core 3.0 GHz or two quad-core AMD Opteron processors and 32 GBytes of RAM. These systems have access to 170 TBytes of online disk storage for recently-accessed user data. The Mass Storage Library system consists of multiple Sun StorageTek SL8500 tape libraries that have been configured to provide over 2.5 PBytes of available data storage. All data located in the /home file system is archived to tape and eventually to a remote disaster recovery center.
3. Accessing The Archive Systems
The login nodes of the ARL unclassified production systems (Harold and Pershing) have NFS access to the /archive file system, and the login nodes of the classified production systems (MRAP, Hercules, and TOW) have NFS access to the /home file system. The compute nodes have no access to either file system.

Every user is given a home directory named /usr/people/username that is available on both the login and compute nodes. When you log in, you are automatically placed in this local /usr/people directory, but you can still reach the NFS /home and /archive file systems from the login nodes.

Because a job script running on a compute node cannot access your /home or /archive directories, you must pre-stage your input files to the scratch file system (/usr/var/tmp) from a login node before submitting your job. After the job completes, you must likewise transfer output files from /usr/var/tmp to /home or /archive from a login node. You can do this manually or through the transfer queue, which runs serial jobs on login nodes.
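The manual pre-staging steps might look like the following on a login node. This is a minimal sketch, not a site-provided procedure: the run directory myrun, the input file input.dat, and the job script myjob.pbs are hypothetical names, and the commands assume an unclassified system where $ARCHIVE_HOME points to your /archive directory (on the classified systems, copy from your /home directory instead). The commands use no shell-specific syntax, so they behave the same at a csh or sh prompt.

```shell
# Run these on a LOGIN node; compute nodes cannot see /archive or /home.
# Create a run directory on the scratch file system (myrun is a hypothetical name)
mkdir -p /usr/var/tmp/$LOGNAME/myrun
# Pre-stage the input file (hypothetical input.dat) from the archive to scratch
cp $ARCHIVE_HOME/myrun/input.dat /usr/var/tmp/$LOGNAME/myrun/
# Submit from the scratch directory; the job script can then find its
# inputs through $PBS_O_WORKDIR (myjob.pbs is a hypothetical job script)
cd /usr/var/tmp/$LOGNAME/myrun
qsub myjob.pbs
```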

Figure: Unclassified System Connectivity to Local and Remote Storage

Figure: Classified System Connectivity to Local and Remote Storage
4. Environment Variables (unclassified systems only)
The following environment variables are automatically set in your login environment:
$ARCHIVE_HOME: This is an individual user's directory on the permanent file system that serves a given compute platform (usually /archive/armyc/your_user_id).
This is the hostname of the archival system serving a particular compute platform.
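As a minimal sketch of how $ARCHIVE_HOME is used, the commands below bundle a finished run on /usr/var/tmp into a single tar file and copy it to the archive from a login node, following the same tar-then-copy pattern as the transfer job in Section 5. The directory name myrun is hypothetical, and $ARCHIVE_HOME is set only on the unclassified systems.

```shell
# Run on a login node of an unclassified system, after the job has finished.
# Show which archive directory $ARCHIVE_HOME points to
echo $ARCHIVE_HOME
# Bundle the run directory (hypothetical name myrun) into a single tar file
cd /usr/var/tmp/$LOGNAME
tar cvf myrun.tar myrun
# Copy the tar file to your archive directory and confirm it arrived
cp myrun.tar $ARCHIVE_HOME/
ls -l $ARCHIVE_HOME/myrun.tar
```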
5. Sample PBS Script with Transfer Job Submission
#!/bin/csh
# Request maximum wallclock time for job
#PBS -l walltime=01:00:00
# Keep only the resource request lines for your system and delete the others
# select=number of nodes:ncpus=cores/node:mpiprocs=MPI procs/node
# Total cores requested = number of nodes X MPI procs/node
# For Harold and TOW
#PBS -l select=2:ncpus=8:mpiprocs=8
# For Pershing and Hercules
#PBS -l select=2:ncpus=16:mpiprocs=16
# For MRAP
#PBS -l mppwidth=24
#PBS -l mppnppn=12
# Specify how MPI processes are distributed on nodes
#PBS -l place=scatter:excl
# Request job name
#PBS -N linux_transfer
# Request PBS job queue for job
#PBS -q debug
# Indicate Project ID
#PBS -A ARLAP96090ARL
# Request environment variables be exported from script
#PBS -V
# PBS starts the job in your home directory; PBS_O_WORKDIR holds the
# directory the job was submitted from
set cdir=$PBS_O_WORKDIR
set JOBID=`echo $PBS_JOBID | cut -f1 -d.`
set TMPD=/usr/var/tmp/$LOGNAME/$JOBID
if (! -e $TMPD ) mkdir -p $TMPD
# change directory to temporary directory on /usr/var/tmp
cd $TMPD
# For Harold, TOW, Pershing and Hercules
module load compiler/intel/intel12.1 mpi/intelmpi/4.0.3
echo Job $JOBID starting at `date` on `hostname`
echo starting in `pwd`
# Copy any files you need from the submission directory to TMPD
cp ${cdir}/picalc.exe $TMPD || echo "***ERROR*** Problem with copy ${cdir}/picalc.exe $TMPD !"
# Run the program: keep only the launch line for your system
# For Harold, TOW, Pershing and Hercules
mpirun ./picalc.exe >& picalc.out
# For MRAP
aprun -n 24 -N 12 ./picalc.exe >& picalc.out
set st=$status
echo "Execution ended at `date` with status $st"
if ( $st != 0 ) then
echo exiting with bad status
exit $st
endif
#############################################################################
#
# YOU CAN USE THE SAMPLE POST PROCESSING PHASE CODE BELOW IN YOUR OWN SCRIPT
# JUST BE SURE ARCHIVE_HOME IS set (usually /archive/armyc/your_user_id )
# and TMPD is set to your /usr/var/tmp/${LOGNAME}/${JOBID}
# area containing your run data.
#
#############################################################################
echo "ENTERING POST-PROCESSING PHASE...."
echo "JOBID=${JOBID}"
echo "ARCHIVE_HOME=${ARCHIVE_HOME}"
set REQUEST=picalc
set MAIN_JOBID=$JOBID
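#------------Post Processing Phase------------------------------
# Inside the here-document, unescaped $VAR and `cmd` are expanded now, when the
# transfer job is submitted; \$ and \` defer expansion until the transfer job runs.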
qsub -l walltime=00:15:00 -N transfer-${JOBID} -q transfer -l select=1:ncpus=1:mpiprocs=1:clustertype=transfer -l place=scatter:excl -A ARLAP96090ARL -W depend=afterany:${MAIN_JOBID} -r n << EOF
#!/bin/csh
cd $TMPD
ls -l
if ( ! -d ${ARCHIVE_HOME} ) then
echo "****WARNING!**** The archive directory ${ARCHIVE_HOME} cannot be located!"
endif
echo post job running on host \`hostname\` at \`date\`
cd ..
tar cvf ${JOBID}.tar ${JOBID} || echo "ERROR: Tar of files failed, look in $TMPD for output files!"
cp ${JOBID}.tar $ARCHIVE_HOME
set st=\$status
if ( \$st != 0 ) echo "ERROR: Copy of ${JOBID}.tar to $ARCHIVE_HOME failed...."
echo "archive transfer for job $JOBID ended at \`date\` with status \$st"
exit \$st
EOF
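Once the script above is saved to a file, submitting and monitoring it might look like the minimal sketch below; the file name picalc.pbs is hypothetical. The transfer job, named transfer-<JOBID>, is submitted only when the main job reaches its post-processing phase, and its -W depend=afterany dependency keeps it from starting until the main job has ended.

```shell
# Submit the sample job script (saved under the hypothetical name picalc.pbs)
qsub picalc.pbs
# List your jobs; transfer-<JOBID> appears once the main job reaches its
# post-processing phase and waits until the main job has ended
qstat -u $LOGNAME
# After both jobs finish, confirm the tar file reached the archive
ls -l $ARCHIVE_HOME/*.tar
```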

