Frequently Asked Questions

1. Logging In

Q. How do I change my password?

A.Use the Kerberos kpasswd command on your local host or on HPC servers. The UNIX command, passwd, is not valid on systems using Kerberos for secure login. Windows users may also use the "Change Password" button when using KRB5/kinit.

Q. After entering my password and pressing my YubiKey button, I am still unable to get a ticket. What else can be wrong?

A. Several things could cause your password and YubiKey to stop working, including a YubiKey that is out of sync or malfunctioning, system outages, or multiple incorrect login attempts. The quickest way to resolve the problem is to contact the HPC Help Desk so the YubiKey can be checked.

Q. I want to install the latest Kerberos software. Where do I get it?

A.The official DoD HPCMP Kerberos software can be downloaded from HPC Centers: Kerberos & Authentication.

Q. How do I register my CAC with pIE?

A. Perform the following steps to register your CAC with pIE:

  1. Go to https://ieapp.erdc.hpc.mil
  2. Click the "Accept Conditions and Continue" button on the US Government Notice and Consent Banner.
  3. Click the "OpenID Login" button, and complete the OpenID login process.
  4. Once logged in, click on the tab titled "Register CAC to Your Existing Account," and follow the instructions to register your CAC.

    It usually takes a day for your CAC information to be entered into the Kerberos KDC. You may or may not receive a notice after this has been done.

Q. How can I view my account usage and remaining allocation?

A. The "show_usage" command is available on all computational systems and reports allocations and usage. Its output is displayed at login, and you can run show_usage at any time to see it again.

Q. How do I log in to an ARL DSRC machine using Kerberos on a PC?

A.Follow these steps:

  1. Run KRB5.EXE. The following dialog will appear:
    [Figure: Kinit interface]
  2. Enter your HPCMP userid in the Name field.
    Enter HPCMP.HPC.MIL (in all caps) in the Realm field.
    If using a CAC and PKINIT, leave the Password field blank and press the <ENTER> key.
    If using a YubiKey, enter your HPCMP Kerberos password in the Password field, then press the button on the YubiKey plugged into your USB port.
    After a few seconds, you will get a green Kerberos ticket that is valid for 10 hours.
  3. Open PuTTY and enter the information below:
    [Figure: PuTTY interface showing how to save a session]

    Replace the username with your userid.
    After you have done this, name the connection (for instance, Excalibur) under "Saved Sessions" and click Save (red circle).

  4. Now, whenever you want to connect to that system, double-click on the system name, as shown below:
    [Figure: PuTTY interface showing how to open a saved session]

Q. I get the following error when getting a ticket through my YubiKey: "kinit: Preauthentication failed while getting initial credentials". How do I solve this?

A.For example:

smith@blue> kinit smith@HPCMP.HPC.MIL
Password for smith@HPCMP.HPC.MIL:
Passcode:
kinit: Preauthentication failed while getting initial credentials

There are three possible causes:

  • A bad password.
  • A bad passcode. This could be due to a malfunctioning YubiKey.
  • Your Kerberos configuration does not set kdc_timesync = 1 and ccache_type = 4, and the time on your machine has drifted. If this is the case, check your configuration file and correct your system clock.

Run kinit a couple more times to make sure you entered the correct password and passcode. If the problem persists, send an email to help@helpdesk.hpc.mil.
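
The third cause above can be checked with a short script. The sketch below assumes the two settings appear in the configuration file as simple "name = value" lines. The function name check_krb5_setting and the example path are illustrative and not part of the HPCMP Kerberos kit; the location of krb5.conf (or krb5.ini on Windows) depends on your client.

```shell
# Report whether a "name = value" setting is present in a Kerberos config file.
# Usage: check_krb5_setting FILE NAME VALUE
# Returns 0 if the setting is found, 1 otherwise.
check_krb5_setting() {
  file=$1
  name=$2
  value=$3
  if grep -Eq "^[[:space:]]*${name}[[:space:]]*=[[:space:]]*${value}[[:space:]]*$" "$file"; then
    echo "$name = $value: present"
  else
    echo "$name = $value: missing"
    return 1
  fi
}

# Example (the path is an assumption; adjust it for your client):
#   check_krb5_setting /etc/krb5.conf kdc_timesync 1
#   check_krb5_setting /etc/krb5.conf ccache_type 4
```

If either setting is reported missing, correcting the system clock is still worthwhile before retrying kinit.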

Q. What do I do when I get: "Can't send request (send_to_kdc)"?

A. Your system can't find the correct address of the Kerberos server (KDC) for your principal's realm. If you are a new Kerberos user, check your Kerberos configuration file (krb5.ini or krb5.conf): verify the default_realm setting under [libdefaults] and the entries under [realms] and [domain_realm].

#Kerberos Services
klogin          543/tcp         # Kerberos authenticated rlogin
kshell          544/tcp   cmd   # and remote shell
eklogin        2105/tcp         # Kerberos encrypted rlogin
kerberos         88/udp   kdc   # Kerberos authentication--udp
kerberos         88/tcp   kdc   # Kerberos authentication--tcp
kerberos-sec    750/udp         # Kerberos authentication--udp
kerberos-sec    750/tcp         # Kerberos authentication--tcp
kerberos_master 751/udp         # Kerberos authentication
kerberos_master 751/tcp         # Kerberos authentication
kerberos_adm    752/tcp         # Kerberos 5 admin/changepw
passwd_server   752/udp         # Kerberos passwd server
kpop           1109/tcp         # Pop with Kerberos
kftp            765/tcp         # Kerberized ftp 
krb_prop        754/tcp         # Kerberos slave propagation

Q. Why am I getting the error message "clock skew too great" when requesting a Kerberos ticket using the Windows client?

A.This indicates that the clock on your computer has the wrong time.

"Clock skew" is the range of time allowed for a server to accept Kerberos authenticators from a client. In order for Kerberos authentication to work, your Windows client and the Kerberos server's time need to be within 5 minutes of each other. If they are too far off you will receive the "clock skew too great" error message and you will not be able to get a Kerberos ticket.

To resolve this issue you must manually set the clock on your system to the correct time.
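
The five-minute tolerance described above amounts to comparing two timestamps. The sketch below assumes both times are available as Unix epoch seconds; how you obtain the server's time (for example, from an NTP query) is outside the sketch, and the function name is illustrative.

```shell
# Succeed when two epoch timestamps differ by no more than the allowed skew.
# Usage: within_clock_skew LOCAL_EPOCH SERVER_EPOCH [MAX_SKEW_SECONDS]
within_clock_skew() {
  local_time=$1
  server_time=$2
  max_skew=${3:-300}                  # 300 s = 5 minutes, the tolerance described above
  diff=$((local_time - server_time))
  if [ "$diff" -lt 0 ]; then
    diff=$((-diff))                   # absolute value of the difference
  fi
  [ "$diff" -le "$max_skew" ]
}

# Example (the server time here is a placeholder value):
#   within_clock_skew "$(date +%s)" 1432833846 || echo "clock skew too great"
```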

To avoid this problem in the future there are several free time synchronization programs available to use under Windows. Here are a couple of easy to use programs for keeping the time current on your Windows system:

They both have install/uninstallers and require no knowledge of NTP (Network Time Protocol).

If you are running Windows XP, Vista, or Windows 7 you will need to have administrator privileges in order to set your time, either manually or using an automated program.

For administrators of XP, Vista, or Windows 7 machines, there is a Windows port of the Unix NTP code. This may be useful if your site already has NTP servers set up and you want to sync your domain controller to them (and have your clients sync to the domain controller using Microsoft utilities): http://www.five-ten-sg.com/util/ntp4172.zip.

Q. What firewall settings should I have set locally to access the ARL DSRC?

A. If there is a firewall between your machine using Kerberos and the ARL DSRC and you are unable to connect, provide the following information to your firewall administrator:

Different Kerberos clients sometimes contact different ports for the same services. Kerberos servers know how to respond to the various clients. Random client ports usually run from 1024 to 65535, but some ssh clients use privileged ports 1023, 1022, 1021, ... for each successive simultaneous ssh connection.

A site should open all the ports listed below:

Service                 TCP/UDP         Server Port     Client Port
kinit/krb5.exe          tcp             88              random
kinit/krb5.exe          udp             88              88
kpasswd                 tcp             749             random
kdc-a.afrl.hpc.mil      tcp             749             random
kshell/rcp/rsh          tcp             544             random
kshell/rcp/rsh          tcp             1023,1022,...   random
encrypted rlogin -x     tcp             2105            random

A site should also open the ports for kftp. This uses the standard ftp ports:

Service                 TCP/UDP         Server Port     Client Port
Ssh                     tcp&udp         23              random
ftp data                tcp             random          random

The ARL DSRC hosts accept ssh from ssh clients that know how to use Kerberos credentials. Windows, Linux, and Mac versions are available on HPC Centers: Kerberos & Authentication. SSH should then be used to tunnel X11 sessions securely as well as for regular ssh connections.

Sites should also open this port:

Service                 TCP/UDP         Server Port     Client Port
ssh                     tcp             22              random

DoD networks may block standard X11 ports now or in the near future. These ports generally start at 6000 and work up for additional X displays. One DoD suggestion is to block the ports often used by X11, from 6000 tcp/udp to 6063 tcp/udp. This would have an adverse effect on most tcp/udp protocols, which choose random ports from the range 1024-65535 and, when they get a failure, often increment the port number by 1. A valid process unrelated to X11 that happens to choose random port 6000 could therefore have to retry 64 times before getting an unblocked port number.

If the site permits X11, and X11 tunneled via SSH is not available, open the following ports (note, here the server is on the machine where the X11 display is running - at the user-site rather than at the ARL DSRC):

Service                 TCP/UDP         Server Port     Client Port
X11                     tcp             6000            random
X11                     tcp             6001            random
X11                     tcp             6002            random
X11                     tcp             6003            random

(Allow the most common X11 ports. Permit numbers greater than 6003 if firewall/filter logs show valid traffic getting blocked.)
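
The table above follows the usual X11 convention that display number N listens on TCP port 6000 + N. A minimal sketch of that mapping (the function name is illustrative):

```shell
# Print the TCP port used by X11 display number N (port = 6000 + N).
x11_port() {
  echo $((6000 + $1))
}

# List the ports for displays :0 through :3, matching the table above.
for display in 0 1 2 3; do
  echo "display :$display -> tcp port $(x11_port "$display")"
done
```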

These ports may be used by some Kerberos clients. Only open these if filter/firewall logs show that valid traffic to these ports is getting blocked:

Service                 TCP/UDP         Server Port     Client Port
kftp                    tcp             765             random
ftp-data                tcp             20              random

Example filter configuration for a Cisco router

These filters would be installed at a user site and applied to traffic coming inbound. Without comments, the configuration looks like this:

access-list 101 permit tcp any eq 88 any
access-list 101 permit udp any eq 88 any
access-list 101 permit tcp any eq 749 any established
access-list 101 permit tcp any eq 544 any established
access-list 101 permit tcp 140.31.64.0 0.0.63.255 range 1015 1023 any
access-list 101 permit tcp any eq 2105 any established
access-list 101 permit tcp 140.31.64.0 0.0.63.255 eq 23 any established
access-list 101 permit tcp 140.31.64.0 0.0.63.255 eq 21 any established
access-list 101 permit tcp 140.31.64.0 0.0.63.255 any established
access-list 101 permit tcp any eq 22 any established

Q. My login node response time is sluggish. How do I find a less busy node?

A.You can use the node_use command to determine the current least busy node available. The command format is:

node_use

It provides the memory usage and load average for all the login nodes. For example:

excalibur11> node_use 

Login Node Memory Status

Node Name    Total (Kb)   Used (Kb)   Free (Kb)   Pct. Free   Load Avg.
=========    ==========  ==========  ==========  ==========  ==========
excalibur01   264048092   209417860    54630232      20.69%        2.03
excalibur02   264048092   223599888    40448204      15.32%        2.08
excalibur03   264048092   241123520    22924572       8.68%        3.04
excalibur04   264048092   249096880    14951212       5.66%        2.40
excalibur05   264048092   212127420    51920672      19.66%        3.00
excalibur06   264048092   237886668    26161424       9.91%        1.59
excalibur07   264048092   245951968    18096124       6.85%        2.72
excalibur08   264048092   223738044    40310048      15.27%        0.93
excalibur09   264048092    44951540   219096552      82.98%        1.67
excalibur10   264048092   140508956   123539136      46.79%        1.91
excalibur11   264048092   246910616    17137476       6.49%        2.64
excalibur12   264048092   249469508    14578584       5.52%        1.84
excalibur13   264048092   201677076    62371016      23.62%        4.77
excalibur14   264048092   222584228    41463864      15.70%        1.74
excalibur15   264048092    61338964   202709128      76.77%        1.07
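
To pick a node automatically, the table can be filtered with awk. The sketch below assumes the column layout shown above (node name first, load average last) and reads node_use-style text on standard input. The function name least_busy_node is illustrative.

```shell
# Print the node with the lowest load average from node_use-style output.
# Data rows are recognized by an integer second field (the "Total (Kb)" column)
# and a decimal last field (the "Load Avg." column); headers are skipped.
least_busy_node() {
  awk '$2 ~ /^[0-9]+$/ && $NF ~ /^[0-9]+(\.[0-9]+)?$/ {
         if (best == "" || $NF + 0 < min) { min = $NF + 0; best = $1 }
       }
       END { if (best != "") print best }'
}

# Usage on a login node:
#   node_use | least_busy_node
```

With the sample output above, this prints excalibur08 (load average 0.93). Load average is only one criterion; a node with more free memory may suit a memory-heavy session better.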

Q. What local shell variables are automatically defined in my login environment?

A.The Baseline Configuration Team has determined a set of environment variables to be defined at all centers. Details can be found at http://centers.hpc.mil/consolidated/bc/policies.php?choice=environment

Q. How do I log into an ARL DSRC machine using my YubiKey?

A. Please review http://centers.hpc.mil/users/yubi.html.

2. Machine Configuration

Q. What is the configuration of your Cray XC40 (Excalibur)?

A.A thorough configuration summary for Excalibur is available in the System Configuration section of the Excalibur User Guide.

3. Manuals

Q. Where can I find the Excalibur User Guide?

A.On the ARL DSRC Web site at http://www.arl.hpc.mil/docs/excaliburUserGuide.html

Q. Where can I find the Utility Server User Guide?

A.On HPC Centers at http://centers.hpc.mil/users/heue/USUserGuide.html.

Q. Where can I find the PBS User Guide?

A. Hercules: http://www.arl.hpc.mil/docs/pbsUserGuide.html
Excalibur: http://www.arl.hpc.mil/docs/excaliburPbsGuide.html

Q. Where can I find the Modules User Guide?

A.On the ARL DSRC Web site at http://www.arl.hpc.mil/docs/modulesUserGuide.html

4. SSH

Q. What do I do when I receive a "Host Key Verification Error"?

A.There are several possible causes for this problem:

  1. The known_hosts file in your home directory has been corrupted. To correct this execute the following commands on Excalibur:

    rm -R ${HOME}/.ssh
    exit
    (log back into Excalibur from your desktop)
  2. You are using the wrong version of ssh. There are two versions available on Excalibur, /usr/bin/ssh and /usr/brl/bin/ssh. Your default version should be /usr/bin/ssh. You can determine which is your default version by executing "which ssh". If /usr/brl/bin/ssh is your current default version, then you can change it by adding the following line to your .cshrc or .profile file:

    .cshrc:
    setenv PATH /usr/bin:$PATH

    .profile:
    PATH="/usr/bin:$PATH"; export PATH

  3. The access to your home directory (/usr/people/username) is too open. For security reasons, if "group" or "world" have write access to your home directory then ssh will not work. Remove the group/world write access from your home directory to correct this problem.
  4. The access to your .ssh directory (/usr/people/username/.ssh) is too open. For security reasons, if "group" or "world" have write access to this directory then ssh will not work. Remove the group/world write access from this directory to correct this problem.
  5. The compute node you are trying to access is not yet in your known_hosts file. This is only a problem when running batch jobs. To avoid this problem add the following to your run script before you invoke the parallel executable:

    Excalibur - csh, tcsh shells
    #===================================================================
    foreach host (`cat $PBS_NODEFILE`)
      echo "Working on $host ...."
      /usr/bin/ssh -o StrictHostKeyChecking=no $host pwd
    end
    #===================================================================
    Excalibur - sh, ksh, bash shells
    #===================================================================
    host=""
    for new_host in `cat $PBS_NODEFILE`
      do
      if [ "$new_host" != "$host" ]
       then
        host=$new_host
        echo "Working on $host ...."
        /usr/bin/ssh -o StrictHostKeyChecking=no $host pwd
      fi
    done
    #===================================================================
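
For causes 3 and 4 above, the group and world write bits can be cleared with chmod. A minimal sketch, assuming the default locations $HOME and $HOME/.ssh (the function name is illustrative):

```shell
# Remove group/world write access from a directory, if it exists.
# Usage: remove_group_world_write DIRECTORY
remove_group_world_write() {
  dir=$1
  if [ -d "$dir" ]; then
    chmod go-w "$dir"
  fi
}

# Usage:
#   remove_group_world_write "$HOME"
#   remove_group_world_write "$HOME/.ssh"
#   ls -ld "$HOME" "$HOME/.ssh"    # confirm that no "w" remains for group or other
```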

5. Job Failures

Q. Why is my job not finding my input file and crashing?

A.Since the compute nodes of Excalibur are not able to access your files in /archive or on the Center-Wide File System ($CENTER), you must pre-stage (manually copy) your input files to your /usr/people area ($HOME), space permitting, or to your /usr/var/tmp area ($WORKDIR) before submitting your jobs. Job scripts will need to be modified to pick up input files from these file systems.
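
A pre-staging step might look like the following sketch, run on a login node before submitting the job. The function name, the per-job directory layout, and the file names are placeholders; only $WORKDIR and $CENTER come from the text above.

```shell
# Copy input files into a per-job directory under $WORKDIR and print its path.
# Usage: stage_inputs JOB_DIR_NAME FILE...
stage_inputs() {
  job_dir="${WORKDIR:?WORKDIR is not set}/$1"
  shift
  mkdir -p "$job_dir" || return 1
  for f in "$@"; do
    cp -p "$f" "$job_dir/" || return 1   # stop at the first file that fails to copy
  done
  echo "$job_dir"
}

# Usage on a login node (paths and script name are placeholders):
#   run_dir=$(stage_inputs case01 "$CENTER/inputs/mesh.dat" "$CENTER/inputs/run.inp")
#   cd "$run_dir" && qsub job.pbs
```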

Q. Why am I getting a "Segmentation Violation" error in my job?

A. This error generally means that your application has exceeded the stack space limit for the shell in which the job script is running. By default, this limit is set to a relatively small amount of memory. To correct this problem for csh and tcsh scripts, place the "unlimit" command in the .cshrc file in your home directory. For sh and ksh scripts, place the "ulimit -s unlimited" command in the .profile file in your home directory. For bash scripts, place it in the .bashrc file in your home directory.
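
In the Bourne-family shells (sh, ksh, bash), the stack limit specifically is controlled by "ulimit -s". The sketch below raises the soft limit to whatever the hard limit allows, which may or may not be "unlimited" on a given system. The function name is illustrative.

```shell
# Raise the soft stack-size limit to the hard limit and report the result.
# Call it directly (not inside $(...)), so the change applies to the current shell.
raise_stack_limit() {
  hard=$(ulimit -H -s)          # hard limit: a number in KB, or "unlimited"
  ulimit -S -s "$hard"          # the soft limit may be raised up to the hard limit
  echo "stack limit now: $(ulimit -S -s)"
}

# In .profile or .bashrc:
#   raise_stack_limit >/dev/null
# The csh/tcsh equivalent for .cshrc is:
#   unlimit stacksize
```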

Q. Why am I getting a "PBS: job killed: mem NNNNNNNNkb exceeded limit NNNNNNNkb" error in my job?

A.This error means your job is exceeding the maximum available user memory on one or more of the nodes being used by your job. For Excalibur, this value is 126 GBytes on the standard memory nodes and 508 GBytes for large memory nodes. For parallel jobs the first option for correcting this problem is to run on more nodes while using fewer processes on each node. For example:

On Excalibur, change the PBS option
"select=8:ncpus=32:mpiprocs=32"
to
"select=16:ncpus=32:mpiprocs=16"

If the problem persists, even while using 1 process per node, then you need to redefine your problem to use a smaller memory footprint.
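
The arithmetic behind the example is that the total number of MPI processes (nodes times processes per node) stays constant: 8 x 32 = 256 = 16 x 16. The sketch below computes the node count for a given total and per-node process count and prints the request in the colon-separated chunk form that PBS Professional accepts. It assumes 32 cores per node, as in the example; the function name is illustrative.

```shell
# Print a PBS select statement that places TOTAL_PROCS MPI processes
# PROCS_PER_NODE to a node, rounding the node count up.
# Usage: pbs_select TOTAL_PROCS PROCS_PER_NODE [CORES_PER_NODE]
pbs_select() {
  total=$1
  per_node=$2
  cores=${3:-32}                                   # value from the Excalibur example
  nodes=$(( (total + per_node - 1) / per_node ))   # ceiling division
  echo "select=${nodes}:ncpus=${cores}:mpiprocs=${per_node}"
}

pbs_select 256 32    # prints select=8:ncpus=32:mpiprocs=32
pbs_select 256 16    # prints select=16:ncpus=32:mpiprocs=16
```

Halving the processes per node doubles the memory available to each process, at the cost of charging twice as many nodes.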

6. COTS Software

Q. I am trying to request a piece of software that is not available, but which has a price for downloading and licensing. What is the process for software of this kind?

A. For COTS software that is not on our systems and requires a purchase, please submit a software request form at: https://reservation.hpc.mil/index-sw_request.html

Q. What software is available on each system at ARL DSRC?

A.Please refer to the software listing at: http://www.arl.hpc.mil/software. Also, performing a "module avail" on the console will display a listing of all COTS applications.

Q. Where can I find an example script for running a COTS package?

A.The $SAMPLES_HOME environment variable points to the directory containing the Sample Code Repository. Execute "ls $SAMPLES_HOME" to see all the sample scripts available for that system. The actual sample scripts are contained within the subdirectories listed. There is also an index file in the main directory explaining the contents of each subdirectory.

Q. How can modules help me?

A. Modules software is recommended for convenient access to ARL DSRC COTS software and is available on all our systems. Modules should already be initialized by default for all new users. If you do not see any modules or you get a "module: Command not found." error, you must initialize Modules yourself. To do this, copy the commands for your login shell from the /usr/cta/modules/samples/ directory into your shell's startup file (.cshrc or .profile and/or .bashrc). Once you have done this and sourced your .cshrc (or .profile), you can use the command "module avail" to see what software is available:

excalibur11> module avail

--------------------- /usr/cta/modules/3.2.10.1/modulefiles ----------------------
Master

--------------------- /usr/cta/modules/3.2.10.1/unsupported ----------------------
MOLPRO/2012.1(default) franc3d/6.0.5          nwchem/6.5(default) silo/4.8(default)
cmake/3.2              gromacs/5.0.4(default) python/2.7          xv/3.10a(default)
create                 lammps/dec14(default)  qe/5.1.1(default)
csd/2015(default)      mpi/openmpi/1.8.4      qe/5.1.2

------------------------- /usr/cta/modules/3.2.10.1/COTS -------------------------
VASP/5.3(default)         ensight/10.1.2a             ls-prepost/4.0(default)
VASP/5.3.5                ensight/10.1.4a(default)    mathematica/10.0(default)
abaqus/6.14-2(default)    ensight/10.1.4b             matlab/8.4.0(default)
accelrys/8.0              epic/13(default)            matlab/8.5.0
adf/2014.06               fluent/150(default)         mesodyn/8.0(default)
ale3d/4.24(default)       fluent/160                  paraview/4.3.1(default)
amber/12(default)         g09/a01                     paraview/4.3.1_osmesa
ansys/150                 g09/b01                     starccm/10.02.010
ansys/160                 g09/c01(default)            starccm/10.02.010-R8
castep/8.0(default)       g09/d01                     starccm/9.06.009(default)
cfd++/14.1.1(default)     g09/d01_bigmem              starccm/9.06.009-R8
cfd++/15.1.1              gamess/dec14(default)       swtestsuite/v1(default)
comsol/5.0(default)       gaussian                    tecplot/360ex_2015r1
costinit                  gaussview/5.0.9(default)    totalview/8.13
cseinits                  gulp/8.0                    visit/2.8.2
cseinit-devel             ls-dyna/971_7.1.1(default)  visit/2.9.0(default)
cth/10.3(default)         ls-dyna/971_8.0             visit/2.9.1
dmol3/8.0                 ls-opt/4.2(default)

----------------------------- /opt/cray/modulefiles ------------------------------
PrgEnv-cray/5.2.40(default)            cray-netcdf/4.3.2(default)
PrgEnv-gnu/5.2.40(default)             cray-netcdf-hdf5parallel/4.3.2(default)
PrgEnv-intel/5.2.40(default)           cray-parallel-netcdf/1.5.0(default)
PrgEnv-pgi/5.2.40(default)             cray-parallel-netcdf/1.6.0
atp/1.7.5(default)                     cray-petsc/3.5.2.0(default)
atp/1.8.0                              cray-petsc/3.5.2.1
capmc/1.0-1.0000.35027.21.1(default)   cray-petsc-complex/3.5.2.0(default)
ccm/2.2.0-1.0502.55200.8.95(default)   cray-petsc-complex/3.5.2.1
cray-ccdb/1.0.4(default)               cray-shmem/7.0.5
cray-ccdb/1.0.5                        cray-shmem/7.1.0(default)
cray-ga/5.1.0.5(default)               cray-shmem/7.1.3
cray-ga/5.3.0.1                        cray-tpsl/1.4.2(default)
cray-hdf5/1.8.13(default)              cray-tpsl/1.4.3
cray-hdf5-parallel/1.8.13(default)     cray-trilinos/11.10.1.0(default)
cray-lgdb/2.3.2(default)               cray-trilinos/11.12.1.1
cray-lgdb/2.4.1                        craype/2.2.1(default)
cray-libsci/13.0.1(default)            craypkg-gen/1.2.1(default)
cray-libsci/13.0.3                     craypkg-gen/1.3.0
cray-libsci_acc/3.0.2(default)         cudatoolkit/5.5.22-1.0502.7944.3.1(default)
cray-mpich/7.0.5                       fftw/2.1.5.7
cray-mpich/7.1.0(default)              fftw/3.3.4.1(default)
cray-mpich/7.1.3                       fftw/3.3.4.2
cray-mpich-abi/7.0.5                   iobuf/2.0.5(default)
cray-mpich-abi/7.1.0(default)          papi/5.3.2.1(default)
cray-mpich-abi/7.1.3                   perftools/6.2.2(default)
cray-mpich-compat/v6                   perftools-lite/6.2.2(default)
cray-mpich-compat/v7                   stat/2.1.0.1(default)

---------------------- /opt/cray/craype/default/modulefiles ----------------------
craype-abudhabi         craype-hugepages32M    craype-mc8
craype-abudhabi-cu      craype-hugepages4M     craype-network-aries
craype-accel-host       craype-hugepages512M   craype-network-gemini
craype-accel-nvidia20   craype-hugepages64M    craype-network-infiniband
craype-accel-nvidia35   craype-hugepages8M     craype-network-none
craype-barcelona        craype-intel-knc       craype-sandybridge
craype-haswell          craype-interlagos      craype-shanghai
craype-hugepages128M    craype-interlagos-cu   craype-target-compute_node
craype-hugepages16M     craype-istanbul        craype-target-local_host
craype-hugepages256M    craype-ivybridge       craype-target-native
craype-hugepages2M      craype-mc12            craype-xeon

--------------------------- /opt/cray/ari/modulefiles ----------------------------
alps/5.2.1-2.0502.9041.11.6.ari(default)
configuration/1.0-1.0502.53348.1.16.ari(default)
dmapp/7.0.1-1.0502.9501.5.219.ari(default)
dvs/2.5_0.9.0-1.0502.1873.1.145.ari(default)
flexnet-publisher/11.12.1-1.0000.9037.2.1.ari
gni-headers/3.0-1.0502.9684.5.2.ari(default)
hosts/1.0-1.0502.53297.1.142.ari(default)
krca/1.0.0-2.0502.53880.4.104.ari(default)
lbcd/2.1-1.0502.53290.1.15.ari(default)
logcb/1.0-1.0502.53286.1.1.ari(default)
nodehealth/5.1-1.0502.56494.9.2.ari(default)
nodestat/2.2-1.0502.53712.3.109.ari(default)
pdsh/2.26-1.0502.53339.1.1.ari(default)
pmi/5.0.6-1.0000.10439.140.2.ari(default)
rca/1.0.0-2.0502.53711.3.127.ari(default)
sdb/1.0-1.0502.55976.5.27.ari(default)
shared-root/1.0-1.0502.53337.1.119.ari(default)
switch/1.0-1.0502.54233.2.96.ari(default)
sysutils/1.0-1.0502.53306.1.1.ari(default)
udreg/2.3.2-1.0502.9275.1.12.ari(default)
ugni/5.0-1.0502.9685.4.24.ari(default)
wlm_detect/1.0-1.0502.53341.1.1.ari(default)
wlm_trans/1.0-1.0502.55978.2.29.ari(default)
xpmem/0.1-2.0502.55507.3.2.ari(default)
-------------------------------- /opt/modulefiles --------------------------------
cce/8.3.5                             intel/15.0.1.133
cce/8.3.6(default)                    java/jdk1.7.0_45(default)
cce/8.3.9                             modules/3.2.10.1(default)
chapel/1.9.0.1(default)               modules/3.2.10.2
ddt/4.2.2.6_39982(default)            modules/3.2.6.7
ddt-memdebug/4.2.2.6_39982(default)   pbs
eswrap/1.1.0-1.020200.1231.0(default) pgi/14.10.0(default)
gcc/4.8.1                             pgi/14.10.0-acc
gcc/4.9.2(default)                    xc-sysroot/5.2.40(default)
intel/14.0.2.144

----------------------------- /cm/local/modulefiles ------------------------------
cluster-tools/6.1   freeipmi/1.2.6      module-info         shared
cmd                 ipmitool/1.8.12     null                use.own
dot                 module-git          openldap            version
----------------------------- /cm/shared/modulefiles -----------------------------
cmgui/6.1                  hdf5/1.6.10                openblas/istanbul/0.2.6
default-environment        openblas/bulldozer/0.2.6   openblas/nehalem/0.2.6
gcc/4.8.1                  openblas/dynamic/0.2.6     openblas/sandybridge/0.2.6

You can also use the command "module list" to see what modules are currently loaded:

excalibur11> module list
Currently Loaded Modulefiles:
    1) modules   2) pbs       3) Master

Use the command "module load" to load a new module (i.e. select a certain package of software):

excalibur11> module load gaussian
excalibur11> module list
Currently Loaded Modulefiles:
    1) modules   2) pbs       3) Master    4) gaussian

A thorough discussion of modules, module commands, and their usage is available in the Modules User Guide. Additional information is also found in each HPC System User Guide about the specific modules that are found on each system.

Q. How do I transfer my data files between the scratch, archive, and CWFS file systems for a batch job if the archive file system is not accessible from the compute nodes?

A. A special PBS queue on Excalibur, called the transfer queue, has been set up for this purpose. This queue allows serial jobs to run up to 24 hours on a login node, which has access to the scratch ($WORKDIR), the CWFS ($CENTER), and /archive ($ARCHIVE_HOME) directories. Jobs may be submitted directly or from within computational job scripts after the computation part of the job has completed. An example of how to use the transfer queue can be found in $SAMPLES_HOME/Workload_Management/Transfer_Queue.
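
As a rough illustration only (the maintained example is the one under $SAMPLES_HOME), a transfer-queue job might look like the sketch below. The project ID, walltime, job name, and directory name are placeholders, and the #PBS resource lines are assumptions that should be checked against the Excalibur PBS guide.

```shell
#!/bin/sh
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -l walltime=02:00:00
#PBS -A PROJECT_ID
#PBS -N archive_results
#PBS -j oe

# Bundle a results directory under $WORKDIR and copy the tarball to the archive.
# Usage: archive_results DIR_NAME
archive_results() {
  name=$1
  cd "${WORKDIR:?WORKDIR is not set}" || return 1
  tar -czf "$name.tgz" "$name" || return 1
  cp "$name.tgz" "${ARCHIVE_HOME:?ARCHIVE_HOME is not set}/" || return 1
}

# Run only inside a PBS job; "case01" is a placeholder directory name.
if [ -n "${PBS_JOBID:-}" ]; then
  archive_results case01
fi
```

Packing many small files into one tarball before copying keeps the number of files written to the archive low.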

Q. How can I transfer my ARL data to another DSRC?

A.The mpscp command is provided at all DSRCs to facilitate transferring data files between DSRC sites. The mpscp command provides a parallel multi-streaming capability to significantly decrease transfer times for very large files. Below are examples of how to use this command.

  1. Transferring data from Excalibur to Haise at Navy DSRC using 4 parallel streams:

    mpscp -w 4 ${WORKDIR}/output_data.tgz haise.navo.hpc.mil:/scratch/username/output_data.tgz

  2. Transferring data from Haise at Navy to Excalibur using 2 parallel streams:

    mpscp -w 2 haise.navo.hpc.mil:/scratch/username/output_data.tgz ${ARCHIVE_HOME}/output_data.tgz

Q. I can't see the stdout/stderr from my job until after the job completes. Is there any way to check on this output during the run?

A.The qpeek command provides this capability. Its usage is as follows:

qpeek JOB_ID

Q. How can I check on the license usage status of COTS packages?

A.The check_license command provides this capability. Below is a description of its use:

Invoking the command "check_license application-name" displays the featured application, the number of unused licenses, and information for all current and pending license reservations. For example, the command:

check_license abaqus

produces the following output:

                        Available abaqus Licenses:
       Reservable abaqus Licensed features (avail licenses):
abaqus: 19
                Requestable but Non-reservable abaqus Licensed features (avail licenses):
aqua: 236
cae: 7
design: 236
===================================================
                Reservations for abaqus:
Display window is from Thu May 28 13:24:06 2015 (1432833846)
                    to Fri May 27 13:24:06 2016 (1464369846)
                        All abaqus Reservations:
id=XJ15615 user=user1 host=predator ts=1432751556 te=1433039556 tok:abaqus=21
             Wed May 27 14:32:36 2015 through Sat May 30 22:32:36 2015
id=SJ1188110 user=user2 host=spirit ts=1432759233 te=1432975233 tok:abaqus=5
             Wed May 27 16:40:33 2015 through Sat May 30 04:40:33 2015
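
For use in scripts, the available count can be pulled out of this output with awk. The sketch assumes the "feature: count" line format shown above and reads check_license-style text on standard input. The function name is illustrative.

```shell
# Print the available license count for FEATURE from check_license-style output.
# Matches lines whose first field is "FEATURE:" and whose second field is an integer.
# Usage: check_license abaqus | available_licenses abaqus
available_licenses() {
  awk -v feature="$1" '$1 == (feature ":") && $2 ~ /^[0-9]+$/ { print $2 }'
}
```

With the sample output above, "available_licenses abaqus" prints 19 and "available_licenses cae" prints 7.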

Q. How do I check the status of the PBS batch queues?

A.The show_queues command provides this capability.

excalibur11> show_queues
QUEUE INFORMATION for EXCALIBUR:
               Maximum      Maximum    Jobs    Jobs    Cores     Cores   Queue
Queue Name     Wall Time    Cores      Running Pending Running   Pending Running
---------------------------------------------------------------------------------
transfer         48:00:00    n/a        0        0        0        0       Y
gpu              24:00:00    n/a        0        0        0        0       N
mem              24:00:00    n/a        0        0        0        0       N
staff            24:00:00    n/a        0        0        0        0       Y
frontier        168:00:00    n/a        7        0     8704        0       Y
high             96:00:00    n/a        3        0     1536        0       Y
urgent           96:00:00    n/a        0        0        0        0       Y
cots             96:00:00    n/a        0        0        0        0       Y
workq            24:00:00    n/a        0        0        0        0       Y
standard-long   200:00:00    n/a        0        0        0        0       Y
challenge       168:00:00    n/a        0        0        0        0       Y
tardec          504:00:00    n/a        0        0        0        0       Y
standard        168:00:00    n/a      207       17    65574    36672       Y
background       24:00:00    n/a        3        7     2752    10112       Y
debug            01:00:00    n/a        0        0        0        0       Y
interactive      12:00:00    n/a        0        0       21        0       Y
R14449           04:00:00      4        0        0        0        0       N
R14454           04:00:00      4        0        0        0        0       N
R14458           04:00:00      4        0        0        0        0       N
R14465           04:00:00      4        0        0        0        0       N
R14470           04:00:00      4        0        0        0        0       N
R16440           01:00:00     32        0        0        0        0       N
R16441           01:00:00     32        0        0        0        0       N
R16452           01:00:00     32        0        0        0        0       N
R16464           01:00:00     32        0        0        0        0       N
R16472           01:00:00     32        0        0        0        0       N
R16474           01:00:00     32        0        0        0        0       N
R16493           01:00:00     32        0        0        0        0       N
R16494           01:00:00    160        0        0        0        0       N
R16723          168:00:00   2048        1        0     2048        0       Y
R17825          168:00:00    256        1        0      256        0       Y
R18295          152:00:00   1024        1        0     1024        0       Y
R19698          168:00:00   4000        2        0     1600        0       Y
R23175           48:00:00   5760        0        0        0        0       Y
R23217           48:00:00   1024        7        0      416        0       Y
R27853          108:00:00    128        3        0      128        0       Y
R28899           04:00:00      4        0        0        0        0       N
R28901           04:00:00      4        0        0        0        0       N
R28902           04:00:00      4        0        0        0        0       N
R28903           04:00:00      4        0        0        0        0       N
R28905           04:00:00      4        0        0        0        0       N

NODE USAGE INFORMATION:
   Node Type    Cores Available  Cores Running    Cores Free
       Batch        99008            84059          14949
     Graphic         1024                0           1024
  Big.Memory         1024              640            384

Node-Type       Nodes       Cores    Physical-Memory  Available-Memory
              Available    Per Node     Per Node         Per Node
LOGIN             16          32         256Gb            254Gb
Compute         3098          32         128Gb            126Gb
Reservation *    648          32         128Gb            126Gb
Graphic           32          32         256Gb            254Gb
Big Memory        32          32         512Gb            500Gb

     * All Reservation nodes are part of the Compute,
       Graphic and Big Memory pools of nodes.

7. Compilers and MPI Suites

Q. What compilers are available on the Linux Clusters?

A.Type "module avail" at the system's command prompt and look at the modules listed under /usr/cta/modules/3.2.7/devel on the Utility Server and under /opt/cray/modulefiles on Excalibur. If you get "module: Command not found", see the question How can modules help me? to determine how to correct this. Detailed information on compilation, code optimization, and compiler settings can be found in the HPC User Guides on the documentation page: http://www.arl.hpc.mil/docs.

Q. What MPI Suites are available on the Linux Clusters?

A.Type "module avail" at the system's command prompt. On the Utility Server, look at the modules listed under /usr/cta/modules/3.2.7/devel with "MPI" as part of the name. Only CRAY MPICH is available on Excalibur. If you get "module: Command not found", see the question How can modules help me? to determine how to correct this. Detailed information on compilation, code optimization, and compiler settings can be found in the HPC User Guides on the documentation page: http://www.arl.hpc.mil/docs.
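
The listing can be long, so it may help to filter it. The following is a minimal sketch, not an official procedure: the function name list_mpi_modules is hypothetical, and the redirection assumes that the Modules 3.2.x "module avail" command prints its listing to standard error, which is its usual behavior.

```shell
# Hypothetical helper: list modules whose names mention MPI.
# "module avail" in Modules 3.2.x normally writes to standard error,
# hence the 2>&1 redirection before filtering with grep.
list_mpi_modules() {
    if type module >/dev/null 2>&1; then
        module avail 2>&1 | grep -i mpi || echo "no MPI modules listed"
    else
        echo "module: command not found; see 'How can modules help me?'"
    fi
}

list_mpi_modules
```

Replacing "mpi" in the grep pattern with a compiler name narrows the listing to that compiler in the same way.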

8. Miscellaneous

Q. How do I remove characters caused by an FTP session from a PC to a Unix system?

A.Using FTP to transfer a text file in binary mode or a tar file from a PC to a Unix system can leave many "^M" characters, representing carriage returns, and "^Z" characters in your file. These characters can cause problems when compiling that file.

To remove these "^M"s and "^Z"s, use the dos2unix command as follows:

dos2unix file_name
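
If dos2unix is not installed on a particular system, a similar cleanup can be approximated with the standard tr command. The sketch below is only an illustration under that assumption; the function name strip_dos_chars and the file names are hypothetical.

```shell
# Hypothetical helper: strip carriage returns (^M, octal 015) and
# Ctrl-Z characters (^Z, octal 032) from a text file.
# Usage: strip_dos_chars input_file output_file
strip_dos_chars() {
    tr -d '\015\032' < "$1" > "$2"
}

# Demonstration on a small file with DOS line endings and a trailing ^Z.
demo_dir=$(mktemp -d)
printf 'line one\r\nline two\r\n\032' > "$demo_dir/dos.txt"
strip_dos_chars "$demo_dir/dos.txt" "$demo_dir/unix.txt"

# Compare the byte counts before and after the cleanup.
wc -c < "$demo_dir/dos.txt"
wc -c < "$demo_dir/unix.txt"
```

The input and output files must be different, because the shell truncates the output file before tr reads the input. This approach is only suitable for text files; it would corrupt a binary file such as a tar archive.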

Q. Where should I send back my broken or unwanted YubiKey?

A.Please mail it back to the following address:

AFRL DSRC
ATTN: HPC HELP DESK ACCOUNTS
2435 Fifth St.
WPAFB, OH 45433

Q. How can I reach the help desk?

A.Complete contact information is available on the Contact Us page.

Please Note: All unclassified Kerberos and systems support questions should be directed to the HPCMP HPC Help Desk. For all other inquiries, you may contact the ARL DSRC Help Desk.

Q. How do I change permissions on a file or a directory?

A. There are three basic modes for files and directories:

  • (r)eadable - Value: 4
  • (w)ritable - Value: 2
  • e(x)ecutable - Value: 1

Additionally, each of these modes can be applied to:

  • (u)ser
  • (g)roup
  • (o)thers

The user means you, the person who owns the file or directory.

The group refers to the UNIX group associated with the file or directory. To find out which groups you belong to, type groups at the UNIX prompt.

The modes follow a hierarchy of user, group, and then others. Using this order, we can assign three numerical values, one digit for each.

For example, to make a file readable by all, executable by the group, and writable by the user, add the permission values for each:

  • user = readable + writable = 4 + 2 = 6
  • group = readable + executable = 4 + 1 = 5
  • other = readable = 4

So to change the permission, type:

% chmod 654 filename

The word "other" means everyone else (aka world). For security reasons, we advise against opening files up to others.

If you need to change permissions on an entire directory, including its files and subdirectories, you may use the "-R" option. For more information, review the man page: man chmod.
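
The arithmetic in the example can be checked directly at the prompt. This is a minimal sketch using a scratch file created with mktemp; the variable names are hypothetical and chosen only for illustration.

```shell
# Each digit is the sum of the mode values: r = 4, w = 2, x = 1.
user=$((4 + 2))    # readable + writable
group=$((4 + 1))   # readable + executable
other=4            # readable only
mode="${user}${group}${other}"

# Apply the mode to a scratch file and inspect the result.
scratch=$(mktemp)
chmod "$mode" "$scratch"
echo "mode: $mode"
ls -l "$scratch" | cut -c1-10

# The same permissions written in symbolic form.
chmod u=rw,g=rx,o=r "$scratch"
ls -l "$scratch" | cut -c1-10
```

Both ls lines should show -rw-r-xr--, which reads as rw- for the user, r-x for the group, and r-- for others.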

Q. What is wrong with my backspace key?

A.Different systems map the backspace key to different control characters, so the character your keyboard sends may not match the one the system expects. The most common workaround is to add the line "stty erase ^h" to your .cshrc or .profile; on some systems "stty erase ^?" is needed instead. If this does not work, please contact the HPC Help Desk.
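
For sh, ksh, or bash, the line can be guarded so that it runs only when a terminal is attached. The following is a sketch for $HOME/.profile under that assumption; the function name set_backspace_erase is hypothetical, and csh or tcsh users would need the equivalent csh syntax in .cshrc.

```shell
# Hypothetical helper for $HOME/.profile (sh, ksh, bash).
# Terminal settings are changed only when standard input is a terminal,
# so batch jobs and non-interactive sessions are left alone.
set_backspace_erase() {
    if [ -t 0 ]; then
        stty erase '^H'    # use '^?' instead if Backspace sends DEL
    fi
}

set_backspace_erase
```

Running stty -a afterwards shows the current erase setting, which helps confirm which of the two characters is in effect.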

Q. What shells are available?

A.The Bourne shell (sh), Korn shell (ksh), C-shell (csh), T-shell (tcsh), Bash-shell (bash), and Z shell (zsh) are all available as default shells.

The shell establishes your user environment. Your shell functions as both a command interpreter and a programming language. Your default shell is specified in your entry in the /etc/passwd file.
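
To see which shell is recorded for an account, the shell field of the passwd entry can be printed. This is a minimal sketch; the function name login_shell_of is hypothetical, and it assumes the account is listed in the local file, which may not be the case on systems that manage accounts through a directory service.

```shell
# Hypothetical helper: print the login shell recorded for a user in a
# passwd-format file (field 7 of the colon-separated entry).
# Usage: login_shell_of user [passwd_file]
login_shell_of() {
    awk -F: -v user="$1" '$1 == user { print $7 }' "${2:-/etc/passwd}"
}

# Shell recorded for the current user, if the account is listed locally.
login_shell_of "$(id -un)"

# Shell exported to the current session's environment.
echo "$SHELL"
```

If the first command prints nothing, the account is not in the local file; on Linux systems, getent passwd is a common alternative for looking up the entry.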

At start up, the shell first reads a system-wide file (/etc/profile or /etc/cshrc, depending on the shell) to set system-wide environment variables. Next, it reads individual user environment files depending on your default shell.

For Bourne shell, Korn shell, and Z shell logins, the shell executes /etc/profile and $HOME/.profile, if it exists.

For C shell and T shell logins, the shell executes /etc/cshrc, $HOME/.cshrc, and $HOME/.login.

For Bash shell logins, the shell executes /etc/profile, $HOME/.profile, if it exists, and $HOME/.bashrc, if it exists.

The default /etc/profile and /etc/cshrc files print /etc/motd and check for mail.

Shell Startup Files
Script            Description
$HOME/.cshrc      Initial commands for each csh
$HOME/.login      User's login commands for csh
$HOME/.profile    User's login commands for sh and ksh
$HOME/.bashrc     User's login commands for bash

.kshrc, .bashrc, and .cshrc are read at each invocation of a new shell.
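
When troubleshooting an environment problem, it can help to see which of these files actually exist in your home directory. The sketch below is an illustration only; the function name list_startup_files is hypothetical.

```shell
# Hypothetical helper: report which of the startup files named above
# exist in a directory (defaults to $HOME).
# Usage: list_startup_files [directory]
list_startup_files() {
    dir="${1:-$HOME}"
    for f in .profile .bashrc .kshrc .cshrc .login; do
        if [ -f "$dir/$f" ]; then
            echo "present: $f"
        else
            echo "missing: $f"
        fi
    done
}

list_startup_files
```

A missing file is not an error; the shell simply skips startup files that do not exist.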

Q. How do I use the Advance Reservation Service to reserve nodes for future jobs?

A.Information on accessing the Advance Reservation Service (ARS) may be found on the Advance Reservations page. An ARS User Guide is also available once you are logged in to the system.

Q. Where can I find instructions for downloading and using the Utility Server SRD software?

A. Instructions are available on the DAAC website at http://daac.hpc.mil/software/SRD/.