ARL DSRC: SCOUT LSF Guide

Guide to the LSF Queuing System
on SCOUT

1. Introduction
2. Anatomy of a Batch Script
2.1. Specify Your Shell
2.2. Required LSF Directives
2.2.1. Number of Cores
2.2.2. Number of Processes per Node
2.2.3. How Nodes Should Be Allocated
2.2.4. How Long to Run
2.2.5. Which Queue to Run In
2.2.6. Your Project ID
2.3. The Execution Block
3. Submitting Your Job
4. Simple Batch Script Example
5. Job Management Commands
6. Optional LSF Directives
6.1. Job Identification Directives
6.1.1. Job Name
6.2. Job Environment Directives
6.2.1. Interactive Batch Shell
6.2.2. Export Environment Variables
6.2.3. Request a Node Type
6.3. Reporting Directives
6.3.1. Redirecting Stdout and Stderr
6.3.2. Setting up E-mail Alerts
6.4. Job Dependency Directives
7. Environment Variables
7.1. LSF Environment Variables
7.2. Other Important Environment Variables
8. Example Scripts
8.1. MPI Script
8.2. MPI Script (accessing more memory per process)
8.3. OpenMP Script
8.4. Hybrid MPI/OpenMP Script
8.5. Hybrid MPI/OpenMP Script (Alternative Example)
9. Additional Resources

1. Introduction

On large-scale computers, many users must share available resources. Because of this, you cannot just log on to one of these systems, upload your programs, and start running them. Essentially, your programs (called batch jobs) have to "get in line" and wait their turn. There is also more than one of these lines (called queues) to choose from, and some queues have a higher priority than others (like the express checkout at the grocery store). The queues available to you are determined by the projects that you are involved with.

The jobs in the queues are managed and controlled by a batch queuing system, without which, users could overload systems, resulting in tremendous performance degradation. The queuing system will run your job as soon as it can while still honoring the following:

Meeting your resource requests
Not overloading systems
Running higher priority jobs first
Maximizing overall throughput

SCOUT uses the Load Sharing Facility (LSF) queuing system. The LSF commands should be added to your path automatically at login.

2. Anatomy of a Batch Script

A batch script is simply a small text file that can be created with a text editor such as vi or notepad. You may create your own from scratch, or start with one of the sample batch scripts available in $SAMPLES_HOME. Although the specifics of a batch script differ slightly from system to system, a basic set of components is always required, and a few others are always good ideas. The basic components of a batch script must appear in the following order:

Specify Your Shell
Required LSF Directives
The Execution Block

IMPORTANT: Not all applications on Linux systems can read DOS-formatted text files. LSF does not handle ^M characters well, nor do some compilers. To avoid complications, please remember to convert all DOS-formatted ASCII text files with the dos2unix utility before use on any HPC system. Users are also cautioned against relying on ASCII transfer mode to strip these characters, as some file transfer tools do not perform this function.
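
As a minimal sketch of this conversion step, the snippet below checks a script for carriage-return (^M) characters and removes them. The helper names has_cr and strip_cr and the demo file run.lsf are illustrative only, and the fallback to tr when dos2unix is not installed is an assumption rather than site guidance.

```shell
#!/bin/sh
# Return success (0) if the file contains any carriage-return characters.
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

# Strip carriage returns in place: prefer dos2unix, fall back to tr.
strip_cr() {
    if command -v dos2unix >/dev/null 2>&1; then
        dos2unix "$1" >/dev/null 2>&1
    else
        tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    fi
}

# Demo: create a small DOS-formatted script (hypothetical name) and convert it.
demo_dir=$(mktemp -d)
script="$demo_dir/run.lsf"
printf '#!/bin/bash\r\n#BSUB -W 00:10\r\necho hello\r\n' > "$script"

if has_cr "$script"; then
    echo "DOS line endings found; converting"
    strip_cr "$script"
fi
has_cr "$script" || echo "clean"
```

Running a check like this before every submission is cheap and avoids failures that are otherwise hard to diagnose.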

2.1. Specify Your Shell

First of all, remember that your batch script is a script. It's a good idea to specify which shell your script is written in. Unless you specify otherwise, LSF will use your default login shell to run your script. To tell LSF which shell to use, start your script with a line similar to the following, where shell is either bash, sh, ksh, csh, or tcsh:

#!/bin/shell

2.2. Required LSF Directives

The next block of your script will tell LSF about the resources that your job needs by including LSF directives. These directives are actually a special form of comment, beginning with "#BSUB". As you might suspect, the # character tells the shell to ignore the line, but LSF reads these directives and uses them to set various values.

IMPORTANT: All LSF directives MUST come before the first line of executable code in your script, otherwise they will be ignored.
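
One way to catch a misplaced directive before submitting is sketched below. The function name check_bsub_order is hypothetical, and the check is deliberately simple: it treats any line that is neither blank nor a comment as executable code and reports "#BSUB" lines that appear after it.

```shell
#!/bin/sh
# Print any "#BSUB" line that appears after the first executable line.
# Exit status is 1 if such a line is found, 0 otherwise.
check_bsub_order() {
    awk '
        /^[[:space:]]*$/ { next }            # skip blank lines
        /^[[:space:]]*#BSUB/ {
            if (seen_code) { printf "line %d ignored: %s\n", NR, $0; bad = 1 }
            next
        }
        /^[[:space:]]*#/ { next }            # skip ordinary comments
        { seen_code = 1 }                    # anything else is code
        END { exit bad + 0 }
    ' "$1"
}

# Demo: a script whose time limit directive comes too late.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#BSUB -q debug
cd "$WORKDIR"
#BSUB -W 00:10
EOF

check_bsub_order "$demo" || echo "fix the directive order before submitting"
```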

Every script must include directives for the following:

The number of cores you are requesting
The number of processes per node
How nodes should be allocated
The maximum amount of time your job should run
Which queue you want your job to run in
Your Project ID

LSF also provides additional optional directives. These are discussed in Optional LSF Directives, below.

2.2.1. Number of Cores

Before LSF can schedule your job, it needs to know how many cores are required. This is accomplished with the "-n" option.

Example 1: Serial code requiring 1/160 of the node's memory, and that will allow other jobs to run on its assigned node.

#BSUB -n 160
#BSUB -gpu "num=1:mode=shared" 
#BSUB -R "span[ptile=1]"

Example 2: Serial code requiring 2/160 of the node's memory, and that will allow other jobs to run on its assigned node.

#BSUB -n 160
#BSUB -gpu "num=1:mode=shared"
#BSUB -R "span[ptile=2]"

Example 3: Serial or OpenMP code requiring all of the node's memory, and that will NOT allow other jobs to run on its assigned node.

#BSUB -n 160
#BSUB -gpu "num=1:mode=exclusive"
#BSUB -R "span[ptile=160]"
#BSUB -x

2.2.2. Number of Processes per Node

Before LSF can schedule your job, it needs to know how many processes you want to run on each node. By default, your job will run with one process per core, but you might want more or fewer processes depending on the programming model you are using. The -R "span[ptile=n]" option sets the number of processes, n, to allocate on each node. See Example Scripts (below) for alternate use cases.

#BSUB -R "span[ptile=160]"
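
The arithmetic that links the core request, the node count, and the MPI process count can be sketched as follows. This follows the pattern used in this guide's example scripts (for instance, 1,280 cores with one process per node launched with "mpiexec -n 8"); the value of 160 cores per node is inferred from those examples, and the function names are hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: 160 cores per node, inferred from this guide's examples.
CORES_PER_NODE=160

# Number of nodes implied by a core request (rounded up to whole nodes).
nodes_for_cores() {
    echo $(( ($1 + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

# Total MPI processes for a core request and a processes-per-node value.
total_processes() {
    echo $(( $(nodes_for_cores "$1") * $2 ))
}

echo "nodes: $(nodes_for_cores 1280)"
echo "MPI processes: $(total_processes 1280 1)"
```

The second value is the number you would pass to "mpiexec -n" for that layout.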

2.2.3. How Nodes Should Be Allocated

Some default behaviors in LSF have the potential to seriously impair the ability of your scripts to run in certain situations and could impose restrictions on submitted jobs that might cause them to wait much longer in the queue than necessary. To prevent these situations from occurring, the following LSF directives can be used in batch scripts on SCOUT:

#BSUB -x

This directive tells LSF to give the job exclusive access to the requested nodes.

Alternatively, a job can share its nodes with other jobs when those nodes are not fully utilized by adding the "mode=shared" option, as follows:

#BSUB -n 160
#BSUB -gpu "num=1:mode=shared"

Use the "mode=shared" directive to share nodes. Replace "shared" with "exclusive" if you don't want to share nodes, and you don't want to use the "-x" directive.

For a further explanation of these directives, see the bsub man page.

2.2.4. How Long to Run

Next, LSF needs to know how long your job will run. For this, you will have to make an estimate. There are three things to keep in mind.

Your estimate is a limit. If your job hasn't completed within your estimate, it will be terminated.
Your estimate will affect how long your job waits in the queue. In general, shorter jobs will run before longer jobs.
Each queue has a maximum time limit. You cannot request more time than the queue allows.

To specify how long your job will run, include the following directive:

#BSUB -W HH:MM
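
Because the estimate is a hard limit, it can help to compute it with some margin. The sketch below converts a number of minutes to the HH:MM form; the helper name to_hhmm and the 25% safety margin are illustrative assumptions, not site policy.

```shell
#!/bin/sh
# Convert a whole number of minutes to the HH:MM form used by "-W".
to_hhmm() {
    printf '%02d:%02d\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Example: an estimated 10-hour run padded by 25% (ASSUMED margin).
estimate_min=600
limit_min=$(( estimate_min + estimate_min / 4 ))
echo "#BSUB -W $(to_hhmm "$limit_min")"
```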

2.2.5. Which Queue to Run In

Now, LSF needs to know which queue you want your job to run in. Your options here are determined by your project. Most users only have access to the debug, standard, interactive, and background queues. Other queues exist, but access to these queues is restricted to projects that have been granted special privileges due to urgency or importance, and they will not be discussed here. As their names suggest, the standard and debug queues should be used for normal day-to-day and debugging jobs. The background queue, however, is a bit special because although it has the lowest priority, jobs that run in this queue are not charged against your project allocation. Users may choose to run in the background queue for several reasons:

You don't care how long it takes for your job to begin running.
You are trying to conserve your allocation.
You have used up your allocation.

To see the list of queues available on the system, use the bqueues command. To specify the queue you want your job to run in, include the following directive:

#BSUB -q queue_name

2.2.6. Your Project ID

LSF now needs to know which project ID to charge for your job. You can use the show_usage command to find the projects that are available to you and their associated project IDs. In the show_usage output, project IDs appear in the column labeled "Subproject." Note: Users with access to multiple projects should remember that the project they specify may limit their choice of queues.

To specify the Project ID for your job, include the following directive:

#BSUB -P Project_ID

2.3. The Execution Block

Once the LSF directives have been supplied, the execution block may begin. This is the section of your script that contains the actual work to be done. A well written execution block will generally contain the following stages:

Environment Setup - This might include setting environment variables, loading modules, creating directories, copying files, initializing data, etc. As the last step in this stage, you will generally cd to the directory that you want your script to execute in. Otherwise, your script would execute by default in your home directory. Most users use "cd $LS_EXECCWD" to run the batch script from the directory where they typed "bsub" to submit the job.
Compilation - You may need to compile your application if you don't already have a pre-compiled executable available.
Launching - Your application is launched using the mpiexec command for IBM Spectrum MPI codes.
Clean up - This usually includes archiving your results and removing temporary files and directories.
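
The stages above can be sketched as a small, defensive execution block. The fallback values for WORKDIR and the job ID are assumptions that only exist so the sketch can run outside a batch job; inside a job, LSF sets LSB_JOBID itself.

```shell
#!/bin/sh
# Fall back to placeholder values when run outside LSF (ASSUMPTION for the
# demo only; inside a batch job LSF sets LSB_JOBID itself).
WORKDIR=${WORKDIR:-$(mktemp -d)}
JOBID=${LSB_JOBID:-manual.$$}

# Environment setup: create and enter a job-specific directory,
# stopping immediately if either step fails.
job_dir="$WORKDIR/$JOBID"
mkdir -p "$job_dir" || exit 1
cd "$job_dir" || exit 1
echo "running in $(pwd)"

# Launching would go here, for example: mpiexec ./my_prog.exe > my_prog.out

# Clean up: remove temporary files; -f keeps rm quiet if none exist.
touch scratch.temp
rm -f ./*.o ./*.temp
```

Checking the result of mkdir and cd prevents the rest of the script from running, and possibly deleting files, in the wrong directory.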

3. Submitting Your Job

Once your batch script is complete, you will need to submit it to LSF for execution using the bsub command. For example, if you have saved your script into a text file named run.lsf, you would type "bsub < run.lsf".

Occasionally you may want to supply one or more directives directly on the bsub command line. Directives supplied in this way override the same directives if they are already included in your script. The syntax to supply directives on the command line is the same as within a script except that #BSUB is not used. For example:

bsub -W HH:MM < run.lsf
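
When bsub accepts a job, it prints a confirmation of the form "Job <1234> is submitted to queue <standard>." A minimal sketch for capturing the job ID from that message is shown below; the helper name parse_jobid is hypothetical, and a sample message stands in for bsub so the snippet runs without LSF.

```shell
#!/bin/sh
# Extract the numeric job ID from bsub's confirmation message,
# which has the form: Job <1234> is submitted to queue <standard>.
parse_jobid() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Inside a real session this would be:  jobid=$(bsub < run.lsf | parse_jobid)
# Here a sample message stands in for bsub so the sketch runs anywhere.
sample='Job <1234> is submitted to queue <standard>.'
jobid=$(printf '%s\n' "$sample" | parse_jobid)
echo "submitted job $jobid"
```

A captured ID is useful later for commands such as bjobs or bkill, or for job dependencies.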

4. Simple Batch Script Example

The batch script below contains all of the required directives and common script components discussed above.

#!/bin/bash
## The line above specifies your shell
## Required LSF Directives --------------------------------------
#BSUB -P Project_ID
#BSUB -q standard
#BSUB -n 320
#BSUB -x
#BSUB -W 12:00
## Optional LSF Directives --------------------------------------
## %J is the variable for the Job ID
#BSUB -o ./code.%J_hw.out
#BSUB -e ./code.%J_hw.err 

## Execution Block ---------------------------------------------
# Environment Setup
# cd to your scratch directory
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${LSB_JOBID}`
mkdir -p ${JOBID}
cd ${JOBID}

## Launching ----------------------------------------------------
# copy executable from $HOME and submit it
cp ${HOME}/mpicode.x .

# The following line provides an example of setting up and running
# an MPI parallel code built with the default compiler and default MPI.
mpiexec -n 320 ./mpicode.x > out.dat

## Clean up  -----------------------------------------------------
# Remove temporary files
rm -f *.o *.temp

5. Job Management Commands

The table below contains commands for managing your jobs in LSF.

Job Management Commands
Command	Description
bsub	Submit a job.
bjobs	Check the status of a job.
bqueues	Display the status of all LSF queues.
bkill	Delete a job.
bstop	Place a job on hold.
bresume	Release a job from hold.
bstatus	Display attributes of and resources allocated to running jobs.
bpeek	Lets you peek at the stdout and stderr of your running job.

6. Optional LSF Directives

In addition to the required directives mentioned above, LSF has many other directives, but most users will only use a few of them. Some of the more useful optional directives are listed below.

6.1. Job Identification Directives

Job identification directives allow you to identify characteristics of your jobs. These directives are voluntary, but strongly encouraged. The following table contains some useful job identification directives.

Job Identification Directives
Directive	Options	Description
-J	job_name	Name your job.

6.1.1. Job Name

The "-J" directive allows you to designate a name for your job. In addition to being easier to remember than a numeric job ID, the LSF environment variable, $LSB_JOBNAME, inherits this value and can be used instead of the job ID to create job-specific output directories. To use this directive, add a line in the following form to your batch script:

#BSUB -J job_20

Or to your bsub command:

bsub -J job_20 ...

6.2. Job Environment Directives

Job environment directives allow you to control the environment in which your script will operate. The following table contains a few useful job environment directives.

Job Environment Directives
Directive	Options	Description
-I		Request an interactive batch shell.
-env	variable_list	Export environment variables to the job.
-m	training, inference, visualization	Choose your node type.

6.2.1. Interactive Batch Shell

The "-I" directive allows you to request an interactive batch shell. Within that shell, you can run normal Linux commands, including launching parallel jobs. To use "-I", add it to your bsub request. Appending "s" (giving "-Is") requests a pseudo-terminal; the interactive job will hang if this is not used. Use one of the designations /bin/sh, /bin/csh, or /bin/tcsh to select your preferred shell. If no designation is provided, your default shell is used. You may also use the "-X" option to enable X forwarding so that X-Windows-based graphical interfaces can run on the compute node. Within the "-gpu" option, "j_exclusive=yes | no" specifies whether the allocated GPUs can be used by other jobs. When the mode is set to exclusive, "j_exclusive=yes" is set automatically. For example:

bsub -Is -X -n 160 -m training -P Project_ID -q debug -gpu "num=2:mode=shared:j_exclusive=yes" -W 01:00 -x

6.2.2. Export Environment Variables

The "-env" directive tells LSF to export environment variables from your login environment into your batch environment. You can also use '-env "all"' to export all environment variables from your login environment. Alternatively, you can use '-env "none"' if you don't want to export environment variables at all.

To use this directive, add a line in the following form to your batch script:

To export all environment variables:
#BSUB -env "all"
To export no environment variables:
#BSUB -env "none"

Or to your bsub command, as follows:

To export all environment variables:
bsub -env "all"
To export no environment variables:
bsub -env "none"

Caution: when using the "-env" directive on the bsub command line, "-env" must be the first directive after "bsub".

The "-env" directive can also tell LSF to export specific environment variables from your login environment into your batch environment. To do so, add a line in the following form to your batch script:

#BSUB -env DISPLAY

Or to your bsub command, making sure the "-env" directive is at the beginning:

bsub -env DISPLAY

Using either of these methods, multiple comma-separated variables can be included. It is also possible to set values for variables exported in this way, as follows:

bsub -env my_variable=my_value, ...
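
A minimal sketch of assembling such a list is shown below. The helper name build_env_list and the two variables are hypothetical, and the bsub command is only echoed so that the snippet runs without LSF.

```shell
#!/bin/sh
# Join "name=value" arguments into the comma-separated list that "-env" takes.
build_env_list() {
    list=""
    for pair in "$@"; do
        if [ -z "$list" ]; then list=$pair; else list="$list, $pair"; fi
    done
    printf '%s\n' "$list"
}

# Hypothetical variables, for illustration only.
env_list=$(build_env_list "OMP_NUM_THREADS=4" "MY_CASE=run01")

# "-env" is placed first on the command line. The command is echoed
# rather than run so the sketch works without LSF.
echo bsub -env "\"$env_list\"" -q debug "< run.lsf"
```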

6.2.3. Request a Node Type

The "-m" directive allows you to select the node type on which the job will run. To use this directive, add a line in the following form to your batch script:

#BSUB -m training

Or to your bsub command:

bsub -m training

6.3. Reporting Directives

Reporting directives allow you to control what happens to standard output and standard error messages generated by your script. They also allow you to specify e-mail options to be executed at the beginning and end of your job.

6.3.1. Redirecting Stdout and Stderr

By default, messages written to stdout and stderr are captured for you in files named x.ojob_id and x.ejob_id, respectively, where x is either the name of the script or the name specified with the "-J" directive, and job_id is the ID of the job. If you want to change this behavior, the "-o" and "-e" directives allow you to redirect stdout and stderr messages to differently named files. If "-o" is given without "-e" or "-eo", stderr is written to the same file as stdout. The "-eo" directive works like "-e" but overwrites the named file if it already exists, rather than appending to it.

Redirection Directives
Directive	Options	Description
-e	file_name	Redirect standard error to the named file.
-o	file_name	Redirect standard output to the named file.
-eo	file_name	Redirect standard error to the named file, overwriting it if it exists.

6.3.2. Setting up E-mail Alerts

Many users want to be notified when their jobs begin and end. The "-B" directive sends e-mail when the job begins running, and the "-N" directive sends the job report by e-mail when the job finishes. Neither directive takes an argument. By default, mail is sent to the user who submitted the job; use the "-u" directive to supply a different e-mail address.

E-mail Directives
Directive	Options	Description
-B		Send e-mail when the job begins.
-N		Send the job report by e-mail when the job ends.
-u	e-mail_address	Set the e-mail address to be used.

For example:

#BSUB -B
#BSUB -N
#BSUB -u joesmith@mail.mil

6.4. Job Dependency Directives

Job dependency directives allow you to specify dependencies that your job may have on other jobs. This allows users to control the order in which jobs run. These directives generally take the following form:

#BSUB -w "dependency_expression"

where dependency_expression is one or more conditions combined with the logical operators && (AND), || (OR), and ! (NOT), and each condition is of the form:

type(job_ID)

where type is one of the conditions listed below, and job_ID is the ID of a job that your job depends on.

Job Dependency Conditions
Condition	Description
done(job_ID)	Execute this job after the listed job has finished successfully.
ended(job_ID)	Execute this job after the listed job has finished for any reason.
exit(job_ID)	Execute this job after the listed job has terminated with an error.
started(job_ID)	Execute this job after the listed job has started running.
post_done(job_ID)	Execute this job after the listed job's post-processing has completed without error.
post_err(job_ID)	Execute this job after the listed job's post-processing has completed with an error.
job_ID	Equivalent to done(job_ID).

For example, run a job after completion (success or failure) of job ID 1234:

#BSUB -w "ended(1234)"

Or, run a job after successful completion of job ID 1234:

#BSUB -w "done(1234)"

For more information about job dependencies, see the bsub man page.
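
When a job must wait for several earlier jobs, the expression can be built from a list of job IDs. In LSF, the done() condition is satisfied when a job finishes successfully, and conditions are joined with "&&". The sketch below assumes that form; the helper name all_done is hypothetical.

```shell
#!/bin/sh
# Build a dependency expression that requires every listed job to finish
# successfully, e.g. "done(1234) && done(1235)".
all_done() {
    expr_str=""
    for id in "$@"; do
        if [ -z "$expr_str" ]; then
            expr_str="done($id)"
        else
            expr_str="$expr_str && done($id)"
        fi
    done
    printf '%s\n' "$expr_str"
}

dep=$(all_done 1234 1235)
echo "#BSUB -w \"$dep\""
```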

7. Environment Variables

7.1. LSF Environment Variables

While there are many LSF environment variables, you only need to know a few important ones to get started using LSF. The table below lists the most important LSF environment variables and how you might generally use them.

Frequently Used LSF Environment Variables
LSF Variable	Description
$LSB_JOBID	Job identifier assigned to job or job array by the batch system.
$LS_EXECCWD	The absolute path of the directory where bsub was executed.
$LSB_JOBNAME	The job name supplied by the user.

The following additional LSF variables may be useful to some users.

Other LSF Environment Variables
LSF Variable	Description
$LSB_JOBINDEX	Index number of subjob in job array.
$LSF_BATCH_JID	Indicates job type: LSF_BATCH_JID.
$LSB_INTERACTIVE	Indicates if user is in an interactive session.
$LSB_JOBFILENAME	Path to the batch job file that LSF runs for this job.
$LSB_DJOB_HOSTFILE	Path to a file containing the list of hosts assigned to the job.
$LSB_SUB_HOST	Host name on which the bsub command was executed.
$LS_SUBCWD	The working directory on the submission host from which the job was submitted.
$LSB_QUEUE	The name of the queue from which the job is executed.

7.2. Other Important Environment Variables

In addition to the LSF environment variables, the table below lists a few other variables which are not specifically associated with LSF. These variables are not generally required, but may be important depending on your job.

Other Important Environment Variables
Variable	Description
$OMP_NUM_THREADS	The number of OpenMP threads per process.
$MPI_DSM_DISTRIBUTE	Ensures that memory is assigned closest to the physical core where each MPI process is running.
$MPI_GROUP_MAX	Maximum number of groups within a communicator.

8. Example Scripts

All of the script examples shown below contain a "Cleanup" section which demonstrates how to automatically archive your data using the transfer queue and clean up your $WORKDIR after your job completes. Using this method helps to avoid data loss, and ensures that your allocation is not charged for idle cores while performing file transfer operations.

8.1. MPI Script

The following script is for a 320-core MPI job running for 20 hours in the standard queue.

#!/bin/ksh
## Required Directives ------------------------------------
#BSUB -n 320
#BSUB -gpu "num=2:mode=exclusive"
#BSUB -W 20:00
#BSUB -q standard
#BSUB -P Project_ID

## Optional Directives ------------------------------------
#BSUB -J testjob
#BSUB -o testjob.%J.out
#BSUB -B
#BSUB -N
#BSUB -u my_email@mail.mil

## Execution Block ----------------------------------------
# Environmental Setup
. /usr/share/Modules/init/ksh

# cd to your scratch directory
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${LSB_JOBID}`
mkdir -p ${JOBID}
cd ${JOBID}

# copy data from $HOME
cp ${HOME}/my_data_dir/*.dat .

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
# for spectrummpi, your launch command is as follows:
module unload mpi/openmpi
module load mpi/spectrum/10.4
mpiexec ./my_prog.exe > my_prog.out

# alternatively, give the process count explicitly
# (use this in place of the mpiexec line above, not in addition to it):
# mpiexec -n 320 ./my_prog.exe > my_prog.out

## Clean up ------------------------------------------------
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd ${WORKDIR}
rm -f archive_job
cat >archive_job <<END
#!/bin/bash
#BSUB -W 12:00
#BSUB -q transfer
#BSUB -P Project_ID
#BSUB -o archive_job.%J.out
cd ${WORKDIR}

# tar local directory $JOBID into ${JOBID}.tar and put the tar file 
# in remote directory ${ARCHIVE_HOME}/myprogdir:

archive put -C myprogdir -t ${JOBID}.tar $JOBID || echo "ERROR: Archive of ${JOBID}.tar to ${ARCHIVE_HOME}/myprogdir failed...."

archive ls myprogdir/${JOBID}
# Remove scratch directory from the file system.
cd ${WORKDIR}
rm -rf ${JOBID}
END

# Submit the archive job script.
bsub < archive_job
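
The archive step above depends on how the shell expands variables inside a here document. A minimal sketch of that behavior follows; the job ID value is a stand-in, and the generated file is only printed, not submitted.

```shell
#!/bin/sh
# With an unquoted delimiter (<<END), variables are expanded when the file
# is written, so the generated script carries the parent job's values.
# A backslash (\$) defers expansion until the generated script runs.
JOBID=1234            # stands in for ${LSB_JOBID} of the parent job
out=$(mktemp)

cat > "$out" <<END
#!/bin/bash
echo "archiving directory ${JOBID}"
echo "this archive job has ID \${LSB_JOBID}"
END

cat "$out"
```

This is why ${JOBID} in the generated archive_job script refers to the compute job's directory rather than to the archive job's own ID.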

8.2. MPI Script (accessing more memory per process)

By default, an MPI job runs one process per core, with all processes sharing the available memory on the node. If you need more memory per process, then your job needs to run fewer MPI processes per node.

The following script requests 1,280 cores (eight nodes), but because "ptile" is set to 1, it uses only one core (and one MPI process) per node for an MPI job running for 20 hours in the standard queue.

#!/bin/ksh
## Required Directives ------------------------------------
#BSUB -n 1280
#BSUB -gpu "num=8:mode=exclusive"
#BSUB -R "span[ptile=1]"
#BSUB -W 20:00
#BSUB -q standard
#BSUB -P Project_ID

## Optional Directives ------------------------------------
#BSUB -J testjob
#BSUB -o testjob.%J.out
#BSUB -B
#BSUB -N
#BSUB -u my_email@mail.mil

## Execution Block ----------------------------------------
# Environmental Setup

# cd to your scratch directory
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${LSB_JOBID}`
if [ ! -d ${JOBID} ]; then
  mkdir -p ${JOBID}
fi
cd ${JOBID}

# copy input from $HOME
cp ${HOME}/my_data_dir/*.dat .

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
# for spectrummpi, your launch command is as follows:
module unload mpi/openmpi
module load mpi/spectrum
mpiexec -n 8 ./my_prog.exe > my_prog.out

## Clean up ------------------------------------------------
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd ${WORKDIR}
rm -f archive_job
cat >archive_job <<END
#!/bin/bash
#BSUB -W 12:00
#BSUB -q transfer
#BSUB -P Project_ID
#BSUB -o archive_job.%J.out
cd ${WORKDIR}

# tar local directory $JOBID into ${JOBID}.tar and put the tar file 
# in remote directory ${ARCHIVE_HOME}/myprogdir:

archive put -C myprogdir -t ${JOBID}.tar $JOBID || echo "ERROR: Archive of ${JOBID}.tar to ${ARCHIVE_HOME}/myprogdir failed...."
archive ls myprogdir/${JOBID}
# Remove scratch directory from the file system.
cd ${WORKDIR}
rm -rf ${JOBID}
END

# Submit the archive job script.
bsub < archive_job

8.3. OpenMP Script

The following script is for an OpenMP job using one thread per core on a single node and running for 20 hours in the standard queue. Note the use of the $BC_CORES_PER_NODE environment variable.

#!/bin/ksh
## Required Directives ------------------------------------
#BSUB -n 160
#BSUB -gpu "num=1:mode=exclusive"
#BSUB -R "span[ptile=40]"
#BSUB -W 20:00
#BSUB -q standard
#BSUB -P Project_ID

## Optional Directives ------------------------------------
#BSUB -J testjob
#BSUB -o testjob.%J.out
#BSUB -B
#BSUB -N
#BSUB -u my_email@mail.mil

## Execution Block ----------------------------------------
# Environmental Setup
# cd to your scratch directory
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${LSB_JOBID}`
if [ ! -d ${JOBID} ]; then
  mkdir -p ${JOBID}
fi
cd ${JOBID}

# copy data files from $HOME
cp ${HOME}/my_data_dir/*.dat .

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
export OMP_NUM_THREADS=${BC_CORES_PER_NODE}
./my_prog.exe > my_prog.out

## Cleanup ------------------------------------------------
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd ${WORKDIR}
rm -f archive_job
cat >archive_job <<END
#!/bin/bash
#BSUB -W 12:00
#BSUB -q transfer
#BSUB -n 1
#BSUB -P Project_ID
#BSUB -o archive_job.%J.out
cd ${WORKDIR}

# tar local directory $JOBID into ${JOBID}.tar and put the tar file 
# in remote directory ${ARCHIVE_HOME}/myprogdir:

archive put -C myprogdir -t ${JOBID}.tar $JOBID || echo "ERROR: Archive of ${JOBID}.tar to ${ARCHIVE_HOME}/myprogdir failed...."
archive ls myprogdir/${JOBID}

# Remove scratch directory from the file system.
cd ${WORKDIR}
rm -rf ${JOBID}
END

# Submit the archive job script.
bsub < archive_job

8.4. Hybrid MPI/OpenMP Script

The following script uses 1,280 cores with one MPI task per node and one thread per core. Note the use of the $BC_CORES_PER_NODE environment variable.

#!/bin/ksh
## Required Directives ------------------------------------
#BSUB -n 1280
#BSUB -gpu "num=8:mode=exclusive"
#BSUB -R "span[ptile=1]" 
#BSUB -W 20:00
#BSUB -q standard
#BSUB -P Project_ID

## Optional Directives ------------------------------------
#BSUB -J testjob
#BSUB -o testjob.%J.out
#BSUB -B
#BSUB -N
#BSUB -u my_email@mail.mil

## Execution Block ----------------------------------------
# Environmental Setup

# cd to your scratch directory
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${LSB_JOBID}`
if [ ! -d ${JOBID} ]; then
  mkdir -p ${JOBID}
fi
cd ${JOBID}

# copy input data from $HOME
cp ${HOME}/my_data_dir/*.dat .

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
export OMP_NUM_THREADS=${BC_CORES_PER_NODE}
# for spectrummpi, your launch command is as follows:
module unload mpi/openmpi
module load mpi/spectrum
mpiexec -n 8 ./my_prog.exe > my_prog.out

## Clean up ------------------------------------------------
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd ${WORKDIR}
rm -f archive_job
cat >archive_job <<END
#!/bin/bash
#BSUB -W 12:00
#BSUB -q transfer
#BSUB -P Project_ID
#BSUB -o archive_job.%J.out
cd ${WORKDIR}

# tar local directory $JOBID into ${JOBID}.tar and put the tar file 
# in remote directory ${ARCHIVE_HOME}/myprogdir:

archive put -C myprogdir -t ${JOBID}.tar $JOBID || echo "ERROR: Archive of ${JOBID}.tar to ${ARCHIVE_HOME}/myprogdir failed...."
archive ls myprogdir/${JOBID}

# Remove scratch directory from the file system.
cd ${WORKDIR}
rm -rf ${JOBID}
END

# Submit the archive job script.
bsub < archive_job

8.5. Hybrid MPI/OpenMP Script (Alternative Example)

The following script uses 2,560 cores with two MPI tasks per node and one thread per core. Note the use of the $BC_CORES_PER_NODE environment variable.

#!/bin/ksh
## Required Directives ------------------------------------
#BSUB -gpu "num=16:mode=exclusive" 
#BSUB -n 2560
#BSUB -W 20:00
#BSUB -q standard
#BSUB -P Project_ID

## Optional Directives ------------------------------------
#BSUB -J testjob
#BSUB -o testjob.%J.out
#BSUB -B
#BSUB -N
#BSUB -u my_email@mail.mil

## Execution Block ----------------------------------------
# Environmental Setup
# the following environment variable is not required, but will
# optimally assign processes to cores and improve memory use.
export MPI_DSM_DISTRIBUTE=yes

# cd to your scratch directory
cd ${WORKDIR}
# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${LSB_JOBID}`
if [ ! -d ${JOBID} ]; then
  mkdir -p ${JOBID}
fi
cd ${JOBID}

# copy input data from $HOME
cp ${HOME}/my_data_dir/*.dat .

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
export OMP_NUM_THREADS=${BC_CORES_PER_NODE}
# Your launch command is as follows:
module unload mpi/openmpi
module load mpi/spectrum
mpiexec -n 160 ./my_prog.exe > my_prog.out

## Clean up ------------------------------------------------
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd ${WORKDIR}
rm -f archive_job
cat >archive_job <<END
#!/bin/bash
#BSUB -W 12:00
#BSUB -q transfer
#BSUB -P Project_ID
#BSUB -o archive_job.%J.out
cd ${WORKDIR}

# tar local directory $JOBID into ${JOBID}.tar and put the tar file 
# in remote directory ${ARCHIVE_HOME}/myprogdir:

archive put -C myprogdir -t ${JOBID}.tar $JOBID || echo "ERROR: Archive of ${JOBID}.tar to ${ARCHIVE_HOME}/myprogdir failed...."
archive ls myprogdir/${JOBID}

# Remove scratch directory from the file system.
cd ${WORKDIR}
rm -rf ${JOBID}
END

# Submit the archive job script.
bsub < archive_job
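
The script above sets OMP_NUM_THREADS to the full core count of a node. When more than one MPI task shares a node, one alternative is to divide the node's cores among the tasks so that the total thread count does not exceed the core count. The sketch below illustrates that arithmetic only; it is not a SCOUT recommendation, and the fallback of 160 cores per node is an assumption taken from this guide's examples.

```shell
#!/bin/sh
# BC_CORES_PER_NODE is normally set by the system; 160 is an ASSUMED
# fallback so the sketch also runs outside a batch job.
BC_CORES_PER_NODE=${BC_CORES_PER_NODE:-160}
TASKS_PER_NODE=2          # MPI tasks placed on each node

# Give each MPI task an equal share of the node's cores.
OMP_NUM_THREADS=$(( BC_CORES_PER_NODE / TASKS_PER_NODE ))
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```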

9. Additional Resources

For additional information on the LSF queueing system, consult the IBM online guide:
https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_welcome/lsf_welcome.html.
