U.S. Army Research Laboratory DoD Supercomputing Resource Center

Cover Story

Clusters 101


By Christopher Slaughter,
Principal Engineer,
ARL MSRC/Raytheon
Figure: Cluster hardware components. Hardware components associated with cluster computing work behind the scenes to deliver HPC performance at an economical price.

The availability of high-speed networks and increasingly powerful commodity microprocessors is making clusters, or networks, of computers an appealing vehicle for cost-effective parallel computing. Clusters, built using commercial off-the-shelf (COTS) hardware components as well as open source software, are playing a major role in redefining the concept of high performance computing.

Since 2000, the number of clusters in the Top 500 list has grown from 28 (5.6%) to 294 (58.8%), and the trend is likely to continue (http://www.top500.org). For Technology Insertion 2004 (TI-04), the ARL MSRC installed two new clusters: a 2304-processor IBM Cluster 1350 using AMD Opteron processors, and a 2048-processor Linux Networx Evolocity II using Intel Xeon EM64T processors. Other institutions have installed even larger clusters, including the number one system on the Top 500 list, the BlueGene/L, used by the Department of Energy, and the number two system, Columbia, at the NASA Ames Research Center. The table on page 5 lists the 15 fastest machines in the world; one of the newest ARL MSRC machines ranks number 13 on this list.

In 1993, Donald Becker and Thomas Sterling developed the first cluster prototype while working at the Center of Excellence in Space Data and Information Sciences (CESDIS) located within the NASA Goddard Space Flight Center. Known as Beowulf, the first cluster consisted of 16 Intel 486 DX4 nodes. This cluster was so inexpensive to build and performed so well for some applications that it became an instant success, and demand for more clusters spread throughout NASA and into the academic and research communities. Clusters were first introduced into the High Performance Computing Modernization Program (HPCMP) in 2000, when test systems were installed at ARL and ASC, and a 512-processor cluster was installed and made operational at the Maui High Performance Computing Center. In 2003, ARL installed a 256-processor production cluster, the first at an MSRC.

There are several types of clusters, including High-Availability (HA) and load-balancing, as well as parallel processing, the type that is typically used in the HPC domain. HA clusters are used to provide uninterrupted services, such as web or email. If one server fails, another picks up its services, so clients do not see an interruption.
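The failover behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Server` and `FailoverPair` classes, the server names, and the use of `ConnectionError` to signal a failed server are all hypothetical, and real HA software also handles health checks, shared state, and network address takeover.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """A hypothetical service host that can be marked as failed."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class FailoverPair:
    """Send requests to the active server; promote the standby on failure."""

    def __init__(self, primary: Server, standby: Server):
        self.active = primary
        self.standby = standby

    def handle(self, request: str) -> str:
        try:
            return self.active.handle(request)
        except ConnectionError:
            # The active server failed: swap roles and retry once,
            # so the client still receives an answer.
            self.active, self.standby = self.standby, self.active
            return self.active.handle(request)


pair = FailoverPair(Server("mail-a"), Server("mail-b"))
print(pair.handle("fetch inbox"))   # answered by the primary
pair.active.healthy = False         # simulate a failure of the primary
print(pair.handle("fetch inbox"))   # answered by the former standby
```

The client calls the same `handle` method before and after the failure, which is the sense in which it does not see an interruption.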

Load-balancing clusters share a workload across multiple servers. A service like Google (http://www.google.com) will typically use load-balanced web servers to spread requests across hundreds of machines so that no one server is ever overloaded and unable to fulfill a request. In this configuration, the servers run a limited set of services.
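One simple way to spread requests is round-robin assignment, sketched below. Round-robin is used here only as an example policy; the article does not say which policy any particular service uses, and the server names and the `round_robin` helper are hypothetical.

```python
from collections import Counter
from itertools import cycle


def round_robin(requests, servers):
    """Assign each request to the next server in a fixed rotation."""
    rotation = cycle(servers)
    return [(request, next(rotation)) for request in requests]


servers = [f"web{i:02d}" for i in range(4)]      # hypothetical server pool
assignments = round_robin(range(10), servers)    # ten incoming requests

# Count how many requests each server received.
load = Counter(server for _, server in assignments)
for server in servers:
    print(server, load[server])
```

With this policy the request counts on any two servers never differ by more than one, which is the property that keeps a single machine from being overloaded when requests are of similar cost.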

Parallel processing clusters use many servers, or nodes, to allow applications to be scaled up from a single processor to thousands of processors, and are usually general purpose, supporting many types of applications.

Traditional supercomputers are Single System Image (SSI) machines: a single copy of the operating system manages all processors in the system as well as the entire memory space. In a parallel processing cluster, every node runs its own copy of the operating system, and the memory in each node is not directly available to other nodes.
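Because each node's memory is private, a parallel application on a cluster has to move data between nodes explicitly. The toy model below illustrates that idea in plain Python, assuming a hypothetical `Node` class with a private `memory` dictionary and an `inbox` that stands in for the interconnect. It runs in a single process and is not an MPI program; it only mimics the scatter, local compute, and gather pattern.

```python
from collections import deque


class Node:
    """One simulated cluster node: private memory plus a message inbox."""

    def __init__(self, rank: int):
        self.rank = rank
        self.memory = {}       # only this node reads or writes it
        self.inbox = deque()   # messages delivered over the interconnect

    def send(self, dest: "Node", payload) -> None:
        dest.inbox.append((self.rank, payload))

    def recv(self):
        return self.inbox.popleft()


def parallel_sum(values, node_count: int):
    nodes = [Node(rank) for rank in range(node_count)]

    # Scatter: every node receives its own slice of the data.
    for node in nodes:
        node.memory["chunk"] = values[node.rank::node_count]

    # Local compute: each node sums only what is in its own memory,
    # then sends the partial result to node 0.
    for node in nodes:
        node.memory["partial"] = sum(node.memory["chunk"])
        if node.rank != 0:
            node.send(nodes[0], node.memory["partial"])

    # Gather: node 0 combines its own partial sum with the messages.
    total = nodes[0].memory["partial"]
    while nodes[0].inbox:
        _sender, partial = nodes[0].recv()
        total += partial
    return total


print(parallel_sum(list(range(1, 101)), node_count=4))
```

On an SSI machine the same sum could read one shared array directly; here node 0 learns the other nodes' results only through messages.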

A typical HPC cluster consists of different types of nodes: login, compute and management nodes.

Login nodes, also called user nodes, are the access points for the cluster. Interactive work is done on the login nodes, including compiling and debugging software.

Compute nodes are dedicated to running applications that are usually scheduled by a batch scheduler. For the TI-04 systems at the ARL MSRC, the Load Sharing Facility (LSF) scheduler from Platform Computing (http://www.platform.com) will be used. LSF manages the resources of the cluster so the applications have access to the full resources of the nodes that they are scheduled on.
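The idea of a scheduler granting jobs exclusive use of whole nodes can be sketched as follows. This is a deliberately simplified first-in, first-out policy written for illustration; it is not how LSF is implemented or configured, and the function name, job names, and node names are hypothetical.

```python
def schedule_fifo(queue, free_nodes):
    """Start queued jobs in order, giving each one exclusive whole nodes.

    queue:      list of (job_name, nodes_needed) in submission order
    free_nodes: iterable of idle node names
    Returns (started, waiting), where started maps job_name -> node list.
    """
    free = list(free_nodes)
    started, waiting = {}, []
    for job_name, nodes_needed in queue:
        # Strict FIFO: once one job has to wait, later jobs wait too.
        if not waiting and nodes_needed <= len(free):
            started[job_name] = free[:nodes_needed]
            free = free[nodes_needed:]
        else:
            waiting.append(job_name)
    return started, waiting


nodes = [f"node{i:03d}" for i in range(6)]           # six idle compute nodes
queue = [("cfd_run", 4), ("md_run", 3), ("post", 1)]  # submission order
started, waiting = schedule_fifo(queue, nodes)
print("started:", started)
print("waiting:", waiting)
```

Production schedulers layer priorities, fair share, and backfill on top of this basic bookkeeping, so a small job such as `post` would often be allowed to run in the idle nodes rather than wait.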

Management nodes are responsible for monitoring the health of the cluster, as well as providing facilities for reloading nodes with a common system image.

A clustered filesystem is utilized to provide common storage across the cluster. These filesystems run on a set of storage nodes and are architected to scale with the size of the cluster. The storage nodes provide the Fibre Channel connection to the disk arrays and share the filesystem across all nodes, so there is no single point of failure, and the network traffic is not limited to only a few nodes. For the IBM 1350 cluster, the General Parallel File System (GPFS) from IBM (http://www1.ibm.com/servers/eserver/clusters/software/gpfs.html) is used. For the Linux Networx cluster, the Lustre file system from Cluster File Systems (http://www.clusterfs.com) is used.
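One common way for a parallel filesystem to spread traffic over many storage nodes is to stripe a file's blocks across them. The sketch below shows round-robin striping in general terms. It is an assumption-laden illustration, not a description of GPFS or Lustre internals, and the `stripe` and `reassemble` helpers are hypothetical.

```python
def stripe(data: bytes, storage_nodes: int, block_size: int):
    """Split data into fixed-size blocks, placed round-robin on nodes."""
    layout = [[] for _ in range(storage_nodes)]
    for offset in range(0, len(data), block_size):
        block_number = offset // block_size
        layout[block_number % storage_nodes].append(
            data[offset:offset + block_size]
        )
    return layout


def reassemble(layout) -> bytes:
    """Read the blocks back in their original order."""
    blocks = []
    depth = max(len(node_blocks) for node_blocks in layout)
    for row in range(depth):
        for node_blocks in layout:
            if row < len(node_blocks):
                blocks.append(node_blocks[row])
    return b"".join(blocks)


data = bytes(range(10))
layout = stripe(data, storage_nodes=3, block_size=4)
for node_id, node_blocks in enumerate(layout):
    print(f"storage node {node_id}: {[list(b) for b in node_blocks]}")
print(reassemble(layout) == data)
```

Because consecutive blocks live on different storage nodes, a large read or write is served by several nodes at once instead of queuing on one.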

One of the most critical components of an HPC cluster is the interconnect, which is the method that the nodes use to communicate with each other. While Ethernet can be used, the best performance for a wide range of applications is provided by a high-speed dedicated interconnect. The IBM and Linux Networx clusters both use the Myrinet interconnect from Myricom (http://www.myricom.com). Myrinet is a low-latency interconnect that can be scaled to tens of thousands of nodes. By using a special version of the Message Passing Interface (MPI) library, all communication between MPI programs will use the high-speed Myrinet interface. Myrinet has a bandwidth of 2 Gbps, twice that of Gigabit Ethernet (gigE), and a latency of 6 microseconds for MPI tasks, approximately 10x lower than gigE.
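To see why both latency and bandwidth matter, a first-order model estimates the time to deliver one message as latency plus message size divided by bandwidth. The sketch below applies that model to the figures quoted above. The gigE latency of 60 microseconds is an assumption derived from the "approximately 10x" statement, not a measured value, and the model ignores protocol overhead and contention.

```python
def transfer_time(message_bytes: int, latency_s: float, bandwidth_bps: float) -> float:
    """Estimated seconds to deliver one message: latency + size / bandwidth."""
    return latency_s + (message_bytes * 8) / bandwidth_bps


# Figures from the text: 2 Gbps and 6 microseconds for Myrinet.
MYRINET = {"latency_s": 6e-6, "bandwidth_bps": 2e9}
# Assumed: 1 Gbps, with latency taken as 10x the Myrinet figure.
GIGE = {"latency_s": 60e-6, "bandwidth_bps": 1e9}

for size in (8, 1024, 1_048_576):
    t_myrinet = transfer_time(size, **MYRINET)
    t_gige = transfer_time(size, **GIGE)
    print(f"{size:>9} bytes: Myrinet {t_myrinet * 1e6:10.2f} us, "
          f"gigE {t_gige * 1e6:10.2f} us, ratio {t_gige / t_myrinet:5.2f}")
```

Under these assumptions, small messages are dominated by latency, so the advantage approaches the 10x latency ratio, while large messages are dominated by bandwidth, so the advantage falls toward 2x. Applications that exchange many small messages therefore benefit most from a low-latency interconnect.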

Compute clusters are here to stay for the foreseeable future. Compared to proprietary RISC and vector processors, commodity processors have closed a significant performance gap in recent years. This trend appears to be holding, even as commodity processors run out of headroom to increase performance through gains in clock speed. Dual-core and multicore commodity processors are on the horizon and will enable cluster systems to continue to offer leading levels of performance per dollar. DARPA’s High Productivity Computing Systems (HPCS) program is on track to produce systems, available at the end of the decade, that will lead to 10x gains in productivity. These systems, if successful, will change the HPC market by offering a value above commodity clusters. Until then, it is difficult to see how purely proprietary systems will compete well on price-performance with commodity-based clusters.