U.S. Army Research Laboratory DoD Supercomputing Resource Center

Clarke on Coding

What's all this Visualization Cluster Stuff Anyhow?

By Jerry A. Clarke, Computer Scientist, SciVis Team Lead, ARL MSRC

Have you ever "really" watched Shaquille O'Neal down in the post? Everyone in the arena knows he's going to get the ball, particularly the poor player assigned to defend him. Yet time and time again he posts his man up, receives the feed into the post, and throws it down. It's not that the defense did anything wrong; sometimes more is just better! Shaq generates a ton of offense, and we work for the Department of Defense, which naturally leads us to Scientific Visualization of enormous datasets.

More is Better!

Today's High Performance Computing clusters regularly generate gigabytes to terabytes of output from a normal production run. If we use the shock physics code CTH as an example, production runs typically utilize grids in the tens to hundreds of millions of cells. If data from a hundred or so iterations is saved and each cell has five to six values, well, pretty soon we're talking about some serious data. Try to visualize this amount of data on a serial machine and you're in for a bit of a wait.
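To put rough numbers on "serious data", here is a back-of-the-envelope sketch in Python. The cell counts, values per cell, and iteration count come from the ranges above; the 4 bytes per value (single-precision floats) and the specific grid sizes picked from those ranges are assumptions for illustration, not CTH's actual output format.

```python
def dataset_bytes(cells, values_per_cell, iterations, bytes_per_value=4):
    """Raw size of the saved output, ignoring file-format overhead.

    bytes_per_value=4 assumes single-precision floats (an assumption,
    not something stated about CTH output).
    """
    return cells * values_per_cell * iterations * bytes_per_value


# Low end: tens of millions of cells, five values per cell, 100 iterations.
low = dataset_bytes(cells=10_000_000, values_per_cell=5, iterations=100)

# High end: hundreds of millions of cells, six values per cell, 100 iterations.
high = dataset_bytes(cells=500_000_000, values_per_cell=6, iterations=100)

print(f"low end:  {low / 1e9:.0f} GB")
print(f"high end: {high / 1e12:.1f} TB")
```

Under those assumptions the estimate spans roughly 20 GB to 1.2 TB for a single run, which lines up with the gigabytes-to-terabytes range quoted above.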

The ARL scientific visualization team learned this firsthand. A recent CTH simulation ran on the Stryker cluster using 2048 processors, or 9 teraflops of compute power. Running the simulation to 1200ms required five days, the equivalent of over 28 CPU-years (2048 processors times 5 days, divided by 365). The resulting isosurfaces contained over 50 million polygons per iteration. Trying to render these interactively on our old workhorse SGI Onyx 4 IR system was just asking too much of a friend who has served us tirelessly for many years.

Linux Networx LS-V Ultimate Visualization Cluster

Clearly, we needed something more. Utilizing a cluster of 10 Opteron workstations with NVidia FX3000 graphics cards on a Gigabit-Ethernet switch, we used the open source visualization tool ParaView to interactively visualize the dataset (and also make a great demo). The key to making this work was parallel rendering. The polygons are distributed among the processors in geometrically close piles using a parallel k-d tree algorithm. Each processor renders its pile of polygons to the graphics card, reads back the image, compresses the data, and sends it back to the desktop display, where the results are composited to provide a final image. In this scalable fashion, enormous datasets can be visualized interactively.
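For the curious, the two ideas at the heart of that pipeline can be sketched in a few lines of Python. This is a serial toy, not ParaView's implementation: the function names are hypothetical, polygons are stood in for by random centroids, and the compositing step assumes a simple nearest-depth-wins merge, which is one common approach rather than something the cluster is confirmed to use. Rendering, image readback, and compression are left out entirely.

```python
import random


def kd_partition(centroids, indices, num_piles, depth=0):
    """Split polygon indices into num_piles geometrically close piles.

    Each level sorts along one axis (x, y, z in turn) and cuts the list,
    in the spirit of a k-d tree. The cut is proportional so that pile
    counts that are not powers of two (such as 10) stay balanced.
    """
    if num_piles == 1:
        return [indices]
    axis = depth % 3
    ordered = sorted(indices, key=lambda i: centroids[i][axis])
    left_piles = num_piles // 2
    cut = len(ordered) * left_piles // num_piles
    left = kd_partition(centroids, ordered[:cut], left_piles, depth + 1)
    right = kd_partition(centroids, ordered[cut:], num_piles - left_piles, depth + 1)
    return left + right


def depth_composite(layers):
    """Merge per-node (colors, depths) images into one final image.

    For every pixel, keep the color from whichever node saw the nearest
    surface (smallest depth). On a tie the earlier layer wins.
    """
    num_pixels = len(layers[0][0])
    final_color = [None] * num_pixels
    final_depth = [float("inf")] * num_pixels
    for colors, depths in layers:
        for p in range(num_pixels):
            if depths[p] < final_depth[p]:
                final_depth[p] = depths[p]
                final_color[p] = colors[p]
    return final_color


# 1000 stand-in polygon centroids split across 10 nodes.
random.seed(0)
centroids = [(random.random(), random.random(), random.random()) for _ in range(1000)]
piles = kd_partition(centroids, list(range(1000)), num_piles=10)
print([len(pile) for pile in piles])

# Two nodes each rendered the same three pixels; keep the nearest per pixel.
node_a = (["red", "red", "red"], [0.5, 0.9, 0.2])
node_b = (["blue", "blue", "blue"], [0.7, 0.1, 0.2])
composited = depth_composite([node_a, node_b])
print(composited)
```

Because each node only ever draws its own pile, adding nodes shrinks the per-node polygon count while the amount of image data each node sends back stays fixed by the screen resolution. That is what makes the approach scale.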

Infiniband is Better too

So if 10 nodes on a Gig-E switch is good, 64 nodes on an Infiniband interconnect would be better, right? Of course right! ARL is installing a 64-node system, with two dual-core processors per node, this summer as part of the High Performance Computing Modernization Program (HPCMP) Technology Insertion for 2006 (TI-06). In addition to the four cores, each node will also have a pair of NVidia FX4500 graphics cards with an SLI bridge. The visualization cluster is being built by Linux Networx and their partners, with oodles of fast disk storage, in seven handsome 19" cabinets.

In addition to the low-latency Infiniband network, the system will greatly benefit from the availability of PCI Express on the motherboards. PCI Express allows the images to be read back from the FX4500 graphics cards several times faster than is possible with cards utilizing the AGP slot. That means the entire compositing process during parallel rendering gets a whole lot faster.

How it Works

In practice, a user will reserve several nodes of the cluster for visualization. Typically running either ParaView or EnSight DR (Distributed Rendering), the user will interact with enormous datasets via the desktop while all of the heavy lifting is being done by the cluster behind the scenes. Since the individual un-composited images are of little interest, there will not be any monitors attached to the output of the graphics cards.

This scalable approach to visualization will allow us to interactively explore datasets that are beyond our current capability. While batch visualization techniques can be utilized to some extent, using them is somewhat like tuning one of those old dial radios with the volume turned down. No, to really understand what is happening in these high-fidelity, physics-based simulations, you need parallel interactive visualization; you need the Shaq of visualization systems.