New cluster, 'Owl,' provides CPU power with mega memory
(From left) Miles Gentry, lead systems engineer; Jessie Bowman, systems engineer; and Jeremy Johnson, IT operations manager for Advanced Research Computing all worked on the installation of the new Owl cluster. Photo by Angela Correa for Virginia Tech.

In FY 2024, Advanced Research Computing (ARC) bolstered its high-performance computing (HPC) offerings to the Virginia Tech research community with the addition of a new CPU (central processing unit) cluster, named Owl. Owl boasts 84 nodes with 768 gigabytes of memory each and a combined processor core count of 8,064. The system is augmented with two large-memory nodes with 4 terabytes of memory and one huge-memory node with 8 terabytes of memory.
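A quick back-of-the-envelope calculation, sketched below in Python, shows how those specifications translate into per-core memory; all figures come from the node and core counts above, and the only assumption is that cores and memory are distributed evenly across the 84 nodes.

```python
# Back-of-the-envelope check of Owl's per-node figures, using the
# specifications quoted above (84 nodes, 8,064 cores, 768 GB per node).
nodes = 84
total_cores = 8_064
memory_per_node_gb = 768

cores_per_node = total_cores // nodes                      # 8,064 / 84 = 96 cores per node
memory_per_core_gb = memory_per_node_gb / cores_per_node   # 768 / 96 = 8 GB per core

print(f"{cores_per_node} cores per node, {memory_per_core_gb:.0f} GB of memory per core")
```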
Compared to ARC’s other large CPU cluster, TinkerCliffs, which has 2 gigabytes of memory per core, Owl has 8 gigabytes per core. This allows researchers using Owl to:
Conduct more types of calculations simultaneously
Increase the amount of complexity in data simulations for more detailed results
Run jobs quickly and make any needed adjustments sooner in the research process
Return research results more quickly
Owl is unique among ARC’s HPC offerings in that it uses direct-to-node water cooling, the first system of its kind at Virginia Tech. Copper tubing carries liquid coolant directly to the components in each node that generate the most heat, cooling them via conduction. This highly efficient approach allows the cluster to run at top speed around the clock, virtually eliminating the thermal throttling that can degrade the performance of traditional air-cooled HPC clusters.
The direct-to-node cooling also benefits the data center in Steger Hall where the system is housed. Power usage effectiveness (PUE) measures how efficiently a data center uses energy: it is the ratio of the total energy required to run the facility to the energy used for computing alone. The lower the PUE, the more efficiently the data center is operating, and direct-to-node cooling lowers it significantly.
An air-cooled data center typically has a PUE of 1.5 to 2.0, while a rear-door heat exchanger cooling system, such as the one used by ARC’s flagship cluster TinkerCliffs, can reach a PUE of 1.2 to 1.4. By eliminating the power required for cooling fans, direct-to-node cooling can provide a PUE of 1.1, with the added benefit of allowing the processors to run at maximum speed with no thermal throttling.
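To make those PUE figures concrete, the sketch below compares the total facility energy each cooling approach implies for a hypothetical, constant 100-kilowatt computing load; the load is an assumption for illustration only, while the PUE values are the ones cited above.

```python
# Illustrative facility-energy comparison at the PUE values cited above.
# The 100 kW computing (IT) load is a hypothetical figure for illustration only.
it_load_kw = 100
hours_per_year = 24 * 365

cooling_options = {
    "Air-cooled (PUE 1.75, midpoint of 1.5-2.0)": 1.75,
    "Rear-door heat exchanger (PUE 1.3)": 1.3,
    "Direct-to-node (PUE 1.1)": 1.1,
}

for label, pue in cooling_options.items():
    # PUE = total facility energy / energy used for computing,
    # so total facility energy = computing energy * PUE.
    facility_kwh = it_load_kw * hours_per_year * pue
    print(f"{label}: {facility_kwh:,.0f} kWh per year")
```

Under these assumptions, moving from air cooling to direct-to-node cooling would cut total facility energy by roughly a third for the same computing load.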
With the addition of Owl, ARC can now offer researchers access to two powerful CPU clusters, each with its own advantages. Owl can take on jobs that benefit from its greater memory capacity per core, freeing up capacity on TinkerCliffs and giving more researchers access to the high-performance computing resources they need, when they need them.