Lawrence Livermore National Laboratory's GPU-powered Edge cluster.
After last week’s cloud- and GPU-heavy Supercomputing conference, it’s fair to ask whether high-performance computing will ever look the same. The cloud provides on-demand (and sometimes free) resources the likes of which organizations requiring HPC have never had access to without buying their own clusters. GPUs are everywhere and are proving adept at seriously boosting performance for certain types of code. For these reasons alone, it seems likely that HPC architectures of the future will be a lot more virtual and a lot less CPU-centric.
Why Cloud, Why Now
Think about it: even without AWS Cluster Compute and GPU Instances, some scientists were already using the cloud fairly heavily for certain tasks. Pharmaceutical company Eli Lilly shared some of its experiences last week, pointing to an ideal use case — modeling and simulation for drug discovery — that makes the on-demand cloud cost model “amazing.” During a recent call with HPC software vendor Platform Computing, my suspicion was confirmed that interest in Amazon EC2 for HPC applications has increased markedly since Cluster Compute Instances debuted in July.
The reason for this is that Cluster Compute Instances not only consist of multiple cores and relatively high memory, but also sit atop Intel Nehalem processors and a high-throughput, low-latency 10 GbE network. In the latest Top500 supercomputer list, AWS’s Cluster Compute Instances infrastructure ranked No. 231. GPU Instances utilize the same infrastructure, but add two Nvidia Tesla M2050 GPUs into the mix. Performance was high already, and the addition of GPUs just ups the octane level. According to a benchmark test by HPC cloud-resource middleman Cycle Computing, GPU Instances outperform in-house GPU clusters in certain cases.
Yes, security concerns persist for certain applications, and there’s still the tricky issue of calculating bandwidth costs for applications that shuttle lots of data between Amazon’s infrastructure and on-premises resources, but the barriers are falling fast. For HPC users willing to trade performance for security, OpSource just made its eight-core, 64GB instances available. AWS offers an identically sized instance as part of its standard EC2 lineup, but OpSource adds advanced VPN and virtual private cloud options.
In addition, Microsoft is now letting scientists use Windows Azure for free to run genomic queries with the National Center for Biotechnology Information’s BLAST tool. Such queries don’t depend on advanced networking, but the fact that both the very large database and the compute resources are housed in Windows Azure helps mitigate any performance issues that would come with accessing large datasets over the public Internet. It’s free and it’s not their data — where’s the risk?
Why GPUs, Why Now
Even if HPC users don’t embrace cloud computing as heavily as I suspect, there can be little doubt they’ll embrace new GPU-powered architectures. In a field obsessed with speed, GPUs can seriously accelerate performance for massively parallel, multi-threaded workloads. GPUs provided the brunt of the processing power for three of the top four systems on the Top500, and 11 in total. Dell, IBM and HP (as well as many HPC-focused vendors) are all rolling out GPU-based servers and systems. GPUs won’t replace CPUs any time soon (or ever), but even skeptical programmers should come around to writing hybrid code thanks to pressure from both vendors and their bosses to increase performance in a hurry.
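What makes a workload GPU-friendly is that it is data-parallel: the same operation applied independently to millions of elements, so it splits cleanly across as many cores as you can throw at it. A rough illustration of that property (in plain Python using the standard `multiprocessing` module, not actual GPU code — the `simulate` function is a made-up stand-in for a real unit of scientific work):

```python
from multiprocessing import Pool

def simulate(x):
    # Hypothetical stand-in for one independent unit of work,
    # e.g. scoring a single candidate molecule in a drug-discovery sweep.
    return x * x

if __name__ == "__main__":
    inputs = range(1_000_000)
    # Because each unit of work is independent of the others, the pool
    # can fan the inputs out across every available CPU core. A GPU
    # exploits the same independence across hundreds of cores at once.
    with Pool() as pool:
        results = pool.map(simulate, inputs, chunksize=10_000)
    print(results[:3])  # [0, 1, 4]
```

The catch, and the reason hybrid code takes effort, is that only the parallel portion speeds up; the serial glue around it still runs on the CPU.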
I’m not an HPC analyst, but I have some idea how that industry functions. Performance sells, even if it comes at a price. In part, this is because high performance hasn’t previously been available without fairly significant cost. Cloud computing and GPUs change this. Now, HPC users can fully embrace the cloud value proposition of spending a lot less on IT and a lot more on doing business. I think the economics will be too good to pass up.
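The economics here come down to a simple break-even calculation: how many hours of rented capacity equal the total cost of owning a cluster? A back-of-the-envelope sketch, with every number below a hypothetical placeholder rather than actual AWS or hardware-vendor pricing:

```python
# Break-even between renting cloud HPC hours and buying a cluster.
# All figures are hypothetical placeholders, not real pricing.

def breakeven_hours(cluster_cost, yearly_opex, lifetime_years, hourly_rate):
    """Hours of cloud usage at which renting costs as much as owning."""
    total_ownership = cluster_cost + yearly_opex * lifetime_years
    return total_ownership / hourly_rate

# Hypothetical: a $400k cluster costing $50k/yr to power and staff,
# amortized over 3 years, vs. renting equivalent capacity at $100/hr.
hours = breakeven_hours(400_000, 50_000, 3, 100)
print(f"Break-even at {hours:,.0f} cloud hours")  # Break-even at 5,500 cloud hours
```

At those made-up numbers, an organization needing fewer than 5,500 hours of cluster time over three years — roughly a fifth of the cluster’s available hours — comes out ahead renting, which is the cloud value proposition in a nutshell.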