The Cloud: An Inferior Implementation of HPC

The use of cloud computing as an alternative implementation for high performance computing (HPC) initially seems appealing, especially to IT managers and to users who may find the jump from their desktop application to the command line interface challenging. However, a careful and nuanced review of the metrics should lead to a reconsideration of these assumptions.

The first assumption, popular among managers, is that the cloud switches spending from uncertain CAPEX to more predictable OPEX. "You only pay for what you use" is the common refrain of its promoters. But of course this is not really the case; it is an illusion fostered by a technology that appears to add machines "on demand".

The reality, however, is that these are virtual machines, and the underlying technology is a real CAPEX. Rest assured, under any business logic, that CAPEX is amortized into the customer's OPEX. That is why, per CPU hour, per gigabyte of memory, or per unit of storage, cloud technologies are and must be more expensive than the same technologies provided as physical machines. The provider must recoup their CAPEX (and hopefully make sufficient profit for reinvestment, etc.), and you're paying for it. Sorry, cloud providers, this is what happens when an HPC sysadmin with an MBA turns up and reveals your smoke-and-mirrors trick. The genuine cost advantage of cloud providers is the economies of scale they can achieve, and quite significant ones at that. But it's a sad state of affairs when an IT manager acknowledges that their estimates of hardware requirements are inferior to those of cloud providers.
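As a rough back-of-the-envelope sketch of that amortisation, the figures below (hardware price, lifespan, utilisation, overheads, and margin) are purely illustrative assumptions, not quotes from any provider, but the arithmetic is the point:

```python
# Back-of-the-envelope amortisation of a provider's CAPEX into a per-core-hour
# price. All figures are illustrative assumptions, not real quotes.

node_capex = 10_000.0      # purchase price of one compute node
cores_per_node = 32        # physical cores in that node
lifespan_years = 4         # depreciation period
utilisation = 0.70         # fraction of core-hours actually sold
opex_per_year = 1_500.0    # power, cooling, support per node per year
provider_margin = 0.30     # profit and reinvestment margin

hours_per_year = 365 * 24
sellable_core_hours = cores_per_node * hours_per_year * lifespan_years * utilisation

total_cost = node_capex + opex_per_year * lifespan_years
break_even = total_cost / sellable_core_hours          # owned-hardware cost
cloud_price = break_even * (1 + provider_margin)       # what the bill must cover

print(f"Owned hardware, cost per core-hour: {break_even:.4f}")
print(f"Cloud price per core-hour (CAPEX + OPEX + margin): {cloud_price:.4f}")
```

Economies of scale shrink the CAPEX and OPEX inputs for a large provider, but the amortised CAPEX and the margin never disappear from the bill.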

The second popular assumption is that the cloud is automatically the most effective way to move researchers from desktop machines to HPC. The unfounded belief, again popularised especially by IT managers who are not actually in the business of scientific research, is that learning the command line interface and PBS job submission presents too steep a learning curve. This critically underestimates the intelligence of post-graduate (and often post-doctoral) researchers.

Several years of delivering HPC training courses have convinced me, beyond any reasonable doubt, that such learners are indeed capable of moving from no experience with the command line to MPI programming within three days. This is not because I am an especially good educator, but rather because it is achieved with well-established andragogical techniques. Yes, advanced adult education is a specialist discipline, but so is the software engineering required for building and maintaining cloud technologies... and who is paying for that? (Go on, think again about how costs are passed on to the final consumer.)
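For what it's worth, the "MPI programming" endpoint of such a course is not arcane. A first parallel program of the kind below (using the mpi4py bindings here purely for illustration; a course might equally use C or Fortran) is well within a beginner's reach by day three:

```python
# A first MPI program: each process reports its rank; rank 0 gathers and prints.
# Run with e.g.:  mpiexec -n 4 python hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's identifier within the communicator
size = comm.Get_size()   # total number of processes launched

message = f"Hello from rank {rank} of {size}"
gathered = comm.gather(message, root=0)   # collect every rank's message on rank 0

if rank == 0:
    for line in gathered:
        print(line)
```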

Another issue, surprisingly overlooked, is that cloud technologies must (from first principles) and do provide inferior performance metrics to physical HPC machines. Again, per CPU hour, per gigabyte of memory, or per unit of storage, these are virtual machines, with the added issue of virtualisation latency reducing their effectiveness. Cloud technologies, even when using a form of parallelisation for which they are well suited (OpenMP tasks), still result in a 50% performance decline compared to real systems. In a world of increasingly large datasets, such a performance hit in research processing is assuredly a death-knell for any research institution.
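Readers who doubt the magnitude of that overhead can measure it themselves. A crude but portable approach is to time an identical CPU-bound workload, saturating every local core, on a physical node and on a cloud instance of nominally equal size; the workload size below is an arbitrary illustrative choice:

```python
# Time an identical CPU-bound workload across all local cores, so the same
# figure can be compared between a physical node and a virtual instance.
import time
from concurrent.futures import ProcessPoolExecutor
from os import cpu_count

def burn(n: int) -> float:
    # A deliberately naive O(n^2) floating-point loop to keep a core busy.
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (i * j) % 7
    return total

if __name__ == "__main__":
    workers = cpu_count() or 1
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(burn, [1500] * workers))   # one task per core
    elapsed = time.perf_counter() - start
    print(f"{workers} workers, wall time: {elapsed:.2f} s")
```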

This is not to say there are no uses for cloud technologies in an HPC environment. If money is no object, then the cloud can provide a "quick and (relatively) easy" virtual cluster deployment or extension if you have a dataset that requires processing, and requires it right now. Likewise, if time is of the essence or the opportunity for advanced adult education is simply not available, then the cloud does provide a temporary solution of "dumbing down the interface" rather than "skilling up the user". It should be obvious which is more beneficial for research in the longer run.

But overall, the summary is expressed by the title: The Cloud is an inferior implementation of HPC.