Exploring Issues in Event-Based HPC Cloudbursting

The use of cloud compute, especially for single-node tasks, can provide a more effective allocation of financial resources. Introducing cloudbursting to scheduling systems could ideally provide on-demand compute resources for High Performance Computing (HPC) systems, where queue wait times are a source of user consternation.

Drawing on practical experience with Slurm's cloudbursting capability (an extension of the scheduler's power management features), this presentation describes initial successes and bug discoveries that highlight the problems of replication and latency limiting the scope of cloudbursting. Nevertheless, under such circumstances wrapper scripts for particular subsets of jobs are still considered viable; MOAB/NODUS is an example of this approach.

A presentation to the HPC:AI 2018 Perth Conference