Spartan: From Experimental Hybrid towards a Petascale Future
Previous presentations to eResearch Australasia described the implementation of Spartan, the University of Melbourne’s general-purpose HPC system. Initially, this system was small but innovative, arguably even experimental. Features included extensive use of cloud infrastructure for compute nodes, OpenStack for deployment, Ceph for the file system, RoCE for the network, Slurm as the workload manager, EasyBuild and Lmod, etc.
Based on consideration of job workload and basic principles of sunk, prospective, and opportunity costs, this combination maximised throughput on a low budget, and attracted international attention as a result. Flexibility in design also allowed the introduction of a large LIEF-supported GPGPU partition, the inclusion of older systems from Melbourne Bioinformatics, and departmental contributions. Early design decisions meant that Spartan has been able to provide performance and flexibility, and as a result continues to show high utilisation and job completion (close to 20 million), with overall metrics well what would be a “top 500” system. The inclusion of an extensive training programme based on andragogical principles has also helped significantly.
Very recently, Spartan has undergone some significant architecture modifications, and this report on them will be of interest to other institutions. The adoption of the Spectrum Scale file system has further improved scalability, performance, and reliability, along with the adoption of a pure HPC environment with a significant increase in core count, designed to address workload changes and, especially, queue times. Overall, these new developments in Spartan are designed to be integrated into the University’s Petascale Campus Initiative (PCI).
Presentation to eResearchAustralasia 2020