Another Year in Supercomputing

Since late 2007 I have been involved in the field of high performance computing. Initially, this was at the Victorian Partnership for Advanced Computing, but just before that organisation closed its doors in December 2015 I accepted a similar role at the University of Melbourne. The end of the year provides a reason for reflection, an annual report if one likes, and whilst activities not related to my vocation and profession will be dealt with in a subsequent entry, the opportunity is taken here to review workplace activities and, in particular, changes in the environment for the University's general HPC system, Spartan. Spartan now has 6159 accounts across 2109 projects in diverse disciplines including the life sciences, engineering, economics, mathematics, and more, and has been cited in 62 papers in the past year.

Some of those papers led to presentations to the Research Computing Services (RCS) team through the Cultural Working Group (CWG), which I have chaired for the past two years, a role that included responsibility for organising these talks. In total, six presentations were held this year, with a personal favourite being one from two researchers at the Department of Infrastructure Engineering on the use of AI algorithms, a supercomputer (Spartan), and robotics to sort plastic waste. The CWG was formed in 2020 following recognition from a staff survey that not all was well in RCS in terms of staff awareness of the group's objective, work between the different groups within RCS, transparency in decision-making, involvement and influence in decisions, career-progression opportunities, and job security. The staff-led CWG (with one management representative) made a concerted effort across those targeted areas and, following a survey in the middle of this year, substantial improvements were found in every criterion. At the end of this year, just after the last tech/researcher presentation, it gave me great pleasure to say that whilst operations would continue, the group had succeeded in achieving its objectives and could close down as a formal body and as a successful project.

A very large part of my role at the University consists of training postgraduate and postdoctoral researchers in how to use the system. This year included some 24 days of workshops involving close to 500 participants, roughly on par with other years and deliberately pulling back a bit from the first year of COVID, when over 40 such workshops were conducted. Of particular note, early in the year a review was conducted of usage by those who had received training the previous year, resulting in the very surprising metric that at least 54.14% of cluster utilisation in 2022 came from users after they had received training. I have always emphasised how important HPC training is, but it was astounding to see such a metric as proof. As another form of training this year, I continued with my regular activity as a guest lecturer and tutor for the master's level course Cluster and Cloud Computing. My role in this, previously just a single lecture, has now been extended to six lectures and workshops and is likely to expand in 2024. I must also mention here a presentation on RCS services to the Quantitative and Applied Ecology Research Group, with a future paper in development from that body on software citations.

Another major part of my role is scientific software optimisation and installation. Apart from the usual work in this field, this year had the bonus of Spartan receiving its first major operating system upgrade since it was first turned on in 2015. Changing the underlying major release of the operating system (and indeed, jumping from RHEL v7 to v9) required existing software to be recompiled. In a one-month period, working with demonic fury, I was primarily responsible for around 500 software builds and an expansion in job submission examples. At the same time, Spartan also finally had the opportunity to run the LINPACK tests to be recognised on the list of the world's top supercomputers. It was an award that was long overdue (we've had sufficient performance to be on that list for years) and even then the certificate was for only part of the entire system.

Other activities included establishing the Spartan HPC Champions group among power-users of the system who can provide training advice to other members of their research teams, and continued involvement as a Board member of the international HPC Certification Forum and as an irregular contributor to the EasyBuild code repository. I have no doubt that these and other activities will all continue in 2024. However, there will be an additional role as well: following a necessary and considered restructure of RCS, I have found myself the recipient of a small promotion in role and responsibility. It will be a position I will take with the appropriate seriousness; after all, supercomputing is one of those activities that has made a massive contribution to improving the world and will continue to do so. For the technical staff, it can be challenging and rewarding as they provide researchers with the tools to make great discoveries and inventions. But those staff also need to be in an environment where they feel secure and can flourish, and that means listening to their technical advice, as they actually do know best in such matters. This will certainly be the most significant challenge in the coming year.