Book Chapter Proposal: Monitoring HPC Systems Against Compromised SSH

Objective

To describe protections and monitoring against compromised SSH keys on HPC systems.

Scope

To describe the development and application of the SSH cryptographic protocol and its use on HPC systems. To illustrate a prominent case example where compromised SSH credentials affected several major HPC centres in Europe. To illustrate tools and processes developed and used at the University of Melbourne to protect against SSH compromises. To suggest an "all-of-campus" common security system as a future research project.

Structure

Secure Shell (SSH) is a very well established cryptographic network protocol for operating network services securely, and is the typical way to access high-performance computing (HPC) systems in preference to various unsecured remote access protocols, such as rlogin, telnet, and ftp. As with any security protocol, it has undergone several changes to improve its strength, most notably the move to SSH-2, which incorporated Diffie-Hellman key exchange. The security advantages of SSH are sufficient that there are strong arguments that computing users should use SSH "everywhere". Such a proposition is no mere fancy; as an adaptable network protocol, SSH can be used not just for remote logins and operations, but also for secure mounting of remote file systems, file transfers, port forwarding, network tunnelling, and web browsing through encrypted proxies.

Despite the justified popularity and engineering excellence of SSH, in May 2020 multiple HPC centres across Europe found themselves subject to cyber-attacks via compromised SSH credentials. These included, among others, major centres such as the University of Edinburgh's ARCHER supercomputer, the High-Performance Computing Center Stuttgart's Hawk supercomputer, the Jülich Research Centre's JURECA, JUDAC, and JUWELS supercomputers, and the Swiss National Supercomputing Centre (CSCS), with attacks being launched from compromised networks at the University of Krakow, Poland, Shanghai Jiaotong University, PR China, and China Science and Technology Network, PR China. It is speculated that the attackers were making use of GPU infrastructure for cryptocurrency mining, which is certainly the most obvious vector for financial gain.

The phrase "compromised SSH credentials" does not imply a weakness in SSH as such, but rather in practices around SSH key use. As explicitly stated by system engineers, some researchers had been using private SSH keys without passphrases and leaving them in their home directories. These keys would be used to log in from one HPC system to another, as it is not unusual for researchers to have accounts on multiple systems. Users engaging in such an approach were either unaware of, or ignored, the principles of keeping a private key private, encrypting private keys, or making use of an SSH agent. Access to the keys could be achieved through inappropriate POSIX permissions, or through more usual methods of access (e.g., ignoring policies against sharing accounts), with follow-up escalations. Passphraseless SSH keys are common as they are the default when creating a new key with `ssh-keygen` and are convenient to use, without needing to set up an SSH agent. Passphraseless SSH is also offered by default as part of many cloud offerings, as a relatively secure way to provide a new user with access to their virtual machines.
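The permissions risk mentioned above can be checked mechanically. The following is a minimal sketch in Python, assuming that a private key file on which group or other users hold any permission counts as exposed. The function name `is_exposed` is hypothetical and is not one of the tools used at the University of Melbourne.

```python
import os
import stat


def is_exposed(path):
    """Return True if group or other users hold any permission on the file."""
    mode = os.stat(path).st_mode
    # S_IRWXG and S_IRWXO cover read, write and execute for group and others.
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))
```

A key file with mode 0600 passes this check, while one with mode 0644 is flagged.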

Based on experience at the University of Melbourne HPC, it is possible at a system level to script a search with `ssh-keygen` that detects all keys with an empty passphrase, even when they are named differently. Additional complexity is required when parsing non-standard directories and configuration files. This is far more elegant than the commonly suggested technique of conducting a `grep` for 'MII' and similar strings. A further alternative is a test that makes direct use of `libssh` headers. This, however, requires a version of `libssh` that supports the newer OpenSSH key format, which is atypical for HPC systems; these tend to favour stability at the operating system level, even if they offer diverse versions and compilers at the application level. Of course, invoking a different version of `libssh` (e.g., through an environment modules approach) provides an alternative solution, which can be incorporated into a small C program (`key_audit.c`) that tests whether an empty passphrase is accepted for a given keyfile.
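The `ssh-keygen` search described above might look roughly like the following Python sketch. It is not the script used at the University of Melbourne. It assumes that candidate keys can be recognised by a PEM-style private key header on their first line, and that `ssh-keygen -y -P "" -f <file>` exits with status 0 only when the key can be read with an empty passphrase. The function names and the injectable `run` parameter are hypothetical, the latter added so the logic can be exercised without real keys.

```python
import os
import subprocess


def looks_like_private_key(path):
    """Return True if the first line looks like a PEM-style private key header."""
    try:
        with open(path, "r", errors="ignore") as handle:
            first_line = handle.readline(200)
    except OSError:
        return False
    return first_line.startswith("-----BEGIN") and "PRIVATE KEY-----" in first_line


def has_empty_passphrase(path, run=subprocess.run):
    """Ask ssh-keygen to derive the public key using an empty passphrase."""
    result = run(
        ["ssh-keygen", "-y", "-P", "", "-f", str(path)],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Assumption: a zero exit status means the empty passphrase was accepted.
    return result.returncode == 0


def find_unprotected_keys(root, run=subprocess.run):
    """Walk a directory tree and list private keys with no passphrase."""
    hits = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            # Only regular files are opened, so FIFOs and sockets are skipped.
            if not os.path.isfile(path):
                continue
            if looks_like_private_key(path) and has_empty_passphrase(path, run):
                hits.append(path)
    return sorted(hits)
```

Because candidates are identified by content rather than by filename, keys that are not called `id_rsa` or similar are still examined. Run with sufficient privileges, a call such as `find_unprotected_keys("/home")` would return the paths that need follow-up.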

For monitoring, such programs are extremely efficient; a test of more than 3000 user accounts takes less than 1.5 seconds on a contemporary system. Following this, `inotifywait` can be applied so that any new insecure keys are detected immediately instead of waiting for a cron task to initiate. The system can be further strengthened by using SSH key-only logins rather than allowing password authentication, or by restricting password authentication to VPN logins only, through `sshd_config` and two-factor authentication. Shared private keys are prevented by checking for duplicates in the `authorized_keys` files. Further, with `authorized_keys` managed through a repository with version control (e.g., GitHub, GitLab), another layer of protection would exist to prevent multiple users from logging in with the same key. Each key would be a separate file named after its own checksum and served through an `AuthorizedKeysCommand` directive.
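The duplicate check on `authorized_keys` can be illustrated with a short Python sketch. It assumes each entry follows the usual `[options] keytype base64-key [comment]` layout and identifies a key by the SHA-256 digest of its decoded blob, written in the `SHA256:` style that OpenSSH uses for fingerprints. The function names and the `keys_by_user` mapping are hypothetical; how the file contents are gathered per user is left open.

```python
import base64
import binascii
import hashlib
import struct
from collections import defaultdict


def extract_key_blob(line):
    """Return the decoded public key blob from an authorized_keys line, or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    tokens = line.split()
    # Options may precede the key, so try every adjacent (type, data) pair.
    for key_type, encoded in zip(tokens, tokens[1:]):
        try:
            blob = base64.b64decode(encoded, validate=True)
        except (binascii.Error, ValueError):
            continue
        # A genuine blob begins with its own key type as a length-prefixed string.
        prefix = struct.pack(">I", len(key_type)) + key_type.encode("ascii", "ignore")
        if blob.startswith(prefix):
            return blob
    return None


def fingerprint(blob):
    """Return an unpadded base64 SHA-256 digest of the blob, prefixed with SHA256:."""
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def find_shared_keys(keys_by_user):
    """Map each fingerprint seen under more than one user to those usernames."""
    owners = defaultdict(set)
    for user, text in keys_by_user.items():
        for line in text.splitlines():
            blob = extract_key_blob(line)
            if blob is not None:
                owners[fingerprint(blob)].add(user)
    return {fp: sorted(users) for fp, users in owners.items() if len(users) > 1}
```

Comparing decoded blobs rather than whole lines means that differing comments or options do not hide a shared key. If the same digest were used to name per-key files in a repository, a hexadecimal encoding would be a safer choice than base64, which can contain `/`.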

Further research in this area would involve developing a university-wide API offering public keys for arbitrary SSH logins to various systems on the campus. Keys can be stored for user accounts, which are used for git clone actions, pull requests, etc. Access to systems could also be implemented via a zero trust security framework (e.g., BeyondCorp), which would both protect systems from intruders who are already within a network perimeter and provide secure access to users who are outside it.

Biographies

Lev Lafayette is a Senior HPC DevOps Engineer at the University of Melbourne, a role he has held since 2015. Prior to that, he worked in a similar role at the Victorian Partnership for Advanced Computing (VPAC) for eight years. He is an experienced HPC educator and has a significant number of relevant publications in this field.

Narendra Chinnam is a Senior HPC DevOps Engineer at the University of Melbourne. Prior to that, he worked as a Systems Software Engineer in the HPC/AI division at Hewlett-Packard Enterprise (HPE) for thirteen years. He has made significant contributions to the HPE HPC cluster management software portfolio and was a member of several TOP500 cluster deployment projects, including "Eka", the 4th fastest supercomputer in the world as of November 2007.

Timothy Rice is a DevOps/HPC Engineer at the University of Melbourne. In past lives, he was a researcher and tutor in applied mathematics, an operations analyst in the Department of Defence, and a Software Carpentry instructor.

Publication

A proposal for a book chapter is needed from prospective authors before the proposal *submission due date*, describing the objective, scope, and structure of the proposed chapter (no more than 5 pages). With the chapter proposal, please also submit a brief biography of each author. Acceptance of chapter proposals will be communicated to lead chapter authors after a formal double-blind review process to ensure relevance, quality, and originality. The submission of chapter proposals should be sent directly via email to Nitin Sukhija (email: nitin.sukhija@sru.edu) and Kuan-Ching Li (kuancli@gm.pu.edu.tw).

https://sites.google.com/view/cs-hpc-call/home