Blogs

Multicore World 2016 : A Summary

Multicore World is a small annual international conference held in New Zealand/Aotearoa, sponsored by OpenParallel. I have been fortunate enough to act as MC for all but one of the five conferences since its inception, and this year I also presented a short paper on the introduction of the new HPC/Cloud hybrid at the University of Melbourne.

Foreword to Sequential and Parallel Programming with C and Fortran by Dr. John L. Gustafson

It is finally time for a book like this one.

When parallel programming was just getting off the ground in the late 1960s, it started as a battle between starry-eyed academics who envisioned how fast and wonderful it could be, and cynical hard-nosed executives of computer companies who joked that “parallel computing is the wave of the future, and always will be.”

Reviving a Downed Compute Node in TORQUE/MOAB

The following describes a procedure for bringing up a compute node in TORQUE that is marked as 'Down'. Whilst the procedure, once known, is relatively simple, arriving at it required some research; this document may save others that time.

1. Determine whether the node is really down.

Following an almighty NFS outage, quite a number of compute nodes were marked as "down". However, the two standard tools, `mdiag -n | grep "Down"` and `pbsnodes -ln`, gave significantly different results.
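One way to see exactly where the two tools disagree is to reduce each output to a set of node names and compare the sets. The minimal Python sketch below assumes that the node name is the first whitespace-separated field on every line of both `mdiag -n | grep "Down"` and `pbsnodes -ln` output; check that assumption against your own MOAB and TORQUE versions before relying on it. The helper names are hypothetical.

```python
import subprocess


def node_names(text: str) -> set[str]:
    """Return the first field of every non-blank line (assumed to be the node name)."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def compare(mdiag_out: str, pbsnodes_out: str) -> dict[str, set[str]]:
    """Split nodes into those reported by both tools or by only one of them."""
    moab = node_names(mdiag_out)
    torque = node_names(pbsnodes_out)
    return {
        "both": moab & torque,
        "moab_only": moab - torque,    # Down according to MOAB, not listed by pbsnodes -ln
        "torque_only": torque - moab,  # listed by pbsnodes -ln, not Down according to MOAB
    }


def run(cmd: str) -> str:
    """Run a shell command and return its stdout (shell=True is needed for the pipe)."""
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
```

On a login or management node, `compare(run('mdiag -n | grep "Down"'), run("pbsnodes -ln"))` returns the three sets; the nodes that appear in only one of them are the ones worth investigating first.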

NFS Cluster Woes

A far too venerable cluster (Scientific Linux release 6.2, 2.6.32 kernel, Opteron 6212 processors) with more than 800 user accounts uses NFSv4 to access storage directories. It has a typical architecture: a management node, a login node, and a number of compute nodes. The directory /usr/local is on the management node and is mounted on the login and compute nodes. User and project directories are distributed across two storage arrays, appropriately named storage1 and storage2.


[root@edward-m ~]# cat /etc/fstab

Can processes survive after shutdown?

I had a process in an "uninterruptible sleep" state. Trying to kill it is, unsurprisingly, unhelpful. All the literature on the subject says that it cannot be killed, and the literature is right: it's called "uninterruptible" for a reason. An uninterruptible process is in a system call that cannot be interrupted by a signal (such as SIGKILL or SIGTERM).
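On Linux, a process in uninterruptible sleep shows the state code `D`, both in the STAT column of `ps` and in the field that follows the command name in `/proc/<pid>/stat`. The minimal Python sketch below lists such processes; it reads the state after the *last* `)` because the command name in parentheses may itself contain spaces or parentheses. The function names are hypothetical, and the sketch only finds D-state processes: it cannot kill them.

```python
import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    open_paren = line.index("(")
    close_paren = line.rindex(")")  # comm may contain ')', so use the last one
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes() -> list[tuple[int, str]]:
    """Return (pid, comm) for every process currently in state 'D' (Linux only)."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited (or is inaccessible) after listdir()
        if state == "D":
            found.append((pid, comm))
    return found
```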

Deleting "Stuck" Compute Jobs

Often on a cluster a user launches a compute job only to discover that they need to delete it (e.g., the data file is corrupt, or there was an error in their application commands or PBS script). In TORQUE/PBSPro/OpenPBS etc., this can be done with the standard PBS command, qdel.


[compute-login ~] qdel job_id

Sometimes, however, that simply doesn't work. An error message like the following is typical: "qdel: Server could not connect to MOM". I think I've seen this around a hundred times in the past few years.

Enduring Problems with HTML Email and Proprietary Attachments

Once upon a time, in a generation past, letters would be received with written text. There was a default form (paper with ink or pencil) and an encoding (in the language of the correspondents). Whilst this may all seem very trivial, it has a particular importance for the subject at hand in the context of contemporary electronic mail. Can the recipient of your message actually read what you've sent them? Could you imagine a situation where people knowingly sent written correspondence in a format that the recipient couldn't read? Have you ever received an email attachment that you couldn't open?

Cluster Installations of GF2X, NTL, and HElib

Installing three related packages on a Linux cluster (GF2X for fast arithmetic, NTL as a number theory library, and HElib for homomorphic encryption) presents some interesting challenges.

GF2X

GF2X "is a C/C++ software package containing routines for fast arithmetic in GF(2)[x] (multiplication, squaring, GCD) and searching for irreducible/primitive trinomials".
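To make the GF(2)[x] operations concrete: a polynomial with coefficients in GF(2) can be stored as an integer whose bit i is the coefficient of x^i. Addition is then XOR, and multiplication is a "carry-less" shift-and-XOR. The Python sketch below illustrates only the arithmetic; the function names are hypothetical, it is not GF2X's API, and GF2X itself uses far more optimised algorithms.

```python
def gf2x_mul(a: int, b: int) -> int:
    """Carry-less product of two GF(2)[x] polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # add (XOR) the current shifted copy of a
        a <<= 1
        b >>= 1
    return result


def gf2x_sqr(a: int) -> int:
    """Squaring: in characteristic 2, (sum a_i x^i)^2 = sum a_i x^(2i)."""
    result = 0
    i = 0
    while a:
        if a & 1:
            result |= 1 << (2 * i)  # spread each bit out to position 2i
        a >>= 1
        i += 1
    return result


def gf2x_mod(a: int, m: int) -> int:
    """Remainder of a divided by m (m != 0) in GF(2)[x]."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)  # cancel the leading term of a
    return a


def gf2x_gcd(a: int, b: int) -> int:
    """Euclid's algorithm in GF(2)[x]."""
    while b:
        a, b = b, gf2x_mod(a, b)
    return a
```

For example, `gf2x_mul(0b11, 0b11)` gives `0b101`, i.e. (x + 1)^2 = x^2 + 1, because the cross terms cancel mod 2; and `gf2x_gcd(0b1011, 0b11)` gives `1`, since the trinomial x^3 + x + 1 has no factor x + 1.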

Download and extract into a sensible place, and change to that directory.


mkdir /usr/local/src/GF2X
cd /usr/local/src/GF2X

Foreword to "Supercomputing with Linux" by Emeritus Professor David Beanland, AO, FTSE, FIEAust

Our era has been defined by the ever-increasing scale and performance of information technology and its impact on many facets of society. Information technology has been made possible by the rapid, and continuing, development of semiconductor technology which enables high speed electronic processing and storage of data. These advances have continued unabated over more than six decades, enabling the realisation of computers with increasing speeds, sophistication and capability to facilitate the solution of complex problems of larger scale, more rapidly and with increased detail.
