lev_lafayette's blog

GnuCOBOL: A Gnu Life for an Old Workhorse

Submitted by lev_lafayette on Sat, 07/16/2016 - 12:48

COBOL is a business-orientated programming language that has been in use since 1959, making it one of the world's oldest programming languages.

Despite being much criticised (and for good reasons), it is still a major programming language in the financial sector, although there is a declining number of experienced programmers.

Spartan: A New Architecture for Research Computing

Submitted by lev_lafayette on Fri, 07/01/2016 - 13:15

Thursday June 30th, at the Gryphon Gallery at the University of Melbourne, was the official launch of the 'Spartan' high-performance computing and cloud hybrid. Speakers at the launch included Dr Stephen Giugni, Director, Research Platform Services; Prof Margaret Sheil, Acting Vice Chancellor of the University of Melbourne; Professor Richard Sinnott, Director, eResearch and Professor of Applied Computing Systems; Mr Bernard Meade, Head of Research Compute Services, Research Platform Services; and yours truly, in my role as HPC Support Engineer, Research Platform Services.

As I argued in my presentation, the great advantage of Spartan is that it is designed around what users need. Based on research from the previous general compute resource, Edward, most people wanted to submit lots of jobs with a relatively small core count and memory footprint using data-parallel approaches, but some really needed large core counts with a fast interconnect. Putting the two types of users on the same system was not ideal. Also, engineers tend to want performance from a system, whereas managers want flexibility. Spartan provides both through its partitioning system. I am convinced that this will be the architecture of future research computing.

Spartan's launch has received extensive media coverage, including high-ranking sites such as HPC Wire, Gizmodo, and Delimiter. In addition to the aforementioned speakers, particular thanks must also be given to Linh Vu, Daniel Tosello, and Chris Samuel for their engineering excellence in helping put together the system, and to Greg Sauter for his project management (and for his photography). Welcome to Spartan!

Universal Numbers Presentation to Linux Users of Victoria, May 3rd, 2016

Submitted by lev_lafayette on Thu, 05/05/2016 - 01:38

Due to underflow and overflow, computers suffer rounding errors. These errors are highly significant; computers make them constantly and with great speed. Sometimes those errors cost millions of dollars, or directly lead to a tragic loss of life. Many of these errors are caused by the way that computers store numbers. The use of scientific notation, as implemented by IEEE floating point standards, is not only imprecise, but also requires too many bits - which is costly in power, time, and money.
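To make the kinds of error described above concrete, here is a minimal Python sketch. It assumes the interpreter uses IEEE 754 double precision for `float`, which is the case on common platforms; the variable names are illustrative only and do not come from the presentation.

```python
import math
import sys

# Rounding: 0.1 and 0.2 have no exact binary representation,
# so their sum is not the double closest to 0.3.
rounding_error = (0.1 + 0.2) != 0.3

# Absorption: near 1e16 adjacent doubles are 2 apart,
# so adding 1.0 is lost entirely.
absorbed = (1e16 + 1.0) == 1e16

# Overflow: going past the largest finite double yields infinity.
overflowed = sys.float_info.max * 2.0

# Underflow: halving the smallest positive (subnormal) double yields zero.
underflowed = 5e-324 / 2.0

print(rounding_error, absorbed, math.isinf(overflowed), underflowed == 0.0)
```

None of these cases raises an exception; the results are silently inexact, which is why such errors can go unnoticed.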

Password Praise in the Future Tense

Submitted by lev_lafayette on Fri, 04/08/2016 - 11:33

Apropos the previous post, I am coming to the conclusion that universities are very strange places when it comes to password policies. Mind you, it shouldn't really come as much of a surprise - the choice of technologies adopted is often so mind-bogglingly strange that one is tempted to conclude that the decisions are more political than technical. Of course, that would never happen in the commercial world. All this aside, consider the password policy of a certain Victorian university.

The constellation is changed, the disposition is the same

Submitted by lev_lafayette on Sat, 03/19/2016 - 11:52

Ars Technica has reported on a relatively small GPU-Linux cluster which can crack by brute force standard eight-character MS-Windows passwords in under six hours. There are, of course, reasons and caveats. Firstly, as online servers will typically block repeated password attempts, this system is most effective against offline password hashes, which then of course can be used for online exploits.
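The arithmetic behind a figure like "under six hours" can be sketched as below. The alphabet size (95 printable ASCII characters) and the guess rate (3.5e11 hashes per second) are assumptions chosen for illustration, not figures taken from this post, and `exhaustive_search_hours` is a hypothetical helper.

```python
def exhaustive_search_hours(alphabet_size, length, guesses_per_second):
    """Hours needed to try every password of exactly `length` characters."""
    keyspace = alphabet_size ** length
    return keyspace / guesses_per_second / 3600.0


PRINTABLE_ASCII = 95      # assumption: upper, lower, digits, symbols, space
ASSUMED_RATE = 3.5e11     # assumption: offline guesses per second

eight_chars = exhaustive_search_hours(PRINTABLE_ASCII, 8, ASSUMED_RATE)
nine_chars = exhaustive_search_hours(PRINTABLE_ASCII, 9, ASSUMED_RATE)

print(f"8 characters: {eight_chars:.1f} hours")
print(f"9 characters: {nine_chars / 24:.1f} days")
```

Under these assumptions each extra character multiplies the search time by the alphabet size, which is why length matters more than any other single factor against an offline attack.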

The Danger of Reusing Old Scripts

Submitted by lev_lafayette on Sun, 03/13/2016 - 04:47

Did you know you can bring down an entire HPC cluster with an old script? Well, this week I had such an experience. As the systems administrator for a seriously aging cluster with over 800 post-graduate and post-doctoral researchers, "stress" is a normal part of daily life (for future reference: it's probably killing me).

Batchholds, Leap Seconds, and PBS Restarts

Submitted by lev_lafayette on Sat, 02/27/2016 - 12:41

It is not unusual for a few jobs to fall into a batchhold state when one is managing a cluster; users often write PBS submissions with errors in them (such as requesting more cores than are actually available). When a sysadmin has the opportunity to do so, they should check such scripts and educate the users on what they have done wrong.
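A check of this kind could be partly automated. The following is a rough sketch only: it handles the numeric `#PBS -l nodes=N:ppn=M` form and nothing else, and the function names and cluster limits (`max_nodes`, `cores_per_node`) are hypothetical values invented for the example.

```python
import re

# Matches numeric requests such as "#PBS -l nodes=2:ppn=8"; ppn is optional.
NODES_PPN = re.compile(r"^#PBS\s+-l\s+.*?nodes=(\d+)(?::ppn=(\d+))?")


def requested_nodes_and_ppn(script_text):
    """Return (nodes, ppn) from the first matching directive, or None."""
    for line in script_text.splitlines():
        match = NODES_PPN.match(line.strip())
        if match:
            return int(match.group(1)), int(match.group(2) or 1)
    return None


def check_request(script_text, max_nodes, cores_per_node):
    """Return a list of problems found; empty if none were detected."""
    request = requested_nodes_and_ppn(script_text)
    if request is None:
        return []
    nodes, ppn = request
    problems = []
    if nodes > max_nodes:
        problems.append(f"requests {nodes} nodes but only {max_nodes} exist")
    if ppn > cores_per_node:
        problems.append(
            f"requests {ppn} cores per node but nodes have {cores_per_node}")
    return problems


# Hypothetical submission script and cluster limits, for illustration only.
example_script = """#!/bin/bash
#PBS -N example_job
#PBS -l nodes=4:ppn=32
#PBS -l walltime=01:00:00
"""
problems = check_request(example_script, max_nodes=10, cores_per_node=16)
print(problems)
```

The messages returned are the sort of thing that could be passed back to the user as part of explaining what went wrong.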

Multicore World 2016 : A Summary

Submitted by lev_lafayette on Sun, 02/21/2016 - 00:55

Multicore World is a small annual international conference held in New Zealand/Aotearoa, sponsored by OpenParallel. I have been fortunate enough to act as MC for all but one of the five conferences since its inception, this year also presenting a short paper on the introduction of the new HPC/Cloud hybrid at the University of Melbourne.

Foreword to Sequential and Parallel Programming with C and Fortran by Dr. John L. Gustafson

Submitted by lev_lafayette on Wed, 02/17/2016 - 23:42

It is finally time for a book like this one.

When parallel programming was just getting off the ground in the late 1960s, it started as a battle between starry-eyed academics who envisioned how fast and wonderful it could be, and cynical hard-nosed executives of computer companies who joked that “parallel computing is the wave of the future, and always will be.”

Reviving a Downed Compute Node in TORQUE/MOAB

Submitted by lev_lafayette on Tue, 02/02/2016 - 23:21

The following describes a procedure for bringing up a compute node in TORQUE that is marked as 'Down'. Whilst the procedure, once known, is relatively simple, working it out required some research, and this document may save others the time.

1. Determine whether the node is really down.

Following an almighty NFS outage, quite a number of compute nodes were marked as "down". However, the two standard tools, `mdiag -n | grep "Down"` and `pbsnodes -ln`, gave significantly different results.
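One way to see exactly where the two tools disagree is to compare the sets of node names each reports. The sketch below assumes the names have already been extracted from each command's output into plain lists; the function and the node names are hypothetical.

```python
def compare_down_nodes(moab_down, torque_down):
    """Split node names by which tool reports them as down."""
    moab, torque = set(moab_down), set(torque_down)
    return {
        "both": sorted(moab & torque),
        "moab_only": sorted(moab - torque),
        "torque_only": sorted(torque - moab),
    }


# Hypothetical node names, for illustration only.
from_mdiag = ["node001", "node002", "node007"]
from_pbsnodes = ["node002", "node007", "node013"]

report = compare_down_nodes(from_mdiag, from_pbsnodes)
for group, names in report.items():
    print(f"{group}: {', '.join(names) or '-'}")
```

Nodes that appear in only one of the lists are the ones worth investigating first, since the scheduler and the resource manager disagree about their state.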

Lev Lafayette
