As dataset size and complexity grow, researchers increasingly need additional computational power for processing. A preferred choice is high performance computing (HPC) which, due to its physical architecture, operating system, and optimised application installations, is best suited to such processing. However, HPC systems have historically been less effective at visual display, least of all in an interactive manner, leading to a general truism of "compute on the HPC, visualise locally".
There is much that irks me in academia. The way that disciplines are almost randomly assigned to artium, scientiae, or legum, without any reference to their means of verification or falsification. Or, for that matter, the Dewey (or Universal) Decimal Classification for libraries, which, in its insanity, places computer applications in the same category as "Fundamentals of knowledge and culture" and "Propaedeutics". One could also ask why the category "Dead languages of unknown affiliation" belongs with "Caucasian languages". I suppose most of them are "near dead", right?
Then there is the eye-watering level of digital illiteracy among academics, researchers, students, and professional staff. It is little wonder that closed-knowledge academic journals and proprietary software companies fleece the university sheep and make out like bandits. They don't even realise that they've been robbed, such is the practical ignorance of the lofty principles that they espouse. Ever received a document in a proprietary format explaining how important it is to make content accessible for the visually impaired? Yeah, it's like that all the time: a combination of hypocrisy and willful ignorance.
But I reserve a special spot in my hell for referencing systems.
"EAZY is a photometric redshift code designed to produce high-quality redshifts for situations where complete spectroscopic calibration samples are not available", which is a pretty excellent project. However, it has a few issues that are illustrative of typical problems when developers aren't thinking in terms of operations. I like this software, it carries out a valuable scientific task with ease, which will be awesome in an HPC environment, especially when run as a job array.
A common need among those who engage in large scale image processing is to assign a watermark of some description to their images. Further, so I have been told, it is preferable to have multiple watermarks with slightly different kerning depending on whether the image is portrait or landscape. Thus there are two functions to this script: one for sorting a directory full of images according to whether they are portrait or landscape, and a second to apply the appropriate watermark. The script is therefore structured as follows; witness the neatness and advantages of structured coding, even in shell scripts. I learned a lot from first-year Pascal programming.
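A minimal sketch of that two-function structure might look like the following; it assumes ImageMagick (`identify`, `composite`) and two watermark files, `watermark-portrait.png` and `watermark-landscape.png`, and all file and directory names are illustrative.

```shell
#!/bin/bash
# Sketch: sort images by orientation, then watermark each group.

# Decide orientation from pixel width and height.
orientation() {
    if [ "$1" -ge "$2" ]; then
        echo landscape
    else
        echo portrait
    fi
}

# Sort every JPEG in the current directory into portrait/ or landscape/.
sort_images() {
    mkdir -p portrait landscape
    for img in *.jpg; do
        [ -e "$img" ] || continue
        dims=$(identify -format "%w %h" "$img")   # e.g. "800 600"
        set -- $dims
        mv "$img" "$(orientation "$1" "$2")/"
    done
}

# Apply the matching watermark to each sorted image.
watermark_images() {
    for dir in portrait landscape; do
        for img in "$dir"/*.jpg; do
            [ -e "$img" ] || continue
            composite -gravity southeast "watermark-${dir}.png" "$img" "$img"
        done
    done
}

# Only run the ImageMagick-dependent steps if it is installed.
if command -v identify >/dev/null 2>&1; then
    sort_images
    watermark_images
fi
```

Separating the orientation decision into its own function keeps the logic testable on its own, in good structured-programming fashion.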
With annual conferences since 2007, eResearchAustralasia was hosted online this year due to the impacts of SARS-CoV-2. Typically the conferences are held along the eastern seaboard of Australia, which does bring into question the "-asia" part of the name. Even the conference logo highlights Australia and New Zealand, to the exclusion of the rest of the world. I am not sure how eResearch NZ feels about this encroachment on their territory.
As datasets grow in size and complexity faster than personal computational devices can process them, more researchers seek HPC systems as a solution to their computational problems. However, many researchers lack familiarity with the HPC environment and require training. The formal education curriculum has not yet responded sufficiently to this pressure, leaving HPC centres to provide basic training.
Previous presentations to eResearch Australasia described the implementation of Spartan, the University of Melbourne's general-purpose HPC system. Initially, this system was small but innovative, arguably even experimental. Features included making extensive use of cloud infrastructure for compute nodes, OpenStack for deployment, Ceph for the file system, RoCE for the network, Slurm as the workload manager, EasyBuild and Lmod for software builds and environment modules, and so on.
Secure Shell is a very well established cryptographic network protocol for accessing network services securely, and is the typical way to access high-performance computing (HPC) systems, in preference to various unsecured remote shell protocols such as rlogin, telnet, and FTP. As with any security protocol, it has undergone several revisions to improve its strength, most notably SSH-2, which incorporated Diffie-Hellman key exchange. The security advantages of SSH are sufficient that there are strong arguments that computing users should use SSH "everywhere".
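For everyday use, a host entry in the client configuration file saves retyping connection details; the host alias, hostname, username, and key path below are illustrative assumptions, not anyone's actual credentials.

```
# Hypothetical entry in ~/.ssh/config for an HPC login node.
Host hpc
    HostName login.hpc.example.org
    User myusername
    IdentityFile ~/.ssh/id_ed25519
```

With this in place, `ssh hpc` connects with the right user and key, and the same alias works for `scp` and `sftp` as well.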
Recently, a friend expressed a degree of shock that I could pull old, even very old, items of conversation from emails, Facebook Messenger, etc., with apparent ease. "But I wrote that 17 years ago". They were even dismayed when I revealed that this was all just stored as plain-text files, suggesting that perhaps I was like a spy, engaging in some sort of data collection on them by way of our mutual conversations.
For my own part, I was equally shocked by their reaction. Another night of fitful sleep, where feelings of self-doubt percolate. Is this yet another example that I have some sort of alien psyche? But of course, this is not the case, as keeping old emails and the like as local text files is completely normal in computer science. All my work and professional colleagues do this.
What is the cause of this disparity between the computer scientist and the ubiquitous computer user? Once I realised that the disparity of expected behaviour was not personal, but professional, there was clarity. Essentially, the convenience of cloud technologies and their promotion of applications through Software as a Service (SaaS) has led to some very poor computational habits among general users that have significant real-world inefficiencies.
It may initially seem counter-intuitive, but sometimes one needs to process an image file without actually viewing it. This is particularly the case if one has a very large number of image files and a uniform change is required. The slow process is to open the image files individually in whatever application one is using, make the changes required, save, open the next file, and so forth. This is time-consuming, boring, and prone to error.
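The fast process is a loop on the command line. As a minimal sketch, the following converts every JPEG in the current directory to PNG, assuming ImageMagick's `convert` is installed; the tool choice and file names are illustrative.

```shell
#!/bin/bash
# Batch-convert JPEGs to PNGs without ever opening a viewer.
for img in *.jpg; do
    [ -e "$img" ] || continue          # skip the literal glob when nothing matches
    out="${img%.jpg}.png"              # same basename, new extension
    if command -v convert >/dev/null 2>&1; then
        convert "$img" "$out"          # the uniform change, applied unseen
    fi
done
```

Any other uniform change, such as resizing or rotating, is just a different command inside the same loop.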