MATLAB R2012b DCS Job Submission

MATLAB (registered trademark implied in this post) is a popular closed-source graphical product for matrix mathematics. Whilst in most cases I certainly prefer the highly compatible open-source competitor, GNU Octave, for all the usual well-founded reasons, MATLAB does have a small mountain of libraries and a small fortune backing it, neither of which is always available to the GNU Octave people. So in some cases, invariably based on user requests, some dealings with MATLAB are required.

MATLAB can run on clusters in parallel through a combination of encoding and decoding from the local machine using PCT (Parallel Computing Toolbox) and DCS (Distributed Computing Server), both of which have been described in the past. Note that this is different to the single-core headless jobs that can also be run on a cluster (and are somewhat easier to implement, if slower on difficult tasks).

Assuming that one has MATLAB PCT installed and a cluster with MATLAB DCS, jobs can be submitted in parallel to the cluster. The submission syntax, however, has changed significantly from R2012b onwards compared with the earlier versions illustrated in previous posts. The following is for the new version.

1. Submission Files

The README file must be followed and the relevant files copied as requested.

The path for Linux clients for R2012b (for example) is:

$matlabroot/toolbox/distcomp/examples/integration/old/pbs/nonshared/README
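
As a minimal sketch, the README and the integration scripts that sit alongside it can be inspected from within MATLAB on the client. The path components below simply mirror the R2012b path given above and may differ in other releases; nothing is copied here, since the destination is whatever the README asks for.

```matlab
% Build the path to the nonshared PBS integration scripts.
% Assumption: the R2012b directory layout shown above.
integrationDir = fullfile(matlabroot, 'toolbox', 'distcomp', 'examples', ...
    'integration', 'old', 'pbs', 'nonshared');

% Print the README, then list the files it refers to.
type(fullfile(integrationDir, 'README'));
dir(integrationDir);
```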

2. Configuring SSH on Linux

Run ssh-keygen to create a new key (if you don't already have one) and set a passphrase.
Run ssh-copy-id $user@example.org to copy your ID to the cluster.
Run ssh-add to add the newly created key to your existing ssh-agent configuration.

3. Initial Configuration

MATLAB PCT requires some initial configuration of the scheduler to say what the name of the cluster is, where MATLAB is installed on the cluster, where the MATLAB temporary files are to be stored there and where they should be created on your PC.

Linux users will need to use ssh-add to unlock their key for this session.

This code is mostly common to both the examples below and should be run before each of them.


% Create a generic scheduler object; job files are staged locally in /tmp/.
cluster = parallel.cluster.Generic( 'JobStorageLocation', '/tmp/' );
% The client and the cluster do not share a filesystem.
set(cluster, 'HasSharedFilesystem', false);
% Change if you're using a different version.
set(cluster, 'ClusterMatlabRoot', '/usr/local/matlab/default');
set(cluster, 'OperatingSystem', 'unix');
clusterHost = 'trifid.vpac.org';
% Insert your username here in place of $username.
remoteJobStorageLocation = '/home/$username/matlab';
% Hook up the submission functions copied in step 1.
set(cluster, 'IndependentSubmitFcn', {@independentSubmitFcn, clusterHost, remoteJobStorageLocation});
set(cluster, 'CommunicatingSubmitFcn', {@communicatingSubmitFcn, clusterHost, remoteJobStorageLocation});
set(cluster, 'GetJobStateFcn', @getJobStateFcn);
set(cluster, 'DeleteJobFcn', @deleteJobFcn);
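
Since this configuration has to be repeated before every submission, one option is to save it as a cluster profile and reload it in later sessions. The following is only a sketch: the profile name 'trifidPBS' is a hypothetical name chosen for illustration, and it assumes that saveAsProfile and parcluster behave with a generic scheduler as they do with other cluster types.

```matlab
% Assumption: 'trifidPBS' is a hypothetical profile name; choose your own.
saveAsProfile(cluster, 'trifidPBS');

% In a later session, recreate the cluster object from the saved profile
% instead of repeating the set() calls above.
cluster = parcluster('trifidPBS');
```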

4. Serial Job on a Cluster

The following code sequence is an example of how to submit a number of individual tasks, each doing independent work, as a single job.


j = createJob(cluster);
% Each task returns one output: a 3x3 matrix of random numbers.
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
submit(j);
% Wait for all those tasks to finish.
% NB: This could take some time if you have to wait in the queue!
wait(j);
% fetchOutputs replaces getAllOutputArguments in the new job interface.
results = fetchOutputs(j);
celldisp(results);
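
Because time in the queue is unpredictable, it may be preferable to bound the wait and to check for failed tasks before collecting results. The following is a rough sketch of that idea using the job j from above; the 600-second limit is an arbitrary value chosen for illustration.

```matlab
% Wait up to 600 seconds (an assumed, arbitrary limit) for the job to finish.
finished = wait(j, 'finished', 600);
if ~finished
    fprintf('Job is still in state "%s"; try again later.\n', j.State);
else
    % Report any task that ended with an error.
    for t = 1:numel(j.Tasks)
        if ~isempty(j.Tasks(t).ErrorMessage)
            fprintf('Task %d failed: %s\n', t, j.Tasks(t).ErrorMessage);
        end
    end
end
```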

5. Parallel Job on a Cluster

This submits a parallel job running on 2 processors that returns a series of random numbers.


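j = createCommunicatingJob(cluster, 'Type', 'spmd');
% A communicating job has a single task, which every worker runs.
createTask(j, @rand, 1, {3});
% Run on 2 processors.
cluster.NumWorkers = 2;
submit(j);
% Wait for the job to finish before collecting the results.
wait(j);
results = fetchOutputs(j);
celldisp(results);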