Installing MATLAB DCS/PCT on a Linux Cluster with PBS and TORQUE

MATLAB is a numerical computing environment that provides matrix manipulation, plotting of functions and data, implementation of algorithms, and a high-level programming language. Typically it is run on desktop installs, which is quite problematic if one is working on a large computational problem. As a proprietary program, it really has missed the boat compared to alternatives like GNU Octave, which uses a similar language ("mostly compatible"), albeit without all the environment features, toolboxes, etc. So, in most cases I personally would prefer to use GNU Octave whenever possible.

But suppose for a moment that you work in a high performance computing centre. And suppose that many users, based on the familiarity of their academic environment, have pushed for MATLAB instead. Especially supposing that the producers of said software have (after much prodding) released a distributed computing engine... Well folks, you're on your way to submitting MATLAB jobs on a parallel system.

Essentially how it works is that the user must have MATLAB installed on their desktop system plus a creature called the Parallel Computing Toolbox (PCT). The cluster system must also have MATLAB installed plus a creature called the Distributed Computing Server (DCS) and a license to run on a certain number of nodes (aka 'workers'). The user submits a job from the desktop; the PCT encodes it in a particular format that can be read by the DCS; the DCS creates a pseudo submission script (we'll be using PBS), which carries out the task and sends the information back to the desktop, which decodes and displays the results. Easy, eh?

There is a 157-page (!) install guide for MATLAB as of version R2010b. The following is a heavily abbreviated summary, specific to machines with the topology described above, i.e., a Linux cluster with Torque and PBS for resource management and job scheduling. The specific example below also assumes a Linux client desktop system.
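Before starting, it may be worth confirming that the Torque client commands are actually on the PATH of the cluster's head node. The following is a minimal sketch, not part of the MATLAB install guide; the function name missing_cmds is my own invention for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_cmds() {
    rc=0
    for c in "$@"; do
        if ! command -v "$c" >/dev/null 2>&1; then
            echo "$c"
            rc=1
        fi
    done
    return "$rc"
}
```

On the head node one might then run missing_cmds qsub qstat; no output means both Torque commands were found.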

To install MATLAB on the cluster and desktop

1. Download from the website (or install from the DVD provided)
2. Copy to /usr/local/src directory (e.g., /usr/local/src/MATLAB/R2010b/)
3. In that directory, start up the autoinstaller (./install), or run /dvdpath/install & if installing from the DVD.
4. Install without using the Internet.
5. Accept the software agreement.
6. Provide the file installation key.
7. Enter license key and location of license.dat when prompted
8. Enter the full path to your license file. This may be on a license manager.
9. Select a custom install.
10. Specify an install in /usr/local/matlab/R2010b
11. Select the products to install. Ensure that the MATLAB DCS is selected on the cluster and the Parallel Computing Toolbox on the desktop.
12. Select symlinks to /usr/local/bin
13. Complete installation. Now set up the environment.
14. Copy the decode functions to the MATLAB path of the workers on the cluster. e.g.,

[root@tango-m ~]# cd /usr/local/matlab/R2010b/toolbox/distcomp/examples/integration/old/pbs/nonshared/unix
[root@tango-m unix]# cp pbsNonSharedSimpleDecodeFcn.m /usr/local/matlab/R2010b/toolbox/local
[root@tango-m unix]# cp pbsNonSharedParallelDecodeFcn.m /usr/local/matlab/R2010b/toolbox/local

15. On the desktop, copy the PBS functions to the MATLAB path, e.g.,
lev@isocracy:/usr/local/MATLAB/R2010b/toolbox/distcomp/examples/integration/old/pbs/nonshared/unix$ sudo cp * /usr/local/MATLAB/R2010b/toolbox/local
lev@isocracy:/usr/local/MATLAB/R2010b/toolbox/distcomp/examples/integration/old/pbs/nonshared/unix$ sudo cp pbs* /home/lev/matlab
16. Test following the examples on the external VPAC website.
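
A quick sanity check after the copy steps is to confirm that the two decode functions really did land in toolbox/local on the cluster. This is only a sketch: the function name check_decode_fcns is hypothetical, and the default path is simply the cluster install location used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: report whether the PBS decode functions are present
# in toolbox/local under the given MATLAB root.
# Returns non-zero if either file is missing.
check_decode_fcns() {
    matlab_root="${1:-/usr/local/matlab/R2010b}"
    status=0
    for f in pbsNonSharedSimpleDecodeFcn.m pbsNonSharedParallelDecodeFcn.m; do
        if [ -f "$matlab_root/toolbox/local/$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}
```

Running check_decode_fcns with no argument checks the default cluster path; pass a different MATLAB root as the first argument if your install lives elsewhere.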

There, wasn't that fun? Now you get the chance to test it out, following the example provided for Using MATLAB DCS at VPAC (which was mostly written by me, with contributions by Jin Zhang and Chris Samuel). Yes, you will most certainly need passwordless SSH; otherwise the client will not be able to submit to the server. Note that this is a lot more difficult if you're running MS-Windows; see the above site for links that will help. In any case the following sample MATLAB scripts have been tested.
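
For the passwordless SSH requirement on a Linux desktop, the usual OpenSSH route is a key pair plus ssh-copy-id. The sketch below is an assumption about a typical setup rather than anything VPAC-specific; the function names run and setup_passwordless_ssh and the DRY_RUN switch are hypothetical. By default it only prints the commands it would run.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: set up key-based login to a remote account,
# given as user@host in the first argument.
setup_passwordless_ssh() {
    remote="$1"
    key="$HOME/.ssh/id_rsa"
    # Generate a key pair only if one does not already exist.
    # ssh-keygen asks for a passphrase; an empty one, or one held by
    # ssh-agent, avoids a prompt on every connection.
    [ -f "$key" ] || run ssh-keygen -t rsa -f "$key"
    # Append the public key to the remote account's authorized_keys.
    run ssh-copy-id -i "$key.pub" "$remote"
    # Confirm that login now works without a password prompt.
    run ssh "$remote" hostname
}
```

For example, setup_passwordless_ssh lev@tango.vpac.org prints the commands for review; setting DRY_RUN=0 beforehand actually runs them.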

Serial Job


% Cluster head node and the job data directory on the cluster
clusterHost = 'tango.vpac.org';
remoteDataLocation = '/home/lev/matlab/';
% Configure a generic scheduler object that hands jobs to PBS
sched = findResource('scheduler', 'type', 'generic');
% Job data directory on the desktop
set(sched, 'DataLocation', '/home/lev/matlab')
set(sched, 'ClusterMatlabRoot', '/usr/local/matlab/R2010b');
set(sched, 'HasSharedFilesystem', true)
set(sched, 'ClusterOsType', 'unix');
set(sched, 'GetJobStateFcn', @pbsGetJobState);
set(sched, 'DestroyJobFcn', @pbsDestroyJob);
set(sched, 'SubmitFcn', {@pbsNonSharedSimpleSubmitFcn, clusterHost, remoteDataLocation});
% Create a job with four independent tasks, each returning rand(3,3)
j = createJob(sched)
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
% Submit, wait for the job to finish, then fetch and display the results
submit(j)
waitForState(j)
results = getAllOutputArguments(j);
celldisp(results)

Parallel Job


% Cluster head node and the job data directory on the cluster
clusterHost = 'tango.vpac.org';
remoteDataLocation = '/home/lev/matlab/';
% Configure a generic scheduler object that hands jobs to PBS
sched = findResource('scheduler', 'type', 'generic');
% Job data directory on the desktop
set(sched, 'DataLocation', '/home/lev/matlab')
set(sched, 'ClusterMatlabRoot', '/usr/local/matlab/R2010b');
set(sched, 'HasSharedFilesystem', true)
set(sched, 'ClusterOsType', 'unix');
set(sched, 'GetJobStateFcn', @pbsGetJobState);
set(sched, 'DestroyJobFcn', @pbsDestroyJob);
set(sched, 'ParallelSubmitFcn', {@pbsNonSharedParallelSubmitFcn, clusterHost, remoteDataLocation});
% Create a parallel job; its single task is run by every worker
pjob = createParallelJob(sched)
createTask(pjob, 'rand', 1, {3});
% Request exactly 9 workers
set(pjob,'MinimumNumberOfWorkers',9);
set(pjob,'MaximumNumberOfWorkers',9);
% Submit, wait for the job to finish, then fetch and display the results
submit(pjob)
waitForState(pjob)
results = getAllOutputArguments(pjob)
celldisp(results);