Installing NAMD 2.7b1 with CUDA on a 64-bit AMD Opteron Cluster running CentOS 5 Linux

NAMD is a parallel molecular dynamics code for large biomolecular systems. Version 2.7b1 has some advantages over 2.6 and an extensive user's guide. CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA providing a computing engine in NVIDIA graphics processing units or GPUs. It should make NAMD awesome, if it works.

Like Troy McClure, you might remember NAMD from earlier installs. Most of those instructions still apply. In particular, note that you cannot build a CUDA version of NAMD with versions prior to 2.7b1.

Download and unpack as per the previous instructions. Extensive testing at VPAC has suggested that the Intel compilers work best, so set the following environment variables (hopefully you use modules):


module load openmpi/1.3.2-intel
module load fftw/2.1.5-intel-openmpi
module load tcl/8.5.3
module load intel/11.0
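
Before building anything, it is worth a quick check that the compilers and MPI wrappers these modules provide are the ones actually on your path (standard commands, nothing NAMD-specific):

which icc mpicxx
mpicxx --showme        # show the underlying compiler command the Open MPI wrapper will call
icc --version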

Build and test the Charm++/Converse library with the following commands:


tar xf charm-6.1.tar
cd charm-6.1
./build charm++ mpi-linux-x86_64 --no-shared -O -DCMK_OPTIMIZE=1
cd mpi-linux-x86_64/tests/charm++/megatest
make pgm
mpiexec -n 2 ./pgm

I'm not going to make the 'charm' pun a second time.

Make your way back up to the source directory and either follow the instructions in the readme.txt for installing the TCL and FFTW libraries or, in this particular case, use the modules loaded above (the parallel Charm++ build and test would have failed if you had not loaded the openmpi module). Then make a directory and build for each FFTW, Open MPI, GCC and PGI combination as appropriate.

Because NAMD's config script does not offer a --prefix option, one needs to create a new directory:


mkdir /usr/local/src/NAMD/2.7b1-openmpi-intel-cuda
cp -r /usr/local/src/NAMD/NAMD_2.7b1_Source/* /usr/local/src/NAMD/2.7b1-openmpi-intel-cuda/
cd /usr/local/src/NAMD/2.7b1-openmpi-intel-cuda

Note that there are significant differences between FFTW 2.x and FFTW 3.x; the former works, and I haven't tried the latter.

Go into the arch directory and modify the Linux-x86_64.fftw and Linux-x86_64.tcl files so that the former includes the path FFTDIR=/usr/local/fftw/2.1.5-intel-openmpi and the latter TCLDIR=/usr/local/tcl/8.5.3 and TCLLIB=-L$(TCLDIR)/lib -ltcl8.5 -ldl.
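
For reference, after the edit the relevant lines in the two arch files read roughly as follows (the install prefixes are specific to this site; point them at wherever your FFTW 2.1.5 and Tcl 8.5.3 builds actually live):

# arch/Linux-x86_64.fftw
FFTDIR=/usr/local/fftw/2.1.5-intel-openmpi

# arch/Linux-x86_64.tcl
TCLDIR=/usr/local/tcl/8.5.3
TCLLIB=-L$(TCLDIR)/lib -ltcl8.5 -ldl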

Then configure and make.


./config Linux-x86_64-icc --charm-arch mpi-linux-x86_64 --with-cuda
cd Linux-x86_64-icc
make

... and it crashes gloriously. It gets very confused with the idlepoll definition in src/BackEnd.C.

src/BackEnd.C(135): error: identifier "idlepoll" is undefined.

This is a bug and has been filed as such. So let's comment that out until we can find out how necessary it really is.


//#ifdef NAMD_CUDA
// if ( ! idlepoll ) {
// NAMD_die("Please add +idlepoll to command line for proper performance.");
// }
//#endif

This hack is so dirty I had to wash my hands afterwards. Nevertheless, run make clean and then make again, and it works.
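
In other words, from the Linux-x86_64-icc build directory:

make clean
make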

Nota bene: I have received confirmation of the bug. The above obviously works, but the correct solution is as follows:

That is a bug in NAMD. idlepoll is only defined in the net version of charm++, not the mpi-linux-x86_64 version that you are compiling NAMD against.

That code should be changed to:


#if defined(NAMD_CUDA) && CMK_NET_VERSION
  if ( ! idlepoll ) {
    NAMD_die("Please add +idlepoll to command line for proper performance.");
  }
#endif

In the Linux-x86_64-icc directory there will now be a namd2 binary. Copy this to /usr/local/namd/2.7b1-openmpi-intel-cuda and create a new module file (not a symlink to an existing one), because it requires different pre-installed modules (e.g., cuda and the Intel compilers).
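
As a rough sketch only, such a modulefile might look like the following; the prerequisite module names (particularly the cuda one) and the install path are assumptions about this particular site, so substitute your own:

#%Module1.0
## namd/2.7b1-openmpi-intel-cuda -- minimal sketch, site-specific paths and prereqs
prereq          openmpi/1.3.2-intel
prereq          fftw/2.1.5-intel-openmpi
prereq          tcl/8.5.3
prereq          intel/11.0
prereq          cuda/2.3
prepend-path    PATH /usr/local/namd/2.7b1-openmpi-intel-cuda

After a module load of the above, namd2 can be launched through mpiexec in the usual way.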

The good Dr. Mike Kuiper ran some tests comparing the GPU version of NAMD with the normal version and received the following truly impressive results, roughly an elevenfold reduction in time per step:


regular NAMD
Single node tango:
TCL: Running for 5000000 steps
Info: Initial time: 1 CPUs 1.20061 s/step 6.94798 days/ns 170396 kB memory
Info: Benchmark time: 1 CPUs 1.19534 s/step 6.91746 days/ns 170892 kB memory
Info: Benchmark time: 1 CPUs 1.19519 s/step 6.91663 days/ns 171276 kB memory
Info: Benchmark time: 1 CPUs 1.18844 s/step 6.87757 days/ns 171828 kB memory


CUDA NAMD
Single node + GPU:
Info: Initial time: 1 CPUs 0.105493 s/step 0.610495 days/ns 154.206 MB memory
Info: Benchmark time: 1 CPUs 0.105549 s/step 0.610814 days/ns 154.578 MB memory
Info: Benchmark time: 1 CPUs 0.105243 s/step 0.609047 days/ns 154.904 MB memory
Info: Benchmark time: 1 CPUs 0.105273 s/step 0.609219 days/ns 155.172 MB memory

Further Update

An install was made using the CVS repository on April 6, 2010. This required two minor modifications to the above instructions.

Firstly, the charm directory is now charm-6.1.3, rather than charm-6.1.

Secondly, we wanted to build against CUDA 3.0 rather than 2.3, and thus had to modify ../arch/Linux-x86_64.cuda to point at the CUDA 3.0 installation and change LIBCUDARTSO=libcudart.so.2 to LIBCUDARTSO=libcudart.so.3.
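
For example, assuming the CUDA 3.0 toolkit lives under /usr/local/cuda/3.0 (a site-specific guess) and that the arch file locates it through its CUDADIR variable, the edited lines would read along these lines:

# arch/Linux-x86_64.cuda (only the lines that change)
CUDADIR=/usr/local/cuda/3.0
LIBCUDARTSO=libcudart.so.3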