Thursday 19 January 2012

2-Node Cluster with MPICH2 and Benchmarking

Creating a 2-Node Cluster Using MPICH2 on CentOS 5.5 x86


We will use 3 VMs

master    192.168.1.120
node1    192.168.1.121
node2    192.168.1.122

1. Editing /etc/hosts

Master

# vim /etc/hosts
127.0.0.1       localhost.localdomain   localhost
192.168.1.120   master
192.168.1.121   node1
192.168.1.122   node2

# scp /etc/hosts node1:/etc/
# scp /etc/hosts node2:/etc/

2. Creating SSH Keys

Master

# ssh-keygen --> Accept the defaults (press Enter at every prompt)
# cat /root/.ssh/*.pub > /root/.ssh/authorized_keys
# scp -r /root/.ssh/ node1:/root/
# scp -r /root/.ssh/ node2:/root/
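
A quick check that passwordless SSH now works (this just runs hostname on each node; the first connection will ask you to accept the host key fingerprint, answer yes):

# ssh node1 hostname --> should print node1 without asking for a password
# ssh node2 hostname --> should print node2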

3. Install PDSH (Parallel Distributed Shell) to control more than one machine at once

Master

Install Rpmforge Repo

# rpm --import http://apt.sw.be/RPM-GPG-KEY.dag.txt --> If you get an error, the key was already imported
# rpm -Uvh http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el5.rf.i386.rpm --> Or download it with wget and install it locally
# yum install pdsh --> For running a command on all nodes at the same time
# vim /etc/pdsh/machines
node1
node2
master

# pdsh -a uptime --> you should get the uptime of all 3 machines

Note:- If any machine doesn't respond and you are sure all the above configuration is correct, try changing the order of the machines in /etc/pdsh/machines

4. Install a Time Server to Prevent Time Drift Between the Nodes

Install this time server on any other machine, for example the host running these VMs (in my case the host IP is 192.168.1.2)

# yum install ntp
# vim /etc/ntp.conf

#server 0.centos.pool.ntp.org   
#server 1.centos.pool.ntp.org   -->  Comment them
#server 2.centos.pool.ntp.org   

server  127.127.1.0 # local clock
fudge   127.127.1.0 stratum 10     -->  Make sure that they are uncommented

# /etc/init.d/ntpd start
# chkconfig ntpd on

Master

# pdsh -a yum -y install ntp
# vim /etc/ntp.conf

server 192.168.1.2

#server 0.centos.pool.ntp.org        
#server 1.centos.pool.ntp.org         
#server 2.centos.pool.ntp.org          --> Comment them
#server  127.127.1.0 # local clock
#fudge   127.127.1.0 stratum 10    

# scp /etc/ntp.conf node1:/etc/
# scp /etc/ntp.conf node2:/etc/
# pdsh -a /etc/init.d/ntpd start
# pdsh -a chkconfig ntpd on
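
A quick way to confirm the nodes are actually syncing against 192.168.1.2 (ntpq is part of the ntp package installed above):

# pdsh -a ntpq -p --> every node should list 192.168.1.2; after a few minutes the entry gets a '*' when the clock is synchronized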

5. Sharing the /cluster Directory Using NFS

Master

# yum install nfs-utils.i386
# vim /etc/exports

/cluster    *(rw,sync,no_root_squash)

# mkdir /cluster
# /etc/init.d/portmap start
# /etc/init.d/nfs start
# chkconfig nfs on
# chkconfig portmap on
# pdsh -w node1,node2 mkdir /cluster
# pdsh -w node1,node2 yum -y install nfs-utils
# pdsh -w node1,node2 /etc/init.d/portmap start
# pdsh -w node1,node2 mount.nfs master:/cluster /cluster
# pdsh -w node1,node2 chkconfig nfs on
# pdsh -w node1,node2 chkconfig portmap on

Node1 & Node2

# vim /etc/fstab
master:/cluster    /cluster    nfs    defaults    0 0

Note:- If you reboot the VMs, or start them at any other time, make sure the master VM starts first, otherwise the NFS mount on the nodes will fail
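
A quick sanity check that the share is really mounted on both nodes:

# pdsh -w node1,node2 df -h /cluster --> both nodes should report master:/cluster as the filesystem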

6. Creating mpiuser and Its SSH Keys

Master

# pdsh -a groupadd -g 1000 mpigroup
# pdsh -a useradd -u 1000 -g 1000 -d /cluster/mpiuser mpiuser
# pdsh -a yum -y install gcc gcc-c++.i386 compat-gcc-34-g77.i386
$ su - mpiuser
$ ssh-keygen --> Accept the defaults (press Enter at every prompt)
$ cat ~/.ssh/*.pub > ~/.ssh/authorized_keys
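
Because mpiuser's home directory lives on the shared /cluster export, the same keys are visible on every node. As mpiuser, SSH to each machine once so the host key fingerprints get stored in known_hosts (MPICH2 needs prompt-free logins later):

$ ssh node1 hostname
$ ssh node2 hostname
$ ssh master hostname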


7. Installing MPICH2

Master

# yum -y install patch
# cd /cluster && wget http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.4.1p1/mpich2-1.4.1p1.tar.gz
# chown mpiuser.mpigroup -R /cluster
# su mpiuser
$ cd /cluster && tar -xvzf mpich2-1.4.1p1.tar.gz
$ cd mpich2-1.4.1p1 && ./configure --prefix=/cluster/mpich2
$ make && make install
$ vim ~/.bash_profile --> Edit it as follows

PATH=$PATH:$HOME/bin:/cluster/mpich2/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cluster/mpich2/lib

export PATH LD_LIBRARY_PATH

$ source ~/.bash_profile
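
To confirm the new environment is picked up, mpicc and mpiexec should now resolve to the /cluster/mpich2 install:

$ which mpicc mpiexec --> both should point under /cluster/mpich2/bin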
$ vim /cluster/mpiuser/hosts
node1
node2

$ mpiexec -f /cluster/mpiuser/hosts hostname --> the output should be as follows
node1
node2

Note:- Make sure the host key fingerprints of all nodes are saved in mpiuser's known_hosts file (SSH to each node once as mpiuser and accept the fingerprint)

$ mpiexec -n 1 -f  /cluster/mpiuser/hosts /cluster/mpich2-1.4.1p1/examples/cpi --> Test execution with one node
$ mpiexec -n 2 -f  /cluster/mpiuser/hosts /cluster/mpich2-1.4.1p1/examples/cpi --> Test execution with two nodes

Note:- Since we are using VMs on the same physical host you won't notice much improvement, and sometimes the run time will increase instead of decrease, but on real hardware you will be happy with the results.

Now use the cluster to compile a program and run another test

$ mpicc -o /cluster/mpich2-1.4.1p1/examples/icpi /cluster/mpich2-1.4.1p1/examples/icpi.c
$ mpiexec -f /cluster/mpiuser/hosts -n 1 /cluster/mpich2-1.4.1p1/examples/icpi --> When prompted, enter a number of intervals, say 1000000, then repeat the test with -n 2
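
The number to compare between the -n 1 and -n 2 runs is the wall-clock time; a simple way to measure it with the non-interactive cpi example (plain shell time, nothing extra to install):

$ time mpiexec -f /cluster/mpiuser/hosts -n 1 /cluster/mpich2-1.4.1p1/examples/cpi
$ time mpiexec -f /cluster/mpiuser/hosts -n 2 /cluster/mpich2-1.4.1p1/examples/cpi --> on real hardware the second run should finish in roughly half the time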

8. Installing the Benchmark Tool "Linpack" (HPL)

Master

$ cd && wget http://ftp.freebsd.org/pub/FreeBSD/ports/distfiles/gotoblas/GotoBLAS2-1.13_bsd.tar.gz
$ tar -xvzf GotoBLAS2-1.13_bsd.tar.gz --> An optimized BLAS library used by Linpack
$ cd GotoBLAS2
$ make TARGET=NEHALEM

$ cd && wget http://www.netlib.org/benchmark/hpl/hpl-2.0.tar.gz
$ tar -xvzf hpl-2.0.tar.gz && cd hpl-2.0
$ cp setup/Make.Linux_PII_FBLAS_gm .
$ vim Make.Linux_PII_FBLAS_gm --> Edit the following directives as follow
TOPdir       = $(HOME)/hpl-2.0
LAdir        = $(HOME)/GotoBLAS2
LAinc        =
LAlib        = $(LAdir)/libgoto2.a -lm -L/usr/lib/gcc/i386-redhat-linux/4.1.2 --> This is the library path of gcc 4.1.2; make sure it exists (see the command after this block)
CCFLAGS      = $(HPL_DEFS) -O3
LINKER       = mpicc
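
If you are not sure about the gcc library path mentioned in LAlib above, gcc itself can tell you; the directory containing libgcc.a is the one to pass with -L:

$ gcc -print-libgcc-file-name --> prints something like /usr/lib/gcc/i386-redhat-linux/4.1.2/libgcc.a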

$ make arch=Linux_PII_FBLAS_gm
$ mkdir -p /cluster/mpiuser/hpl/
$ cp Make.Linux_PII_FBLAS_gm /cluster/mpiuser/hpl/

Note:- The last 2 steps are a workaround for an error in the compilation process

9. Cluster Benchmarking

Master

$ cd /cluster/mpiuser/hpl-2.0/bin/Linux_PII_FBLAS_gm
$ cp HPL.dat HPL.dat.bak

To Determine the Problem Size (N)

$ free -b --> To get the amount of free memory in bytes, in my case 181088256. Plug it into the following formula, for example with the bc command (see the sketch after the formula)

Note:- The free command should be executed on one of the compute nodes, not the master

sqrt( 0.1 * 181088256 * 2 ) --> 2 is the number of nodes; the result is about 6018.1. The 0.1 factor corresponds to using roughly 80% of memory at 8 bytes per double-precision matrix element (0.8 / 8).
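
A minimal way to evaluate the formula with the bc calculator, using the numbers from this setup:

$ echo "sqrt(0.1 * 181088256 * 2)" | bc -l --> prints roughly 6018; round down to a convenient value such as Ns = 6000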
$ vim HPL.dat --> Edit the following
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
6000      Ns
1            # of NBs
100      NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1        Ps
2        Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
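
Note:- Ps x Qs must equal the number of processes you pass to mpiexec with -n (here 1 x 2 = 2, one process per node)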


$ mpiexec -f /cluster/mpiuser/hosts -n 2 ./xhpl --> This will run many tests to benchmark performance

Note:- Tweak the HPL.dat configuration until you get maximum CPU utilization; after tweaking, this is what top showed on node1 and node2:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND               
 2136 mpiuser   25   0 1564m 1.5g 1188 R 100.2 76.6   1:00.74 xhpl

                
