
DCC/Debian cluster installation HOWTO

Nikola Pavkovic

Valentin Vidic

$Id: install.sgml,v 1.19 2005/02/14 10:53:32 vvidic Exp $

$Date: 2005/02/14 10:53:32 $

This is a step-by-step guide on setting up a DCC/Debian based cluster. The reader is expected to have some prior knowledge about common system administration tasks on a Debian based system, such as OS installation, network configuration, disk partitioning etc.


1. Hardware

A Linux cluster consists of a number of computers, networked over a (preferably high speed) switch. One of those nodes has a second Ethernet controller, which is connected to the outside network. That 'public' node is called the front-node. The other nodes in the internal network are called work-nodes. The following figure shows a typical cluster network topology:

 ------------- eth0
| work-node 1 |-------------
 ------------- 10.0.0.2     |
                            |
 ------------- eth0         |
| work-node 2 |------------ |                               ________
 ------------- 10.0.0.3    ||                              (        )
     ....                  ||   eth1 ------------ eth0    ( outside  )
     ....                  ||     --| front-node |-------(  network  )
 ------------- eth0        ||    |   ------------ x.x.x.x (         )
| work-node n |----------- ||    | 10.0.0.1                (_______)
 ------------- 10.y.y.y   |||    |
                        -------------
                       | eth. switch |
                        -------------

In order to build a simple cluster you will need:

  1. One front-node computer with two Ethernet controllers and plenty of disk space

  2. One or more work-nodes

  3. Ethernet switch & cables

Note: Network speed

Internal network speed has a great impact on installation performance. Although a 10Mbit/s network will do the work, a switch of at least 100Mbit/s is preferred for reasonable performance.

Note: Node classes

As described above, only the front-node is directly accessible from the outside network, which makes it the only possible point of access to the system. Since the front-node runs all the essential cluster services, holds the users' home directories and provides the link to the outside network, it is the most important node in the cluster. The front-node also holds the images of the work-nodes' operating systems and acts as the network-boot server and the autoinstallation server for the work-nodes. In other words, the front-node is a typical networked Debian server system, running plenty of inter-dependent services that the cluster as a whole needs in order to be functional.

Work-nodes represent the computational work-power of the cluster. For parallel high-performance application execution, it is recommended that the work-nodes are all of the same type.


2. Prerequisites

Before you start building your new DCC/Debian cluster, make sure that you have/know the following:

  1. Hardware components listed above,

  2. Network settings for accessing the outside network,

  3. Debian Sarge installation CD (try the netinst image, it's cool).


3. Installation

Now that you have all the required hardware/software/knowledge, we are ready to proceed with the cluster installation procedure.


3.1. Hardware setup

Put together all the components as shown in the figure above.


3.2. Front-node installation

Your first step is to install Debian Sarge/testing on your front-node (the one with two network cards). Download the installation CD from the official Debian installer homepage, and check the Debian installation manual for details.

Note: Partition sizes

Make sure you have enough free space on the partition where /var/lib/systemimager/images resides because that is where the node image will be created.
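
The free-space check from the note above can be scripted. The following is a minimal sketch, not part of DCC: the function names free_kb and has_free_kb are made up for this example, and how much space "enough" is depends on your image, so the threshold is left to you.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with DCC or SystemImager.

# Print the free space, in kilobytes, of the filesystem holding directory $1.
# "df -P -k" is the POSIX output format: one header line, one data line,
# with the available space in the fourth column.
free_kb() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if the filesystem holding directory $1 has at least
# $2 kilobytes free.
has_free_kb() {
    [ "$(free_kb "$1")" -ge "$2" ]
}
```

For example, `has_free_kb /var/lib 2097152 || echo "less than 2 GB free"` warns when less than 2 GB is available (the 2 GB figure is only an illustration). Before the DCC packages are installed the images directory may not exist yet, so check a parent directory that lives on the same partition.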

Note: MTA configuration

It is highly recommended that the mail transport agent is configured properly.


3.2.1. Network setup

Now you have a working Debian Sarge/testing installation on the front-node. In order to install the DCC components on the front-node, two network interfaces have to be configured properly: one connected to the outside network, and the other connected to the internal switch to which all the other cluster nodes are connected.

A typical network configuration might look like this (file /etc/network/interfaces):

auto eth0 eth1
iface eth0 inet static
        address 161.53.2.15
        netmask 255.255.255.0
        broadcast 161.53.2.255

iface eth1 inet static
        address 10.0.0.1
        netmask 255.255.255.0
        broadcast 10.0.0.255
    
Don't forget to check/set the hostname in the /etc/hostname file:

EXTERNALNAME
    

The corresponding /etc/hosts file should look like:

10.0.0.1      node0.cluster   node0
161.53.2.15   EXTERNALNAME.DOMAIN EXTERNALNAME
127.0.0.1     localhost
    
Replace EXTERNALNAME, DOMAIN, and the corresponding IP address with your own settings for accessing the public network. Restart networking if necessary for the changes to take effect.
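
If you want to double-check the /etc/hosts layout above before moving on, a small helper can confirm that a given IP address is mapped to a given name. This is only a sketch: the function name hosts_has is made up for this example and is not part of DCC.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with DCC.

# Succeed if hosts file $1 maps IP address $2 to host name $3.
# Everything after a '#' on a line is treated as a comment and ignored.
hosts_has() {
    awk -v ip="$2" -v name="$3" '
        { sub(/#.*/, "") }                 # strip comments, fields are re-split
        $1 == ip {
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

For example, `hosts_has /etc/hosts 10.0.0.1 node0 || echo "node0 entry missing"` reports a missing internal entry. The check only reads the file; it does not tell you whether the resolver actually uses it, which is what the getent commands in the next section are for.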


3.2.2. Check network setup

Check the output of ifconfig: both eth0 and eth1 should be configured. Also, check that all of the following lookups work:

  # getent hosts $EXT_IP
  # getent hosts 10.0.0.1
  # getent hosts $EXT_NAME
  # getent hosts node0
  # hostname -f
  
Of course, use the external hostname and IP address instead of $EXT_* .


3.2.3. APT setup

Now that the network is properly configured, you are ready to pull the DCC packages from our APT repository. To do so, add the following line at the top of your /etc/apt/sources.list file:

  deb http://ftp.irb.hr/pub/irb/dcc ./
  
Update local APT database:
  # apt-get update
  
Before proceeding with the DCC installation, it is best to upgrade all packages to their newest versions:
  # apt-get -u upgrade
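
If you set up more than one front-node, the sources.list edit described above can be scripted so that running it twice does not add the line twice. This is a minimal sketch: the function name prepend_line is made up for this example and is not part of DCC or APT.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with DCC or APT.

# Put line $2 at the top of file $1 unless the file already contains
# exactly that line somewhere.
prepend_line() {
    file=$1
    line=$2
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0                      # already present, nothing to do
    fi
    tmp=$(mktemp) || return 1
    # New line first, then the old contents (if the file exists).
    { printf '%s\n' "$line"; cat "$file" 2>/dev/null; } > "$tmp"
    # Write back through the original path so its permissions are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

As root, `prepend_line /etc/apt/sources.list 'deb http://ftp.irb.hr/pub/irb/dcc ./'` followed by `apt-get update` has the same effect as the manual edit.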
  


3.2.4. DCC installation

Now, you are ready to install the DCC packages. This is done in two steps:

    # apt-get install debconf-dcc
    
Answer any questions asked by debconf and, once the debconf-dcc installation finishes, install the dcc-front meta-package:
    # apt-get install dcc-front
    

Note: SLAPD password

You will have to set the slapd administrator password. Debconf asks for it in a dialog screen while configuring the slapd package.

After the process finishes, you end up with a running front-node, waiting to serve as the network-boot installation server for the work-nodes. However, there are some things to be done before the actual work-node installation process can begin.


3.3. Building the image

In order to install the operating system on the work-nodes, we must understand how the autoinstallation procedure works. DCC comes with the System Installation Suite (SIS), which uses an image-based installation model: the image needs to be built before it can be deployed on the work-nodes. To get the whole picture of how the SIS autoinstallation system works, please refer to the SIS project documentation. The image that is to be deployed on the work-nodes has to be built on the front-node, following this simple procedure:

  1. Edit /etc/dcc/disktable

    This file describes the partition and mount table of the work nodes. You can experiment with various settings, but uncommenting the lines with root partition and swap partition should work on any IDE/ATA-based work-nodes. If you have SCSI-based work-nodes just change /dev/hda1 to /dev/sda1 or similar. Make sure that the sum of declared partition sizes does not exceed the total size of the physical hard disk on the target node.

  2. The image will be built using packages from ftp.debian.org. If you prefer to use some other Debian mirror, change the deboot line in /etc/dcc/sources.list, for example:

    deboot http://ftp.irb.hr/debian sarge
          

  3. Run the DCC buildimage wrapper-script

    # dcc_buildimage
          
    Once the process finishes, you end up with a full-blown work-node filesystem image inside the /var/lib/systemimager/images/node directory.
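
The size rule from step 1 (the declared partitions must not add up to more than the physical disk) is simple arithmetic and can be checked before building the image. The sketch below does not parse /etc/dcc/disktable, since its format is not described here; you pass the sizes by hand. The function name fits_on_disk and the use of megabytes as the unit are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with DCC.

# Succeed if the partition sizes given after the first argument add up
# to no more than the disk size given as the first argument.
# All values are whole numbers in the same unit (megabytes assumed).
fits_on_disk() {
    disk_mb=$1
    shift
    total=0
    for size in "$@"; do
        total=$((total + size))
    done
    [ "$total" -le "$disk_mb" ]
}
```

For example, `fits_on_disk 40000 38000 1024 || echo "partitions too large"` stays silent for a 38000 MB root partition plus 1024 MB of swap on a 40000 MB disk; these numbers are purely illustrative.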


3.4. Deploying the image on work nodes

Now that the work-node image is created, it has to be deployed on the work-nodes. If your work-nodes' BIOS supports booting over the network (PXE), choose this option as the primary boot method. On the front-node, run:

  # dcc_discovernode
  
...and turn on your work-node(s). As the DHCP requests come in from the work-nodes, each node's MAC address is extracted, a new IP address is assigned to that MAC, and that information is put into the SIS database.

However, if your work-nodes' BIOS does not support netboot, create a boot diskette:

  # mkautoinstalldiskette
  
... boot from the diskette or CD, and proceed as described above.

When the installation finishes, the work-node is rebooted. As it comes up again, it is ready to do the real work. Do the same with all the work nodes that you wish to install.

Note: In case of an error

If you encounter an error while installing the work-node, check the error message on the console. It can help you find out what went wrong.


4. Postinstall notes

Run the following three commands on the front-node after all the work-nodes have been installed:

  # cpushimage image_name
  # /etc/init.d/torque-server restart-quick
  # cexec /etc/init.d/torque-mom restart-quick

For a detailed explanation of why this is required, read the DCC FAQ.
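
If you reinstall work-nodes often, the postinstall commands above can be wrapped so that they run one after another and stop at the first failure. This is a convenience sketch only: the function name run_in_order is made up for this example, and stopping on the first error is our assumption about what is sensible, not something DCC requires.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with DCC.

# Run each argument as a shell command, in the order given.
# Echo the command before running it and stop at the first failure.
run_in_order() {
    for cmd in "$@"; do
        echo "+ $cmd"
        sh -c "$cmd" || { echo "failed: $cmd" >&2; return 1; }
    done
}

# Usage on the front-node, as root (image_name is the placeholder used above):
#   run_in_order "cpushimage image_name" \
#                "/etc/init.d/torque-server restart-quick" \
#                "cexec /etc/init.d/torque-mom restart-quick"
```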

Check out the Ganglia cluster-monitor web interface at http://your.cluster.fqdn/ganglia !