DCC/Debian cluster FAQ

Nikola Pavkovic
Valentin Vidic

$Id: faq.sgml,v 1.16 2005/03/21 16:01:02 nix Exp $

Copyright © 2004 Nikola Pavkovic
Copyright © 2004 Valentin Vidic

$Date: 2005/03/21 16:01:02 $

1. User administration

To add a user to the cluster, use the dcc_useradd(8) script, for example:

    # dcc_useradd -m bob

This adds user bob to the user database and creates a home directory for him (the -m option). For the complete list of available command-line options, see cpu-ldap(8).

To delete a user from the cluster, use the cpu(8) command, for example:

    # cpu userdel -r bob

This deletes user bob from the user database and removes his home directory (the -r option). Again, see cpu-ldap(8) for the list of available options.

To change a user's password, use the passwd(1) command, either as root or as the user:

    # passwd bob
    bob@cluster$ passwd

To change a user's default shell, use the chsh(1) command:

    # chsh bob
    bob@cluster$ chsh

To change a user's GECOS field, use the chfn(1) command:

    # chfn bob
    bob@cluster$ chfn

2. Software administration

First enter the work-node image:

    # dcc_editimage node-image

Then install the software from a package:

    CHROOT# apt-get install libpvm3

or from source:

    CHROOT# ./configure && make && make install

Finally, update the work-nodes:

    # cpushimage node_image
Install the kernel from the package:

    CHROOT# apt-get install kernel-image-2.4-686-smp

or build it from source:

    CHROOT/usr/src/linux-2.4.28# make menuconfig
    CHROOT/usr/src/linux-2.4.28# make dep
    CHROOT/usr/src/linux-2.4.28# make bzImage modules
    CHROOT/usr/src/linux-2.4.28# make modules_install
    CHROOT/usr/src/linux-2.4.28# cp arch/i386/boot/bzImage /boot/vmlinuz-2.4.28
    CHROOT/usr/src/linux-2.4.28# cp System.map /boot/System.map-2.4.28

Update the list of kernel images:

    # mksidisk -A --file /etc/dcc/disktable --name image_name

Push the changes to the work-nodes:

    # cpushimage node_image

Before rebooting the work-nodes, check that systemconfigurator(1p) has configured the boot loader on the work-nodes correctly (e.g. /etc/lilo.conf).

3. The queuing system

To submit a job to the queuing system, use qsub(1). For an interactive session, run:

    $ qsub -I

This opens an interactive session on a free node within the cluster. When you are finished, just exit the shell. This is the simplest way to run jobs through the queuing system.

A session does not need to be interactive, however. You can create a script containing the shell commands you want to execute and submit it to the queuing system:

    $ qsub /home/user/testjob.sh

The queuing system will run the script on one of the work-nodes. You can also describe the required resources in detail, for example:

    $ qsub -l nodes=2:ppn=2 /home/user/testjob.sh

This requests two work-nodes (nodes=2) with two processors each (ppn=2). For a detailed list of available parameters, see the Job submission section of the Torque manual.

Once you have submitted your job to the queuing system, you can monitor its status with the qstat(1), pbstop or pestat commands. Additionally, you can monitor overall cluster performance through your Ganglia cluster monitor web interface.
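As a minimal sketch of what a job script such as testjob.sh might contain: the job name, the resource request and the echoed message below are illustrative assumptions, not part of the DCC documentation. Torque's qsub reads lines beginning with #PBS as options, and sets PBS_O_WORKDIR and PBS_NODEFILE in the job's environment. The fallback values are there only so that the script also runs outside the queuing system.

```shell
#!/bin/sh
# testjob.sh: illustrative Torque job script (hypothetical contents).
# Lines starting with "#PBS" are read by qsub as command-line options.
#PBS -N testjob
#PBS -l nodes=2:ppn=2

# Torque sets PBS_O_WORKDIR to the directory qsub was run from and
# PBS_NODEFILE to a file with one line per allocated processor.
# The fallbacks let the script run outside the queuing system too.
workdir="${PBS_O_WORKDIR:-$PWD}"
nodefile="${PBS_NODEFILE:-/dev/null}"

# Count the allocated processors (lines in the node file).
count_slots() {
    wc -l < "$1" | tr -d ' '
}

cd "$workdir" || exit 1
echo "job running on $(hostname) with $(count_slots "$nodefile") slot(s)"
```

Such a script would be submitted with qsub /home/user/testjob.sh. With the #PBS lines present, the -l option does not have to be repeated on the command line.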
Your local Ganglia URL is http://your.cluster.fqdn/ganglia

For information on fine-tuning your queuing system configuration, please refer to the TORQUE Admin Manual.

4. Miscellaneous

If you want to change the disk partition configuration on the work-nodes, first edit /etc/dcc/disktable so that it reflects the desired partitioning scheme. Then issue two commands for the changes to take effect:

    mksidisk -A --name node --file /etc/dcc/disktable
    mkautoinstallscript --image node --force --ip-assignment dhcp --post-install reboot

Finally, reboot your work-nodes so that the installation process takes place. For more information on these two commands, consult the mksidisk(1) and mkautoinstallscript(8) man pages.

To change the front-node's external hostname after installation, update the following configuration files on the front-node: /etc/hosts, /etc/hostname, /etc/network/interfaces, /etc/torque/server_name, /etc/c3.conf and /etc/gmond.conf. You will also want to reconfigure your MTA, which can be done like this (if you are using exim4):

    # dpkg-reconfigure exim4-config

Finally, reboot the cluster to check that everything works correctly.

If jobs spanning multiple work-nodes won't start, try running the following commands on the front-node:

    # cpushimage image_name
    # /etc/init.d/torque-server restart-quick
    # cexec /etc/init.d/torque-mom restart-quick

This is required because various files can get out of sync if the nodes are installed (and booted) one after the other. Consider the following sequence of events:
When you start working on DCC clusters, you may notice that the root password is not set on the work-nodes, that root's SSH key does not exist on the work-nodes, and so on. We decided on this because it is not safe to put sensitive information into the images. Images are served anonymously via rsync(1), so any cluster user can retrieve sensitive data from an image:

    bob@cluster$ rsync -v rsync://node0/image_name/etc/shadow .
    shadow
    sent 91 bytes  received 784 bytes  1750.00 bytes/sec
    total size is 679  speedup is 0.78

While it is possible to use rsync or SSH password authentication, there is no easy fix that would still allow for unattended work-node installation. Therefore, be careful about putting sensitive information in the images.

The debconf-dcc package is installed first. It asks you (through debconf) to enter the names of the front-node's internal and external interfaces, and deduces the remaining information from these answers (using ifconfig(8) and friends). As a result, /etc/dcc/config and /etc/dcc/debconf are created. /etc/dcc/config is the main DCC configuration file, and /etc/dcc/debconf contains a debconf database with preloaded answers for the packages DCC depends on. This way, when dcc-front and its dependencies (such as slapd and ssh) are installed, they do not ask you any questions, and the install is silent.

Moreover, dcc-front depends on a series of packages whose only purpose is to configure existing Debian packages. slapd-dcc, for example, configures the LDAP server to be used as a user database for the cluster. For details, look at the postinst scripts of those packages.

DCC installation in the work-node image is similar. After debootstrap builds the base system, /etc/dcc/config is copied into the image. When debconf-dcc is installed in the image, it uses this file instead of asking questions. Just like dcc-front on the front-node, dcc-node and its dependencies are installed silently because of the preloaded answers to debconf questions (/etc/dcc/debconf). See dcc_buildimage(8) for details of the image build process.

Although all this makes the installation simple, it also violates the strict Debian policy guidelines. slapd-dcc, for example, modifies /etc/ldap/slapd.conf, which is owned by another package. Moreover, there is no sense in installing, say, slapd-dcc if debconf-dcc is not installed first. This applies to the other DCC packages too. There may also be other issues not mentioned here.

Several manual steps are required. More information is available in the HOWTO provided by Gordon Grubert.
© 2003-2009 Ruđer Bošković Institute. Last changed: 12/05/2006 04:53 pm (Valentin Vidić).