IRB: Bijenička 54, HR-10000 Zagreb. tel: +385 (0)1 4561-111, fax: 4680-084, PR: 4571-269, mail: info@irb.hr

About DCC project

Nikola Pavkovic

Valentin Vidic

$Id: readme.sgml,v 1.4 2005/02/15 16:06:12 nix Exp $

$Date: 2005/02/15 16:06:12 $


1. Project Overview

Linux clusters are increasingly adopted as supercomputing infrastructure in scientific research laboratories, and in other kinds of research work where high computing power is essential. Thanks to their price/performance ratio, Linux clusters are steadily taking market share from specialized supercomputers.

In order to build a "Linux cluster" from a number of standalone PCs, one must extend a standard Linux distribution with extra functionality that provides easy installation, administration, security policy enforcement and monitoring of elementary resources within the cluster. Although some of the tools already exist, there are very few complete Linux distributions targeting high-performance computing users. The real problem is that all cluster-targeted distributions are based on RedHat as the base Linux distribution.


Due to lack of funding, this project is no longer being maintained!

2. Why Debian?

There are many Linux distributions available today, and every one of them has its pros and cons. However, several quality characteristics make Debian GNU/Linux a prime choice when selecting a platform for intensive, mission-critical applications.

From practical experience, it is well known that Debian handles security patches in a distinctive, admin-friendly way. An average system can be upgraded to up-to-date package versions in a few minutes by issuing a single command. This is essential, because no administrator wants to spend time manually resolving inter-package dependencies. Tools for this task exist for other distributions, but APT (Debian's package tool) was integrated into the system a long time ago and has proved to be a very reliable tool, capable of resolving the most complex inter-package dependency problems.

In addition, the Debian security team is one of the most responsive security teams of any Linux distribution. All disclosed security issues are patched and published on the official Debian APT mirror sites in less than 48 hours. Debian is often the first Linux distribution to release a patched package when a security problem occurs. This is very important for keeping a system's security level high.

Finally, from a philosophical point of view, Debian tries hard to be the 'purest' GNU distribution. The social contract assures that all the developed software remains Open Source. Driven by 'philosophy' rather than the market, this concept has been fully functional for more than a decade. Unlike some other distributions, Debian's first goal is the quality of released software, and since it is not driven by the market, there is always enough time to assure that quality.

From the technical, legal and security points of view, Debian is the first choice when selecting a Linux distribution for mission-critical deployment.


3. Project Goals

We expect to integrate some existing technologies (like LDAP, System Installation Suite, Torque, C3...) and develop a production-grade toolset for easier cluster management, based on the Debian GNU/Linux distribution. This involves developing automation mechanisms that provide a flexible platform for high-performance computation tasks and also give the system administrator a secure, easy-to-maintain, reliable and well-supported cluster administration toolbox.


4. Components

The DCC suite consists of various software components:

System Installation Suite

From the homepage: "System Installation Suite is a collection of open source software projects designed to work together to automate the installation and configuration of networked workstations."

The DCC project strongly depends on the SIS suite as its core autoinstallation and image deployment model.

TORQUE Queueing System

From the homepage: "TORQUE (Tera-scale Open-source Resource and QUEue manager) is a resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original PBS project and has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Dept of Energy, Sandia, PNNL, U of Buffalo, TeraGrid, and many other leading edge HPC organizations."

The DCC suite ships with TORQUE, prepackaged for Debian. Hence the following legal notice: "This product includes software developed by NASA Ames Research Center, Lawrence Livermore National Laboratory, and Veridian Information Solutions, Inc. To obtain complete source code for OpenPBS and modifications/additions provided in torque visit www.openpbs.org and/or www.supercluster.org/downloads."

C3 cluster suite

From the homepage: "This suite implements a number of command line based tools that have been shown to increase system manager scalability by reducing time and effort to operate and manage the cluster."

The C3 suite is an integral part of DCC.

LDAP - Lightweight Directory Access Protocol

From the homepage: "LDAP stands for Lightweight Directory Access Protocol. As the name suggests, it is a lightweight protocol for accessing directory services, specifically X.500-based directory services. LDAP runs over TCP/IP or other connection oriented transfer services. The nitty-gritty details of LDAP are defined in RFC2251 "The Lightweight Directory Access Protocol (v3)" and other documents comprising the technical specification RFC3377. This section gives an overview of LDAP from a user's perspective."

DCC clusters use LDAP as the core authentication infrastructure. The LDAP server runs on the front node and holds all the account information (/etc/passwd etc.), which makes central account management possible.
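To illustrate what keeping /etc/passwd data in LDAP can look like, here is a minimal sketch that turns one passwd line into an LDIF entry using the standard posixAccount schema (RFC 2307). The function name `passwd_to_ldif` and the base DN `ou=People,dc=cluster,dc=example` are hypothetical and chosen only for this example; the actual directory layout used by DCC is not described here.

```python
def passwd_to_ldif(passwd_line, base_dn="ou=People,dc=cluster,dc=example"):
    """Convert one /etc/passwd line into an LDIF posixAccount entry.

    base_dn is a hypothetical placeholder, not the DCC default.
    """
    # /etc/passwd has seven colon-separated fields.
    name, _password, uid, gid, gecos, home, shell = passwd_line.strip().split(":")
    # The first comma-separated GECOS subfield is the full name;
    # fall back to the login name when it is empty.
    common_name = gecos.split(",")[0] or name
    lines = [
        f"dn: uid={name},{base_dn}",
        "objectClass: account",
        "objectClass: posixAccount",
        f"uid: {name}",
        f"cn: {common_name}",
        f"uidNumber: {uid}",
        f"gidNumber: {gid}",
        f"homeDirectory: {home}",
        f"loginShell: {shell}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(passwd_to_ldif("alice:x:1001:1001:Alice Example,,,:/home/alice:/bin/bash"))
```

The password field is deliberately ignored: on a shadowed system it only contains a placeholder, and password hashes would have to be migrated separately.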

Ganglia system-monitoring with web-frontend

From the homepage: "Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes."

For system-health monitoring, Ganglia was chosen for its scalability and robustness. Although the main system monitoring software was already available within the Debian tree, the web frontend still needed to be packaged as a Debian package.

DHCP server

From the homepage: "ISC's Dynamic Host Configuration Protocol Distribution provides a freely redistributable reference implementation of all aspects of the DHCP protocol, through a suite of DHCP tools: DHCP server, DHCP client, DHCP relay agent"

DHCP v3 is an integral part of the DCC suite. The DHCP server is responsible for IP address assignment within the cluster.
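As an illustration of IP address assignment for cluster nodes, the sketch below generates ISC dhcpd `host` declarations that bind each node's MAC address to a fixed address. The helper `dhcp_host_entries`, the `node` name prefix and the `192.168.1.101` starting address are assumptions made for this example and are not taken from DCC itself.

```python
import ipaddress


def dhcp_host_entries(macs, first_ip="192.168.1.101", prefix="node"):
    """Return dhcpd.conf host declarations, one per MAC address.

    Nodes are numbered from 01 and receive consecutive addresses
    starting at first_ip (a hypothetical private address).
    """
    start = ipaddress.IPv4Address(first_ip)
    stanzas = []
    for index, mac in enumerate(macs):
        name = f"{prefix}{index + 1:02d}"
        stanzas.append(
            f"host {name} {{\n"
            f"  hardware ethernet {mac.lower()};\n"
            f"  fixed-address {start + index};\n"
            f"}}"
        )
    return "\n".join(stanzas) + "\n"


if __name__ == "__main__":
    print(dhcp_host_entries(["00:11:22:33:44:55", "00:11:22:33:44:56"]))
```

Fixed addresses keep the mapping between node names and IP addresses stable across reboots, which is convenient for tools that address nodes by name.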

Shorewall firewall

From the homepage: "The Shoreline Firewall, more commonly known as "Shorewall", is a high-level tool for configuring Netfilter. You describe your firewall/gateway requirements using entries in a set of configuration files. Shorewall reads those configuration files and with the help of the iptables utility, Shorewall configures Netfilter to match your requirements. Shorewall can be used on a dedicated firewall system, a multi-function gateway/router/server or on a standalone GNU/Linux system."

For its simplicity and flexibility in configuration, the Shorewall firewall was chosen to be an integral part of DCC.

TFTP server

The atftpd server is used to provide network-boot functionality for the cluster nodes.


5. Disclaimer

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.