Diskless Compute Node Installation


From HEP at Tennessee

Linux Kernel with Lustre Support

The Linux kernel was compiled from the Lustre kernel source package with the following changes.

  • Enabled CONFIG_ROOT_NFS (root file system on NFS).
  • Enabled the "Loadable Module Support" sub-options, which are required for a clean compile.
  • Fixed a bug in the kernel source module nfsroot.c: added "Opt_acl, Opt_noacl," to line 127.
  • Full config file
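
As a quick sanity check on the kernel changes above, the sketch below scans a kernel .config for the options an NFS root depends on. CONFIG_ROOT_NFS is the option named on this page; CONFIG_NFS_FS and CONFIG_IP_PNP are its Kconfig prerequisites and must be built in rather than modular. The function name check_nfsroot_config is made up for this illustration.

```shell
#!/bin/sh
# Report whether a kernel .config has the options an NFS root needs.
# Each option must be built in (=y); a module (=m) is not enough, because
# the root file system is mounted before any modules can be loaded.
check_nfsroot_config() {
    config_file=$1
    missing=0
    for opt in CONFIG_NFS_FS CONFIG_IP_PNP CONFIG_ROOT_NFS; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok       $opt"
        else
            echo "missing  $opt"
            missing=1
        fi
    done
    # Exit status 0 only if every option was found.
    return $missing
}

# Intended use, from the top of the kernel source tree:
#   check_nfsroot_config .config
```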

PXE Config for Diskless Booting

LABEL linux
        KERNEL KERNEL
        APPEND noapic root=/dev/nfs nfsroot=10.0.0.1:/mnt/nodefs/rootfs-SL4-x86_64 nomce ro
        ipappend 1

Notes:

  • noapic: Avoid qc_timeout errors when the system is initializing the SATA hard drives.
  • nomce: Prevent spurious Machine Check Exceptions due to Opteron 265 processors.
  • ro: The diskless systems mount the root filesystem read-only (stateless).
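  • ipappend 1: PXELINUX passes the network configuration it obtained over DHCP to the kernel in an ip= argument, so the kernel does not need DHCP IP autoconfiguration enabled.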
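
To make the effect of "ipappend 1" concrete, the sketch below builds the argument that PXELINUX adds to the kernel command line, in the form ip=<client-ip>:<boot-server-ip>:<gateway-ip>:<netmask>. The helper name pxe_ip_arg is invented for this example, and 10.0.0.2 is only an assumed client address (the first one in the DHCP range); the other addresses come from the configuration on this page.

```shell
#!/bin/sh
# Build the ip= argument that PXELINUX appends for "ipappend 1".
# Arguments: client IP, boot server IP, gateway IP, netmask.
pxe_ip_arg() {
    printf 'ip=%s:%s:%s:%s\n' "$1" "$2" "$3" "$4"
}

# Example for a node that was leased 10.0.0.2 (assumed address).
pxe_ip_arg 10.0.0.2 10.0.0.1 10.0.0.254 255.255.255.0
# prints: ip=10.0.0.2:10.0.0.1:10.0.0.254:255.255.255.0
```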

DHCPD Configuration

The DHCP server (dhcpd) used for network booting was compiled from source (dhcp-latest.tar.gz) and uses the following configuration:

default-lease-time            86400;
max-lease-time                86400;
 
option subnet-mask            255.255.255.0;
option broadcast-address      10.0.0.255;
option routers                10.0.0.254;
 
ddns-update-style ad-hoc;
allow booting;
allow bootp;
 
subnet 10.0.0.0 netmask 255.255.255.0 {
        range 10.0.0.2 10.0.0.253;
}
 
next-server 10.0.0.1;
filename        "pxelinux.0";

TFTP Server

The TFTP server that serves the Linux kernel to the diskless machines was installed from the RPM packages provided by the SL4 distribution and runs under xinetd. It is activated by setting "disable = no" in the xinetd configuration file /etc/xinetd.d/tftp.
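
The activation step can be scripted. The following is a minimal sketch, assuming the stock service file still contains a "disable = yes" line; the function name enable_xinetd_service is invented for this example.

```shell
#!/bin/sh
# Change "disable = yes" to "disable = no" in an xinetd service file,
# keeping the original indentation and spacing of the line.
enable_xinetd_service() {
    service_file=$1
    tmp=$(mktemp) || return 1
    # Edit through a temporary file and copy the contents back, so the
    # ownership and permissions of the service file are left unchanged.
    sed 's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes[[:space:]]*$/\1no/' \
        "$service_file" > "$tmp" && cat "$tmp" > "$service_file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Intended use on the boot server (as root), followed by a reload so that
# xinetd rereads its configuration:
#   enable_xinetd_service /etc/xinetd.d/tftp
#   service xinetd reload
```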

NFS Server

The root file system for diskless clients must be exported with the option "no_root_squash" so that root on the clients is not mapped to an unprivileged user.
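
As an illustration, an /etc/exports entry for the root file system might be generated as follows. The export path and network are taken from the PXE and DHCP configuration on this page; the "ro" and "sync" options are assumptions (only no_root_squash is stated here), and the helper name exports_entry is invented.

```shell
#!/bin/sh
# Print an /etc/exports entry for an NFS root file system.
# Arguments: exported directory, client network (address/netmask).
# "ro" and "sync" are assumed options; "no_root_squash" is the required one.
exports_entry() {
    printf '%s %s(ro,no_root_squash,sync)\n' "$1" "$2"
}

exports_entry /mnt/nodefs/rootfs-SL4-x86_64 10.0.0.0/255.255.255.0

# To apply on the NFS server (as root), append the printed line to
# /etc/exports and re-export:
#   exportfs -ra
```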

Root File System

  1. Installed a stock Scientific Linux 4.3 system onto a hard disk.
  2. Copied entire disk structure to a directory on the fileserver.
  3. Added Stateless Linux packages to the new filesystem.
  4. Changed configuration file /mnt/nodefs/rootfs-SL4-x86_64/etc/sysconfig/readonly-root to include "READONLY=yes".
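
Step 4 can be sketched as a small helper that sets READONLY=yes whether or not the file already has a READONLY line. The function name set_readonly_root is invented for this example. This page does not say which tool was used for the copy in step 2, so the rsync command in the comments is only one possible approach, and "fileserver" is a placeholder host name.

```shell
#!/bin/sh
# Set READONLY=yes in <root>/etc/sysconfig/readonly-root, where <root> is
# the directory holding the node file system. Other lines are preserved.
set_readonly_root() {
    conf="$1/etc/sysconfig/readonly-root"
    if [ -f "$conf" ] && grep -q '^READONLY=' "$conf"; then
        tmp=$(mktemp) || return 1
        sed 's/^READONLY=.*/READONLY=yes/' "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        status=$?
        rm -f "$tmp"
        return $status
    fi
    # No READONLY line yet (or no file): append the setting.
    echo 'READONLY=yes' >> "$conf"
}

# Possible copy for step 2 (assumption, run on the installed system):
#   rsync -aHx --numeric-ids / fileserver:/mnt/nodefs/rootfs-SL4-x86_64/
# Step 4, run on the file server:
#   set_readonly_root /mnt/nodefs/rootfs-SL4-x86_64
```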

Torque Batch Queue System

  • Downloaded the Torque 2.1.6 source from the distributor.
  • Configured with --enable-syslog and --with-scp.
  • Compiled the source and created binary RPM packages with "make rpm".
  • Added the line "mount_files /var/spool/torque" to /etc/rc.readonly on the diskless node filesystem to support pbs_mom with a read-only root.
  • Created pbs_mom configuration file:
$pbsserver     10.0.0.254
$restricted    10.0.0.254
$logevent      255
  • Added Torque server name to /var/spool/torque/server_name.
  • Created a secondary SSH server (sshd) using host-based authentication for the private network. Configuration:
ListenAddress 10.0.0.254
HostbasedAuthentication yes
IgnoreUserKnownHosts yes
IgnoreRhosts yes
  • Added host keys for all nodes to ssh_known_hosts using update_ssh_known_hosts script.
  • Modified ssh_config on cluster nodes for host-based authentication. Configuration:
EnableSSHKeysign yes
HostbasedAuthentication yes
  • Added RSA host key for babar3, babar3.phys.utk.edu, and 10.0.0.254 to ssh_known_hosts on cluster nodes.
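
The last step can be illustrated with a small helper that turns a host's public key file into an ssh_known_hosts entry listing all three names. This is a sketch only: the function name known_hosts_line is invented, and the key file path in the usage comment is the usual OpenSSH location, assumed here rather than taken from this page.

```shell
#!/bin/sh
# Print an ssh_known_hosts entry: comma-separated host names, then the key
# type and key data from a public key file. The trailing comment field of
# the key file is dropped.
known_hosts_line() {
    names=$1
    pubkey_file=$2
    awk -v names="$names" 'NF >= 2 { print names, $1, $2 }' "$pubkey_file"
}

# Intended use on babar3 (assumed key path), with the output appended to
# ssh_known_hosts on the cluster nodes:
#   known_hosts_line babar3,babar3.phys.utk.edu,10.0.0.254 \
#       /etc/ssh/ssh_host_rsa_key.pub
```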