From HEP at Tennessee
Linux Kernel with Lustre Support
The Linux kernel was compiled from the Lustre kernel source package with the following changes:
- Enabled CONFIG_ROOT_NFS (root filesystem over NFS).
- Enabled the "Loadable Module Support" sub-options, which are required for a clean compile.
- Fixed a bug in the kernel source file nfsroot.c by adding "Opt_acl, Opt_noacl," at line 127.
- Full config file
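As a quick pre-build check, the Python sketch below scans a kernel `.config` for the two settings named above. It assumes `CONFIG_MODULES` is the Kconfig symbol behind "Loadable Module Support"; the helper names are illustrative, and the `nfsroot.c` patch is not something it can detect.

```python
# Sketch: verify a kernel .config has the options this build relies on.
# Assumption: CONFIG_MODULES is the Kconfig symbol behind
# "Loadable Module Support"; CONFIG_ROOT_NFS is named in the text.

REQUIRED = {
    "CONFIG_ROOT_NFS": "y",  # mount the root filesystem over NFS
    "CONFIG_MODULES": "y",   # "Loadable Module Support"
}


def parse_config(text):
    """Return {symbol: value} for every option set in a .config file."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set"; skip comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key] = value
    return options


def missing_options(text, required=REQUIRED):
    """Return the required symbols that are unset or have a different value."""
    options = parse_config(text)
    return sorted(k for k, v in required.items() if options.get(k) != v)
```

For example, `missing_options(open(".config").read())` returns an empty list when both options are built in (`=y`).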
PXE Config for Diskless Booting
LABEL linux
  KERNEL KERNEL
  APPEND noapic root=/dev/nfs nfsroot=10.0.0.1:/mnt/nodefs/rootfs-SL4-x86_64 nomce ro
  ipappend 1
Notes:
- noapic: Avoids qc_timeout errors while the system initializes the SATA hard drives.
- nomce: Prevents spurious Machine Check Exceptions on the Opteron 265 processors.
- ro: The diskless systems mount the root filesystem read-only (stateless).
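- ipappend 1: PXELINUX appends the IP settings obtained during PXE boot to the kernel command line as an "ip=" option, so the kernel does not need DHCP IP autoconfiguration enabled.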
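To make the effect of IPAPPEND concrete, the Python sketch below assembles the command line a client would end up with, assuming the format documented for PXELINUX (`ip=<client-ip>:<boot-server-ip>:<gw-ip>:<netmask>`). The client address 10.0.0.5 is hypothetical; the boot server, gateway and netmask are taken from this page's DHCPD configuration.

```python
# Sketch: the kernel command line a client sees with "ipappend 1".
# Assumes PXELINUX's documented format for IPAPPEND 1; the client
# address used below is hypothetical.

APPEND = ("noapic root=/dev/nfs "
          "nfsroot=10.0.0.1:/mnt/nodefs/rootfs-SL4-x86_64 nomce ro")


def ipappend1(client_ip, boot_server, gateway, netmask):
    """Return the ip= option that IPAPPEND 1 adds."""
    return f"ip={client_ip}:{boot_server}:{gateway}:{netmask}"


def kernel_cmdline(append, client_ip, boot_server, gateway, netmask):
    """Combine the APPEND line with the IPAPPEND 1 option."""
    return append + " " + ipappend1(client_ip, boot_server, gateway, netmask)


line = kernel_cmdline(APPEND, "10.0.0.5", "10.0.0.1", "10.0.0.254", "255.255.255.0")
print(line)
```

Because the address, server, gateway and netmask already arrive on the command line, the kernel has no need to run its own DHCP exchange.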
DHCPD Configuration
The DHCP server (dhcpd) used for network booting was compiled from source (dhcp-latest.tar.gz) and uses the following configuration:
default-lease-time 86400;
max-lease-time 86400;
option subnet-mask 255.255.255.0;
option broadcast-address 10.0.0.255;
option routers 10.0.0.254;
ddns-update-style ad-hoc;
allow booting;
allow bootp;
subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.2 10.0.0.253;
}
next-server 10.0.0.1;
filename "pxelinux.0";
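The pool in the configuration above deliberately stops short of the boot server (10.0.0.1), the router (10.0.0.254) and the broadcast address. A minimal Python sketch of that consistency check, using the standard `ipaddress` module, follows; it is an illustration only, not a dhcpd feature, and the function name is hypothetical.

```python
# Sketch: check the dhcpd pool against the addresses reserved elsewhere
# in the configuration. Values are copied from the config above; the
# check is illustrative and not something dhcpd runs itself.
import ipaddress

SUBNET = ipaddress.ip_network("10.0.0.0/255.255.255.0")
POOL_LOW = ipaddress.ip_address("10.0.0.2")
POOL_HIGH = ipaddress.ip_address("10.0.0.253")
RESERVED = {
    "next-server": ipaddress.ip_address("10.0.0.1"),
    "router": ipaddress.ip_address("10.0.0.254"),
    "broadcast": ipaddress.ip_address("10.0.0.255"),
}


def pool_problems(subnet, low, high, reserved):
    """Return a list of problems with the pool; empty means consistent."""
    problems = []
    if low not in subnet or high not in subnet:
        problems.append("pool extends outside the subnet")
    if low > high:
        problems.append("pool start is above pool end")
    for name, addr in reserved.items():
        if low <= addr <= high:
            problems.append(f"{name} {addr} lies inside the dynamic pool")
    return problems


print(pool_problems(SUBNET, POOL_LOW, POOL_HIGH, RESERVED))       # []
print(SUBNET.broadcast_address)                                   # 10.0.0.255
print(int(POOL_HIGH) - int(POOL_LOW) + 1, "leasable addresses")   # 252 leasable addresses
```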
TFTP Server
The TFTP server that serves the Linux kernel to the diskless machines was installed from the RPM packages provided by the SL4 distribution and runs under xinetd. It is enabled by setting "disable = no" in the xinetd configuration file /etc/xinetd.d/tftp.
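A minimal Python sketch for confirming the setting on a server follows. It assumes the file holds a single `service` block in the usual `key = value` layout; the sample text is modeled on a typical tftp entry, and its `server_args` path is an assumption rather than a confirmed SL4 default.

```python
# Sketch: read attributes from an xinetd service file such as
# /etc/xinetd.d/tftp. Assumes a single "service ... { ... }" block and
# ignores xinetd's "+=" / "-=" operators and includes.

SAMPLE = """\
service tftp
{
    socket_type = dgram
    protocol    = udp
    wait        = yes
    user        = root
    server      = /usr/sbin/in.tftpd
    server_args = -s /tftpboot
    disable     = no
}
"""


def xinetd_attributes(text):
    """Return {attribute: value} for lines inside the service braces."""
    attrs = {}
    inside = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line == "{":
            inside = True
        elif line == "}":
            inside = False
        elif inside and "=" in line:
            key, value = line.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def tftp_enabled(text):
    """xinetd runs the service unless it is marked "disable = yes"."""
    return xinetd_attributes(text).get("disable", "no") != "yes"


print(tftp_enabled(SAMPLE))  # True
```

After editing the file, xinetd has to reread its configuration (for example by restarting the service) before the change takes effect.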
NFS Server
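The root filesystem for the diskless clients must be exported with the "no_root_squash" option, so that root on each client can access the root-owned files in the exported tree instead of being mapped to an unprivileged user.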
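The Python sketch below checks an export line for that option. The export line itself, including the client specification and the `ro` and `sync` options, is hypothetical; only the path and `no_root_squash` come from this page.

```python
# Sketch: check that an /etc/exports entry grants no_root_squash to
# every client. Handles the common "path client(opts) ..." form only;
# the example entry (client spec, ro and sync options) is hypothetical.

EXAMPLE = "/mnt/nodefs/rootfs-SL4-x86_64 10.0.0.0/255.255.255.0(ro,no_root_squash,sync)"


def parse_exports_line(line):
    """Split an exports line into (path, {client: set(options)})."""
    path, *specs = line.split()
    clients = {}
    for spec in specs:
        host, _, rest = spec.partition("(")
        options = rest.rstrip(")").split(",") if rest else []
        # An option list with no host in front applies to all clients.
        clients[host or "*"] = {opt for opt in options if opt}
    return path, clients


def clients_missing(line, option="no_root_squash"):
    """Return the clients whose option list lacks the given option."""
    _, clients = parse_exports_line(line)
    return sorted(host for host, opts in clients.items() if option not in opts)


print(parse_exports_line(EXAMPLE)[0])  # /mnt/nodefs/rootfs-SL4-x86_64
print(clients_missing(EXAMPLE))        # []
```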
Root File System
- Installed a stock Scientific Linux 4.3 system onto a hard disk.
- Copied the entire filesystem tree from the disk to a directory on the fileserver.
- Added the Stateless Linux packages to the copied filesystem.
- Set "READONLY=yes" in the configuration file /mnt/nodefs/rootfs-SL4-x86_64/etc/sysconfig/readonly-root.
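The READONLY change can be scripted. A minimal Python sketch, assuming the file uses simple shell-style `KEY=value` lines, is shown below; the helper names are hypothetical.

```python
# Sketch: set KEY=value in a shell-style sysconfig file, replacing any
# existing assignment or appending one. The rootfs path comes from the
# text; the helper names are illustrative.
import re

READONLY_ROOT = "/mnt/nodefs/rootfs-SL4-x86_64/etc/sysconfig/readonly-root"


def set_sysconfig_value(text, key, value):
    """Set every uncommented KEY= line to value, or append KEY=value."""
    new_line = f"{key}={value}"
    # Match an uncommented assignment to this key on its own line.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}=.*$", re.MULTILINE)
    if pattern.search(text):
        return pattern.sub(lambda _m: new_line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


def update_file(path, key, value):
    """Rewrite the file at path in place with the new assignment."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(set_sysconfig_value(text, key, value))


# Usage on the fileserver: update_file(READONLY_ROOT, "READONLY", "yes")
```

Commented-out lines such as `# READONLY=no` are left untouched, so rerunning the script is harmless.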
Torque Batch Queue System
- Downloaded the Torque 2.1.6 source from the distributor.
- Configured with --enable-syslog and --with-scp.
- Compiled source and created binary RPM packages with "make rpm".
- Added the line "mount_files /var/spool/torque" to /etc/rc.readonly in the diskless node filesystem so that pbs_mom can write to its spool directory despite the read-only root.
- Created the pbs_mom configuration file:
  $pbsserver 10.0.0.254
  $restricted 10.0.0.254
  $logevent 255
- Added Torque server name to /var/spool/torque/server_name.
- Set up a secondary ssh server on the private network that uses host-based authentication, so that pbs_mom can return job output with scp without passwords. Configuration:
  ListenAddress 10.0.0.254
  HostbasedAuthentication yes
  IgnoreUserKnownHosts yes
  IgnoreRhosts yes
- Added the host keys for all nodes to ssh_known_hosts using the update_ssh_known_hosts script.
- Modified ssh_config on the cluster nodes for host-based authentication. Configuration:
  EnableSSHKeysign yes
  HostbasedAuthentication yes
- Added RSA host key for babar3, babar3.phys.utk.edu, and 10.0.0.254 to ssh_known_hosts on cluster nodes.
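The node-side Torque files can also be written straight into the diskless root image from the fileserver. The Python sketch below assumes `PBS_HOME` is `/var/spool/torque` and that pbs_mom reads its settings from `mom_priv/config` under it (Torque's usual layout). It writes one directive per line using the `$pbsserver` spelling that Torque documents, and the server name passed in is a placeholder.

```python
# Sketch: write pbs_mom's config and server_name into the diskless image.
# Assumptions: PBS_HOME is /var/spool/torque and the MOM config lives in
# $PBS_HOME/mom_priv/config; the server name is a placeholder.
from pathlib import Path

ROOTFS = Path("/mnt/nodefs/rootfs-SL4-x86_64")
MOM_DIRECTIVES = [
    ("$pbsserver", "10.0.0.254"),   # host running pbs_server
    ("$restricted", "10.0.0.254"),  # host allowed to query the MOM
    ("$logevent", "255"),           # log event mask
]


def mom_config_text(directives):
    """Render one directive per line, as pbs_mom expects."""
    return "".join(f"{name} {value}\n" for name, value in directives)


def write_torque_files(rootfs, server_name, directives=MOM_DIRECTIVES):
    """Create mom_priv/config and server_name under the image's PBS_HOME."""
    pbs_home = rootfs / "var/spool/torque"
    (pbs_home / "mom_priv").mkdir(parents=True, exist_ok=True)
    (pbs_home / "mom_priv" / "config").write_text(mom_config_text(directives))
    (pbs_home / "server_name").write_text(server_name + "\n")
    return pbs_home
```

For example, `write_torque_files(ROOTFS, "SERVER_NAME")` creates both files inside the image; run it on the fileserver, where the tree is writable.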