From HEP at Tennessee
To Do
- move incremental backups to non-redundant storage
- need to activate ports OSE250165 and OSE250166 (Gb)
- use the Stateless Linux packages to adapt the rootfs for the cluster
MCE error on CMS machines
At random intervals, the machines panic with errors like:
CPU 0: Machine Check Exception: 4 Bank 4: b20000000000070F0F
Kernel panic - not syncing: CPU context corrupt
This is possibly caused by the kernel over-reacting to a particular condition in our hardware that is not actually fatal. We fixed the problem by disabling MCEs, either by booting with the kernel parameter "nomce" or by disabling machine check support when compiling the kernel.
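The "CPU context corrupt" message appears to correspond to the PCC (processor context corrupt) flag in the bank's 64-bit MCi_STATUS register, which the kernel's machine check handler treats as fatal. The minimal Python sketch below decodes the architectural flag bits (bit positions as documented by Intel and AMD). The logged value has 18 hex digits, more than a 64-bit register holds, so it may have been garbled when copied; the example therefore assumes only its leading byte, 0xb2, is reliable and places it in bits 63..56. The function name decode_mci_status is hypothetical.

```python
"""Decode the architectural flag bits of an x86 MCi_STATUS value (sketch)."""
from __future__ import annotations

# Flag bits of the 64-bit MCi_STATUS register of a machine check bank,
# as documented by Intel and AMD. The lower bits (error codes) are ignored.
MCI_STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status: int) -> list[str]:
    """Return the names of the flag bits set in status, highest bit first."""
    if not 0 <= status < 1 << 64:
        raise ValueError("MCi_STATUS is a 64-bit register")
    return [name for bit, name in sorted(MCI_STATUS_FLAGS.items(), reverse=True)
            if (status >> bit) & 1]


if __name__ == "__main__":
    # Assumption: only the leading byte (0xb2) of the logged value is trusted;
    # it is placed in bits 63..56 and all lower bits are left at zero.
    print(decode_mci_status(0xB2 << 56))  # ['VAL', 'UC', 'EN', 'PCC']
```

Under that assumption the set flags are VAL, UC, EN and PCC, which is consistent with the panic message.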
hardlink
hardlink traverses one or more directories searching for duplicate files. When it finds a set of duplicates, it keeps one of them as the master, removes the others, and replaces each removed file with a hard link to the master. This saves disk space when several directories on the same filesystem contain many duplicate files (hard links cannot span filesystems).
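The procedure above can be sketched in Python as follows. This is a minimal illustration, not the hardlink tool's actual code: it groups regular files by device and size, confirms duplicates with a SHA-256 digest plus a byte-for-byte comparison, and swaps each duplicate for a hard link via a temporary name and an atomic rename. The function names and the ".hardlink-tmp" suffix are hypothetical, and the sketch does not compare ownership, permissions or timestamps, which a production tool may need to do.

```python
"""Sketch of hardlink-style deduplication (not the real tool's code)."""
from __future__ import annotations

import filecmp
import hashlib
import os
import stat
import tempfile
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(*roots: str) -> int:
    """Replace duplicate regular files under roots with hard links to a master.

    Returns the number of files that were replaced by links.
    """
    # Hard links cannot cross filesystems, and files of different sizes
    # cannot be identical, so only files sharing (device, size) are compared.
    candidates: dict[tuple[int, int], list[str]] = defaultdict(list)
    seen_inodes: set[tuple[int, int]] = set()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks, device files and sockets
                if (st.st_dev, st.st_ino) in seen_inodes:
                    continue  # already a hard link to a file we have seen
                seen_inodes.add((st.st_dev, st.st_ino))
                candidates[(st.st_dev, st.st_size)].append(path)

    replaced = 0
    for paths in candidates.values():
        if len(paths) < 2:
            continue
        by_digest: dict[str, list[str]] = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        for master, *duplicates in by_digest.values():
            for dup in duplicates:
                # Guard against hash collisions with a byte-for-byte check.
                if not filecmp.cmp(master, dup, shallow=False):
                    continue
                tmp = dup + ".hardlink-tmp"  # hypothetical temporary name
                os.link(master, tmp)
                os.replace(tmp, dup)  # atomically swap the duplicate for the link
                replaced += 1
    return replaced


def demo() -> tuple[int, bool]:
    """Deduplicate a throwaway tree holding one duplicate pair."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        files = {"a.txt": "hello", os.path.join("sub", "b.txt"): "hello",
                 "c.txt": "world"}
        for rel, text in files.items():
            with open(os.path.join(root, rel), "w") as f:
                f.write(text)
        replaced = hardlink_duplicates(root)
        a = os.stat(os.path.join(root, "a.txt"))
        b = os.stat(os.path.join(root, "sub", "b.txt"))
        return replaced, a.st_ino == b.st_ino


if __name__ == "__main__":
    print(demo())  # (1, True)
```

In the demo, a.txt and c.txt share a size but not their contents, so the digest step keeps them apart and only the "hello" pair is linked.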
PBS (torque) problem
- server_log reports "Post job file processing error"
- the *.OU and *.ER files are not sent back to the queue server
- cause: torque uses the autofs mount point as the output path instead of '/mnt/SP/'
NPTL
Linux kernel 2.5 and later support a new threading model, the Native POSIX Thread Library (NPTL). Red Hat backported NPTL support to the kernel shipped with its distribution. As a result, a Red Hat system booted with a vanilla kernel from kernel.org, which lacks Red Hat's backport patches, runs into trouble: any process that calls clone() (for example ypbind, dig or host) dies with a segmentation fault.
One workaround is to disable NPTL by hiding its libraries, which live in '/lib/tls', from the system: move that directory somewhere safe and NPTL will be disabled.
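A minimal Python sketch of that workaround, plus a check of which threading library glibc reports (the same value that `getconf GNU_LIBPTHREAD_VERSION` prints, which starts with "NPTL" when NPTL is in use). The function names and the default backup location /root/lib-tls.disabled are hypothetical. Moving /lib/tls requires root and should only be done on the affected machine, so the script's main block only prints the current library, and the demo exercises the move on a throwaway directory.

```python
"""Sketch: check glibc's threading library and hide /lib/tls to disable NPTL."""
from __future__ import annotations

import os
import shutil
import tempfile


def thread_library() -> str | None:
    """Return what `getconf GNU_LIBPTHREAD_VERSION` would report, if available."""
    name = "CS_GNU_LIBPTHREAD_VERSION"
    if name not in getattr(os, "confstr_names", {}):
        return None  # not a glibc-based system
    try:
        return os.confstr(name)
    except (OSError, ValueError):
        return None


def hide_tls_libs(tls_dir: str = "/lib/tls",
                  backup_dir: str = "/root/lib-tls.disabled") -> bool:
    """Move tls_dir to backup_dir so the NPTL libraries are no longer found.

    Returns False if tls_dir does not exist. The default backup location is
    hypothetical; pick a directory the dynamic linker does not search.
    """
    if not os.path.isdir(tls_dir):
        return False
    if os.path.exists(backup_dir):
        raise FileExistsError(backup_dir)  # never overwrite an earlier backup
    shutil.move(tls_dir, backup_dir)
    return True


def demo() -> tuple[bool, bool, bool]:
    """Exercise hide_tls_libs on a throwaway directory tree, not the real /lib."""
    with tempfile.TemporaryDirectory() as root:
        tls = os.path.join(root, "lib", "tls")
        os.makedirs(tls)
        backup = os.path.join(root, "lib-tls.disabled")
        moved = hide_tls_libs(tls, backup)
        return moved, os.path.isdir(tls), os.path.isdir(backup)


if __name__ == "__main__":
    # Only report the current library; moving the real /lib/tls needs root.
    print(thread_library())
```

To undo the change, move the backup directory back to /lib/tls.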