From HEP at Tennessee
We are planning to use Lustre as the cluster file system for our cluster.
Version: 1.4.8
See the Lustre Step-by-Step Guide for information on how to implement Lustre.
Fixed the problem with the non-responsive startup on the test machine. Apparently, the hostname of the computer was mapped to 127.0.0.1 in the hosts file, and a loopback address does not work as the NID of a Lustre node. I changed the hosts file to point the hostname to the proper IP address, and startup then worked fine.
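The hostname check described above can be sketched as a small shell helper. This is only an illustration, assuming a Linux system with getent available; the function names is_loopback and check_hostname_mapping are hypothetical and are not part of Lustre.

```shell
# Sketch only: both function names are hypothetical helpers, not Lustre tools.

# Succeed if the given IPv4 address lies in the loopback range 127.0.0.0/8.
# IPv6 loopback (::1) is deliberately not handled here.
is_loopback() {
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Resolve the local hostname (first match wins) and warn when it maps to a
# loopback address, which is the situation that stalled startup here.
check_hostname_mapping() {
    host="$(uname -n)"
    addr="$(getent hosts "$host" | awk '{ print $1; exit }')"
    if [ -z "$addr" ]; then
        echo "WARNING: $host does not resolve to any address"
        return 1
    fi
    if is_loopback "$addr"; then
        echo "WARNING: $host resolves to $addr; fix the hosts file first"
        return 1
    fi
    echo "OK: $host resolves to $addr"
}
```

Running check_hostname_mapping on a node before starting Lustre would flag the 127.0.0.1 mapping early.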
Current issues to be resolved:
- Test diskless boot of Lustre-enabled kernel gives various error messages and does not successfully start.
- Change configuration file to use lnet as opposed to direct ethernet
Building Lustre Modules
Here are the steps taken so far:
- Built kernel from the Lustre 1.4.8 Source Package using the most recent working Config File.
- To have a clean build, you need to patch nfsroot.c in the <kernel source directory>/fs/nfs directory with this file.
- Ran ./configure --with-linux=<path to built kernel tree> in an extracted 1.4.8 Source directory.
- Found script to generate kernel.h file (here it is): Make kernel.h script
- Resolved the following error: configure-error.log is the present error log.
Current Lustre Configuration Script 1-16-07 - 1:32 PM
#!/bin/bash
rm node-test.xml

#Create the nodes
lmc -m node-test.xml --add node --node babar8
lmc -m node-test.xml --add net --node babar8 --nid 10.0.0.1@tcp0 --nettype lnet
lmc -m node-test.xml --add node --node client --nid '*' --nettype lnet

#Configure MDS
lmc -m node-test.xml --format --add mds --node babar8 --mds mds-test --fstype ldiskfs --dev /tmp/mds-test --size 50000

#Configure OSTs
lmc -m node-test.xml --add lov --lov lov-test --mds mds-test --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0
lmc -m node-test.xml --add ost --node babar8 --lov lov-test --ost ost1-test --fstype ldiskfs --dev /tmp/ost1-test --size 100000

#Configure client
lmc -m node-test.xml --add mtpt --node client --path /mnt/lustre-node-test --mds mds-test --lov lov-test
Linux Kernel with Lustre Support
The Linux kernel was compiled from the Lustre kernel source package with the following changes.
- Added CONFIG_ROOT_NFS (you must first enable kernel-level IP autoconfiguration under the Network options, as well as kernel-level DHCP and BOOTP support).
- Must enable "Loadable Module Support" sub-options for a clean compile.
- Fixed a bug in the kernel source module nfsroot.c: added "Opt_acl, Opt_noacl," to line 127.
- Full Config File
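As a rough sanity check for the options listed above, a config file can be scanned for the corresponding symbols. This is a sketch under assumptions: the mapping of the menu entries to CONFIG_IP_PNP, CONFIG_IP_PNP_DHCP, CONFIG_IP_PNP_BOOTP, and CONFIG_MODULES is ours, the "Loadable Module Support" sub-options are not checked because the page does not name them, and check_kernel_config is a hypothetical helper name.

```shell
# Sketch only: check_kernel_config is a hypothetical helper.
# It reports every expected option that is not set to "y" in the given
# kernel config file and returns non-zero if any is missing.
check_kernel_config() {
    config_file="$1"
    missing=0
    for opt in CONFIG_MODULES CONFIG_IP_PNP CONFIG_IP_PNP_DHCP \
               CONFIG_IP_PNP_BOOTP CONFIG_ROOT_NFS; do
        # Match the exact line "OPTION=y"; commented-out or "=m" lines fail.
        if ! grep -q "^${opt}=y\$" "$config_file"; then
            echo "missing: ${opt}"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be run as check_kernel_config "<kernel source directory>/.config" before starting the build.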
Note: make gconfig is buggy; it doesn't properly load the .config file. Instead, do a make mrproper, copy the config file to <kernel source directory>/.config, and then do a make oldconfig (you might have to answer a few prompts for options during make oldconfig). You may then run make as usual.
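The sequence in the note can be wrapped in a small function. This is a minimal sketch, not the procedure we actually scripted: rebuild_kernel and the DRY_RUN switch are hypothetical, and both paths are supplied by the caller.

```shell
# Sketch only: rebuild_kernel and DRY_RUN are hypothetical names.
# Usage: rebuild_kernel <kernel source directory> <saved config file>
# With DRY_RUN set to a non-empty value, the commands are printed, not run.
rebuild_kernel() {
    ksrc="$1"
    config_src="$2"
    run="${DRY_RUN:+echo}"   # expands to "echo" in dry-run mode, else empty

    $run make -C "$ksrc" mrproper &&          # clean the tree completely
    $run cp "$config_src" "$ksrc/.config" &&  # install the saved config
    $run make -C "$ksrc" oldconfig &&         # may prompt for new options
    $run make -C "$ksrc"                      # build as usual
}
```

Calling it with DRY_RUN=1 first shows the four commands without touching the kernel tree.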
Lustre Working On Test Setup!
Here is the script that generates the cluster-production.xml file that is working presently: Configuration Script
- Created partition sda1 on the test compute node
- Ran lconf --reformat --node babar8 cluster-production.xml on babar8.
- Ran lconf --reformat --node node001 cluster-production.xml on the test node.
- Ran mount -t lustre 10.0.0.1:/cms-mds/client /var/writable/lustre on both of the computers.
- Ran touch /var/writable/lustre/test.txt on babar8.
- Checked to make sure that the resulting file was readable on both computers; it was.
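The last two steps above (create a file on one machine, read it on the other) could be repeated with a pair of helpers like the following. This is a sketch only: write_marker, verify_marker, and the marker file name lustre-smoke-test.txt are hypothetical, and the mount point is passed in by the caller.

```shell
# Sketch only: helper names and the marker file name are hypothetical.

# Write this machine's node name into a marker file under the mount point.
write_marker() {
    dir="$1"
    uname -n > "$dir/lustre-smoke-test.txt"
}

# Succeed only if the marker file is readable and non-empty.
verify_marker() {
    dir="$1"
    [ -r "$dir/lustre-smoke-test.txt" ] && [ -s "$dir/lustre-smoke-test.txt" ]
}
```

For example, run write_marker /var/writable/lustre on babar8 and then verify_marker /var/writable/lustre on each machine that has the file system mounted.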
Things left to do
- Script the whole setup so that it is customized for our cluster
- Try on entire hard drive instead of just a partition
- Benchmark