Lustre


From HEP at Tennessee

We are planning to use Lustre as the cluster file system for our cluster.

Version: 1.4.8

See the Lustre Step-by-Step Guide for information on how to set up Lustre.

Fixed the non-responsive startup on the test machine. The computer's hostname was mapped to 127.0.0.1 in the hosts file, and a loopback address cannot serve as the NID of a Lustre node. After I changed the hosts file so that the hostname points to the machine's proper IP address, startup worked fine.
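
A check like the following could catch this misconfiguration before Lustre is started. It is only a sketch: the function names is_loopback and check_hostname_nid are hypothetical helpers, not part of Lustre, and the check only recognizes IPv4 loopback addresses (127.0.0.0/8).

```shell
#!/bin/bash

# Succeed if the given IPv4 address lies in the loopback range 127.0.0.0/8.
is_loopback() {
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Resolve this machine's hostname through the system resolver (which
# consults the hosts file) and warn if it maps to a loopback address.
check_hostname_nid() {
    local name addr
    name=$(hostname)
    addr=$(getent hosts "$name" | awk '{ print $1; exit }')
    if [ -z "$addr" ]; then
        echo "WARNING: $name does not resolve to any address"
        return 1
    fi
    if is_loopback "$addr"; then
        echo "WARNING: $name resolves to $addr; fix the hosts file"
        return 1
    fi
    echo "OK: $name resolves to $addr"
}
```

Running check_hostname_nid on each node before lconf should report the address that the hostname resolves to, which can then be compared with the NID in the configuration script.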

Current issues to be resolved:

  • The test diskless boot of the Lustre-enabled kernel prints various error messages and does not start successfully.
  • Change the configuration file to use lnet instead of direct Ethernet.


Building Lustre Modules

Here are the steps taken so far:

  • Built kernel from the Lustre 1.4.8 Source Package using the most recent working Config File.
    • To have a clean build, you need to patch nfsroot.c in the <kernel source directory>/fs/nfs directory with this file.
  • Ran ./configure --with-linux=<path to built kernel tree> in an extracted 1.4.8 Source directory.
  • Found a script to generate the kernel.h file: Make kernel.h script
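
The configure step above can be wrapped in a small guard so that it fails early when pointed at the wrong directory. This is a sketch under assumptions: is_configured_kernel_tree and build_lustre_modules are hypothetical names, the check for a top-level Makefile and .config is only a heuristic for "built kernel tree", and running make after ./configure is the usual next step rather than something recorded in these notes.

```shell
#!/bin/bash

# Heuristic: a configured kernel tree has a top-level Makefile and a .config.
is_configured_kernel_tree() {
    [ -f "$1/Makefile" ] && [ -f "$1/.config" ]
}

# Configure and build Lustre against the given kernel tree.
# Must be run from the top of the extracted Lustre 1.4.8 source directory.
build_lustre_modules() {
    local tree="$1"
    if ! is_configured_kernel_tree "$tree"; then
        echo "error: $tree does not look like a configured kernel tree" >&2
        return 1
    fi
    # --with-linux is the option used in the steps above; make is assumed.
    ./configure --with-linux="$tree" && make
}
```

Usage would be build_lustre_modules <path to built kernel tree>, run from inside the Lustre source directory.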

Current Lustre Configuration Script 1-16-07 - 1:32 PM

#!/bin/bash

# Remove any previous configuration; -f avoids an error when the file does not exist yet
rm -f node-test.xml

#Create the nodes
lmc -m node-test.xml --add node --node babar8
lmc -m node-test.xml --add net --node babar8 --nid 10.0.0.1@tcp0 --nettype lnet
lmc -m node-test.xml --add net --node client --nid '*' --nettype lnet

#Configure MDS

lmc -m node-test.xml --format --add mds --node babar8 --mds mds-test --fstype ldiskfs --dev /tmp/mds-test --size 50000

#Configure OSTs

lmc -m node-test.xml --add lov --lov lov-test --mds mds-test --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0

lmc -m node-test.xml --add ost --node babar8 --lov lov-test --ost ost1-test --fstype ldiskfs --dev /tmp/ost1-test --size 100000

#Configure client

lmc -m node-test.xml --add mtpt --node client --path /mnt/lustre-node-test --mds mds-test --lov lov-test

Linux Kernel with Lustre Support

The Linux kernel was compiled from the Lustre kernel source package with the following changes.

  • Added CONFIG_ROOT_NFS. This requires first enabling kernel-level IP autoconfiguration under the Network options, along with kernel-level DHCP and BOOTP support.
  • Enabled the "Loadable Module Support" sub-options, which are required for a clean compile.
  • Fixed a bug in the kernel source module nfsroot.c: added "Opt_acl, Opt_noacl," to line 127.
  • Full Config File

Note: make gconfig is buggy; it does not load the config file properly. Instead, run make mrproper, copy the config file to <kernel source directory>/.config, and then run make oldconfig (you might have to answer prompts for a few options). You may then run make as usual.
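
Before running make, it may be worth confirming that the copied config file really has the options listed above turned on. The sketch below assumes the standard kernel symbol names for those options (CONFIG_MODULES, CONFIG_IP_PNP, CONFIG_IP_PNP_DHCP, CONFIG_IP_PNP_BOOTP, CONFIG_ROOT_NFS); the function names are hypothetical, and the individual "Loadable Module Support" sub-options are not checked.

```shell
#!/bin/bash

# Succeed if option $2 is built in (=y) in the kernel config file $1.
config_enabled() {
    grep -q "^$2=y\$" "$1"
}

# Report every required option that is not set to y; fail if any is missing.
check_lustre_kernel_config() {
    local cfg="$1" opt missing=0
    for opt in CONFIG_MODULES CONFIG_IP_PNP CONFIG_IP_PNP_DHCP \
               CONFIG_IP_PNP_BOOTP CONFIG_ROOT_NFS; do
        if ! config_enabled "$cfg" "$opt"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_lustre_kernel_config <kernel source directory>/.config could be run right after make oldconfig.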


Lustre Working On Test Setup!

Here is the script that generates the cluster-production.xml file currently in use: Configuration Script

  • Created partition sda1 on the test compute node
  • Ran lconf --reformat --node babar8 cluster-production.xml on babar8.
  • Ran lconf --reformat --node node001 cluster-production.xml on the test node.
  • Ran mount -t lustre 10.0.0.1:/cms-mds/client /var/writable/lustre on both of the computers.
  • Ran touch /var/writable/lustre/test.txt on babar8
  • Checked to make sure that the resulting file was readable on both computers; it was.
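
The last three steps can be turned into a repeatable smoke test. The sketch below is an assumption about how one might script it, not the procedure actually used: the function names are hypothetical, and it writes a line of text into the file (rather than using touch) so that an empty file cannot pass the check.

```shell
#!/bin/bash

# Succeed if mount point $1 appears with file system type "lustre" in the
# mount table $2 (defaults to /proc/mounts).
is_lustre_mounted() {
    local mnt="$1" table="${2:-/proc/mounts}"
    awk -v m="$mnt" '$2 == m && $3 == "lustre" { found = 1 }
                     END { exit !found }' "$table"
}

# Write a marker file under the given mount point.
write_marker() {
    echo "lustre smoke test from $(hostname)" > "$1/test.txt"
}

# Succeed if the marker file exists, is readable, and is not empty.
marker_readable() {
    [ -r "$1/test.txt" ] && [ -s "$1/test.txt" ]
}
```

On babar8 one would run write_marker /var/writable/lustre, and then on both machines run is_lustre_mounted /var/writable/lustre && marker_readable /var/writable/lustre.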

Things left to do

  • Script the whole setup so that it is tailored to our cluster
  • Try on entire hard drive instead of just a partition
  • Benchmark