• National Institute for Computational Sciences is a UT/ORNL Partnership

Scratch Space

The Lustre file system provides scratch space at /lustre/scratch/<user-name>. Lustre is a highly scalable cluster file system: each file is distributed, or striped, across several hardware locations. Striping allows files larger than any single location could hold, and it allows much faster transfers when the file is accessed in parallel.

Lustre is the only file system available to the compute nodes. Input and output files must reside on Lustre, and so must the current working directory at the time aprun is called. Executables, and files used for redirection to and from aprun, may reside in your home directory, because aprun itself runs on a service node. If you receive an error such as "No such file or directory", check whether your program is trying to access something in your home directory. Also, do not create files directly in /tmp: it is a small, memory-resident file system, and when it fills up, system problems result.
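Because a job fails in confusing ways when aprun is started from a home directory, a job script can check its working directory before launching. This is a minimal sketch assuming scratch lives under /lustre/scratch as described above; require_lustre_cwd and the run directory in the usage comment are hypothetical, not site-provided tools.

```shell
# Hypothetical helper (not a site-provided command): succeed only if the
# current directory is under /lustre/scratch, because compute nodes
# cannot see home directories.
require_lustre_cwd() {
    case "$PWD/" in
        /lustre/scratch/*) return 0 ;;
        *)
            echo "error: $PWD is not on Lustre;" \
                 "cd to /lustre/scratch/${USER:-<user-name>} first" >&2
            return 1
            ;;
    esac
}

# Usage in a job script (directory name is only an example):
#   cd /lustre/scratch/$USER/myrun || exit 1
#   require_lustre_cwd || exit 1
#   aprun -n 12 ./a.out > out.log
```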

Lustre File System Purge Policy

When Lustre begins to fill, User Support will contact users and ask them to free as much space as possible by archiving or deleting files. Once the file system reaches 90% capacity, it may be necessary to delete files, regardless of age, without prior notification. Research groups with the largest usage may be prevented from submitting new jobs; the hold is removed once enough space has been freed.
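The 90% level mentioned above can be watched from the command line. This is a minimal sketch assuming a POSIX df whose -P output puts the capacity percentage in the fifth column; fs_percent_used and check_scratch_usage are hypothetical helper names. On Lustre itself, lfs df -h shows the same information broken down per OST.

```shell
# Hypothetical helper: percentage of space used on the file system that
# holds PATH (default: current directory), parsed from POSIX `df -P`.
fs_percent_used() {
    df -P -- "${1:-.}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical check: warn at or above the 90% level at which files may
# be deleted without notice. Returns 1 on a warning or unparsable output.
check_scratch_usage() {
    path=${1:-.}
    used=$(fs_percent_used "$path")
    case $used in
        ''|*[!0-9]*) echo "could not read usage for $path" >&2; return 1 ;;
    esac
    if [ "$used" -ge 90 ]; then
        echo "warning: $path is ${used}% full; archive or delete files now" >&2
        return 1
    fi
    echo "$path is ${used}% full"
}

# Usage (on Kraken): check_scratch_usage /lustre/scratch
```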

Files older than 90 days are routinely purged.
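To see which of your files are candidates for the purge, search for files that have not been modified in more than 90 days. This sketch assumes the purge is based on modification time; it may instead be based on access time, in which case use --atime with lfs find or -atime with GNU find. purge_candidates is a hypothetical name.

```shell
# Hypothetical helper: list regular files under DIR not modified in more
# than 90 days. ASSUMPTION: the purge keys on modification time.
# Uses `lfs find` on Lustre and falls back to GNU find elsewhere.
purge_candidates() {
    dir=${1:-.}
    if command -v lfs >/dev/null 2>&1; then
        lfs find "$dir" --type f --mtime +90
    else
        find "$dir" -type f -mtime +90
    fi
}

# Usage: purge_candidates /lustre/scratch/$USER > at_risk.txt
```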

Lustre Structure

It may help to know the basic layout of Lustre, both to use it well and to understand problems that may come up. The description below is the "bottom-up" view; keep in mind that when a file is accessed, the system follows a "top-down" path.

Files are generally striped across several Object Storage Targets (OSTs), which enables truly parallel access to files and allows files larger than any one OST. An OST may be thought of as a "virtual disk", though it often consists of several physical disks, for instance in a RAID configuration.
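Striping can be made concrete with a little arithmetic. Assuming Lustre's usual round-robin (RAID-0 style) layout, a file with stripe size S and stripe count C stores byte offset o on stripe (o / S) mod C of that file, so each OST holds roughly 1/C of the file's data. The 1 MiB stripe size in the example is an assumption (a common Lustre default); the count of 4 is the default quoted under Lustre Use. stripe_for_offset is a hypothetical name.

```shell
# Sketch of round-robin striping: offset o falls in chunk o/S, which lives
# on stripe number (o/S) mod C, counting from 0 within the file's own
# list of OSTs (not the global OST index).
stripe_for_offset() {
    offset=$1 size=$2 count=$3
    # Reject zero or negative stripe size/count to avoid division by zero.
    [ "$size" -gt 0 ] && [ "$count" -gt 0 ] || return 1
    echo $(( (offset / size) % count ))
}

# 1 MiB stripes, 4 stripes: offset 5 MiB is chunk 5, on the file's stripe 1
stripe_for_offset $((5 * 1048576)) 1048576 4   # prints 1
```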

Object Storage Servers (OSSs) are servers that each control access to a small set of OSTs and hold some metadata about the files stored on those OSTs. The OSSs are often the bottleneck on Kraken. Finally, Lustre on Kraken has a single Metadata Server (MDS); other installations may have more than one. The MDS is the first place the system goes when a file is accessed, but it holds only basic metadata: the filename and its location.

Lustre Use

Because of its superior I/O speeds, Lustre is the only space accessible from the compute nodes, and it is recommended for transferring large files to and from HPSS. However, remember that this is a scratch directory for temporary files: Lustre is neither backed up nor guaranteed. If you care about your data, archive it or transfer it to another computer.

Our Lustre file system handles files at a different scale from any monolithic file system, and it has some limitations that standard file systems lack. It is therefore best used somewhat differently from a laptop or network file system:

  • Most users alias ls to return more information than the default, for example using different colors for different file types. This additional information requires ls to query the OSSs. Depending on Kraken's load at that moment and on how the files in the directory are distributed among the OSSs, there is a good chance that one of the OSSs is busy, which causes ls to hang. Instead, use /bin/ls to bypass the alias. It only has to query the MDS and generally returns very quickly.
  • Similarly, it is usually more efficient to use the Lustre tool lfs find rather than GNU find when searching for files on Lustre.
  • Several other GNU commands, such as tar and rm, are inefficient when operating on large numbers of files on Lustre. For example, with millions of files, rm -rf * may take days and have a considerable impact on Lustre for other users. A better approach is to generate a list of the files to be removed or archived with tar, and to act on them one at a time or in small sets. For example, you can use the following script to remove files on Lustre when a normal rm would be inadequate. Warning: this script removes files indiscriminately, as rm -rf does. Use with caution.

    The lustre-mass-delete command is a script that deletes files recursively, 100 files at a time, so that it does not place a heavy load on the system.

    /usr/local/bin/lustre-mass-delete

    For example, if I am already in /lustre/scratch/djohn/stuff and I want to delete /lustre/scratch/djohn/stuff/directory1, I can use:

    lustre-mass-delete directory1
    Deletes directory1, relative to your current directory.
    lustre-mass-delete directory1 directory2 directory3
    Deletes directory1, directory2, and directory3, relative to your current directory. The script also accepts absolute paths, for example:
    lustre-mass-delete /lustre/scratch/djohn/stuff

    Another method, which lets you review the files before they are deleted, is the following:
    lfs find <dir> -t f > rmlist.txt
    less rmlist.txt    # review; edit rmlist.txt to drop files you want to keep
    # prefix each line with /bin/rm (assumes no spaces or quotes in file names)
    sed -e 's:^:/bin/rm :' rmlist.txt > rmlist.sh
    sh rmlist.sh
    # The directory structure will remain, but unless there are very many
    # directories, we can simply delete it:
    rm -rf <dir>
  • The default stripe count is currently 4, meaning each file is stored on 4 OSTs. In many cases you will want to change this; for example, if your I/O is file-per-process, the best stripe count is likely 1. For details on setting stripe counts and optimizing I/O, see I/O Tips.
More information about using the Lustre file system can be found in Running Jobs, the FAQ, and I/O Tips.
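The alias bypass described in the ls item above can be done in several ways. A minimal sketch; plain_ls is a hypothetical wrapper for use in scripts, and the exact alias you have depends on your shell startup files.

```shell
# Ways to run ls without a user alias (e.g. ls='ls -l --color') that makes
# it stat every file and therefore contact the OSSs:
#   /bin/ls        explicit path; skips aliases and shell functions
#   \ls            quoting the name suppresses alias expansion
#   command ls     skips aliases and shell functions; searches $PATH
# Hypothetical wrapper: plain name-only listing of the given paths.
plain_ls() {
    command ls -- "$@"
}
```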
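The batch-at-a-time idea from the deletion item above can be sketched with xargs, which passes at most 100 file names to each rm call. This is not the source of lustre-mass-delete, only an illustration of the same approach; batched_delete is a hypothetical name, and the xargs options -d and -r are GNU extensions.

```shell
# Hypothetical sketch: delete regular files under DIR in batches of 100,
# then remove what remains (directories, non-regular files) with rm -rf.
# Uses lfs find on Lustre, GNU find elsewhere. File names containing
# newlines are not handled. WARNING: removes DIR entirely, like rm -rf.
batched_delete() {
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    if command -v lfs >/dev/null 2>&1; then
        lfs find "$dir" --type f
    else
        find "$dir" -type f
    fi | xargs -d '\n' -r -n 100 rm -f --
    rm -rf -- "$dir"
}

# Usage: batched_delete /lustre/scratch/$USER/old_run
```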
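Following the stripe-count item above, a file-per-process run can write into a directory whose stripe count is 1; files created in the directory afterwards inherit its striping, while existing files keep their layout. A sketch assuming the lfs client tool is on your PATH; prepare_fpp_dir is a hypothetical name, and I/O Tips remains the place for site-specific guidance.

```shell
# Hypothetical helper: create DIR and give it a stripe count of 1 so each
# file written by a file-per-process job lands on a single OST.
# Returns 2 if lfs is unavailable (DIR is then probably not on Lustre).
prepare_fpp_dir() {
    dir=$1
    mkdir -p -- "$dir" || return 1
    if command -v lfs >/dev/null 2>&1; then
        # Set the directory's default layout, then show it.
        lfs setstripe -c 1 "$dir" && lfs getstripe -d "$dir"
    else
        echo "lfs not found; $dir is probably not on Lustre" >&2
        return 2
    fi
}

# Usage: prepare_fpp_dir /lustre/scratch/$USER/fpp_run
```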