Synchronizing Files in AFS

The Problem

You have a long-running process on Machine A (e.g. a compute machine that you can’t log into) that is continually appending data to a file in AFS. You’d like to look at the data on Machine B (e.g. a front-end machine) while the program is running, either to check that things are running OK or just out of curiosity. Unfortunately, although you can open the file in AFS on Machine B, it appears to have no data in it.

This “problem” is caused by the AFS cache manager. AFS’s caching system allows programs running on the same host to do most AFS file I/O to and from a cache located in the memory or on the disk of that host. This greatly speeds operation compared to non-cached network file systems like NFS, especially for files used read/write when the network is slow or congested. Unfortunately, the AFS cache manager daemons don’t store files to the server until they are closed or fsync’d. That means data written to a file by a program can only be “seen” by other programs on the same machine until the file is closed or fsync’d. Programs on other machines only “see” the data that was in the file when it was last closed or created (hence new files appear empty).

The Solution – fsync()

If you have access to your source code, the easiest solution is to periodically close() or fsync() the file so that the AFS cache manager will flush the file to the server. Unfortunately, many users don’t have access to their source code, or are not familiar with the low-level Unix file routines.
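As an illustration of the source-level approach, here is a minimal Python sketch of a writer that periodically forces its output to the server. The function name append_with_sync and the sync_every parameter are hypothetical, chosen only for this example; the calls that matter are flush() followed by os.fsync(), which is Python's wrapper around the Unix fsync() routine.

```python
import os


def append_with_sync(path, records, sync_every=10):
    """Append records to path, calling fsync() after every sync_every records.

    Returns the number of fsync() calls made. The name and parameters are
    illustrative assumptions, not part of any existing tool.
    """
    syncs = 0
    with open(path, "a") as f:
        for i, record in enumerate(records, start=1):
            f.write(record + "\n")
            if i % sync_every == 0:
                f.flush()             # move data from Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS (on AFS, the cache manager) to store it
                syncs += 1
    # Leaving the with-block closes the file, which also causes a store on AFS.
    return syncs
```

Calling fsync() after every record would defeat much of the benefit of the cache, so a real program would probably sync every few seconds or every few hundred records instead.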

The fsync Program

A compiled fsync program has been installed as part of the UGE (batch queueing) module. It synchronizes the given file back to the AFS server every 300 seconds. The command below should be started in the background (via the & notation) from your batch script:

fsync /path/to/file &
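The source of the installed fsync program is not shown here. Purely as an illustration of the idea, a helper with similar behavior could be sketched in Python as below, assuming that it simply opens the file and calls fsync() on it at a fixed interval. The names sync_once and sync_periodically and the max_rounds parameter are hypothetical; only the 300-second default comes from the description above.

```python
import os
import time


def sync_once(path):
    """Open path read-only, fsync() it, and close it again."""
    # Assumption: fsync() on a read-only descriptor is accepted (true on Linux).
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def sync_periodically(path, interval=300, max_rounds=None):
    """Call sync_once(path) every interval seconds.

    Runs forever when max_rounds is None; otherwise stops after max_rounds
    iterations. Returns the number of successful syncs.
    """
    synced = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        if os.path.exists(path):  # the job may not have created the file yet
            sync_once(path)
            synced += 1
        rounds += 1
        time.sleep(interval)
    return synced
```

For example, sync_periodically("/path/to/file") would loop until the process is killed, much like a background job started with &.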

You can also use the $SGE_STDOUT_PATH variable in your script if you wish. $SGE_STDOUT_PATH holds the pathname of the file to which the standard output stream of the job is redirected. A sample script using this is given below.

#$ -q long
#$ -pe smp 1
#$ -N output

module load matlab

# Sync the job's standard output file back to AFS in the background
fsync $SGE_STDOUT_PATH &

matlab -nodisplay -nosplash < matlab_file.m