What's wrong with synchronous metadata updates

What are synchronous metadata updates?

The term comes from file system design. The file system resides partly in volatile memory buffers and partly in persistent storage such as disks. The file system could write everything to disk as soon as a write happens; however, such a completely synchronous update policy results in a very slow file system.

Therefore, most file systems don't update everything synchronously. In particular, the Berkeley FFS (also known as UFS) by default waits for some time before writing the data, but writes metadata (i-nodes, directories, etc.) synchronously. This policy is known as synchronous metadata updates.
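
To make this concrete, here is a minimal C sketch (POSIX calls only; the file name is made up) of what an application sees under such a policy: when write() returns, the directory entry and i-node may already be on disk, but the data may still sit only in the buffer cache, so an application that wants its data to be durable at a known point has to call fsync() itself.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *text = "an hour of work\n";

        /* creating the file is a metadata update (directory entry, i-node);
           under synchronous metadata updates it reaches the disk right away;
           "work.txt" is just an arbitrary example file */
        int fd = open("work.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); exit(1); }

        /* the data normally only goes into the buffer cache here and is
           written back later; a crash before the write-back loses it,
           even though the (empty-looking) file already exists on disk */
        if (write(fd, text, strlen(text)) < 0) { perror("write"); exit(1); }

        /* only this forces the data blocks out now; without it, durability
           of the data depends on the file system's write-back policy */
        if (fsync(fd) < 0) { perror("fsync"); exit(1); }

        return close(fd);
    }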

What's wrong with them?

Many people (especially BSD users) mistakenly believe that file systems with synchronous metadata updates are safer than systems that don't write their metadata synchronously (e.g., Linux's ext2fs by default). Here's an anecdote that demonstrates the problem:

I was working on an Alpha under Digital OSF/1, with UFS as the file system (alas!). I had been writing on a new file in Emacs for an hour and had just done a save-buffer (C-x C-s) when the machine crashed. When it came up again after the file system check, my work was gone. The file I was working on, the autosave file, the backup file: all of them either did not exist or were empty.

What had happened? It can be explained very nicely with synchronous metadata updates: when I did the save-buffer, the file I was working on was written to the file system; the metadata was written to the disk, but the data was kept in memory and was lost, so the file system check made this file empty. The autosave file was deleted; its data was still on disk, but without associated metadata. The backup file had never been written, because this was a newly created file and this was the first save.
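
Incidentally, an application can close this window itself, regardless of the file system's policy. A common pattern (not what that Emacs did; sketched here only to make the ordering explicit, with made-up file names) is to write the new contents to a temporary file, fsync it, and only then rename it over the old name, so that a name never refers to data that is not yet on disk:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Save text under path so that after a crash either the complete old
       version or the complete new version is visible, never an empty file. */
    static int save_carefully(const char *path, const char *tmp, const char *text)
    {
        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, text, strlen(text)) != (ssize_t)strlen(text)
            || fsync(fd) < 0) {              /* data is on disk after this */
            close(fd);
            return -1;
        }
        if (close(fd) < 0)
            return -1;
        /* rename() is a metadata operation; however it is scheduled, the
           name only ever points to data that has already been written
           (an fsync() of the directory would also make the rename durable) */
        return rename(tmp, path);
    }

    int main(void)
    {
        /* file names are made up for the example */
        if (save_carefully("work.txt", "work.txt.tmp", "an hour of work\n") < 0) {
            perror("save_carefully");
            return 1;
        }
        return 0;
    }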

But aren't asynchronous metadata updates less safe against file system corruption?

I don't see why they should be. Any file system will try to keep the on-disk state consistent most of the time. During metadata updates an inconsistent state can arise, and if a crash happens at that moment, we have to hope that the file system check does a good job. Note that this on-disk inconsistency does not depend on whether the metadata is written synchronously or asynchronously; it can arise in both kinds of systems.

On the practical side: due to hardware problems I had about 50 crashes under Linux some time ago. I never lost more than about half a minute of work (plus several minutes of waiting until the system came up again).

(Several years later:) In the meantime I have learned that FFS writes metadata in such an order that the damage is limited (IIRC to one wrong block). In contrast, ext2 relies on more sophistication in fsck. In any case, synchronous updates are not the only way to ensure that writes are ordered in a specific way, as demonstrated by soft updates (see below), which don't even require an fsck after a crash.

What would be the correct way?

You had better read some papers on operating systems. There are several approaches:
Log-Structured file systems
write everything to a log and write checkpoints when the file system is consistent. When recovering from a crash, everything written after the last checkpoint is discarded.
Journaling file systems
write the intended changes to a log to make them persistent, and only later to their home locations. When recovering from a crash, the changes logged since the last checkpoint are replayed (a minimal sketch of this write-ahead ordering follows after this list). Note that while it is possible to do the right thing in a JFS, many JFSs (e.g., IBM's JFS) just ensure metadata consistency and suffer from all the drawbacks discussed above.
Soft updates
allocate the blocks for the new data first, then write the data, then the metadata, and then free the old data. One would have to build a dependence graph of the blocks to be written and write them out according to a topological ordering (a sketch of such an ordering also follows after this list). Note that Soft Updates usually don't perform all changes in the same order as the application does them, and have no committing checkpoints, so the consistency guarantees are not as nice as with good log-structured or journaling file systems.
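
To make the journaling idea concrete, here is a minimal sketch of the write-ahead ordering (made-up file names and record format, ordinary files standing in for the disk, no handling of torn or partial log records): a change first becomes persistent in the log, is then applied to its home location, and a checkpoint discards the log once the home locations are known to be safe; recovery simply replays whatever complete records the log contains.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    struct rec { off_t off; size_t len; char data[64]; };  /* one logged change */

    /* The change is persistent in the log before it touches the home
       location, so a crash in between is harmless: recovery replays it. */
    static int journaled_write(int home, int journal, off_t off,
                               const char *data, size_t len)
    {
        struct rec r = { .off = off, .len = len };
        if (len > sizeof r.data)
            return -1;
        memcpy(r.data, data, len);
        if (write(journal, &r, sizeof r) != (ssize_t)sizeof r || fsync(journal) < 0)
            return -1;                              /* 1. log record is on disk */
        return pwrite(home, data, len, off) == (ssize_t)len ? 0 : -1;  /* 2. home */
    }

    /* Checkpoint: once the home locations are safely on disk, the log so
       far is no longer needed. */
    static int checkpoint(int home, int journal)
    {
        if (fsync(home) < 0)
            return -1;
        return ftruncate(journal, 0);
    }

    /* Recovery after a crash: replay every complete record; reapplying a
       change that already reached its home location does no harm. */
    static void recover(int home, int journal)
    {
        struct rec r;
        while (read(journal, &r, sizeof r) == sizeof r)
            pwrite(home, r.data, r.len, r.off);
    }

    int main(void)
    {
        /* toy image files standing in for the real disk and journal */
        int home = open("home.img", O_RDWR | O_CREAT, 0644);
        int journal = open("journal.img", O_RDWR | O_CREAT, 0644);
        if (home < 0 || journal < 0) { perror("open"); return 1; }
        recover(home, journal);                 /* in case the last run crashed */
        if (journaled_write(home, journal, 0, "new block contents", 18) < 0)
            return 1;
        return checkpoint(home, journal) < 0;
    }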

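And here is a sketch of the ordering idea behind Soft Updates (hypothetical block names; the real mechanism tracks dependences per buffer and can temporarily roll back parts of a block to break cycles, none of which is modeled here): the dirty blocks are flushed in an order consistent with a dependence graph, e.g. for a newly created file the data block must reach the disk before the i-node that points to it, and the i-node before the directory block naming it.

    #include <stdio.h>

    #define NBLOCKS 3
    enum { DATA, INODE, DIRENT };           /* hypothetical dirty blocks for a new file */
    static const char *name[NBLOCKS] = { "data", "i-node", "directory" };

    /* dep[a][b] != 0 means: block a must be on disk before block b is written */
    static int dep[NBLOCKS][NBLOCKS];
    static int on_disk[NBLOCKS];

    /* Flush the dirty blocks in an order consistent with the dependence
       graph (a naive repeated scan standing in for a topological sort). */
    static void flush_all(void)
    {
        for (int done = 0; done < NBLOCKS; ) {
            for (int b = 0; b < NBLOCKS; b++) {
                if (on_disk[b])
                    continue;
                int ready = 1;
                for (int a = 0; a < NBLOCKS; a++)
                    if (dep[a][b] && !on_disk[a])
                        ready = 0;
                if (ready) {                /* all prerequisites are on disk */
                    printf("writing %s block\n", name[b]);   /* disk write here */
                    on_disk[b] = 1;
                    done++;
                }
            }
        }
    }

    int main(void)
    {
        dep[DATA][INODE] = 1;    /* data before the i-node pointing to it */
        dep[INODE][DIRENT] = 1;  /* i-node before the directory entry     */
        flush_all();             /* prints: data, i-node, directory       */
        return 0;
    }
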
Further reading:
Bibliography, mainly about Distributed and Log-structured File Systems
In-order semantics (a desirable file system feature wrt data consistency)
Ideas for a log-structured file system


Anton Ertl