File system crash consistency and performance
Crash consistency
POSIX guarantees a certain amount of consistency of file operations
during normal operation: E.g., consider an editor editing file F;
after a while, the user saves the file, and the editor implements this
request by writing the new contents to a new file, say #F#, and then
renaming #F# to F.  Another process opening F will get either the old
file or the new file.
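To make the example concrete, here is a minimal sketch of this
save-by-rename pattern in C (the file names and the simplistic error
handling are illustrative only):

    #include <stdio.h>
    #include <stdlib.h>

    /* Save new contents under the name F: write them to #F#, then rename
       #F# to F; rename() replaces F atomically, so during normal operation
       another process opening F sees either the complete old contents or
       the complete new contents. */
    static void save_file(const char *contents)
    {
      FILE *f = fopen("#F#", "w");
      if (f == NULL || fputs(contents, f) == EOF || fclose(f) == EOF)
        { perror("#F#"); exit(1); }
      if (rename("#F#", "F") != 0)
        { perror("rename"); exit(1); }
    }

    int main(void)
    {
      save_file("new file contents\n");
      return 0;
    }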
POSIX does not make any useful guarantees in case of a crash (i.e.,
OS crash, power outage or the like), but what would we like file
systems to do?  A naive approach would be to perform all writes to
persistent memory synchronously (i.e., wait for them to complete), and
perform operations such as renames in a way that we see no
inconsistent state.  In case of a crash we would see the same file
system state that was current (and visible to other processes) right
before the crash.
While this approach may perform acceptably for editing, workloads
  that write a lot of data (e.g., untarring an archive) would be
  slowed down considerably.  So file systems usually compromise on
  consistency in favour of efficiency.  How should they do that?
  There are at least two positions:
  
    - Implementation-oriented
- Following this position, the application should tell the file
      system through the fsync() system call when it wants a specific
      file to be persistent.  This position is taken by, e.g., Ted
      Ts'o, implementor of the ext4 file system.  In our example, this
      would mean an fsync() of #F# right after writing its contents,
      maybe an fsync() of the directory containing #F# after that, and
      an fsync() of the directory containing F after the rename (see
      the sketch after this list).  If the application performs enough
      fsyncs (and it is not obvious how many are enough; I am not
      aware of a guide for achieving consistency using fsync), the
      data will be consistent; if not, there may be data loss, and the
      file system developer will blame the application and deny any
      responsibility.
- Semantics-oriented
- The file system guarantees that the persistent state
      represents one of the states visible in normal operation before
      the crash; i.e., the user may lose some updates that other
      processes could see before the crash, but if the application
      ensured consistency in normal operation, the persistent state
      will be a consistent state.  In our example F would contain
      either the old or the new state, no fsyncs required.  However,
      you probably want to sync before reporting the completion of a
      transaction (say, confirming a purchase) to a remote user.  My
      position is that file systems should give this guarantee.
      Unfortunately, the last time I looked, NILFS2 was the only Linux
      file system that gave this consistency guarantee.  The advocates
      of the implementation-oriented approach denigrate this guarantee
      as O_PONIES, because they want users to think that it is
      unrealistic.
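For comparison with the save-by-rename sketch above, the following
sketch adds the fsyncs suggested by the implementation-oriented
position; it assumes that #F# and F live in the current directory,
and which of these fsyncs are actually necessary or sufficient
depends on the file system:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* fsync() the file or directory named by path. */
    static void fsync_path(const char *path)
    {
      int fd = open(path, O_RDONLY);
      if (fd < 0 || fsync(fd) != 0 || close(fd) != 0)
        { perror(path); exit(1); }
    }

    static void save_file_with_fsyncs(const char *contents)
    {
      int fd = open("#F#", O_WRONLY|O_CREAT|O_TRUNC, 0666);
      if (fd < 0)
        { perror("#F#"); exit(1); }
      /* a full implementation would loop on short writes */
      if (write(fd, contents, strlen(contents)) < 0)
        { perror("write"); exit(1); }
      if (fsync(fd) != 0)            /* make the data of #F# persistent */
        { perror("fsync"); exit(1); }
      if (close(fd) != 0)
        { perror("close"); exit(1); }
      fsync_path(".");               /* the directory containing #F# */
      if (rename("#F#", "F") != 0)
        { perror("rename"); exit(1); }
      fsync_path(".");               /* the directory containing F, after the rename */
    }

    int main(void)
    {
      save_file_with_fsyncs("new file contents\n");
      return 0;
    }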
One advantage of the semantics-oriented approach is that an
  application that is debugged for consistency in case of ordinary
  operation is automatically consistent (albeit not necessarily
  up-to-date) in case of a crash.
Performance
A frequent argument for the implementation-oriented approach is that
it provides better performance.  It may appear that way if you
benchmark an application with a given number of fsyncs on both kinds
of file systems, but such a comparison results in different crash
consistency guarantees on the two systems.
In particular, if the application does not perform any fsyncs, the
  implementation-oriented file system does not guarantee any crash
  consistency, while the semantics-oriented file system performs as
  consistently as ever.
If the application performs the fsyncs necessary for consistency on
  the implementation-oriented file system, the semantics-oriented file
  system guarantees that the persistent state represents the logical
  state of the file system at some point in time at or after the
  fsync, while the implementation-oriented file system does not give
  such a guarantee.
But what if we compare variants of the application tuned for the
  requirements of the specific file system?  I.e., no fsyncs or syncs
  for a semantics-oriented file system if we can live with losing the
  progress of a certain number of seconds (e.g., when using an editor
  with an autosave file, getting the old file and the autosave file
  may be almost as good as getting an up-to-date file), but the full
  complement of fsyncs for an implementation-oriented file system (an
  application could have a flag for enabling or disabling these
  fsyncs).
  - Implementation-oriented file system
- The application is going to perform a lot of fsyncs, at least
    as many as necessary for the consistency level that some user of
    the application may require, so possibly many more than needed
    for the user at hand.  And every fsync is going to wait until
    the data has hit persistent memory and the device has reported
    success.  This may be quite slow.
- Semantics-oriented file systems
- For now, let's assume that the application does not need
    synchronicity, so it does not perform fsyncs.  In that case, the
    file system can perform as few synchronous block device flushes
    (or, theoretically, asynchronous barriers) as desired, choosing
    the flush frequency to bound the maximum amount of work lost
    (e.g., one synchronous flush every 5s to guarantee that no more
    than 10s of work are lost): First request writes of nearly all
    the data and metadata to free blocks, then flush, then request a
    write of the commit block, which makes that data and metadata
    visible in the file system; the commit block will become
    persistent with the next flush at the latest, and will then make
    all the data part of the persistent file system (a sketch of this
    commit sequence follows this list).  This assumes that all writes
    (except maybe the commit block) go to free blocks, as happens
    with a copy-on-write file system, a log-structured file system,
    or a file system with sufficient journaling; there is some cost
    associated with this, but I expect that it is relatively
    moderate.
    Even if you need to sync (e.g., before you confirm a purchase
    to a remote customer), this will be much rarer and cost less than
    the many fsyncs needed for satisfying implementation-oriented file
    systems.  And it's clear when it is necessary to perform these
    syncs, because they come from application needs.
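Here is a rough user-space sketch of the commit sequence described in
the last item above; it uses an ordinary file as a stand-in for the
block device and fdatasync() as the flush, and the block numbers and
commit-block contents are purely illustrative, not those of any real
file system:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define BLOCK_SIZE 4096

    /* Write one block at the given block number. */
    static void write_block(int dev, off_t blockno, const void *buf)
    {
      if (pwrite(dev, buf, BLOCK_SIZE, blockno * BLOCK_SIZE) != BLOCK_SIZE)
        { perror("pwrite"); exit(1); }
    }

    int main(void)
    {
      char data[BLOCK_SIZE]   = "new data and metadata";
      char commit[BLOCK_SIZE] = "commit block: the new version lives in block 10";
      int dev = open("fake-device", O_RDWR|O_CREAT, 0666);

      if (dev < 0)
        { perror("fake-device"); exit(1); }
      /* 1: write the new data and metadata to free blocks */
      write_block(dev, 10, data);
      /* 2: flush, so that the data is persistent before anything refers to it */
      if (fdatasync(dev) != 0)
        { perror("fdatasync"); exit(1); }
      /* 3: write the commit block, which makes the new data and metadata
         visible in the file system; it becomes persistent with the next
         flush at the latest.  A crash before that loses the update, but
         leaves the previous consistent state intact. */
      write_block(dev, 0, commit);
      close(dev);
      return 0;
    }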
   
However, unlike running a given benchmark, my expectations are hard to
confirm or disprove, even for specific cases, and much more so in the
general case: You never know whether an application has enough fsyncs
for implementation-oriented file systems (it also depends on the use
case of the application), unless you throw in an fsync after every
change to a file or directory; and in the latter case the performance
is going to suffer a lot, and the advocates of
implementation-oriented file systems are going to complain that you
used too many fsyncs.
Given these issues with implementation-oriented file systems, do
  you really want to use one if you care about crash consistency?
In any case, keep in mind that running a given benchmark on the
  two kinds of file systems usually does not result in the same crash
  consistency guarantees on both, and therefore the performance
  numbers may be misleading.
Anton Ertl