How can you trust a disk to write data?

It’s generally true that the closer you get to the hardware, the further you get from reality. You can see that in M1 cores and their performance, and it’s a good rule when it comes to storage too.

As soon as I got my first M1 Macs I ran readily available disk benchmarks on them, to compare the speeds of their internal SSDs with that in my iMac Pro. Some of those results suggested ridiculously high transfer rates of nearly 20 GB/s, which appeared to be the result of caching. In other words, the SSD was tricking the test into measuring not the rate at which data could be written to storage, but the rate at which it could be transferred to the SSD’s high-speed cache memory, with the write itself taking place more slowly some time later.

After a lot of experimentation and testing, I came up with my own benchmarking app, Stibium, which has since given reasonably reliable results on a wide range of SSDs.

The problem of caching and disk performance has recently reappeared on M1 Macs, this time in the development of Asahi Linux. Because this runs natively on M1 models and not in a virtual machine, it has to do much of its own driving, for which Hector Martin and his team are delving deep into Apple’s hardware. This led to what appeared to be concerns about trade-offs made between performance and data integrity, but which in the end may come back to that old issue of caches and buffers, their flushing, and disks which don’t exactly tell the truth.

Flushing writes to disk

macOS has two low-level ways of flushing disk writes: fsync() and the F_FULLFSYNC command in fcntl(). The great majority of apps don’t go anywhere near that level, though, and leave it up to macOS to decide what to do. The choice matters, as the two work very differently. fsync() causes all modified data and metadata of a file to be flushed from the computer to the disk. However, the disk itself may leave that data in its own buffers for some time before writing it to storage at its own slower pace. F_FULLFSYNC does the same thing, but then asks the disk to flush all its own buffers and write the data to its storage.
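For those who do want to work at that level, here’s a minimal sketch in Python of the difference between the two calls. The function name full_flush is my own invention for illustration. It assumes that Python’s fcntl module exposes F_FULLFSYNC, as it does on macOS, and falls back to plain fsync() on other systems or on file systems which reject the command.

```python
import fcntl
import os


def full_flush(fd: int) -> str:
    """Flush a file descriptor as far towards storage as the platform allows.

    Hypothetical helper: returns the name of the mechanism that was used.
    """
    op = getattr(fcntl, "F_FULLFSYNC", None)  # only defined on macOS
    if op is not None:
        try:
            # Same as fsync(), then asks the disk to empty its own buffers.
            fcntl.fcntl(fd, op)
            return "F_FULLFSYNC"
        except OSError:
            pass  # assumption: this file system doesn't support the command
    # Flushes from the computer to the disk, but not the disk's own buffers.
    os.fsync(fd)
    return "fsync"
```

Even when this reports F_FULLFSYNC, it can only tell you that the request was made, not that the disk honoured it.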

When a disk does what the command asks for, F_FULLFSYNC is usually much slower than fsync(), as you would expect. If you were to believe that fsync() worked the same as F_FULLFSYNC, you could be badly misled on the true performance of a disk. Additionally, if a disk were to ignore the request to flush its buffers on F_FULLFSYNC, it would appear significantly faster than another disk which complied. Intriguingly, Apple’s man page for fcntl() states of F_FULLFSYNC that “certain FireWire drives have also been known to ignore the request to flush their buffered data”.
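To see how much the choice of flush can change an apparent transfer rate, here’s a rough sketch which times the same writes with no flush, with fsync(), and with a full flush. This isn’t how Stibium or any other benchmark works; the function names, block size and count are all my own assumptions, and on systems without F_FULLFSYNC the full flush simply falls back to fsync().

```python
import fcntl
import os
import tempfile
import time


def full_flush(fd: int) -> None:
    """Use F_FULLFSYNC where available (macOS), otherwise plain fsync()."""
    op = getattr(fcntl, "F_FULLFSYNC", None)
    if op is not None:
        try:
            fcntl.fcntl(fd, op)
            return
        except OSError:
            pass  # assumption: unsupported on this file system
    os.fsync(fd)


def timed_write(flush, block_size: int, count: int) -> float:
    """Write count blocks to a temporary file and return the rate in MB/s."""
    data = os.urandom(block_size)
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(count):
            f.write(data)
        f.flush()          # empty Python's own buffer into the OS
        flush(f.fileno())  # then flush as far as the chosen call goes
        elapsed = max(time.perf_counter() - start, 1e-9)
    return block_size * count / elapsed / 1e6


def compare_flushes(block_size: int = 1 << 20, count: int = 64) -> dict:
    """Return apparent write rates for each flushing strategy."""
    strategies = {
        "no flush": lambda fd: None,
        "fsync": os.fsync,
        "full flush": full_flush,
    }
    return {name: timed_write(fn, block_size, count)
            for name, fn in strategies.items()}


if __name__ == "__main__":
    for name, rate in compare_flushes().items():
        print(f"{name}: {rate:.0f} MB/s")
```

The figures will vary widely between disks and runs, and a disk which ignores the request to flush its buffers will show little difference between the last two.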

File system reliability

This isn’t just about performance and marketing of disks, though, as flushing behaviour is of crucial importance to file systems, databases, and reliability, as examined in this paper.

A simple example is what happens if a disk suddenly loses power. If changes to the file system have been flushed to the disk using fsync(), the disk may well still have those in its buffer. APFS uses copy-on-write to ensure that sudden disaster shouldn’t cause a problem in its file system data, by writing the new data before deleting the old. As disk buffers can be emptied out of order, it’s possible that at the moment of power loss the deletion has been performed before the new data has been written from the buffer, or that later new data has been written out of order. As Apple’s documentation for fsync() states: “This is not a theoretical edge case. This scenario is easily reproduced with real world workloads and drive power failures.”

That’s why macOS has F_FULLFSYNC, and why it’s implemented on the major file systems it supports, HFS+, FAT, UDF and APFS, according to its man page.

Backups

There’s another important piece of information which has immediate relevance to Mac users. If you turn to Apple’s old Time Machine over SMB Specification, you’ll see a whole section about F_FULLFSYNC, which states that “this command is issued by the Time Machine client at periodic checkpoints to ensure that data corruption does not occur in the backup.” Presumably that’s also true when Time Machine makes backups to local storage. If so, it reassures us that Time Machine goes out of its way to prevent corrupted backups, and, given the performance hit of F_FULLFSYNC even on fast local storage, explains why making backups can be slower than you’d wish.
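The idea of periodic checkpoints can be sketched as a file copy which issues a full flush every so many chunks, and once more at the end. This is purely illustrative and makes no claim about how Time Machine is implemented; copy_with_checkpoints, its chunk size and its checkpoint interval are all hypothetical, and the full flush falls back to fsync() where F_FULLFSYNC isn’t available.

```python
import fcntl
import os


def full_flush(fd: int) -> None:
    """Use F_FULLFSYNC where available (macOS), otherwise plain fsync()."""
    op = getattr(fcntl, "F_FULLFSYNC", None)
    if op is not None:
        try:
            fcntl.fcntl(fd, op)
            return
        except OSError:
            pass  # assumption: unsupported on this file system
    os.fsync(fd)


def copy_with_checkpoints(src: str, dst: str,
                          chunk_size: int = 1 << 20,
                          every: int = 64) -> int:
    """Copy src to dst, fully flushing every `every` chunks and at the end.

    Returns the number of checkpoints issued.
    """
    chunks = 0
    checkpoints = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            chunks += 1
            if chunks % every == 0:
                fout.flush()                # Python buffer to the OS
                full_flush(fout.fileno())   # OS and disk buffers to storage
                checkpoints += 1
        # Final checkpoint so the tail of the file is flushed too.
        fout.flush()
        full_flush(fout.fileno())
        checkpoints += 1
    return checkpoints
```

The more frequent the checkpoints, the less data is at risk at any moment, and the slower the copy becomes.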

Conclusions

Caching behaviour of disks can be obvious when you’re trying to measure their performance, and can result in deceptively high transfer rates. It becomes most important when you need reliability, so macOS ensures that writes to disk are properly flushed when needed in APFS, HFS+, FAT and UDF. Those full flushes also occur periodically when Time Machine is writing its backups, to protect them from corruption.

I’m very grateful to @rosyna, @janl and @evntdrvn for providing all the clues; however, any errors are mine alone.