hoakley February 18, 2022 Macs, Technology

How can you trust a disk to write data?

It’s generally true that the closer you get to the hardware, the further you get from reality. Not only can you see that in M1 cores and their performance, but it’s a good rule when it comes to storage.

As soon as I got my first M1 Macs I ran readily available disk benchmark tests on them, to compare the speeds of their internal SSD with that in my iMac Pro. Some of those results suggested ridiculously high transfer rates of nearly 20 GB/s, which appeared to be the result of caching. In other words, the test was being tricked by the SSD into measuring not the rate at which it could write data to the disk, but the rate at which it could transfer the data to its high-speed cache memory, with the write itself occurring some time later, more slowly.

After a lot of experimentation and testing, I came up with my own benchmarking app, Stibium, which has since given reasonably reliable results on a wide range of SSDs.

The problem of caching and disk performance has recently reappeared on M1 Macs, this time in the development of Asahi Linux. Because this runs natively on M1 models and not in a virtual machine, it has to do much of its own driving, for which Hector Martin and his team are delving deep into Apple’s hardware. This led to what appeared to be concerns about trade-offs made between performance and data integrity, but which in the end may come back to that old issue of caches and buffers, their flushing, and disks which don’t exactly tell the truth.

Flushing writes to disk

macOS has two low-level ways of flushing disk writes: fsync() and the F_FULLFSYNC command in fcntl(). The great majority of apps don’t go anywhere near that level, though, and leave it up to macOS to decide what to do. The choice matters, as the two work very differently. fsync() causes all modified data and metadata of a file to be moved from the computer to the disk. However, the disk itself may leave that data in its own buffers for some time before writing it to storage at its own, slower pace. F_FULLFSYNC does the same thing, but then asks the disk to flush all its own buffers and write the data to its storage.
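The difference between the two calls can be sketched in a few lines of Python. This is a minimal illustration rather than anything macOS itself does: the helper names `flush_to_storage` and `write_durably` are hypothetical, and the fallback to plain fsync() on platforms (or file systems) where F_FULLFSYNC isn’t available is an assumption made so the sketch runs anywhere.

```python
import fcntl
import os


def flush_to_storage(fd: int) -> str:
    """Flush a file as far as the platform allows; report which call was used."""
    if hasattr(fcntl, "F_FULLFSYNC"):
        try:
            # macOS: does what fsync() does, then asks the disk to empty
            # its own buffers to permanent storage.
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
            return "F_FULLFSYNC"
        except OSError:
            # Assumption: if the file system rejects the request,
            # fall back to the weaker guarantee.
            pass
    # fsync(): data and metadata are handed to the disk, which may still
    # hold them in its own buffers for a while.
    os.fsync(fd)
    return "fsync"


def write_durably(path: str, payload: bytes) -> str:
    """Write payload to path, then flush it (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write() may accept fewer bytes than offered, so loop.
            written = os.write(fd, view)
            view = view[written:]
        return flush_to_storage(fd)
    finally:
        os.close(fd)
```

Calling `write_durably("example.bin", b"data")` returns the name of the call that was actually used, so on a Mac you would normally expect it to report F_FULLFSYNC.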

When a disk does what the command asks for, F_FULLFSYNC is usually much slower than fsync(), as you would expect. If you were to believe that fsync() worked the same as F_FULLFSYNC, you could be badly misled on the true performance of a disk. Additionally, if a disk were to ignore the request to flush its buffers on F_FULLFSYNC, it would appear significantly faster than another disk which complied. Intriguingly, Apple’s man page for F_FULLFSYNC states that “certain FireWire drives have also been known to ignore the request to flush their buffered data”.
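One rough way to see how much the choice of flush changes an apparent transfer rate is to time the same write three ways: with no flush at all, with fsync(), and with F_FULLFSYNC. The sketch below does that; it isn’t how Stibium or any other benchmark works, the function names are hypothetical, and on platforms without F_FULLFSYNC the “full” case is assumed to fall back to fsync(). Results depend heavily on the disk, the file size and whatever else the system is doing.

```python
import fcntl
import os
import tempfile
import time


def no_flush(fd: int) -> None:
    """Leave the data wherever the OS and disk choose to cache it."""


def full_flush(fd: int) -> None:
    """F_FULLFSYNC where available, otherwise fsync() (an assumption)."""
    if hasattr(fcntl, "F_FULLFSYNC"):
        try:
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
            return
        except OSError:
            pass
    os.fsync(fd)


def time_write(path: str, payload: bytes, flush) -> float:
    """Return the seconds taken to write payload and apply the given flush."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        flush(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare_flushes(size: int = 8 * 1024 * 1024) -> dict:
    """Apparent write rate in MB/s for each flushing strategy."""
    payload = os.urandom(size)
    strategies = (("none", no_flush), ("fsync", os.fsync), ("full", full_flush))
    rates = {}
    with tempfile.TemporaryDirectory() as folder:
        for name, flush in strategies:
            elapsed = time_write(os.path.join(folder, name), payload, flush)
            # Guard against a zero interval on very coarse clocks.
            rates[name] = size / max(elapsed, 1e-9) / 1e6
    return rates


if __name__ == "__main__":
    for name, rate in compare_flushes().items():
        print(f"{name:>5}: {rate:,.0f} MB/s")
```

If a disk honours the request, the “full” figure should be the lowest of the three; a disk that reports much the same rate for all three may be one that isn’t telling the whole truth.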

File system reliability

This isn’t just about performance and marketing of disks, though, as flushing behaviour is of crucial importance to file systems, databases, and reliability, as examined in this paper.

A simple example is what happens if a disk suddenly loses power. If changes to the file system have been flushed to the disk using fsync(), the disk may well still have those in its buffer. APFS uses copy-on-write to ensure that sudden disaster shouldn’t cause a problem in its file system data, by writing the new data before deleting the old. As disk buffers can be emptied out of order, it’s possible that at the moment of power loss the deletion has been performed before the new data has been written from the buffer, or that later new data has been written out of order. As Apple’s documentation for fsync() states: “This is not a theoretical edge case. This scenario is easily reproduced with real world workloads and drive power failures.”
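The same ordering rule, write the new data and make sure it has reached storage before removing the old, can be applied by an app to its own files. The sketch below is an application-level analogy, not how APFS implements copy-on-write: `replace_safely` and `full_flush` are hypothetical names, and the fallback to fsync() where F_FULLFSYNC isn’t available is an assumption.

```python
import fcntl
import os
import tempfile


def full_flush(fd: int) -> None:
    """F_FULLFSYNC where available, otherwise fsync() (an assumption)."""
    if hasattr(fcntl, "F_FULLFSYNC"):
        try:
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
            return
        except OSError:
            pass
    os.fsync(fd)


def replace_safely(path: str, payload: bytes) -> None:
    """Replace the contents of path without ever leaving it half-written."""
    folder = os.path.dirname(os.path.abspath(path))
    # 1. Write the new data to a separate file in the same folder.
    fd, tmp_path = tempfile.mkstemp(dir=folder)
    try:
        try:
            view = memoryview(payload)
            while view:
                written = os.write(fd, view)
                view = view[written:]
            # 2. Make sure the new data is on storage before touching the old.
            full_flush(fd)
        finally:
            os.close(fd)
        # 3. Only now swap the new file in, which discards the old one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # 4. Flush the folder so the rename itself is recorded.
    dir_fd = os.open(folder, os.O_RDONLY)
    try:
        full_flush(dir_fd)
    finally:
        os.close(dir_fd)
```

If the flush in step 2 were only an fsync() and the disk reordered its buffered writes, a power failure could leave the rename recorded but the new data missing, which is the failure described above.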

That’s why macOS has F_FULLFSYNC, and why it’s implemented on the major file systems it supports, HFS+, FAT, UDF and APFS, according to its man page.

Backups

There’s another important piece of information which has immediate relevance to Mac users. If you turn to Apple’s old Time Machine over SMB Specification, you’ll see a whole section about F_FULLFSYNC, which states that “this command is issued by the Time Machine client at periodic checkpoints to ensure that data corruption does not occur in the backup.” Presumably that’s also true when Time Machine makes backups to local storage. If so, it reassures us that Time Machine goes out of its way to prevent corrupted backups, and, given the performance hit of F_FULLFSYNC even on fast local storage, explains why making backups can be slower than you’d wish.
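The idea of flushing at periodic checkpoints can be illustrated with a simple copy loop. This is only a sketch of the general technique: the specification says nothing here about how often Time Machine checkpoints, so the chunk size, the interval and the function names are all hypothetical, and F_FULLFSYNC is assumed to fall back to fsync() where it isn’t available.

```python
import fcntl
import os


def full_flush(fd: int) -> None:
    """F_FULLFSYNC where available, otherwise fsync() (an assumption)."""
    if hasattr(fcntl, "F_FULLFSYNC"):
        try:
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
            return
        except OSError:
            pass
    os.fsync(fd)


def copy_with_checkpoints(src: str, dst: str,
                          chunk_size: int = 1 << 20,
                          checkpoint_every: int = 64) -> int:
    """Copy src to dst, fully flushing every few chunks and once at the end.

    Returns the number of checkpoints performed.
    """
    checkpoints = 0
    chunks = 0
    with open(src, "rb") as source, open(dst, "wb") as target:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            target.write(chunk)
            chunks += 1
            if chunks % checkpoint_every == 0:
                # Empty Python's buffer first, then the OS's and the disk's.
                target.flush()
                full_flush(target.fileno())
                checkpoints += 1
        # Final checkpoint so the tail of the file is also on storage.
        target.flush()
        full_flush(target.fileno())
        checkpoints += 1
    return checkpoints
```

Each checkpoint costs time, so the interval is a trade-off: flushing more often limits how much could be lost or corrupted, while flushing less often makes the copy faster.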

Conclusions

Caching behaviour of disks can be obvious when you’re trying to measure their performance, and can result in deceptively high transfer rates. It becomes most important when you need reliability, so macOS ensures that writes to disk are properly flushed when needed in APFS, HFS+, FAT and UDF. Full flushes also occur periodically when Time Machine is writing its backups, to protect them from corruption.

I’m very grateful to @rosyna, @janl and @evntdrvn for providing all the clues; however, any errors are mine alone.

12 Comments


1

JW on February 18, 2022 at 2:08 pm

Back in the day we had SpeedDoubler; that utility gave us a much better indication of copy-job stats, and also had a verify option. I also liked QuickBench very much, but development of both has been discontinued.

2

Duncan on February 18, 2022 at 3:13 pm

Howard, I know this is a clumsy quirk of trying to match the English language to ever-evolving technology, but I flinch whenever I see the word ‘disk’ applied to SSDs. (And I sometimes do this myself, although I have been conscientiously trying to catch myself in an effort to be more consistent.) Do you think the word ‘disk’ is so entrenched – like the floppy icon for ‘Save…’ in Windows – that we’re stuck with it, or can we eventually leave that anachronism behind?

Sorry to be off-topic but there is a real difference between a spinning hard disk and a solid-state drive, which sometimes impacts the nature of the discussion itself. (I suppose that when the last spinning disk is taken out of service this distinction will become moot and we can call SSDs whatever we want.)

Thank you.

- 3
  
  hoakley on February 18, 2022 at 5:52 pm
  
  Thank you. There is extensive discussion of the use of disk and drive in style guides such as Apple’s (which is well worth reading). It is worth considering carefully that you format and maintain your SSDs in Disk Utility, where they are shown and referred to as disks.
  One formerly useful distinction was that disk referred to the storage medium (even though it didn’t have to consist of any sort of disk or disc), and that drive referred to the complete object with its housing etc. So you could put a disk into a drive, which is exactly what we did with optical media, only then to be more perverse we spelt those as discs!
  The problem becomes more acute when we refer to storage which could be a hard disk or an SSD – such as the boot disk. Apple is quite happy with the use of the term disk as inclusive of all types of storage; I am moving steadily towards using ‘storage’, but there are so many idiomatic uses of disk that just can’t take that.
  Meanwhile, I’ll just use the Startup Storage pane to set the boot storage of my Mac to the storage I have just formatted in Storage Utility!
  Howard.
  
  - 4
    
    Duncan on February 18, 2022 at 7:21 pm
    
    “One formerly useful distinction was that disk referred to the storage medium (even though it didn’t have to consist of any sort of disk or disc), and that drive referred to the complete object with its housing etc.”
    
    And the word ‘drive’ itself is of course a holdover from the electric motor that spun the disk.
    
    “Meanwhile, I’ll just use the Startup Storage pane to set the boot storage of my Mac to the storage I have just formatted in Storage Utility!”
    
    Sigh – we can’t win here. The word ‘storage’ just doesn’t have the one-syllable snap to it as ‘disk’ or ‘drive’, and ‘SSD’ is even harder to pronounce (but easier to type)*. Perhaps, thirty years hence, we can use the word ‘store’ as the unifying short-hand for all this, similar to how ‘app’ has become mainstream.
    
    * It has been remarked more than once that ‘WWW’, as an acronym, is far more cumbersome to pronounce than the words it stands for.
    
    And I remember, back in the early days of the web, one radio announcer (presumably only familiar with Windows) trying to read the then-novel URL for Hewlett Packard as, “H-T-T-P-colon-backslash-backslash-H-P-period-C-O-M”. Fingernails on chalkboard.
    
    - 5
      
      Peter on February 22, 2022 at 12:16 pm
      
      Hi Duncan,
      
      I agree with your points, except for one (somewhat pedantic) detail: the word “holdover” implies that electric motors used to spin disks, but no longer do. This is not the case!
      
      Not used much in consumer devices, granted, but plenty of current-day applications still use electric motors. Tape drives, for instance, are far from dead, and there is a constant stream of upgrades to the LTO tape standard, which still provides the best value for money for data archiving applications. Not to mention, spinning hard drives are still the best $/terabyte option for storing bulk linear data, such as backups or family movie/photo collections, as far as consumers are concerned.
      
      Sure, boot drives and such should always be on solid state storage. But for mass data storage, spinning motor storage of one shape or another is still king as of 2022.
      
    - 6
      
      hoakley on February 22, 2022 at 4:53 pm
      
      Thank you.
      Hard disks were the king in 2012, but many of us have left them long behind us. I haven’t used a hard disk for any significant storage for several years now, and my 12 TB of external storage for my production Mac is entirely SSD, and has been so for several years now.
      I no longer recommend hard disks unless they’re used in RAID arrays, where the cost of a decent system like a Promise Pegasus is not far from a comparable SSD system.
      Howard.
      
7

Simon on February 18, 2022 at 5:23 pm

For very simple disk tests there’s the free Blackmagic Disk Speed Test (on MAS). If you set it to use very large file sizes (I think you can choose as much as 5 GB) you will force the disk to flush its cache.

The only thing I dislike about it is that it cannot test TMA backup disks. I suspect the same is true for other apps like Stibium because of the read-only way Apple has set up APFS disks used by TM.

- 8
  
  hoakley on February 18, 2022 at 5:56 pm
  
  Thank you.
  That was one of the benchmarks which I found came to grief on M1 Macs. If you’d like to read the linked articles, you will see why.
  There is also the very important point that most disk transfers, and often the most time-critical, involve much smaller files. Experience and a lot of testing show that using such large files can give very different results. Hence Stibium, which is actually very quick to use for a vanilla test.
  You can test TM backup storage quite easily: simply add another APFS volume to it, and test into that.
  Howard.
  
  - 9
    
    Simon on February 18, 2022 at 7:19 pm
    
    Thank you, Howard, for the tip on adding an APFS volume for testing purposes.
    
10

Raoul on February 21, 2022 at 8:56 am

with COW and performing checksums of all written blocks…

Checksums are something Apple only apply to metadata blocks and not actual data blocks, and so I’m not overly fond of keeping data (including backups) that I cannot afford to lose on APFS. Am I correct that there’s still no way to verify a TM backup?

Part 2 relating to this article could be to explore when hardware “lies” to the OS… A classic example is cheap USB chipsets that receive the command to flush their cache, lie to the OS that they have done so and in actual fact–haven’t…
This is an issue that pops up over and over in the home NAS community when using USB for ZFS pools… it’s a no no.

- 11
  
  hoakley on February 21, 2022 at 9:07 am
  
  Thank you. Although there is an option to make a backup with a “consistency scan”, that doesn’t appear to check the integrity of the files being backed up. So currently there doesn’t appear to be any way to verify the data in a TM backup.
  I’ll defer part 2 to someone more expert than me. It’s deeply technical, and highly controversial. And very important.
  Howard.
  
12

Michael Tsai - Blog - Apple SSD Benchmarks and F_FULLSYNC on March 9, 2022 at 7:58 pm

[…] (2022-03-09): See also: Howard Oakley, MacRumors, Howard […]

