How to prevent errors on SSDs

Following work by Hector Martin and the Asahi Linux team on the internal SSDs in M1 Macs, there has been intense discussion over the reliability of storage media. At the heart is the problem of ensuring that pending writes are flushed to non-volatile storage in the event of sudden failure. Put more succinctly, if your SSD suddenly loses power, could it be left with serious errors as a result?

There are two main types of write error you need to guard against: file system and file data.

Those in the file system are the more serious, as they can render the whole storage inaccessible, or leave it needing repair with the potential to lose substantial file data. This was partly mitigated in HFS+ by journalling, but as we’ve all experienced, that can still leave damage to both file systems and data. APFS doesn’t use journalling, but has defences including copy-on-write which should at least ensure that the file system itself remains intact. That relies on changes to the file system being performed in order, and thus on macOS controlling the flushing of pending writes to disk.

Applications are responsible for flushing their pending writes to disk, either using the F_FULLFSYNC command in fcntl(), which is very slow, or F_BARRIERFSYNC, which has a far smaller performance penalty but relies on support by the storage. Either way, if an SSD doesn’t obey the instructions and lies to macOS, there’s a real risk of data loss and file system damage.
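As a rough illustration of what that flushing looks like from an app’s side, here’s a minimal Python sketch. The function name durable_write and the file name are hypothetical, chosen only for this example. It requests F_FULLFSYNC where the platform’s fcntl module offers it, and otherwise falls back to plain fsync(), which on macOS doesn’t by itself ask the drive to empty its own cache. F_BARRIERFSYNC is left out because Python’s fcntl module may not expose it.

```python
import fcntl
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path, then ask the OS to flush them to stable storage.

    Returns the name of the flush method that was used.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # os.write() may write fewer bytes than asked, so loop until done.
        offset = 0
        while offset < len(data):
            offset += os.write(fd, data[offset:])

        # F_FULLFSYNC only exists on some platforms (macOS among them).
        full_sync = getattr(fcntl, "F_FULLFSYNC", None)
        if full_sync is not None:
            try:
                fcntl.fcntl(fd, full_sync)
                return "F_FULLFSYNC"
            except OSError:
                # The file system or device refused it; fall back below.
                pass

        os.fsync(fd)
        return "fsync"
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical usage: write a small file into a temporary folder.
    folder = tempfile.mkdtemp()
    method = durable_write(os.path.join(folder, "example.dat"), b"important data")
    print("flushed using", method)
```

Even when the call succeeds, it only passes the request down to the storage; whether the SSD honours it is exactly what’s in question here.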

The man page for fcntl() states that “Apple SSDs are guaranteed to provide” the hardware support for these safeguards, but it’s also known that some external SSDs don’t, even some more expensive NVMe models. Unfortunately, this isn’t a feature normally tested during review of SSDs, so users aren’t aware of which brands and models could be vulnerable to error in the event of sudden failure.

One answer can be selecting higher-specification SSDs, such as those branded for enterprise or datacentre use, which incorporate capacitors or batteries to ensure they can flush to non-volatile storage when power is lost or a similar disaster strikes, a feature sometimes termed PLP (power loss protection). However, those are significantly more expensive, further increasing their cost differential against hard disks. For example, a Samsung 860 QVO 1 TB SATA costs around £115, and its PRO version £370, more than three times as much.

A more practical solution for most users should be an uninterruptible power supply (UPS), provided that both the Mac and all its external storage are connected to its battery-backed output, and the UPS is connected via USB (or networked with SNMP) to the Mac. You then need to configure the Energy Saver pane to shut your Mac down well before the UPS runs out of battery. To do that, connect the UPS, select the UPS item in the pane, and click on the Shutdown Options… button.

There is a catch here, though, when your Mac is a notebook: macOS doesn’t offer UPS shutdown options for notebooks, as they have their own internal batteries. If your external storage is powered by that notebook, that isn’t a problem, as that storage won’t suffer any interruption to its power supply should mains/AC power be lost.

But what if your MacBook Pro (or other notebook) is connected to external storage which has its own independent power supply? If mains/AC power were lost, the notebook would continue running, but the storage would then suddenly lose power, and risk serious error. But if you connect the storage to a UPS, there’s normally no means for the UPS to signal the storage that it’s now running on battery and needs to shut down. Unless Apple were to change its policy on UPS support on notebooks, there’s no solution.

This also applies to any NAS system, which poses a different problem. Could you attach two systems, your Mac and the NAS, to the same UPS and configure them both to shut down in the event of mains/AC power failure? That is possible, but not using the regular USB connection; for that to work, the UPS, Mac and NAS would all have to employ SNMP over the network, something more familiar in enterprise systems. While I’m sure that’s possible, it isn’t easy, and the best solution is normally to provide a separate UPS for the Mac and NAS. Most NAS systems are designed to work well with that arrangement.

Compared to the extra cost of enterprise/datacentre SSDs, one or two UPSs seem quite a bargain.

I’m grateful to @rosyna, @xenadu02 and others for their illuminating work and discussion.