Card tricks with TextEdit, and saving files safely to SSD

It was once all so simple. To save a file, a document you were working on for example, the app just overwrote the original file with the new data. When everything worked perfectly, that was perfect too. But if anything went wrong, as it often did on slow old hard drives, chances were that both the original version of your document and the new one would be blown away and unusable. Hence the idea of a safe save.

Safe saving overlaps with what’s known as atomic saving, which has nothing whatsoever to do with weapons or quantum computing. What’s atomic is the requirement that, at any moment, the document on disk should either be a complete original, or a complete updated version, and never something betwixt or between.

One straightforward approach to this is to write the changed data out to a temporary file, then replace the original file with the temporary one. For an app to do this without operating system assistance requires quite a lot of work, as the updated file needs to have the same permissions set, the same date of creation, and so on. There’s also a problem with extended attributes: to preserve those, they’d need to be copied from the original file to the temporary one before the original file is replaced.
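As a rough illustration of that temporary-file approach, here is a minimal sketch in Python. The helper name safe_save is hypothetical, and the sketch only carries over the permission bits: the creation date and extended attributes mentioned above are not handled, which is part of the extra work an app would still have to do.

```python
import os
import shutil
import tempfile


def safe_save(path, data):
    """Replace the file at `path` with `data` via a temporary file.

    Hypothetical helper for illustration; it is not an Apple API.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file beside the original, so the final rename
    # stays within a single volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to storage
        if os.path.exists(path):
            # Carry over the permission bits only. Creation date and
            # extended attributes are NOT preserved by this sketch.
            shutil.copymode(path, tmp_path)
        # Rename over the original; on POSIX systems this is atomic.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and tidy up the temporary file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the original is only replaced by the final rename, a failure part-way through writing leaves the original document intact.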

To make this easier, macOS has offered a function call exchangedata(). To use it, an app simply has to write the new document data into a temporary file, then call exchangedata() with the paths to the original and temporary files. macOS then swaps the contents of the files in an atomic fashion, so when you access what was the original file, you’ll get the updated data instead.

The snag with trying to use exchangedata() now is that it’s only supported on two file systems: HFS+, which is all but obsolete apart from storing Time Machine backups, and AFP, Apple’s old network file system, which is now dead and buried. Specifically, exchangedata() doesn’t work on APFS.

APFS instead offers its own form of safe saving: copy on write. When changed data are written out to an existing file, they’re written not to the blocks containing the original data, but to different blocks. Not only that, but APFS only writes out as much new data as it needs to.

Let’s say that our original document is stored in two blocks, and we make changes which affect only the contents of the second.

Instead of APFS writing out new versions of both of those blocks, it only writes the changed block, and the new file is then composed of one new and one old block. Used properly, this can save a lot of disk space and, most importantly on SSDs, minimise the number of erase-write operations needed, which reduces wear on the SSD.
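The two-block example can be mimicked with a toy model. This is purely a sketch of the idea as described here, not of how APFS is implemented; the BlockStore class and the save_copy_on_write function are hypothetical names invented for the illustration.

```python
class BlockStore:
    """Toy append-only block store; a stand-in, not an APFS structure."""

    def __init__(self):
        self.blocks = []  # block contents, indexed by block number
        self.writes = 0   # how many blocks have been written in total

    def write_block(self, data):
        self.blocks.append(data)
        self.writes += 1
        return len(self.blocks) - 1


def save_copy_on_write(store, old_refs, new_blocks):
    """Return block numbers for the new version, reusing unchanged blocks."""
    new_refs = []
    for i, data in enumerate(new_blocks):
        if i < len(old_refs) and store.blocks[old_refs[i]] == data:
            # Unchanged: the new version shares the old block.
            new_refs.append(old_refs[i])
        else:
            # Changed: write to a different block, leaving the old one alone.
            new_refs.append(store.write_block(data))
    return new_refs


store = BlockStore()
original = [store.write_block(b"first block"),
            store.write_block(b"second block")]
updated = save_copy_on_write(
    store, original, [b"first block", b"second block, edited"]
)
print(original, updated, store.writes)  # prints [0, 1] [0, 2] 3
```

Only one extra block is written for the save, and the original version remains readable through its own pair of block numbers.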

Unfortunately, support for these new APFS features isn’t always as good as it should be. If you look at the way that many apps currently (in Mojave) save files, they don’t appear to use APFS to its best. One way to test this is to save a simple document using TextEdit, which will turn into a card trick using files.

Open TextEdit, and create a new document with a single line of text such as “This is the first file.” Then save it with a distinctive name. In Terminal, create a hard link to that file, e.g. using the command
ln FirstFile.rtf FirstFileHardLink.rtf
When you now select either of those in the Finder, you’ll see a QuickLook preview showing that single line of text.

In TextEdit, add a second line such as “This is the second file”, then save that modified document using Command-S to write it to that same file.

If TextEdit were to use APFS copy on write, it would now write out the changed data, which APFS would store in a new block before updating its metadata to point that file’s records at the newly written data. If that were to happen, the saved file would keep the same inode number as it had before the save, and the hard link would point to the changed file.

What actually happens is that the saved file has a new inode number, which is greater than that of the hard link, as that link still points to the original file with its original inode. TextEdit thus writes out the changed data to a new file and deletes the original. In this case, because you made a hard link to that original, it can’t be deleted until that link is deleted.
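The same inode behaviour can be reproduced outside TextEdit. The following Python sketch assumes plain text files in a scratch folder rather than real RTF documents, and imitates the save by writing a new file and renaming it over the original; it doesn’t claim to be what TextEdit does internally.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
doc = os.path.join(workdir, "FirstFile.txt")
link = os.path.join(workdir, "FirstFileHardLink.txt")

with open(doc, "w") as f:
    f.write("This is the first file.\n")
os.link(doc, link)  # same effect as the ln command in Terminal
inode_before = os.stat(doc).st_ino

# Imitate the save: write the changed document to a new file, then
# rename that new file over the original name.
tmp = doc + ".tmp"
with open(tmp, "w") as f:
    f.write("This is the first file.\nThis is the second file.\n")
os.replace(tmp, doc)

inode_after = os.stat(doc).st_ino
print("document inode changed:", inode_before != inode_after)
print("hard link keeps old inode:", os.stat(link).st_ino == inode_before)
with open(link) as f:
    print("hard link content:", f.read().strip())
```

The hard link goes on showing the single-line original, because the rename only changed which inode the document’s name refers to.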

But there’s another card trick still to come.

With the updated (two-line) version of your document still open in TextEdit, use the Browse All Versions command in the Revert To submenu of the File menu. You should now see another quite separate copy of the original version of your document, this time stored in the macOS version system. Verify that it’s different from the original file by opening the hard link in TextEdit, adding a different second line to it and saving that too.

You should now have two different files with a common version history, stemming back to your original single-line file. How’s that for sleight of file?

What actually happens, then, when you save a changed document in an app like TextEdit is more complex, and runs something like this:

The changed document is saved to a new file with a temporary name.
The changed document is also saved to the macOS version system.
Most metadata are copied from the original to the new file.
The original file is deleted (so long as there aren’t any hard links to it).
The changed document is renamed so that it effectively replaces the original, except for its inode.

You can check these in detail using my free utilities Precize and Revisionist, if you wish.

This is clearly both atomic and safe, particularly with the added copy in the version system. But how does this compare with older techniques? That depends on their implementation. If exchangedata() didn’t actually overwrite anything, but worked like APFS copy on write, then it would require a minimum of one erase-write for the temporary file, plus changes to the file system metadata, and leave the deleted original file requiring erasure before it could be re-used.

If exchangedata() instead actually overwrote the original file, that would add an erase-write and leave the deleted temporary file requiring erasure. By my reckoning, that’s the same cost as what actually happens, except that what actually happens brings the bonus of having a copy saved in the version system. If I have read it right, not using the features of APFS doesn’t actually lose you much. But I’m open to suggestions.

1 Comment

Joe on October 25, 2019 at 11:00 pm

Your explanation is actually factually incorrect. You seem to be confused as to how APFS actually works. File data is not COW by default. You must either have already taken a snapshot or have a clone for COW to kick in. Additionally, when applications perform safe saves, the safe save can clone the file. Cloning the file does create a new filesystem entity, so it will have a new inode number. However, the clone does reference the underlying block of storage. The changes can then be made to the clone file, thus causing COW to kick in for any changes. Then the clone and the original can be atomically swapped (this can be done atomically on APFS for BOTH files AND directories – directories are something that is not a standard atomic swap in UNIX, that is the new non-portable rename API that APFS implemented). Once the swap has been performed, the original file can be deleted and any no longer referenced blocks can be freed.

You can easily see this behavior using fs_usage.
