Inside the file system: 2 HFS+ volumes

The first article in this series of three explained the basic division of storage into partitions, with a GUID partition table. Each partition then can have its own file system, of which the traditional one for Macs is Macintosh Extended, generally known as HFS+ as it is the successor to the original Mac file system HFS (Hierarchical File System).

The order of data in the GPT is fixed. That of an HFS+ file system within a partition is much more flexible: there’s some reserved space at the start of the partition, following which is the Volume Header. After that, the file system metadata and data can be anywhere within the partition. At the end of the partition is the Alternate Volume Header, and another area of reserved space.

The Volume Header contains information about the whole partition, which makes it an HFS+ volume; only one HFS+ volume can go into each partition. This explains why, as far as HFS+ is concerned, the terms partition and volume mean essentially the same thing. Because of the importance of the Volume Header information, an identical copy is kept at the end of the partition, where it’s known as the Alternate Volume Header – just as with GPT Headers for the whole disk.
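As a rough illustration of that layout, here is a minimal Python sketch which works out where the two headers sit in a partition of a given size. The figures of 1024 bytes of reserved space at the start, and an Alternate Volume Header beginning 1024 bytes before the end, are taken from TN1150 rather than from this article, and the function name is purely illustrative.

```python
# Layout constants as given in Apple's TN1150 (not stated in this article).
RESERVED_AT_START = 1024      # reserved bytes before the Volume Header
ALTERNATE_FROM_END = 1024     # Alternate Volume Header starts this far before the end


def header_offsets(volume_size: int) -> tuple[int, int]:
    """Return byte offsets of (Volume Header, Alternate Volume Header).

    volume_size is the size of the partition in bytes.
    """
    if volume_size < RESERVED_AT_START + ALTERNATE_FROM_END:
        raise ValueError("partition is too small to hold both headers")
    primary = RESERVED_AT_START
    alternate = volume_size - ALTERNATE_FROM_END
    return primary, alternate
```

For a 1 GiB partition, header_offsets(1024 ** 3) returns (1024, 1073740800); everything between those two headers is available for file system metadata and data.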

Data contained in the Volume Header include attributes for the whole volume, the location of any journal (if that volume uses journalling), its date of creation, and so on. There are also some summary figures for the whole volume, such as the total number of files and folders on it, and the number of unused allocation blocks (which tells you the amount of free space). It also holds the location and size of each of the other ‘files’ containing file system information, such as the catalog file.
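To make those fields more concrete, the following Python sketch decodes the first part of a Volume Header from raw bytes. The field names, their order, the big-endian layout, the ‘H+’ and ‘HX’ signatures and the journalled attribute bit are as I understand them from TN1150; the function name and the derived keys (journaled, created, freeBytes) are my own. Treat it as a sketch for exploring a header you have already read into memory, not a complete parser.

```python
import struct
from datetime import datetime, timedelta

# First 52 bytes of the Volume Header, big-endian, following TN1150.
_HEADER_FORMAT = ">2sH12I"
_FIELDS = (
    "signature", "version", "attributes", "lastMountedVersion",
    "journalInfoBlock", "createDate", "modifyDate", "backupDate",
    "checkedDate", "fileCount", "folderCount", "blockSize",
    "totalBlocks", "freeBlocks",
)
HFS_EPOCH = datetime(1904, 1, 1)   # HFS+ dates count seconds from this moment
JOURNALED_BIT = 13                 # kHFSVolumeJournaledBit in TN1150


def parse_volume_header(raw: bytes) -> dict:
    """Decode the leading fields of an HFS+ Volume Header."""
    size = struct.calcsize(_HEADER_FORMAT)
    header = dict(zip(_FIELDS, struct.unpack(_HEADER_FORMAT, raw[:size])))
    if header["signature"] not in (b"H+", b"HX"):   # HFS+ or HFSX
        raise ValueError("not an HFS+ or HFSX Volume Header")
    # Derived values (illustrative names, not part of the on-disk format).
    header["journaled"] = bool(header["attributes"] & (1 << JOURNALED_BIT))
    header["created"] = HFS_EPOCH + timedelta(seconds=header["createDate"])
    header["freeBytes"] = header["freeBlocks"] * header["blockSize"]
    return header
```

The free space calculation is simply the number of unused allocation blocks multiplied by the allocation block size, which is why the header needs to record both.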

The other file system metadata is gathered into ‘files’:

  • Allocation file, a bitmap (‘volume bitmap’) for the whole partition recording whether each allocation block is currently allocated or not.
  • Catalog file, a B-tree containing information about the hierarchical structure of files and folders.
  • Extents Overflow file, a B-tree containing supplementary information about the location of storage allocated to files.
  • Attributes file, a B-tree which functions similarly to the Catalog file, but handles extended attributes (xattrs) and other attributes.
  • Startup file, intended for non-Mac systems without HFS+ support.
  • The Journal, containing incomplete transactions on the file system.

These can be located anywhere within disk space, between the Volume Header at the start and the Alternate Volume Header at the end, although in more recent implementations of HFS+ they are normally kept in the Metadata Zone (see below).
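The Allocation file is the simplest of these to picture. This Python sketch tests whether a given allocation block is marked as in use, assuming the bit order described in TN1150, where the most significant bit of each byte represents the lowest-numbered block. The function names are illustrative only.

```python
def is_allocated(bitmap: bytes, block: int) -> bool:
    """Report whether an allocation block is marked as in use.

    Assumes one bit per allocation block, with the most significant bit
    of each byte standing for the lowest-numbered block (per TN1150).
    """
    byte = bitmap[block // 8]
    mask = 0x80 >> (block % 8)
    return bool(byte & mask)


def count_free(bitmap: bytes, total_blocks: int) -> int:
    """Count blocks whose bit is clear, i.e. unallocated blocks."""
    return sum(1 for block in range(total_blocks)
               if not is_allocated(bitmap, block))
```

On a healthy volume, a count made this way should agree with the number of unused allocation blocks recorded in the Volume Header.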

B-trees are a well-known way of structuring hierarchical data in computing, and are popular for their fast performance. Each consists of fixed-size nodes, containing records which consist of a key and some data. These are used to map a key to its data. Each B-tree has a single header node, which contains information to find any other node in the tree. The remaining nodes hold map, pointer and data records, and are known respectively as Map, Index and Leaf nodes, which gives a clue as to their complex structure. Their use in HFS+ is detailed in Apple’s TN1150.
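Every node in an HFS+ B-tree starts with a short descriptor which says what kind of node it is. This Python sketch decodes that descriptor, using the 14-byte big-endian layout and kind values as I read them in TN1150; the dictionary keys and function name are my own.

```python
import struct

# Node kinds from TN1150's node descriptor.
NODE_KINDS = {-1: "leaf", 0: "index", 1: "header", 2: "map"}


def parse_node_descriptor(node: bytes) -> dict:
    """Decode the 14-byte descriptor at the start of a B-tree node."""
    f_link, b_link, kind, height, num_records, _reserved = struct.unpack(
        ">IIbBHH", node[:14]
    )
    return {
        "nextNode": f_link,        # forward link to the next node of this kind
        "previousNode": b_link,    # backward link to the previous node
        "kind": NODE_KINDS.get(kind, "unknown"),
        "height": height,          # level of this node within the tree
        "numRecords": num_records,
    }
```

Reading the first node of the Catalog file in this way should report a header node, from which the rest of the tree can be found.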

Each file consists of a series of one or more extents (contiguous allocation blocks) which contain that file’s data. The locations of the first eight of these extents are stored in the Catalog file; those files which require more extent information, because the data is spread across more than eight extents, have those additional extents stored in the Extents Overflow file. When the storage is fully defragmented, so that every file occupies just one extent, the Extents Overflow file should be empty. The more fragmented the files are, the larger the Extents Overflow file becomes.
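Here is a small Python sketch of how extents work in practice. It models each extent as a (start block, block count) pair, splits a file’s extents between the Catalog file and the Extents Overflow file, and finds which allocation block holds a given byte of the file. The function names and the tuple representation are assumptions made for illustration, not the on-disk format.

```python
INLINE_EXTENTS = 8   # extents recorded in the Catalog file itself


def split_extents(extents):
    """Split a file's extents into (catalog, overflow) lists."""
    return extents[:INLINE_EXTENTS], extents[INLINE_EXTENTS:]


def block_for_offset(extents, block_size, offset):
    """Return the allocation block on the volume holding a byte of the file.

    extents is a list of (start_block, block_count) pairs in file order.
    """
    remaining = offset // block_size   # which block of the file we want
    for start_block, block_count in extents:
        if remaining < block_count:
            return start_block + remaining
        remaining -= block_count
    raise ValueError("offset lies beyond the file's extents")
```

A file with ten extents would therefore keep eight in the Catalog file and two in the Extents Overflow file, while a file in a single extent never needs the latter at all.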

The Extents Overflow file also contains a listing of any bad blocks that have been marked out within that partition of the disk.

The Journal is a proven tool for reducing the chances that any interruption to coordinated changes to the file system will result in damage. In HFS+, many basic operations such as creating a new, empty folder can require a series of changes to be made. If a process is interrupted, perhaps by a crash or loss of power, then the incomplete series can leave the file system in an inconsistent state. At worst, the damage can be serious enough to cause complete data loss.

The Journal tries to ensure that only complete series of changes are made to the file system: either that new folder is created properly, or not at all. It does this by gathering all the changes to be made in a transaction, which is recorded in the Journal. When the whole of a transaction has been committed successfully to disk, that transaction is marked as being complete.

If a Mac running journalled HFS+ starts up when there are still transactions in its Journal, those are completed by moving the changed data to their correct places. This process is known as replaying the journal, and is recorded in the log during startup. If there are transactions in the journal which can’t be replayed properly, it must be assumed that the file system has sustained damage. Note that replaying the journal doesn’t guarantee the integrity of data within affected files, but primarily aims to preserve the integrity of the file system.
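The principle can be shown with a toy model in Python. This is emphatically not the HFS+ journal format: the class, its methods and the dictionary standing in for the disk are all invented for illustration. It simply records a set of changes as a transaction, lets a simulated crash interrupt them, and then replays whatever was left incomplete.

```python
class ToyJournal:
    """A toy model of journalling; not the real HFS+ on-disk format."""

    def __init__(self):
        self.disk = {}           # block number -> data, standing in for the volume
        self.transactions = []   # the journal

    def record(self, changes: dict) -> dict:
        """Write the whole set of changes into the journal first."""
        txn = {"changes": dict(changes), "complete": False}
        self.transactions.append(txn)
        return txn

    def apply(self, txn: dict, fail_after=None) -> None:
        """Make the changes on 'disk', optionally crashing part-way through."""
        for done, (block, data) in enumerate(txn["changes"].items()):
            if fail_after is not None and done >= fail_after:
                raise RuntimeError("simulated crash")
            self.disk[block] = data
        txn["complete"] = True

    def replay(self) -> int:
        """Finish any incomplete transactions; return how many were replayed."""
        replayed = 0
        for txn in self.transactions:
            if not txn["complete"]:
                self.disk.update(txn["changes"])
                txn["complete"] = True
                replayed += 1
        return replayed
```

If apply() is interrupted after the first of two changes, the disk is left half-updated, but a later replay() brings it to the state the transaction intended. As with the real thing, this protects the structure being changed rather than the contents of any file.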

From Mac OS X 10.3 onwards, HFS+ has operated a Metadata Zone, which aims to place volume metadata and smaller, more frequently used files, known as ‘hot files’, close together, to minimise access times on rotating hard disks. Hot files are tracked in a B-tree, the Hot File B-tree, which is kept in a hidden file.

You may also come across what is sometimes called HFSX, which is essentially HFS+ with case-sensitivity. It’s not recommended for any practical use with applications, many of which can’t cope with case-sensitivity.

A successful check of an HFS+ volume in Disk Utility should now report something like:
Volume was successfully unmounted.
Performing fsck_hfs -fy -x /dev/rdisk3s2
Checking Journaled HFS Plus volume.
Checking extents overflow file.
Checking catalogue file.
Checking multi-linked files.
Checking catalogue hierarchy.
Checking extended attributes file.
Checking multi-linked directories.
Checking volume bitmap.
Checking volume information.
The volume ThunderBay2HFS appears to be OK.
File system check exit code is 0.
Restoring the original state found as mounted.

Given the description above, you should now be able to identify each of the file system structures which is being reported on.
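If you want to check your answers, this Python sketch matches lines from such a report against the structures described above. The mapping is one reading of the report rather than anything documented for fsck_hfs, it covers only the lines I am confident about, and the function name is invented; both ‘catalogue’ and ‘catalog’ spellings are included as the wording may vary with localisation.

```python
# One reading of which structure each line refers to (an assumption).
CHECKS = {
    "Checking extents overflow file.": "Extents Overflow file",
    "Checking catalogue file.": "Catalog file",
    "Checking catalog file.": "Catalog file",
    "Checking catalogue hierarchy.": "Catalog file",
    "Checking catalog hierarchy.": "Catalog file",
    "Checking extended attributes file.": "Attributes file",
    "Checking volume bitmap.": "Allocation file",
    "Checking volume information.": "Volume Header",
}


def structures_checked(report: str) -> list[str]:
    """List the structures named in a report, in order, without repeats."""
    found = []
    for line in report.splitlines():
        structure = CHECKS.get(line.strip())
        if structure is not None and structure not in found:
            found.append(structure)
    return found
```

Lines which don’t appear in the mapping, such as those about multi-linked files and directories, are simply skipped.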

Repairing HFS+ is now a mature art. fsck_hfs, via Disk Utility, is usually very proficient, but third-party tools are invaluable for tackling problems which it won’t attempt. The flagship tool for working with HFS+ remains Alsoft’s DiskWarrior. Other operations are supported at the command line by diskutil. For more hopeless cases, the better data recovery services also support HFS+ well.


Apple TN1150 HFS Plus Volume Format