Sparse disk images in virtual machines

I first noticed disk images using the APFS sparse file format in lightweight virtual machines (VMs), before Maurizio kindly pointed out that plain UDRW read/write disk images can now be stored as sparse files, as I described yesterday. This article looks in more detail at those used in VMs, then considers how UDRW disk images might be converted to sparse format.

Disk.img

Lightweight virtualisation of macOS and GUI Linux on Apple silicon Macs uses a bundle (strictly speaking a package rather than a bundle) containing a boot disk image conventionally named Disk.img. The internal architecture of Disk.img in a macOS VM is complicated, as it contains the three APFS containers normally found on the internal SSD of Apple silicon Macs, one of which contains a complete boot volume group. Disk.img is also an APFS sparse file, and grows and shrinks in size according to its contents. This can be hard to see, as the Finder doesn’t update the bundle size very often.

I have some large VMs with nominal capacities for Disk.img of 120 GB. With Ventura 13.1 installed, the space taken on disk for what is supposed to be over 128 GB is less than 16 GB, achieved as an APFS sparse file. Thus the total storage space required for 141 GB of VM bundle is just under 28 GB.

Disk.img files are RAW disk images, into which is written an image of a complete disk. For macOS VMs, they can be partially attached using hdiutil, but their Data volume is encrypted and fails to attach or mount. Linux VMs presumably use a native file system, as hdiutil refuses to attach them at all. Neither can be accessed through Disk Utility, though.

RAW disk images have been used in the past, although they’re not among the types made accessible to users. In this case, they are created and maintained using the sparse file format in APFS, a potent combination, particularly for VMs which often contain a lot of empty space.

Efficiency

Current user-accessible sparse disk formats have to be manually compacted using hdiutil compact, and even then may not recover the expected space. To test the efficiency of these sparse RAW disk images in a VM, I wrote a total of 53.3 GB of test files into one, then deleted the folder containing them. The disk image first grew by 53.3 GB, then shrank back to its original 16 GB size without any noticeable delay or loss of space.

Measurements must be made on Disk.img itself, for instance using Sparsity or the Finder’s Get Info. Total disk space shown for the whole bundle in the Finder didn’t change from its original 28.33 GB, despite Disk.img inside it taking up 69.4 GB. Once again, the Finder showed itself to be disturbingly unreliable in reporting figures from APFS.
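For those who prefer the command line, the same comparison between a file’s nominal size and the space it actually takes can be sketched in a few lines of shell. This is only a minimal sketch, not how Sparsity or the Finder obtain their figures: the function name report_sizes and the demo file are hypothetical, and it assumes that du -k reports allocated space in 1,024-byte units.

```shell
#!/bin/sh
# Compare a file's logical size with the space actually allocated to it.
# report_sizes is a hypothetical helper, not part of macOS.
report_sizes() {
    file=$1
    # Logical (nominal) size in bytes.
    logical_bytes=$(wc -c < "$file" | tr -d ' ')
    # Space allocated on disk, which du -k reports in 1,024-byte units.
    allocated_kb=$(du -k "$file" | awk '{print $1}')
    echo "logical: $logical_bytes bytes, allocated: $((allocated_kb * 1024)) bytes"
}

# Demonstration on a throwaway 8 MiB sparse file rather than a real Disk.img.
demo_dir=$(mktemp -d)
dd if=/dev/zero of="$demo_dir/Demo.img" bs=1048576 count=0 seek=8 2>/dev/null
report_sizes "$demo_dir/Demo.img"
rm -rf "$demo_dir"
```

For a real VM, pass the path to Disk.img inside the bundle instead of the demo file. A sparse file should show an allocated figure well below its logical size.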

Creation

Sparse RAW disk images are readily created in code. Apple’s example code for lightweight virtualisation uses the sequence
let diskFd = open(diskImagePath, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR)
var result = ftruncate(diskFd, sizeDisk)
result = close(diskFd)

(error handling omitted) where sizeDisk is the size in bytes. A similar result can be achieved using the command
dd if=/dev/zero of=Disk.img bs=1m count=0 seek=10240
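where bs sets the block size, here 1 MiB, and the number given for seek is the size in those blocks, so this example creates a sparse file with a nominal size of 10 GiB.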

These only create the sparse file itself, which cannot be attached and used as storage until a file system has been written to that file. In lightweight virtualisation, that process is performed during installation of the IPSW image or Linux, handled by calls to the API.
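The arithmetic behind the dd command can be wrapped in a small shell function, as a sketch. The name make_sparse_image and its arguments are hypothetical, and the block size is spelled out in bytes to avoid differences between BSD and GNU dd in how size suffixes are written. The demonstration uses a small 64 MiB file, and the result is still an empty sparse file with no file system.

```shell
#!/bin/sh
# make_sparse_image PATH SIZE_MIB: hypothetical helper that creates an empty
# file of SIZE_MIB mebibytes without writing any data to it.
make_sparse_image() {
    path=$1
    size_mib=$2
    # bs is the block size in bytes (1 MiB); count=0 copies no data, and
    # seek moves size_mib blocks into the output, which sets the file's length.
    dd if=/dev/zero of="$path" bs=1048576 count=0 seek="$size_mib" 2>/dev/null
}

# Demonstration: 64 blocks of 1 MiB give a nominal 64 MiB image.
# A nominal 10 GiB image would need 10 x 1024 = 10240 blocks.
demo_dir=$(mktemp -d)
make_sparse_image "$demo_dir/Disk.img" 64
ls -l "$demo_dir/Disk.img"   # nominal size in bytes
du -k "$demo_dir/Disk.img"   # space actually allocated, in 1,024-byte units
rm -rf "$demo_dir"
```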

Use

Currently the only supported access to these sparse RAW disk images is through the VM that runs from them. The host Mac doesn’t attach or mount the Disk.img file, although it’s possible to attach the disk image from a macOS VM and modify it for testing purposes. Disk images don’t appear particularly stable when so attached, and can disrupt the host Mac.

Like other disk image types, their write performance is significantly slower than direct access to the host storage: on a high-performance Thunderbolt 3 SSD, for instance, it’s reduced from around 3 GB/s to 0.5 GB/s, although read performance is only 10% slower. This appears to be the result of the overhead of the disk image format, as writing to shared folders through the virtiofs file system is almost as fast as reading and direct access.

For those who enjoy recursive systems, these sparse RAW disk images fully support sparse files in APFS, including UDRW read/write disk images, as well as sparse disk images and sparse bundles.

Conversion of UDRW disk images to sparse files

When first created and mounted, plain UDRW read/write disk images don’t use sparse format, unlike Disk.img in VMs, but occupy full storage space according to their total capacity. As I explained, for them to be converted into sparse format, they have to be unmounted, then mounted a second time. This appears to require the Finder to mount them, doesn’t work with hdiutil attach alone, and occurs with that second mounting.

Browsing the log during disk image mounting and unmounting isn’t easy, as there are thousands of entries from diskarbitrationd. Scattered through those are occasional entries from APFS, storagekitd and Disk Utility. Although APFS Space Manager records the number of blocks free and trims them, there’s no record of any change in file format.

Key here may be the involvement of the Finder. During that second mount, macOS updates the data stored for the disk image. That appears to be the trigger for it to be written out in sparse format, with empty space being skipped rather than written as null data. Why this doesn’t happen when the disk image is first created isn’t clear, although that part of the process isn’t shared with disk images used in VMs, which are created as sparse files in the first instance.

Handle carefully

Like UDRW read/write disk images, RAW disk images take advantage of APFS sparse format, and need careful handling to ensure that they don’t explode to full size. Precautions include:

  • store them only on APFS volumes;
  • transfer them over a network using macOS file sharing;
  • never use AirDrop, and avoid iCloud;
  • when needed, protect them inside sparse bundles or sparse disk images;
  • compress them inside a folder using Apple Archive, or another protection such as tar;
  • back them up using Time Machine to APFS.
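After moving or copying one of these disk images, it’s worth confirming that it hasn’t expanded. That check can be sketched in shell as follows. Here copy_stayed_sparse and allocated_kb are hypothetical helpers, the 10% tolerance is an arbitrary assumption, and the demonstration uses a throwaway file and plain cp rather than any of the methods listed above.

```shell
#!/bin/sh
# allocated_kb FILE: space allocated on disk, in 1,024-byte units.
allocated_kb() {
    du -k "$1" | awk '{print $1}'
}

# copy_stayed_sparse ORIGINAL COPY: hypothetical check that succeeds when the
# copy takes no more than 10% (plus a 4 KB allowance) more space than the original.
copy_stayed_sparse() {
    before=$(allocated_kb "$1")
    after=$(allocated_kb "$2")
    [ "$after" -le $((before + before / 10 + 4)) ]
}

# Demonstration: copy a throwaway 64 MiB sparse file and compare the two.
demo_dir=$(mktemp -d)
dd if=/dev/zero of="$demo_dir/Disk.img" bs=1048576 count=0 seek=64 2>/dev/null
cp "$demo_dir/Disk.img" "$demo_dir/Copy.img"
if copy_stayed_sparse "$demo_dir/Disk.img" "$demo_dir/Copy.img"; then
    echo "copy is still sparse"
else
    echo "copy has expanded: $(allocated_kb "$demo_dir/Copy.img") KB allocated"
fi
rm -rf "$demo_dir"
```

Which message appears depends on the copying tool and the file systems involved, so the result is best checked for each transfer method rather than assumed.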