Whodunnit? The housekeeper that killed an app

I’m not sure which of the players in the game Cluedo/Clue is the housekeeper, but I’d like to report a murder, and they did it. They murdered my most-used third-party app, the same one I’m writing this article in, MarsEdit. What’s more, they did it with a RunningBoard. Let me explain.

Daniel Jalkut, developer of MarsEdit, the world’s best blog authoring app, was investigating reports of his app suddenly quitting when he came across a reference to RunningBoard, a sub-system brought from iOS to macOS in Catalina. The ultimate cause appears to have been a housekeeping app which, in trying to free up disk space, pulled the rug from under running apps, and caused macOS to force those apps to quit, apparently out of the blue. His original discussion and account in his blog article go into all the gory details.

It’s more than two years since I last looked at the details of RunningBoard, so this is a good time to revisit it in Monterey, and ask how it has the power to terminate apps like that.

RunningBoard was originally developed in iOS, to manage the life cycle of apps, and ensure that they can’t hog the more limited resources of Apple’s devices. I believe that it was ported to macOS as part of the preparations for Apple silicon, where life cycle management is considered essential. This is most relevant for notebooks, to ensure that the excesses of individual apps can’t exhaust their battery. But it’s also important for desktop systems, if you want to maintain good performance when you need it.

In its early days in macOS, RunningBoard seems to have had its greatest role to play with Catalyst apps, which have common code across iPadOS and macOS, so are no strangers to life cycle management.

Perhaps the best way to illustrate how involved RunningBoard is with apps now, is to step through its role when an ordinary app is launched on an M1 Mac running Monterey.

LaunchServices receives the event calling it to open the app. It then declares the launch, and sends the launch request to RunningBoard (RB). That in turn ‘acquires an assertion targeting’ the app to be launched, and creates and launches a job for it. AMFI (security), through its service amfid, calls for a trust evaluation if necessary, as does syspolicyd, which quickly finds the app in its database. A Gatekeeper check is then run.

RunningBoard next sets up its records for the app, checking what if anything it has to manage for it. It acquires more ‘assertions’ for the launch, then sets the app’s state to running-active, with a role of UserInteractive. In the case of this particular app, no management is called for; had it been managed, RB could have managed its memory, life cycle (suspend events), GPU and CPU resources.

LaunchServices then declares success of the launch. Shortly afterwards, it starts listening for the expected death of the app from RB. However, RB continues acquiring assertions for the launch to proceed, and in the absence of any announcement of the death from RB, the launch continues. RB proceeds to perform a series of updates to the app’s state, each time declaring that it’s unmanaged.

By this time, the Gatekeeper scan has returned a satisfactory result, and the launch proceeds, with the app running at last, ready to check its access with TCC (privacy controls). After that, the app is moved to the front, which is noted by FrontBoardServices (another port from iOS, I believe), managing its windows in conjunction with WindowServer. Later, RB sets up an “AppNap adapter assertion” for the app’s AppNap settings. This contains the following:

  • Enable
  • Inactive
  • PreventDiskThrottle
  • PreventSuppressedCPU
  • PreventLowPriorirtyCPU [sic]
  • PreventBackgroundSockets
  • PreventTimerThrottleTier0

and possibly other fields. RB continues to acquire assertions for the app, until the end of its life cycle, when it records its death and removes it from its list of jobs.
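
If you’d like to follow these steps on your own Mac, one way is to filter the unified log for RunningBoard’s entries during an app launch. This is only a minimal sketch in Python, which builds a log show command for that. It assumes that RunningBoard writes its entries under the subsystem com.apple.runningboard, which you should verify on your own system; the function names and the default period of five minutes are my own inventions for illustration.

```python
import shlex
import subprocess

# Assumption: RunningBoard's log entries are written under this subsystem.
RUNNINGBOARD_SUBSYSTEM = "com.apple.runningboard"


def runningboard_log_command(last="5m", subsystem=RUNNINGBOARD_SUBSYSTEM):
    """Build the argument list for a `log show` call limited to one subsystem."""
    predicate = f'subsystem == "{subsystem}"'
    return ["log", "show", "--last", last, "--info", "--predicate", predicate]


def show_runningboard_log(last="5m"):
    """Run the command (macOS only) and return the log excerpt as text."""
    result = subprocess.run(
        runningboard_log_command(last),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command rather than run it, so it can be pasted into Terminal.
    print(shlex.join(runningboard_log_command()))
```

Launch an app, then run the command within the period given, and you should see the assertions and state changes being recorded, although the exact wording of entries is likely to differ between versions of macOS.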

So, if MarsEdit isn’t an app whose life cycle is managed by RunningBoard, how come a housekeeping app could lead to its murder?

What we must presume happened is that the housekeeping app called for a CacheDelete in its misguided attempts to clean up that Mac, specifically to free up disk space, which was running short at the time. That can’t be performed for a running app, so the only option was to force that app to quit. Whether RunningBoard actually committed the deadly deed, or just witnessed the event in its assertions, isn’t clear, but the crash report mentioned OS_REASON_RUNNINGBOARD as the reason for the app’s termination.

For apps whose life cycle RunningBoard doesn’t manage, this type of event should be very rare. However, there are some lessons:

  • Beware of any app which claims to perform housekeeping for you. Sub-systems like RunningBoard aren’t documented, and apparently helpful actions can have unintended consequences when macOS is also trying to manage its own resources.
  • If your boot disk is running low on free space, macOS itself may take actions to manage that shortage. This isn’t new, but RunningBoard is far more pervasive than services like assertiond were in the past.
  • If an app suddenly quits, recover the crash report (for example, through the notification) and pass it to the app’s developer.
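
To illustrate that last point, here’s a rough Python sketch that looks through saved crash reports for any that name RunningBoard as the reason for termination. It assumes that reports are saved as text files with the extension .crash or .ips in ~/Library/Logs/DiagnosticReports, and that a case-sensitive search for RUNNINGBOARD is enough to catch reasons such as OS_REASON_RUNNINGBOARD without also matching framework names like RunningBoardServices. The function names are mine, and the exact format of reports varies between versions of macOS.

```python
from pathlib import Path

# Assumption: the user's crash reports are saved in this folder as text.
REPORTS_DIR = Path.home() / "Library" / "Logs" / "DiagnosticReports"

# Upper-case marker, as in OS_REASON_RUNNINGBOARD; the search is case-sensitive
# so that framework names such as RunningBoardServices don't match.
MARKER = "RUNNINGBOARD"


def mentions_runningboard(report_text):
    """Return True if the report text contains the upper-case marker."""
    return MARKER in report_text


def find_runningboard_reports(folder=REPORTS_DIR):
    """List report files in folder whose text contains the marker."""
    hits = []
    if not folder.is_dir():
        return hits
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.suffix not in {".crash", ".ips"}:
            continue
        text = path.read_text(errors="replace")
        if mentions_runningboard(text):
            hits.append(path)
    return hits


if __name__ == "__main__":
    for report in find_runningboard_reports():
        print(report)
```

Any report that it lists is worth sending to the app’s developer, together with a note of what else was running at the time.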

And that’s how the housekeeper murdered an app using a RunningBoard.

I’m very grateful to Daniel Jalkut for telling me about this, and for writing the world’s best blog authoring app. Without it, there’d be no Eclectic Light.