Over the last few weeks, I’ve been posed a series of questions that can only be resolved using the log, such as why Time Machine backups are failing to complete. The common confounding factor is that something has gone wrong at an unknown time, but if you don’t know exactly what you’re looking for, nor when to look, the Unified log is a hostile place, no matter which tool you use to browse its constant deluge of entries.
This is compounded by the fact that errors seldom come singly, but often propagate into tens of thousands of consequential log entries, and those not only make the cause harder to find, but can shorten the period covered by the log. In the worst case, by the time you get to look for them, the entries you needed to find are lost and gone forever.
In late 2017, I experimented with a log browser named Woodpile that approached the log differently, starting from a histogram of frequencies of log entries from different processes.

Viewed across the whole of a log file, this could draw attention to periods with frequent entries indicating potential problems.

The user could then zoom into finer detail before picking a time period to browse in full detail. Of course, at that time the Unified log was in its early days, and entries were considerably less frequent than they are now today.

A related feature in my later Ulbow log browser provides similar insights over the briefer periods covered, but like Woodpile that view seems little used, and isn’t offered in LogUI.
Another concern is that a great deal of importance can be recorded in the log, but the user is left in the dark unless they hunt for it. I documented that recently for Time Machine backups, and note from others that this isn’t unusual. Wouldn’t it be useful to have a utility that could monitor the log for signs of distress, significant errors or failure? To a degree, that’s what The Time Machine Mechanic (T2M2) sets out to do, only its scope is limited to Time Machine backups, and you have to run it manually.
Given the sheer volume and frequency of log entries, trying to monitor them continuously in real time would have a significant impact on a Mac, even if this were to be run as a background process on the E cores of an Apple silicon Mac. In times of distress, when this would be most critical, the rate of log entries can rise to thousands per second, and any real-time monitor would be competing for resources just when they were most needed for other purposes.
A better plan, less likely to affect either the user or the many background services in macOS, would be to survey log events in the background relatively infrequently, then to look more deeply at periods of concern, should they have arisen over that time. The log already gives access to analysis, either through the log statistics command, or in the logdata.statistics files stored alongside the folders containing the log’s tracev3 files. Those were used by Woodpile to derive its top-level overviews.
Those logdata.statistics files are provided in two different formats, plain text and JSON (as JSON lines, or jsonl). Text files are retained for a shorter period, such as the last four days, but JSON data are more extensive and might go back a couple of weeks. I don’t recall whether JSON was provided eight years ago when I was developing Woodpile, but that app parses the current text format.
Keeping an eye on the log in this way overlaps little with Activity Monitor or other utilities that can tell you which processes are most active and using most memory, but nothing about why they are, unless you run a spindump. Those also only show current figures, or (at additional resource cost) a few seconds into the past. Reaching further back, several hours perhaps, would require substantial data storage. For log entries, that’s already built into macOS.
I can already see someone at the back holding up a sign saying AI, and I’m sure that one day LLMs may have a role to play in interpreting the log. But before anyone starts training their favourite model on their Mac Studio M3 Ultras with terabytes of log entries, there’s a lot of basic work to be done. It’s also worth bearing in mind Claude’s recent performance when trying to make sense of log entries.
What do you think?
