Getting more out of Sierra’s logs

There was a time when the logs in OS X were a mine of useful information, not that they had to be mined that hard. You could open Console, Show all, and scroll back to see what happened following the last startup, or when that app unexpectedly quitted on you. In Yosemite, that all started to get more cluttered, as more and more processes were chattering away. By El Capitan, the logs had become so congested that, if you didn’t know exactly what you were looking for, they were almost useless.

consoleboot
Console showing log entries for a typical restart.

At that stage, OS X was using a fairly traditional logging system: processes which felt the need to record significant events, errors, and failures did so by writing a plain text string to one of a group of log files. All Console had to do was open those on demand, and let you view the contents.

Over the years, there have been two main purposes for inspecting logs: developers (particularly Apple’s engineers) who are often trying to fix bugs in their products, and support people (including users themselves) who are often trying to work around the consequences of bugs. Their interests are unfortunately conflicting: developers want a great deal of detail about the product they are working on, while support staff and users want logs to be generally very sparse except for the events leading up to the problem which they are trying to solve.

Sierra’s new logs

Then, when announcing and detailing macOS Sierra at WWDC in June 2016, Apple explained that logs were about to change. Alarm bells should, perhaps, have started ringing when it was announced that Apple’s engineers “want as much logging on all the time as possible”. It was also decided to move all existing logging to the new system. Although this does not prevent apps from writing to the few other logs which remain in traditional text format, all conventional calls to write to regular system logs now end up in the new log system.

Sierra’s logs are not written or stored as text, but in a binary database format (.tracev3 files). This means that the only practical way to access these new logs is using Apple’s tools, either the Console app, or the log command in Terminal. Although Console has no intrinsic ability to browse logs already collected, you can (if you really want to) use log to export a section of existing logs to a .logarchive file, and read that into Console to explore it.

consolesall

Too much information

With such a huge amount of information being streamed to the logs all the time, log messages are classified into five different levels:

  1. Fault
  2. Error
  3. Default
  4. Info
  5. Debug.

This should, in theory, have made it much easier to see through the incessant chatter and view only the most important log entries. However, in practice the vast majority of log entries (of all levels of severity, it appears) are made at the Default level. Very few indeed occur at the Error or Fault level, and in any case there is no supported method for viewing only entries of Fault or Error levels, at least when it comes to examining previously collected logs.

If you know Consolation, LogLogger, or the log command, you may be aware that there are two options when viewing logs --info and --debug. These respectively include entries at the Info and Debug levels. log show (hence LogLogger and Consolation) has no option to confine entries to Error or Fault level.

You might also think that you could design a predicate filter to limit the levels used. There are two possible candidates here, eventType and messageType. However, eventType matches the type of event (logEvent, traceEvent, etc.) not the level, and messageType currently does not support Error or Fault.

Predicate filters currently supported in Console and log (and therefore available in LogLogger and Consolation) are thus unable to tell directly how important or significant log entries are. They can filter on the basis of the ‘subsystem’, which is a new feature in Sierra and little-supported even by the many components of Sierra itself, by the process or sender which originated the log event, or by basic patterns of content in the log messages themselves. But even used with quite complex predicates, they can only be very rough filters to hone in on the significant events which interest support staff or users.

sysdiagnose?

Indeed, it is telling that Apple’s log engineers stress the importance of sysdiagnose for capturing data to look at bugs and problems. You can invoke sysdiagnose using gestures:

  • on a Mac, press Shift+Control+Option+Command+. (full stop or period)
  • in iOS, press Volume Up+Volume Down+Power; there is a slight vibration on an iPhone to signify the start
  • on a Watch, press and hold the Digital Crown and Side Button for 1 second, which results in a slight haptic
  • on an Apple TV, press Play/Pause and Volume Down; older remote controls may need these to be held for 5 seconds.

You can then transfer the sysdiagnose dump using iTunes, where necessary.

I have elsewhere considered the huge amount of information generated by sysdiagnose, and the complexities of interpreting it. It is even more of a monster than the new logs are, and even more difficult for support staff or a user to diagnose a problem.

Filtering log entries

The new log database should be an ideal place to obtain carefully filtered entries. Each log entry there consists of a very precise timestamp, the thread number of the process writing the entry, the level (Fault to Debug, as above), an activity code (almost invariably 0), the process ID (as used in Activity Monitor), the name of the process writing the entry, its ‘subsystem’ where available, and the text of the entry itself.

However, Apple has already told us that it has no plans to enable access to the logs by third parties, at least until macOS 10.13 or later. So what we cannot do today using Console and log, and tools such as LogLogger and Consolation, we cannot even work around.

All my experience so far with the new logging system is that it only becomes usable and useful when you use filters, based on predicates and time periods, skilfully. This means working out roughly when the index event(s) occurred, then limiting your searches to the period around that, and applying predicates designed to bring out the entries that are most useful to you.

consolman10

The simplest illustration is, of course, the commonest reason that users take to their logs, to check whether Time Machine has been making backups without error. Thankfully Time Machine is one of the few defined subsystems, and it is one which logs very little beyond its essentials. It also runs every hour or so, but not between. So it is very simple to set up an appropriate time period and predicate which will select out all Time Machine entries.

Could we use similar techniques to filter new-style logs to contain just ‘the important bits’ which we could read in Yosemite and earlier, then?

Success in doing so depends on our prior knowledge of those important log entries, and our ability to design predicates which will include the right entries, and exclude those we don’t want. Because our criteria are heavily dependent on the content of the entry, i.e. the eventMessage CONTAINS specific strings, developing a successful filter is not easy. From my experience of looking at log entries, I think that the heuristics would actually need to parse the messages in order to assess their significance. Simply filtering on text content would not work.

There are three other factors which need to be borne in mind.

First, during critical periods, there are a lot of log entries. A typical trouble-free startup into Sierra may result in well over 100,000 log entries being written. Reducing those to 100 or less is a fearsome task.

Second, several important processes which used to write to the old logs do not appear to make any entries in the new ones. My best example is launchd, the master process launcher, which makes everything apart from the kernel and itself happen. I have not seen a single entry in a new log which relates to launchd, and suspect that Apple has deliberately suppressed such entries. There is thus a lot of important information which does not seem to exist in the new logs.

Finally, log has bugs which affect its filtering. Before even considering the accuracy of its elaborate predicate system, a much simpler filter for log entries are start and end times. These have consistently failed to include all log entries within the defined time window, according to the width of that window.

I test each new update to Sierra to see whether this issue has been addressed. My test is simple: I count the entries returned from an ‘all entries’ search for a given period of a minute or so, when that minute is set exactly by start and end times, and when it is contained within a broader window of two or three minutes.

Each release of Sierra has shown that, in any given minute, the precise time window returns far fewer log entries than the count of entries for the same period set within a broader window. The disparities are not small, either: for 10.12.3, the narrow window returned 330 entries in the minute, the broader window 806 in that same minute.

If the log command is incapable of filtering accurately on the basis of time, then I fear for its ability to apply complex predicates with any accuracy, but that is far harder to test.

So logs are now useless?

No, they are just harder to make use of, and getting the best out of them is an important skill for support staff. That is why I developed first LogLogger, then Consolation. They should be much easier to use than the log command, but you’re going to have to learn how to get the best out of the logs. I have also posted some tips and suggestions here, here, and here.