After the crash: replaying the journal to prevent disk errors

Once you have recovered your Mac from a crash, freeze or kernel panic, one of the first questions you should ask is whether it resulted in any damage to its disk(s).

There are many layers between an app calling for the contents of a disk to be changed, and that change actually being completed. Most actions require a sequence of steps to be performed in the correct order. So at any moment in time, there may be a queue of actions pending.

When disaster strikes your Mac suddenly, chances are that some actions will still be pending. Before Apple introduced journalling to the Mac OS Extended file system (HFS+) in OS X 10.2.2 in 2002, this would often leave the disk(s) in a mess, resulting in disk errors. For example, if an app was copying a file in the instant before a freeze, the new copy (which it would be writing at the time) might not have been properly written and added to the disk’s directory structures.

Over time, those usually minor errors would accumulate, so that after a few such crashes, they would become significant enough to cause more substantial problems on the disk, which then required more serious repair.

The journal is intended to prevent that from happening. Each disk write is logged in that disk’s journal as soon as it is called. As the write actions are completed, they are removed from the pending list. If a crash forces your Mac to restart, early during the startup process macOS looks at the journal to see what actions were pending and have not been completed. It then steps through those actions, ‘replaying the journal’, until the disk has been brought up to date or synchronised with the journal, and reflects the state after the last journal entry.
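
The cycle of logging, completing and replaying can be sketched with a toy model. This is only an illustration of the idea, not how HFS+ implements its journal: the class ToyVolume and its methods are hypothetical names, and the 'directory' is just a Python dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVolume:
    """Toy volume: a directory (dict) plus a journal of pending changes."""
    directory: dict = field(default_factory=dict)
    journal: list = field(default_factory=list)  # pending (name, value) pairs

    def log_change(self, name, value):
        # Record the intended change in the journal before touching the directory.
        self.journal.append((name, value))

    def apply_next(self):
        # Complete the oldest pending change, then remove it from the journal.
        name, value = self.journal[0]
        self.directory[name] = value
        del self.journal[0]

    def replay_journal(self):
        # After a crash: step through every change still pending, in order.
        replayed = 0
        while self.journal:
            self.apply_next()
            replayed += 1
        return replayed


volume = ToyVolume()
volume.log_change("report.txt", "directory entry A")
volume.log_change("report copy.txt", "directory entry B")
volume.apply_next()  # the first change completes normally
# Imagine a crash here: the second change is logged but was never applied.
replayed = volume.replay_journal()
print(replayed, volume.directory)
```

After the replay the journal is empty and the directory reflects the state after the last journal entry, which is the property the real journal is designed to restore.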

Journal replays can only occur on volumes for which journalling is turned on. When it was first introduced, journalling was an option; it is now the default and you would be most ill-advised to turn it off.

The snag with HFS+ journalling is that it is aimed not at protecting the data being written to files, but at protecting the directory structures and other essential information which make the file system work. In the example given, when your Mac restarts it will correct those directories and should ensure that the drive works normally, but the data in the file being written may be junk, leaving that file corrupted. This may seem a serious omission, but it allows journalling to be much more efficient and effective.

macOS does not normally tell you in an alert when there is a problem with the journal, nor when it replays the journal to bring the disk back into sync. When you restart or start your Mac, macOS runs a quick check on the disks using fsck, the command-line tool for checking and repairing file systems. That will normally indicate whether the journal looks to be in sync. If it is, macOS carries on starting up.

If that quick check shows there is a problem with the journal, then it will replay the journal to apply the necessary corrections. This is marked in your logs by entries looking like
jnl: disk0s2: replay_journal: from 13043200 to: 3971072 (joffset 0x15502000)

This reports that the check found the journal of disk0s2 to be out of sync, so the journal was replayed to apply the actions which were needed. After that, your disk should be fully healthy again, and you should not need to perform any further repairs.
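
If you save log text to a file and want to pick out replay entries automatically, a short script can split such a line into its parts. This is a minimal sketch which assumes that every replay entry follows exactly the layout of the example above; the function name parse_replay and the dictionary keys are my own, and the script makes no attempt to interpret what the numbers mean.

```python
import re

# Pattern modelled on the single example line quoted in this article.
REPLAY_LINE = re.compile(
    r"jnl: (?P<device>\S+): replay_journal: "
    r"from (?P<start>\d+) to: (?P<end>\d+) "
    r"\(joffset (?P<joffset>0x[0-9a-fA-F]+)\)"
)


def parse_replay(line):
    """Return the fields of a journal replay entry, or None if the line is not one."""
    match = REPLAY_LINE.search(line)
    if match is None:
        return None
    return {
        "device": match["device"],
        "from": int(match["start"]),
        "to": int(match["end"]),
        "joffset": int(match["joffset"], 16),  # hexadecimal in the log
    }


sample = "jnl: disk0s2: replay_journal: from 13043200 to: 3971072 (joffset 0x15502000)"
entry = parse_replay(sample)
print(entry)
```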

A good way to check this is to leave macOS to settle a bit after such a restart, then open Console, and search using the term jnl in the search box at the top right of the window. Once you have located the most recent line containing jnl, select it, then delete the jnl search term. The full logs will be revealed, for the time at which that line entry was made.

Rarely, there may be more severe damage which prevents journal replay, or which causes an error when trying to use the journal. You may then see log entries like
jnl: disk0s2: update_fs_block: failed to update block 2 (ret 5)
or
jnl: disk0s2: journal_open: Error replaying the journal!
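
To tell a successful replay from a failure when scanning many log lines, you could sort jnl entries into rough groups. The sketch below assumes the general layout 'jnl: device: routine: detail' seen in the three entries quoted in this article, and treats any detail containing 'fail' or 'error' as a problem; both the function classify and that rule are my own simplifications, not an official list of journal messages.

```python
import re

# General shape of the jnl entries quoted in this article.
JNL_LINE = re.compile(r"jnl: (?P<device>\S+): (?P<routine>\w+): (?P<detail>.*)")


def classify(line):
    """Return (device, routine, status) for a jnl entry, or None for other lines."""
    match = JNL_LINE.search(line)
    if match is None:
        return None
    detail = match["detail"].lower()
    if "fail" in detail or "error" in detail:
        status = "error"      # the journal could not be used or replayed
    elif match["routine"] == "replay_journal":
        status = "replayed"   # the journal was replayed
    else:
        status = "other"
    return (match["device"], match["routine"], status)


lines = [
    "jnl: disk0s2: replay_journal: from 13043200 to: 3971072 (joffset 0x15502000)",
    "jnl: disk0s2: update_fs_block: failed to update block 2 (ret 5)",
    "jnl: disk0s2: journal_open: Error replaying the journal!",
]
results = [classify(line) for line in lines]
for result in results:
    print(result)
```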

The best action then is to restart in Recovery mode (Command-R at startup). This should automatically run a more powerful pass of fsck which should have a better chance of fixing the problems. Then switch to Disk Utility, and perform First Aid on the affected disk(s). If that cannot repair the damage, consider third-party disk repair tools such as Drive Genius, or you may even have to initialise the disk and reinstall.

Much of the time, journalling works effectively to prevent problems from occurring, and to prevent their building up to a full-blown disk crash. The good news is that Apple’s new APFS file system, which is intended for release in 2017, does not need this type of journalling, and should make disks more robust against the sort of problems that can arise from crashes. We will see if it does.

Comments

Dan on January 4, 2017 at 4:35 pm

I’m not having any luck with any tools “auto-repairing” the journal issue on my 2009 MBPro. I have exactly the error mentioned with the fail message “journal_open: Error replaying the journal!”.

I booted from the install CD (leopard) and ran lots of things (disk utility, fsck) and single user mode with fsck_hfs -y -Rc -d /dev/disk0s2 (my affected disk). No luck. I can’t mount in write mode because the journal error blocks the mount and I see “Disk full” even though I have ~ 20GB available (from 320).

I suspect 3 possible issues:
1. HD dying or Bad blocks stopping the journal from being applied
2. journal replay is trying to write the state of the computer before sleep (before forced restart) which is maybe ~8 + GB causing disk full scenario and breaking the journal replay thus a deadlock.
3. command line tools and fsck is outdated on my machine and not able to fix the issue (I need better software)

Is it possible to simply bypass the journal replay? Will the “Catalog b-tree” (filesystem) still work enough to mount in write mode and remove some files?

As I can mount in read-only mode I’m already recovering all my files and looking towards a complete zero out of the HD and a stronger backup regime.

  hoakley on January 4, 2017 at 4:45 pm
  
  I think that’s a good plan.
  As far as I know, the only time that the journal is replayed is during startup (or after restart), if there is a journalling mismatch indicating the disk was shut down without changes being completed to it. I don’t know of any way to bypass that, as it happens quite early on during the startup process.
  If you’re getting replays when the Mac is being shut down in a normal fashion, that suggests either soft or hard issues with the drive, or maybe something during shutdown which is corrupting the disk.
  If there’s an error in the journal, it means it cannot be replayed, so some changes made to the disk contents will not occur. That can leave disk errors and other problems with the contents.
  Re-initialising the disk should give it a chance to start afresh. Hopefully journalling will work normally again.
  Howard.
  