Come Macs, iPads, or iPhones, the world still loves meetings. Whether a grand Board meeting at a blue-chip, a good sleeves-rolled-up technical argument, or a consultation with doctor or lawyer, we have to talk face to face. We also – in the name of governance, as evidence of agreement, or merely for the record – like to have written accounts of our meetings.
I somehow volunteered to organise a two-day workshop, for which I would dearly have loved complete transcripts. Although a colleague and I took careful notes throughout, once I became embroiled in debate my notes dried up. Thankfully we recorded the whole proceedings on a neat little Olympus recorder. Its audio files were peppered with assorted noise, but could easily have been converted into a transcript by a competent audio typist.
As in most organisations, our last typist of any variety had retired or been paid off soon after the first computer networks were installed. We swapped the bubbly blonde whose nails were undergoing permanent manicure for her diametric opposite, an unkempt bloke muttering inchoate phrases at the command line. But now, with voice recognition skills of the calibre of Siri, we could surely justify that decision. So I set off to get my Mac to produce a complete transcript.
The first snag was that commercial products such as Nuance’s Dragon Dictate are designed to be trained to a single user’s voice. At the more heated points in our discussions, there were maybe half a dozen different voices, male and female, interjecting and throwing ideas around like a vigorous vocal beach volleyball game.
Training was not on the agenda. For our melee of untrained tongues, the only real option was Apple’s Dictation built into Yosemite. Its significant improvement over Mountain Lion is that you can now use ‘enhanced dictation’, with the speech-to-text engine downloaded to your Mac rather than accessed over the Internet. Apple claims that this lets you dictate continuously and without training.
The next hurdle was that Yosemite Dictation works only with live audio input, not sound files. I checked whether I could use Soundflower to play the files into OS X as if they were live audio, but its development has been sporadic and I was unsure whether it would work, let alone leave my Mac in a stable condition. So I copied the audio files onto my Olympus LS-100 and played them into my iMac’s line input socket.
It quickly became clear that, whilst Dictation pulled off some impressive transcription feats, such as recognising my own name and a few fairly esoteric phrases, it still did not cut the mustard. I tried slowing the playback rate down as far as 50%, which actually made it less accurate.
Given that the original audio was fairly noisy and in stereo, I opened it in Amadeus Pro, dropped the channel with the poorer signal-to-noise ratio, filtered out background noise, boosted the volume, and brightened the speech up a bit. Yet Dictation still came back with fragments as nonsensical as “if you put manifold someone skiing Time now known to be further sensitive affluence”. If anything, my audio engineering efforts turned the transcript into even greater gibberish.
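For the curious, the first two of those clean-up steps can be sketched in a few lines of code. This is a toy illustration only, not what Amadeus Pro actually does internally: it estimates each channel’s signal-to-noise ratio, keeps the better one, and normalises its volume. The function names and sample values are all invented for the example.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(speech, noise):
    """Rough SNR estimate: speech level over noise-floor level, in decibels."""
    return 20 * math.log10(rms(speech) / rms(noise))

def pick_cleaner_channel(left, right, noise_left, noise_right):
    """Keep whichever stereo channel has the higher estimated SNR."""
    return left if snr_db(left, noise_left) >= snr_db(right, noise_right) else right

def normalise(samples, peak=0.9):
    """Boost the volume so the loudest sample reaches the target peak."""
    scale = peak / max(abs(s) for s in samples)
    return [s * scale for s in samples]

# Invented data: the left channel carries louder speech over the same noise floor.
left = [0.4, -0.5, 0.45, -0.4]
right = [0.1, -0.12, 0.11, -0.1]
noise = [0.02, -0.02, 0.02, -0.02]

best = pick_cleaner_channel(left, right, noise, noise)
cleaned = normalise(best)  # the left channel, scaled so its peak hits 0.9
```

Real tools work on millions of samples and add filtering and spectral shaping on top, but the principle – keep the cleaner channel, then bring the level up – is the same.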
I thought back to those happy days when every productive office block had rooms full of audio typists, sat pounding out transcripts on typewriters. For all the billions of dollars invested in software development, the power and sophistication of modern processors, and the novelties that computing has brought, it still cannot match the speech recognition and transcription skills of the worst of those audio typists.
Bring back the bubbly blonde with her emery board.
Updated from the original, which was first published in MacUser volume 30 issue 03, 2014.