We do it every day: double-click on a document in the Finder, and we expect it to open in our preferred app. It was one of the distinguishing features of the original Mac, and we’re most put out when it doesn’t just work like that. The process is so commonplace that even the vast number of entries made in the new unified log don’t trace its steps any more, unless there’s a fault somewhere along the line.
One marker of the start of this elaborate sequence which you can find in the log is the double-click on the document in the Finder, normally seen in a series of entries like:
08:14:05.195328 Finder AppKit sendAction:
08:14:05.195387 Finder AppKit sendAction:
08:14:05.397417 Finder AppKit sendAction:
08:14:05.397478 Finder AppKit sendAction:
These mark two clicks or taps around 0.2 second apart, within the threshold for a double-click/tap. macOS recognises this as a double-click, and the Finder identifies the underlying document’s URL as the object in an Open command. We have told macOS to open the document at (say)
One of the things that macOS needs to know about the item at that URL is what it is, so that it can determine how to open it. It does this by checking its UTI (Uniform Type Identifier) – the central topic of this first article of three.
Previously, and in other operating systems, various schemes have been used to distinguish different types of object in the file system and beyond. Classic Mac OS used short type and creator codes and a Desktop database, which was a great strength but at times a troublesome weakness: the database became corrupted or fell out of synchrony, requiring a manual rebuild.
In Unix and operating systems still centred on commands, file types are determined by extensions to the filename, so .txt or .text indicates a file which should be treated as if it contains plain text. This forces a rigid naming discipline: omit the extension, and the operating system doesn’t know how to handle the file.
Back in Mac OS X 10.4, Apple introduced a taxonomic system based on a hierarchy of types with text names formed like reversed URLs, such as com.apple.mail.mbox – the UTI. At the top of the hierarchy are at least ten fundamental types:
- public.item – the base type for items of any kind
- public.content – the base type for document content
- public.archive – the base type for archives
- public.executable – the base type for executable data (code)
- public.contact – the base type for contact information
- public.message – the base type for all forms of message
- public.calendar-event – the base type for all scheduled events
- public.stored-url – the base type for URLs
- public.database – the base type for databases
- com.apple.resolvable – the base type for items which can be resolved by the Alias Manager.
These then have children which are derived from one or more parents. For example, an exchangeable address book entry, which might ordinarily have the filename extension .vcf or .vcard, has the UTI public.vcard. This is a child of both the fundamental UTI public.contact and the derived UTI public.text; public.text is in turn a child of public.content and of public.data, which is itself a child of public.item. In Apple’s terminology, public.vcard conforms to public.contact and public.text.
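The vCard example can be sketched as a small graph in which each UTI lists its direct parents. This is only an illustrative model: the PARENTS table holds just the edges named above, and the helper names ancestors and conforms_to are invented for this sketch rather than taken from any Apple API.

```python
# Conformance edges taken from the vCard example in the text.
# This is an illustrative subset, not a dump of the macOS UTI database.
PARENTS = {
    "public.vcard": ["public.contact", "public.text"],
    "public.text": ["public.content", "public.data"],
    "public.data": ["public.item"],
    "public.contact": [],
    "public.content": [],
    "public.item": [],
}


def ancestors(uti, parents=PARENTS):
    """Return every UTI that `uti` conforms to, directly or indirectly."""
    seen = set()
    stack = list(parents.get(uti, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def conforms_to(uti, candidate, parents=PARENTS):
    """A type conforms to itself and to all of its ancestors."""
    return uti == candidate or candidate in ancestors(uti, parents)


print(sorted(ancestors("public.vcard")))
print(conforms_to("public.vcard", "public.item"))    # reached via public.text and public.data
print(conforms_to("public.contact", "public.text"))  # siblings, so no conformance
```

Because a type can have more than one parent, the structure is a directed graph rather than a simple tree, which is why the walk keeps a set of types it has already visited.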
macOS determines an item’s UTI on the basis of a combination of criteria. Some are determined by their role in the file system: a directory is of type public.folder or public.directory by definition, for example, although its properties might mean that it is one of the more specialised children of public.directory, such as com.apple.application-bundle if it meets the requirements for an app.
For documents, the most important criterion is normally the filename extension, which is looked up in the internal database of UTIs to map that document to a UTI. If the filename doesn’t have an extension, or no mapping is found for its extension, macOS can also use the classic type code inherited from Mac OS, now known as the OSType, or might peek at the opening bytes of the file to use its ‘magic number’, or any saved MIME type if it has been downloaded from a network.
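The lookup described above can be sketched as a chain of fallbacks. This is a rough model and not how macOS implements it: the four tables contain only a handful of well-known mappings, the function name guess_uti is invented here, and the order in which the fallbacks are tried after the extension is an assumption, since the text only lists them.

```python
from pathlib import PurePosixPath

# Small illustrative lookup tables; the real database is far larger.
EXTENSION_TO_UTI = {
    "txt": "public.plain-text",
    "vcf": "public.vcard",
    "jpeg": "public.jpeg",
    "pdf": "com.adobe.pdf",
    "php": "public.php-script",
}
OSTYPE_TO_UTI = {"TEXT": "public.plain-text", "JPEG": "public.jpeg"}
MAGIC_TO_UTI = {b"\xff\xd8\xff": "public.jpeg", b"%PDF": "com.adobe.pdf"}
MIME_TO_UTI = {"image/jpeg": "public.jpeg", "application/pdf": "com.adobe.pdf"}


def guess_uti(filename, ostype=None, head=b"", mime=None):
    """Try the extension first, then each fallback in the order the text lists them."""
    ext = PurePosixPath(filename).suffix.lstrip(".").lower()
    if ext in EXTENSION_TO_UTI:
        return EXTENSION_TO_UTI[ext]
    if ostype in OSTYPE_TO_UTI:
        return OSTYPE_TO_UTI[ostype]
    for magic, uti in MAGIC_TO_UTI.items():
        if head.startswith(magic):  # compare the file's opening bytes
            return uti
    return MIME_TO_UTI.get(mime)  # None when nothing matches


print(guess_uti("notes.TXT"))
print(guess_uti("scan", head=b"%PDF-1.7"))
print(guess_uti("photo.php", head=b"\xff\xd8\xff\xe0"))
```

In this sketch a recognised extension always wins, so the last call reports a PHP script even though the opening bytes are those of a JPEG image.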
Documents do sometimes end up with incorrect filename extensions, most commonly as the result of misnaming by the user or arriving with an incorrect MIME type: sometimes you save a file thinking that it will be designated as a .jpeg image (MIME type image/jpeg, mapped to UTI public.jpeg), but somewhere in that process it ends up with the extension .php (MIME text/php, mapped to UTI public.php-script).
Another common situation where an extension is misinterpreted is with multi-part files with a numeric extension such as .001: some apps which handle multi-part archives can then claim them, although their underlying data might be for individual pages in a PDF document.
The standard solution to such problems is to change the filename extension to one which will map to the correct UTI, which causes the Finder to show a dialog requesting confirmation, because of its effect on the app used to open that document.
In theory, it might be possible to generate a new definition for the mapping from filename extension to UTI, in a Property List file placed in a folder in which this will be spotted and added to the dictionary of UTIs, and that is how an app can add its own custom UTI definitions. But trying to remap existing UTIs in this way is likely to lead to problems.
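For an app declaring its own type, the definition normally lives in the app's Info.plist under the UTExportedTypeDeclarations key. The sketch below builds such a declaration with Python's plistlib simply to show its shape. The identifier com.example.widget, its description and the .widget extension are all hypothetical, and the conformance to public.data and public.content is just a plausible choice for a document type.

```python
import plistlib

# Hypothetical identifier, description and extension, used only for illustration.
declaration = {
    "UTExportedTypeDeclarations": [
        {
            "UTTypeIdentifier": "com.example.widget",
            "UTTypeDescription": "Example Widget Document",
            "UTTypeConformsTo": ["public.data", "public.content"],
            "UTTypeTagSpecification": {
                # Maps the filename extension to this UTI.
                "public.filename-extension": ["widget"],
            },
        }
    ]
}

# Serialise to the XML property list format used by Info.plist files.
xml = plistlib.dumps(declaration, fmt=plistlib.FMT_XML).decode("utf-8")
print(xml)
```

This only generates the text of a declaration; it does not register anything with macOS, which happens when the system discovers the app that contains it.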
There are also potential problems in handling documents which are stored in iCloud. Those which have been evicted from local storage are no longer represented by a local copy of that document, but by a specialised stub file with the extension .icloud. These have the UTI com.apple.icloud-file-fault, which conforms to com.apple.resolvable and public.data. Attempting to change their extension or map them to being opened by a different app is likely to cause problems until they have been downloaded locally and reverted to their normal extension and file type.
Apple’s original documentation for UTIs envisaged three classes of UTI: those used generally across apps and by the user, which start with public.; private UTIs only meaningful to individual apps, formed from the reverse of their internet address, such as com.apple.; and dynamic UTIs starting with dyn. In practice the distinction between public and private has been completely lost, as almost all UTIs of practical importance are of the ‘private’ form, such as com.adobe.pdf for PDF documents.
Dynamic UTIs have also become so commonly used as to be a problem. When surveying UTIs associated with the contents of most folders, meaningless UTIs such as dyn.ah62d4rv4ge8085pt generally outnumber regular UTIs, and most of them map to unknown filename extensions or pasteboard types.
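The three classes can be told apart by prefix alone, which makes a survey like this easy to tally. The sketch below is a minimal illustration: the function name uti_class is invented here, and the list of UTIs is a made-up sample assembled from identifiers quoted in this article, not the result of scanning a real folder.

```python
from collections import Counter


def uti_class(uti):
    """Classify a UTI by its prefix, following the three classes in Apple's original scheme."""
    if uti.startswith("dyn."):
        return "dynamic"
    if uti.startswith("public."):
        return "public"
    return "private"  # reverse-DNS identifiers such as com.apple.* or com.adobe.*


# Made-up sample standing in for the UTIs found in one folder.
found = [
    "public.jpeg",
    "com.adobe.pdf",
    "dyn.ah62d4rv4ge8085pt",
    "com.apple.mail.mbox",
]

tally = Counter(uti_class(u) for u in found)
print(tally)
```

Note that com.adobe.pdf lands in the ‘private’ class here despite being in universal use, which illustrates how little the public and private labels now mean.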
Regular UTIs, both public and private, are extremely widely used. You don’t have to look hard on a Mac now to come across 600 or more different types, and visualising even the most important is very difficult. The map below shows little more than fifty of the most common and important UTIs used in a Sierra system. This is made the more difficult because of the shallowness of their conformity tree: a large number of individual image types are direct children of public.image, for example.
Apple’s original documentation is no longer maintained, and grows increasingly inaccurate. For example, Apple’s Uniform Type Identifiers Reference gives public.url-name as the fundamental type for URL names. This has now been replaced by public.stored-url, and public.url-name is a secondary UTI which conforms not to public.stored-url but to public.data.
Although public.database remains a fundamental type, few if any UTIs conform to it; most databases instead conform to public.data or public.composite-content. One fundamental type of importance in file system objects is not even a public UTI: com.apple.resolvable, although that is listed by Apple.
Inevitably, a great many new UTIs of general significance have been defined since Apple last revised this reference nine years ago. These include com.apple.heif and public.heic, for instance, which are both children of public.image.
Access to the macOS database of UTIs appears very limited. Although several commands generate dumps which include, among other data, details of UTIs, there doesn’t appear to be any way of obtaining a listing of known non-dynamic UTIs, nor of looking up a UTI from the command line. Two free utilities available from Downloads here are of use: Precize gives the UTI for individual files and items, among a wealth of other information about them, and UTIutility scans folders for UTIs and provides detailed multi-way lookup from UTI, filename extension, OSType, MIME type, or Pasteboard type.