Heroic Age of the Internet

Beowulf is a name that engenders a range of emotions from indifference to welling panic, the latter particularly if you read English at university and were dragged screaming and kicking to Anglo-Saxon classes. If your taste for the most ancient English literature was not so poisoned, then Allard and North’s Beowulf and Other Stories (ISBN 978 1 4082 8603 6) could inspire you to savour Beowulf and more.

This exceptionally long continuous literary record of the English language has depended on the survival of works written on parchment and a rich range of other physical media. Although we still commit many books and other writings to paper, the rapid decline in most publishing sectors is becoming a serious threat to our literary and cultural heritage. If we were all to tweet, blog, and otherwise publish on the Internet, but not in physical print, we could leave little for future generations to research or cherish.

Electronic publishing has many problems that make it inherently ephemeral. Much of the material spread over the world’s servers is mundane, measly, or even moronic. Maybe Twitter has embedded in it incisive works comparable to the delicious riddles in the Exeter Book, but finding them would be like rummaging for bronze buttons in a vast forest floor of fallen leaves. Vogue blogs of the moment would be impenetrably cryptic just a few years, let alone centuries, from now.

But above all, storing the gargantuan hoard that is the Web is an increasingly impossible task.

The Internet Archive at archive.org has been making a bold attempt at keeping snapshots of the Web, currently boasting 482 billion pages in its Wayback Machine. Ironically, its fallback site used to be at the New Library in Alexandria, the former home of the greatest library of the classical world, which was probably destroyed by a series of fires during the decline of the Roman Empire – less spectacularly than the fiery destruction of Umberto Eco’s fictional monastic library in The Name of the Rose.

Internet archaeology faces a huge challenge in its desperate need for useful search tools. If anyone is ever to extract anything useful out of those billions of pages in the far future, they will need far better tools, as existing techniques are already woefully inadequate. No amount of metadata or PageRank information can lead the researcher to discover what among the myriad was important, influential, or culturally significant. This is in stark contrast to the situation with Anglo-Saxon literature, where all we have are a precious few fragments remaining from a once rich corpus.

We hewers of HTML and journeymen of JPEGs should now be assembling a truly Alexandrian library of the Web, gathering and tagging those works of lasting importance. So when PhD students in fifty or a hundred years’ time are researching, say, hypertext works of the early twenty-first century, they might come across Eastgate’s Reading Room rather than innumerable sites trying to flog treatment for erectile dysfunction or soft porn movies. This is more like the Open Directory project dmoz.org or the much-maligned Wikipedia than Google or the Internet Archive.

Thankfully, since I first wrote this plea for posterity eight years ago, there has been considerable progress.

The UK Web Archive, sponsored by the British Library with the National Libraries of Scotland and Wales, JISC, and the Wellcome Library, has been collecting since 2004. But there are unnecessary complications over copyright (again): although there is a legal requirement for all printed books and periodicals to be provided to the UK’s designated copyright (‘legal deposit’) libraries, which can then lend out on request, access to the whole of the UK web domain will only be possible from those six libraries. Other countries are doing the same, but no doubt face similar legal issues.

There is also a significant gap, in that many UK websites are hosted overseas and carry domain names outside .uk. It is quite possible that many of those will escape such archiving and be lost to the great bit-bucket in the sky. archive.org, the British Library, and others have started a task which can only get bigger and more fraught as time – and websites – go by.

The history of our heritage is strange and tortuous: the sole manuscript copy of Beowulf was almost destroyed in a fire that swept through its ill-fated repository, fatefully named Ashburnham House. It would be a great tragedy if a major cultural activity of the early 21st century were to pass into oblivion because no-one remembered to collect it properly.

Updated from the original, which was first published in MacUser volume 23 issue 17, 2007.