The Digitization of History
Notes from the talk at Trinity Hall, May 28 2008:

Research in a digital age - experience from The National Archives

Speakers: Natalie Ceeney (Chief Executive, The National Archives) and David Thomas (Director of Technology and Chief Information Officer, The National Archives)

Chair: Melissa Lane (University Senior Lecturer in History at Cambridge University, Associate Director of the Centre, and a Fellow of King's College Cambridge)

The Centre was delighted to host Natalie Ceeney and David Thomas for their talk and discussion about the ways in which digitization is changing the discipline of history. The scale and vision of The National Archives’ efforts illustrate how critical it is for historians to engage with digital debates, divides and decision-making. They came inviting comments and debate from the audience, asking us to focus in particular on the following questions:

Has there been discernible change in the way academics are using digital resources?
What do you perceive to be the challenges and limitations of what we’ve described?
Will digital research encourage the humanities to become collaborative in the same way as scientific research?

From a rich presentation, we have selected for further discussion some key issues of particular relevance to the historical community:

1. Born-digital records: the archive of the future
a. One of the questions which Ceeney suggested would most occupy ‘historians of the future’ is that of ‘born-digital’ records – records such as emails, agendas, memos and government directives which are never published on paper. The necessity of ‘weeding’ the large volumes of purely digital records which come into TNA’s purview has forced the archives to rethink many of the assumptions which govern paper archives, prompting new engagement with guidelines for preservation.

2. Digital search capabilities and TNA’s ‘access agenda’
a. The National Archives (TNA) is governed by Crown copyright, which allows them to take a more liberal approach to digitization of content. Because TNA have been able to embrace the concept of the digital archive more easily than most libraries, they have also been in a unique position to assess the impact of digitization on research methods. Even so, aggressive digitization can only ever hope to reach a small fraction of the archive’s vast collections: while over 80 million National Archives documents were delivered digitally in 2007 (against 37 million in 2005), and TNA hopes to provide over 100 million documents via download by 2012, these numbers represent only 5-8% of an estimated 175km of shelving. An astonishing 70% of records are inadequately catalogued and thus enormous portions of the archive remain unresearched. As such, the digitization initiative at TNA is at heart about the possibilities of searching and cataloguing huge swathes of hand-written records more effectively. TNA’s ‘accessibility’ agenda encompasses ‘traditional’ dilemmas of information science as well as the challenges of ‘born digital’ records and digital document delivery systems.

b. Search & serendipity: One aspect of TNA’s digital catalogue which should appeal to historians is the ability of the system to mimic the experience of ‘browsing the stacks’ of the physical archive. Historians have long exploited serendipitous discoveries made while walking among long rows of shelving, an excellent means of discovering similar material which may not have appeared in a bibliographic reference.

c. User behaviour remains a chief interest of TNA’s directors. Both speakers were interested in the increasingly blurred boundaries between the academic researcher and the amateur genealogist. The new methods of access that TNA provides do offer a rival strategy to Google’s deliberately anti-academic flat-search system. Yet both Ceeney and Thomas suggested that historians have as a group been somewhat reticent about joining in dialogue with TNA about their needs and ideas, more often acting as passive users or registering discontent with particular solutions after the fact. (‘Your Archives’ is available at: )

3. The ‘mash-up’ and web 2.0
a. Thomas focussed on the possibilities of the ‘next edition’ of the web, in particular the potential of the ‘mash-up’ as a tool for historical teaching and research. The mash-up links embedded media files with the ‘wisdom of crowds’ approach that Wikipedia has made famous, turning viewers into participants whose responses to the mash-up themselves become a part of the archive. One interesting example of this approach which Thomas mentioned is the Vietnam Veterans Memorial project (at ), a public-private partnership which joins US National Archives’ military and photographic records with National Geographic photographs of the memorial, individuals’ memories and stories, and media coverage. TNA has also implemented a wiki system to supplement the official catalogue, allowing users to share expertise and experience of the archive with each other. Keeping in mind the vast swathes of ‘unresearched archive’ which Ceeney mentioned, the wiki provides an interactive forum which allows viewers to become cataloguers. Far from ‘hoarding’ knowledge of particular collections, TNA has found that researchers have been generous with their specialist experience.

b. The ‘hourly’ archive
i. Leigh Denault wondered whether a part of historians' reluctance to embrace digital technologies was due to the fragility of ‘Web 2.0’. Earlier iterations of digital libraries and archives were more heavily shaped by print paradigms, resulting in a static ‘web’ of permanent (ideally, if not in practice) digital documents. The mash-up, relying as it does on a kaleidoscope of different resources, is inherently evanescent. Thomas suggested that perhaps we simply need to rework our understanding of the archive itself. In practice, Denault suggested, even collections of paper records are constantly shifting, as new items arrive, old items are lost, relocated or rediscovered, and different regimes of archivists and researchers come and go – just like the mash-up. The online mash-up simply requires, more obviously perhaps than the traditional bricks-and-mortar archive, that the story of the archive be preserved as an integral part of the resource itself. Just as cultural historians have recently engaged fruitfully with the role of ephemera in the shaping of public mindsets and everyday political interactions, we might think of the mash-up as digital ephemera, which may demand a new approach to replication or archiving.

c. Collaboration and new methods of historical research
i. History often, in practice if not always in theory, relies on scarcity. Young historians in particular tend to exploit bodies of under-utilised or misrepresented source material as a ‘hook’ for analysing the past. How then does a discipline which prizes rarities deal with an age of flat-file keyword searching and increasingly ‘open’ and accessible archives? Ceeney expressed a hope that one way in which digitization might ultimately impact the practice of history would be through suggesting new kinds of collaborative work which might take place among historians with different specialisations, allowing a more in-depth analysis which could engage seriously with quite disparate bodies of material.

4. Debated subjects: economics, long-term storage, maintenance, availability issues
a. TNA has explored a number of economic models, since existing funding models are not sufficient to cover the costs of developing and maintaining a digital archive. Public-private partnerships (PPPs) have become key in enabling the digitization of particular collections. Such partnerships, however, usually dictate the digitization of certain "on-message" collections over and above those which the scholarly community might require, leading Ceeney to suggest that the existing business model may indeed fail academics. Yet the archive has significant leverage in that it controls access to documents which many private research institutions and companies wish to see digitally accessible, allowing TNA to negotiate terms which allow it to digitize records outside the scope of its outside funders’ initial mandate. Many collections of primarily academic interest have thus far been digitized using small grants from organisations such as the AHRC and Wellcome Trust. All of these sources add up to a significant investment in digital services of circa £45m since 2004. As this was, however, accomplished without recourse to additional public funds and with heavy reliance on existing infrastructure, TNA’s large-scale projects provide an interesting model for smaller archives as well.

b. Many participants asked questions about data retention and management. TNA has used a mixture of on-site storage to data tape for less-accessed material which can be called to disk when requested by researchers, instant-access data for catalogue systems, and third-party branded websites which look and feel like TNA’s main site but allow the archives to offload their most popular material (census records in particular) to higher-volume commercial servers.

c. The speakers used the example of the 1986 BBC Domesday project to consider the role of long-term storage and maintenance at the heart of so many digitization debates. The much-vaunted project, a new multimedia survey of British life compiled to mark the 900th anniversary of the 1086 Domesday Book and published on laserdisc, then the latest technology, resulted in a resource which became completely inaccessible mere decades later. Ceeney noted however that just as digitization allows a new engagement with cataloguing and ‘rediscovering’ the archive’s material, it also offers an opportunity to bring records which have been kept in an obsolete medium ‘back to life’, in this case by reverse-engineering the hardware necessary to recover the stranded data. The BBC example does provide a reminder that digitization is not at all a static process, and is thus not actually comparable with, for example, microfilming a resource. Digitization is a powerful and exciting new tool for archives, but it comes with a high maintenance cost which must include continual migration of data from older formats and hardware to ensure continuing accessibility.

LD & RW, May 2008