DIGITAL INFORMATION POSES PROBLEMS FOR CONSERVATIONISTS
Paris - An increasingly large share of the information produced today in practically all areas of human activity is compiled digitally and is designed to be accessed on computers.
But this enormous trove of digital information may well be lost unless specific techniques and policies are developed to conserve it.
Such losses have already occurred and will get worse unless something is done. On July 27, 2001, Reuters reported on the case of University of Southern California neurobiologist Joseph Miller, who asked NASA to check some old data the Viking probes had sent back from Mars in the mid-1970s. The US space agency turned up 25-year-old computer tapes in a format that could not be read. "NASA had long forgotten" this software, Reuters reported, or, as Mr Miller put it, "the programmers who knew it had died."
Mr Miller was looking for evidence of microbial life on Mars in the data, which was originally dismissed as evidence of "meaningless chemical activity". He finally had to make do with printed records that the initial NASA team had saved, which only contained about one third of the original digital data.
Preserving valuable scientific information, research data, media output and digital art, to name but a few areas, clearly poses new problems. If such material is to be accessed in its original form, technical equipment - original or compatible hardware and software - must be maintained alongside the digital files that make up the data concerned. In many cases, the multimedia components of websites, including internet links, represent an additional difficulty in terms of copyright and geography, sometimes making it difficult to determine which country a website belongs to.
UNESCO has been examining these issues with a view to defining a standard to guide governments' preservation endeavours in the digital age. During the meeting of the Organization's Executive Board in May, Member States agreed on the need for rapid action to safeguard digital heritage. The debate was largely inspired by a discussion paper* compiled for UNESCO by the European Commission on Preservation and Access (ECPA), an Amsterdam-based non-profit foundation, which outlined the issues involved in digital preservation.
ECPA argues that traditional preservation methods, such as the "legal deposit" used by national libraries to ensure that copies of all printed materials are kept, are difficult to apply to digital material for a variety of reasons, notably because Web "publications" often draw on data stored on servers in different parts of the world. The sheer volume of data concerned also poses a problem. It is estimated that the internet features one billion pages whose average lifespan is extremely short, estimated at 44 days to two years.
Website preservation poses huge problems. Sites are constantly changed and updated while superseded materials vanish without leaving a trace. When organizations go out of business or lose interest, whole websites disappear from sight. This does not only happen with personal pages or informal sites, but also with central and official ones - like the White House site, www.whitehouse.gov, which was wiped clean when George Bush took over the presidency.
The collection of speeches and official communications of the Clinton administration disappeared overnight. The National Archives and Records Administration (NARA) saved much of the Clinton material, but a huge number of internet links to this material located on other sites have been broken.
Similarly, the first online editions of the leading Swedish newspaper Aftonbladet (August 25, 1994, to March 26, 1997), with content that is in part different from that of the paper version, have been completely lost.
Arguably the most democratic publishing medium ever, the ever-growing internet deserves to be preserved as a whole, some claim, as its pages and discussion forums can be considered a priceless mirror of society.
There are technical problems in ensuring that the digital material that is saved in archives remains accessible in its original form. The share of total information and art produced around the world on traditional supports such as the printed page, analogue tape or film is declining yearly as compared to material designed for computer access. Software and hardware are constantly replaced by more powerful new generations, which ultimately become incompatible with their predecessors. This means that within just a few years, material - which often includes sound and moving graphics or pictures, as well as links to internet sites and/or databases - becomes inaccessible.
The sheer volume of data to be sifted in order to select what is worthy of preservation is staggering. "The world's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage. This is the equivalent of 250 megabytes per person for each man, woman, and child on earth," according to a recent study by the School of Information Management and Systems at the University of California at Berkeley.
To get an idea of the amount of data this represents, it is worth bearing in mind that the typical PC hard disk sold today has a capacity of 20 to 30 gigabytes (20,000 to 30,000 megabytes). According to the University of California study, printed material of all kinds makes up less than 0.003 percent of the total storage of information, which includes still and moving pictures, both digital and analogue, the web, audio recordings etc.
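The figures quoted above can be cross-checked with some simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1 gigabyte = 1,000 megabytes, as in the "20,000 to 30,000 megabytes" conversion above), and its variable and function names are hypothetical, not taken from the Berkeley study.

```python
# Figures quoted in the text.
TOTAL_GB = 1.5e9           # world's yearly production of content, in gigabytes
PER_PERSON_MB = 250        # quoted share per person, in megabytes
DISK_GB_RANGE = (20, 30)   # capacity of a typical PC hard disk, in gigabytes

# Assumption: decimal units, matching the article's own conversion.
MB_PER_GB = 1000


def implied_population(total_gb, per_person_mb):
    """Number of people implied by the total and the per-person share."""
    return total_gb * MB_PER_GB / per_person_mb


def disks_needed(total_gb, disk_gb):
    """Number of hard disks of a given capacity needed to hold the total."""
    return total_gb / disk_gb


population = implied_population(TOTAL_GB, PER_PERSON_MB)
disks_low = disks_needed(TOTAL_GB, DISK_GB_RANGE[1])   # using 30 GB disks
disks_high = disks_needed(TOTAL_GB, DISK_GB_RANGE[0])  # using 20 GB disks

print(f"Implied world population: {population:,.0f}")
print(f"Hard disks needed: {disks_low:,.0f} to {disks_high:,.0f}")
```

Under these assumptions, the two quoted figures imply a world population of about six billion, and one year's output would fill roughly 50 to 75 million hard disks of the size mentioned.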
Most of the other data, i.e. data that is interactive, cannot be preserved by simply being printed out and archived. It needs to be preserved on digital storage media, such as CD-ROMs, which are far less durable than acid-free paper or microfilm.
Another complex issue concerns copyright, including copyright of software required to access digital files. The ECPA points out that a dazzling array of rights may be associated with websites combining mixed materials from various sources and says that agreement on the principle of "the right to copy for preservation" still has to be developed.
Valuable initiatives, most of them in industrialized countries, have been undertaken to preserve digital heritage, including websites. One notable example in the south is the Alexandria Library, the Bibliotheca Alexandrina in Egypt, which recently received the "Internet Archive" (IA), a digital library of internet sites and other cultural artifacts in digital form. It provides free access to researchers, historians, scholars, and the general public. A "Wayback Machine" allows researchers to surf websites as they used to be, even when the material is no longer available on the net.
The IA also features a television and film archive and totals more than 100 terabytes of information (100,000,000,000,000 characters). Building on the existing Internet Archive at the Bibliotheca, UNESCO's Cairo office is developing a pilot project in which digital content in the Arabic language is to be preserved, sorted and indexed.
The complexity of the problems involved means that the task of preservation must mobilize the producers of digital information, including software, who should, according to ECPA, take conservation into consideration as they design their products. It argues that the days are gone when preservation was the sole responsibility of archival institutions. UNESCO has therefore launched a process of consultations with a view to producing guidelines and best practices, so that we do not lose the fruit of invaluable work by scientists and artists and so that future historians are not deprived of essential information about the world of today.
Contact: Roni Amelan,
Bureau of Public Information,
Tel: (+33) (0)1 45 68 16 50