Of DATs and .Dats, or Don't Trust That Format
A couple of stories from this summer offer insight into the challenges facing an archivist in the digital era. But more than that, they have implications for all of us, because these days we're all archivists, and digital ones at that. The sheer quantity of digital information the average computer user has to manage and care for is staggering. People who don't even know what a "terabyte" is find themselves needing 1TB drives and NAS units to keep their photos, music, TV shows, and other bit-hogging files backed up. I no longer know anyone who hasn't lost data to a drive failure or a format change. Most of us, however, blithely assume our old files and media are safe, certainly if we keep them "backed up" on other drives or removable media.
Well, that's not the case. At the Center, we've been dealing with the implications of rapid changes in digital file formats and storage protocols for some time, trying to anticipate both where things are going and how much attention (and money) we have to devote to where things have been. We're an archive. We deal in donated or purchased collections of raw field recordings made by ethnomusicologists, linguists, and anthropologists. Many of these materials come into our care late in the lives of their collectors (or even after their collectors have died). They arrive on all sorts of media: reel-to-reel tape, VHS tape, cassette tape, DV tape, and DAT tape among them. We have to archive the equipment needed to play back these formats so that we can recover them, digitize them, and keep them in usable condition. We have closets full of old reel-to-reel decks, cassette decks, and video decks, old hard and removable drive technologies, and old software. I never throw anything of the sort away, on the logic that someday, perhaps 10 or 20 years from now, we just might need to get data off a 5.25" floppy or a VHS-C tape.

But it's a bad idea these days to rely on the continued existence of playback technologies, or of storage technologies. Maintaining an archive, even of one's personal music or photo collection, means not only "backing it up" regularly, redundantly, and to an offsite location. It also means constant format conversion, so that your materials are stored in formats that current technologies can actually read. Forget about them for a year or two or ten, and you may find yourself with pristine backups of unrecoverable materials. Here are a couple of stories from this summer to illustrate the issues more vividly.

Story 1: DAT Tape

When I was starting out as a fieldworking ethnomusicologist, way back in the early 1990s, most of us (grad students, anyway) worked with cassette tape recorders for our research.
The standard setup tended to be a Sony TCD5 or WMD6C recorder, Maxell XLII tape, and a stereo condenser mic or a pair of lavs. This was reliable, decent-sounding technology. The tapes were easy to get anywhere in the world. The format had been around for 20+ years, and for fieldworkers it had replaced bulky, heavy, and expensive choices like the Nagra IV-S reel-to-reel recorder. (If you've never picked up one of those beauties, you should, and then show some respect for our teachers, who used to carry 25-pound machines on their backs into rain forests and deserts, not to mention all the tape they needed.) Around the mid-1990s, a technology that had grown popular as a digital medium in recording studios during the 1980s finally became portable and affordable enough to become the gold standard, and the first widely accepted digital format, for field research: Digital Audio Tape, or "DAT." DAT tape had many virtues. Like any digital medium, it made pristine recordings free of tape noise, and those recordings could (in theory) be copied digitally without generational loss. DAT recorders, even the popular Sony portables most of us used in the 1990s like the PCMM1 and the D8, were professional machines, with adjustable input levels, selectable sampling rates, and quality mic preamps. When they fell below $1000 around 1993 or so, they were also relatively affordable (a Sony WMD6C cassette recorder cost nearly $500 at the time). Finally, they made uncompressed digital recordings at high bitrates and sampling rates. This gave them a significant advantage over the main competing digital format, the minidisc (which developed into a consumer format somewhat later).
(Beyond that, consumer-grade minidisc recorders were also crippled by their manufacturers, out of fear of piracy and bootlegging, so that they could not transfer their digital recordings to a computer except by reconverting them to analog audio in real time. DATs were not hobbled this way, because it was assumed only professionals would use them; some machines even defeated the SCMS copy-protection system originally included with "prosumer"-grade DAT equipment, which is why I bought a Sony PCMM1 in 1994, for $800.)

Thousands of linguists, ethnomusicologists, journalists, and bootleggers made the move to DAT tape in the 1990s. Many of us (and I am one) built large collections of primary recordings on DAT tape. DAT was also used, in a somewhat different way and with somewhat different versions of the same media, for computer tape-backup systems, and it is still quite widely used for that purpose today.

But then the problems started. DAT tapes, it became clear, were fragile. Because the data on them was digital, not analog, damage to the tape could make the affected data unrecoverable outright rather than merely degrading it, as damage to analog tape does. Older DAT tapes often refused to play back. It turned out that the DAT format required very precise alignment of heads to tape for playback, and that tapes could be finicky about playing back on different machines, or on machines from different manufacturers. The heads on the recorders most of us used wore out much more quickly than the heads we were used to on analog gear. This was a huge problem, because in some cases only the original machine on which a recording had been made could play it back. Both tapes and decks turned out to be very sensitive to humidity and physical damage, making them riskier choices for tough fieldwork environments than we had thought.

And then, DAT was gone. Within the last few years, it has become impossible to buy a new DAT deck, professional or prosumer grade.
Audio-grade tape has become hard to find. Flash recorders, CD recorders, and hard drive recorders (and improved minidisc recorders) have become cheaper, more viable, and more popular options, and most fieldworkers now use pocket-sized flash recorders that are far more durable, cheaper, and easier to use; they also transfer files to and from a computer with ease, via USB. But thousands of recordists have huge collections of DAT recordings made in the 1990s. Ethnomusicologists of my generation and the next (trained in the late 1990s) have drawers and boxes full of DAT tapes, and aging personal portable recorders with extensive head wear. Many of us (I speak from personal experience) don't think about preserving those recordings. We assume DAT is a professional standard in such wide use that it will always be supported, and that "digital" recordings won't degrade the way older analog tape recordings did. Because DAT tape can only be transferred digitally in "real time" (i.e., the same way an analog tape is converted, by playing it all the way through), converting DATs to other formats is a time-consuming and expensive process. The digital I/O cables used to connect older DAT recorders to other digital processors are also hard to find and expensive. Later professional rack-mount decks often have optical or coaxial digital connectors, but many older machines use proprietary 7-pin cables that Sony no longer manufactures, although custom-made versions can be purchased from aftermarket suppliers like the wonderful Core Sound. These cables matter because, as I said, many old DAT tapes will only play back correctly on the machine on which they were recorded, often a portable Sony deck with only a 7-pin connector.
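To put "time-consuming" in rough numbers: a real-time transfer takes exactly as long as the tape plays, and the result is large. The Python sketch below is purely illustrative. It assumes uncompressed 16-bit stereo PCM at the two sampling rates we ran into (44.1kHz and 48kHz) and a hypothetical collection of 200 full 120-minute tapes; real tapes are often only partly recorded.

```python
# Rough numbers for real-time DAT transfers (illustrative only).
# Assumes uncompressed 16-bit stereo PCM; real tapes are often partly blank.

BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 2          # stereo


def pcm_bytes(sample_rate_hz: int, seconds: float) -> int:
    """Size in bytes of uncompressed 16-bit stereo PCM of the given length."""
    return int(sample_rate_hz * BYTES_PER_SAMPLE * CHANNELS * seconds)


def transfer_hours(minutes_per_tape: float, tapes: int) -> float:
    """Real-time transfer: copying a tape takes as long as playing it."""
    return minutes_per_tape * tapes / 60


if __name__ == "__main__":
    two_hours = 120 * 60  # a full 120-minute tape, in seconds
    for rate in (44_100, 48_000):
        print(f"{rate} Hz, 120-minute tape: {pcm_bytes(rate, two_hours) / 1e9:.2f} GB")
    # A hypothetical collection of 200 full 120-minute tapes:
    print(f"200 tapes: at least {transfer_hours(120, 200):.0f} hours at the deck")
```

Under these assumptions a full tape comes to about 1.27 GB at 44.1kHz or 1.38 GB at 48kHz, and 200 tapes mean at least 400 hours of someone sitting with the deck, before any re-tries on tapes that won't play.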
As an ethnomusicological archive, we expect to see collections of DAT tape coming into our archive for years to come, as scholars retire, move on to new projects, or die, and their personal collections are turned over to archives like ours for safekeeping and research use. Realizing this, two years ago I scrambled to buy one of the last new professional DAT decks still on the market, and I managed to get the last one from the last supplier who had them. I had our older pro DAT deck serviced and repaired, along with as many of our older portables as I could. I put most of these away for the future, and prayed I was doing enough to anticipate that future within our budget.

Earlier this summer, we had a chance to test the waters. One of our faculty members had an extensive collection of DATs documenting priceless, wonderful field recordings. But converting them has not been a simple process. Some were made on *data*-grade DAT tape, which is much thinner and longer than audio-grade tape. Some were made at a 44.1kHz sampling rate and some at 48kHz. Some tapes played in one of our two pro decks and some in the other, but often a given tape would not play on a given deck. All of them seem to play on the PCMM1 on which they were recorded; that is an aging deck, however, and its heads had been well used in making the original recordings. And our only option for digital I/O on that deck was a Core Sound 7-pin cable. The entire process was slow, complex, and nerve-racking, especially for the recordist. Ethnomusicologists spend years making one-of-a-kind recordings that can never be replaced. Watching those recordings teeter on the brink of unrecoverability is terrifying.

The moral of the story, on one level, is to get moving on DAT conversions, NOW. Time is the enemy here. Tapes and decks are degrading with time. Equipment and cables are getting harder to find.
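Once a tape has been captured over a digital connection, the samples still need a file format that says how to play them back. Raw .pcm data has no header, so its sampling rate, channel count, and sample width exist only in someone's notes; with 44.1kHz and 48kHz tapes in the same collection, that is easy to get wrong. A minimal sketch, assuming the capture software wrote headerless, little-endian, 16-bit stereo PCM (WAV's byte order), wraps such a file in a WAV container with Python's standard `wave` module. The function name and its defaults are illustrative, not part of any tool mentioned here.

```python
import wave
from pathlib import Path


def pcm_to_wav(pcm_path, wav_path, sample_rate=48_000, channels=2, sample_width=2):
    """Wrap a headerless PCM file in a WAV container; return the frame count.

    Raw PCM carries no header, so the sampling rate, channel count, and
    sample width (in bytes) must be supplied. The defaults (48 kHz, stereo,
    16-bit) are assumptions, not facts about any particular tape. WAV stores
    16-bit samples little-endian, so the input must already be in that order.
    """
    data = Path(pcm_path).read_bytes()
    frame_size = channels * sample_width  # bytes per sample frame
    if len(data) % frame_size:
        raise ValueError(
            f"{pcm_path}: {len(data)} bytes is not a whole number of "
            f"{frame_size}-byte frames; check channels and sample width"
        )
    with wave.open(str(wav_path), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(sample_rate)
        out.writeframes(data)  # the module fills in the header's length fields
    return len(data) // frame_size
```

For a 44.1kHz tape one would call `pcm_to_wav("tape01.pcm", "tape01.wav", sample_rate=44_100)` (hypothetical file names). A wrong rate doesn't damage the samples themselves, but the file will play back at the wrong speed and pitch, which is one more reason to record each tape's rate alongside its transfer.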
I can imagine a time in the future when a DAT archive will be impossible, or very, very expensive, to convert. But the bigger moral of the story is that formats change, and more quickly than ever. It is no longer possible to think of a field recording as consisting of its physical medium; the recording is a formatted digital file. Often these formats are proprietary, meaning they belong to some company. They can disappear or be withdrawn, as well as become obsolete. Format conversion now has to be a fundamental part of any archiving and backup strategy, and every serious recordist needs to stay constantly abreast of developments in digital media formats. I recommend immediately converting any field recordings to several of the most common digital formats: uncompressed PCM (as .pcm or .wav files), .aiff, and high-bitrate .mp3 (the last is compressed, so it is definitely not a primary backup format unless your original recordings were also compressed). Along with storing field recordings in multiple copies in multiple locations (long standard practice with analog archives), it is now necessary to back up recordings (video and audio alike, and photographs or text, for that matter) on multiple media formats and in multiple file formats: a master hard-drive copy, a backup hard-drive copy in a different location, an optical-media copy or two (CD preferred over DVD), and as many different file formats as you have room to store. With NAS and hard-drive storage now costing less than 25 cents a gigabyte, storage is the cheapest part of the equation, and it makes sense to keep adding digital real estate: our lab now has 14TB of hard-drive storage, up from 8TB a year ago and 4TB the year before that.

Now on to .dat

So, that's a parable of DAT audio tape. But what is a .dat? The extension doesn't name a single format; in the email world it usually marks a proprietary data file, the best-known example being the winmail.dat attachments that Outlook sometimes sends.
Here's a story to scare the living daylights out of anyone who relies on email as an archive of their life, as I do. If you're like me, you have thousands and thousands of vital documents in your email archive, going back years (in my case, to 1992): all your contacts and addresses and drafts of papers and pictures of nieces and nephews, and all the rest, safely buried in your inbox and saved message folders. For most of us, there's not much to worry about. Most modern email applications use (or can convert to) the now-standard Unix mbox format to store mailbox folders. But some of us may have something to worry about, and may not know it.

A few weeks ago, my brothers and I bought a new iMac for our mother's birthday. She had been a Windows PC user for years, and had been using Earthlink as her ISP and email provider. She had been using Earthlink's "Total Access" suite to manage her email, web browsing, etc. On a PC, that software suite uses a proprietary (and very weak) email application called "Earthlink Total Access Mailbox." On the Apple platform, however, Earthlink's "Total Access" suite does not include an email application; instead, it relies on Apple's proprietary (and excellent) Mail.app to manage email. What a relief . . . Not.

After setting up the iMac, I began to move my mother's files and settings from the PC. Moving her bookmarks was simple enough: I copied an HTML file from one machine to the other. Moving her address book was easy too: a simple export as .csv (comma-separated values) and a simple import into Apple's Address Book app. But then came the mail. Earthlink's "Total Access Mailbox" is a bad program, and badly hobbled. It allows a user to export archived mail in only two formats: .csv and .dat. In a .csv export, all attachments are lost, along with other important header data. And even if one settles for that, no other mail application I know of can simply "import" a .csv file.
One must open the .csv in a database or spreadsheet application like Excel, Access, or FileMaker and adjust the fields manually before making a series of further conversions. Even then, it's difficult. But Earthlink's .dat export is even more useless. While it preserves header information and attachments, it uses a proprietary compression scheme, and the result cannot be imported into any application other than a new installation of TA Mailbox, on a PC. I was astounded to discover this. I called Earthlink's tech support, and after talking to several "support" people in Bangalore or Chennai or wherever, I was finally connected (after hours of arguing) with their "Delta Force" tech support, who told me what I had begun to suspect: I was screwed. It was impossible to convert an Earthlink TA Mailbox .dat export to any format that could subsequently be imported into Apple's Mail.app, even using an intermediate email application to do the conversion, because you can't import Earthlink's .dat archive into ANY other email application. The best they could offer was to tell me I had to resend each email message (and mind you, we're talking thousands of messages) back to my mother's email account, to be redownloaded to the Mac via Mail.app. That's tens of hours of work. And in the end, this process would strip each message of its original header data: the sender's name, the time and date of the message, and the original subject line. Needless to say, I was angry, and I hung up on the "Delta Force" tech in shock. Holy cow. I googled. I posted questions on tech websites. I experimented with various combinations of applications. Nada. Zip. Nothing. I tried specialized applications like the marvelous Emailchemy, which can convert almost every known email format into any other. But not Earthlink's .dat format.
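The .csv route, for all its losses, is at least scriptable: the "series of further conversions" can end in the standard mbox format discussed earlier. A minimal sketch, assuming a UTF-8 .csv export with one row per message and columns named From, To, Subject, Date, and Body (hypothetical names; the real export's header row may differ), uses only Python's standard `csv`, `email`, and `mailbox` modules. It cannot restore attachments the export already dropped, and it does nothing for the .dat export.

```python
import csv
import mailbox
from email.message import EmailMessage

# Hypothetical mapping from mail header to .csv column name. The real
# export's column names should be checked against its header row.
HEADER_COLUMNS = {"From": "From", "To": "To", "Subject": "Subject", "Date": "Date"}
BODY_COLUMN = "Body"


def csv_to_mbox(csv_path, mbox_path):
    """Append one mbox message per .csv row; return the number written.

    Assumes a UTF-8 file and RFC 2822-style date strings. Empty cells are
    left out of the headers rather than written as blank values.
    """
    box = mailbox.mbox(mbox_path)  # creates the file if it doesn't exist
    box.lock()
    written = 0
    try:
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                msg = EmailMessage()
                for header, column in HEADER_COLUMNS.items():
                    value = (row.get(column) or "").strip()
                    if value:
                        msg[header] = value
                msg.set_content(row.get(BODY_COLUMN) or "")
                box.add(msg)
                written += 1
        box.flush()
    finally:
        box.close()  # close() also releases the lock
    return written
```

Whether a given mail program then imports the resulting file cleanly is worth checking with a handful of test messages before trusting it with a whole archive.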
In the end, I came upon a solution that sort of works (though I've seen no one else recommend it, and if you google this problem, you'll find that a lot of people have it). I remembered that AOL email supported IMAP, and discovered that Total Access Mailbox for the PC could connect to my AOL account's IMAP server. Cool. So I batch-transferred thousands of emails from TA Mailbox to my AOL account. When I'm done, it should be simple enough to use Apple's Mail.app to download these messages from the IMAP server (I use Mail.app to read my own AOL email as well, so I know this works). Granted, I will lose the original time/date data from the message headers by doing this, but at least the senders and subject lines will remain intact. Even so, there are problems. A few hundred of the several thousand messages I needed to move can't be transferred to AOL because of yet another format problem I won't go into. Those will have to be moved by hand, one at a time, with all header data lost.

Now, I'm a geek. I spend much of my time thinking about data compression and format conversion. I like problems like this, up to a point, and I can usually solve them. But my mom, like most computer users, is not a geek. I shudder to think of the millions of Earthlink customers who have email archived in TA Mailbox. Earthlink is not the world's healthiest company. Suppose it goes belly up the way SunRocket just did. Suddenly, thousands and thousands of people will have this problem, even if they stay on the PC platform. As it is, any Earthlink user who chooses to "switch" from Windows XP/Vista to the Apple experience will run into this problem if they were trusting enough to use Earthlink's awful, hobbled, incompatible Total Access Mailbox for the previous few years. Earthlink, it turns out, does not really support the Mac platform, all claims to the contrary aside. Who knew? Now you know.
Among other things, I'd like to warn anyone choosing a new ISP to avoid Earthlink, or, if you must use them as an ISP or DSL provider, to avoid using their Total Access suite and Mailbox apps on a PC. Use Thunderbird or some other POP-compatible mail reader so you don't have this problem in the future. (And be forewarned that Earthlink has awful customer phone support; most of the "techs" I talked to knew less than my mother about the software they were supposedly "supporting," and even less about the Mac platform.)

But beyond using this as an occasion to gripe about my experience with Earthlink, I want to recommend that everyone consider a robust email backup strategy that keeps email in more than one application, in more than one place. There are lots of ways to do this, but perhaps the simplest is to open a Gmail account (with 2GB of free storage, enough for most email users) and have *all* your incoming and outgoing mail forwarded to that account as a backup. Granted, there are privacy concerns with Google, and if they concern you (as perhaps they should), you can use another ISP or mail provider (even AOL now gives you a couple of gigs of storage, as does Yahoo), or simply use a second email application as your backup (Mozilla's Thunderbird is a good choice, since it's stable, simple, uses the standard mbox format, and is likely to be around for a long time given the success of the Mozilla projects). The limits of this last option are apparent: if you run it on the same machine as your primary email application, it is no safer in the event of a drive failure, so you need to back up both your primary and secondary mail archives regularly to an offsite medium, to removable optical disks, or to a backup hard drive that you keep far away from the primary machine except when backing up.
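A backup copy is only useful if it actually matches the original. One way to check that, sketched here as an assumption rather than as part of any product mentioned above, is to copy every file from a mail-archive folder to a backup folder and compare SHA-256 checksums of each original and its copy, using only Python's standard library. The folder layout and function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large mailboxes needn't fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_and_verify(source_dir, backup_dir):
    """Copy every file under source_dir into backup_dir, then re-hash both.

    Returns the relative paths whose backup copy does not match; an empty
    list means every copy verified.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    mismatches = []
    # Collect the file list first so the copying can't affect the walk.
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies contents plus timestamps
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(str(rel))
    return mismatches
```

Re-hashing right after copying mainly catches copy errors; running the same comparison against the backup months later is what would reveal a copy that has quietly changed. It is best run while the mail program is closed, since a mailbox that changes mid-copy will also show up as a mismatch.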
You may also need a way to save archives of sent mail, unless your ISP lets you keep copies on the server and download them to more than one application. Or you can use one of the exciting new personal data manager applications (I am currently evaluating an amazing one called DEVONthink Pro, which I'll write about here soon), which have some real advantages for managing increasingly huge and unruly collections of digital data. Or, finally, you can use the safest form of backup of all: print out hard copies of all priceless and irreplaceable email, text, and photographic documents, and store them somewhere safe from fire and water damage. In an emergency, such materials can always be scanned back into a digital format. But never trust a format not to change, to be universally readable forever, or to be convertible to another format. Whether you're a professional archivist dealing with esoteric audio materials or a casual computer user with a lot of email you can't afford to lose, backing up now also means converting formats, often. Please feel free to write with questions about this post, and I will try to post more detailed answers and descriptions here later.
Prof. Aaron Fox
Update 8/29/07: From The New York Times:
"ATLANTA, Aug. 28 (AP) — . . . Internet service provider [Earthlink] . . . said it would cut 900 jobs — about half its work force — and close four offices in an effort to reduce operating costs."