Two weeks ago, my Dad shipped me a box that, to my joy, contained the original Apple II Prince of Persia source code archive I’d stowed away 20 years ago and had given up for lost.
Despite my eagerness to see what’s on those disks, I’ve yet to pop them in a drive. As readers of this site have cautioned me, digital media degrade with age; if the disks are in a fragile state, normal handling could damage them further and even render them unreadable.
In today’s guest post, digital archivist Jason Scott explains why reading 20-year-old floppy disks is trickier than it sounds — and why he’s volunteered to fly from NY to LA on Monday with special equipment to tackle the job himself.
Monday will be an exciting day. Much like opening a long-sealed sarcophagus, I truly have no idea whether we’ll find what we’re hoping for, or just data dust. For anyone who wants to share the suspense, we’ll be live-tweeting our progress. Hashtag: #popsource. (I wanted to use #sourcecode, but it was taken!)
Meanwhile, here’s Jason’s story, offering a glimpse behind the scenes of a profession whose existence I couldn’t have foreseen or imagined when I was making Prince of Persia in the 1980s: Digital archeologist.
I first heard about Prince of Persia in a somewhat strange fashion; a high school friend said that David’s older brother was working on a new game to follow up his big hit Karateka. I asked what it was about, and he said it was something about Persian princes and acrobatics. I left it at that, but I knew it’d be great, if Karateka was any indication.
I went to Horace Greeley High School after Jordan, and knew his brother, David, who graduated the same year as me. David was the motion model for Prince of Persia. Jordan was this talented figure somewhere out in the fog of the real world, who was making actual, sold-everywhere games with a company I really liked and respected (Broderbund), and was basically living the dream I hoped to live one day: game developer.
(My own dream was fulfilled: I did work for a short time at Psygnosis, makers of Wipeout, as a tech support phone monkey, and did another year-long stint at a startup game studio, before moving on to other places in the computer world.)
It wasn’t until a couple years ago that I moved away from jobs like system administration and backup-watcher into the world of computer history and documentary filmmaking, where I am now. As one of the adjunct archivists of the Internet Archive, I seek out new collections of data and help preserve current ones: anything from digitized books and audio to long-forgotten shareware CD-ROMs and obscure information files uploaded years ago. It’s a great time, and most importantly, it affords me the flexibility to travel when I’m needed somewhere.
So this was why, when Jordan announced he’d gotten back the Prince of Persia disks he had in his own collection, a lot of friends of mine started linking me to the article and saying “Well?” It was a perfect fit. I had seen Jordan for a few moments after his recent appearance at GDC, so it made sense to have us talk about my coming in to oversee the retrieval of data from the disks. What a nice journey — from hearing the game was being worked on in my youth to helping make sure Jordan’s work lasts for future generations!
Pulling data off dead media in the present day is both easier than it ever has been, and as frustrating as ever. (When I say “dead,” I mean the format. You can’t really go down to the local store and buy a box of 5.25″ floppy disks any more, nor would you want to, since a USB stick will give you well over a million times the space and cost you almost nothing.) Thanks to a lot of work by a lot of different people, pulling the data off these floppies can now be as simple as putting them into a vintage disk drive, or a modified recent one, and pulling the individual sectors right into a file that can go onto the internet in seconds. But as trivial as that can be, any clever tricks done to the floppy that made sense way back then could make it a puzzle wrapped in a goose chase to extract. Not to mention, these disks are old (in this case, at least twenty years old), and they’re just magnetic flaps of plastic sealed inside a couple of other sheets of plastic. A lot can go wrong, and no extraction is guaranteed.
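For the curious, here is a minimal sketch of what “pulling the individual sectors right into a file” amounts to for an unprotected disk. It assumes the standard Apple II 5.25″ geometry under DOS 3.3 (35 tracks, 16 sectors per track, 256 bytes per sector) and a flat image with sectors stored in track order. The function names are hypothetical, and real imaging tools also have to handle sector interleaving and read errors, which this ignores.

```python
# Assumed geometry: standard Apple II 5.25" disk formatted for DOS 3.3.
TRACKS = 35
SECTORS_PER_TRACK = 16
BYTES_PER_SECTOR = 256
IMAGE_SIZE = TRACKS * SECTORS_PER_TRACK * BYTES_PER_SECTOR  # 143,360 bytes (140 KB)


def sector_offset(track, sector):
    """Return the byte offset of a sector inside a flat, track-ordered image."""
    if not 0 <= track < TRACKS:
        raise ValueError(f"track {track} out of range")
    if not 0 <= sector < SECTORS_PER_TRACK:
        raise ValueError(f"sector {sector} out of range")
    return (track * SECTORS_PER_TRACK + sector) * BYTES_PER_SECTOR


def read_sector(image, track, sector):
    """Slice one 256-byte sector out of an in-memory disk image."""
    if len(image) != IMAGE_SIZE:
        raise ValueError("unexpected image size for a 35-track, 16-sector disk")
    start = sector_offset(track, sector)
    return image[start:start + BYTES_PER_SECTOR]


if __name__ == "__main__":
    blank = bytes(IMAGE_SIZE)  # stand-in for a real image file's contents
    print(IMAGE_SIZE, sector_offset(17, 0), len(read_sector(blank, 17, 0)))
```

The point is only that a straight read produces a small, regular file; copy-protected disks break exactly these assumptions about where sectors live.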
It’s the Friday before I hop onto a plane from NY (ironically, just miles from where Jordan’s disks had rested comfortably in the back of a closet for 20 years) to Los Angeles, where he works and lives these days. Once I arrive there, I’ll be joined at the site by someone I reached out and tapped due to his reputation within and outside the Apple II community: Tony Diaz. He’s one of a tireless group of vintage hardware and software collectors working to ensure an entire swath of computing history isn’t lost to the shadows. His collection of Apple-related hardware is likely one of the largest in the world, and he has made a point of cataloging and documenting as much of it as possible, so I knew Tony would be the best partner in this project. Tony will be bringing over a pile of Apple II hardware, maintained and cleaned, ready to take these vintage floppies in.
However, not all of these disks are off-the-shelf in terms of their formats. Since Jordan did work with a commercial game company, and because there were attempts to prevent wholesale duplication of these for-sale games at the time, some of these floppies have various levels of “copy protection” on them: modifications in how the data is written, in-code checks to analyze the floppy disk’s state and run or not run based on the result, and so on. I’m not here to start a debate on whether this was the right or wrong move at the time; there’s plenty of screen space spent on that discussion elsewhere. But it does translate to a headache for the present day when a straight disk read doesn’t just “work.”
Enter pieces of hardware such as the DiscFerret, CatWeasel, and Kryoflux, all of them modern hardware dedicated to pulling magnetic readings of the floppy disks, eschewing any cares about operating system, structure and copy protection. Think of them as taking a magnetic photograph of the disk. There’s quite a bit of science involved and a lot of debate on what the best approach is for getting the data, but on the whole, the principle is the same: make a floppy drive read the magnetic flux of the floppy, not unlike how a medical scanner approaches the human body, and from that “image,” pull out what the data setup is on the floppy. The resulting magnetic image is huge, size-wise, relative to the original floppies: each of these 140k (that’s kilobytes) floppies will produce a multiple-megabyte magnetic read. But we’re in the space-car future; that mass of data is nothing to us now.
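To give a rough feel for that size gap, here is a back-of-the-envelope sketch. It assumes a drive spinning at about 300 RPM with 4-microsecond bit cells (typical figures for Apple II 5.25″ drives), and then makes two purely illustrative assumptions: that a capture keeps five revolutions of every track, and that it spends one byte per bit cell. That is an upper-bound guess for the sake of arithmetic, not a description of how the DiscFerret, CatWeasel, or Kryoflux actually store their readings.

```python
def sector_image_bytes(tracks=35, sectors=16, sector_size=256):
    """Size of a plain sector-by-sector image of a standard 16-sector disk."""
    return tracks * sectors * sector_size


def flux_capture_bytes(tracks=35, rpm=300, bit_cell_us=4,
                       revolutions=5, bytes_per_sample=1):
    """Rough upper bound on a raw flux capture.

    revolutions and bytes_per_sample are illustrative assumptions,
    not the parameters of any real capture device.
    """
    revolution_us = 60_000_000 // rpm               # time for one rotation, in microseconds
    cells_per_revolution = revolution_us // bit_cell_us
    return tracks * revolutions * cells_per_revolution * bytes_per_sample


if __name__ == "__main__":
    sectors = sector_image_bytes()
    flux = flux_capture_bytes()
    print(f"sector image: {sectors:,} bytes")
    print(f"flux capture: {flux:,} bytes (about {flux / sectors:.0f}x larger)")
```

Under these assumptions a 140k disk turns into several megabytes of raw readings, which is in line with the “multiple-megabyte” figure above.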
This week, the DiscFerret team has been working overtime, pulling some all-nighters to test and fabricate a hardware setup to do the magnetic readings, and that machinery was packed and FedExed to Jordan yesterday. The in-depth details of what hurdles have to be taken into account with some floppy drive hardware are outside the scope of this already-long post, but rest assured, there are hurdles, and success is not guaranteed.
And let’s make that clear — we have no idea what’s on these floppies! When we bring them in, they could be completely empty (although that is really, really, really unlikely). Factors from quality of manufacture to storage method to phase of the moon could lead to there being lost data. But be assured we’re going in with the respect these artifacts deserve.
See everyone in La-La land!