Monday, November 20, 2006

Apple's Backup 3 - Hopeless Junk

What good is backup software that can't perform a restore? That's what I've been wondering in the week since the main hard drive in my PowerMac G5 died. Having lost another drive earlier in the year for which I had no backup, I'd finally set up a proper backup system, using external SATA drives as the destinations and Apple's Backup version 3.1 (a part of the ".Mac" package) to perform nightly incremental backups. So my latest drive failure seemed like one failure too many for the year, but one for which I was prepared. Or so I thought until I tried to restore my files. Backup 3.1 would restore only a certain number of files from early in the backup set; any attempt to restore other files from the set ended in a crash. Specifically, Backup would very slowly consume about 2.1 GB of virtual memory and then crash, writing the following to "backup.log":

Backup(1104,0x1ace800) malloc: *** vm_allocate(size=1069056) failed (error code=3)
Backup(1104,0x1ace800) malloc: *** error: can't allocate region
Backup(1104,0x1ace800) malloc: *** set a breakpoint in szone_error to debug

As best I can tell, this means that on top of the 2.1 GB of virtual memory it used up (as observed in Activity Monitor), it also consumed a further 1.9 GB of combined real, private and shared memory, and then died because its 32-bit address space was exhausted. Brilliant. Thank you so much, Apple, for backup software that can back up, but not restore. That's innovative, alright, but not in a good way.
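
The arithmetic behind that guess is easy to check. A minimal sketch, assuming only that Backup runs as a 32-bit process (the helper name `address_space_gib` is hypothetical, purely for illustration): a 32-bit pointer can address 2^32 bytes, so the two figures above together come to roughly the whole space.

```shell
# Hypothetical helper: size, in GiB, of the address space reachable with
# pointers of the given width in bits (assumes 64-bit shell arithmetic).
address_space_gib() {
    echo $(( (1 << $1) / (1024 * 1024 * 1024) ))
}

# A 32-bit process tops out at 4 GiB, however much RAM or swap is free.
address_space_gib 32
```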

Needless to say, I submitted many crash reports, and a proper bug report, complete with a plea for help, but no help has been offered in the week since.

I have, however, salvaged a fair number of files by manually mounting the ".sparseimage" files hidden inside the backup "files", but with seventy incrementals since the last full backup, sifting through everything by hand is a major problem. And files with resource forks, along with executables, can't be salvaged that way at all. Oh, thank you, Apple.
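
Mounting the images can at least be scripted. The sketch below is an illustration built on assumptions, not something Backup provides: it assumes the incrementals live somewhere under a directory you pass in and that their names end in `.sparseimage`, and the function name `attach_incrementals` is hypothetical. It uses `hdiutil attach -readonly` so that nothing in the backup set gets modified while it is mounted.

```shell
# Hypothetical helper: attach every .sparseimage found under the given
# directory, read-only, so the contents can be browsed by hand.
attach_incrementals() {
    find "$1" -type f -name '*.sparseimage' -print |
    while IFS= read -r image; do
        # -readonly keeps the backup set untouched while it is mounted
        hdiutil attach -readonly "$image" < /dev/null
    done
}

# Example use (the path is a placeholder for wherever the backup set lives):
# attach_incrementals "/Volumes/BackupDrive"
```

This only saves the clicking; sorting out which incremental holds the newest copy of each file is still manual work.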

So, based on this experience, let me warn anyone else out there who's using Apple's Backup program: Don't. If your backup sets are small, it may well work for you, but once they grow large enough, you'll reach the same point I did, where it'll crash rather than restore your files. And you won't know that you've reached that point until it's too late.
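
One way to find out before it is too late is to do a trial restore to a scratch location and compare it with the live files. A minimal sketch, assuming you have already restored into a scratch directory with whatever backup tool you use; the function name `verify_restore` and both paths are placeholders, and `diff -r` compares file contents only, not resource forks or other metadata.

```shell
# Hypothetical helper: succeed only if the restored tree matches the source.
# diff -r walks both trees; -q names differing files without dumping them.
verify_restore() {
    if diff -rq "$1" "$2"; then
        echo "restore matches source"
    else
        echo "restore differs from source" >&2
        return 1
    fi
}

# Example use (both paths are placeholders):
# verify_restore "$HOME/Documents" "/Volumes/Scratch/Documents"
```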

S.M.A.R.T.?

As an aside, the drive that died (a 250 GB Maxtor that Apple sold me with the G5) had a controller that supports SMART (Self-Monitoring, Analysis and Reporting Technology), and I had installed the DiskWarrior extension that routinely checks the SMART status of the mounted drives and is supposed to display an alert when a drive reports that it is beginning to fail. No such warning appeared. This is the second SMART-equipped mechanism that I've lost (but the first since installing the DiskWarrior extension), and I haven't seen SMART do a thing. Has anyone out there actually seen SMART do its stuff?
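
For readers without DiskWarrior, a similar check can be approximated from the command line. This is a sketch, not DiskWarrior's method: it assumes `diskutil info` prints a line of the form `SMART Status: Verified`, and the function name `smart_status` is hypothetical.

```shell
# Hypothetical helper: extract the SMART status from `diskutil info` output
# read on standard input. Assumes a line such as "SMART Status: Verified".
smart_status() {
    awk -F': *' '/SMART Status/ { print $2 }'
}

# Example use on a Mac (commented out because diskutil is macOS-only):
# status=$(diskutil info / | smart_status)
# [ "$status" = "Verified" ] || echo "SMART status is: $status"
```

Run from a scheduled job, something like this would flag a status other than "Verified", though a drive that reports "Verified" can evidently still die without notice.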

Aperture Vaults

The one thing that did work well in this mess was the backup scheme ("vaults") that Apple's Aperture team integrated into that application. There was no problem restoring from the vault I'd created on one of my backup drives, and because support for vaults is well integrated into Aperture, I'd (1) actually set up a vault (the splash screen gently nags you to do so), and (2) kept it current. If any members of the Aperture team happen to read this, let me just say: Thank you. And please go beat the metaphorical crap out of whoever's responsible for the Backup application; they make everybody else at Apple look bad.

28 comments:

  1. I think it's bad when someone who works for .Mac QA tells you "Remember: It's called 'Backup', not 'Restore.'"

  2. Anonymous, 10:54 PM CST

    I have an old G3 iBook with a Toshiba drive; System Profiler, Disk Utility and SMARTReporter all say it is failing. Too bad it won't say why or when...

  3. Anonymous, 11:20 PM CST

    I've seen SMART work before. It can't always detect a crash before it comes, though. Basically, if your drive passes the SMART checks, you might be OK or you might not. If your drive fails the SMART checks, replace it immediately. SMART is meant to try and give you an early warning, but a passing grade doesn't mean there's nothing wrong.

  4. I've had two bad hard drives in client machines caught by SMART monitoring, so it's evidently not _totally_ ineffective.

  5. I have twice had laptop drives fail that SMART showed as failures. Unfortunately I did not have the DiskWarrior monitor (I wish I did) but experienced some strange behavior and went to Disk Util and saw the SMART status was failing. Each time I managed to salvage "some" data, but I have a feeling that an earlier alert would have been a big boon since they were both slow-death situations. In other words, the drive could retrieve most data but it was slow as molasses, with an eventual total failure. Maybe I should spring the cash for DiskWarrior. It just seems so expensive though. I wish they had a $20 rent option where I could download and use it for a week...

  6. I have actually seen S.M.A.R.T. status tell me something useful. The problem with it is that you have very little time from SMART failure to HD failure. In my case it was about 15 minutes from me seeing the warning (then copying a Home folder) to the HD never working again.

    Sorry about your misfortune. I hate Backup also. That's why I use Deja Vu.

  7. AFAIK, OS X doesn't have a GUI SMART monitor (no idea if there's a BSD util though). TechTool has one, but I'm not up to Intel, so I don't know if it's Universal or not. But FWIW, I once had a drive complain that it had detected it was about to fail - on a server running Windows.

  8. So sorry to hear about your backup troubles. I have tried several backup programs, and I quickly filed Backup in the junk for my part. The clear winner for me is SuperDuper, which you get from Shirtpocket Software (http://www.shirt-pocket.com/SuperDuper/). Brilliant software, does exactly what it should, and the files are saved in proper disk images. Try it, you won't be disappointed.

  9. Anonymous, 2:08 AM CST

    I use SuperDuper. I have a 250 GB internal drive on my Mac, and *three* external 250 GB drives. SuperDuper is dumb and simple: it just makes a bootable clone of my internal drive on each external. One is updated daily, one twice a week, one every few weeks (it's stored offsite). Other than the offsite one, it's all automated. I can verify the backup just by clicking on the drives on the desktop and looking at them, or by booting up from them.

    I can't go back and restore my data to an arbitrary date, and if something goes missing or bad and I don't notice it for three months, it's going to be written over and gone. But on the other hand, if my internal drive dies, I just boot from the external and I'm instantly in business, no restore, no installing apps, nothing.

  10. Have a look at SMARTReporter - I use it and have seen drive failures before they became critical. It sits in the menubar and is convenient.

    As mentioned above - SMART only looks at certain drive parameters (which WILL change over time as the drive wears out), and uses a combination of them to warn you if the drive is failing in a predictable way. This is useful, but not complete.

  11. diskutil info / gives you information on the root volume, diskutil info /Volumes/volumeName works for any other mounted volume.

    You can use grep or awk to filter the SMART status:

    diskutil info / | awk '$1 ~ "SMART" { print $3}'

  12. Anonymous, 4:43 AM CST

    There is in fact a nice, free smart status reporter for the Mac called SmartReporter:
    http://www.versiontracker.com/dyn/moreinfo/macosx/23232

    It didn't indicate the failure of my Powerbook drive beforehand, but as others have pointed out, that is probably not a sign of smart not working.

    Alex

  13. I've had two hard disks fail SMART, and both eventually died. One was obvious: loud clicking, several kernel panics. It lasted long enough to make a backup. The other just died unexpectedly: I saw the SMART status as failing in Disk Utility, shut down to boot into target disk mode and back up, and it wouldn't start up. The bad thing was, this was two weeks after the first one went and I hadn't got round to backing up.

    As for Apple Backup, it's widely regarded as the worst product they make. Try to use something else until Time Machine comes along.

  14. I've seen many SMART errors, and plenty which have saved a client's data, but never on a Mac. I guess Apple is just going to ignore the obvious shortcomings of Backup for now and devote all their attention to TimeMachine.

  15. For those who don't have DiskWarrior, try SMARTReporter, which is free.

    http://homepage.mac.com/julianmayer/

  16. SMARTReporter does provide a menu bar icon that shows different colors for "verified", "unknown", and "failing" states of your drive.

  17. Scott Frazer, 12:33 PM CST

    This truly sucks, but it does illustrate a really good point.

    Incremental backups shouldn't be used exclusively.

    At least once a week you should be doing a full backup, then do incrementals the rest of the time.

  18. I've finally weaned myself almost completely off of Backup, except for that "Personal Data & Settings" plan, which I have saving stuff to my iDisk.

    The mistake I made was to back my wife's iBook up to a LaCie FireWire drive without reformatting it from the factory-default FAT32 to HFS+, and as a result any backup over 4 GB (i.e., her iPhoto library) is impossible to mount; Backup simply fails with an "internal application error." I've written to Apple, filed bug reports, etc., and not gotten any help either. If anyone has a solution for that, by the way, I'd love to hear it. Apple should at the very least show a warning if someone attempts to use the software with a FAT32 volume.

  19. Anonymous, 5:00 PM CST

    You need to test the "restore" function of any backup system before you consider it installed.

    Buy a blank external firewire drive, restore to that, and then make sure you can boot from it. Then store that hard disk in another building, in case your house burns down, floods, or someone breaks in and steals your electronics.

  20. Anonymous, 6:03 PM CST

    I've used 'smartmontools' (http://smartmontools.sourceforge.net/) before to further diagnose drives beyond their basic SMART status.

    At one time I KNEW I had a hard drive problem because of a funny noise it would make now and then. The SMART status would always show as 'verified' in Disk Utility. After some searching around, I found smartmontools and gave it a try.

    It's a command line util that gives you all sorts of info about your drive. It also has a drive monitoring tool that can e-mail you if it detects an abnormality (which I was interested in since the machine in question was a home machine, and I wanted to know about any issues while I was at work).

    It has a learning curve, of course, but I found it to be fairly easy to use, and it reported problems that apparently weren't enough to trigger the SMART status to report anything other than 'verified'. It was enough evidence for me to look into a replacement drive.

  21. Anonymous, 9:06 PM CST

    I use and love Superduper.

    You have probably already tried it, but I have had some amazing success using dd to recover data from a dead disk.

    There is a good thread here:

    http://www.macosxhints.com/article.php?story=20050302225659382

    and I have had success with this:
    dd bs=512 if=/dev/rXX# of=/some_dir/foo.dmg conv=noerror,sync

    Use diskutil list to get the drive's device number; the drive should NOT be mounted when you run dd.

  22. smartmontools works extremely well. The trick is remembering that the smart status is worthless because drive vendors have chosen to have the drive status change from "OK" only when you've already lost data (I've heard that this reduces warranty claims for marginal disks which seems short-sighted but depressingly plausible).

    We use smartmontools on all of our *nix systems - since it's not made by a drive vendor smartd sensibly reports an error as soon as the drive reports its first physical failure (even if the failed sector was recovered successfully & remapped). Since using this we've managed to avoid any painful data loss even though a fair number of drives have died - in most cases I've been able to simply clone the failing system using rsync and boot the clone w/o incident.

  23. I had exactly the same experience with Backup - I could not restore files without a crash. Like you, I ended up going to the Backup stored files and restoring some things from there (which worked happily).

    I gave up on Backup and use SuperDuper to other drives, and also have mirrored external drives that I swap out once every other week or so.

  24. I've said it before and I'll say it again:
    There is absolutely no excuse for not using the best backup solution out there, namely rsync.
    Apple, as part of their campaign to destroy anyone's confidence in the mac platform, shipped a completely fscked up version with Tiger, but, thanks to the miracle of Open Source, you can easily acquire versions that work properly.

    Full details on how to use rsync to back up (and restore) either locally or over a network, along with helpful scripts that check for all the things that can go wrong and warn you, can be found here:
    http://name99.org/wiki99/index.php/Backup_Hardware

    Maynard Handley

  25. Anonymous, 6:25 AM CST

    I have a bank of hard drives in my PowerMac tower - a mix of SCSI and ATA. Disk Warrior, TechTool Pro and Disk Utility all consistently reported a S.M.A.R.T. failing-disk condition for one of the ATA drives. TechTool Pro indicated that this was a "threshold exceeded" problem. Nevertheless, the drive still seemed to perform well and all read/write functions performed correctly.

    Believing I was on a countdown, I successfully backed up all the data and ordered a replacement drive. I then decided to scrub the failing drive to prevent unauthorized data extraction once it had been dumped.

    Since wiping and reformatting the drive, I am now no longer getting the S.M.A.R.T. failing status report. I conclude that either (a) the original failure indicated too many failing blocks which reformatting has automatically excluded and the S.M.A.R.T. status has been reset, or (b) that the S.M.A.R.T. status can be adversely affected by software formatting issues, though this seems unlikely.

    Can anyone throw any light on this?

  26. You might want to monitor the console log for disk i/o errors (bad sectors) that might occur before SMART detects a problem (if at all).

  27. Anonymous, 11:29 AM CST

    Hi,

    and for those with musical talent the "Backup Song".

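
Comments 20 and 22 recommend smartmontools for looking past the overall pass/fail status. A rough sketch of that idea, assuming `smartctl -A` prints its usual attribute table with the attribute name in the second column and the raw value in the tenth; the function name `reallocated_sectors` is hypothetical, and the device path in the usage comment is a placeholder.

```shell
# Hypothetical helper: read `smartctl -A` output on standard input and print
# the raw Reallocated_Sector_Ct value (column 10 of the attribute table).
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Example use (commented out; needs smartmontools and a real device):
# count=$(smartctl -A /dev/disk0 | reallocated_sectors)
# [ "$count" -eq 0 ] || echo "drive has remapped $count sectors"
```

A non-zero count does not prove the drive is about to die, but it is the kind of early evidence that an overall "verified" status can hide.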