Since I rebuilt my systems with FreeBSD 8.1, I’ve been hounded by an error message during weekly level 0 dumps. It only happens on my /home partition, which is significantly larger than all the others combined, and only during the full weekly backups. The daily level 1 backups all work flawlessly. Given what I’ve learned, I suspect that’s just because the level 1 backups finish too quickly to trigger it…
The Problem
The error message, “(da0:umass-sim0:0:0:0): AutoSense failed”, is followed by a slew of write errors:
(da0:umass-sim0:0:0:0): AutoSense failed
g_vfs_done():da0s1[WRITE(offset=19495206912, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495337984, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495469056, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495600128, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495731200, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495862272, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495993344, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19496124416, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495075840, length=131072)]error = 5
(da0:umass-sim0:0:0:0): lost device
(da0:umass-sim0:0:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0
(da0:umass-sim0:0:0:0): removing device entry
/backup: got error 6 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error
cpuid = 1
Uptime 2h25m37s
Cannot dump, Device not defined or available
Automatic reboot in 15 seconds - press a key on the console to abort
… a kernel panic, and a dead system. The keyboard doesn’t respond, so it just sits there until the machine is hard reset manually.
Bad drive? Possibly. But that would make two in a row, so I’m leaning more towards something system-related rather than drive-related.
That led me to this post on the FreeBSD forums, and that then led me to this post elsewhere on the googlewebs. The latter indicates a difference in the way soft-updates are handled in 8.x vs. 7.x.
A Solution
So… I turned off soft-updates with:
tunefs -n disable /dev/da0s1
Trying the same command with the drive still mounted threw the error:
tunefs: /dev/da0s1: failed to write superblock
I knew it wouldn’t work; I just wanted to see what exactly would happen.
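For reference, here is a minimal sketch of the whole sequence as a shell function. It assumes the backup drive is /dev/da0s1 mounted on /backup (both names come from the kernel log above) and that it runs as root; the function name is made up for illustration. The extra `tunefs -p` call only prints the current tunables so the change can be confirmed before remounting.

```shell
# Device and mount point as they appear in the log; adjust for your system.
DEV=/dev/da0s1
MNT=/backup

# Hypothetical helper: unmount, disable soft-updates, show the result, remount.
# Each step only runs if the previous one succeeded.
disable_softupdates() {
    umount "$MNT" &&
    tunefs -n disable "$DEV" &&
    tunefs -p "$DEV" &&
    mount "$DEV" "$MNT"
}
```

Calling `disable_softupdates` with nothing else using the drive should leave it mounted again with soft-updates off.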
My only question, which the posts I found did not answer, was whether to turn soft-updates off on the source /home partition or on the target USB backup drive. I opted for the target, given that it’s only used for backups rather than day-to-day I/O, and that it’s quicker and easier than rebooting into single-user mode to disable soft-updates on my /home partition. So I tuned the drive, crossed my fingers and launched the backup process again.
The result:
...
DUMP: 30.73% done, finished in 2:26 at Sun Feb 6 13:45:28 2011
DUMP: 33.06% done, finished in 2:21 at Sun Feb 6 13:45:39 2011
(da0:umass-sim0:0:0:0): AutoSense failed
g_vfs_done():da0s1[WRITE(offset=87491444736, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491575808, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491706880, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491837952, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491969024, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87492100096, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87492231168, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87492362240, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=114688, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87096983552, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87289675776, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87482368000, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491313664, length=131072)]error = 5
DUMP: 35.36% done, finished in 2:17 at Sun Feb 6 13:46:03 2011
DUMP: 37.73% done, finished in 2:12 at Sun Feb 6 13:46:00 2011
...
DUMP: 96.69% done, finished in 0:07 at Sun Feb 6 13:45:57 2011
DUMP: 99.17% done, finished in 0:01 at Sun Feb 6 13:45:42 2011
DUMP: DUMP: 39140287 tape blocks
DUMP: finished in 12691 seconds, throughput 3084KBytes/sec
DUMP: level 0 dump on Sun Feb 6 10:12:12 2011
DUMP: DUMP IS DONE
No kernel panic. No hard reset required. It just picks up where it left off and goes along its merry way. I’m fairly confident that file integrity is being maintained, but I’ll be testing that to be sure.
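Short of a real restore test, one cheap sanity check is that the summary figures agree with each other. dump counts 1 KB “tape blocks”, so dividing the block count by the elapsed seconds should reproduce the reported throughput. The sketch below uses a made-up helper name and the two figures copied from the summary lines above; it comes out at 3084, matching the log. It says nothing about the contents of the dump, only that the totals are internally consistent.

```shell
# dump(8) reports 1 KB tape blocks, so blocks / seconds gives KB/s
# (integer division, as in dump's own summary line).
dump_throughput() {
    blocks=$1
    seconds=$2
    echo $(( blocks / seconds ))
}

# Figures copied from the "tape blocks" and "finished in" lines of the run above.
dump_throughput 39140287 12691
```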
Other than the fact that soft-updates prevented the system from recovering from the loss of the USB drive, I’m not sure exactly what the problem is. Why was the USB drive lost to begin with? Is it a timeout issue? An I/O issue related to too much data in the pipe? A RAM issue? I have two more GB of RAM to install, but I’ve been waiting to get more duration data to compare against before installing it.
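One thing the logs do show is where on the slice the writes were failing. The sketch below (helper name made up for illustration) reads kernel log text on stdin, counts the g_vfs_done() WRITE errors and reports the highest byte offset in GiB. Fed the two logs above, it puts the failures at roughly 18 GiB into the slice in the first and roughly 81 GiB in the second, so they were not hitting the same spot. That doesn’t answer any of the questions above, but it is a data point.

```shell
# Summarize g_vfs_done() WRITE errors from kernel log text on stdin.
summarize_write_errors() {
    awk '
        match($0, /WRITE\(offset=[0-9]+/) {
            # Skip the 13 characters of "WRITE(offset=" to get the number.
            off = substr($0, RSTART + 13, RLENGTH - 13) + 0
            if (n == 0 || off > max) max = off
            n++
        }
        END {
            if (n > 0)
                printf "%d failed writes, highest offset %.2f GiB\n", n, max / 1073741824
        }
    '
}
```

For example, `dmesg | summarize_write_errors` would summarize whatever is still in the kernel message buffer.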
For now, though I’m going to watch it closely, I’m considering the issue tentatively solved.