Update to Malware Protecting Script

In an attempt to be a little more friendly in terms of bandwidth to the strapped folk over at Malware Domains, I’ve retooled the script I wrote about in the post To Be Protecting Against the Malware.

I’ve added some lines to take advantage of remote zipped files (.zip), which will help them by reducing the number of bits we’re pulling from them.

I’ve added some lines to copy the downloaded malware zones file to other servers behind my firewall, which will help them by not making individual connections from each server to pull the files. I just set up a cron job on each internal “slave” server to bounce named every morning, timed to run after this process is complete.

Here’s the updated code. It is, as is my wont, rather verbose. It is considerably more verbose than other examples out there that take care of this same problem, but as I said, such is my wont.

The URLS array is filled with fake hosts right now b/c the zipped format is still in testing. When the folk at malwaredomains.com think it’s ready for public consumption, I’ll put the real hosts back in.

Also, it’s relatively untested, and I expect to be tweaking it. Use at your own risk.

On the “master” server, I’m using this…

#!/usr/local/bin/bash

# To know where script is running
HOSTNAME=$( hostname )

# To put file where named can see it
BLACKHOLEDIR=/var/named/etc/namedb/blackhole

# To name file so we know what named seeing
TMPZONEFILE=tmp.malwaredomains.zones
ZONEFILE=malwaredomains.zones
ZONEFILEBACKUP=malwaredomains.zones.bak

# To get updated file from remote server
URLGRABBER=/usr/local/bin/curl
USERAGENT="Malware Domain Grabber ( ${HOSTNAME}; unix; BASH )/0.1"

# To keep quiet while am getting file
URLGRABBEROPTS="-s -f"

# To know where file is hosted
URLS=( host1 host2 host3 host4 )
TIMESTAMPFILE=timestamp

# To know how to decompress the file
UNZIPCMD=/usr/local/bin/unzip
UNZIPOPTS="-o -qq"

# To copy files to other servers so that we are only
# pulling the files once, though we have multiple
# DNS servers in house

HOSTS=( host1 host2 host3 host4 )

# MOUNTCMD: The mount command
MOUNTCMD=/sbin/mount
UMOUNTCMD=/sbin/umount

# FSTYPE: The filesystem type of the mounted partition
FSTYPE=nfs

# MOUNTDIR: The directory that the dumps will be written to
MOUNTDIR=/mnt

# To control bind
NAMEDCMD="/usr/sbin/rndc reload"

#==============================================================

# Get start time so we can know how long this thing runs
START=$( date +%s )

# Make our working directory the location of the blackhole files
cd ${BLACKHOLEDIR} || exit 1

# Copy the current timestamp file to ${TIMESTAMPFILE}.old so we can
# make a comparison between what we have and what's out there now.
if [ -f ${BLACKHOLEDIR}/${TIMESTAMPFILE} ]; then
	cp ${BLACKHOLEDIR}/${TIMESTAMPFILE} ${BLACKHOLEDIR}/${TIMESTAMPFILE}.old
fi

# Attempt to download the timestamp file and zone file from each mirror.
# Break out of the loop at the first successful download of a zone file,
# otherwise, try each one in turn

# Assume there are no updates available
NEW=0

for URL in "${URLS[@]}"; do
	echo "Attempting to download from ${URL}"
	echo " Checking timestamps..."
	${URLGRABBER} ${URLGRABBEROPTS} -A "${USERAGENT}" -o ${BLACKHOLEDIR}/${TIMESTAMPFILE}.zip ${URL}/${TIMESTAMPFILE}.zip
	# Save the exit code now; the test below would otherwise overwrite $?
	RC=$?

	if [ ${RC} -ne 0 ]; then
	echo "  ... timestamp download from ${URL} failed! Code: ${RC}"
	# Move on to next URL so we keep the timestamp/zonefile pair intact
	continue
	else
	if [ -f ${BLACKHOLEDIR}/${TIMESTAMPFILE} ]; then
		# Unzip the new timestamp file over the old one
		${UNZIPCMD} ${UNZIPOPTS} ${BLACKHOLEDIR}/${TIMESTAMPFILE}.zip

		# Do a little cleanup
		rm -f ${BLACKHOLEDIR}/${TIMESTAMPFILE}.zip

		OLDTIMESTAMP=$( cat ${BLACKHOLEDIR}/${TIMESTAMPFILE}.old )
		NEWTIMESTAMP=$( cat ${BLACKHOLEDIR}/${TIMESTAMPFILE} )

		if [ ${OLDTIMESTAMP} -ge ${NEWTIMESTAMP} ]; then
			echo " ... no new updates."
			# No new updates on this server... but how well are the various mirrors
			# kept in sync?  Let's try the others. This is a tiny transfer, and it's
			# only once a day, so it's pretty cheap.
			continue
		fi
	else
		# Timestamp file does not exist. Create it.
		${UNZIPCMD} ${UNZIPOPTS} ${BLACKHOLEDIR}/${TIMESTAMPFILE}.zip
		rm ${BLACKHOLEDIR}/${TIMESTAMPFILE}.zip
	fi
	fi

	# Backup and copy file to final location for named to find
	# (via "include" directive in named.conf)
	echo "Backing up zone file"
	cp ${BLACKHOLEDIR}/${ZONEFILE} ${BLACKHOLEDIR}/${ZONEFILEBACKUP}

	echo "Retrieving new zone file from ${URL}..."
	${URLGRABBER} ${URLGRABBEROPTS} -o ${BLACKHOLEDIR}/${ZONEFILE}.zip ${URL}/${ZONEFILE}.zip
	# Save the exit code now; the test below would otherwise overwrite $?
	RC=$?

	if [ ${RC} -ne 0 ]; then
		echo "  ... zonefile download from ${URL} failed!  Code: ${RC}"
		# Oops.  Try the next server.  If this is the last, then ${NEW} is still
		# set to 0, and we'll be done. Better luck tomorrow...
		continue
	else
		# We have a new timestamp, and were able to download the zone file from
		# the same server we downloaded the timestamp from.  Set ${NEW} to 1 and
		# get out of the loop. No need to check further.

		echo "Unzipping new zone file..."
		if [ -f ${ZONEFILE}.zip ]; then
			${UNZIPCMD} ${UNZIPOPTS} ${BLACKHOLEDIR}/${ZONEFILE}.zip
			rm ${ZONEFILE}.zip
			# Rename the zone file temporarily to allow sed to work on it later,
			# and in that process, rename it back to the name that named knows.
			mv ${ZONEFILE} ${TMPZONEFILE}
		else
			echo "No new zone file..."
			exit
		fi

		NEW=1
		break
	fi
done

# If ${NEW} hasn't been set, then we either error'd out of all servers, or there are no
# new files. Either way, we're done.
if [ ${NEW} == 0 ]; then
	exit 1
else
	# Disable name checking for only those domains with underscores,
	# so we don't have to turn off name checking globally.

	SEARCH='_'
	FIND='blockeddomain.hosts";};'
	REPLACE='blockeddomain.hosts"; check-names ignore;};'

	# Get a count of the zones from the last update
	OLDZONECOUNT=$( cat ${BLACKHOLEDIR}/${ZONEFILEBACKUP}|grep "^zone"|wc -l )

	echo "Disabling checking on domains with underscores"
	sed "/${SEARCH}/ s/${FIND}/${REPLACE}/g" ${BLACKHOLEDIR}/${TMPZONEFILE} > ${BLACKHOLEDIR}/${ZONEFILE}
	rm -f ${BLACKHOLEDIR}/${TMPZONEFILE}

	# Get a count of the zones from the current update
	NEWZONECOUNT=$( cat ${BLACKHOLEDIR}/${ZONEFILE}|grep "^zone"|wc -l )
	echo "${OLDZONECOUNT} Previous Zones"
	echo "${NEWZONECOUNT} Current Zones"

	echo "Reloading named"
	${NAMEDCMD}

	if [ $? -ne 0 ]; then
		echo "  ... failed! Restoring zone file"
		cp ${BLACKHOLEDIR}/${ZONEFILEBACKUP} ${BLACKHOLEDIR}/${ZONEFILE}

		echo "Reloading old zones in named"
		${NAMEDCMD}

		if [ $? -ne 0 ]; then
			echo "  ... failed again!! You'll want to see to that."
		fi
	fi

	echo "Copying files to other internal network servers..."

	for HOST in "${HOSTS[@]}"; do
	DUMPDEVICE=${HOST}:${BLACKHOLEDIR}
	MOUNTRESULTS=$( ${MOUNTCMD} | grep "${DUMPDEVICE} on ${MOUNTDIR}" )

	if [ "${MOUNTRESULTS}" == "" ]; then
		echo ""
		echo "Mounting ${DUMPDEVICE} on ${MOUNTDIR}"
		${MOUNTCMD} -t ${FSTYPE} ${DUMPDEVICE} ${MOUNTDIR}
		if [ $? -ne 0 ]; then
			echo " ... failed. Files will not be copied."
			continue
		else
			echo " ... succeeded"
		fi
	else
		echo "${HOSTNAME}:${DUMPDEVICE} already mounted on ${MOUNTDIR}"
	fi

	# Copy the files to ${MOUNTDIR} as a temporary file. On the remote server,
	# we'll manage bouncing named if necessary.
	echo ""
	echo "Copying ${BLACKHOLEDIR}/${ZONEFILE} to ${TMPZONEFILE}"
	cp ${BLACKHOLEDIR}/${ZONEFILE} ${MOUNTDIR}/${TMPZONEFILE}
	if [ $? -ne 0 ]; then
		echo "... Failed to copy ${ZONEFILE}! You might want to see to that."
	fi
	# Umount the backup filesystem
	echo ""
	echo "Unmounting ${MOUNTDIR}"
	${UMOUNTCMD} ${MOUNTDIR}
	if [ $? -ne 0 ]; then
		echo " ... failed. You might want to see to that."
	else
		echo " ... succeeded"
	fi
	done

	END=$( date +%s )
	RUNTIME=$(( ${END} - ${START} ))
	H=$(( ${RUNTIME}/3600 ))
	M=$(( ( ${RUNTIME}/60 ) % 60 ))
	S=$(( ${RUNTIME} % 60 ))

	echo "Malware zonefile download on ${HOSTNAME} complete in"
	echo "${H} hrs, ${M} mins and ${S} secs (${RUNTIME} secs)"

	exit
fi
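
If the mirror loop above is hard to follow through all the nesting, here is the same idea boiled down to a sketch: walk the mirrors in order, skip any that fail or that only have a stale timestamp, and stop at the first one with something newer. This is not the script I run. The function names are made up for illustration, and it assumes the mirror serves a plain-text timestamp file rather than the zipped one the real script pulls.

```shell
#!/bin/sh
# Sketch only: fetch_file, is_newer and pick_mirror are hypothetical names,
# and the timestamp is assumed to be a plain-text number, not a .zip.

# fetch_file URL DEST -> exit 0 on a successful download
fetch_file() {
	curl -s -f -o "$2" "$1"
}

# is_newer OLD NEW -> true when NEW is strictly larger than OLD
is_newer() {
	[ "$2" -gt "$1" ]
}

# pick_mirror OLDSTAMP MIRROR... -> prints the first mirror with a newer
# timestamp; exits non-zero if every mirror failed or was stale
pick_mirror() (
	old=$1; shift
	tmp=$(mktemp) || exit 1
	found=1
	for url in "$@"; do
		# A failed download just means "try the next mirror"
		fetch_file "${url}/timestamp" "$tmp" || continue
		if is_newer "$old" "$(cat "$tmp")"; then
			echo "$url"
			found=0
			break
		fi
	done
	rm -f "$tmp"
	exit "$found"
)
```

Calling it would look something like `MIRROR=$( pick_mirror "${OLDTIMESTAMP}" "${URLS[@]}" ) || exit 1`, after which the zone file gets pulled from `${MIRROR}` only.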

On the “slave” servers, I’m using this…

#!/usr/local/bin/bash

# To put file where named can see it
BLACKHOLEDIR=/var/named/etc/namedb/blackhole
ZONEFILE=malwaredomains.zones
ZONEFILEBACKUP=malwaredomains.zones.bak
TMPZONEFILE=tmp.malwaredomains.zones

# To control bind
NAMEDCMD="/usr/sbin/rndc reload"

if [ -f ${BLACKHOLEDIR}/${TMPZONEFILE} ]; then
	echo "New zone file exists..."
	# Rename the zone file to back it up
	echo "Backing up current zone file."
	mv ${BLACKHOLEDIR}/${ZONEFILE} ${BLACKHOLEDIR}/${ZONEFILEBACKUP}
	# Rename the tmp file to the name the daemon can find
	echo "Replacing it with the new zone file and removing the temp file."
	mv ${BLACKHOLEDIR}/${TMPZONEFILE} ${BLACKHOLEDIR}/${ZONEFILE}

	# Reload named.
	${NAMEDCMD}

	if [ $? -ne 0 ]; then
		echo "    ... failed! Restoring zone file"
		cp ${BLACKHOLEDIR}/${ZONEFILEBACKUP} ${BLACKHOLEDIR}/${ZONEFILE}

		echo "Reloading old zones in named"
		${NAMEDCMD}

		if [ $? -ne 0 ]; then
				echo "    ... failed again!! You'll want to see to that."
		fi
	fi
else
	echo "No update.  Quitting..."
fi

To Be Protecting Against The Malware

Last night, my wife called me into the office with an alarming “It says it’s infected with malware!” Needless to say (and yet I’m going to say it anyway) I hurried into the room to see what the hullabaloo was all about.

Sure enough, there was a window exclaiming the existence of not one or two, but quite a few malware infections.

It fooled her, and damn if that stupid pop-up didn’t nearly fool me too! Truth be told, it did, if only for a second. Those malware-serving fake malware warning pop-ups are clever.

It got me to thinking.

Then Osama bin Laden was shot in the head, and malware peddlers started leveraging our insatiable appetite for news about it (the sick bastards).

That got me thinking more.

It reminded me of the malware peddlers that took advantage of the quake in Japan recently. Now those are some seriously sick bastards.

Those events all in quick succession and all that thinking led me to this.

A little ditty that downloads the bind formatted zone file from MalwareDomains.com, moves it to where Named can see it, and reloads Named zone files if the download is complete. I’d verify the file if they provided an md5 of the zones file. But they don’t. Not that I could find, anyway.
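
For what it’s worth, if a checksum ever does show up alongside the zones file, the check itself would be tiny. Here is a sketch of what I have in mind. The function names are mine, and where the expected digest comes from (a published .md5 file, say) is purely an assumption, since I couldn’t find one.

```shell
#!/bin/sh
# Sketch only: assumes an expected MD5 digest is available from somewhere.

# file_md5 FILE -> prints the hex digest using whichever tool is installed
file_md5() {
	if command -v md5sum >/dev/null 2>&1; then
		md5sum "$1" | awk '{ print $1 }'   # Linux (coreutils)
	else
		md5 -q "$1"                        # FreeBSD / macOS
	fi
}

# verify_md5 FILE EXPECTED -> exit 0 only when the digests match
verify_md5() {
	[ "$(file_md5 "$1")" = "$2" ]
}
```

In the script it would sit between the download and the copy, along the lines of `verify_md5 "${ZONEFILE}" "${EXPECTED}" || exit 1`, so a truncated or tampered file never reaches named.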

I don’t even begin to hope to eliminate the risk of malware infected sites, but I think this is a positive step towards cutting off malware source domains which might, in turn, help against sites on legitimate domains that happen to be infected. As of today, May 3rd, 2011, there are nearly 10,000 domains in the latest file. That has to be nearly all of them.

Right?

I’ll try it out for a while and see what happens.

BTW, this only works if you’re running your own DNS. If not, you’re at the mercy of your ISP or whatever DNS you choose to use. There are plenty of options out there, and they’re not all horrible.

First, the script, which pulls down the latest malware domains zones file from malwaredomains.com, fixes some problems with underscores in the subdomains, copies the fixed zones file to the named chroot, and reloads the named configs.

#!/usr/local/bin/bash

# To know where script is running
HOSTNAME=$( hostname )

# To put file where named can see it
NAMEDDIR=/var/named/etc/namedb

# To name file so we know what named seeing
ZONEFILE=malwaredomains.zones

# To have a file for sed to work on
TMPZONEFILE=tmp.malwaredomains.zones

# To get updated file from remote server
URLGRABBER=/usr/local/bin/curl

# To keep quiet while am getting file
URLGRABBEROPTS="-s -S"

# To know where file is hosted
#URL=http://www.malwaredomains.com/files/spywaredomains.zones
URL=http://mirror1.malwaredomains.com/files/malwaredomains.zones

# To control bind
NAMEDCMD="/usr/sbin/rndc reload"

#==============================================================

# Get start time so we can know how long
START=$( date +%s )

# Get directory we're running from
SCRIPTDIR=$( dirname $0 )

cd ${SCRIPTDIR}
if [ $? -ne 0 ]; then
    echo "ERROR: Unable to cd to ${SCRIPTDIR}! AbOrTinG!!"
    exit 1
fi

# If we were executed like "./whatever.sh" - set SCRIPTDIR to the pwd
if [ "${SCRIPTDIR}" == "." ]; then
    SCRIPTDIR=$( pwd )
fi

echo "Script is running from ${SCRIPTDIR}"

# Download the zones file in bind format to a temporary location.
# We don't want to overwrite what we already have until we're sure
# the download worked

echo "Downloading file from ${URL}"
${URLGRABBER} ${URLGRABBEROPTS} -o ${SCRIPTDIR}/${TMPZONEFILE} ${URL}
# Save the exit code now; the test below would otherwise overwrite $?
RC=$?

# Check for errors.  If the file downloaded, then move on, but if not
# we don't want to reload named without the previously updated
# malware domain list

if [ ${RC} -ne 0 ]; then
    echo "    ... download failed! Error: ${RC}"
    exit 1
else
    # Disable name checking for only those domains with underscores,
    # so we don't have to turn off name checking globally.
    SEARCH='_'
    FIND=';};'
    REPLACE='; check-names ignore;};'

    echo "Disabling checking on domains with underscores"
    sed "/${SEARCH}/ s/${FIND}/${REPLACE}/g" ${SCRIPTDIR}/${TMPZONEFILE} > ${SCRIPTDIR}/${ZONEFILE}

    # Get a count of the zones from the last update
    OLDZONECOUNT=$( cat ${NAMEDDIR}/${ZONEFILE}|grep "^zone"|wc -l )

    # Copy file to final location for named to find
    # (via "include" directive in named.conf)
    echo "Copying file from ${SCRIPTDIR} to ${NAMEDDIR}"
    cp ${SCRIPTDIR}/${ZONEFILE} ${NAMEDDIR}

    if [ $? -ne 0 ]; then
        echo "    ... failed! AbOrTinG!!"
        exit 1
    fi

    echo "Reloading zones in named"
    ${NAMEDCMD}

    if [ $? -ne 0 ]; then
        echo "    ... failed! You'll want to see to that."
    fi

    # Get a count of the zones from the current update
    NEWZONECOUNT=$( cat ${NAMEDDIR}/${ZONEFILE}|grep "^zone"|wc -l )
    echo "${OLDZONECOUNT} Previous Zones"
    echo "${NEWZONECOUNT} Current Zones"
fi

END=$( date +%s )
RUNTIME=$(( ${END} - ${START} ))
H=$(( ${RUNTIME}/3600 ))
M=$(( ( ${RUNTIME}/60 ) % 60 ))
S=$(( ${RUNTIME} % 60 ))

echo "Malware zonefile download on ${HOSTNAME} complete in"
echo "${H} hrs, ${M} mins and ${S} secs (${RUNTIME} secs)"
exit 0
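
The underscore fix in the script is the one piece worth trying in isolation before trusting it with a live named. This sketch wraps the same sed expression in a function that reads from stdin. The sample zone lines in the usage note are made up to resemble the download, not copied from it.

```shell
#!/bin/sh
# Sketch only: same sed expression as the script, wrapped for stdin/stdout.

# fix_underscores: on every line containing an underscore, insert
# "check-names ignore;" before the closing "};" of the zone statement.
# Lines without an underscore pass through untouched.
fix_underscores() {
	sed '/_/ s/;};/; check-names ignore;};/'
}
```

Feeding it `zone "bad_host.example.com" {type master; file "blockeddomain.hosts";};` gives the same line back ending in `"blockeddomain.hosts"; check-names ignore;};`, while a zone with no underscore comes out unchanged. One thing to watch: the match is on the whole line, so an underscore in the file path would trigger it for every zone.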

Then, the cron job to update the list on a daily basis:

35 0 * * * /root/bin/malwaredomains/malwaredomains.sh 2>&1 | mail -E -s "Malware Domain Named Update" me@here.com

Then, the blackhole host file that all those zones in the malwaredomains.com download refer to. Careful with this one, and you’ll want to replace the domains with something a little more relevant:

$TTL    86400           ;one day
@ IN SOA ns0.example.net. hostmaster.example.net. (
        2011050100  ; serial number YYYYMMDDNN
        28800       ; refresh 8 hours
        7200        ; retry 2 hours
        864000      ; expire 10 days
        86400       ; min ttl 1 day
)
        NS      ns0.example.net.
        NS      ns1.example.net.
        A       127.0.0.1
*   IN  A    127.0.0.1

Finally, the line in the named.conf file (in my case, in the internal view) to call on the recently downloaded zones file:

include "/etc/namedb/malwaredomains.zones";

That should do it!

This is what I receive in my inbox after every update (daily for me):

Script is running from /root/bin/malwaredomains
Downloading file from http://mirror1.malwaredomains.com/files/malwaredomains.zones
Disabling checking on domains with underscores
Copying file from /root/bin/malwaredomains to /var/named/etc/namedb
Reloading zones in named server
reload successful
   10116 Previous Zones
   10116 Current Zones
Malware zonefile download on [hostname] complete in
0 hrs, 0 mins and 2 secs (2 secs)
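
The leading spaces in front of the zone counts come from wc -l on FreeBSD, which pads its output. A sketch of a tidier count follows, using grep -c instead of the cat, grep and wc chain. The function name is mine, and treating a missing file as zero zones is my assumption about what you would want on a first run.

```shell
#!/bin/sh
# Sketch only: count_zones is a hypothetical helper, not part of the script.

# count_zones FILE -> prints the number of lines starting with "zone",
# or 0 when the file does not exist yet (e.g. the very first run)
count_zones() {
	if [ -f "$1" ]; then
		grep -c '^zone' "$1"
	else
		echo 0
	fi
}
```

With that in place the two counters become `OLDZONECOUNT=$( count_zones ${NAMEDDIR}/${ZONEFILE} )` before the copy and the same call again after it.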

Biting the NAS Bullet

UPDATE 2011.03.09 – I think I got a handle on it. I’m still pursuing the subject of this post, but I’m no longer worried about the backups.

I’m done.

I’ve given up on a USB-based backup solution. Sunday morning has become my standard “find out what went wrong with the full backups last night and see what I can do to fix them” time. I tire of it. Granted, the failures this weekend were because I ran out of room on my little 80GB USB drives. Totally my fault. It was just a matter of time. I wasn’t paying attention and the backups failed. Fortunately, that’s all that happened, as opposed to something more insidious. At least it wasn’t some sort of kernel panic, or soft-updates issue again.

I could easily solve it by spending a few bucks on a larger drive, but that would just be another stopgap. I want a solution that will carry us a few years and then some.

So, I’m thinking NAS*. Something that would serve my family’s needs (which amounts to my wife and me at this point, but we’re really hoping for a little papoose sometime here real soon). That means a lot of storage space. That means seamless connectivity with our existing machines, and that means dead simple to use.

I could spend a few hundred dollars on hardware and many hours putting together my own FreeNAS server from pieces parts (or any one of a number of other free options). Or I could spend a few hundred dollars and a few minutes on an OOTB solution.

I’m leaning towards the OOTB solution.

Sure, it’s not as proudly geeky as a home grown solution, but my gorgeous wife doesn’t appreciate geekery as much as some of you and I do. She appreciates things that work and work now. If I’m going to spend this amount of time and money, she has approval powers – it’s just part of that thing called Happily Married. Frankly, the older I get, the more I agree with her. So, OOTB NAS it is.

So far, though I’m still keeping my eyes and mind open, I’ve narrowed my choices down to:

Synology Disk Station DS410
Synology Disk Station DS410j
Netgear ReadyNAS NV+
Netgear ReadyNAS Ultra
Netgear ReadyNAS Ultra Plus
QNAP Systems 419P+
Thecus N4200Eco/Pro
Seagate BlackArmor NAS 400

Each of them fits my base requirements:

  1. Interoperability between Windows, Mac and *nix machines
  2. Function as a print server
  3. Four drive bays for RAID5 or better (hot swappable a huge plus)
  4. Small physical footprint

Each of them will do the job. So now, it’s a question of features, performance, future-resistance and of course, price.

I’ll be researching each of these models (and any others that come across my screen in my research) over the next couple of weeks (or less).

* Yeah, I know NAS != backup. This is just a step in the right direction. For backups of the NAS, I’ll grab a big 1 or 2TB disk, throw it into my dev server, and rsync the data from the NAS to it. I’ll keep my backup scripts running for my server data, but I’ll point them over NFS to the NAS, rather than to flakey USB drives.

rc.d != magick

There are certain things over which I want tight-fisted control, and other things over which I want neither control nor intimate knowledge.

When it comes to keeping my FreeBSD systems up to date, I relinquish control for the most part and let the ports system do the work, with portmaster and portaudit running the show for me. Sure, I run them manually when I need to based on nightly portsnap runs, with a close eye on what’s going on, but I let them take the reins of upgrading and auditing.

But when it comes to Apache and supporting modules (php, mod_perl), I want to do things the way I want to do them, not the way the ports system wants to do them. I want to compile them myself, with the options I want, and put the whole thing where I want it. I’m sure the ports system allows for that, but I’ve not dug in deep enough to figure it out yet.

That’s worked for me. For the most part.

The one part that hasn’t worked for me has been getting Apache to start at system startup as part of the rc.d framework. That is, until today. I finally hunkered down and figured it out. It’s nowhere near as magickal or mysterious as I initially thought.

Here’s what’s in my rc.conf file:

httpd_enable="YES"
httpd_flags="-k start"

Here’s what’s in my /usr/local/etc/rc.d/httpd startup script:

#!/bin/sh

# PROVIDE: httpd
# REQUIRE: NETWORKING SERVERS DAEMON LOGIN
# KEYWORD: shutdown

. /etc/rc.subr

name="httpd"
basedir="/home/www"
rcvar=`set_rcvar`
command="${basedir}/bin/${name}"
extra_commands="config"

pidfile="${basedir}/logs/${name}.pid"
required_files="${basedir}/conf/${name}.conf"

start_precmd="${name}_prestart"
config_cmd="${name}_config"

httpd_prestart() {
	if [ -f ${pidfile} ]; then
		echo "${pidfile} exists.  Deleting..."
		rm -f ${pidfile}
	fi
}

httpd_config() {
	echo "Apache configtest..."
	${command} -t
}

load_rc_config ${name}
run_rc_command "$1"

It took a few reboots of my dev server to get it right (that’s one reason one has a dev server), but it’s working like a champ now, and I don’t have to worry about manually starting Apache anymore.

That said, I’m sure there are improvements that could be made, and I welcome any suggestions (do I need all those requires? I don’t know… but it works to wait for them).
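
One improvement I can already see: the prestart function deletes the pidfile whenever it exists, even if the httpd it names is still alive. A sketch of a gentler check follows. The function name is made up, and it assumes the pidfile holds nothing but the PID, which is what Apache normally writes.

```shell
#!/bin/sh
# Sketch only: pid_is_running is a hypothetical helper for the prestart hook.

# pid_is_running PIDFILE -> exit 0 when the file names a live process
pid_is_running() {
	[ -f "$1" ] || return 1
	pid=$(cat "$1")
	case "$pid" in
		''|*[!0-9]*) return 1 ;;   # empty, or not a plain number
	esac
	# Signal 0 delivers nothing; it only tests that the process exists
	# (and that we may signal it, which root always may)
	kill -0 "$pid" 2>/dev/null
}
```

In httpd_prestart, the `rm -f ${pidfile}` could then be guarded with `pid_is_running ${pidfile} ||` so that only a stale pidfile gets removed.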

To make things a little easier on me I’ve set up the following aliases (and these date back many many years when I was fussing with Apache and conf files on an hourly basis):

alias apstart /usr/local/etc/rc.d/httpd start
alias apstop /usr/local/etc/rc.d/httpd stop
alias aprestart /usr/local/etc/rc.d/httpd restart
alias apconfig /usr/local/etc/rc.d/httpd config
alias aptest "/usr/local/etc/rc.d/httpd status; ps aux | grep httpd | grep -v grep"

Somewhere in my archives, I have an httpd.conf file written entirely in perl… maybe I’ll dig that out some day, just to see it again…

But first, I want to get all this working in jails.

Backups Failing with “(da0:umass-sim0:0:0:0): AutoSense failed” Errors.

Since I rebuilt my systems with FreeBSD 8.1, I’ve been hounded by an error message during weekly level 0 dumps. This only happens on my /home partition, which is significantly larger than all the others combined, and only happens on the full weekly backups. The daily level 1 backups all work flawlessly. Given what I’ve learned, I’m thinking it’s just b/c the level 1 backups are done too quickly…

The Problem
The error message, “(da0:umass-sim0:0:0:0): AutoSense failed”, is followed by a slew of write errors…

(da0:umass-sim0:0:0:0): AutoSense failed
g_vfs_done():da0s1[WRITE(offset=19495206912, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495337984, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495469056, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495600128, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495731200, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495862272, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495993344, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19496124416, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495075840, length=131072)]error = 5
(da0:umass-sim0:0:0:0): lost device
(da0:umass-sim0:0:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0
(da0:umass-sim0:0:0:0): removing device entry
/backup: got error 6 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error
cpuid = 1
Uptime 2h25m37s
Cannot dump, Device not defined or available
Automatic reboot in 15 seconds - press a key on the console to abort

… a kernel panic, and a dead system. The keyboard doesn’t respond, so it just sits there until the machine is hard reset manually.

Bad drive? Possibly. But that would make two in a row, so I’m leaning more towards something system-related, rather than drive related.

That led me to this post on the FreeBSD forums, and that then led me to this post elsewhere on the googlewebs. The latter indicates a difference in the way soft-updates are handled in 8.x vs. 7.x.

A Solution

So… I turned off soft-updates with:

tunefs -n disable /dev/da0s1

Trying the same command with the drive mounted threw the error:

tunefs: /dev/da0s1: failed to write superblock

I knew it wouldn’t work, I just wanted to see what exactly would happen.

My only question, which the posts I found did not answer, was whether to turn soft-updates off on the source /home partition, or the target USB backup drive. I opted for the target given that it’s only used for backups rather than day-to-day I/O operations, and it’s quicker and easier than rebooting into single-user mode to disable soft-updates on my /home partition. So I tuned the drive, crossed my fingers and launched the backup process again.

The result:

...
DUMP: 30.73% done, finished in 2:26 at Sun Feb 6 13:45:28 2011
DUMP: 33.06% done, finished in 2:21 at Sun Feb 6 13:45:39 2011
(da0:umass-sim0:0:0:0): AutoSense failed
g_vfs_done():da0s1[WRITE(offset=87491444736, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491575808, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491706880, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491837952, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491969024, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87492100096, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87492231168, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87492362240, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=114688, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87096983552, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87289675776, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87482368000, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491313664, length=131072)]error = 5
DUMP: 35.36% done, finished in 2:17 at Sun Feb 6 13:46:03 2011
DUMP: 37.73% done, finished in 2:12 at Sun Feb 6 13:46:00 2011
...
DUMP: 96.69% done, finished in 0:07 at Sun Feb 6 13:45:57 2011
DUMP: 99.17% done, finished in 0:01 at Sun Feb 6 13:45:42 2011
DUMP: DUMP: 39140287 tape blocks
DUMP: finished in 12691 seconds, throughput 3084KBytes/sec
DUMP: level 0 dump on Sun Feb 6 10:12:12 2011
DUMP: DUMP IS DONE

No kernel panic. No hard reset required. It just picks up where it left off and goes along its merry way. I’m fairly confident that file integrity is being maintained, but I’ll be testing that to be sure.
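
As a sanity check on those last few lines of dump output, the numbers can be run through the same shell arithmetic my scripts use for their runtime report. This is a sketch with a made-up function name. Treating a tape block as 1 KB is my reading of the output, and it lines up with the throughput dump itself reports.

```shell
#!/bin/sh
# Sketch only: fmt_duration is a hypothetical helper using the same
# hours/minutes/seconds arithmetic as the runtime report in my scripts.

# fmt_duration SECONDS -> prints "H hrs, M mins and S secs"
fmt_duration() {
	t=$1
	echo "$(( t / 3600 )) hrs, $(( (t / 60) % 60 )) mins and $(( t % 60 )) secs"
}

# The level 0 dump above: 12691 seconds, 39140287 tape blocks
fmt_duration 12691                            # 3 hrs, 31 mins and 31 secs
echo "$(( 39140287 / 12691 )) KBytes/sec"     # 3084 KBytes/sec
```

So the full dump ran a little over three and a half hours, AutoSense hiccup included.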

Other than knowing that soft-updates prevented it from recovering from the loss of the USB drive, I’m not sure exactly what the problem is. Why was the USB drive lost to begin with? Is it a timeout issue? An I/O issue related to too much data in the pipe? A RAM issue? I’ve two more GB of RAM to install, but I’ve been waiting to get more duration data to compare against before installing it.

For now, though I’m going to watch it closely, I’m considering the issue tentatively solved.

Near Native FreeBSD Full and Incremental Backups to a Removable USB Storage Drive

UPDATE 2011/03/09 – I updated the code to back up to an NFS mount, and to include the “-h 0” flag so that nodump flags are honored on full (level 0) dumps too. That was causing me serious problems.

Summary

I’ve given quite a bit of thought to backup procedures at home since my FreeBSD 8.1 box dropped my mirrored filesystem. The signs of impending apocalypse were there, I just didn’t pay them proper heed. Fortunately, all of my data was salvaged; unfortunately, I lost all the custom PHP code I wrote over the last 6 months, my wordpress themes, plugins and modifications, and everything else that actually DID anything with all that data. So, while I’ve been rewriting that, I’ve been giving equal, if not more attention to backing it up. I’ll catch up again, but before I do that, I’ll make sure I won’t fall behind again.

I did a few searches for FreeBSD backup solutions, and rolled my own little backup script using dump. It was decent, but it didn’t do everything I wanted as well as I wanted it to. Every night was a full backup, and there were no incrementals. I had to implement some pretty inelegant code to accomplish a couple things simply b/c I didn’t know how else to do it. So I kept looking and eventually zeroed in on David Andrzejewski’s work. He clearly states what he put out there is a use-at-your-own-risk kind of script. I took it anyway as a starting block, and fleshed it out for my own purposes.

My requirements were similar to his, with the exception that I don’t have a cloud based storage account at the time of this writing and instead will be using a removable USB connected storage drive.

Project Goals

  • Run with native or easily accessible tools.
  • Full off-system backup of entire system once a week.
  • Incremental off-system backup of entire system nightly.
  • Separate off-system backups of individual critical files to make future restores easier.

Future Goals

  • Play with ${DUMPCACHE} to see how it affects the time to execute in my environment. Drop it back to 8MB for a week. Ramp it up to 64MB for a week. Recommended is 32MB, but it’s a party! Let’s see what happens.
  • Continue monitoring and fine tuning the hardware, OS environment and script to ensure maximum performance and stability. I haven’t recompiled a kernel in a while, maybe I’ll see about that.
  • These are “as money allows” goals. I’m sure my wife is getting tired of me spending money on hardware. Then again, she does appreciate that I have a hobby that keeps me off the streets and out of the brothels.
    • Continue looking for consumer level, but sufficiently robust NAS solutions featuring RAID5 redundancy and access via secure and/or open protocols (ssh, smb, rsync, etc.) to replace (or augment) the removable drives I’m using now. No Windows-Only solutions please.
    • Evaluate cloud based storage for off-site backups. I’m looking at SpiderOak right now at the recommendation of a friend. I like their zero-knowledge solution and pricing, but more research is required. We’ll have upwards of 500GB of storage requirements, so we’ll have to weigh the monetary costs of cloud storage and bandwidth usage carefully against the risk of my solution failing when (!if) I need to restore. For the moment, I’m relatively comfortable with dumping the filesystems to removable drives, and keeping certain ultra-critical bits of recovery text (bsdlabels, fstabs, choice config files, etc.) in Google Docs.

My Environment

Two physically identical servers built from the ground up running FreeBSD 8.1. Each system houses a 150GB system drive (/dev/ad4s1) and a 500GB data/storage drive (/dev/ad6s1), and runs with 2GB of RAM.

I have /, /usr and /var mounted individually on the 150GB drive, and /home (containing /users and /www) mounted on the 500GB drive. I thought about getting separate drives for /www and /home, but decided I didn’t want to deal with planning for storage allocation. Instead I created /home/www for web files, and /home/users for user accounts. It’s not exactly standard, but it’s not unprecedented, and I make it work.

Off my “production” server I’ve hung a 2.5″ 320GB USB2.0 removable drive. Off my “development” server I’ve hung a 2.5″ 100GB USB2.0 removable drive. I’ll adjust the size of the drives as needed. That’s just what I had on hand. Both were UFS formatted using fdisk.

During the backup job, those drives are mounted at /backup. The rest of the time they’re plugged in, but not mounted.

The Script

You’re welcome to this, but be warned, if it borks up your machine, destroys your pr0n collection, or sends terrifying space monkeys into your engine room, don’t blame me. Use at your own risk. There, now that I’m all disclaimed…

#!/usr/local/bin/bash

# Much appreciation to David Andrzejewski, and the work he started at
# http://www.davidandrzejewski.com/2010/03/01/freebsd-backup-using-dump-and-duplicity/
# I'm sure his current script/processes far outstrip this, but this
# is my (f)stab at it

# Version: 0.5
# * Provides 1 set of full backups and 6 associated incrementals
# * Backup files stored on mounted USB drive only

# I would like to see...
# * Writing to NAS with RAID5 and standard access (ssh, SMB, etc.)
# * Retrieve the cloud based storage interaction I stripped out

# DUMPLVL: provided via a command line flag (${1})
# WEEKDAY: provided via a command line flag (${2})

# HOSTNAME: The host being backed up. Used in informational messages
HOSTNAME=$( hostname )

# FSLIST: The list of file systems that will be dumped along with the
# name of the dump. Example: /dev/ad4s1a=root will dump the /dev/ad4s1a
# volume and name it DDD.root.levelN.dump where "N" is the dump level
# and "DDD" is the weekday
FSLIST="/dev/ad4s1a=root /dev/ad4s1d=var /dev/ad4s1f=usr /dev/ad6s1d=home"

# BSDLABEL_PARTITIONS: The list of partitions to run `bsdlabel` on
# This will be saved in the backup directory during runtime as
# ${WEEKDAY}.bsdlabel_${PARTITION}.txt
BSDLABEL_PARTITIONS="ad4s1 ad6s1"

# DUMPDEVICE: The location the files will be dumped to
DUMPDEVICE=sosaria:/home/dumps/${HOSTNAME}

# DUMPDIR: The directory that the dumps will be written to
DUMPDIR=/backup

# STAGINGDIR: The directory where dumps are stored before being written
# to ${DUMPDIR}
STAGINGDIR=/home/dumps/stage

# ARCHIVEDIR: The local directory where dumps are kept after being
# copied to ${DUMPDIR}
ARCHIVEDIR=/home/dumps/${HOSTNAME}

# NODUMP_DIRS: List of directories to set the nodump flag
NODUMP_DIRS="/usr/ports /usr/obj /usr/src /home/www/logs /home/www/src /home/dumps"

# DUMPCACHE: The amount of memory to give dump
DUMPCACHE=32

# DUMPFLAGS: The flags to feed dump
DUMPFLAGS="uanL -h 0 -f"

# FSTYPE: The filesystem type of the mounted partition
FSTYPE=nfs

# These should be standard

# BSDLABELCMD: The bsdlabel command
BSDLABELCMD=/sbin/bsdlabel

# DUMPCMD: The dump command
DUMPCMD=/sbin/dump

# MOUNTCMD: The mount command
MOUNTCMD=/sbin/mount

# UMOUNTCMD: The umount command
UMOUNTCMD=/sbin/umount

##---------------------------------------------------------------------
# Shouldn't have to edit anything below here

# Get the start time so we can gauge how long this is taking. Useful in
# tweaking ${DUMPCACHE}
START=$( date +%s )

# Get the directory we're running from
SCRIPTDIR=$( dirname $0 )

cd ${SCRIPTDIR}
if [ $? -ne 0 ]; then
       echo "ERROR: Unable to cd to ${SCRIPTDIR}! Aborting!"
       exit 1
fi

# If we were executed like "./whatever.sh" - set SCRIPTDIR to the pwd
if [ "${SCRIPTDIR}" == "." ]; then
       SCRIPTDIR=$( pwd )
fi

echo "Script is running from ${SCRIPTDIR}"

# Check the command line to make sure we have what we need from it
# First check for the dump level
if [ "${1}" == "" ]; then
       echo "Must specify dump level. Aborting!"
       exit 1
else
       DUMPLVL=${1}
fi

# Sanity check
if [ "${DUMPLVL}" == "" ]; then
       echo "ERROR: For some reason DUMPLVL never got set! Aborting!"
       exit 1
fi

# Then get the weekday name off the command line
if [ "${2}" == "" ]; then
       echo "Must specify weekday name. Aborting!"
       exit 1
else
       WEEKDAY=${2}
fi

# Sanity check
if [ "${WEEKDAY}" == "" ]; then
       echo "ERROR: For some reason WEEKDAY never got set! Aborting!!"
       exit 1
fi

# Create the flag file so we can't run more than one instance
if [ -f "${SCRIPTDIR}/myself.flg" ]; then
       echo "Script running?! ${SCRIPTDIR}/myself.flg exists! Aborting!"
       exit 1
else
       echo "Touching myself at ${SCRIPTDIR}/myself.flg"
       touch ${SCRIPTDIR}/myself.flg
fi

# Check for the existence of ${STAGINGDIR}
if [ ! -d "${STAGINGDIR}" ]; then
       mkdir ${STAGINGDIR}
       if [ $? -ne 0 ]; then
               echo "Could not create ${STAGINGDIR}!  Aborting!"
               echo "Removing ${SCRIPTDIR}/myself.flg"
               rm -f ${SCRIPTDIR}/myself.flg
               exit 1
       fi
fi

# Check for the existence of ${ARCHIVEDIR}
if [ ! -d "${ARCHIVEDIR}" ]; then
       mkdir ${ARCHIVEDIR}
       if [ $? -ne 0 ]; then
               echo "Could not create ${ARCHIVEDIR}!  Aborting!"
               echo "Removing ${SCRIPTDIR}/myself.flg"
               rm -f ${SCRIPTDIR}/myself.flg
               exit 1
       fi
fi

echo ""
for DIR in ${NODUMP_DIRS}; do
       echo "Setting nodump on ${DIR}"
       chflags -R nodump ${DIR}
done

echo ""
echo "Dump Level: ${DUMPLVL}"

# Preserve a copy of root's crontab (/root/crontab is manually
# created with `crontab -l > ~/crontab` after every change)
echo ""
echo "Copying /root/crontab to ${STAGINGDIR}/${WEEKDAY}.root_crontab.txt"
cp -f /root/crontab ${STAGINGDIR}/${WEEKDAY}.root_crontab.txt

# Preserve a copy of fstab
echo "Copying fstab to ${STAGINGDIR}/${WEEKDAY}.fstab.txt"
cp -f /etc/fstab ${STAGINGDIR}/${WEEKDAY}.fstab.txt

# Preserve a week's worth of bsdlabel copies for each partition
for PARTITION in ${BSDLABEL_PARTITIONS}; do
       echo "Writing bsdlabel for ${PARTITION} -> ${STAGINGDIR}/${WEEKDAY}.bsdlabel_${PARTITION}.txt"
       ${BSDLABELCMD} ${PARTITION} > ${STAGINGDIR}/${WEEKDAY}.bsdlabel_${PARTITION}.txt
done

# Dump the filesystems!
for FSITEM in ${FSLIST}; do
       # Get the devicename
       FS=$( echo ${FSITEM} | awk -F= '{ print $1 }' )
       # Get the filesystem name
       NAME=$( echo ${FSITEM} | awk -F= '{ print $2 }' )
       DUMPFILE=${WEEKDAY}.${NAME}.level${DUMPLVL}.dump
       echo ""
       echo "Dumping ${FS} to ${STAGINGDIR}/${DUMPFILE} at dump level ${DUMPLVL}"
       echo ""
       echo "${DUMPCMD} -C${DUMPCACHE} -${DUMPLVL}${DUMPFLAGS} ${STAGINGDIR}/${DUMPFILE} ${FS}"
       ${DUMPCMD} -C${DUMPCACHE} -${DUMPLVL}${DUMPFLAGS} ${STAGINGDIR}/${DUMPFILE} ${FS}
done

# Test for an existing backup device mount and either use the existing
# mountpoint or mount our backup directory

MOUNTRESULTS=$( ${MOUNTCMD} | grep "${DUMPDEVICE} on ${DUMPDIR}" )

if [ "${MOUNTRESULTS}" == "" ]; then
       echo ""
       echo "Mounting ${DUMPDEVICE} on ${DUMPDIR}"
       ${MOUNTCMD} -t ${FSTYPE} ${DUMPDEVICE} ${DUMPDIR}
       if [ $? -ne 0 ]; then
               echo "  ... failed. Aborting!"
               echo "Removing ${SCRIPTDIR}/myself.flg"
               rm -f ${SCRIPTDIR}/myself.flg
               exit 1
       else
               echo "  ... succeeded"
       fi
else
       echo "${HOSTNAME}:${DUMPDEVICE} already mounted on ${DUMPDIR}"
fi

# Copy the files to ${DUMPDIR} and archive them to ${ARCHIVEDIR}
cd ${STAGINGDIR}
echo ""
for FILE in *; do
       echo "Copying ${FILE} to ${DUMPDIR}"
       cp ${FILE} ${DUMPDIR}/${FILE}
       if [ $? -ne 0 ]; then
               echo "... Failed to copy ${FILE}! You might want to see to that."
       else
               echo "Moving ${FILE} to ${ARCHIVEDIR}"
               mv ${FILE} ${ARCHIVEDIR}/${FILE}
       fi
done

# Get a snapshot of how the dump directory looks for verification
echo ""
echo "Recent Additions to ${DUMPDIR}:"
echo ""
ls -lt ${DUMPDIR} | tail -n +2 | head -n 8

# Umount the backup filesystem
echo ""
echo "Unmounting ${DUMPDIR}"
${UMOUNTCMD} ${DUMPDIR}
if [ $? -ne 0 ]; then
       echo "  ... failed. You might want to see to that."
else
       echo "  ... succeeded"
fi

# Clear the running flag
echo ""
echo "Removing ${SCRIPTDIR}/myself.flg"
rm -f ${SCRIPTDIR}/myself.flg
if [ -f "${SCRIPTDIR}/myself.flg" ]; then
       echo "  ... failed. You might want to see to that."
else
       echo "  ... succeeded"
fi

echo ""
echo "Backup of ${HOSTNAME} Complete"

END=$( date +%s )
RUNTIME=$(( ${END} - ${START} ))
H=$(( ${RUNTIME}/3600 ))
M=$(( ( ${RUNTIME}/60 ) % 60 ))
S=$(( ${RUNTIME} % 60 ))

echo "It took ${H} hrs, ${M} mins and ${S} secs with -C${DUMPCACHE} (${RUNTIME} secs)"
exit 0
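
One weakness of the myself.flg approach above is that the flag file is only removed on the exit paths that remember to remove it. If the script is killed partway through, the next run refuses to start until someone deletes the file by hand. Here's a sketch of an alternative, not what the script above does: take the lock with mkdir (which creates and tests in one step) and let a trap clean up. LOCKDIR, acquire_lock and release_lock are names invented for this example.

```shell
#!/bin/sh
# Sketch only: LOCKDIR and both function names are made up for this
# example; the script above uses ${SCRIPTDIR}/myself.flg instead.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/backup_script.lock}"

# mkdir either creates the directory or fails (for instance because it
# already exists), so the check and the claim happen in a single step.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

# rmdir only removes an empty directory, so it can't take anything
# else with it by accident.
release_lock() {
    rmdir "$1" 2>/dev/null
}

if ! acquire_lock "${LOCKDIR}"; then
    echo "Could not take lock ${LOCKDIR} (script running?). Aborting!"
    exit 1
fi

# Release the lock whenever the shell exits, including early "exit 1"
# paths. Turning HUP/INT/TERM into a plain exit makes sure the EXIT
# trap also runs when the job is killed by one of those signals.
trap 'release_lock "${LOCKDIR}"' EXIT
trap 'exit 1' HUP INT TERM

echo "Lock held at ${LOCKDIR}"
# ... dump, copy and unmount steps would go here ...
```

With the trap in place, the individual "rm -f ${SCRIPTDIR}/myself.flg" lines before each abort would no longer be needed. A kill -9 still leaves the lock behind, since no trap can run in that case.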

The Crontab

Here’s how I’ve set up my crontab. Like Mr. Andrzejewski, I opted to keep the specifics regarding the type of backup and the day it’s run in cron, rather than build it into the script. While it does make for a slightly longer crontab, it simplifies the logic in the script considerably. At the end of the day, I just feel better about telling the script what kind of backup to run (full or incremental), and the weekday name to embed in the resulting filenames, rather than letting it determine it itself. It’s a control thing.

# Daily Backups of filesystems
# Full backups on Sunday. Incremental backups every other day.
30 0 * * 0 /root/bin/backup/backup_script.sh 0 Sun 2>&1 | mail -s "System Backup" dvicci
30 0 * * 1 /root/bin/backup/backup_script.sh 1 Mon 2>&1 | mail -s "System Backup" dvicci
30 0 * * 2 /root/bin/backup/backup_script.sh 1 Tue 2>&1 | mail -s "System Backup" dvicci
30 0 * * 3 /root/bin/backup/backup_script.sh 1 Wed 2>&1 | mail -s "System Backup" dvicci
30 0 * * 4 /root/bin/backup/backup_script.sh 1 Thu 2>&1 | mail -s "System Backup" dvicci
30 0 * * 5 /root/bin/backup/backup_script.sh 1 Fri 2>&1 | mail -s "System Backup" dvicci
30 0 * * 6 /root/bin/backup/backup_script.sh 1 Sat 2>&1 | mail -s "System Backup" dvicci

This will finally result in a list of files looking something like this come Sunday morning. Sort to taste.

backup/Sat.usr.level1.dump
backup/Sat.var.level1.dump
backup/Sat.root.level1.dump
backup/Sat.fstab.txt
backup/Sat.bsdlabel_ad6s1.txt
backup/Sat.bsdlabel_ad4s1.txt
backup/Sat.root_crontab.txt
backup/Fri.home.level1.dump
backup/Fri.usr.level1.dump
backup/Fri.var.level1.dump
backup/Fri.root.level1.dump
backup/Fri.fstab.txt
backup/Fri.bsdlabel_ad6s1.txt
backup/Fri.bsdlabel_ad4s1.txt
backup/Fri.root_crontab.txt
backup/Thu.home.level1.dump
backup/Thu.usr.level1.dump
backup/Thu.var.level1.dump
backup/Thu.root.level1.dump
backup/Thu.fstab.txt
backup/Thu.bsdlabel_ad6s1.txt
backup/Thu.bsdlabel_ad4s1.txt
backup/Thu.root_crontab.txt
backup/Wed.home.level1.dump
backup/Wed.usr.level1.dump
backup/Wed.var.level1.dump
backup/Wed.root.level1.dump
backup/Wed.fstab.txt
backup/Wed.bsdlabel_ad6s1.txt
backup/Wed.bsdlabel_ad4s1.txt
backup/Wed.root_crontab.txt
backup/Tue.home.level1.dump
backup/Tue.usr.level1.dump
backup/Tue.var.level1.dump
backup/Tue.root.level1.dump
backup/Tue.fstab.txt
backup/Tue.bsdlabel_ad6s1.txt
backup/Tue.bsdlabel_ad4s1.txt
backup/Tue.root_crontab.txt
backup/Mon.home.level1.dump
backup/Mon.usr.level1.dump
backup/Mon.var.level1.dump
backup/Mon.root.level1.dump
backup/Mon.fstab.txt
backup/Mon.bsdlabel_ad6s1.txt
backup/Mon.bsdlabel_ad4s1.txt
backup/Mon.root_crontab.txt
backup/Sun.home.level0.dump
backup/Sun.usr.level0.dump
backup/Sun.var.level0.dump
backup/Sun.root.level0.dump
backup/Sun.fstab.txt
backup/Sun.bsdlabel_ad6s1.txt
backup/Sun.bsdlabel_ad4s1.txt
backup/Sun.root_crontab.txt
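
Because every weekday dump is a level 1 taken against Sunday's level 0, getting a filesystem back to a given day needs at most two files: Sunday's full dump and that day's incremental. The sketch below just prints those names in the order they'd be restored. The function name restore_chain is invented for this example, and it assumes the Sunday file on disk belongs to the same week as the weekday file, which holds as long as the files are overwritten weekly as described above.

```shell
# restore_chain NAME WEEKDAY
# Prints the dump files needed to rebuild filesystem NAME as of
# WEEKDAY, in the order they would have to be restored.
restore_chain() {
    name="$1"
    weekday="$2"

    # The full dump always comes first
    echo "Sun.${name}.level0.dump"

    # Any other day adds exactly one level 1 dump on top of it
    if [ "${weekday}" != "Sun" ]; then
        echo "${weekday}.${name}.level1.dump"
    fi
}

restore_chain home Thu
```

For the call shown, that prints Sun.home.level0.dump followed by Thu.home.level1.dump. Each file would then be handed to restore(8) in that order (restore -rf from inside the freshly created target filesystem is the usual form); that step is left out here because it depends on the state of the disk being rebuilt.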