Moving the Home Network

Part of the packing, cleaning and moving marathon of the last week, which is to continue through the coming weekend, was the move of the family computers and servers. This meant the following very simple steps:

  • setting up the internet connection at the new place,
  • changing DNS upstream to point to the new IP,
  • shutting down and physically moving the servers,
  • physically reconnecting them,
  • connecting the wireless router to the new cable router,
  • configuring the new cable router to account for the wireless router,
  • configuring the external IP on the wireless router,
  • firing up the servers,
  • enjoying fun website availability and internal DNS goodness.

Every step went smoothly, all the way up until that last one.

Though all the laptops in the house connected just fine to the Comcast cable modem/router, the two FreeBSD servers simply would not. From a logical network topology standpoint, nothing had changed. None of the IP addresses, including the gateway and the subnet, had changed. As far as the server interfaces were concerned, they’d just been turned off and turned back on. Let me emphasize that – from a logical network topology standpoint, nothing had changed.

I had it set up like so:

Net -> Comcast Router -> Servers & Laptops.

But they weren’t working. One worked briefly, but then quit. The other never would work.

If the cat5 cable (any cat5 cable) was plugged in, a ping to the (same as before) gateway would result in “sendto: Host is down”. If the cable wasn’t plugged in, I’d get a “sendto: No route to host”. Clearly there was some awareness going on, and the NIC was functioning at some level, because pinging the assigned IP, localhost, or 127.0.0.1 would all return successful. I couldn’t get any response from the gateway, however, and no other working machines could get responses from the servers. It was weird.
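The two error strings point at different layers, which is why the difference seemed worth noting. Here is a minimal Python sketch of that reasoning. The `HINTS` table and the `diagnose` function are hypothetical names, and the interpretations are my own reading of the messages (an unanswered ARP request versus a missing route), not an authoritative FreeBSD reference.

```python
# Hypothetical helper: map the ping errors quoted above to a likely cause.
# The interpretations are assumptions, not an official FreeBSD reference.
HINTS = {
    "Host is down": "link is up, but the gateway never answered ARP (layer 2 problem)",
    "No route to host": "no usable route, e.g. the cable is unplugged and the interface is down",
}


def diagnose(ping_output: str) -> str:
    """Return a hint for the first known error string found in ping output."""
    for message, hint in HINTS.items():
        if message in ping_output:
            return hint
    return "no known error string found"


if __name__ == "__main__":
    print(diagnose("ping: sendto: Host is down"))
    print(diagnose("ping: sendto: No route to host"))
```

If that reading is right, "Host is down" with the cable plugged in says the NIC had link but nothing was coming back on the wire, which fits a flaky card better than a bad network config.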

So, I got on the phone with Comcast to ask them about any incompatibilities with the router and FreeBSD. I got some good information about my usable external public IP (unrelated to the problem at hand), and some completely bogus information about having to use static IPs within the DHCP scope on their router (yeah but… what?!). The first level tech support wanted to help, but he just didn’t have the expertise, and so he escalated me to the next level (who is well past their 48 hour self-imposed deadline at the time of this writing). I decided to change things up a bit.

I went to this setup:

Net-> Comcast Router -> Wireless Router -> Servers & Laptops.

Note that this is exactly the same as I had at my old house, with the substitution of the Comcast Router for the Surewest cable modem. Everything from the wireless router on back is identical.

You know what? The laptops all worked (I love knowing even rudimentary networking), but the servers still didn’t work. With the router, cables and network configs eliminated, the fact that the exact same setup that worked at the other house no longer works tells me it’s another type of hardware problem.

So I yanked the gigabit NICs right out of the servers and went back to the on-board 100baseTX ports and… get this – it worked just fine.

What I’m concluding is that the D-Link GigaExpress DGE-530T card doesn’t work well with the BIOSTAR N68S3+ and the Diablotek EL Series PSEL400 400W ATX PSU. I base that conclusion in part b/c, in addition to flat out not working anymore, there are times when the machines won’t power on when the DGE-530T is installed without some creative combinations of the case power button and the PSU switch. When those cards aren’t installed, there are no issues. What, I didn’t mention that before? My bad.

Given that I put these servers together with as little cash outlay as possible, I’m thinking I’ve been bitten by the “get what you pay for” principle. In time, I’ll beef them up a bit with better components. But for now, I’m just happy to be back online enjoying fun website availability and internal DNS goodness.

The Dying of a Beloved Laptop

… well, maybe not quite yet. Vertical lines have started showing up on the screen, though. First there was just the one, way over to the right of the screen and out of the way. This morning, a 2nd showed up right down the middle. Ugh.
[Image: Laptop LCD Screen Lines]
My research points to a couple of things that could be the problem. Some say it’s a connector issue. Others claim LCD. I’ll do more research, but I’m inclined to think it’s the screen. Regardless, I reckon there will come a time in the nearish future where I’ll be forced to make a choice. The way I see it, I have a few:

  1. I can simply do without.
  2. I can buy a replacement LCD for ~$180 and install it myself or, with a little more money, pay someone else to do it.
  3. I can buy a new gaming class laptop.
  4. I can buy a non-gaming class laptop.
  5. I can take up the task of building a new desktop computer.

Anyone have any guesses which way I’m going to go?

I won’t simply do without. Nope. No can do. Next…

I’ll consider replacing the LCD, but the laptop is 5+ years old now, and is definitely showing its age. It’s had a good run, but I have to face the notion that it might be time to put it to pasture. Or at least relegate it to more mundane tasks that don’t require a full time monitor – or better yet, donate it to someone who can’t afford their own new laptop… it’ll still more than serve basic needs, and I have just the person in mind.

I’m not interested in a new gaming-class laptop. The ROI just isn’t high enough for me. If I’m going to spend that much cash, I want more machine than what’s offered in a laptop. I want more performance, more customizability, more upgradeability, and frankly, more control.

Nor am I interested in a non-gaming class laptop. The ROI might be better, but the performance just won’t be there. I. Need. The. Fast.

That pretty much leaves me with building my own desktop. I’ve been batting around the idea of building a new computer for a few months now. It’s been 5+ years since I’ve upgraded that aspect of what my wife likes to call my “Command Center”, so it’s about time (and as much as things have stayed the same, my oh my, but have they changed!). I’ve put together a rough budget with a range of costs per component, and I could spend as little as a few hundred, or as much as a few thousand. We have a lot of expenses hitting all at once right now, but after a few months, once things have settled down a bit (around my birthday), I think I might be able to make a go at it, even with the addition of a home NAS solution (which I’m still debating, to be honest).

I just hope that my laptop screen holds out until then. I’ve seen what happens when this issue is left alone, and it’s not pretty…

Backups Failing with “(da0:umass-sim0:0:0:0): AutoSense failed” Errors.

Since I rebuilt my systems with FreeBSD 8.1, I’ve been hounded by an error message during weekly level 0 dumps. This only happens on my /home partition, which is significantly larger than all the others combined, and only happens on the full weekly backups. The daily level 1 backups all work flawlessly. Given what I’ve learned, I’m thinking it’s just b/c the level 1 backups are done too quickly…

The Problem
The error message, “(da0:umass-sim0:0:0:0): AutoSense failed”, is followed by a slew of write errors:

(da0:umass-sim0:0:0:0): AutoSense failed
g_vfs_done():da0s1[WRITE(offset=19495206912, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495337984, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495469056, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495600128, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495731200, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495862272, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495993344, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19496124416, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495075840, length=131072)]error = 5
(da0:umass-sim0:0:0:0): lost device
(da0:umass-sim0:0:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0
(da0:umass-sim0:0:0:0): removing device entry
/backup: got error 6 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error
cpuid = 1
Uptime 2h25m37s
Cannot dump, Device not defined or available
Automatic reboot in 15 seconds - press a key on the console to abort

… a kernel panic, and a dead system. The keyboard doesn’t respond, so it just sits there until the machine is hard reset manually.

Bad drive? Possibly. But that would make two in a row, so I’m leaning more towards something system-related, rather than drive related.
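Before blaming the drive, it helps to read what the console output actually says. The sketch below is a hedged illustration rather than a diagnosis: it pulls the failed writes out of a few of the lines above, decodes the numeric error codes with Python's `errno` module (5 is `EIO`, 6 is `ENXIO`), and checks whether the failed writes are back-to-back. The function names `failed_writes` and `is_contiguous` are my own inventions.

```python
import errno
import re

# Lines copied from the console output above (shortened to three writes).
LOG = """\
g_vfs_done():da0s1[WRITE(offset=19495206912, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495337984, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=19495469056, length=131072)]error = 5
/backup: got error 6 while accessing filesystem
"""

WRITE_RE = re.compile(r"WRITE\(offset=(\d+), length=(\d+)\)\]error = (\d+)")


def failed_writes(log):
    """Return (offset, length, errno name) for each failed write in the log."""
    return [
        (int(off), int(length), errno.errorcode.get(int(err), "unknown"))
        for off, length, err in WRITE_RE.findall(log)
    ]


def is_contiguous(writes):
    """True when each write starts exactly where the previous one ended."""
    return all(a[0] + a[1] == b[0] for a, b in zip(writes, writes[1:]))


if __name__ == "__main__":
    writes = failed_writes(LOG)
    for offset, length, name in writes:
        print(f"{offset / 2**30:.2f} GiB into the slice, {length // 1024} KiB, {name}")
    print("contiguous:", is_contiguous(writes))
    print("error 6 is", errno.errorcode[6])
```

The failures are consecutive 128 KiB writes, which looks more like one streaming write being cut off when the device went away than like scattered bad sectors. That is only a reading of the offsets, though, not proof.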

That led me to this post on the FreeBSD forums, and that then led me to this post elsewhere on the googlewebs. The latter indicates a difference in the way soft-updates are handled in 8.x vs. 7.x.

A Solution

So… I turned off soft-updates with:

tunefs -n disable /dev/da0s1

Trying the same command with the drive mounted threw the error:

tunefs: /dev/da0s1: failed to write superblock

I knew it wouldn’t work, I just wanted to see what exactly would happen.

My only question, which the posts I found did not answer, was whether to turn soft-updates off on the source /home partition, or the target USB backup drive. I opted for the target given that it’s only used for backups rather than day-to-day I/O operations, and it’s quicker and easier than rebooting into single-user mode to disable soft-updates on my /home partition. So I tuned the drive, crossed my fingers and launched the backup process again.

The result:


...
DUMP: 30.73% done, finished in 2:26 at Sun Feb 6 13:45:28 2011
DUMP: 33.06% done, finished in 2:21 at Sun Feb 6 13:45:39 2011
(da0:umass-sim0:0:0:0): AutoSense failed
g_vfs_done():da0s1[WRITE(offset=87491444736, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491575808, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491706880, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491837952, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491969024, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87492100096, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87492231168, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=87492362240, length=131072)]error = 5
g_vfs_done():da0s1[WRITE(offset=114688, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87096983552, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87289675776, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87482368000, length=16384)]error = 5
g_vfs_done():da0s1[WRITE(offset=87491313664, length=131072)]error = 5
DUMP: 35.36% done, finished in 2:17 at Sun Feb 6 13:46:03 2011
DUMP: 37.73% done, finished in 2:12 at Sun Feb 6 13:46:00 2011
...
DUMP: 96.69% done, finished in 0:07 at Sun Feb 6 13:45:57 2011
DUMP: 99.17% done, finished in 0:01 at Sun Feb 6 13:45:42 2011
DUMP: DUMP: 39140287 tape blocks
DUMP: finished in 12691 seconds, throughput 3084KBytes/sec
DUMP: level 0 dump on Sun Feb 6 10:12:12 2011
DUMP: DUMP IS DONE

No kernel panic. No hard reset required. It just picks up where it left off and goes along its merry way. I’m fairly confident that file integrity is being maintained, but I’ll be testing that to be sure.
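The summary lines are easy to sanity check. This is a small Python sketch of the arithmetic, assuming dump's "tape blocks" are 1 KiB each and that the level 0 timestamp marks the start of the run. Both are assumptions on my part, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta

# Figures copied from the DUMP summary above.
TAPE_BLOCKS = 39140287          # assumed to be 1 KiB units
ELAPSED_SECONDS = 12691
START = datetime(2011, 2, 6, 10, 12, 12)  # assumed start of the level 0 dump


def throughput_kib_per_sec(blocks, seconds):
    """Whole KiB per second, truncated the way the summary line appears to be."""
    return blocks // seconds


def size_gib(blocks):
    """Convert 1 KiB blocks to GiB."""
    return blocks / 1024 ** 2


if __name__ == "__main__":
    print(throughput_kib_per_sec(TAPE_BLOCKS, ELAPSED_SECONDS), "KiB/s")
    print(f"{size_gib(TAPE_BLOCKS):.1f} GiB written")
    print("elapsed:", timedelta(seconds=ELAPSED_SECONDS))
    print("finished around:", START + timedelta(seconds=ELAPSED_SECONDS))
```

Under those assumptions the numbers agree with the log: the truncated throughput comes out to the reported 3084, and the three and a half hours of elapsed time lands within a couple of minutes of the running estimates.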

Beyond the fact that soft-updates prevented the system from recovering from the loss of the USB drive, I’m not sure exactly what the problem is. Why was the USB drive lost to begin with? Is it a timeout issue? An I/O issue related to too much data in the pipe? A RAM issue? I’ve two more GB of RAM to install, but I’ve been waiting to get more duration data to compare against before installing them.

For now, though I’m going to watch it closely, I’m considering the issue tentatively solved.

Windows 7 on my Alienware Aurora M9700

Well, I did it. I bit the bullet, ordered me a copy of Windows 7 Ultimate 32bit/64bit, and started the installation saga yesterday at around 3:30pm. As of this afternoon, I’ve achieved what I hope is system stability. Nearly all of my devices are accounted for, and the system boots without apparent issue.

The first step was, of course, to install Windows 7 using the supplied DVD media.

That done, I tackled the video drivers, b/c even though the default drivers support 1280×1024, the native screen resolution of the Alienware Aurora m9700 is 1920×1200. I’m a little spoiled by all that real estate, to say nothing of the sharp crispiness that comes with using the native resolution. There was a problem, however.

Alienware doesn’t have a video driver for the 32bit version of Windows 7. The closest they have is Vista. LaptopVideo2Go does, and they include the modified .inf files that you may have heard about if you’ve attempted any after market laptop video driver updates. nVidia themselves also have drivers. But no matter what video driver I tried, be it the old and dusty Vista driver, or the bright and shiny drivers from LV2G or nVidia, the problem remained the same… right before presenting the logon screen, the laptop screen would go black, and wouldn’t light up again unless I sent the laptop into hibernate (or possibly standby), and then woke it up. However, after waking it up, the keyboard refused to work. So, I had the option of a black screen, or no keyboard. If I had no password on my account, I could have possibly just passed through to a Windows desktop, but I doubt the keyboard would have been available. Having no password isn’t an option, either.

I tried a few things, then found a forum link somewhere (now lost) that described the same problems and pointed me to a BIOS update. I noted that I was running BIOS v3.17, and the link was to v3.18 (Alienware & Local if the official link changes [855KB]). Trusting that it wouldn’t brick my machine, I downloaded the new version, burned the .iso to a CD, and flashed my BIOS.

Lo & Behold! That fixed my issues completely. Aside from a momentary panic from a non-fatal checksum mismatch error on POST, it went perfectly. About 4 hours of trial and error (mostly error) were finally over. I’m now running driver version 7.15.11.7948 dated 1/30/2009, from the 179 series that nVidia determined in their wizardly wisdom was the best driver for my machine.

I haven’t tried running dual monitors yet. I was more concerned initially with getting SLI support running (which I have). Dual monitors won’t run with SLI enabled, and the first time I tried it, the screen flickered manically, and eventually resulted in a BSOD. I’ll try it again some day, but now I’m happy that it’s working, and don’t want to screw it up just yet.

So, that mostly solved, I went about tackling the fact that I had no audio support. That was a relatively easy fix. I pulled the Audio 5.12.01.5500 driver from Alienware’s support site (Alienware & Local if the official link changes [24MB]). I installed it, rebooted, and my ears were graced with wonderful sounds. Easy peasy. 15 minutes.

This morning, I went up to Best Buy to pick up a new mouse, because neither Logitech nor Windows 7 has full support for my 10 year old Logitech MouseMan M-BD53. I need my customized buttons. So as a replacement, I picked up a Logitech Wireless M510. So far, after about an hour, I’m pleased. I thought about a rechargeable, but I went with a standard battery powered option because I didn’t want to find myself stuck without a mouse while it was on the cradle.

I’m not completely done yet, because I still have two instances of “Base System Device” and one instance of “PCI Modem” that aren’t recognized by the OS. I’ve seen nothing in the way of instability, however, since last night, so I’m not going to sweat them too much until I do. I suspect they have something to do with mainboard chipsets. However, like I said, things are working as is.

One last note. Flashing the BIOS reset the RAID settings. I was running RAID 1 and noticed this morning that my second drive was sitting idle and offline. I was able to go back into the BIOS, enable RAID, and rebuild the mirror in no time flat, though. It’s now running perfectly.

Driver issues aside, I’m pretty impressed with Windows 7. Something about having to invoke administrator rights when I install new software makes me feel safer. I’m getting a handle on Libraries, and am well on my way to customizing my set up to the way I like it. Aero will take some getting used to, but so far, I don’t see any reason to quit using it.

Here’s to hoping Windows 7 continues to get along with my Aurora m9700…

Update Later That Same Day… I’m nothing if not daring where computers are concerned… I risked another string of BSOD’s and an unbootable brick of a laptop and attempted dual screenies on an unproven setup. So far, so good. :)

Update 12/31/2011 A kind reader moving his own m9700 from XP to Win7 did a little research and found that those missing “Base System Device” instances in the Device Manager were actually for the media card reader. I pulled down the drivers from Alienware’s Help Site Vista section (Alienware & Local 3.5MB), and wouldn’t you know it, duder was right. Now all that remains is the PCI Modem, which is also available on the help site, but I just don’t have any reason at all to install that. Thanks, man!

Cycling Safety, Printing with Cups & HPLIP, and Vino

All in all, not a bad weekend. I’d fixed our printing issues with Samba/Cups/HPLIP before 11, and I’d slept in until 9:30 (yes, I clearly needed sleep). It was, I think, a permissions issue, but it was a confusing issue.

Brokey:

lrw-rw---- 1 root cups 9 Sep 24 17:04 ugen0.2 -> usb/0.2.0
crw-rw-rw- 1 root cups 0, 128 Sep 24 17:01 0.2.0

Workey:

lrw-rw---- 1 root cups 9 Sep 24 17:04 ugen0.2 -> usb/0.2.0
crw-rw-rw- 1 root cups 0, 128 Sep 24 17:01 0.2.0

Yeah. I don’t see a difference either. Still, I was getting the following errors:

prnt/backend/hp.c 745: ERROR: open device failed stat=12: hp:/usb/photosmart_7350?serial=XXXXXXXXXXXXX

and

printer-state=5(stopped)
printer-state-message="/usr/local/libexec/cups/backend/hp failed"
printer-state-reasons=paused

whenever I tried to print (not the real serial number).

Then I ran chown root:lp on both ugen0.2 and usb/0.2.0, and printing was magically working again. And yet, they still show as being root:cups. Go figure. Please. Go figure, because aside from some weird corruption in some place I don’t yet know exists, I can’t figure it out myself.
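Since ls only shows names, one thing that might be worth checking is the numeric owner and group behind them. The Python sketch below is a guess at a useful diagnostic, not an explanation of what happened: `describe` reports the mode, uid and gid of a path without following symlinks, and `group_names` lists every group name that maps to a given gid. Both names are hypothetical, and the device paths are taken from the listing above.

```python
import grp
import os
import stat


def describe(path):
    """Return (mode string, uid, gid) for the path itself, not its symlink target."""
    info = os.lstat(path)
    return stat.filemode(info.st_mode), info.st_uid, info.st_gid


def group_names(gid):
    """Every group name in the group database that maps to this numeric gid."""
    return sorted(g.gr_name for g in grp.getgrall() if g.gr_gid == gid)


if __name__ == "__main__":
    # Device paths based on the listing above; adjust for your own system.
    for path in ("/dev/ugen0.2", "/dev/usb/0.2.0"):
        if os.path.lexists(path):
            mode, uid, gid = describe(path)
            print(path, mode, uid, gid, group_names(gid))
        else:
            print(path, "not present on this machine")
```

If two group names turned out to share one gid, that would at least explain why the listing never seemed to change. I have not confirmed that this is what is going on here.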

So, that done, I moved myself downtown to the KC Public Library for the first of two League of American Bicyclists Smart Cycling KC: Traffic Skills 101 classes. Though there wasn’t anything in there that I didn’t already know, it was still a good time, and worth the time spent. Some really good discussion ensued around traffic law and cyclists’ place in it, and how to best defend ourselves out there. I’d be interested in the more advanced classes, so I’m going to keep my eyes open for those.

Now, I’m about to settle in for the evening with my beautiful wife, who’s been shopping nearly all day, with a bottle of wine and a week’s worth of DVR’d television stories.

Hardware Bug

I’ve got the bug again.
I can’t help it. I remember having so much fun building my own computers 7-10 years ago. I’d find the fastest video card I could (it was all about the monster gaming rig back then), then the best motherboard to support it, a top-of-the-line CPU (I preferred AMD back then) to run it, and fast memory to carry it all. Once that was nailed down, I’d go for speedy hard drives, a rockin’ sound card, optical drives, fancy internal cabling, and finally an easy-to-work-with aluminum case to hold all the guts, with a power supply beefy enough to run it all.

Back then, it was all about pushing the most polygons in the least amount of time for maximum framerates.

I operated that way for years, until I got tired of lugging around the heavy rig to LAN parties. So I opted for my first pre-fab computer in the form of a desktop-replacement laptop. I’ve used it steady and with very few problems for the last 5 years or so. Ironically, once I finally decided to go with an easily transportable laptop for LAN parties, the LAN parties fizzled out. No matter, I still love having a laptop around for general portability.

To this day, I’ve not owned a desktop that I haven’t either put together entirely on my own from the motherboard up, or at least heavily modified one way or another. Nor will I. I won’t – no nay never – buy a pre-fab monstrosity from Office Buy, or Best Depot, or some Corner Geek Shop.

A laptop? Sure. A desktop or server? No. Nay. Never.

Now, having updated my server to the latest version of FreeBSD, I’ve got the bug again. All that playing around with the guts of FreeBSD, relearning this and that, up and woke the bug up again. Which is good and convenient, because I’ve had some interesting fixed disk issues with the new kernel.

Hardware issues
Part of the upgrade involved utilizing the onboard Promise RAID on the Gigabyte GA-7DXR. I’m not convinced that’s the root of my problems, but I’m not convinced it ain’t. For starters, and most likely completely unrelated, I’m getting the following errors in dmesg:

GEOM: ad0: partition 1 does not start on a track boundary.
GEOM: ad0: partition 1 does not end on a track boundary.
GEOM: ad0s1: geometry does not match label (16h,63s != 16h,255s).
GEOM: ad5s1: geometry does not match label (255h,63s != 16h,63s).
GEOM: ad7s1: geometry does not match label (255h,63s != 16h,63s).

I suspect FreeBSD’s installer for those messages, actually. But, since the upgrade, I’ve had two spontaneous and unannounced reboots. The first time, there were no indications of anything amiss in the logs. The second time, I found this:

ar0: WARNING - mirror protection lost. RAID1 array in DEGRADED mode
kernel: unknown: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=765023
kernel: unknown: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=765023

Followed immediately by a 7 hour gap in logging clearly indicative of another hard reset. Pretty sure that’s RAID related.
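Spotting that kind of silence by eye works, but it is also easy to script. Here is a rough Python sketch that flags long gaps between consecutive syslog timestamps. The function names, the one hour threshold, and the sample lines are all made up for illustration. Syslog timestamps carry no year, so the caller has to supply one.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_gaps(lines, year, threshold=timedelta(hours=1)):
    """Return (before, after) timestamp pairs separated by more than threshold."""
    stamps = [parse_stamp(line, year) for line in lines if line.strip()]
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > threshold]


if __name__ == "__main__":
    # Made-up sample lines; the timestamps are illustrative, not from my logs.
    sample = [
        "Oct  3 01:10:02 server kernel: ar0: WARNING - mirror protection lost.",
        "Oct  3 01:10:09 server kernel: unknown: TIMEOUT - WRITE_DMA retrying",
        "Oct  3 08:12:40 server kernel: (first line after the reboot)",
    ]
    for before, after in find_gaps(sample, year=2010):
        print(f"nothing logged between {before} and {after} ({after - before})")
```

On a box that normally logs something every few minutes, any gap this reports is a decent hint that the machine was hung or down for that stretch.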

That 7 hours ended when I noticed the server stuck in POST at an angry FastBuild screen demanding attention, and had to rebuild the array in order to get past POST. It worked, and all is up and running again, but with diminished confidence.

The research I’ve had time for has yielded sparse results, indicating either that I have a serious problem that needs immediate attention and I’d better have solid backups or I’m screwed to Taiwan and back, OR… it’s nothing serious and has been showing up for the last few FreeBSD releases.

I’ll dig into fdisk, atacontrol, smartctl and sysctl in more depth this weekend to see what that turns up, and then I’ll turn my attention to hardware research.

Server Plans
When some funds clear up, I’m going to build a new server to operate as a media center/file server for the family. It’ll be a beefy box with built in data redundancy, lots of drive space, backup power, and not much in the way of gaming potential.

I may entertain MythTV or something like it to replace the rental DVR (and then some).

So, the bug is back, but its purpose is vastly different now. Framerate has taken a distant backseat to reliability… well, at least until Diablo III comes out…

Google Contact Screw 2010

I had some serious issues with my Google Contacts yesterday. Duplicates, triplicates, quadruplicates, quintuplicates, by the Gods, even unto decuplicates there were. Moreover, those that had been cloned were not truly cloned, but merged, munged, and otherwise spliced together with other contact entries in the most perplexing and mystifying (to say nothing of frustrating) ways.

To paint an unfortunately inadequate picture… this phone number was placed with that name and another address entirely. That address was truncated and duplicated within the same (but wrong) contact several times over. That name was repeated across a dozen entries, many duplicates, but others different in subtle and aggravating ways. The deeper I dug into it, the more baffled and dejected I became. What had happened? There was no rhyme! There was no reason! There was only chaos where before had been beautiful order!

I feared the damage was irreversible, and for one who has taken some measure of pride in a complete and accurate address book (a consequence of the wedding planning), this was sore news indeed. I steeled myself for hours, days, weeks and even months of corrective work.

From about 6:30pm until 10:00pm, I worked. My first step was to stop the hemorrhaging. I uninstalled Google Sync, which I use to sync contacts and calendar entries between my Blackberry and my Google account. Having become rid of that, I moved to erase my entire BB address book using the Desktop Manager (the easiest way to accomplish the task short of wiping the BB entirely). Curiously, it wouldn’t let me b/c it was set for wireless synchronization. Navigating to the proper screen, I found that it was! Dismay! Moreover, it was set to allow duplicates! Horror! I disabled duplicates forthwith, and shut off wireless synchronization with a strong exclamation point (or “bang” if you will). Thus eliminating the possibility of additional corruption, I moved to my Google Contacts list.

The tedium begins. The immediately obvious saving grace, and clear starting point, was the fact that the illegitimate entries did not belong to any existing group (other than the All Contacts group). I always assign my contacts to a group (it’s a geeky organizational thing). This made it easy to identify and trim those contacts that weren’t present prior to the Google Contact Screw 2010. But my work was only beginning. While I had fewer contacts to correct, the corruption, and thus the correction, was egregious. So I commenced.

I removed the obvious duplicates. I eliminated duplicate street address entries. I recreated destroyed address entries. I moved e-mail addresses and phone numbers to their correct place. I cross referenced with my guest book listing from the wedding, and from old exports of years gone by. I spent the better part of two hours fixing the obvious errors. I did what I could to bring my contact list back from chaos and into some semblance of order. I can’t say with any degree of certainty that it’s error free now, but I think it’s close. It’ll take months, I’m sure, to weed out all the errors, especially for those contacts not contacted that much, but I’m off to a good start.
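I did all of this by hand, but the two checks I leaned on (entries with no group, and names repeated across entries) are simple enough to sketch in Python against an exported contact list. Everything here is hypothetical: the dictionary fields, the function names, and the idea that the export has already been loaded into a list of dicts.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(value.lower().split())


def find_duplicates(contacts, key="name"):
    """Group contacts whose normalized key field matches; return groups of 2+."""
    groups = defaultdict(list)
    for contact in contacts:
        groups[normalize(contact.get(key, ""))].append(contact)
    return {k: v for k, v in groups.items() if k and len(v) > 1}


def ungrouped(contacts):
    """Contacts that belong to no group, the tell-tale sign of a bogus entry."""
    return [c for c in contacts if not c.get("groups")]


if __name__ == "__main__":
    # Made-up sample data standing in for an exported address book.
    people = [
        {"name": "Jane Doe", "email": "jane@example.com", "groups": ["Family"]},
        {"name": "jane  doe", "email": "jd@example.com", "groups": []},
        {"name": "John Roe", "email": "john@example.com", "groups": ["Work"]},
    ]
    print("possible duplicates:", list(find_duplicates(people)))
    print("ungrouped:", [c["email"] for c in ungrouped(people)])
```

This only flags candidates. The spliced-together entries still need a human to decide which phone number goes with which name.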

Having cleaned it up as best I could, I reinstalled Google Sync and enabled only the Calendar sync portion. Having confirmed that my calendar entries were syncing properly, I enabled native wireless syncing for my address book, making sure to disallow duplicates.

So far, so good.

Here’s what I think happened… I think for reasons still wholly unclear, that the native wireless contact sync and the Google Sync contact sync suddenly started attacking each other. In their struggle, contacts were corrupted, destroyed, duplicated and otherwise rendered unrecognizable. For reasons also still wholly unclear, I think this fight started b/c the native wireless syncing was enabled, having been previously disabled, and that duplicates were, by default, allowed.

The result was that my contact list was damaged nearly to the point of no return.

I’m not the only person to have had this problem, though… and if I follow through with some research, I may find another culprit. I’m certainly open to the truth if it differs from my own theory.

I’m not sure which I would rather… disable native address book syncing and enable Google Sync, or enable native syncing and disable Google Sync. I think, for the time being, I will enable native syncing and monitor for a reset to the apparently default “Allow Duplicates.”

My next task will be to find a way to automatically, likely via cron, perform daily exports of my address book so that, in the event that this happens again, I will have a backup to restore from. So far, python looks to be the most likely candidate, given the Google Contacts API.
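I have not written the export itself yet, but the part around it is straightforward. Below is a minimal Python sketch of the dated-file-plus-retention half of the job. `fetch_contacts_csv` is a hypothetical placeholder for whatever ends up talking to the contacts service, and the file naming and the 30 day retention are assumptions, not anything I have settled on.

```python
import datetime
import pathlib


def fetch_contacts_csv():
    """Hypothetical placeholder: replace with a real export of the address book."""
    raise NotImplementedError("plug in the actual contacts export here")


def write_daily_backup(backup_dir, fetch=fetch_contacts_csv, today=None, keep=30):
    """Save today's export as contacts-YYYY-MM-DD.csv and prune old copies."""
    today = today or datetime.date.today()
    backup_dir = pathlib.Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"contacts-{today.isoformat()}.csv"
    target.write_text(fetch(), encoding="utf-8")
    # ISO dates sort chronologically, so the oldest files come first.
    backups = sorted(backup_dir.glob("contacts-*.csv"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```

Once the fetch function is real, a script that calls `write_daily_backup` can be run from a daily cron entry, and restoring becomes a matter of picking the last file that predates the damage.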

The Acpi 2.0 _PCT object returned an invalid value of 7

So, I’ve been fighting a stuttering issue with my laptop, an Alienware Aurora m9700, for the last few months now. Whenever I played a movie file it would start stuttering badly after about 20 seconds, and then every 10-20 seconds for 2-3 seconds. In .mov files, this affects the sound, but in all others, it affects only the video. The same held true for games. I was unable to play any games (on a gaming laptop!) due to the crippling drop in FPS every 10-20 seconds.

I tried reinstalling the OS a few times. I tried updating this driver or that. I fiddled with BIOS settings. I fiddled with video card settings. I spent hours researching it online using variations on the terms “stuttering video xp seconds fps chop ‘frame rate’” and a host of others.

Two nights ago I was looking through the Event Log on a whim on a completely unrelated topic, having all but given up on the stuttering issue. At that point, I was all but convinced it was an SP3 issue, and there was nothing I could do about it.

But I came across this: “The Acpi 2.0 _PCT object returned an invalid value of 7”. Hmm… Acpi… power management… and my fan has been going mad the last few months. More than I remember it going when I first got the laptop. Still could be SP3… I’ve heard it jacks with power management. But I keep looking through the error log, and this error seems to coincide with the stuttering incidents.

A google search for the error string, and I come across a lot of things related to (get this) stuttering during video playback and gaming.

From looking around, it could be one of the following:
  • Broken heat pipe
  • Unseated CPU heat sink requiring the reapplication of Arctic Silver (or similar heat sink paste)
  • Dust on the heat sink and/or vents
  • Faulty CPU
  • Faulty motherboard controller
  • Faulty drivers

Huh. Some of those are pretty expensive. So I think about it, and decide to start with the easiest. Grabbing a screwdriver, an air canister and a flat surface, I take the bottom panel off the laptop, revealing the cooling pipes, heat sinks, and other various whatnots. I do this 1. to inspect things b/c I still love the look of computer circuit boards, and 2. to give the dust a place to go after I hit it with air. I do so, and nearly blind myself with the dust that saturated the air in my kitchen.

Huh.

Coughing and hacking, I put the panel back on, boot her back up again, and notice immediately that the fan is a LOT quieter. It’s odd, actually: I could just barely hear the HD clicking. That had been drowned out by an overworked fan just minutes before. I head over to http://www.thehuntforgollum.com/ and load it up. It used to say “Your CPU may be too slow to process HD video, would you like to view the low res version?” right about the time it would start stuttering. Even the “low res” version would say that same thing.

I watched, my anticipation mounting, expecting the video to stop and stutter any second… or at least expecting the fan to kick in loudly… neither happened!

Months of frustration and trouble, and all b/c the heat sink and vents were full of dust.

I wonder how much longer I could have gone before I actually damaged something.

Alienware Aurora M9700 Part Replacements Needed

It seems the parts that have gone bad on my Alienware laptop are hard to come by.

I need the following:

  1. LI ION Rechargeable Battery Model # W83066LC
  2. Keyboard Model # MP-03753US-839

So far, everywhere I’ve looked shows both parts as out of stock. Bah.

Noah was kind enough to offer assistance replacing the keyboard once I found the one I wanted for the price of a beer. I couldn’t get it apart, so I asked for help. Thing is, I had to get it apart to find out which exact model I needed. Knowing now how to get it apart, we’ll just have to make time for that beer outside of laptop repair. I done figured it out. Thanks again, all the same, Noah! Keep riding!

Question: Laptop Repair in KC

Anyone know where I can take my Alienware Aurora m9700 to have the keyboard repaired in the Kansas City area? The control key on the left side is dead. I’d fix it myself, but I can’t figure out how to get the thing apart, and don’t want to start wrenching at things.

I’m not interested in negative comments about [pick your vendor/brand]. I’m really just looking for positive comments regarding trusted shops where you’ve had good experiences. It’s out of warranty, so I expect to have to pay.

Thank you!