Windows-Now blogger Robert McLaws details a horror story that no network guy or gal ever wants to face. He was using beta Hyper-V software in the wild when two hard drives in his datacenter failed within a week. One of his critical servers went offline completely. Backups failed (not entirely the fault of Hyper-V, McLaws admits). Only at that point did he discover some critical flaws in Hyper-V when it comes to recovering data.
He writes:
"Well, as the drive failed and Hyper-V came crashing to a halt, it removed all traces of the Virtual Machine that hosted all of my client sites. In addition, it deleted the snapshot of the server that runs Windows-Now, hence some of the broken images that are coming up at the moment (more on that shortly).
"The next several days were extremely frustrating, as I learned a very hard lesson about deploying beta software in the wild. For example, in Hyper-V, "snapshots" are completely misleading. They are not full backups of the VHD file, like one might expect. No, instead, they are like hybrid differencing/undo disks... which would be all well and good if the documentation explained that, but it doesn't. The problem with that is, the changes to the VHDs are not committed unless you explicitly do so. So in the event of a catastrophic failure, you're basically screwed ... I feel I need to stress this point: I hadn't lost a single hard drive in 5 years, I lost 2 inside of a week. The wonderful people at ServerBeach have all but eliminated faulty hardware, which leads me squarely to [the] virtualization solution I was using."
Now, there's no question a person can fault McLaws for using beta software on critical servers. On the other hand, a beta these days is really the closed-source software industry's method of doing quality control. Such companies (with Microsoft the leader here) throw their code over the transom to the public and tell them to pound on it to find the bugs. (Think how well this would go over if internal enterprise IT developers worked the same way - where critical bugs were only discovered when they crashed the business's live data center.) If a software developer isn't going to open its source code so that bugs can be found through intelligent development, then it needs to own up to some responsibility for the "beta" code it writes and releases to the public.
More importantly, what does McLaws' experience say about Hyper-V itself? Will it be ready for prime time when Microsoft releases it? (As of December, Microsoft said Hyper-V will be available within six months of the release of Windows Server 2008.)
See also, from Mitchell Ashley: Getting Ready For Hyper-V
Virtualization Center Series: Microsoft Hyper-V Release Candidate In Server 2008
The Microsoft Subnet blog is the official blog of Network World's Microsoft Subnet community, managed by editor Julie Bort. Microsoft Subnet is the independent voice of Microsoft customers and is your gateway to daily Microsoft news, blogs, opinion, books, prize giveaways and more. Visit the Microsoft Subnet index page daily, and while you are there, subscribe to the Microsoft newsletter. The newsletter includes news generated by the Microsoft Subnet community as well as other Microsoft news stories published by Network World.
The opinions expressed in this Weblog are those of the writer and may not represent the opinions of Network World.
Beta Hyper V and McLaws
It is unquestionably the case that software companies have increasingly taken the position that "the world is my beta tester."
It is also the case that no one forces anyone to use a beta version in a production environment. While some software is less mission-critical than others, Hyper-V clearly opens the door to the kind of massive failure he experienced.
So, to say he can be "faulted" for his decision to use it in a production environment is charitable.
Beta testing is just that: an examination of the performance of code that has not yet been adjudged worthy of release.
I'm sure the Microsoft development team is pleased to have his experience to chew over. It is precisely the kind of event that testing is supposed to reveal.
But to ask whether anything might be concluded about the final release of the product based on what happened to him with the beta code doesn't make a whole lot of sense to me.
beta versus final release
Yes, every time not-yet-final code crashes and burns, that's "nice" for the software makers in that problems are discovered before they proclaim the code fully cooked. But there is an overall issue of trust in the software industry, thanks in large part to Microsoft and how it has used its beta programs. They fix the bugs that are found (we hope), but at some point they simply release 1.0 code, and there remains little-to-no trust that the 1.0 version is hardy. If beta code weren't buggy in HUGE ways, but were intended mostly to get feedback on features/functionality/documentation, think how much trust they would create in new software. People would try it and like it, instead of try it and have a bad experience. We hope that Microsoft addresses all the technical problems that Microsoft-lover McLaws encountered. But there is tremendous pressure for Microsoft to release Hyper-V at the point (whatever point that is) where Microsoft decides that it is "cooked enough."
Beta Hyper V and McLaws
I agree with the previous comment. Beta code IS beta code, which means it is not quite ready for prime time, as in production use. Sometimes I feel like a beta tester with some of the "Gold" code I have to run in production. It is simply reckless and irresponsible to run beta code in a production or mission-critical environment. To wonder if the software developer needs to take some responsibility for the user's irresponsible "testing" of the beta code is typical of today's mentality that forces clothes iron makers to add warning labels not to iron clothes while wearing them. If you do something irresponsible or foolish (such as putting beta code in production), then don't try to fix the blame elsewhere; the responsibility is squarely with the person who made the decision!
Actually, how about the
Actually, how about the standard reply from tech support: we have a new patch/version that should take care of the problem. Please let us know if you have any problems.
Unfortunately, tech support now mostly doesn't solve problems other than by sending a newer version of a file and hoping it works. It's understandable. It's quick, and it sometimes works without anyone having to spend much time on the problem (and it reduces the training that companies must provide to their staff, if they provide any).
So sometimes, even if you don't want to go to a new version, if you run into a bug, you may be forced into it.
It seems that people are now in a rush to do everything quickly, without regard for quality. If the big players create that kind of pressure, how is the small guy going to fight it off?
This is backup "101" folks...
Hard drives fail.
Software, even released "stable" software, has hiccups.
OSes crash. Users delete files (I don't think I need "system32" dir anymore).
The point here is that backups should be taken regularly. Using snapshot technology as a replacement for backup is taking complacency, adding stupidity and raising it to the Nth power.
Additionally, backups should be taken *inside* each virtual machine; it's the only *reliable* way to ensure that the backup is coherent. There are ways to take backups of VMs from the host, and at best they're 99% reliable. But it's the last 1% that will make you lose your job every time.
Nothing to do with Hyper-V
The failure had nothing to do with Hyper-V. Every snapshot that I have heard of is some sort of differencing mechanism that requires the original data to be available. I use snapshots all the time - they're a great way to recover a file or directory if something happens to them, but they are not backups, and do not protect against drive failure. The situation is no different from running Windows 2003 as a file server and then writing a scary article about how Windows 2003 isn't ready for prime time because shadow copies didn't save some poor fool from data loss when the disk array went down. I chalk the headline up to ignorance/sloppiness instead of deliberate FUD. Headline should be: "How relying on snapshots burned Robert McLaws' datacenter".
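To make the differencing point concrete, here is a toy Python sketch. It is not Hyper-V's actual on-disk format or API; the class names `BaseDisk` and `DifferencingDisk` are invented for illustration. It only models the general idea that a snapshot layer stores changed blocks and falls through to the original disk for everything else.

```python
class BaseDisk:
    """Toy stand-in for the original (parent) disk: block number -> data."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.available = True

    def read(self, block):
        if not self.available:
            raise IOError("base disk is unavailable")
        return self.blocks[block]


class DifferencingDisk:
    """Holds only blocks written since the snapshot; other reads go to the parent."""

    def __init__(self, parent):
        self.parent = parent
        self.changed = {}

    def write(self, block, data):
        self.changed[block] = data

    def read(self, block):
        if block in self.changed:
            return self.changed[block]
        return self.parent.read(block)


base = BaseDisk({0: "boot", 1: "site-data-v1"})
snap = DifferencingDisk(base)
snap.write(1, "site-data-v2")

print(snap.read(1))  # changed block, served from the differencing layer
print(snap.read(0))  # unchanged block, served from the base disk

base.available = False  # simulate the drive holding the base disk failing
try:
    snap.read(0)
except IOError as err:
    print("unrecoverable:", err)
```

Once the base disk is gone, the differencing layer can still return the blocks it happens to hold, but it cannot reconstruct the rest of the volume, which is why a snapshot on its own is not a backup.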
Agreed - Nothing to do with Hyper-V
Seriously, this is a great example of why I don't read blogs by people who inevitably claim to be experts in the field and then follow stupid, bone-headed, non-standard practices, such as throwing his clients' data and trust out on BETA software and then blaming the OS for his stupidity in not creating backups. Since when has it ever, EVER, become industry standard among the experts to not create backups, even in RAID environments? I’ve only been doing this for 11+ years, but I recall that the first thing I do after any server install is to set up backup software and recovery methods. And then Actually test them. TEST backups? I’m cracking myself up here thinking about all my wasted time setting up and Testing backups, when I could have just thrown everything on a Beta OS and called it a day. Hey, maybe then I could have time to blog and become considered an “A-list” blogger by giving advice for everyone else to follow and not actually following industry standard practices myself…
Well said. This guy is
Well said. This guy is obviously not the sharpest pencil in the box. I always wondered how these guys had all this time to blog and still do their job.
And you are correct, I am a consultant and have walked into many clients and found out they were not doing backups because they had RAID. I bet all the drives they were using were from the same lot, too.
Couldn't have said it
Couldn't have said it better. It is a matter of an individual not understanding the terminology. Which means that, beta or not, he did not know enough about the product he was using, and his clients paid the price. I don't mean to sound overly harsh, because I understand how difficult it is to keep up to speed with constant change and new products, but perhaps he was spreading himself too thin or not allocating enough time to really learn about the technology he was using. I'd hate to see what would have happened if something like a real beta bug showed up. Would he even know how to troubleshoot the problem, and how much downtime could his clients have been subjected to then? The backup situation could have been solved very simply with backup software, or by scripting a copy backup of the VHD files themselves to another RAID volume or device.
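A scripted copy of the VHD files could look roughly like the Python sketch below. It rests on assumptions the comment does not state: the virtual machines are shut down or saved while the copy runs (copying a disk image that is being written to can produce an inconsistent file), any snapshot differencing files would need the same treatment, and the function name `backup_vhds` is made up for this example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_vhds(source_dir, dest_dir):
    """Copy every *.vhd file from source_dir to dest_dir and verify each copy.

    Assumes the VMs that own these files are not running during the copy.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for vhd in sorted(source_dir.glob("*.vhd")):
        target = dest_dir / vhd.name
        shutil.copy2(vhd, target)  # copies contents and file metadata
        if sha256_of(vhd) != sha256_of(target):
            raise RuntimeError(f"checksum mismatch for {vhd.name}")
        copied.append(target)
    return copied
```

It might be called as `backup_vhds(r"D:\VMs", r"E:\Backups\VMs")`, where both paths are hypothetical and the destination sits on a different physical volume or device. The checksum comparison only confirms that the copy matches the source; a restore test is still needed to confirm the backup is usable.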
I'm sorry he got burned...
A few thoughts on how his experience with the beta could have been prevented:
As the comments point out, Hyper-V snapshots are a way to provide high-speed recovery from software failures (a bad Windows update from last night, an application database upgrade, etc.). They are like Windows 2003 (or 2008) Volume Shadow Copy Services, which caches "previous versions" of files... you can't rely on that alone as a backup or DR solution for hardware failures.
Another note: the online backup solution he uses doesn't support VSS, which is the Microsoft-approved backup API required for consistent backups of virtual machines. A solution like EMC's Mozy works well for this... but it's a little bit pricier than his current solution.
More Info on Mozy:
http://www.gilham.org/Pages/Online-Hosted-Online-Microsoft-Server-Desktop-Backup-Solution-Exchange-Sharepoint-SQL-Service.aspx
A more traditional local disk- or tape-based solution would also work... Microsoft DPM 2007 and Veritas NetBackup are VSS-aware backup tools.
Comment cross posted from:
http://www.gilham.org/Blog/Lists/Posts/Post.aspx?ID=243