Kogonuso.com

Kindle crashes and broken PowerShell: Something isn’t right with Windows 10 testing

2016-09-01

Peter Bright

Last week, we learned that the Windows 10 Anniversary Update caused trouble for many webcam users. Today, it's the turn of Kindle owners to cry foul, with numerous reports that plugging a Kindle into a Windows 10 machine with the update will make the PC crash with a Blue Screen of Death.

This problem has more than a hint of the same feeling as the webcam issue: it's the kind of thing that shows up quickly when using Windows 10 on a primary system but is going to be much more obscure if you only tested the Windows Insider previews in a virtual machine or secondary system. Such systems are much less likely to be plugged in to all the many peripherals and gadgets that primary machines are. Microsoft's own advice is that the Insider previews should not be installed on your "everyday computer." That's good advice; the quality of the builds released to the Insider program is far too inconsistent to make it a good option for a machine that you depend on. But that has consequences: the Insider program is going to consistently miss this kind of hardware interaction.

Investigation of the issue and development of a fix is apparently underway.

As if that weren't bad enough, Microsoft has pushed out a bad update that breaks important PowerShell functionality. The update, KB3176934, is the latest cumulative update for the Anniversary Update and was released on August 23. It breaks two key PowerShell features: Desired State Configuration (DSC), which is used to deploy configurations to systems, and implicit remoting, a feature that makes it easier to use PowerShell commands that are installed on remote systems.

The reason that these things have broken is remarkable. The cumulative update doesn't include all the files it needs. One missing file breaks DSC; a second missing file breaks implicit remoting. A revised package that includes these missing files will be released on August 30; although Microsoft recognizes that the problem exists, it apparently isn't important enough to rush out a fix, so it'll have to wait for the next Tuesday.

Once again, this speaks to an extraordinary lack of testing on Microsoft's part. Ensuring that a cumulative update includes all the files it's supposed to and doesn't break core PowerShell functionality is basic stuff.
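To give a sense of how basic such a check can be, here is a minimal sketch in Python. It is not Microsoft's packaging tooling: the function name, the manifest format, and the file names are all hypothetical and exist only for illustration. The sketch compares the list of files a package is supposed to contain against what it actually contains and reports anything missing.

```python
def missing_files(expected, packaged):
    """Return the expected entries that are absent from the package, sorted.

    Names are compared case-insensitively, as Windows file names are.
    """
    packaged_lower = {name.lower() for name in packaged}
    return sorted(name for name in expected if name.lower() not in packaged_lower)


# Hypothetical file names, not the real contents of any update.
expected_manifest = ["dsc_core.mof", "remoting_helper.dll", "powershell_host.exe"]
package_contents = ["powershell_host.exe"]

missing = missing_files(expected_manifest, package_contents)
if missing:
    print("Package is incomplete; missing:", ", ".join(missing))
```

A check along these lines, run before a package is published, would flag an incomplete update without needing anyone to exercise the affected features by hand.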

It's déjà vu all over again

None of this is excusable. I wrote last Friday that issues like the webcam problem would "inevitably recur" due to the problems of Microsoft's current testing regime: lack of internal testing (the people who did this were laid off); Insiders not testing on real systems (because they're advised not to use it on their primary PCs); and Insiders tending to give poor feedback (they're not professional testers, and Microsoft's very weak release notes give no indication of what things have been changed and hence need testing in the first place).

I expected a recurrence; I just didn't expect it to happen the very next week.

Microsoft has radically changed the way it delivers patches and updates to its operating system. Windows 10 sees a series of semi-regular bigger-than-Service Pack-size updates, along with monthly cumulative fixes. Dedicated testing at Microsoft seems to be largely eliminated in favor of the Insider program and multiple release tracks. Individual piecemeal hotfixes similarly have been done away with in favor of the larger cumulative updates. The ability to delay or defer fixes has been greatly curtailed.

In many ways, these changes in software updating and delivery are for the better. Incremental updates, as opposed to the old world of one major release every three years, enable the software to get better, faster. They make it easier to support new hardware technology and new usage scenarios, and they give the company much greater ability to respond to user demands in a timely manner. These are good things. The consolidated patching and mandatory updating are more controversial (users who need PowerShell to work properly have to reject a lot of non-PowerShell fixes as well, even if they want those), but I think that, long-term, they will prove to be advantageous by providing greater consistency between systems and hence more straightforward testing. Multiple release channels (Long Term Servicing branch, Current Branch for Business, Current Branch, Insider Release Preview, Insider Fast, and Insider Slow) make it much easier for people to test the software ahead of time and, in principle, shake out the bugs.

But it's equally clear that the process as it is right now isn't working. Too many errors are slipping through, and they're errors that put the new policies under the spotlight. The PowerShell issue wouldn't be a problem if it weren't incorporated into a cumulative update; affected users could just roll back the broken PowerShell package while keeping everything else. Problems with common hardware interactions that ought to be tested in a lab are escaping unnoticed.

At precisely the time that Microsoft needs to be instilling confidence in its patching process by reassuring customers, enterprise customers especially, that the rolling releases and cumulative updates are nothing to fear, it is doing the opposite. Niggling issues around hotfixes are nothing new (you can always find one or two people complaining that the latest patch doesn't work properly), but the new approach threatens to turn those minor gripes into showstoppers, eroding trust in the way Windows is developed, tested, and deployed.

Microsoft's challenge here is considerable. Nothing else on earth has to support the same diversity of hardware and software compatibility as Windows. Weird corner cases and obscure hardware complaints are, on some level, inevitable. Upending a development process that has, for literally decades, been geared toward a big release every three years is no small undertaking, and it's not surprising that Microsoft is taking some time to get things right.

Equally, though, it's clear that the company hasn't got it right yet, and it's not clear that it's even on the right track. Microsoft needs to figure out a way to steady the ship.

Seeking inspiration

While there are no good direct equivalents to copy from, there are some large projects that have successfully adopted an "as a service" delivery model and development process, and Microsoft might do well to seek some inspiration from them. Specifically, the processes used by Google for Chrome and Mozilla for Firefox have been proven to work well.

While the scope of a browser is substantially smaller than that of Windows (trivially so, because Windows contains, among its many components, two entire Web browsers), the browsers face similar pressures. Like Windows, their userbase is substantially non-expert, meaning that the robustness of the final product is paramount; non-expert users can't be forced to use complex workarounds, and they aren't ever going to pore over release notes or create detailed bug reports. Developers are also faced with a big dollop of software compatibility, because they have to work with the extant Web, including all the sites out there with broken code, buggy JavaScript, corrupt images, and worse.

Like Windows, both browsers are also developed and delivered using a continuously updated "service" model.

However, the approach that these projects have taken is very different from that of Windows. The Windows scheme (if we ignore the "old" versions that are oriented toward corporate users) has two major streams: "stable" and "Insider." "Insider" delivers a steady stream of builds to the "Fast" channel, representing the latest build of the next major update to Windows. Occasionally, builds from the "Fast" channel are also propagated to the "Slow" channel. Both "Fast" and "Slow" represent progress toward the same major update. Those major updates are released sporadically, with two shipping (versions 1511 and 1607) so far.

The browsers, by contrast, are much more regimented. Their "stable" channels see a "major" release every six weeks or so (with the inevitable consequence that those "major" releases are typically quite minor). Their "beta" channels represent progress toward the next stable release, and their "dev" or "developer" channels represent progress toward the next beta release. As such, the beta and dev channels are a vision of what will ship six or twelve weeks into the future.
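The arithmetic behind that "six or twelve weeks" can be sketched in a few lines of Python. This is an illustration only: it assumes a cycle of exactly six weeks and that beta and dev run exactly one and two releases ahead of stable, whereas real schedules vary. The function name and the example date are hypothetical.

```python
from datetime import date, timedelta

CYCLE = timedelta(weeks=6)  # assumption: a fixed six-week release cycle

# How many releases ahead of stable each channel runs (simplified).
RELEASES_AHEAD = {"stable": 0, "beta": 1, "dev": 2}


def reaches_stable_on(channel, current_stable_release):
    """Estimate when code now in `channel` would ship as a stable release."""
    return current_stable_release + RELEASES_AHEAD[channel] * CYCLE


stable_today = date(2016, 9, 1)  # illustrative date only
for channel in ("stable", "beta", "dev"):
    ship = reaches_stable_on(channel, stable_today)
    print(channel, "ships in", (ship - stable_today).days // 7, "weeks")
```

Under these assumptions, anything a dev-channel user runs today has roughly twelve weeks of real-world use behind it before it reaches the stable channel.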

These different shipping styles also have different quality levels. Windows may not have the same three-year release cycle it used to, but as an outside observer, it feels like development still proceeds in much the same way. Under the "old" Windows development process, when Microsoft would ship perhaps a couple of betas and then a couple of release candidates, we would see quality improvement over that process, with each subsequent build becoming less buggy and more polished as the release date neared. The "new" process condenses the timelines substantially—development of a major release happening over six to nine months, rather than three years—and it gives us many more interim builds, but the same quality progression appears to remain. Early builds will tend to be rough around the edges, with long lists of both bugs and new features. As the release date nears, those bugs will be closed out and the software will get better.

By contrast, the Chrome and Firefox beta and dev channels tend to remain production-grade. Occasionally the Chrome dev channel will throw out a build that has a crippling bug, rendering it broken for a day or two, but this is rare. My own experience is that the Chrome dev channel (which I use as my browser of choice on desktop systems) is more robust than the Edge stable build and certainly much higher quality than the Windows Insider fast builds.

By having a dev channel that is good enough for daily use and running more than a release ahead of the stable build, both Google and Mozilla can collect abundant real-world usage data and detect bugs early enough in development that there's time to do something about them before they make it into a stable release.

The smaller, more regular builds also mean that there is less pressure to get a feature into any particular build. While Windows is much better now than it used to be—postponed features can be shipped with the next major update, rather than having to wait three whole years—there's still pressure to ship now, fix later, to avoid a wait of six or more months.

As such, the approach that these projects have taken appears to directly address some of the problems with Microsoft's current development process.

This is not to say that Microsoft can easily adopt this model, or something close to it (if nothing else, a two-month cycle would fit better with the Patch Tuesday scheduling than a six-week one), but the companies using this model are managing to do something that Redmond is currently struggling with: delivering mass-market desktop software on a continuous basis with consistent high quality. The current development process plainly has problems, and Microsoft needs to very publicly fix it: the success of Windows 10 depends on it.