
Outage fuels Skype network upgrades

Feb 22, 2011 5:33am

There's no way I can prove this, but it's a reasonable assumption that at least some telco executives felt that the Skype blackout of December 22 was an early Christmas present.

Maybe not. After all, telcos may pride themselves on network reliability, but as Asian carriers know from the Boxing Day earthquake of 2006 and the earthquake/typhoon double-punch of 2009, networks can fail far beyond a routine reroute.

However, that was physical damage from natural disasters. Skype's problems were technical. Worse, it was Skype's own P2P architecture that let it down - the same architecture that Skype has often touted as making it more reliable than proper telco networks.

Oh dear.

To summarize the official explanation from Skype CIO Lars Rabbe:

The failure was related to "supernodes" in the Skype network - computers that serve as phone directories to help Skype users find each other. A cluster of support servers handling offline instant messaging became overloaded, and that overload, combined with a bug in a widely used version of the Skype for Windows client, caused between 25% and 30% of Skype's supernodes (i.e. the ones running the same buggy Windows client) to crash. The resulting surge of traffic on the remaining supernodes, exacerbated by millions of users restarting their crashed Windows clients at the same time, essentially forced most of the remaining supernodes to shut down in self-defense.

The result: Skype was unavailable for many users for at least 24 hours, and it took Skype engineers two days to build enough extra supernodes to bring everything back to normal.
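To see why losing 25% to 30% of supernodes can take down most of the rest, a toy model helps. The sketch below is purely illustrative and is not Skype's actual architecture: the node count, the capacity figures, the total load and the function name `simulate_cascade` are all made-up assumptions. It spreads a fixed load evenly over the surviving supernodes, and any node pushed past its capacity shuts itself down, which pushes more load onto the others. The restart surge from crashed clients is not modeled, and it would only make things worse.

```python
def simulate_cascade(capacities, crash_fraction, total_load):
    """Toy cascade model (all numbers hypothetical, not Skype's).

    capacities     -- per-supernode capacity, in arbitrary load units
    crash_fraction -- share of supernodes lost to the initial bug
    total_load     -- total traffic, shared evenly by live supernodes

    Returns the number of live supernodes after each step.
    """
    n_crashed = round(len(capacities) * crash_fraction)
    # Assumption: the buggy-client crash simply removes the first n nodes.
    alive = list(capacities[n_crashed:])
    history = [len(capacities), len(alive)]

    while alive:
        load_per_node = total_load / len(alive)
        # A node whose share exceeds its capacity shuts down to protect itself.
        survivors = [c for c in alive if c >= load_per_node]
        if len(survivors) == len(alive):
            break  # network is stable again
        alive = survivors
        history.append(len(alive))
    return history


# 100 hypothetical supernodes with capacities between 120 and 215 units,
# carrying 10,000 units of traffic (100 per node when all are healthy).
capacities = [120 + 5 * (i % 20) for i in range(100)]

print(simulate_cascade(capacities, 0.10, 10_000))  # [100, 90]
print(simulate_cascade(capacities, 0.30, 10_000))  # [100, 70, 55, 28, 0]
```

In this made-up setup, losing 10% of the supernodes is absorbed by the spare capacity of the rest, but losing 30% tips the network over: each round of self-protective shutdowns overloads the next tier until nothing is left.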

Rabbe said Skype was taking measures to prevent a similar failure in future - chiefly, working on better ways to get automatic software fixes to its users (evidently the Windows "bug" had been detected before the failure, and Skype already had a fix for it) and improving its software testing procedures, as well as finding ways to detect supernode problems more quickly.

Some critics will undoubtedly characterize the episode as a serious blow to Skype's credibility as a communications provider. Overall, though, I don't see Skype suffering too much from this - not as long as it follows through on the improvements necessary to keep this type of failure from happening again. In a way, Skype is lucky this happened while its user base was still mainly consumers using the free service. Either way, when Skype next updates its subscriber figures, they're not likely to be lower.

Charm offensive

One reason for that may turn out to be the way Skype handled the crisis - by keeping users updated via its blog and YouTube, posting a full post-mortem of what happened, and providing credits for paid users affected by the blackout. During a scheduled press conference at this year's CES event in Las Vegas, CEO Tony Bates - rather than dodge the issue - kicked off proceedings by bringing it up, explaining what happened and apologizing for it. Compare that to the way Steve Jobs handled the iPhone 4's antenna issues, and you can see why Skype's charm offensive would convince most users to stick around.

That said, Skype may have its work cut out for it in terms of its ambitions to target enterprises. Certainly some CIOs aren't likely to entrust critical business communications to a P2P network after this.

Still, that's one reason why Skype has been courting operators to partner with it on the mobile front. Skype may be able to boost a cellco's voice and SMS business by "x" percent, but Skype also benefits from having access to the kind of network reliability that traditional networks already have.

That will be even more important as Skype pushes further into mobile video via its January acquisition of mobile video start-up Qik. It will be interesting to see what role the Great Supernode Crash of 2010 will play as Skype goes to the negotiating table with operators this year.