It was December 2012, and Doug Burger was standing in front of Steve Ballmer, trying to predict the future.
The prototype was a dedicated box with six FPGAs, shared by a rack full of servers. If the box went on the fritz, or if the machines needed more than six FPGAs—increasingly likely given the complexity of the machine learning models—all those machines were out of luck. Bing’s engineers hated it. “They were right,” Larus says.
So Burger’s team spent many more months building a second prototype. This one was a circuit board that plugged into each server and included only one FPGA. But it also connected to all the other FPGA boards on all the other servers, creating a giant pool of programmable chips that any Bing machine could tap into.
That was the prototype that got Qi Lu on board.
He gave Burger the money to build and test over 1,600 servers equipped with FPGAs. The team spent six months building the hardware with help from manufacturers in China and Taiwan, and they installed the first rack in an experimental data center on the Microsoft campus. Then, one night, the fire suppression system went off by accident. They spent three days getting the rack back in shape—but it still worked.
Over several months in 2013 and 2014, the test showed that Bing’s “decision tree” machine-learning algorithms ran about 40 times faster with the new chips. By the summer of 2014, Microsoft was publicly saying it would soon move this hardware into its live Bing data centers. And then the company put the brakes on.
Searching for More Than Bing
Bing dominated Microsoft’s online ambitions in the early part of the decade, but by 2015 the company had two other massive online services: the business productivity suite Office 365 and the cloud computing service Microsoft Azure. And like all of their competitors, Microsoft executives realized that the only efficient way of running a growing online empire is to run all services on the same foundation. If Project Catapult was going to transform Microsoft, it couldn’t be exclusive to Bing. It had to work inside Azure and Office 365, too.
The problem was, Azure executives didn’t care about accelerating machine learning. They needed help with networking. The traffic bouncing around Azure’s data centers was growing so fast, the service’s CPUs couldn’t keep pace. Eventually, people like Mark Russinovich, the chief architect on Azure, saw that Catapult could help with this too—but not the way it was designed for Bing. His team needed programmable chips right where each server connected to the primary network, so they could process all that traffic before it even got to the server.
The first prototype of the FPGA architecture was a single box shared by a rack of servers (Version 0). Then the team switched to giving individual servers their own FPGAs (Version 1). And then they put the chips between the servers and the overall network (Version 2). WIRED
So the FPGA gang had to rebuild the hardware again. With this third prototype, the chips would sit at the edge of each server, plugging directly into the network, while still creating a pool of FPGAs that was available for any machine to tap into. That started to look like something that would work for Office 365, too. Project Catapult was ready to go live at last.
Larus describes the many redesigns as an extended nightmare—not because they had to build new hardware, but because they had to reprogram the FPGAs every time. “That is just horrible, much worse than programming software,” he says. “Much more difficult to write. Much more difficult to get correct.” It’s finicky work, like trying to change tiny logic gates on the chip.
Now that the final hardware is in place, Microsoft faces that same challenge every time it reprograms these chips. “It’s a very different way of seeing the world, of thinking about the world,” Larus says. But the Catapult hardware costs less than 30 percent of everything else in the server, consumes less than 10 percent of the power, and processes data twice as fast as the company could without it.
The rollout is massive. Microsoft Azure uses these programmable chips to route data. On Bing, which has an estimated 20 percent of the worldwide search market on desktop machines and about 6 percent on mobile phones, the chips are facilitating the move to the new breed of AI: deep neural nets. And according to one Microsoft employee, Office 365 is moving toward using FPGAs for encryption and compression as well as machine learning—for all of its 23.1 million users. Eventually, Burger says, these chips will power all Microsoft services.
Wait—This Actually Works?
“It still stuns me,” says Peter Lee, “that we got the company to do this.” Lee oversees an organization inside Microsoft Research called NExT, short for New Experiences and Technologies. After taking over as CEO, Nadella personally pushed for the creation of this new organization, and it represents a significant shift from the 10-year reign of Ballmer. It aims to foster research that can see the light of day sooner rather than later—that can change the course of Microsoft now rather than years from now. Project Catapult is a prime example. And it is part of a much larger change across the industry. “The leaps ahead,” Burger says, “are coming from non-CPU technologies.”
All the Internet giants, including Microsoft, now supplement their CPUs with graphics processing units, chips designed to render images for games and other highly visual applications. When these companies train their neural networks to, for example, recognize faces in photos—feeding in millions and millions of pictures—GPUs handle much of the calculation. Some giants like Microsoft are also using alternative silicon to execute their neural networks after training. And even though it’s crazily expensive to custom-build chips, Google has gone so far as to design its own processor for executing neural nets, the tensor processing unit.
With its TPUs, Google sacrifices long-term flexibility for speed. It wants to, say, eliminate any delay when recognizing commands spoken into smartphones. The trouble is that if its neural networking models change, Google must build a new chip. But with FPGAs, Microsoft is playing a longer game. Though an FPGA isn’t as fast as Google’s custom build, Microsoft can reprogram the silicon as needs change. The company can reprogram not only for new AI models, but for just about any task. And if one of those designs seems likely to be useful for years to come, Microsoft can always take the FPGA programming and build a dedicated chip.
A newer version of the final hardware, V2, a card that slots into the end of each Microsoft server and connects directly to the network. Clayton Cotterell for WIRED
Microsoft’s services are so large, and they use so many FPGAs, that they’re shifting the worldwide chip market. The FPGAs come from a company called Altera, and Intel executive vice president Diane Bryant tells me that Microsoft is why Intel acquired Altera last summer—a deal worth $16.7 billion, the largest acquisition in the history of the largest chipmaker on Earth. By 2020, she says, a third of all servers inside all the major cloud computing companies will include FPGAs.
It’s a typical tangle of tech acronyms. CPUs. GPUs. TPUs. FPGAs. But it’s the subtext that matters. With cloud computing, companies like Microsoft and Google and Amazon are driving so much of the world’s technology that those alternative chips will drive the wider universe of apps and online services. Lee says that Project Catapult will allow Microsoft to continue expanding the powers of its global supercomputer until the year 2030. After that, he says, the company can move toward quantum computing.
Later, when we talk on the phone, Nadella tells me much the same thing. They’re reading from the same Microsoft script, touting a quantum-enabled future of ultrafast computers. Considering how hard it is to build a quantum machine, this seems like a pipe dream. But just a few years ago, so did Project Catapult.
Correction: This story originally implied that the Hololens headset was part of Microsoft’s NExT organization. It was not.