Last May, Sandra Rivera, a top executive at the chip giant Intel, got some alarming news.
Engineers had worked for more than five years to develop a powerful new microprocessor to carry out computing chores in data centers and were confident they had finally gotten the product right. But signs of a potentially serious technical flaw surfaced during a regular morning meeting to discuss the project.
The issue was so troublesome that Sapphire Rapids, the code name for the microprocessor, had to be delayed — the latest in a series of setbacks for one of Intel’s most important products in years.
“We were pretty dejected,” said Ms. Rivera, an executive vice president in charge of Intel’s data center and artificial intelligence group. “It was a painful decision.”
The launch of Sapphire Rapids wound up being pushed from mid-2022 to Tuesday, nearly two years later than once expected. The lengthy development of the product — which combines four chips in one package — underscores some of the challenges facing a turnaround effort at Intel when the United States is trying to assert its dominance in the foundational computer technology.
Since the 1970s, Intel has been a leading player in the small slices of silicon that run most electronic devices, best known for a variety called microprocessors, which act as electronic brains in most computers. But the Silicon Valley company in recent years lost its longtime lead in manufacturing technology, which helps determine how fast chips can compute.
Patrick Gelsinger, who became Intel’s chief executive in 2021, has vowed to restore its manufacturing edge and build new U.S. factories. He was a leading figure as Congress debated and passed legislation last summer to reduce U.S. dependence on chip manufacturing in Taiwan, which China claims as its territory.
The bumpy development of Sapphire Rapids has implications for whether Intel can rebound to deliver future chips on time. That’s an issue that could affect scores of computer makers and cloud service providers, not to mention the millions of consumers who tap into online services likely to be powered by Intel technology.
“What we want is a stable cadence that is predictable,” said Kirk Skaugen, the executive vice president leading server sales at Lenovo, a Chinese company that is planning 25 new systems based on the new processor. “Sapphire Rapids is the start of a journey.”
Photo: A newly processed silicon wafer containing Sapphire Rapids chips, at Intel’s headquarters in Santa Clara, Calif., this week. Credit: Anastasiia Sapon for The New York Times
For Intel, the pressure is on. Along with falling demand for chips used in personal computers, the company faces stiff competition in the server chips that are its most profitable business. That issue has worried Wall Street, with Intel’s market value plunging more than $120 billion since Mr. Gelsinger took charge.
Intel plans to host an online event on Tuesday to discuss Sapphire Rapids, which is named after a portion of the Colorado River. More formally, the product is called the 4th Gen Intel Xeon Scalable processor.
In an interview, Mr. Gelsinger said Sapphire Rapids had the makings of a hit, despite the delays. He picked Ms. Rivera in 2021 to take over the unit developing it, where she is using lessons from the experience to change how Intel designs and tests its products. He said Intel had conducted several internal reviews of what happened with Sapphire Rapids, and “we’re not done.”
Sapphire Rapids began in 2015, with discussions among a small group of Intel engineers. The product was the company’s first attempt at a new approach in chip design. Companies now routinely pack tens of billions of tiny transistors on each piece of silicon, but competitors such as Advanced Micro Devices had started making processors from multiple chips bundled together in plastic packages.
Intel engineers came up with a design with four chips, each one sporting 15 processor “cores” that act like individual calculators for general-purpose computing jobs. The company also decided to include extra blocks of circuitry for special tasks — including artificial intelligence and encryption — and to communicate with other components, such as chips that store data.
The interaction among so many elements is “very complex,” said Shlomit Weiss, who jointly leads Intel’s design engineering group. “Complexity usually brings problems.”
The Sapphire Rapids team grappled with bugs: flaws, caused by designer errors or manufacturing glitches, that can make a chip calculate incorrectly, work slowly or stop functioning. The team was also affected by delays in the product’s manufacturing process.
But by December 2019, the engineers had hit a milestone called “tape-in.” That’s when electronic files containing a completed design move to a factory to make sample chips.
The sample chips arrived in early 2020, as Covid-19 forced lockdowns. The engineers soon got the computing cores on Sapphire Rapids communicating with one another, said Nevine Nassif, the project’s chief engineer. But more work than expected remained.
One key chore was “validation,” a testing process in which Intel and its customers run software on sample chips to simulate computing chores and catch bugs. Once flaws are found and fixed, designs may go back to the factory to make new test chips, which typically takes more than a month.
Repeating that process led to missed deadlines. Ms. Nassif said Sapphire Rapids was designed to counter AMD’s Milan processor, which was introduced in March 2021. But it still wasn’t ready by that June, when Intel announced a delay until the next year to allow more validation.
That was when Ms. Rivera stepped in. The longtime Intel executive had successfully built a business in networking products before being appointed in 2019 as chief people officer.
“We had to get our execution mojo back,” Mr. Gelsinger said. “I needed somebody who was going to run to the fire and fix this business for me.”
In October 2021, Ms. Rivera and a top design executive established weekly Sapphire Rapids status meetings, held each Monday at 7 a.m. Those gatherings showed steady progress in finding and fixing bugs, she said, bolstering confidence about starting production in the second quarter of 2022.
Then came the discovery of the flaw last May. Ms. Rivera would not describe it in detail but said it had affected the processor’s performance. In June, she used an investor event to announce a delay of at least a quarter, which pushed Sapphire Rapids later than the launch of a competing AMD chip in November.
“We were ready to ship,” Ms. Nassif said. The final delay “was just so sad given all the effort that had gone into it.”
Ms. Rivera saw a series of lessons from the setbacks. One was simply that Intel packed too many innovations into Sapphire Rapids, rather than delivering a less ambitious product sooner.
She also concluded that the team should have spent more time on perfecting and testing its design using computer simulations. Finding bugs before they are in sample chips is less expensive, and would have made it possible to remove features to simplify the product, Ms. Rivera said. She has since moved to bolster Intel’s simulation and validation abilities.
“We used to have a lot of this kind of muscle that we let atrophy,” Ms. Rivera said. “Now we’re rebuilding.”
She also determined that Intel had scheduled more products than its engineers and customers could easily handle. So she streamlined that product road map, including pushing back a successor to Sapphire Rapids to 2024 from 2023.
More broadly, Ms. Rivera and other Intel executives have pushed the organization to develop better processes for documenting technical issues and sharing that information inside and outside the company.
Some Intel customers say the communication has gotten better.
“Has everything gone well? No,” said Lenovo’s Mr. Skaugen, who once ran Intel’s server chip business. “But we were surprised a lot less than we were in the past.”