As the summer of 2022 came to a close, Meta CEO Mark Zuckerberg gathered his top lieutenants for a five-hour review of the company’s computing capacity, focusing on its ability to do cutting-edge artificial intelligence work, according to a company memo dated Sept. 20 and seen by Reuters.
The social media giant had a thorny problem: despite high-profile investments in AI research, it had been slow to adopt expensive AI-friendly hardware and software systems for its core business. That hampered its ability to keep up with innovation at scale even as it increasingly depended on AI to support its growth, according to the memo, company statements, and interviews with 12 people familiar with the changes, who spoke on condition of anonymity to discuss internal competition.
“In terms of developing for AI, we have a large gap in our tooling, workflows, and processes,” Santosh Janardhan, the new head of infrastructure, wrote in the memo, which was posted on Meta’s internal message board in September and is being reported here for the first time. “We need to invest heavily here.”
To support AI work, Meta would have to “fundamentally shift our physical infrastructure design, our software systems, and our approach to providing a stable platform,” the memo added.
For more than a year, Meta has been engaged in a sizable effort to get its AI infrastructure into shape. Although the company has publicly acknowledged that it is “playing a little bit of catch-up” on AI hardware trends, details of the overhaul, including capacity constraints, leadership changes, and a shelved AI chip project, have not previously been reported.
Asked about the memo and the restructuring, Meta spokesperson Jon Carvill said the company “has a proven track record in creating and deploying state-of-the-art infrastructure at scale combined with deep expertise in AI research and engineering.”
“As we add new AI-powered experiences to our family of applications and consumer products, we’re confident in our ability to keep enhancing the capabilities of our infrastructure to suit both our immediate and long-term needs,” Carvill added. He would not say whether Meta had given up on its AI chip.
Janardhan and other executives declined interview requests made through the company.
The overhaul raised Meta’s capital expenditures by around $4 billion a quarter, nearly double its 2021 spending, according to company disclosures, and forced it to postpone or cancel previously scheduled data center expansions in four locations.
Those investments coincided with a period of severe financial strain for Meta, which since November has been laying off staff at a scale not seen since the dotcom bust.
Meanwhile, Microsoft-backed OpenAI’s ChatGPT surged after its Nov. 30 debut to become the fastest-growing consumer application in history, sparking an arms race among tech giants to release products using so-called generative AI. Unlike other AI, which recognizes patterns in data, generative AI also creates human-like written and visual content in response to prompts.
Five of the sources said generative AI devours vast amounts of computing power, intensifying the urgency of Meta’s capacity scramble.
FALLING BEHIND: Those five sources said a major contributor to the problem was Meta’s late adoption of the graphics processing unit, or GPU, for AI development.
GPU chips are ideally suited to artificial intelligence processing because they can perform many tasks at once, which lets them quickly process billions of pieces of data.
However, GPUs are also more expensive than other chips due to chipmaker Nvidia Corp’s (NVDA.O) 80% market share and dominant position in supporting software, according to the sources.
Nvidia did not respond to a request for comment for this story.
Instead, until last year, Meta ran AI workloads on its fleet of commodity central processing units (CPUs), the workhorse chip of the computing industry that has long populated data centers. AI workloads, however, performed poorly on those chips.
Two of those sources said the company also began using a custom chip that it had designed in-house for inference, an AI process in which algorithms trained on vast amounts of data make decisions and generate responses to prompts.
By 2021, the two sources claimed, the two-pronged strategy had proven to be slower and less effective than one based on GPUs, which were also more adaptable in running various models than Meta’s processor.
Meta declined to comment on the effectiveness of its AI processor.
Four of the sources claimed that as Zuckerberg steered the company toward the metaverse, a collection of digital worlds made possible by augmented and virtual reality, a capacity crunch was impeding its ability to use AI to counter threats like the emergence of social media rival TikTok and Apple’s changes to ad privacy.
Those missteps caught the attention of Peter Thiel, a former Meta board member who abruptly resigned in early 2022.
According to two sources familiar with the conversation, Thiel complained to Zuckerberg and his executives at a board meeting before he left that they were too focused on the metaverse and complacent about Meta’s core social media business, which left the company open to TikTok’s challenge.
Meta declined to comment on the conversation.
CATCH-UP: Instead of rolling out Meta’s own custom inference chip on a large scale as originally planned, executives reversed course and placed orders in 2022 for billions of dollars’ worth of Nvidia GPUs, one source said.
Meta declined to comment on the order.
By that time, Meta had already fallen behind rivals like Google, which had begun deploying its own custom-built AI chip, called the TPU, in 2015.
That spring, executives also began reorganizing Meta’s AI units, naming two new engineering leaders in the process, including Janardhan, the author of the September memo.
More than a dozen executives left Meta during the months-long upheaval, according to their LinkedIn profiles and a source familiar with the departures, a nearly complete turnover of the AI infrastructure leadership.
Meta also began redesigning its data centers to accommodate the incoming GPUs, which demand more power and generate more heat than CPUs and must be packed closely together with specialized networking between them.
The facilities required 24 to 32 times the networking capacity and new liquid cooling systems to manage the clusters’ heat, meaning they needed to be “entirely redesigned,” according to Janardhan’s memo and four people familiar with the project, the specifics of which have not previously been made public.
Data center construction, which was put on hold while the company switched to the new designs, will resume later this year, according to Carvill, the Meta spokesperson. He declined to comment on the chip project.
TRADE-OFFS: While scaling up its GPU capacity, Meta has so far had little to show publicly as rivals like Microsoft and Google promote the launches of their own generative AI products.
Chief Financial Officer Susan Li has said the company is not currently allocating much of its compute to generative work, noting that “basically all of our AI capacity is going towards ads, feeds, and Reels,” Meta’s TikTok-like short video format that is popular with younger users.
Four of the sources said Meta did not give generative AI products a high priority before the November launch of ChatGPT. Even though the company’s research lab FAIR, or Facebook AI Research, has been sharing concepts for the technology since late 2021, the sources said, the company was not focused on turning its highly regarded research into products.
That is changing as investor interest skyrockets. In February, Zuckerberg announced a new top-level generative AI team that he said would “turbocharge” the company’s efforts in the field.
Chief Technology Officer Andrew Bosworth said last month that generative AI was the area in which he and Zuckerberg were investing the most effort, and that Meta expects to release a product this year.
According to two people familiar with the new team, its work is still in its early stages and is focused on building a foundation model, a core program that can later be fine-tuned and adapted for different products.
Several teams at the company have been building generative AI products for more than a year, according to Carvill, the Meta spokesperson. He acknowledged that the work has advanced since ChatGPT’s arrival.