As the wave of generative AI continues to surge, computing power has become the core resource that tech giants are competing for. Recently, Google was forced to tighten its resource supply due to a sharp increase in demand for the Gemini AI platform, aiming to cope with the growing workload from developers and enterprises.

Since the spring of 2025, the number of requests for the Gemini API has doubled, making this core computing resource in short supply. To ensure fair use of the ecosystem, Google officially implemented a computing quota-based usage limit on May 17, 2026, using a rolling update mechanism similar to data plans. For requests exceeding the quota, the system will trigger a frequency limit, ensuring all partners can still receive basic call support amid resource shortages.

In these changes, Meta has been most significantly impacted. It is reported that due to Meta's demand for the model far exceeding that of other customers, Google had previously clearly stated it could not fully meet the computing quota requested by Meta. This sudden restriction directly disrupted the progress of several internal AI projects at Meta. In response to the computing pressure and to align with the company's internal strategy of reducing AI R&D costs, Meta has urgently asked employees to optimize the calling process, improving the efficiency of code and token usage.

Industry insiders point out that this resource adjustment reflects the serious reality faced by global AI computing infrastructure: although the capabilities of large models continue to rise, the speed of building foundational computing infrastructure that supports their efficient operation still lags behind the explosive growth of application demands. For companies like Meta, which deeply rely on cloud computing power, how to maintain R&D speed under resource constraints will become the key issue in the next stage of competition.