Cerebras Inference API Fully Opened: Developers Receive One Million Free Tokens Daily

On June 2, 2025, the artificial intelligence chip company Cerebras Systems announced that its inference API is now fully open to all developers, removing the previous waiting list restriction. This move marks a significant step for Cerebras in accelerating the development of generative AI applications and provides developers worldwide with efficient and rapid AI inference services.

According to Cerebras' official statement, developers can use up to 1 million tokens per day free of charge. This free quota provides developers with ample resources to build and test high-performance AI applications based on the Cerebras inference platform. Cerebras stated that its inference API significantly outperforms traditional GPU solutions in terms of speed, achieving up to 20 times faster inference than GPUs. It performs exceptionally well in real-time speech processing, video handling, complex reasoning models, and code generation scenarios. Test data shows that Cerebras' inference service can generate over 2,600 tokens per second when running the Llama4Scout model, far surpassing other GPU-based API providers.

Cerebras' inference API supports various mainstream open-source models, including Llama4 and Qwen3-32B. Developers can quickly integrate these models via simple API calls. Additionally, through collaborations with platforms like Hugging Face and Meta, Cerebras' inference API has been seamlessly integrated into these ecosystems, further lowering the barriers for developers. For example, the 5 million developers on Hugging Face only need to select Cerebras as their inference provider to directly experience its ultra-high performance.

Andrew Feldman, CEO of Cerebras, said: "We are committed to providing developers with the fastest AI inference service so they can build real-time, intelligent applications more efficiently. Opening the API and offering 1 million free tokens per day is an important step in empowering global innovation."

The full opening of this API not only offers cost-effective AI development opportunities for startups and independent developers but also provides enterprise users with efficient tools to build complex AI applications. Cerebras' high-performance inference capabilities, combined with its newly established six data centers in North America and Europe, are expected to further promote the widespread adoption of generative AI in fields such as healthcare, finance, and voice interaction.

Industry insiders pointed out that Cerebras' move may have a profound impact on the AI inference market, especially in its competition with traditional GPU suppliers like Nvidia. Cerebras demonstrates strong technical advantages with its unique large-sized wafer-scale engine (WSE-3). As inference demands continue to grow, Cerebras' open strategy may reshape the market landscape of AI infrastructure.

Study: Global AI Chipset Market to Exceed $700B with 31.8% CAGR

According to TMR Research, the global artificial intelligence chipset market size is expected to exceed $700 billion, with a compound annual growth rate of 31.8% from 2022 to 2031. The article discusses the development trends, application areas, and key players in the artificial intelligence chipset market, which is highly timely and valuable for readers interested in the artificial intelligence chipset market.

IBM Research: How AI & Automation Protect Businesses from Data Breaches

IBM's report provides sufficient evidence that artificial intelligence, automation, and threat intelligence can address data breaches throughout the lifecycle, reduce costs, and provide stronger evidence. The research found that integrating artificial intelligence and automation into security operations teams can reduce the lifecycle of data breaches by 33% and costs by 33.6%. However, currently, only 28% of enterprises widely apply artificial intelligence and automation. Many enterprises rely on legacy systems, which are easily bypassed by attackers. The significance of this article lies in emphasizing the effectiveness of artificial intelligence and automation in improving cybersecurity and calling on enterprises to widely adopt these technologies to protect data security.

Google's AGI Robot Breakthrough: 54 - Member Team's 7 - Month Work, High Generalization and Reasoning 解释：核心关键词为“谷歌AGI机器人”（Google's AGI Robot）和“新成果”（Breakthrough），标题简洁地概括了主要内容，以动词开头，符合英文习惯，且长度在规定范围内。

The robotics research team at Google DeepMind recently released a robotics project called RT-2. This project took 7 months to develop and uses a large model for training. RT-2 has capabilities such as symbol understanding, reasoning, and human recognition, and can think and complete tasks based on human instructions. By combining the large model with the robot's operational capabilities, RT-2 can accomplish tasks that involve logical leaps, such as from 'extinct animals' to 'plastic dinosaurs'. The results of this project performed well in various sub - category tests, with performance up to three times that of the previous generation of robot models. This research result demonstrates the potential of large models in robotics research and is expected to drive the development of robots in the future.

RWKV: Small Team Aims to Be Android of AI Era with Big Model

Meta Intelligence OS is a startup founded by Bloomberg. It has developed a series of large models based on the open-source model RWKV and aims to become the Android in the era of large models. The RWKV model has superior performance and low cost in inference tasks, thus attracting customers from industries such as finance, law firms, and smart hardware. The business model of Meta Intelligence OS is model customization based on private data and internal AI Agent development. The company hopes to solve the problems of API call latency and data security by deploying large models on terminal devices. Currently, RWKV versions are available on Windows, Mac, and Linux computers, and Android and iOS versions are also in development. Meta Intelligence OS is raising funds and collaborating with chip companies and computing power platforms to create benchmark customers. Luo Xuan said that the decisive battlefield for large models is on hardware, and both terminal devices and the cloud require dedicated chips.

SoftBank and Intel Team Up to Develop New Energy-Efficient AI Memory Chip with Power Consumption Halved

Recently, SoftBank and Intel have jointly developed a new AI-specific memory chip aimed at significantly reducing power consumption to provide more efficient support for Japan's AI infrastructure. According to Nikkei Asia, the goal of their cooperation is to design a new type of stacked DRAM chip. The wiring method of this chip will be different from the popular High Bandwidth Memory (HBM) on the market, expected to reduce power consumption by about 50%. Image source note: The image was generated by AI, and the image rights service provider is Midjourney.

Cerebras Inference API Fully Opened: Developers Receive One Million Free Tokens Daily

Related Recommendations

Study: Global AI Chipset Market to Exceed $700B with 31.8% CAGR

IBM Research: How AI & Automation Protect Businesses from Data Breaches

RWKV: Small Team Aims to Be Android of AI Era with Big Model

SoftBank and Intel Team Up to Develop New Energy-Efficient AI Memory Chip with Power Consumption Halved