Amazon SageMaker AI Launches Real-Time Inference Endpoints Compatible with OpenAI API

Recently, Amazon SageMaker AI announced the launch of a real-time inference endpoint that supports OpenAI-compatible APIs. Users can simply change the endpoint URL to call models on SageMaker AI using tools such as the OpenAI SDK, LangChain, or Strands Agents, without needing additional client customization, SigV4 wrapping, or code rewriting.

This update opens up an /openai/v1 path for SageMaker AI endpoints, which can accept chat completion requests and return responses directly, including streaming output. All endpoints and inference components that use standard SageMaker AI APIs and SDKs have been enabled with OpenAI endpoints. By changing the URL, users' existing applications can seamlessly integrate.

SageMaker AI offers rich features that allow users to build multi-step AI agent workflows on their own infrastructure, such as using Strands Agents or LangChain. Users' agents can call models using the same OpenAI interface as their original framework, while the inference process runs on their own GPU instances. Additionally, users can host multiple models on the same SageMaker AI endpoint, such as Llama for general tasks, fine-tuned Mistral models for specific domains, and small models for classification—all accessible through the same OpenAI SDK.

To use these features, users need certain prerequisites, including having an AWS account and the appropriate permissions, installing the SageMaker and OpenAI Python SDKs, and preparing models stored in Amazon S3. In addition, using the SageMaker AI OpenAI-compatible endpoint requires Bearer Token authentication, and the SageMaker Python SDK includes tools to generate tokens, simplifying the authentication process.

In practice, users can easily deploy single-model endpoints or inference component endpoints to host multiple models on a single endpoint. With the OpenAI Python SDK, users can simply call these models to obtain the required inference results. The introduction of this new feature enables seamless integration between SageMaker AI and existing AI applications, providing users with a more efficient and flexible inference solution.

Key Points:
🌟 New OpenAI-Compatible API: The real-time inference endpoints of SageMaker AI now support the OpenAI API, allowing users to call models by simply changing the URL.
🛠️ Multi-Model Hosting: Users can host multiple models on the same endpoint, accessing them using the same OpenAI SDK.
🔑 Simplified Authentication Process: Supports Bearer Token authentication, making it easy for users to securely access SageMaker AI endpoints.

Preparations for Hong Kong IPO: Moonshot AI Officially Begins to Dismantle VIE Structure, Striving for a $20 Billion Capital Market

After completing a $2 billion funding round, Moonshot AI initiated an organizational restructuring, dismantling the VIE and red chip structure to remove regulatory obstacles for its Hong Kong IPO. The core of the restructuring involves converting offshore entities into domestic joint ventures, ensuring compliance with regulations for Chinese technology companies listed overseas, paving the way for the Hong Kong listing.

Google's AI Programming Tool for Android Helps Users Easily Create Applications

Google has launched the Android version of AI Studio, a programming tool that simplifies app development and is now available for pre-registration on Google Play. This tool enables users to build applications using artificial intelligence by following simple instructions through an intuitive interface and smart suggestions, making it suitable for both beginners and experienced developers.

Tencent Meeting Launches AI Simultaneous Interpretation Function: Real-time Translation Delay Reduced to 3 Seconds

Tencent Meeting has officially launched the AI Simultaneous Interpretation function, offering real-time Chinese-English translation to all users for the first time, aiming to improve communication efficiency in cross-border meetings and remote collaboration. The function keeps the translation delay within 3 seconds, achieving near-synchronous speech and translation, effectively solving issues of delay and interruption in traditional simultaneous interpretation, helping participants communicate more smoothly and avoid information loss and misunderstandings.

15 people created a movie in 14 days! ByteDance's Seedance 2.0 demonstrates the disruptive power of AI at Cannes

At the French Cannes Film Festival, ByteDance's Volcano Engine launched the video generation model Seedance 2.0 and demonstrated its commercial application in film production. Eight AI films created based on this model were showcased, including the world's first 95-minute AI feature film 'HELL GRIND' produced by the US platform Higgsfield, marking the acceleration of generative AI into the mainstream film industry.

Advertising as a Service: The AI-Driven Redefinition of Google Search, Integrating Paid Recommendations into the Chat Flow

At the 2026 I/O conference, Google announced the most significant restructuring of its search business in 25 years, fully integrating Gemini 3.5 Flash and launching an AI-driven new ad format. These ads break the traditional boundaries between search ads and results, evolving from passive displays to active conversational services. With Gemini's reasoning capabilities, commercial information is deeply integrated, enabling a smarter and more natural interactive experience.