The well-known open-source browser automation project BrowserUse has released its first in-house large language model, BU-30B-A3B-Preview. The model quickly drew wide attention and has been called a "new benchmark for web agents." Its cost-effectiveness and near-real-time speed substantially lower the barriers to AI-driven browser operation.

Model Architecture: MoE Design for a "Powerful Brain, Lightweight Body"

BU-30B-A3B-Preview uses a mixture-of-experts (MoE) architecture with 30B (30 billion) total parameters, of which only about 3B (3 billion) are activated for each token during inference. This keeps the model's capacity close to that of a 30B model while cutting the computation per token to roughly that of a 3B model, which is why BrowserUse positions it for deployment on a single consumer-grade GPU. The sparsity saves compute rather than memory, however: all 30B parameters still have to be loaded.
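To make the "30B total, 3B active" idea concrete, here is a minimal toy sketch of top-k expert routing in pure Python. The expert count, top-k value, and hidden size are hypothetical, chosen only so that 10% of the expert parameters are active (mirroring the 3B-of-30B headline ratio); they are not the model's real configuration, and real MoE models also count attention and embedding weights in their active total, which this toy omits.

```python
import math
import random

# Toy MoE layer. NUM_EXPERTS, TOP_K and DIM are hypothetical values chosen so
# that 2 of 20 experts (10%) are active, mirroring the 3B-of-30B headline ratio.
# They are NOT the real configuration of BU-30B-A3B-Preview.
NUM_EXPERTS = 20
TOP_K = 2
DIM = 4


def random_matrix(rows, cols, seed):
    """Deterministic random weights so the example is reproducible."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(xs):
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(v - peak) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


EXPERTS = [random_matrix(DIM, DIM, seed=i) for i in range(NUM_EXPERTS)]
ROUTER = random_matrix(NUM_EXPERTS, DIM, seed=1234)  # one score row per expert


def moe_forward(x):
    """Send one token vector through only its TOP_K best-scoring experts."""
    scores = matvec(ROUTER, x)
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    gates = softmax([scores[i] for i in chosen])  # weights over chosen experts only
    out = [0.0] * DIM
    for gate, i in zip(gates, chosen):
        y = matvec(EXPERTS[i], x)  # the other experts are never evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


# Expert parameters: all are stored, but only TOP_K experts' worth run per token.
params_per_expert = DIM * DIM
total_expert_params = NUM_EXPERTS * params_per_expert
active_expert_params = TOP_K * params_per_expert

output, used = moe_forward([0.5, -1.0, 0.25, 2.0])
print("experts used for this token:", used)
print(f"active share of expert parameters: {active_expert_params / total_expert_params:.0%}")
# → active share of expert parameters: 10%
```

Every expert's weights exist in memory, but the router picks only two of them per token, so the per-token compute scales with the active share rather than the total.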

The model is fine-tuned from Alibaba Cloud's Qwen3-VL-30B-A3B-Instruct and optimized specifically for browser automation. It accepts multimodal input (images plus text) and supports a context length of up to 32K tokens, enough to handle long, complex web pages.


Core Capabilities: Excellent DOM Understanding and Visual Reasoning

BU-30B-A3B-Preview performs strongly on browser operation tasks and covers the full range of web interactions, including precise element location, clicking, scrolling, and form filling. Its DOM (Document Object Model) understanding and visual reasoning let an AI agent interpret page layouts and screenshots much as a human would, which makes automated execution highly reliable.
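One common way to let a model "locate" elements is to hand it a compact, numbered list of the page's interactive elements, so it can answer with an action that refers to an index. The sketch below does this with Python's standard `html.parser`. It is a hypothetical illustration, not BrowserUse's actual code; the tag set, the `[index] <tag> label` format, and the label fallback rules are all assumptions.

```python
from html.parser import HTMLParser

# Hypothetical sketch, not BrowserUse's actual code: the tag set, the
# "[index] <tag> label" format and the label fallback rules are assumptions.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # elements with no closing tag or inner text


class InteractiveElementCollector(HTMLParser):
    """Collects interactive elements and the visible text inside them."""

    def __init__(self):
        super().__init__()
        self.elements = []    # one dict per interactive element, in document order
        self._current = None  # element whose inner text is being collected

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            element = {"tag": tag, "attrs": dict(attrs), "text": ""}
            self.elements.append(element)
            self._current = None if tag in VOID_TAGS else element

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current["tag"]:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current is not None and text:
            self._current["text"] = (self._current["text"] + " " + text).strip()


def serialize_interactive(html):
    """Return one '[index] <tag> label' line per interactive element."""
    parser = InteractiveElementCollector()
    parser.feed(html)
    parser.close()
    lines = []
    for index, el in enumerate(parser.elements):
        attrs = el["attrs"]
        # Prefer visible text, then common descriptive attributes.
        label = (el["text"] or attrs.get("placeholder")
                 or attrs.get("aria-label") or attrs.get("name") or "")
        lines.append(f"[{index}] <{el['tag']}> {label}".rstrip())
    return "\n".join(lines)


page = """
<form>
  <input name="q" placeholder="Search products">
  <button type="submit">Search</button>
</form>
<a href="/cart">View cart</a>
"""
print(serialize_interactive(page))
# [0] <input> Search products
# [1] <button> Search
# [2] <a> View cart
```

Given this listing plus a screenshot, a model could reply with something like "type into element 0, then click element 1"; the exact action format BrowserUse uses is not described in this article.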

BrowserUse emphasized that the model is particularly well suited to building web agent applications such as automated testing, data collection, and RPA (robotic process automation) workflows, and said it reached industry-leading results in the company's internal benchmarks.

Performance Highlights: Leading on Both Speed and Cost

Comparison data published by BrowserUse shows BU-30B-A3B-Preview well ahead of mainstream commercial models in both task completion speed and cost-effectiveness:

- Speed: each operation step takes about 1.2 seconds on average, giving the model a clear lead in overall task completion time.

- Cost: roughly 200 browser tasks can be reliably completed per dollar of compute, which BrowserUse puts at dozens of times the figure for some competing models.
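The two quoted figures translate directly into per-task cost and rough task duration. The sketch below only does that arithmetic; the 20-step task and the 10,000-task batch are hypothetical examples, and whether the 1.2-second average includes page-load time is not stated in the source.

```python
# The two rates come from BrowserUse's published comparison; the step count and
# task volume used in the examples below are hypothetical.
SECONDS_PER_STEP = 1.2   # quoted average time per browser action
TASKS_PER_DOLLAR = 200   # quoted browser tasks completed per dollar of compute


def cost_per_task(tasks_per_dollar=TASKS_PER_DOLLAR):
    """Dollars spent on a single task at the quoted rate."""
    return 1.0 / tasks_per_dollar


def batch_cost(num_tasks, tasks_per_dollar=TASKS_PER_DOLLAR):
    """Dollars needed for a batch of tasks at the quoted rate."""
    return num_tasks / tasks_per_dollar


def task_time(steps, seconds_per_step=SECONDS_PER_STEP):
    """Time for a task if every step takes the quoted average."""
    return steps * seconds_per_step


print(f"cost per task: ${cost_per_task():.3f}")   # $0.005
print(f"10,000 tasks: ${batch_cost(10_000):.2f}")  # $50.00
print(f"20-step task: {task_time(20):.1f} s")     # 24.0 s
```

At about half a cent per task, cost scales linearly with volume, which is the property that matters for the large-scale automation scenarios the article describes.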

Because the model is relatively small and suited to single-GPU deployment, developers can download and test it locally without incurring high cloud costs.
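How small "small" is depends on numeric precision, because an MoE model must keep every expert resident even though only about 3B parameters run per token. The rough estimate below assumes the headline 30B total and counts only the weights, ignoring the KV cache, activations, and runtime overhead. It suggests that fitting on a 24 GB consumer card generally requires quantization to around 4 bits.

```python
# Headline parameter count from the article; the real checkpoint may differ slightly.
# All experts must be resident in memory, so storage follows the TOTAL, not the
# ~3B active parameters (which only reduce per-token compute).
TOTAL_PARAMS = 30e9

# Approximate storage cost per parameter for common formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_gib(params, bytes_per_param):
    """Memory for the weights alone, in GiB (excludes KV cache and activations)."""
    return params * bytes_per_param / 2**30


for fmt, nbytes in BYTES_PER_PARAM.items():
    print(f"{fmt}: ~{weight_gib(TOTAL_PARAMS, nbytes):.1f} GiB of weights")
# bf16: ~55.9 GiB of weights
# int8: ~27.9 GiB of weights
# 4-bit: ~14.0 GiB of weights
```

In other words, the MoE design mainly buys speed and cost per token; memory footprint is governed by the total parameter count and the chosen precision.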

Open Source Significance: Accelerating the Development of the Web Agent Ecosystem

BU-30B-A3B-Preview is fully open source: its weights are published on Hugging Face, so any developer can download them and integrate the model with the open-source BrowserUse library. The release marks the arrival of an "efficient open-source era" in browser automation and is expected to spur more innovative applications.

AIbase Comment: BU-30B-A3B-Preview directly targets the two main pain points of using traditional large models for browser tasks: they are expensive and slow. For enterprises and developers that need large-scale web automation, it is a highly cost-effective option. As the community continues to optimize it, the model could become a standard choice in the web agent field. Interested readers can download and try it on Hugging Face now.

Hugging Face page: https://huggingface.co/browser-use/bu-30b-a3b-preview