Microsoft recently released Fara-7B, a computer use agent (CUA) with 7 billion parameters, designed to carry out complex tasks directly on the user's device. Because the model is small enough to run locally, it addresses a major enterprise concern, data security: users can automate sensitive workflows, such as managing internal accounts or handling confidential company data, without that information leaving the device.
Fara-7B operates web pages visually, much as a person does with a mouse and keyboard. The model perceives a page through screenshots and predicts the specific coordinates at which to perform actions such as clicking, typing, and scrolling. Unlike systems that rely on "accessibility trees," Fara-7B relies entirely on pixel-level visual data, which lets it interact with pages even when the underlying code is complex or confusing.
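The screenshot-in, action-out cycle described above can be sketched as a simple loop. This is only an illustration under stated assumptions: `Action`, `run_agent`, and `scripted_policy` are hypothetical names rather than part of any Fara-7B API, and the scripted policy stands in for the model's actual prediction step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One predicted UI action (hypothetical structure, not Fara-7B's format)."""
    kind: str                     # "click", "type", "scroll", or "done"
    x: Optional[int] = None       # pixel coordinates for pointer actions
    y: Optional[int] = None
    text: Optional[str] = None    # text to enter for "type" actions


def run_agent(
    task: str,
    take_screenshot: Callable[[], bytes],
    predict_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe the screen, predict one action, execute it, and repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                      # pixels only
        action = predict_action(task, screenshot, history)  # model call
        history.append(action)
        if action.kind == "done":                           # task finished
            break
        execute(action)                                     # mouse/keyboard
    return history


def scripted_policy(task: str, screenshot: bytes, history: List[Action]) -> Action:
    """Stand-in for the model: replays a fixed script of three actions."""
    script = [
        Action("click", x=412, y=96),   # assumed location of a search box
        Action("type", text=task),
        Action("done"),
    ]
    return script[min(len(history), len(script) - 1)]
```

Calling `run_agent("weather in Paris", lambda: b"", scripted_policy, print)` prints the click and type actions and returns a three-item history. In a real agent, `predict_action` would call the model and `execute` would drive a browser.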

In performance tests, Fara-7B achieved a 73.5% task success rate on the WebVoyager benchmark, surpassing both the much larger GPT-4o (65.1%) and the same-size UI-TARS-1.5-7B (66.4%). Fara-7B is also highly efficient, completing tasks in about 16 steps on average, whereas UI-TARS-1.5-7B needs about 41.
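To put those figures side by side, the short calculation below derives the percentage-point gaps and the step ratio from the numbers quoted above. The helper names are illustrative, and the inputs are the article's rounded values, so the results are approximate.

```python
# Figures as quoted in the article (rounded values).
SUCCESS_RATE_PCT = {"Fara-7B": 73.5, "GPT-4o": 65.1, "UI-TARS-1.5-7B": 66.4}
AVG_STEPS = {"Fara-7B": 16, "UI-TARS-1.5-7B": 41}


def point_gap(model: str, baseline: str) -> float:
    """Success-rate difference in percentage points (model minus baseline)."""
    return round(SUCCESS_RATE_PCT[model] - SUCCESS_RATE_PCT[baseline], 1)


def step_ratio(model: str, baseline: str) -> float:
    """How many times more steps the baseline takes than the model."""
    return AVG_STEPS[baseline] / AVG_STEPS[model]


if __name__ == "__main__":
    for baseline in ("GPT-4o", "UI-TARS-1.5-7B"):
        print(baseline, point_gap("Fara-7B", baseline), "points behind Fara-7B")
    print("UI-TARS-1.5-7B step ratio:", step_ratio("Fara-7B", "UI-TARS-1.5-7B"))
```

On these numbers, UI-TARS-1.5-7B takes roughly two and a half times as many steps per task as Fara-7B.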
Promising as Fara-7B is, it carries the same risks as other AI models, including misjudgments and execution errors when following complex instructions. To mitigate these, Fara-7B is trained to recognize "key points": situations that require the user's personal data or consent. At a key point, it pauses and asks for the user's approval before proceeding, so that it does not carry out irreversible operations on its own. Microsoft's research team also designed a user interface called Magentic-UI, which aims to balance these pauses against the user experience so that approval requests do not fatigue users.
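The pause-and-approve behavior can be illustrated with a small gate placed in front of action execution. This is a sketch under assumptions, not Microsoft's implementation: in Fara-7B the model itself is trained to recognize key points, whereas here the hypothetical flags `needs_personal_data` and `irreversible` are set by hand, and `ask_user` stands in for whatever prompt the interface shows.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    """An action the agent wants to take (hypothetical structure)."""
    description: str
    needs_personal_data: bool = False   # e.g. filling in an email address
    irreversible: bool = False          # e.g. submitting a payment


def is_key_point(action: ProposedAction) -> bool:
    """A key point is any step needing personal data or user consent."""
    return action.needs_personal_data or action.irreversible


def execute_with_approval(
    action: ProposedAction,
    execute: Callable[[ProposedAction], None],
    ask_user: Callable[[ProposedAction], bool],
) -> bool:
    """Run the action, pausing for approval at key points.

    Returns True if the action was executed and False if the user declined.
    """
    if is_key_point(action) and not ask_user(action):
        return False        # user declined, so nothing is executed
    execute(action)
    return True
```

Routine actions such as scrolling pass straight through, so the user is interrupted only when personal data or an irreversible step is involved. That is the balance between safety and user fatigue that the paragraph above describes.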

The development of Fara-7B also reflects the trend toward knowledge distillation, in which the capabilities of complex systems are compressed into smaller, more efficient models. Future versions will focus on making the model smarter rather than simply larger, and will explore reinforcement learning in live sandbox environments.
Microsoft has made an MIT-licensed version of Fara-7B available on Hugging Face and Microsoft Foundry so that users can experiment and build prototypes, but the model is not yet suitable for direct deployment in mission-critical tasks.
Key Points:
🌟 Fara-7B is a computer use agent that runs locally, with a focus on data security and privacy protection.
⚙️ The model works from screenshots of web pages rather than their underlying code, and it completes tasks in far fewer steps than the comparable UI-TARS-1.5-7B.
🛡️ Fara-7B recognizes "key points" and asks users to confirm before critical operations, enhancing security.
