Apple's Machine Learning Research team recently published a paper introducing an AI agent called SceneScout. The technology analyzes street view images to generate detailed environmental descriptions for visually impaired people, helping them familiarize themselves with an area before visiting it.
Many visually impaired people hesitate to travel independently because they lack detailed knowledge of unfamiliar environments. Existing tools such as Microsoft's Soundscape app can describe a user's surroundings, but they are designed for use on site rather than for advance preparation. As a result, the landmark and navigation information available before a trip rarely conveys the environmental context these users need. SceneScout was created to fill this gap.
SceneScout is an AI agent driven by a multimodal large language model and offers two main modes. The "route preview" mode describes elements visible along a planned route, for example alerting users to tactile landmarks such as roadside trees at a turn. The "virtual exploration" mode lets users move freely through street view imagery to build a more intuitive picture of an environment.
In user studies, participants reported that SceneScout greatly enhanced their understanding of the environment, because it surfaces information they cannot obtain through existing tools. The study found that SceneScout's descriptions were 72% accurate overall, rising to 95% for stable visual elements. Participants also suggested improvements, such as personalizing descriptions and adjusting the described viewpoint to better match a pedestrian's position.
Participants also hoped SceneScout could deliver real-time street view descriptions synchronized with their walking position, even providing visual information through bone conduction headphones while on the move. Using a device's gyroscope and compass, SceneScout could then point out details in the surrounding environment, further improving the user experience.
Although the paper does not mean Apple will necessarily launch related products or services, it offers insight into how Apple might apply this technology. In the future, by combining AI with real-time data, Apple may be able to build more convenient tools for visually impaired people.
Key points:
🌍 SceneScout is an AI agent designed to provide detailed environmental descriptions for visually impaired people, helping them understand the terrain of unfamiliar locations in advance.
🔍 The technology offers two modes, "route preview" and "virtual exploration," both of which generate environmental information from street view images.
📈 User studies show that SceneScout's descriptions are largely accurate, and participants suggested adding personalization and real-time feedback.