Apple's Machine Learning Research team recently published a paper introducing an AI agent called SceneScout. The technology analyzes street view images to generate detailed environmental descriptions for visually impaired people, helping them familiarize themselves with an area before visiting it.
Many visually impaired people hesitate to travel independently because they lack detailed knowledge of unfamiliar environments. Existing tools such as Microsoft's Soundscape app can describe a user's surroundings, but they are designed for use on site rather than for advance preparation. As a result, the landmark and navigation information available before a trip rarely conveys the environmental context these users need. SceneScout was created to fill this gap.
SceneScout is an AI agent driven by a multimodal large language model and offers two main modes. The "route preview" mode describes elements visible along a planned route, for example alerting users to tactile landmarks such as roadside trees at a turn. The "virtual exploration" mode lets users move freely through street view imagery to build a more intuitive picture of an environment.
In user studies, participants reported that SceneScout greatly enhanced their understanding of the environment, because it surfaces information they cannot obtain through existing tools. The study found that SceneScout's descriptions were 72% accurate overall, rising to 95% for stable visual elements. Participants also suggested improvements, such as personalizing descriptions and adjusting the described viewpoint to better match a pedestrian's position.
Participants also hoped SceneScout could deliver real-time street view descriptions synchronized with their walking position, even providing visual information through bone conduction headphones while on the move. Using a device's gyroscope and compass, SceneScout could then point out details in the surrounding environment, further improving the user experience.
Although the paper does not mean Apple will necessarily launch related products or services, it offers insight into how Apple might apply this technology. In the future, by combining AI with real-time data, Apple may be able to build more convenient tools for visually impaired people.
Key points:
🌍 SceneScout is an AI agent designed to provide detailed environmental descriptions for visually impaired people, helping them understand the terrain of unfamiliar locations in advance.
🔍 The technology offers two modes, "route preview" and "virtual exploration," both of which generate environmental information from street view images.
📈 User studies show that SceneScout's descriptions are largely accurate, and participants suggested adding personalization and real-time feedback.