Apple's AI research team recently introduced LiTo (Surface Light Field Tokenization), a large model for 3D generation. It targets a long-standing challenge in 3D reconstruction: generating a complete 3D object with high-fidelity, view-dependent lighting effects from a single 2D image.

The core of LiTo is a novel unified 3D latent representation that captures an object's geometry and appearance in a single latent space:
Efficient Encoding: It compresses complex surface light field data into compact sets of vectors that jointly describe an object's geometry and how its surface interacts with light.
Bidirectional Mechanism: It uses an encoder-decoder architecture. The encoder extracts geometric structure and appearance features; the decoder reverses the process, reproducing view-dependent effects such as specular highlights and Fresnel reflections.
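
For readers unfamiliar with the terms, a surface light field assigns a color to each surface point and viewing direction, and Fresnel reflection is one reason that color changes with the viewing angle. The sketch below is a toy illustration only, not LiTo's representation or code: it uses Schlick's well-known approximation of Fresnel reflectance, and the function `toy_surface_light_field` and its parameters are hypothetical names introduced here.

```python
import math


def schlick_fresnel(cos_theta: float, f0: float = 0.04) -> float:
    """Schlick's approximation of Fresnel reflectance.

    cos_theta: cosine of the angle between surface normal and view direction.
    f0: reflectance at normal incidence (0.04 is a common value for dielectrics).
    """
    c = max(0.0, min(1.0, cos_theta))  # clamp to the valid range
    return f0 + (1.0 - f0) * (1.0 - c) ** 5


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def toy_surface_light_field(normal, view_dir, diffuse_rgb, reflected_rgb, f0=0.04):
    """Hypothetical toy: color leaving one surface point toward the viewer.

    Blends a view-independent diffuse color with a reflected color,
    weighted by the Fresnel term. This is an assumption for illustration,
    not the model described in the article.
    """
    n = normalize(normal)
    v = normalize(view_dir)
    cos_theta = sum(a * b for a, b in zip(n, v))
    f = schlick_fresnel(cos_theta, f0)
    return tuple((1.0 - f) * d + f * r for d, r in zip(diffuse_rgb, reflected_rgb))


if __name__ == "__main__":
    normal = (0.0, 0.0, 1.0)
    red = (0.8, 0.1, 0.1)      # diffuse surface color
    white = (1.0, 1.0, 1.0)    # color of the reflected environment
    head_on = toy_surface_light_field(normal, (0.0, 0.0, 1.0), red, white)
    grazing = toy_surface_light_field(normal, (1.0, 0.0, 0.05), red, white)
    print("head-on:", head_on)
    print("grazing:", grazing)
```

Viewed head-on, the point shows mostly its diffuse color; near grazing angles the reflected color dominates. Any representation that aims for consistent lighting across viewpoints has to reproduce this kind of angle-dependent change coherently.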

Performance: Lighting Consistency Across Multiple Viewpoints
To train LiTo, the research team used a 3D dataset containing thousands of objects. Experimental results show:
Resolving Directional Bias: LiTo strictly follows the camera coordinate system, which resolves the incorrect object orientation commonly seen in similar models.
Leading Metrics: In multi-view lighting consistency, LiTo improves on TRELLIS, the current leading model, by approximately 37%.
The result further lowers the barrier to 3D content creation and is expected to support higher-quality material generation for augmented reality (AR) and spatial computing devices such as Vision Pro.
