Amid the rapid iteration of large models, compute cost and processing efficiency remain central concerns for the industry. Fujitsu recently unveiled a new architecture called PHOTON (Parallel Hierarchical Operation for Top-down Networks), which aims to break through the performance bottlenecks that traditional Transformer models hit in complex scenarios.

The Transformer architecture that dominates AI today is powerful, but it struggles with long texts and high-concurrency multi-query tasks: it must repeatedly read historical information from memory, which slows processing and adds to the load on GPUs. Fujitsu's research team designed PHOTON to sidestep this pain point.


The core advantage of PHOTON lies in its hierarchical processing mechanism. Unlike traditional Transformers, which operate at the level of individual tokens, PHOTON introduces semantic layering, which reduces computational complexity and significantly improves parallelism. For multi-query tasks, the architecture also streamlines decision-making: it applies a "majority voting" or "best choice" strategy and reaches a conclusion in a single inference pass.
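The two selection strategies named above can be illustrated with a minimal Python sketch. This is not Fujitsu's implementation: the function names, the sample answers, and the scores are all hypothetical, and the sketch only shows how a final answer might be picked once several candidate answers are available.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def best_choice(scored_answers):
    """Return the answer with the highest score from (answer, score) pairs."""
    return max(scored_answers, key=lambda pair: pair[1])[0]


# Hypothetical candidate answers produced for one multi-query task.
answers = ["42", "41", "42", "17"]
# Hypothetical scores (for example log-likelihoods); higher is better.
scored = [("41", -1.2), ("42", -0.3), ("17", -0.9)]

print(majority_vote(answers))
print(best_choice(scored))
```

Majority voting needs only the answers themselves, while best choice assumes that each candidate comes with some quality score. Which of the two fits better depends on whether such a score is available.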

Test data shows that on small models with 600M, 900M, and 1.2B parameters, PHOTON delivers very high throughput with very low memory usage. On the 1.2B-parameter model in particular, its multi-query performance reaches 475 times that of a mainstream Transformer architecture, greatly improving resource scheduling efficiency.

Because the architecture needs less KV cache per iteration, the system can support more iterations for a given amount of memory. This is a significant gain for agent systems that must handle large volumes of I/O. Although some quality metrics show a slight trade-off, PHOTON's leap in computational efficiency makes it a promising approach to reducing the cost of running AI.
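To make the memory argument concrete, here is a rough Python sketch of the usual KV cache size estimate for a standard Transformer, and of how many cached sequences fit in a fixed memory budget. The model shape, the 16 GiB budget, and the eight-times reduction are assumptions chosen purely for illustration. They are not PHOTON's configuration or measured results.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Standard Transformer KV cache size for one sequence, in bytes."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


def sequences_that_fit(memory_budget_bytes, bytes_per_sequence):
    """How many sequences' caches fit in a fixed memory budget."""
    return memory_budget_bytes // bytes_per_sequence


# Hypothetical model shape and budget, chosen only for illustration.
baseline = kv_cache_bytes(num_layers=24, num_kv_heads=16, head_dim=64, seq_len=4096)
budget = 16 * 1024**3  # 16 GiB set aside for the cache

# ASSUMPTION: an 8x smaller cache. This is not a measured PHOTON figure.
reduced = baseline // 8

print(baseline // 1024**2, "MiB per sequence")
print(sequences_that_fit(budget, baseline), "sequences at baseline")
print(sequences_that_fit(budget, reduced), "sequences with the smaller cache")
```

With these made-up numbers the baseline cache takes 384 MiB per sequence, so 42 sequences fit in 16 GiB, while the eight-times smaller cache fits 341. The same budget stretches further whenever each step needs less cache.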

Fujitsu is now actively promoting applications of the architecture, hoping that innovation at the algorithm level will provide a lighter, more efficient foundation for future intelligent applications.