The "memory anxiety" that large models suffer when processing long texts may soon be a thing of the past. The Tokyo-based AI startup Sakana AI recently announced two technologies: Text-to-LoRA (T2L) and Doc-to-LoRA (D2L). Both rely on a hypernetwork, a network that generates weights for another network, which lets a large model "absorb" an extremely long document or pick up a new task in less than a second, without retraining.

AI developers have long faced a dilemma: either stuff long documents into the context window (causing slow responses and high memory usage), or pay a high price to fine-tune the model. Sakana AI offers a third option: pay the training cost once, up front, and then generate very small LoRA weight adapters on demand, making model adaptation cheap and fast.
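To make the size argument concrete, here is a minimal sketch of the standard LoRA formulation, in which a frozen weight matrix W is adjusted by a low-rank product: W + (alpha / r) * B * A. The toy dimensions and the helper names (`matmul`, `apply_lora`, `lora_param_count`) are illustrative assumptions and are not taken from Sakana AI's code.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, leaving the frozen W untouched."""
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def lora_param_count(d_in, d_out, rank):
    """Number of values stored by one adapter: A is r x d_in, B is d_out x r."""
    return rank * (d_in + d_out)


# Toy sizes, chosen only for illustration.
d_out, d_in, rank = 6, 8, 2
rng = random.Random(0)
weight = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
lora_a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(d_out)]  # zero B: the adapter starts as a no-op

adapted = apply_lora(weight, lora_a, lora_b, alpha=4.0)
```

At realistic sizes the saving is large: for a 4096 x 4096 matrix, the full weight holds 16,777,216 values, while `lora_param_count(4096, 4096, 8)` gives 65,536, which is why an adapter can be shipped as a small file.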
Doc-to-LoRA: Reducing Memory Requirements from 12GB to 50MB
This is the most striking result of the release. Handling a 128,000-token document (about 100,000 words) the conventional way requires more than 12GB of VRAM just to hold the document in context. With D2L, the model instead "digests" the same information into an adapter of less than 50MB.
Speed: Conventional approaches take 40 to 100 seconds to digest a document, while D2L takes less than 1 second.
Longer inputs: The model can handle text four times longer than its original context window while maintaining near-perfect accuracy on the "needle in a haystack" test.
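The gap between the two memory figures can be checked with back-of-the-envelope arithmetic. The sketch below assumes that the "traditional" cost is dominated by the attention key/value cache, and it uses a hypothetical model shape (26 layers, 4 key/value heads of size 256, 16-bit values, one rank-8 adapter per layer). None of these numbers come from the paper; they only show how the two quantities scale.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for every token in every layer."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value  # 2 = key + value
    return num_tokens * per_token


def lora_bytes(num_layers, d_in, d_out, rank, bytes_per_value=2):
    """Bytes for one rank-r adapter (A and B) on one weight matrix per layer."""
    return num_layers * rank * (d_in + d_out) * bytes_per_value


# Hypothetical model shape, chosen only for illustration.
cache = kv_cache_bytes(num_tokens=128_000, num_layers=26, num_kv_heads=4, head_dim=256)
adapter = lora_bytes(num_layers=26, d_in=9216, d_out=2304, rank=8)

print(f"KV cache: {cache / 1e9:.1f} GB")
print(f"Adapter:  {adapter / 1e6:.1f} MB")
print(f"Ratio:    {cache / adapter:.0f}x")
```

The key point is that the cache grows linearly with the number of tokens, while the adapter size depends only on the model shape and the rank, so it stays fixed however long the document is. The 12GB and 50MB figures quoted above come from the paper's own setup, which may differ from these assumed shapes.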
Text-to-LoRA: Customizing AI with Natural Language
Text-to-LoRA makes the model easier to steer. Users only need to describe a task in natural language (e.g., "help me solve a complex math competition problem"), and the system automatically generates a dedicated adapter that boosts performance on that task. Experiments show that, on math and logical reasoning tasks, adapters generated this way can even outperform models trained specifically for the task.
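The idea of turning a task description into adapter weights can be sketched with a toy hypernetwork, that is, a network whose output is the weights of another network. Everything here is an assumption made for illustration: `toy_embed` is a hashed bag-of-words stand-in for a real text encoder, `ToyHyperNetwork` is a single untrained linear layer, and the dimensions are arbitrary. The actual T2L architecture and training procedure are described in the paper and are not reproduced here.

```python
import hashlib
import random


def toy_embed(text, dim=16):
    """Hypothetical stand-in for a real text encoder: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


class ToyHyperNetwork:
    """One linear layer mapping a task embedding to flattened LoRA factors."""

    def __init__(self, embed_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        n_out = rank * (d_in + d_out)
        # Untrained random weights; a real hypernetwork would learn these.
        self.weights = [
            [rng.gauss(0, 0.02) for _ in range(embed_dim)] for _ in range(n_out)
        ]

    def generate(self, embedding):
        """Produce LoRA factors A (rank x d_in) and B (d_out x rank) in one pass."""
        flat = [sum(w * e for w, e in zip(row, embedding)) for row in self.weights]
        split = self.rank * self.d_in
        a_flat, b_flat = flat[:split], flat[split:]
        lora_a = [a_flat[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        lora_b = [b_flat[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return lora_a, lora_b


hyper = ToyHyperNetwork(embed_dim=16, d_in=8, d_out=6, rank=2)
task = "help me solve a complex math competition problem"
lora_a, lora_b = hyper.generate(toy_embed(task))
```

Generating an adapter here is a single forward pass with no gradient steps on the base model, which is the property that makes sub-second adaptation plausible. The output quality of a real system depends entirely on how the hypernetwork was trained.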
Cross-Modal Transfer: Letting Text Models "See" Images
The researchers also found a surprise: D2L has strong cross-modal capabilities. Once visual information is mapped into the parameters of a text-only model, that model, which has never seen an image, can classify images with an accuracy of **75.03%**.
Sakana AI's results not only greatly lower the barrier for individuals and companies to customize private AI models but also open up a new path toward lighter and smarter artificial general intelligence (AGI).
Paper: https://arxiv.org/pdf/2602.15902
