Microsoft recently released Agent Lightning, an open-source framework designed to optimize multi-agent systems through reinforcement learning (RL). Agent Lightning can convert real agent behavior into RL transitions without changing the existing agent architecture, thereby improving the performance of large-scale language models (LLM).

Agent Lightning models agents as a decision-making process, specifically formalizing agents as partially observable Markov decision processes. An agent's observation is the current input, its action is a model call, and the reward can be either a terminal or intermediate reward. The framework extracts the call logs of the agent model, along with input, output, and reward information, filtering out unnecessary noise to generate clean transition data for training.
The framework adopts a "training and deployment decoupling" approach, with the Lightning Server handling training and service, and providing an API interface compatible with OpenAI, making it easy to call updated models. Meanwhile, the Lightning Client captures call logs in the existing agent runtime and sends the data back to the server in real time. This design maintains tight integration with tools, browsers, and other dependencies, while placing GPU training on the server layer.

Agent Lightning supports two tracking paths. The default path uses OpenTelemetry for data collection, making it convenient to send agent telemetry information to a standard collector. There is also a lightweight embedded tracker suitable for teams that do not want to deploy OpenTelemetry. Ultimately, all data is stored in the same location for training purposes.
In terms of experiments, the research team evaluated three tasks: text-to-SQL, retrieval-augmented generation, and math QA. The text-to-SQL task uses the Spider benchmark, covering over 10,000 questions and 200 databases. Retrieval-augmented generation uses the MuSiQue benchmark, built on a Wikipedia-scale index containing 21 million documents. Math QA uses the Calc X dataset, performing calculations through tool calls. Training on each task showed stable reward improvements.
Paper: https://arxiv.org/abs/2508.03680v1
Key Points:
🌟 Agent Lightning is an open-source framework that optimizes multi-agent systems without restructuring the existing system.
🚀 The framework models agents as partially observable Markov decision processes, extracting clean training transition data.
📈 Experiments show significant performance improvements of Agent Lightning on tasks such as text-to-SQL, retrieval-augmented generation, and math QA.
