Stepfun has officially released its latest open-source foundation model, Step3.5Flash. Designed specifically for agent scenarios, the model combines strong reasoning with ultra-fast responses, aiming to give developers a smarter, more stable, and more cost-effective "agent brain".


As a highly targeted lightweight model, Step3.5Flash has achieved breakthroughs in multiple dimensions:

  • Ultra-fast inference: decoding speeds of up to 350 TPS (tokens per second), with code-related tasks benefiting the most.

  • Performance on par with closed-source models: in core agent scenarios and mathematical reasoning tasks, it matches mainstream closed-source large models.

  • Stability on long-chain tasks: it stays reliable on complex tasks with long logical chains and efficiently handles an ultra-long 256K-token context.

Technical Architecture: Balancing Efficiency and Depth

Step3.5Flash adopts an advanced sparse MoE (Mixture of Experts) architecture with a total parameter count of 196 billion, of which only about 11 billion parameters are activated per token. To further improve throughput, the model introduces MTP-3 (multi-token prediction), letting it predict 3 tokens at a time and doubling decoding efficiency. In addition, by combining sliding-window attention with global attention, the model can accurately capture the key points in long texts while significantly reducing computational cost.
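
The article does not disclose how Step3.5Flash actually wires the sliding window and global attention together, so the snippet below is only a minimal NumPy sketch of one common way to mix the two: each query sees a local causal window plus a handful of always-visible "global" tokens. The window size, number of global tokens, and function name are illustrative assumptions, not the model's real configuration.

```python
import numpy as np

def hybrid_attention_mask(seq_len: int, window: int, n_global: int) -> np.ndarray:
    """Causal mask combining a local sliding window with a few global tokens.
    Entry [i, j] is True if query position i may attend to key position j.
    All sizes here are illustrative, not Step3.5Flash's actual settings."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    causal = j <= i                   # never attend to future tokens
    local = (i - j) < window          # keys inside the sliding window
    global_keys = j < n_global        # first n_global tokens visible to every query
    return causal & (local | global_keys)

if __name__ == "__main__":
    # Tiny example: 10 tokens, window of 3, 2 global tokens.
    print(hybrid_attention_mask(seq_len=10, window=3, n_global=2).astype(int))
```

The appeal of this kind of hybrid is visible even in the toy mask: most positions only attend to a fixed-size neighborhood, so cost grows roughly linearly with sequence length instead of quadratically, while the global tokens keep long-range information reachable.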

Real-world Testing Across Scenarios: From Code to Edge-Cloud Collaboration

In practical application demonstrations, Step3.5Flash has shown diverse capabilities:

  • Smart programming: from a plain text description alone, it can write a complete, high-performance visualization platform built on the WebGL 2.0 engine.

  • Complex computation: without calling external tools, it can quickly work through demanding calculations such as summing arithmetic sequences and accumulating factorials (see the reference sketch after this list).

  • Edge-cloud collaboration: acting as a "cloud-based brain", it can break a user's vague request (such as comparing prices across platforms) into concrete search and scraping sub-tasks, greatly reducing the burden on the local execution side and keeping results reliable.
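
The specific problems from the demo are not published, so the short reference computation below only illustrates the two task types named above with made-up numbers; it is the kind of check one might run to verify the model's in-context arithmetic, not a reproduction of the demo itself.

```python
import math

# Hypothetical instances of the two task types; the actual demo problems
# were not published, so these numbers are illustrative only.

# Arithmetic sequence summation: a_1 = 3, common difference d = 7, n = 1000 terms.
a1, d, n = 3, 7, 1000
seq_sum = n * (2 * a1 + (n - 1) * d) // 2       # S_n = n(2*a_1 + (n-1)*d) / 2
print("arithmetic sequence sum:", seq_sum)       # 3,499,500

# Factorial accumulation: sum of k! for k = 1..20.
fact_sum = sum(math.factorial(k) for k in range(1, 21))
print("sum of factorials 1!..20!:", fact_sum)
```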

Currently, Step3.5Flash is fully available on mainstream platforms, including GitHub, Hugging Face, and OpenRouter. To lower the barrier to local deployment, Stepfun has specifically optimized the model's performance on personal workstations (such as NVIDIA DGX and the Apple M4 Max). The company has also announced that training of the Step4 model has begun and has invited developers worldwide to help define the next generation of agent foundation models.

  • OpenRouter offers free access, so you can upgrade your agent at zero cost (a minimal API-call sketch follows this list): https://openrouter.ai/stepfun/step-3.5-flash

  • Download from GitHub for quick deployment and build your own agent: https://github.com/stepfun-ai/Step-3.5-Flash/tree/main

  • Get model weights on HuggingFace: https://huggingface.co/stepfun-ai/Step-3.5-Flash
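
OpenRouter exposes an OpenAI-compatible chat completions endpoint, so trying the model from Python can look roughly like the sketch below. The model slug is taken from the OpenRouter link above; the API key and the prompt are placeholders you would supply yourself.

```python
from openai import OpenAI  # OpenRouter speaks the OpenAI-compatible API

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder: use your own key
)

response = client.chat.completions.create(
    model="stepfun/step-3.5-flash",  # slug from the OpenRouter page linked above
    messages=[
        {"role": "user",
         "content": "Sum the arithmetic sequence 3, 10, 17, ... over 1000 terms."},
    ],
)
print(response.choices[0].message.content)
```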