309B / 15B MoE
The report lists 309B total parameters with 15B active parameters, providing a substantial base while keeping active compute comparatively lean.
MiMo-V2-Flash is our high-efficiency model designed for throughput, responsiveness, and strong price-performance. For teams that need low-latency inference and budget-aware deployment at scale, MiMo-V2-Flash is positioned as the practical production layer in the MiMo-V2 family.
The MiMo-V2 report positions MiMo-V2-Flash as the extreme-efficiency member of the lineup. It balances a large MoE base with fast inference, long context, and one of the most aggressive pricing profiles in the family.
MiMo-V2-Flash is described as supporting 150+ tokens per second, making it well suited to high-frequency interaction layers and fast response scenarios.
The model supports a 256K-token context window, giving production teams meaningful long-context capacity without stepping into flagship-level cost.
MiMo-V2-Flash is defined in the report by a combination of MoE scale, compact activation cost, long context, and highly competitive API rates.
The report attributes a 5:1 hybrid attention architecture to MiMo-V2-Flash, which helps it maintain inference efficiency while still supporting long documents and multi-step interaction.
Rather than taking the flagship reasoning role, MiMo-V2-Flash is presented as the model for frequent production traffic where responsiveness and unit economics matter most.
With 15B active parameters under an MoE design, the model is framed as a practical choice for scale-sensitive deployment where inference efficiency is central.
Its 256K context window allows Flash to support richer sessions, larger prompts, and document-driven use cases without moving up to the 1M context tier of MiMo-V2-Pro.
The report explicitly recommends MiMo-V2-Flash for high-frequency foundational interactions. That makes it a practical model layer for speed-sensitive production systems.
MiMo-V2-Flash fits chat surfaces, assistant panels, frequent website interactions, and operational flows where latency and unit cost need to stay tightly controlled.
It can serve as a fast execution layer beneath a more capable reasoning model, handling routine prompts, retrieval-driven answers, and repeated structured interactions at lower cost.
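The two-tier pattern above can be sketched as a simple router. The model names come from the report, but the complexity heuristic and the routing interface are hypothetical placeholders, not a real MiMo API:

```python
# Sketch of routing routine traffic to MiMo-V2-Flash while escalating
# complex requests to a stronger reasoning tier. The classify_complexity
# heuristic and model constants below are illustrative assumptions.

FLASH_MODEL = "MiMo-V2-Flash"  # fast, low-cost execution layer
PRO_MODEL = "MiMo-V2-Pro"      # stronger reasoning tier

def classify_complexity(prompt: str) -> str:
    # Toy heuristic: very long or explicitly multi-step prompts
    # get escalated; everything else stays on the fast tier.
    if len(prompt) > 2000 or "step by step" in prompt.lower():
        return "complex"
    return "routine"

def route(prompt: str) -> str:
    """Return the model name that should handle this prompt."""
    if classify_complexity(prompt) == "complex":
        return PRO_MODEL
    return FLASH_MODEL
```

In practice the classifier would be replaced by something more robust (a lightweight model or request metadata), but the shape stays the same: cheap, frequent traffic lands on Flash by default.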
For batch-like or repetitive automation scenarios, the Flash positioning in the report makes it suitable where response quality must remain strong but premium model pricing is unnecessary.
Its price-performance profile makes it the most natural candidate in the family for scenarios where request volume matters as much as absolute intelligence ceiling.
The report positions MiMo-V2-Flash as the lowest-cost text-focused model in the lineup.
| Model | Input / 1M Tokens | Output / 1M Tokens | Positioning |
|---|---|---|---|
| MiMo-V2-Flash | $0.10 | $0.30 | Designed for high-frequency, low-latency scenarios with strong price-performance. |
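The table rates translate directly into per-request cost. A minimal sketch, using the published $0.10 / $0.30 per-million-token prices; the token counts in the example are illustrative assumptions, not figures from the report:

```python
# Estimate MiMo-V2-Flash request cost from the per-million-token rates
# in the pricing table above.
INPUT_RATE = 0.10 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.30 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at MiMo-V2-Flash rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example (illustrative sizes): a 1,500-token prompt with a 400-token reply
cost = request_cost(1_500, 400)  # 0.00015 + 0.00012 = $0.00027 per request
```

At these rates, even a million such requests per day stays in the low hundreds of dollars, which is the unit-economics argument the report makes for Flash.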
Return to the MiMo-V2 family overview to compare MiMo-V2-Flash with MiMo-V2-Pro, MiMo-V2-Omni, and MiMo-V2-TTS.