MiMo-V2-Flash: Xiaomi MoE Model for Low-Latency Workloads

MiMo-V2-Flash is Xiaomi's low-latency mixture-of-experts (MoE) model, built for throughput, responsiveness, and strong price-performance. For teams that need cost-efficient AI inference and budget-aware deployment at scale, MiMo-V2-Flash is positioned as the practical production layer of the MiMo-V2 family.

Overview

In the supplied MiMo-V2 research material, MiMo-V2-Flash is positioned as Xiaomi's efficiency-focused MoE model in the lineup. It balances a large MoE base with fast inference, 256K context, and one of the most aggressive pricing profiles in the family.

Scale

309B / 15B MoE

The report lists 309B total parameters with 15B active parameters, providing a substantial base while keeping active compute comparatively lean.
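As a quick sanity check on what that ratio implies (the parameter figures come from the report; the arithmetic below is ours):

```python
total_params = 309e9   # total parameters reported for MiMo-V2-Flash
active_params = 15e9   # active parameters per token under the MoE design

# Fraction of the full parameter set exercised on each forward pass.
active_fraction = active_params / total_params
print(f"{active_fraction:.1%}")  # 4.9% of parameters active per token
```

In other words, roughly one in twenty parameters participates in any given token's compute, which is where the "substantial base, lean activation" framing comes from.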

Speed

150+ TPS

MiMo-V2-Flash is described as supporting 150+ tokens per second, making it well suited to high-frequency interaction layers and fast response scenarios.
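A rough feel for what 150+ TPS means in practice (the response length below is a hypothetical, and prefill time is ignored for simplicity):

```python
tokens_per_second = 150   # reported lower bound for MiMo-V2-Flash
response_tokens = 500     # hypothetical reply length for a chat turn

# Time to stream the full reply, ignoring prefill and network overhead.
generation_seconds = response_tokens / tokens_per_second
print(f"{generation_seconds:.2f}s")  # 3.33s for a 500-token reply
```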

Context

256K Tokens

The model supports a 256K-token context window, giving production teams meaningful long-context capacity without stepping into flagship-level cost.

This page summarizes Xiaomi MiMo public materials and focuses on MiMo-V2-Flash as a distinct product entity for efficient, low-latency deployment and API-based production workloads.

Efficiency Profile

MiMo-V2-Flash is defined in the report by a combination of MoE scale, compact activation cost, long context, and highly competitive API rates.

Hybrid Attention 5:1

The supplied material attributes a 5:1 hybrid attention architecture to MiMo-V2-Flash, helping it maintain inference efficiency while still supporting long documents and multi-step interaction.

Built for Throughput

Rather than taking the flagship reasoning role, MiMo-V2-Flash is presented as the model for frequent production traffic where responsiveness and unit economics matter most.

Low Activation Cost

With 15B active parameters under an MoE design, the model is framed as a practical choice for scale-sensitive deployment where inference efficiency is central.

Long-Context Practicality

Its 256K context window allows Flash to support richer sessions, larger prompts, and document-driven use cases without moving up to the 1M context tier of MiMo-V2-Pro.
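To put the 256K window in concrete terms, here is a back-of-the-envelope context budget; the output reservation and tokens-per-page figure are assumptions, not values from the report:

```python
context_window = 256_000   # reported context window in tokens
reserved_output = 4_000    # hypothetical reservation for the model's reply
tokens_per_page = 500      # rough assumption for a dense page of text

# Tokens left for prompt material after reserving room for the answer.
usable = context_window - reserved_output
print(usable // tokens_per_page)  # 504 pages of input material
```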

Recommended Use Cases

The report explicitly recommends MiMo-V2-Flash for high-frequency foundational interactions. That makes it a practical model layer for speed-sensitive production systems.

High-Frequency Interaction

MiMo-V2-Flash fits chat surfaces, assistant panels, frequent website interactions, and operational flows where latency and unit cost need to stay tightly controlled.

Production Fallback Layer

It can serve as a fast execution layer beneath a more capable reasoning model, handling routine prompts, retrieval-driven answers, and repeated structured interactions at lower cost.
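That two-tier layering can be sketched as a simple router. Everything below is illustrative: the model names, the complexity heuristic, and the thresholds are assumptions, and a real deployment would call an actual inference API instead of returning a string:

```python
# Sketch of a two-tier routing layer: send routine traffic to a fast,
# cheap model and escalate only when a heuristic flags complexity.
ROUTINE_MAX_CHARS = 2000
COMPLEX_MARKERS = ("prove", "derive", "step by step", "analyze")

def pick_model(prompt: str) -> str:
    """Return the model tier a prompt should be routed to."""
    text = prompt.lower()
    needs_reasoning = any(marker in text for marker in COMPLEX_MARKERS)
    if needs_reasoning or len(prompt) > ROUTINE_MAX_CHARS:
        return "mimo-v2-pro"    # hypothetical heavier reasoning tier
    return "mimo-v2-flash"      # fast execution layer for routine traffic

print(pick_model("What are your store hours?"))           # mimo-v2-flash
print(pick_model("Derive the closed form step by step"))  # mimo-v2-pro
```

In production the heuristic would typically be a classifier or confidence signal rather than keyword matching, but the shape of the layer is the same: routine prompts stay on the cheap tier.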

Cost-Efficient Automation

For batch-like or repetitive automation scenarios, the report's positioning makes Flash a fit for workloads where response quality must remain strong but premium model pricing is unnecessary.

Traffic Scaling

Its price-performance profile makes it the most natural candidate in the family for scenarios where request volume matters as much as peak capability.

API Pricing

The supplied report positions MiMo-V2-Flash as the lowest-cost model in the lineup among the text-focused variants.

Model | Input / 1M Tokens | Output / 1M Tokens | Positioning
MiMo-V2-Flash | $0.10 | $0.30 | Designed for high-frequency, low-latency scenarios with strong price-performance.
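At the listed rates, a workload's cost is easy to estimate; the traffic and token figures below are hypothetical, only the per-token prices come from the table:

```python
input_price = 0.10 / 1_000_000    # $ per input token (listed rate)
output_price = 0.30 / 1_000_000   # $ per output token (listed rate)

requests_per_day = 100_000        # assumed traffic volume
input_tokens = 800                # assumed average prompt size
output_tokens = 300               # assumed average response size

# Daily spend = requests x (prompt cost + response cost per request).
daily_cost = requests_per_day * (
    input_tokens * input_price + output_tokens * output_price
)
print(f"${daily_cost:.2f} per day")  # $17.00 per day at these assumptions
```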

Official Resources