Throughput

The number of requests or tokens an AI system can process per unit of time. Techniques such as batching, speculative decoding, and model parallelism increase throughput, often at the cost of higher per-request latency. High throughput is essential for serving many concurrent users cost-effectively.
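A minimal sketch of why batching raises throughput, using a hypothetical cost model (the fixed-overhead and per-request constants below are illustrative assumptions, not measured values): each forward pass has a fixed overhead, so packing more requests into one pass amortizes that overhead.

```python
def forward_pass_latency(batch_size: int) -> float:
    # Hypothetical cost model: fixed per-pass overhead plus a small
    # per-request cost (seconds). Real systems measure these empirically.
    return 0.05 + 0.005 * batch_size

def throughput(batch_size: int) -> float:
    # Requests completed per second when serving full batches.
    return batch_size / forward_pass_latency(batch_size)

for b in (1, 8, 32):
    print(f"batch={b:>2}  throughput={throughput(b):6.1f} req/s")
```

Under this model, throughput grows with batch size even though each individual request waits slightly longer, which is the core latency/throughput trade-off in serving.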

Related terms

Latency (AI)
Inference
Batch Inference