Latency (AI)

The time delay between sending a request to an AI model and receiving the response. Lower latency improves the user experience in interactive applications such as chatbots and search. It depends on model size, hardware, network conditions, and optimizations such as caching and quantization.
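As a minimal sketch, latency can be measured by timing a request with a monotonic clock. The `fake_model` function below is a hypothetical stand-in for a real model endpoint; any real API client would be timed the same way.

```python
import time

def measure_latency(call, *args, **kwargs):
    """Time a single request; returns (result, latency in seconds)."""
    start = time.perf_counter()
    result = call(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical stand-in for a real model endpoint.
def fake_model(prompt):
    time.sleep(0.05)  # simulate inference delay
    return f"echo: {prompt}"

result, latency = measure_latency(fake_model, "hello")
print(f"latency: {latency * 1000:.1f} ms")
```

`time.perf_counter()` is preferred over `time.time()` here because it is monotonic and has higher resolution, so short delays are measured reliably.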

Related terms

Inference
Throughput
Model Serving