Latency (AI)

The time delay between sending a request to an AI model and receiving the response. Lower latency improves the user experience in interactive applications such as chatbots and search. It depends on model size, hardware, network conditions, and optimizations such as caching and quantization.
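As a minimal sketch, latency can be measured by timing a request with a monotonic clock. The `fake_model` function below is a hypothetical stand-in for a real model endpoint; any real API client would be timed the same way.

```python
import time

def measure_latency(call, *args, **kwargs):
    """Time a single request; returns (result, latency in seconds)."""
    start = time.perf_counter()
    result = call(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical stand-in for a real model endpoint.
def fake_model(prompt):
    time.sleep(0.05)  # simulate inference delay
    return f"echo: {prompt}"

result, latency = measure_latency(fake_model, "hello")
print(f"latency: {latency * 1000:.1f} ms")
```

`time.perf_counter()` is preferred over `time.time()` here because it is monotonic and has higher resolution, so short delays are measured reliably.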

Related terms

Inference
Throughput
Model Serving