Inference

The process of using a trained model to generate predictions or outputs from new inputs. Inference happens every time you ask a chatbot a question or generate an image. Optimizing inference latency and cost is critical for production AI deployment.
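The distinction from training is that inference applies fixed, already-learned parameters to new data without updating them. A minimal sketch, assuming a toy linear model with illustrative weight values (not from any real training run):

```python
# Minimal sketch of inference: applying fixed, already-trained
# parameters to a new input to produce a prediction.
# The weights and bias below are illustrative, not real trained values.

def predict(features, weights, bias):
    """Forward pass of a trained linear model; no learning happens here."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# "Trained" parameters, held fixed at inference time
weights = [0.4, -0.2, 0.1]
bias = 0.5

# A new, previously unseen input
new_input = [1.0, 2.0, 3.0]
prediction = predict(new_input, weights, bias)
print(round(prediction, 2))  # 0.8
```

In production systems the same pattern holds at much larger scale, which is why latency and cost per forward pass become the central optimization targets.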

Related terms

Model Serving, Latency (AI), Throughput, GPU (Graphics Processing Unit)