Inference

The process of using a trained model to generate predictions or outputs from new inputs. Inference happens every time you ask a chatbot a question or generate an image. Optimizing inference latency and cost is critical for production AI deployment.
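The distinction from training is that inference applies fixed, already-learned parameters to new data without updating them. A minimal sketch, assuming a toy linear model with illustrative weight values (not from any real training run):

```python
# Minimal sketch of inference: applying fixed, already-trained
# parameters to a new input to produce a prediction.
# The weights and bias below are illustrative, not real trained values.

def predict(features, weights, bias):
    """Forward pass of a trained linear model; no learning happens here."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# "Trained" parameters, held fixed at inference time
weights = [0.4, -0.2, 0.1]
bias = 0.5

# A new, previously unseen input
new_input = [1.0, 2.0, 3.0]
prediction = predict(new_input, weights, bias)
print(round(prediction, 2))  # 0.8
```

In production systems the same pattern holds at much larger scale, which is why latency and cost per forward pass become the central optimization targets.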

Related terms

Model Serving, Latency (AI), Throughput, GPU (Graphics Processing Unit)