Interpretability

The ability to understand an AI model's internal mechanisms — which features activate, how attention patterns form, and what representations the model learns. It differs from explainability in that it focuses on model internals rather than on explanations of model outputs.
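As a minimal sketch of the distinction, the toy model below exposes its hidden-layer activations so they can be inspected directly, rather than only explaining its output. The model, weights, and the "which features fired" query are hypothetical, for illustration only.

```python
# Toy two-layer network whose internals can be inspected (interpretability),
# as opposed to only examining its outputs (explainability).

def relu(x):
    return [max(0.0, v) for v in x]

def matvec(w, x):
    # Multiply a weight matrix (list of rows) by an input vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

class TinyModel:
    def __init__(self, w1, w2):
        self.w1, self.w2 = w1, w2
        self.activations = {}  # recorded internals, keyed by layer name

    def forward(self, x):
        h = relu(matvec(self.w1, x))
        self.activations["hidden"] = h  # expose internals for inspection
        return matvec(self.w2, h)

model = TinyModel(
    w1=[[1.0, -1.0], [0.5, 0.5]],
    w2=[[1.0, 1.0]],
)
out = model.forward([2.0, 1.0])

# An interpretability-style question: which hidden features activated
# (were nonzero) on this input?
active = [i for i, a in enumerate(model.activations["hidden"]) if a > 0]
```

Real interpretability tooling asks the same kind of question of large models, e.g. by recording intermediate activations with forward hooks instead of a hand-built `activations` dictionary.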

Related terms

Explainability · Attention Head · AI Safety