Attention Head
A single attention mechanism within a multi-head attention layer. Each head has its own learned query, key, and value projections, so each independently learns to focus on different relationships: some track syntactic structure, others semantic similarity, others positional patterns. Together, multiple heads capture richer representations than any single head could; a sketch of one head's computation follows.
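A minimal sketch of what one head computes, in NumPy. The dimensions (`d_model=8`, `d_head=4`), the function name `attention_head`, and the random matrices standing in for learned projections are all illustrative, not any particular model's values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, seq_len = 8, 4, 5

# Each head owns its own projection matrices; in a trained model these
# are learned, here they are random placeholders.
W_q = rng.standard_normal((d_model, d_head))
W_k = rng.standard_normal((d_model, d_head))
W_v = rng.standard_normal((d_model, d_head))

def attention_head(x):
    """Scaled dot-product attention for one head.

    x: (seq_len, d_model) token representations.
    Returns: (seq_len, d_head) context vectors.
    """
    q = x @ W_q  # queries (seq_len, d_head)
    k = x @ W_k  # keys    (seq_len, d_head)
    v = x @ W_v  # values  (seq_len, d_head)

    # Similarity of every query to every key, scaled by sqrt(d_head).
    scores = q @ k.T / np.sqrt(d_head)

    # Softmax over keys (numerically stable form).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each output is a weighted sum of value vectors.
    return weights @ v

x = rng.standard_normal((seq_len, d_model))
print(attention_head(x).shape)  # (5, 4)
```

In a multi-head layer, several such heads run in parallel on the same input; their outputs are concatenated and passed through a final linear projection, which is what lets each head specialize while the layer as a whole combines their perspectives.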