Alignment

The process of ensuring that an AI model's behavior matches human intentions, values, and expectations. Common techniques include reinforcement learning from human feedback (RLHF), Constitutional AI, and direct preference optimization (DPO). The goal is often summarized as making models helpful, harmless, and honest.
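
As one concrete illustration of the techniques named above, the sketch below computes the direct preference optimization loss for a single preference pair. It assumes the sequence log-probabilities have already been computed by a trainable policy and a frozen reference model. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative choices, not taken from any particular library.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift from the reference
) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response
    under either the policy or the frozen reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # The loss is -log(sigmoid(z)), which equals softplus(-z).
    z = beta * (chosen_margin - rejected_margin)
    return softplus(-z)


if __name__ == "__main__":
    # Policy identical to the reference: no preference has been learned yet.
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy has shifted probability toward the human-preferred response.
    print(dpo_loss(-0.5, -2.0, -1.0, -1.0))
```

The loss falls as the policy raises the probability of the preferred response relative to the reference model and lowers that of the rejected one. In practice this quantity would be averaged over a batch and minimized with a gradient-based optimizer.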