Alignment

The process of ensuring an AI model's behavior matches human intentions, values, and expectations. Common techniques include reinforcement learning from human feedback (RLHF), constitutional AI, and direct preference optimization (DPO). Alignment aims to make models helpful, harmless, and honest.
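
As a concrete illustration of one of these techniques, the DPO objective fits in a few lines of code. The sketch below is a minimal example, assuming the summed per-token log-probabilities of each preferred and dispreferred response have already been computed under the trained policy and a frozen reference model; the function name and arguments are illustrative, not taken from any particular library.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Minimal sketch of the DPO loss (Rafailov et al., 2023).

    Each argument is a tensor of summed log-probabilities of a response
    under the policy or the frozen reference model; beta controls how far
    the policy may drift from the reference.
    """
    # Implicit rewards: how much more likely each response is under the
    # policy than under the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between preferred and dispreferred responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Unlike RLHF, this formulation needs no separate reward model or reinforcement-learning loop: the preference data is optimized against directly with a standard classification-style loss.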

Related terms

RLHF (Reinforcement Learning from Human Feedback)
DPO (Direct Preference Optimization)
Constitutional AI
AI Safety