Alignment
The process of ensuring an AI model's behavior matches human intentions, values, and expectations. Techniques include reinforcement learning from human feedback (RLHF), constitutional AI, and direct preference optimization (DPO). Alignment aims to make models helpful, harmless, and honest.
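Direct preference optimization trains on pairs of responses to the same prompt, one preferred and one dispreferred, without fitting a separate reward model. Its per-pair loss is -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). Here π is the model being trained, π_ref is a frozen reference copy, and y_w and y_l are the preferred and dispreferred responses. The sketch below is minimal and assumes each sequence log-probability has already been summed over tokens. The function name `dpo_loss` and the default `beta = 0.1` are illustrative choices, not a specific library's API.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed sequence log-probabilities.

    Hypothetical helper for illustration; beta=0.1 is an assumed default.
    """
    # Implicit "reward" of each response: how much more (log-)likely the
    # trained policy makes it compared with the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled gap between the preferred and dispreferred margins.
    z = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(z)) = softplus(-z), computed in a numerically stable way
    # so that very large |z| does not overflow math.exp.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When the policy still equals the reference, z = 0 and the loss is log 2.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Raising the preferred response's likelihood relative to the reference
# (and lowering the dispreferred one) pushes the loss below log 2.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

Minimizing this loss pushes the policy to raise the relative likelihood of preferred responses over dispreferred ones. β controls how far the policy may drift from the reference model.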