SantageAI Glossary › AI Alignment
AI Glossary

What is AI Alignment?

AI Alignment is the field of ensuring that artificial intelligence systems behave in ways that are consistent with human values, goals, and intentions.

What is the core idea behind AI alignment?

Alignment is the gap between what we want and what AI optimizes for.

How do AI alignment differ from related concepts?

ConceptDifference
Alignment vs CapabilityCapability is what AI can do. Alignment is whether it should do it
Alignment vs SafetySafety focuses on preventing harm. Alignment focuses on intent and objectives
Alignment vs RLHFRLHF is one method. Alignment is the broader problem

How do AI alignment work?

What are the limitations of AI alignment?

Why are AI alignment important?

As AI systems become more capable, misalignment can lead to unintended or harmful outcomes, even when systems appear to function correctly.

How are AI alignment used in practice?

Alignment techniques include reinforcement learning from human feedback, constitutional constraints, and evaluation frameworks used by organizations like Anthropic.

Frequently Asked Questions

Why is AI alignment considered a hard problem?
Alignment is difficult because human values are complex, context-dependent, and often inconsistent. Translating them into precise objectives that machines can optimize without unintended consequences remains an open challenge.
Can aligned AI still produce harmful outcomes?
Yes. Even well-aligned systems can behave unexpectedly if the objectives are incomplete, the environment changes, or edge cases were not accounted for during training.