AI Glossary

What is Latency?

Latency is the time it takes for an AI system to process an input and return a response.
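A minimal sketch of measuring the latency of a single request. It assumes a hypothetical `slow_model` function that stands in for a real AI system by sleeping for about 50 ms. The sketch reads Python's monotonic `time.perf_counter` clock before and after the call.

```python
import time


def slow_model(prompt: str) -> str:
    """Hypothetical stand-in for an AI model call; sleeps to simulate work."""
    time.sleep(0.05)  # simulate roughly 50 ms of processing
    return prompt.upper()


def measure_latency(fn, *args):
    """Return (result, latency_in_seconds) for a single call to fn."""
    start = time.perf_counter()  # high-resolution monotonic clock
    result = fn(*args)
    latency = time.perf_counter() - start
    return result, latency


if __name__ == "__main__":
    result, latency = measure_latency(slow_model, "hello")
    print(f"{result!r} returned in {latency * 1000:.1f} ms")
```

A monotonic clock is used because it cannot jump backwards if the system time is adjusted during the measurement.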

What is the core idea behind AI latency?

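Latency largely determines how responsive an AI system feels: the longer each request takes, the worse the user experience, especially in interactive applications.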

How does AI latency differ from related concepts?

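Concept | Difference
--- | ---
Latency vs. throughput | Latency is the time to complete one request; throughput is the number of requests a system handles per unit of time.
Latency vs. speed | Latency is measured per request; "speed" is a looser term that can also describe aggregate performance across many requests.
Latency vs. compute | Adding compute can reduce latency, but it increases cost.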
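To make the first row concrete, the sketch below derives both metrics from the same request log. The log is a hypothetical set of (start, end) times in seconds, invented for illustration. Latency is computed per request; throughput is the number of requests completed per second over the whole observed window.

```python
from statistics import mean

# Hypothetical request log: (start_time_s, end_time_s) for each request.
# Requests overlap because the system serves them concurrently.
requests = [
    (0.0, 0.8),
    (0.1, 0.9),
    (0.2, 1.2),
    (1.0, 1.6),
    (1.1, 2.0),
]

# Latency: how long each individual request took.
latencies = [end - start for start, end in requests]

# Throughput: requests finished per second over the observed window.
window = max(end for _, end in requests) - min(start for start, _ in requests)
throughput = len(requests) / window

print(f"mean latency: {mean(latencies):.2f} s")
print(f"throughput:   {throughput:.2f} requests/s")
```

Both numbers describe the same system: each request takes about 0.82 s on average, yet because requests overlap, the system completes 2.5 requests per second. Improving one metric does not automatically improve the other.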

How does AI latency work?

What are the limitations of AI latency?

Why is AI latency important?

Latency directly impacts usability, especially in real-time applications like chat, voice assistants, and trading systems.

How is AI latency managed in practice?

Low latency is critical in chatbots, trading systems, gaming AI, and customer-facing tools. Common techniques for reducing it include caching, batching, model compression, and better infrastructure.
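Of the techniques listed, caching is the simplest to illustrate. The sketch below uses Python's standard `functools.lru_cache` to memoize a hypothetical `answer` function. A repeated identical prompt therefore skips the simulated 100 ms model call entirely. Production systems cache at different layers; this shows only the principle.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1024)
def answer(prompt: str) -> str:
    """Hypothetical model call; sleeps to simulate about 100 ms of inference."""
    time.sleep(0.1)
    return f"answer to: {prompt}"


def timed(prompt: str) -> float:
    """Return the latency of one call to answer(), in seconds."""
    start = time.perf_counter()
    answer(prompt)
    return time.perf_counter() - start


if __name__ == "__main__":
    first = timed("What is latency?")   # cache miss: runs the slow path
    second = timed("What is latency?")  # cache hit: returns the stored result
    print(f"first call:  {first * 1000:.1f} ms")
    print(f"second call: {second * 1000:.3f} ms")
```

Caching only helps when identical inputs recur, so its benefit depends on how repetitive the real traffic is.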

Frequently Asked Questions

Why are large AI models slower?
They require more computation per request, increasing processing time.
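A back-of-the-envelope sketch of why computation per request drives latency. It assumes the commonly used rule of thumb that a transformer's forward pass costs about 2 floating-point operations per parameter per generated token. It also assumes a hypothetical sustained hardware rate of 1e14 FLOP/s. The result is a compute-bound lower bound only; it ignores memory bandwidth, queueing, and network time.

```python
def min_generation_latency_s(params: float, tokens: int, effective_flops: float) -> float:
    """Compute-bound lower bound on generation time, in seconds.

    Assumes ~2 FLOPs per parameter per generated token (a rule of thumb
    for transformer forward passes) and a hypothetical sustained hardware
    rate `effective_flops`, in FLOP/s.
    """
    flops_needed = 2 * params * tokens
    return flops_needed / effective_flops


if __name__ == "__main__":
    # Hypothetical scenario: generate 100 tokens on hardware sustaining 1e14 FLOP/s.
    for params in (1e9, 7e9, 70e9):
        t = min_generation_latency_s(params, tokens=100, effective_flops=1e14)
        print(f"{params / 1e9:>4.0f}B params -> at least {t * 1000:.0f} ms")
```

Under this simplified model, latency grows linearly with parameter count: a model with ten times the parameters needs ten times the computation for each request.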
Can latency be optimized?
Yes, using techniques like caching, batching, model compression, and better infrastructure.
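Batching reduces latency indirectly. By amortizing fixed per-call overhead, it lets a system clear a queue of waiting requests sooner. The sketch below assumes a hypothetical cost model in which every model call pays a fixed overhead plus a per-item cost. The constants are invented for illustration.

```python
def mean_latency(num_requests: int, batch_size: int,
                 overhead_s: float = 0.050, per_item_s: float = 0.005) -> float:
    """Mean completion time for requests that all arrive at t=0.

    Hypothetical cost model: each model call takes overhead_s plus
    per_item_s for every request in the batch; calls run one after another.
    """
    clock = 0.0
    total = 0.0
    remaining = num_requests
    while remaining > 0:
        n = min(batch_size, remaining)
        clock += overhead_s + per_item_s * n  # this batch finishes at `clock`
        total += clock * n                    # every request in it waits until then
        remaining -= n
    return total / num_requests


if __name__ == "__main__":
    for size in (1, 8, 32):
        print(f"batch size {size:>2}: mean latency {mean_latency(32, size) * 1000:.0f} ms")
```

Under this model, larger batches cut the mean wait because the fixed overhead is paid fewer times. If requests arrive spread out over time, however, waiting to fill a batch adds its own delay, so real servers usually cap how long they wait before running a partial batch.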