What is Model Serving?

Model serving is the process of deploying a trained AI model so that it can receive inputs and return outputs, either in real time or in batch.

What is the core idea behind model serving?

Serving is how models become products.

How does model serving differ from related concepts?

Concept | Difference
Training vs Serving | Training builds. Serving delivers.
Serving vs Inference | Inference is prediction. Serving manages delivery.
Serving vs API | APIs expose the model. Serving powers the backend.
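
The three layers in the comparison above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `predict` rule, the `amount` field, the `MODEL_VERSION` value, and the function names are all hypothetical and stand in for a real trained model and a real web framework.

```python
import json

MODEL_VERSION = "v1"  # hypothetical version label attached to every response


def predict(features):
    """Inference: the raw prediction step (a toy rule standing in for a trained model)."""
    return 1.0 if features["amount"] > 1000 else 0.0


def serve(request):
    """Serving: manages delivery around inference (validation, versioning, errors)."""
    amount = request.get("amount")
    if not isinstance(amount, (int, float)):
        return {"ok": False, "error": "field 'amount' must be a number"}
    return {"ok": True, "model_version": MODEL_VERSION, "score": predict(request)}


def api_handler(body):
    """API: exposes the serving layer to callers as JSON in, JSON out."""
    try:
        request = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps({"ok": False, "error": "invalid JSON"})
    if not isinstance(request, dict):
        return json.dumps({"ok": False, "error": "expected a JSON object"})
    return json.dumps(serve(request))


if __name__ == "__main__":
    print(api_handler('{"amount": 2500}'))
    print(api_handler('{"amount": "a lot"}'))
```

In a real deployment the `api_handler` role is usually played by an HTTP or gRPC endpoint, and the `serve` role by a serving framework or managed platform; the split shown here is only meant to make the table's distinctions concrete.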

How does model serving work?

What are the limitations of model serving?

Why is model serving important?

Without serving, models cannot be used in real-world applications. Serving infrastructure determines the reliability, speed, and cost of AI services.
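
As a rough illustration of how speed and cost are linked, the sketch below estimates the cost of serving one thousand requests from an instance's hourly price and its sustained throughput. The function name and the numbers in the example are assumptions chosen for easy arithmetic, not figures from any real platform.

```python
def cost_per_1k_requests(hourly_price, requests_per_second):
    """Estimate the cost of 1,000 requests on one instance running at steady throughput.

    hourly_price: price of the instance per hour (any currency).
    requests_per_second: sustained throughput of that instance.
    """
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    requests_per_hour = requests_per_second * 3600
    return hourly_price / requests_per_hour * 1000


if __name__ == "__main__":
    # Hypothetical instance: 3.60 per hour, handling 10 requests per second.
    slow = cost_per_1k_requests(3.60, 10)
    # Same instance after an optimization that doubles throughput.
    fast = cost_per_1k_requests(3.60, 20)
    print(round(slow, 4), round(fast, 4))
```

Under these assumptions, doubling throughput on the same hardware halves the cost per request, which is why serving optimizations tend to show up directly in the bill.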

How is model serving used in practice?

Model serving is used in chat apps, recommendation systems, fraud detection, and enterprise AI tools. Major platforms include AWS SageMaker, Google Vertex AI, and Azure ML.

Frequently Asked Questions

What makes serving difficult at scale?
The main challenges are handling high traffic, keeping latency low, and balancing cost against performance.

Is serving only real-time?
No. Serving can run in real time or in batch, depending on the use case.
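
The difference between the two modes can be sketched as follows. This is a simplified illustration: `predict_one` is a hypothetical stand-in for a trained model, and the batch size is arbitrary.

```python
def predict_one(x):
    """Toy model: a fixed linear function standing in for a trained model."""
    return 2.0 * x + 1.0


def serve_realtime(x):
    """Real-time serving: one input arrives, one output is returned immediately."""
    return predict_one(x)


def serve_batch(inputs, batch_size=2):
    """Batch serving: process a stored collection of inputs in fixed-size chunks."""
    results = []
    for start in range(0, len(inputs), batch_size):
        chunk = inputs[start:start + batch_size]
        results.extend(predict_one(x) for x in chunk)
    return results


if __name__ == "__main__":
    print(serve_realtime(3.0))
    print(serve_batch([0.0, 1.0, 2.0]))
```

Real-time serving is judged mainly on latency per request, while batch serving is judged on total throughput, so a fraud check at payment time would typically use the first mode and a nightly scoring job the second.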