Prevod v pripravi Ta stran še ni v celoti prevedena v slovenščino. Vsebina je trenutno prikazana v angleščini. Odpri angleško različico →

Foundations · primer 02

Five shapes
most AI deployments take.

"AI in your business" is not one workload - it is one of five patterns, sometimes a few of them combined. RAG, chat, agentic, fine-tuning, classification. Each pattern has a different shape, a different failure mode, and a different infrastructure footprint. Knowing which pattern fits your problem is the difference between a deployment that scales and one that has to be rebuilt.

See the five patterns Help me identify mine

On this page Why it matters The five patterns Compare Identify yours In practice Read next

Why pattern matters

The pattern decides the infrastructure.

A chat assistant for fifty concurrent users does not need the same infrastructure as a fine-tuning pipeline that runs once a quarter. An agentic workflow that takes ten minutes on a task does not need the same infrastructure as a classifier that processes ten thousand documents an hour. They are different patterns with different bottlenecks.

Specifying the wrong pattern is the most expensive mistake in early AI deployment. Buying a fine-tuning rig for a problem that wants RAG. Buying chat-class hardware for a workload that turns out to be agentic. Sizing for one pattern and discovering, six months in, that the actual workload is two patterns blended.

This page is the short, vendor-agnostic answer to the question your AI consultancy partner will ask you on day one of the engagement: what shape is this deployment?

The five patterns

In order of how often they show up.

The first three account for the overwhelming majority of enterprise AI deployments today. Fine-tuning and classification are common but more specialised. Most real deployments are one primary pattern plus a sprinkle of another - chat with RAG, agentic with classification, RAG with a fine-tuned embedding model.

Pattern

Retrieval-augmented generation

RAG

A language model answers from documents you already own.

A retrieval system finds relevant chunks of your knowledge base, the language model reads them, and the answer is grounded in your data rather than the model's pre-training. The single most common pattern in enterprise AI today - because it answers the "we want AI on our documents" brief without committing to fine-tuning.

Where it shines

Document-heavy domains where the answer is in the data and the question varies. Internal knowledge bases, technical documentation, regulatory archives, product catalogues, customer-support history.

Where it struggles

Quality is bounded by the retrieval step. A miss on retrieval is a miss on the answer. Messy documents, inconsistent metadata, or a poor chunking strategy break the pattern silently - the model sounds confident even when retrieval failed.

Infrastructure footprint

Three workloads in one deployment: embedding model (one-time per document, plus updates), vector database, and language-model inference. Single-server is usually sufficient at start; scales to small cluster as corpus and concurrency grow. Storage IO and embedding-model throughput matter as much as the LLM.

Common variants

Hybrid retrieval (lexical + semantic) · graph-RAG (over knowledge graphs) · multi-hop RAG (chained retrieval) · agentic RAG (the model decides what to retrieve next)

Pattern

Chat and assistants

Chat

A conversational interface, optionally with tools.

A familiar UX layered over a language model. Internal employee assistants, customer-facing help desks, copilots inside existing software. Often combined with RAG for knowledge grounding and with tool use for action - booking, calculation, search, function calls into other systems.

Where it shines

Tasks where the human in the loop is essential and the model serves as a fast-typing junior teammate. Drafting, summarising, explaining, navigating documentation, light task automation.

Where it struggles

Latency-bounded - a slow chat is a dead chat. Long conversations drift as context fills the window. Concurrency under load forces capacity decisions early. Hallucinations are visible in a chat in a way they are not in a batch job.

Infrastructure footprint

Workstation through single-server is typical. The metric is tokens per second per user; the load driver is concurrent users. Smaller, well-fitted models often outperform a single oversized one; an 8B-class model serving fifty concurrent users beats a 70B-class model timing out under load.

Common variants

Role-specific assistants · copilots embedded in existing software · voice assistants · multi-modal chat (image / document upload) · proactive assistants that initiate

Pattern

Agentic workflows

Agentic

A model plans, calls tools, and executes a multi-step task.

The model is given a goal and a set of tools - read a file, query a database, call an API, run a script - and decides how to combine them to reach the goal. May involve a single agent reasoning step-by-step or multiple agents collaborating with defined roles. The pattern that turns "AI assistant" into "AI worker."

Where it shines

Long-running tasks with predictable structure but variable execution paths. Code migration, data reconciliation, multi-system reporting, structured research across a known set of sources, repetitive workflows that resist hard-coded automation.

Where it struggles

Errors compound - a wrong decision at step three becomes catastrophic by step ten. Observability is hard; debugging a five-step agent run is harder than debugging a five-line script. Cost predictability is poor without strict step limits and budget controls. Orchestration framework choice matters a lot and changes often.

Infrastructure footprint

Single-server through cluster. Wall-clock latency is more tolerant - agents can take minutes on a task - but throughput and reliability matter more than for chat. Often paired with an orchestration framework (LangGraph, AutoGen, CrewAI, or custom). State persistence and audit logging become first-class concerns.

Common variants

ReAct (reason-then-act loop) · plan-and-execute · multi-agent collaborative · supervisor-worker hierarchies · MCP-based tool servers

Pattern

Fine-tuning and domain adaptation

Fine-tuning

Training a base model further on your data.

A pre-trained base model is adapted to a domain, a style, or a specific task by additional training on data the customer owns. Less common than RAG for the simple cases (which are better handled by retrieval), but essential when the base model's implicit understanding of a domain is materially worse than what your data can produce.

Where it shines

Narrow tasks where a smaller fine-tuned model outperforms a much larger general one. Specialised classification, domain-specific drafting, voice and style alignment, low-resource languages, regulated terminology. Also the path to running smaller models locally for confidentiality reasons.

Where it struggles

Requires labelled data and evaluation discipline. Without a ground-truth evaluation set, fine-tuning is just hoping. Training is GPU-hungry - multi-node cluster typically. Each new base model means re-running the evaluation. Iterating on the fine-tune is a research-shaped activity, not a software-engineering-shaped one.

Infrastructure footprint

Multi-node cluster for full fine-tuning of any serious model. Single-server fine-tuning is realistic for LoRA-class techniques on smaller base models. Memory bandwidth and high-speed interconnect dominate the training cost; inference of the fine-tuned model is modest. Burst load - heavy when training, quiet between cycles.

Common variants

Full fine-tuning · LoRA / QLoRA (parameter-efficient) · RLHF and DPO (preference alignment) · continual pre-training · instruction-tuning · distillation into smaller models

Pattern

Classification and extraction

Classification

Structured output from unstructured input.

The model reads unstructured input - a document, an email, a transcript, an image - and produces structured output. A category, a set of fields, a score, a summary in a defined schema. The pattern that turns AI from a conversational tool into a data pipeline.

Where it shines

High-volume operations where structured output matters more than nuance. Document categorisation, named-entity extraction, intent classification, sentiment scoring, structured summarisation, content moderation. The pattern that integrates most cleanly into existing software.

Where it struggles

Novel taxonomies and ambiguous categories test models harder than open-ended generation does. Evaluation is straightforward - accuracy, F1, recall - but requires labelled ground truth that does not always exist. Bias in the labelling propagates directly into the model behaviour.

Infrastructure footprint

Workstation through single-server is sufficient for most production loads. Throughput-driven - smaller specialised models often outperform larger general ones. Predictable cost: tokens in, tokens out, well-modelled load. The cleanest pattern to budget for.

Common variants

Named-entity recognition · intent classification · sentiment analysis · document routing · structured extraction to schema · multi-label classification · score-and-rank

Side by side

Five patterns, one table.

The shape of each pattern at a glance - the table that ends up screenshotted into the procurement deck.

Pattern	Typical scale	Latency profile	Throughput profile	What dominates
RAG	Single server → small cluster	Sub-second to a few seconds	Per-user, modulated	Embedding + retrieval + LLM working together
Chat	Workstation → single server	Sub-second per token (streaming)	Driven by concurrent users	LLM inference + prompt cache
Agentic	Single server → cluster	Minutes acceptable	Per-task	Orchestration + multiple LLM calls
Fine-tuning	Multi-node cluster	Offline (batch training)	Burst	Memory bandwidth + interconnect
Classification	Workstation → single server	Sub-second	High, batch-friendly	Smaller specialised models

Identify your pattern

Five questions, in order.

Walk down the list. The first question you answer "yes" to gets you to the pattern. If the answer is "yes" to multiple questions, the deployment is a blend - specify the dominant pattern first and treat the others as overlays.

Q01

Does the answer need to come from documents you own?

→ Yes → RAG. The volume of documents and the freshness requirement scope the deployment.

Q02

Does the user expect a conversation, or a single answer?

→ Conversation → Chat. The number of concurrent users scopes the infrastructure.

Q03

Does the task involve multiple steps and tool calls before producing an answer?

→ Yes → Agentic. The number of steps, tool latency, and acceptable failure rate scope the infrastructure.

Q04

Does the base model lack domain knowledge that your data could supply?

→ Yes, after RAG has been ruled out → Fine-tuning. The data volume and evaluation discipline scope the project.

Q05

Is the input unstructured and the desired output a fixed schema?

→ Yes → Classification / extraction. Throughput and schema stability scope the deployment.

In practice

Real deployments combine patterns.

Five patterns is a clean taxonomy. Real deployments are messier. A customer-support assistant is a chat pattern over a RAG backend with a classifier deciding which queries to escalate to a human. An agentic workflow that does invoice processing is agentic at the orchestration layer, classification at each step, and may include a fine-tuned model for the domain-specific extraction.

The taxonomy still helps. Identifying the dominant pattern scopes the infrastructure investment. The overlay patterns add capacity but rarely change the architectural shape. A deployment that is 80% chat with 20% RAG sizes like a chat deployment with extra storage and an embedding pipeline. A deployment that is 80% agentic with 20% classification sizes like an agentic deployment with throughput headroom.

When you describe your situation to an AI consultancy partner, the conversation goes faster if you can lead with the dominant pattern and name the overlays. That is the practical use of this page.

Where this connects

Read these next.

Sizing

Identified your pattern?

Tell us the pattern and the scale, and we'll route you to a partner who has done that pattern at that scale before. The vocabulary is the start; the deployment is partner work.

Back to foundations

Inženirsko hlajenje, komponento po komponento.

Validirani kompleti za velika tower ohišja.

Specifične rešitve za edinstvene zahteve.

RM-4U8G

Five shapes
most AI deployments take.

The pattern decides the infrastructure.

In order of how often they show up.

Retrieval-augmented generation

Chat and assistants

Agentic workflows

Fine-tuning and domain adaptation

Classification and extraction

Five patterns, one table.

Five questions, in order.

Real deployments combine patterns.

Read these next.

AI infrastructure buying guide

Deploying AI in your business

Glossary

Identified your pattern?

Five shapes most AI deployments take.

The pattern decides the infrastructure.

In order of how often they show up.

Retrieval-augmented generation

Chat and assistants

Agentic workflows

Fine-tuning and domain adaptation

Classification and extraction

Five patterns, one table.

Five questions, in order.

Real deployments combine patterns.

Read these next.

AI infrastructure buying guide

Deploying AI in your business

Glossary

Identified your pattern?

Five shapes
most AI deployments take.