AI Compute Configurator

Describe what you want to run in plain terms and get a sensible setup to plan around: an example build, a rationale, a cooling kit, and a bill-of-materials (BOM) outline. This is a planning resource, not a quote.

1 · Your situation

Why are you looking at running AI in-house?
Who uses it?
What do you want it to do?
How capable does it need to be?

Your reason sets what “good” means

The motivation behind bringing AI in-house decides which trade-offs are worth making. The same hardware can be a smart buy for one reason and a poor one for another, so we use this answer to frame your result — which benefits to highlight, and which conflicts to flag.

What each reason optimises for

  • Cut cloud cost — turns the decision into a utilisation question: owned hardware only wins once you use it enough; below that line, cloud or hybrid is the honest answer.
  • Keep data private — makes “data never leaves your control” the priority, and we'll flag anything that sends it off-site.
  • Latency & control — favours predictable, local performance; it matters most for real-time or customer-facing use.
  • Cloud capacity limits — prioritises guaranteed availability when GPUs are scarce or quotas bite.
  • Just exploring — biases toward a smaller, flexible starting point you can grow into.
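As a rough illustration of the utilisation question behind the cost reason, the sketch below estimates how busy owned GPUs must be before they match cloud spend. The function name, its parameters, and every figure in the example are hypothetical placeholders, not LM TEK pricing or the configurator's own logic; substitute your real quotes.

```python
def breakeven_utilisation(hardware_cost, lifetime_years, yearly_running_cost,
                          cloud_rate_per_gpu_hour, gpu_count):
    """Fraction of the year the owned GPUs must be busy to match cloud spend.

    Assumes straight-line depreciation and a flat cloud rate (both simplifications).
    """
    hours_per_year = 24 * 365
    owned_cost_per_year = hardware_cost / lifetime_years + yearly_running_cost
    cloud_cost_at_full_use = cloud_rate_per_gpu_hour * gpu_count * hours_per_year
    return owned_cost_per_year / cloud_cost_at_full_use


# Hypothetical inputs: a 4-GPU build costing 120,000 over 4 years,
# 10,000 per year to run, compared with 2.50 per GPU-hour in the cloud.
threshold = breakeven_utilisation(120_000, 4, 10_000, 2.5, 4)
print(f"Owned hardware wins above roughly {threshold:.0%} utilisation")
```

Below the threshold, cloud or hybrid is the cheaper option under these assumptions; above it, ownership starts to pay off.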

How it shows up in your result

This answer doesn't change the number or type of GPUs; what the system does, and at what scale, drives that. What it changes is emphasis: a privacy answer sharpens our data-residency guidance and flags the conflict with off-site burst capacity, while a cost answer frames the result around utilisation. Think of it as pointing the recommendation at what you actually care about.

This could be a product, not just an internal tool

It's easy to assume in-house AI is only an internal efficiency play. But the same hardware that runs your team's assistant can power a feature you ship to customers — and that possibility is worth weighing before you size anything, because it changes the bar the system has to clear.

Internal vs. customer-facing

  • Internal — can tolerate variable speed, the occasional restart, and simpler support; your team adapts.
  • Customer-facing — turns uptime, consistent latency, and scaling under unpredictable outside demand into first-class concerns, which usually means redundancy, burst capacity, and a higher care tier. These are real costs, but also a path to AI that earns revenue rather than just saving it.
  • Both — design to the customer-facing bar.

Why we ask early

This answer doesn't size the build on its own — it gets you thinking about availability and scale before the detailed questions, and it primes the later steady-or-spiky question (customer traffic tends to spike). It also tells your SI partner whether they're helping you build a tool or launch a product.

What the machine does decides what it needs

This is the single biggest driver of which type of GPU you need. Different jobs stress hardware in completely different ways — a chatbot and a video generator can want opposite things from the same budget.

How each kind of work loads the hardware

  • A private assistant over your documents (chat/RAG) — mostly memory-bound and served by parallel copies; it scales with how many people use it at once.
  • Image or video generation — runs in parallel batches and is throughput-hungry, so it wants several GPUs even for a single user.
  • 3D rendering / VFX — similar batch-throughput work.
  • Not sure / mixed — assumes the flexible professional tier, a safe default you can refine later.

What it sets in your build

Your choice here selects the GPU tier path and sets a floor on card count for the batch-heavy jobs (image/video and rendering parallelise whether or not humans are waiting). It's also the thing you should benchmark with your own real tasks before buying — the workload is what determines true per-GPU throughput.

“Capability” really means memory footprint

Model “size” sounds like a quality dial, but for hardware it's really a memory measurement: how much GPU memory the model needs to load and run. Bigger models are more capable — and past a point they need not just a bigger card, but several cards pooled together.

What the three levels mean (today)

  • A focused single-task tool — small and fast, often a single GPU.
  • A capable general assistant, ChatGPT-like — the mainstream sweet spot, where most teams should start.
  • Today's strongest open models — top quality, but they demand rack-class hardware.

The parameter ranges behind these (≤13B / 13–70B / >70B) are fine print, and they're a snapshot of today's open models — the frontier moves quickly.

Why “best quality” lands you in a rack

The strongest models need roughly four enterprise GPUs (H200-class) pooled over a high-speed link just to load — that's why choosing top capability puts you on the rack rail even if only one person uses it. It's a memory requirement, not a crowd-size one.

A snapshot — usually good news

Because the named cards are today's reference, your SI partner re-checks them against the live model landscape at build time. And efficiency techniques — quantisation, mixture-of-experts designs — increasingly let a big-feeling model run on lighter hardware, so you may land below this estimate, not above it. Treat the recommendation as a current, conservative reference point.
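The memory arithmetic in this section can be sketched in a few lines. This is a back-of-envelope estimate, not the configurator's actual sizing rule: the 20% overhead for cache and activations is an assumption, the 400B model is hypothetical, and the only hard figure is the 141 GB of memory on an H200.

```python
import math

H200_MEMORY_GB = 141  # published memory capacity of an NVIDIA H200


def weights_gb(params_billion, bits_per_param):
    """Memory for the model weights alone, in GB."""
    return params_billion * bits_per_param / 8


def cards_to_load(params_billion, bits_per_param,
                  card_gb=H200_MEMORY_GB, overhead=0.2):
    """Cards needed just to load the model, with an assumed overhead margin."""
    needed_gb = weights_gb(params_billion, bits_per_param) * (1 + overhead)
    return math.ceil(needed_gb / card_gb)


# Hypothetical model sizes at 16-bit, 8-bit, and 4-bit precision.
for params, bits in [(13, 16), (70, 16), (400, 8), (400, 4)]:
    print(f"{params}B at {bits}-bit: {cards_to_load(params, bits)} card(s)")
```

Under these assumptions the 400B example needs four pooled cards at 8-bit and two at 4-bit, which is the sense in which quantisation can land you below the estimate.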

Running a model and training one are two different machines

A common assumption is that “custom AI” needs a training supercomputer. Usually it doesn't. Serving a model and training one are different workloads on different hardware, and most teams only ever do the first.

Three levels of making it yours

  • Run existing models — needs nothing extra; your build just serves them, and techniques like RAG and good prompting get you surprisingly far.
  • Occasional fine-tuning — fits on the same build using spare GPU memory, no separate system.
  • Continuous training as a core activity — the one case that needs a separate, maximum-spec, multi-node cluster, sized to budget rather than to a fit target.

Why training is separate

Training is bursty, enormous, and periodic; sizing your everyday production machine around it means paying for capacity that sits idle most of the year. So if training is core, we scope that cluster with you as a separate inquiry and leave your production build — the one that runs the tuned model day to day — unchanged.

Cards scale with simultaneous load, not headcount

The number of GPUs tracks how many people hit the system at the same time and how hard each session leans on it, not how many people have access. Fifty people who glance at it twice a day and fifty people running it continuously need very different machines.

More cards usually means more copies

For most work, adding cards adds replicas — extra copies of a model that already fits, each serving more users in parallel. That's different from needing a bigger model, which requires cards pooled together to hold one large model. More users → more replicas; a bigger model → pooled cards. The configurator separates the two.

Why we ask intensity separately

“How many people” sets a starting count; “how hard they lean on it” can move that count a long way. A system that's glanced at a few times a day and one that's hammered continuously (agents running, long documents in flight) can need roughly twice the cards for the same team size. Long contexts cost far more than short questions, so a constant-use answer nudges the starting count up; your SI partner tunes the exact figure against measured load.
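The two ideas above, replicas versus pooled cards and the intensity adjustment, can be combined into one small calculation. This is a minimal sketch with hypothetical names and numbers: `sessions_per_copy` and the intensity factor are assumptions you would replace with measured load, and the configurator's real rule may differ.

```python
import math


def starting_cards(cards_per_copy, concurrent_sessions,
                   sessions_per_copy, intensity=1.0):
    """Starting card count: pooled cards per copy times the number of replicas.

    cards_per_copy    -- cards pooled to hold ONE copy of the model
    sessions_per_copy -- sessions one copy serves at light use (assumed)
    intensity         -- 1.0 for light use, up to about 2.0 for constant use
    """
    effective_load = concurrent_sessions * intensity
    replicas = max(1, math.ceil(effective_load / sessions_per_copy))
    return cards_per_copy * replicas


# Same team, same model that fits on one card: light vs. constant use.
print(starting_cards(1, 10, 5, intensity=1.0))  # 2
print(starting_cards(1, 10, 5, intensity=2.0))  # 4

# One user, but a model that needs four cards pooled together.
print(starting_cards(4, 1, 5))  # 4
```

The last case shows why a large model costs four cards even for a single user: the count comes from memory, while the first two cases grow with load.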

Why we ask in people, not numbers

You know your team, not your tokens-per-second. So we ask in plain terms and infer a sensible starting count — but it is a starting point. Size owned hardware to your steady load and cover the peaks separately — own your floor, lease your peak.

Own your floor, lease your peak

The most expensive way to size hardware is for your busiest hour. If your demand spikes — a holiday surge, a quarter-end close, daily timezone peaks — buying enough to cover the peak means costly metal sitting idle the rest of the time.

Steady vs. spiky

  • Steady and predictable — sizing is simple: match the hardware to the load.
  • Spiky — we size your owned build to the steady floor and cover the peaks with partner burst capacity you don't own and don't pay for year-round. The owned baseline never shrinks; burst sits on top of it.
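To make the floor-versus-peak trade-off concrete, the sketch below compares sizing for the busiest hour against sizing for the steady floor. The demand profile is invented for illustration and the function names are hypothetical; real sizing would use measured load over a much longer window.

```python
def utilisation(hourly_demand, owned_cards):
    """Share of owned card-hours that actually do work."""
    used = sum(min(demand, owned_cards) for demand in hourly_demand)
    return used / (owned_cards * len(hourly_demand))


def burst_card_hours(hourly_demand, owned_cards):
    """Card-hours above the owned baseline that burst capacity must cover."""
    return sum(max(0, demand - owned_cards) for demand in hourly_demand)


# Hypothetical demand in cards needed per hour: a floor of 2 with a short spike.
demand = [2, 2, 2, 2, 6, 8, 2, 2]

print(utilisation(demand, owned_cards=8))       # sized to peak:  0.40625
print(utilisation(demand, owned_cards=2))       # sized to floor: 1.0
print(burst_card_hours(demand, owned_cards=2))  # leased on top:  10
```

In this example the peak-sized build sits idle for most of its card-hours, while the floor-sized build stays fully used and leases only ten card-hours.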

The privacy catch

If your reason for coming in-house is privacy or compliance, bursting is a genuine tension: sending overflow to a partner means your data leaves the hardware you control — the very thing you're trying to avoid. We flag this directly when it applies. The usual answers are to cap at your owned ceiling and queue the spikes, or to use a certified, region-appropriate burst partner under a data-processing agreement. Either way it's a deliberate choice, not a default.
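The first option, capping at your owned ceiling and queueing the spikes, can be pictured with a simple backlog calculation. This is an illustrative sketch only: the demand figures are made up, the function name is hypothetical, and it assumes queued work can wait for a later hour.

```python
def backlog_per_hour(hourly_demand, owned_capacity):
    """Work left waiting at the end of each hour when nothing leaves the site."""
    backlog = 0
    history = []
    for demand in hourly_demand:
        work = backlog + demand             # new work plus anything still queued
        served = min(work, owned_capacity)  # never exceed the owned ceiling
        backlog = work - served
        history.append(backlog)
    return history


# Hypothetical jobs per hour against an owned ceiling of 4 jobs per hour.
print(backlog_per_hour([2, 2, 6, 8, 2, 2, 2, 2], owned_capacity=4))
# [0, 0, 2, 6, 4, 2, 0, 0]
```

The data never leaves your hardware, and the cost is the wait: here the spike takes about three extra hours to clear. Whether that delay is acceptable is the deliberate choice the text describes.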

The room is a real constraint

Where the system physically lives decides its form factor — and surfaces the limits people most often underestimate. Power, cooling, and noise are hard ceilings, not afterthoughts.

From desk to data center

  • Quiet office / under a desk — a tower, but even towers get loud under load; a 3–4 GPU tower in a quiet room is a genuine noise problem (and that assumes proper water-cooling; air-cooled machines are loud at as few as two cards).
  • Workshop, studio, or louder office — noise-tolerant, so towers are comfortable.
  • Server room or data center — puts you on the rack rail, where 5–8 GPU and enterprise builds belong.

The limit nobody checks: power and cooling

A full eight-GPU build can draw around 10 kW — as much as several kitchens at once. Many rooms people call a “server room” can't deliver that circuit or remove that much heat. Power (available amperage, single- vs. three-phase) and cooling are the most common real-world blockers, so confirm your facility can actually host the build before committing to a class. Your SI partner checks this with you as part of the handoff.
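A quick way to sanity-check a room is to turn the power figure into circuit current and heat load. The sketch below uses the standard electrical formulas; the 230 V single-phase and 400 V three-phase supplies and the power factor of 1.0 are assumptions, so use your facility's actual values.

```python
import math


def single_phase_amps(watts, volts):
    """Current on a single-phase circuit: I = P / V."""
    return watts / volts


def three_phase_amps(watts, line_volts, power_factor=1.0):
    """Per-phase current on a balanced three-phase supply: I = P / (sqrt(3) * V * PF)."""
    return watts / (math.sqrt(3) * line_volts * power_factor)


def heat_btu_per_hour(watts):
    """Heat the cooling must remove; 1 W is about 3.412 BTU/h."""
    return watts * 3.412


load_watts = 10_000  # the roughly 10 kW figure quoted for a full eight-GPU build

print(f"{single_phase_amps(load_watts, 230):.1f} A single-phase")    # 43.5 A
print(f"{three_phase_amps(load_watts, 400):.1f} A per phase")        # 14.4 A
print(f"{heat_btu_per_hour(load_watts):,.0f} BTU/h of heat")         # 34,120 BTU/h
```

The gap between those two current figures is why the single- versus three-phase question matters: the same load that overwhelms an ordinary single-phase circuit is modest per phase on a three-phase supply.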

Compliance: why we ask but don't gate on it

Compliance rarely changes which GPUs you need. It changes where data can live, who can touch it, and how it's audited — implementation questions handled with your SI partner, not a hardware switch. That's why this step is skippable and doesn't alter the build.

What compliance actually constrains

  • GDPR / EU residency — keeps data in-region.
  • DORA / NIS2 — operational resilience and incident reporting.
  • EU AI Act — adds model governance and documentation.
  • Healthcare / GxP — validation and audit trails.
  • Defense / classified — implies air-gapping and clearances.

None of these change the card count — they shape the surrounding architecture and process.

What it does in this tool

Telling us still matters even though it doesn't change the build: it tunes the data-residency guidance, rides along in the handoff so your partner arrives briefed, and sharpens the burst-conflict flag for regulated work. One honest limit: choosing something like Defense or GxP doesn't mean this self-serve tool fully scopes it. Those regimes carry requirements the configurator can't express, so treat your result as a starting point your partner takes further. Often, owning the hardware is itself the compliance win: your data never leaves your control.

LM TEK d.o.o. · Pod Lipami 10 · 1218 Komenda · Slovenia
