A local LLM on our own hardware: who benefits?
A handful of use cases with strict data-isolation requirements. For most SMBs it costs more than a cloud provider under a DPA.
Try this first
1. Who benefits: organisations contractually barred from sending data outside their own server room (defence supply chains, some healthcare, some legal work).
2. Hardware: a 70B-class model (Llama 3.x 70B or similar) needs roughly 140 GB of weight memory at FP16, so at least 2x H100 80GB; with quantisation (FP8/INT4) it fits on fewer or weaker GPUs at a quality trade-off (see the sketch after this list). That is an investment, not a subscription.
3. Operations: model updates, monitoring, security patches. Budget half an FTE for ongoing operations if you are serious about it.
4. Quality: open-source models can match earlier-generation closed-source models on specific tasks after fine-tuning, not automatically for general-purpose use.
5. Run the business case first. Many parties asking this question never even checked Azure OpenAI with EU data residency under a DPA.
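The hardware sizing in point 2 is just arithmetic: weights-only memory is parameter count times bytes per parameter. A minimal sketch, assuming a dense 70B model; KV cache and activations come on top, so treat these numbers as a floor, not a budget.

```python
# Back-of-the-envelope VRAM check: weights-only memory equals
# parameter count (billions) x bytes per parameter. KV cache and
# activation memory are not included here.

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_vram_gb(params_billion: float, precision: str) -> float:
    """Weight memory in GB for a dense model at the given precision."""
    # (params_billion * 1e9 params) * (bytes/param) / 1e9 bytes/GB
    return params_billion * BYTES_PER_PARAM[precision]

for precision in BYTES_PER_PARAM:
    print(f"70B @ {precision}: ~{weight_vram_gb(70, precision):.0f} GB")

# 70B @ fp16: ~140 GB -> at least 2x 80GB GPUs
# 70B @ fp8:  ~70 GB  -> one 80GB GPU, tight once KV cache is added
# 70B @ int4: ~35 GB  -> fits a single 48GB card
```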
When to bring us in
Pre-investment check before you buy hardware: we run the TCO comparison between local and cloud, which saves a costly misstep.
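To make that comparison concrete, here is a toy version of the arithmetic. Every figure below (server price, FTE cost, token volume, unit price) is an illustrative placeholder, not a quote; the point is the shape of the calculation, not the numbers.

```python
# Toy 3-year TCO comparison: local GPU server vs. pay-per-token cloud.
# All figures are placeholder assumptions; plug in your own quotes.

YEARS = 3

# Local: hardware as one-off capex plus recurring operations.
gpu_server_eur = 250_000            # multi-GPU server, assumption
ops_eur_per_year = 0.5 * 90_000     # half an FTE per year, assumption
local_tco = gpu_server_eur + ops_eur_per_year * YEARS

# Cloud under a DPA: pay per token. Assumed volume and unit price.
tokens_per_month = 50_000_000       # assumption
eur_per_million_tokens = 5.0        # assumption
cloud_tco = (tokens_per_month / 1e6) * eur_per_million_tokens * 12 * YEARS

print(f"Local over {YEARS} years: ~EUR {local_tco:,.0f}")   # ~EUR 385,000
print(f"Cloud over {YEARS} years: ~EUR {cloud_tco:,.0f}")   # ~EUR 9,000
```

At modest token volumes the gap is large, which is why the business case, not the hardware catalogue, should come first.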
See also
- Can I paste a customer file or email into ChatGPT? Depends on the account and settings. Free ChatGPT and a Team tenant behave very differently from what most people assume.
- I want a one-page AI policy for my team. A real one-pager beats a thick document nobody reads. Four headers and concrete examples.
- How do I tell if an AI answer is made up? Models sound confident even when they are wrong. A few habits catch most mistakes.
None of the above fits?
Describe your situation below. We pass your input plus the steps you already saw to our AI and return tailored next-step advice. If it's too risky to DIY, we'll say so.
Or skip the DIY entirely
Our Managed IT clients do not look these things up. One point of contact, a fixed monthly price, resolved within working hours.