Most local-AI tutorials start the same way. Install one tool, pull a model, type a question, watch the answer stream back. Five minutes, working chatbot, you're sold.
The trouble is what happens next. You commit. You write the integration. You build a real workload on top of the friendly default, and somewhere between the proof-of-concept and the production run, the runtime that made the demo so easy starts costing you in places the demo never touched.
A runtime tuned for the chat-window demo is not the same runtime you should put under a batch pipeline. They look the same on day one. They diverge violently the moment the workload changes shape.
This is a quick field note about what that divergence looks like, and the mode-shift we made when we hit it.