The naive answer is a threshold. The naive answer fails.
The first version of refusal that any team builds looks the same. Set a similarity threshold. Embed the question into the same vector space as the corpus and retrieve. If no passage scores above the threshold, refuse. If one does, answer.
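Concretely, the first version is a dozen lines. Here is a minimal sketch, assuming hypothetical `embed`, `search`, and `generate` callables standing in for whatever embedding model, vector index, and answer model the system actually runs:

```python
from typing import Callable, List, Tuple

# Hypothetical plumbing: any embedding function, vector index, and
# answer generator will do. These type aliases are placeholders.
Embed = Callable[[str], List[float]]
Search = Callable[[List[float], int], List[Tuple[str, float]]]  # -> (passage, score)
Generate = Callable[[str, List[str]], str]

SIMILARITY_THRESHOLD = 0.75  # the number every team ends up arguing over

def answer_or_refuse(question: str, embed: Embed, search: Search,
                     generate: Generate) -> str:
    query_vec = embed(question)   # embed into the same space as the corpus
    hits = search(query_vec, 5)   # top-5 passages with similarity scores
    if not hits or max(score for _, score in hits) < SIMILARITY_THRESHOLD:
        return "I can't answer that from the available material."
    return generate(question, [passage for passage, _ in hits])
```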
This works on the easy adversarials and breaks on everything interesting. The reason is structural. A similarity score is a single number. It cannot tell the difference between two cases that look identical to a threshold:
The first case is a question that is genuinely outside the corpus — weather, sports, programming, a different jurisdiction, a topic the system has no business answering. A good system should refuse these every time.
The second case is a real, in-scope question that happens to be phrased awkwardly — using vocabulary the corpus doesn't, framing the question as a scenario instead of citing a clause, asking about an edge case where the relevant passage scored slightly low. A good system should answer these every time.
To a similarity threshold, both look like "low score." Set the threshold high and the system refuses real questions; set it low and the system answers questions it has no basis for. There is no number you can pick that separates the two cases, because the distinction between them is not numeric. It is semantic. It is about whether the retrieved passages actually support an answer to the question being asked.
The architectural pivot
What works, in our experience, is to stop asking "is the score high enough?" and to start putting a different question to the system itself: "given this question and these candidate passages, can the question be answered from this evidence?"
That sounds tautological. It isn't. It means using a small extra model call — a judge — whose only job is to read the question and the top retrieved passages and decide whether the passages contain an actual basis for answering. The judge has three options: yes, partial, no. If it says no, the system refuses. If it says partial, the system can choose to refuse or to answer with a hedge, depending on how strict the deployment needs to be. If it says yes, the system answers, citing the passages the judge based its verdict on.
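Here is a minimal sketch of that routing, assuming a hypothetical `call_judge` callable that wraps the extra model call, and a `strict` flag standing in for the deployment decision on partial verdicts; the prompt wording is an illustration, not a fixed recipe:

```python
from typing import Callable, List

# Hypothetical judge prompt; the exact wording is an assumption.
JUDGE_PROMPT = (
    "Question:\n{question}\n\n"
    "Passages:\n{passages}\n\n"
    "Can the question be answered from these passages alone? "
    "Reply with exactly one word: yes, partial, or no."
)

def judge_then_answer(question: str, passages: List[str],
                      call_judge: Callable[[str], str],
                      generate: Callable[[str, List[str]], str],
                      strict: bool = True) -> str:
    prompt = JUDGE_PROMPT.format(question=question,
                                 passages="\n---\n".join(passages))
    verdict = call_judge(prompt).strip().lower()

    if verdict == "no":
        return "I can't answer that from the available material."
    if verdict == "partial":
        if strict:
            return "I can't fully answer that from the available material."
        # Looser deployments answer, but flag the gap instead of hiding it.
        return "The sources only partly cover this. " + generate(question, passages)
    # "yes": answer from the same passages the judge saw.
    return generate(question, passages)
```

Note that the yes branch hands the judge's passages straight to the generator, which is what lets the final answer cite the same evidence the verdict was based on.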
This is corpus-intrinsic. The judge isn't asking "is this question about the right topic?"; it is asking "is the answer to this question in the evidence in front of me right now?" That phrasing matters. A system that refuses based on topic will false-positive on legitimate cross-topic questions — a tax corpus that legitimately overlaps with corporate law, a medical corpus that overlaps with pharmacology. A system that refuses based on grounded evidence does not have that problem. It refuses when there is no evidence and answers when there is, regardless of how exotic the topic surface looks.
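The contrast is easiest to see in the prompt itself. The evidence-scoped framing is the hypothetical `JUDGE_PROMPT` in the sketch above; the topic-scoped framing it replaces would look something like this:

```python
# Topic-scoped framing (illustrative, not a recommendation): this is the
# version that false-positives on legitimate cross-topic questions,
# because it never looks at the retrieved evidence at all.
TOPIC_PROMPT = (
    "Is this question about {corpus_topic}?\n\n"
    "Question: {question}\n\n"
    "Reply with exactly one word: yes or no."
)
```

Nothing in the topic-scoped version lets the judge say yes to a tax question whose answer genuinely lives in a corporate-law passage; the evidence-scoped version handles that case for free.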
The system that refuses cleanly on the questions it can't ground — and only on those — is the system a practitioner can finally build a workflow around. Everything else is a research toy with a confidence problem.
Why this is hard to talk yourself into
The product temptation, when you ship a system that refuses, is enormous and constant: can't we just answer it? The refusal feels like a failure even when it is the system working as designed. Stakeholders will see a refusal in a demo and ask why the AI couldn't "just try." Sales will hear refusal counts and worry about answer-rate metrics. The fastest path to a quieter conversation is to lower the bar, let the model take a shot, and call it done.
The reason to hold the line is that the refusal isn't the failure mode — the wrong answer is. A system that refuses honestly produces zero hours of unwound work for the practitioner downstream. A system that confidently produces a fabricated reference produces a chain of consequences: the practitioner uses the answer, builds advice on it, gets caught by the regulator or the client or the next practitioner who reviews it, and the trust in the entire tool collapses. The cost of one fabricated answer in a high-stakes domain is larger than the cost of fifty refusals.
Once you internalise that arithmetic, refusal stops looking like a fallback and starts looking like the most useful behaviour the system has.