
05 OPERATOR’S DIARY

A human in the loop is only oversight if they can say no.

Almost every plan to put AI into a serious system arrives with the same reassurance attached: there will be a human in the loop. It is the right instinct. It is also, on its own, close to meaningless. A human in the loop can be genuine oversight, or it can be a person clicking approve nine hundred times a day without reading, and the two look identical on an architecture diagram. The difference is in the design, and the design is what decides whether you can trust the thing in production.

[Image: a secured access gate at an industrial site, the point where a person decides what gets through]
FIG · 00 A gate only controls anything if someone can refuse to open it. Oversight works the same way.

The reason this matters is that the human in the loop is usually the whole safety case. It is the answer to "what if the model gets it wrong", the line in the risk assessment that lets the project go ahead. So if that human is not actually able to catch the wrong answer, the safety case is fiction and everyone has agreed not to notice.

01 What the loop is actually for.

Human-in-the-loop means a person sits inside an automated decision and can change or stop the outcome before it takes effect. The useful version of that idea is narrow. It is not that someone is watching a dashboard and it is not that a log exists somewhere for later. It is that a specific person, with the authority to act and the context to judge and the time to look, stands between the model’s output and its consequence on exactly the decisions where being wrong is expensive.

The EU AI Act puts this into law for high-risk systems. Article 14 requires that they be built so people can effectively oversee them: understand what the system is doing, interpret its output, intervene and stop it.¹ The wording is deliberate about "effectively", because the drafters clearly knew the failure mode. A human who is nominally in charge but cannot in practice tell a good output from a bad one is not oversight and the Act declines to pretend otherwise. That is good engineering written into regulation, which is the most useful kind.

02 The rubber stamp is the real risk.

The way a human in the loop fails is rarely dramatic. Nobody decides to stop paying attention. What happens is automation bias: the steady, documented human tendency to defer to a machine that is usually right, especially under time pressure.² A reviewer who has approved nine hundred correct suggestions has been trained, gently and entirely by experience, that the model is trustworthy. The nine hundred and first slides through on that trust. The system has quietly converted a person into a button.

I have watched this happen to careful, competent people, which is the point. It is not a discipline problem you can fix by asking everyone to concentrate harder. It is a property of the design. If you route every output through one tired reviewer at the speed the model produces them, you will get rubber-stamping and you will get it from your best staff as reliably as from anyone else. The honest test is simple and uncomfortable: look at the logs and count how often the human disagrees with the machine. If the answer is almost never, the loop is decorative.
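The log test can be run in a few lines. What follows is a minimal sketch rather than a prescribed method: it assumes each review is logged with the model's suggestion and the human's final decision, and the `ReviewRecord` fields and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRecord:
    """One logged review. Field names are hypothetical, not from any real system."""
    model_suggestion: str  # what the model proposed
    human_decision: str    # what the reviewer finally decided


def disagreement_rate(records: list[ReviewRecord]) -> float:
    """Fraction of reviews where the human overrode the model's suggestion."""
    if not records:
        raise ValueError("no review records to measure")
    disagreements = sum(
        1 for r in records if r.human_decision != r.model_suggestion
    )
    return disagreements / len(records)


# Illustrative data only: 900 agreements followed by a single override.
records = [ReviewRecord("approve", "approve")] * 900 + [
    ReviewRecord("approve", "reject")
]
rate = disagreement_rate(records)
print(f"Human disagreed with the model in {rate:.2%} of reviews")
```

What counts as "almost never" depends on how often the model is actually wrong, so the number is best read against a known error rate rather than a fixed threshold.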

If your reviewer never says no, that is not reassurance. It is the sound of a loop doing nothing.

03 Designing a loop that holds.

A loop that actually catches things is built on a few decisions and the first is the most important: do not put a human on everything. Attention is finite, so spend it where being wrong is costly or hard to reverse and let the rest run. The high-volume, low-stakes, easily-undone decisions do not need a person standing on each one. The consequential and the ambiguous ones do. Spreading review evenly across both is how you guarantee nobody looks hard at either.

FIG · 01 Where the human belongs, decided once per decision type rather than per case.

Is this decision consequential, ambiguous or hard to reverse?

Yes: a person reviews it with context. Show the evidence, the model’s reasoning and its confidence. Make rejecting one click. This is where human judgement earns its keep.

No: automate it and sample. Let it run, log everything and review a random slice to catch drift. A person on every case here just buries the cases that matter.
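The routing rule in the figure is simple enough to write down. The sketch below is one possible shape, not a reference implementation: `DecisionType`, `Route`, the example decision types and the 5% sample rate are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    HUMAN_REVIEW = "human_review"
    AUTOMATE = "automate"


@dataclass(frozen=True)
class DecisionType:
    """Assessed once per decision type, not per case (hypothetical fields)."""
    name: str
    consequential: bool
    ambiguous: bool
    hard_to_reverse: bool


def route_for(decision_type: DecisionType) -> Route:
    """A person reviews anything consequential, ambiguous or hard to reverse."""
    if (
        decision_type.consequential
        or decision_type.ambiguous
        or decision_type.hard_to_reverse
    ):
        return Route.HUMAN_REVIEW
    return Route.AUTOMATE


def selected_for_audit(sample_rate: float, rng: random.Random) -> bool:
    """Pick a random slice of automated cases for after-the-fact review."""
    return rng.random() < sample_rate


# Hypothetical decision types, for illustration only.
small_refund = DecisionType("small_refund", False, False, False)
account_closure = DecisionType("account_closure", True, False, True)

rng = random.Random(0)  # seeded so the sample is reproducible
for dt in (small_refund, account_closure):
    print(dt.name, "->", route_for(dt).value)
audited = sum(selected_for_audit(0.05, rng) for _ in range(1000))
print(f"{audited} of 1000 automated cases sampled for review")
```

Keeping the classification on the decision type, not the individual case, means the routing is agreed once in design review and does not depend on anyone's judgement at run time.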

The second decision is what you put in front of the reviewer. A bare verdict invites a rubber stamp, because there is nothing to engage with except yes. Give them the why instead: the evidence the model used, the reasoning it followed, how confident it is and what it was unsure about. A reviewer shown a low-confidence call with the two facts that made it shaky will actually think. The interface is not cosmetic here. It is the difference between a person judging and a person agreeing.
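One way to enforce this is to refuse to show a reviewer anything that arrives as a bare verdict. The sketch below assumes a hypothetical `ReviewPacket` structure; the field names and the 0 to 1 confidence scale are illustrative choices, not taken from a particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewPacket:
    """What the reviewer sees for one case (hypothetical structure)."""
    verdict: str
    evidence: tuple[str, ...]   # the facts the model relied on
    reasoning: str              # the model's stated reasoning
    confidence: float           # assumed scale: 0.0 (none) to 1.0 (certain)
    uncertainties: tuple[str, ...] = ()  # what the model was unsure about


def missing_context(packet: ReviewPacket) -> list[str]:
    """List what is absent, so a bare verdict never reaches the reviewer."""
    missing = []
    if not packet.evidence:
        missing.append("evidence")
    if not packet.reasoning.strip():
        missing.append("reasoning")
    if not 0.0 <= packet.confidence <= 1.0:
        missing.append("confidence")
    return missing


bare = ReviewPacket(verdict="approve", evidence=(), reasoning="", confidence=0.9)
full = ReviewPacket(
    verdict="approve",
    evidence=("invoice matches purchase order",),
    reasoning="Amounts and supplier agree with the order on file.",
    confidence=0.62,
    uncertainties=("delivery date not confirmed",),
)
print(missing_context(bare))
print(missing_context(full))
```

The uncertainties field is allowed to be empty here, since a model may have nothing to flag; whether to require it is a design choice for the team.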

The third is to make disagreement cheap. If saying no means filling in a form, escalating to a manager and explaining yourself, people will say yes to get on with their day. Rejecting should be one click and it should be the easiest path, not the brave one. And every decision, approved or rejected, gets logged with its reason, which fits with the automatic event logging the AI Act requires of high-risk systems.³ Those logs are not just for the auditor. They are how you find out, while there is still time to fix it, that one reviewer approves everything and another catches things the rest miss.
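The per-reviewer check can be sketched as follows. This is an illustration under stated assumptions: `LoggedDecision`, the reviewer names and the `min_reviews` cut-off of 50 are hypothetical, and the reason is assumed to be a short preset code so that rejecting stays a single click.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedDecision:
    """One logged decision with its reason (hypothetical structure)."""
    reviewer: str
    approved: bool
    reason: str  # assumed to be a preset code or short note

    def __post_init__(self) -> None:
        if not self.reason.strip():
            raise ValueError("every decision needs a reason")


def reviewer_stats(log: list[LoggedDecision]) -> dict[str, tuple[int, float]]:
    """Map each reviewer to (number of reviews, rejection rate)."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for decision in log:
        counts[decision.reviewer][0] += 1          # total reviews
        if not decision.approved:
            counts[decision.reviewer][1] += 1      # rejections
    return {r: (n, rejected / n) for r, (n, rejected) in counts.items()}


def approves_everything(log: list[LoggedDecision], min_reviews: int = 50) -> list[str]:
    """Reviewers with enough volume who have never rejected anything."""
    return sorted(
        reviewer
        for reviewer, (n, rate) in reviewer_stats(log).items()
        if n >= min_reviews and rate == 0.0
    )


# Illustrative log: reviewer_a never rejects, reviewer_b rejects 5 of 60.
log = (
    [LoggedDecision("reviewer_a", True, "matches policy")] * 60
    + [LoggedDecision("reviewer_b", True, "matches policy")] * 55
    + [LoggedDecision("reviewer_b", False, "evidence missing")] * 5
)
print(approves_everything(log))
print(reviewer_stats(log)["reviewer_b"])
```

A zero rejection rate is a prompt to look closer, not proof of rubber-stamping; a reviewer who only sees easy cases may legitimately approve all of them.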

04 Why oversight is the part that frees you.

It would be easy to read all of this as a list of constraints, a set of reasons to be cautious. It is the opposite. A loop you actually trust is what lets you automate boldly everywhere else. Once you know the consequential decisions have a real person who can stop them, you can let the model run fast and wide on everything that is reversible and low-stakes, because you have put the human exactly where a mistake would have hurt and nowhere it would only have slowed you down.

That is the trade I want every team to understand. Genuine oversight is not the price you pay to be allowed to use AI. It is the thing that lets you say yes to the ambitious version of the system and still sleep, because you have designed the one place where being wrong is unacceptable to be the one place a human is genuinely in charge. Put the person where they can say no and let them mean it. The rest of the system then gets to be as fast and as automated as the work allows.

QUESTIONS ON THIS PIECE

What readers tend to ask.

01 What is human-in-the-loop (HITL) in AI?

Human-in-the-loop means a person sits inside an automated decision and can change or stop the outcome before it takes effect. The point is not that someone is watching. It is that someone with the authority, the context and the time can intervene on the decisions that matter. A human who can only watch is monitoring, not oversight.

02 What is automation bias?

Automation bias is the well-documented tendency of people to defer to a machine’s output, especially when they are busy or the system is usually right. It is why a reviewer who has approved nine hundred correct suggestions tends to wave through the nine hundred and first without really reading it. Any human-in-the-loop design that ignores automation bias quietly becomes a rubber stamp.

03 Does the EU AI Act require human oversight?

Yes. Article 14 requires that high-risk AI systems be designed so that real people can effectively oversee them, understand their output, intervene and stop them. It is explicit that the oversight has to be genuine rather than nominal. The Act is asking for the same thing a serious operator wants anyway: a person who can actually say no.

04 How do you stop human review becoming a rubber stamp?

Put the human where the judgement is rather than on everything, give them the evidence and the model’s reasoning instead of a bare verdict, make rejecting cheap and fast, then log every decision so you can see whether anyone ever disagrees. If a reviewer never says no, that is not reassurance. It is a signal the loop is doing nothing.

WRITTEN BY
Oussama Elgoumri

Chief technology officer, Imageplus. Engineering across custom platforms, integration and AI in regulated production. Twenty years building systems organisations own and can run themselves.

ABOUT THIS INSIGHT
Pillar
Operator’s diary
Published
1 June 2026
Read time
8 minutes · 1,340 words
SOURCES
  1. High-risk AI systems must be designed for effective human oversight: understanding, interpretation, intervention and a stop function. EU AI Act, Article 14 (human oversight).
  2. Automation bias, the tendency to over-rely on automated outputs, is a long-studied human-factors effect and the main reason nominal oversight degrades into rubber-stamping. Overview: automation bias.
  3. High-risk AI systems must keep automatic records (logs) over their lifetime so events can be reconstructed. EU AI Act, Article 12 (record-keeping).
