Today, the state of most human-in-the-loop setups for Agents looks like this.

Caution: feels like a security check. But it's a witness checkpoint.
The intention is not to insult but to diagnose. Most AI deployments that claim "human oversight" have designed a workflow where the human can't actually oversee. Not “won't”. Literally can't.
Some readers responded to last week’s edition on Guardrails saying - “we have a human in the loop to cover for the limitation.” This is the status quo. The human in the loop sits sometimes on the customer side, occasionally on the vendor side. The intended outcome: compliance signs off. But like most things in AI, the details matter.
Wait, why do we even have a human in the loop (HITL)?
Before we talk about how HITL hits a glass ceiling or breaks, let's step back and ask: why introduce a human in the first place? Three simple reasons:
1. Who is Liable here? ("The algorithm approved it" isn't yet a defense)
Healthcare, for example, operates in a structured low-trust environment by design. We don't trust that providers will always code correctly, that payers will always pay fairly, or that data will always be handled properly - so deterministic verification systems with humans in the loop have been deployed. A radiologist reviews an AI-flagged scan and signs off. If there's a missed-cancer lawsuit two years later, the signature determines who's liable.
2. Covering Out of Scope Edge Cases (aka systems can't handle ambiguity)
Think of a fraud detection system flagging a transaction as suspicious. The customer is a small business owner who just received a large payment from a new client. Unusual pattern, but legitimate. The analyst calls the customer and clears it. The system can't process "this is a real business deal." The human can.
3. Human Trust Issues (aka “leadership” isn't ready to trust the algo alone. Having a human just feels safe.)
A hospital pilots an AI triage system in the ER. The model is accurate. But the CMO insists a nurse reviews every recommendation before it routes patients. Not because the nurse catches errors - the override rate is near zero - but because the CMO isn't ready to explain to the board why a machine decided who gets seen first.
The problem: Most HITL implementations conflate all three.
They put a human in the loop for accountability, but design the workflow as if it's for judgment. Or they add a human for trust, but measure them as if they're catching errors.
If you don't know why the human is there, you can't evaluate whether they're succeeding.
Unfortunately, most AI Stacks today aren’t designed to match the objective.
Think about how clinical documentation happened before AI scribes. Doctor sees patient. Doctor writes note. Doctor signs. One person, one artifact, one accountability chain. The human wasn't "in the loop" - the human was the loop.
Today? An ambient AI scribe joins the visit and drafts the note for the doctor to review. The doctor signs. Feels like human-in-the-loop. But the provider’s role has changed: they've gone from author to reviewer-and-approver. The patient chart still says "physician-reviewed and signed." Suddenly, the workflow treats the physician as quality control on an assembly line.
If the human is there for accountability, but the system doesn't provide an audit trail - they can't do their job.
If the human is there for judgment, but they're reviewing 50 decisions/hour - they can't do their job.
If the human is there for trust, but there's no exit criteria - they become permanent overhead.
When the system design doesn't match the objective, you end up with Phantom HITL.
Phantom HITL is when the human "guardrail" is present on paper but not functioning in practice. The chart says "human-reviewed." The compliance deck says "expert in the loop."
A simple way to test this: Look at how often your human reviewer actually changes, flags, or rejects the AI's output.
If the correction rate tracks the expected error rate - say, the system is 90% accurate and the human corrects roughly 10% - you might have real oversight. The human is catching what the system is missing.
If the correction rate is near zero but the system isn't near-perfect — you have Phantom HITL. The human has stopped catching errors. Not because they're lazy. Because the volume, the pace, or the cognitive load made catching impossible.
And if you don't measure correction rate at all? Assume phantom. Remember Claire Hast's story from last edition — the AI scribe that fabricated findings to make the note look complete. The provider signing off wasn't negligence. AI implementations are unintentionally tuned for throughput, not verification.

Throughput beats accuracy. Every time.
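If you want to operationalize that test, here's a minimal sketch. The field names and the "half of expected errors" threshold are illustrative assumptions, not a standard:

```python
# Sketch: flag "Phantom HITL" when the human correction rate falls far below
# the system's expected error rate. Field names and the 0.5 ratio are
# illustrative assumptions, not a standard.

def phantom_hitl_check(reviews: list[dict], expected_error_rate: float) -> str:
    """Each review dict has a boolean 'human_changed_output' field."""
    if not reviews:
        return "no data: assume phantom"
    corrections = sum(1 for r in reviews if r["human_changed_output"])
    correction_rate = corrections / len(reviews)
    if correction_rate >= 0.5 * expected_error_rate:
        return f"plausible oversight (correction rate {correction_rate:.1%})"
    return (f"likely phantom (correction rate {correction_rate:.1%} "
            f"vs expected errors {expected_error_rate:.1%})")

# Example: a "90% accurate" system whose reviewer almost never touches the output.
reviews = [{"human_changed_output": False}] * 98 + [{"human_changed_output": True}] * 2
print(phantom_hitl_check(reviews, expected_error_rate=0.10))  # likely phantom
```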
So, how do teams design HITL that doesn't fail?
Three guidelines:
1. Define the objective.
2. Design the system to that objective.
3. Measure against that objective.
Accountability
Human's job: Certify that the process was followed and be liable.
Human Needs:
Clear audit trails - the policies applied, the AI's decision, the actual reasoning, and provenance (what came from where, and was it complete).
What NOT to do: Ask the human to verify substance. That's judgment, not accountability.
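What "clear audit trail" could look like as a single record the accountable human actually sees - a sketch; the schema and field names are my assumptions, not a prescription:

```python
# Sketch: one audit record per AI decision, so the accountable human can
# certify that the process was followed. Fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class AuditRecord:
    decision_id: str
    policy_versions: list[str]   # which policies, at which version, were applied
    ai_decision: str             # what the system decided
    reasoning_summary: str       # the stated rationale
    provenance: dict[str, str]   # claim -> where it came from
    inputs_complete: bool        # was anything missing at decision time?
    signed_off_by: str = ""      # filled in by the accountable human

record = AuditRecord(
    decision_id="claim-4821",
    policy_versions=["prior-auth-policy@v3.2"],
    ai_decision="approve",
    reasoning_summary="Meets criteria 1-4 of the policy.",
    provenance={"diagnosis code": "EMR encounter, 2024-11-02"},
    inputs_complete=True,
)
```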
Judgment
Human's job: Decide the hard cases and override per context demands.
Human Needs:
Conflict surfacing - Where do the rules disagree? Where is the data incomplete? The human shouldn't hunt for ambiguity - the system should present it.
What NOT to do: Present a finished artifact and ask "approve or reject?" That's accountability framing with judgment expectations.
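One way to read "the system should present ambiguity": route only cases with surfaced conflicts to the human, and say what the conflict is. A sketch, with hypothetical rule names and fields:

```python
# Sketch: only cases with surfaced conflicts or gaps reach the human, and the
# packet says what the conflict is. Rule names and fields are hypothetical.

def build_review_packet(case: dict):
    """Return a judgment packet if there's something for a human to decide, else None."""
    conflicts = []
    if case["rule_a_result"] != case["rule_b_result"]:
        conflicts.append(
            f"Rules disagree: A says {case['rule_a_result']}, B says {case['rule_b_result']}"
        )
    missing = [f for f in case["required_fields"] if f not in case["data"]]
    if missing:
        conflicts.append(f"Incomplete data: missing {missing}")
    if not conflicts:
        return None  # routine case: no human judgment needed
    return {"case_id": case["id"], "conflicts": conflicts,
            "question": "Which rule governs here, and why?"}

print(build_review_packet({
    "id": "txn-991",
    "rule_a_result": "flag",
    "rule_b_result": "clear",
    "required_fields": ["counterparty_history"],
    "data": {},
}))
```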
Trust:
Human's job: Confirm that the system works well enough to remove the checkpoint.
Human Needs:
AI-human alignment rate. Disagreement logs - when the human overrides, why?
What NOT to do: Treat this as permanent. If trust is established and the human remains, you've built dependency theater.
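Measuring trust is mostly bookkeeping. A sketch; the 98% exit threshold is a placeholder assumption, not a recommendation:

```python
# Sketch: track AI-human agreement and log every override with a reason.
# The exit threshold below is a placeholder, not a recommendation.

def trust_report(decisions: list[dict], exit_threshold: float = 0.98) -> dict:
    """Each decision: {'ai': ..., 'human': ..., 'override_reason': ...}."""
    agreements = sum(1 for d in decisions if d["ai"] == d["human"])
    alignment_rate = agreements / len(decisions)
    disagreement_log = [
        {"ai": d["ai"], "human": d["human"], "why": d.get("override_reason")}
        for d in decisions if d["ai"] != d["human"]
    ]
    return {
        "alignment_rate": alignment_rate,
        "disagreements": disagreement_log,
        "recommend_removing_checkpoint": alignment_rate >= exit_threshold,
    }

report = trust_report([
    {"ai": "fast_track", "human": "fast_track", "override_reason": None},
    {"ai": "fast_track", "human": "hold", "override_reason": "patient flagged fall risk"},
])
print(report["alignment_rate"], report["recommend_removing_checkpoint"])
```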
HITL as Guardrails Implementation Checklist
Once you've defined the objective for HITL, stress-test the design with three questions:
1. Cognitive load - How many “items” per decision?
It’s critical not to force the expert in the loop to cross-reference 10 cited documents for every AI output. The goal is a review/audit - not deep research.
2. Liability exposure - What breaks if this review fails?
Think of a sales rep sending an awkward email versus a hallucinated medication entering the EMR. Match the redundancy to the risk.
3. Explainability burden - Does it enable the human to verify why it's correct?
AI output often looks and feels correct. The goal is to verify whether it is correct. The expert needs to be able to trace the logic - which policy, which version, which criteria.
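To make the stress test concrete, here's a sketch that scores a HITL design against the three questions. The thresholds and field names are assumptions you'd tune per workflow:

```python
# Sketch: stress-test a HITL design against the three checklist questions.
# Thresholds, risk tiers, and field names are illustrative assumptions.

def stress_test(design: dict) -> list[str]:
    findings = []
    # 1. Cognitive load: how many items must the reviewer verify per decision?
    if design["items_to_verify_per_decision"] > 5:
        findings.append("Cognitive load: the reviewer is doing research, not review.")
    # 2. Liability exposure: does redundancy scale with what breaks on failure?
    if design["risk_tier"] == "high" and design["independent_reviewers"] < 2:
        findings.append("Liability: a single reviewer is the last line of defense on a high-risk decision.")
    # 3. Explainability burden: can the reviewer trace policy, version, criteria?
    if not design["decision_trace_available"]:
        findings.append("Explainability: no trace of which policy/version/criteria produced the output.")
    return findings or ["Design passes the three checks."]

print(stress_test({
    "items_to_verify_per_decision": 10,
    "risk_tier": "high",
    "independent_reviewers": 1,
    "decision_trace_available": False,
}))
```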
The Two Paths Out

Now the question changes - If your architecture fails these checks, where do you go?
Most teams today are top-left. Bottom-left at best. The human reviews something or everything. That's Phantom HITL.
Bottom-right is the goal. Architecture-First. The system handles routine decisions. The human only sees exceptions - low volume, high context, discrete decisions. That's Waymo. Remote operators for rare edge cases. A human who can actually loop.
Two paths to get there:
Path 1: Change the architecture.
This is also the case for designing deterministic systems. The goal is to reduce what the human reviews and to make sure the system surfaces conflicts before they reach the human. Verification becomes deterministic, not investigative. The human confirms; they don't reconstruct.
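A sketch of what "verification becomes deterministic" can mean in practice: exact checks against source data, so only real discrepancies reach the human. The note and chart fields are hypothetical:

```python
# Sketch: deterministic checks against source data before human review. Only
# discrepancies reach the human; a clean result means confirm, not reconstruct.
# The note/chart fields are hypothetical.

def verify_against_source(ai_note: dict, chart: dict) -> list[str]:
    """Return discrepancies the human must resolve; an empty list means confirm-only."""
    discrepancies = []
    if ai_note["patient_id"] != chart["patient_id"]:
        discrepancies.append("Patient ID mismatch between note and chart.")
    for med in ai_note["medications"]:
        if med not in chart["medication_list"]:
            discrepancies.append(f"Medication '{med}' not found in the source record.")
    return discrepancies

issues = verify_against_source(
    {"patient_id": "p-102", "medications": ["metformin", "lisinopril"]},
    {"patient_id": "p-102", "medication_list": ["metformin"]},
)
print(issues)  # ["Medication 'lisinopril' not found in the source record."]
```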
Path 2: Change the oversight model.
Caution - this impacts implementations: it adds delay and triggers ROI questions. The goal is to move from top to bottom. Real-time to time-buffered. Single reviewer to multiple. Review everything to sample-and-audit. Think FDA drug approval. Months between stages. Multiple independent reviewers. No single human's attention span is the last line of defense.
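And a sketch of that oversight-model shift: review-everything becomes sample-and-audit, with extra reviewers where the risk is high. The sample rate and tiers are placeholder assumptions:

```python
# Sketch: shift from "review everything in real time" to "sample and audit,
# with multiple reviewers on high-risk items." Rates and tiers are placeholders.
import random

def assign_reviews(decisions: list[dict], sample_rate: float = 0.1) -> list[dict]:
    assignments = []
    for d in decisions:
        if d["risk_tier"] == "high":
            # never a single attention span as the last line of defense
            assignments.append({**d, "reviewers": 2, "timing": "time-buffered"})
        elif random.random() < sample_rate:
            assignments.append({**d, "reviewers": 1, "timing": "audit batch"})
        # everything else flows through, but stays auditable after the fact
    return assignments
```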
We want to be top-left and somehow get bottom-right safety.
What you (cannot and) should not do is ship with an architecture that fails these tests and call the human a guardrail.
That’s just unfair to the human(s)🙂
If you've read this far, let me know. Just reply "got it." If something didn't land, tell me that as well. The fun part is figuring out how to say complicated stuff in a way that actually sticks.
And if you liked it - maybe send it to someone who's been told "don't worry, we have a human in the loop"?


