I was perhaps a bit too harsh in my assessment of the state of enterprise AI as miserable. Enterprises are actually taking AI applications and agents to production. The deployments have moved forward on the back of their RAG stack and human-in-the-loop review, albeit in silos. That brings us to the question: what happens when these agents start asking other systems of record (or systems of action, as they are now being called) for context? Hence this ELI5.
Running ops at my last company, I learned one thing fast: as humans, we are terrible at honouring internal contracts. Sales reps won't update the CRM. The onboarding team won't complete their deliverables. Engineering won't get full context on bugs from QA. You get the drift. These gaps surfaced in weekly reviews. War rooms. People pointing fingers. Mostly ugly faceoffs. It was uncomfortable, but effective in one specific way: there was always a team or individual who was held accountable. That, essentially, was the contract.
The Holy Land Of AI & The Promise Of No Admin Work
A big reason teams (me included) feel relieved is that they no longer have to update the system of record. Elimination of admin work is the #1 use case. Sales ops, healthcare admin, front-desk admin - massive opportunities, real action. Most orgs now have a range of AI tools in production chasing outcomes. Three to five of them. Each one RAG plus human review, stitched together. It can feel like duct tape, but hey - it works. It saves time. This is progress. Even if it's happening in silos.
Everything Looks Good, Right?
The question is what happens when agent-first systems start demanding context from other agents and AI applications. Classic example:
An account manager prepping for a QBR needs all support tickets, contract terms, implementation deliverables, goals set during the sales handoff, pending invoices, relevant new features and more. They outsource this to ClaudeCode, and it will pull whatever SupportDesk AI provides, whatever the CLM AI reports, and whatever GongAI has documented. Remember - there is always a response. It doesn't matter if it's correct, complete, or completely fabricated.
Higher stakes? Look at AI scribes. Simple job: listen, summarize, and generate the note. The physician, who holds the liability, reviews and signs. A 2025 study in npj Health Systems evaluated 208 AI-generated chart summaries. Physicians flagged omissions in nearly a third of them. The question is what happens when the physician signs anyway? The incomplete note moves into prior auth and triggers a denial. The denial lands days later, abstracted entirely from its source. Denial rates become a lagging metric.
When agents collate context from five systems and there are no audit trails, there is no way to know what was passed, what was dropped, and which system owns the error. One agent trusts another the way you'd trust a rumor. At a single AI implementation, this would just feel like a bug.
Poor context handoff across five AI systems means you start bleeding all your gains.
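To make "audit trail" concrete - here's roughly what one row of it could look like. A minimal sketch in Python; HandoffRecord and every field name are my own illustration, not any vendor's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class HandoffRecord:
    """One audit-trail row per agent-to-agent handoff."""
    from_system: str            # e.g. "SupportDesk AI"
    to_system: str              # e.g. "QBR prep agent"
    query: str                  # what was asked
    sources_passed: list[str]   # what made it through
    fields_dropped: list[str]   # known upstream but not forwarded
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```

If every hop wrote one of these, "which system owns the error" stops being a war-room question and becomes a query.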
Wait, What About The Human In the Loop?
Think about an airport security checkpoint. A human reviews every bag, and the conveyor runs at that reviewer's pace, not the queue's. That's the contract. At production volume, the agent pipeline flips that relationship. Fifty outputs queued. Another agent waiting on input. The reviewer doesn't get faster; in fact, they get bypassed. At that point, nobody knows what's happening. Did the agent proceed on incomplete context? Did it fabricate? Did it miss a critical detail? No audit trail, no answer. Most enterprises today risk letting the conveyor set the pace.
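One structural answer is to make the pipeline fail closed: downstream agents block until review actually happens, instead of the backlog silently bypassing the reviewer. A rough sketch, assuming a hypothetical ReviewGate sits between agents (the class, names, and threshold are mine, not any product's):

```python
from dataclasses import dataclass, field
from enum import Enum

class ReviewStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"

@dataclass
class ReviewGate:
    """Fail-closed gate: downstream agents only ever see approved outputs."""
    max_pending: int = 50                       # backpressure threshold
    _queue: dict = field(default_factory=dict)  # output_id -> (status, payload)

    def submit(self, output_id: str, payload: dict) -> None:
        pending = sum(1 for s, _ in self._queue.values() if s is ReviewStatus.PENDING)
        if pending >= self.max_pending:
            # Refuse new work rather than let the conveyor set the pace.
            raise RuntimeError("Review backlog full; upstream agents must wait")
        self._queue[output_id] = (ReviewStatus.PENDING, payload)

    def approve(self, output_id: str) -> None:
        _, payload = self._queue[output_id]
        self._queue[output_id] = (ReviewStatus.APPROVED, payload)

    def release(self, output_id: str) -> dict:
        status, payload = self._queue[output_id]
        if status is not ReviewStatus.APPROVED:
            raise PermissionError(f"{output_id} not reviewed; downstream blocked")
        return payload
```

The specifics don't matter. What matters is that bypassing review requires an explicit exception, instead of happening by default when the queue backs up.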
Why are humans bad at umpiring here? Why does this happen, and how should one wrap their head around it?
Humans-on-the-Hook?
When the human in the loop can't see the citation, source, or provenance behind the agent's output, there is NO review. They can only redo it: go back to the source, re-verify manually, do the work the agent was expected to do. That's not human-in-the-loop. That's human-on-the-hook.
Context Handoffs - What Are They and How Do They Work?
A context handover contract has three components. All three are structural requirements.
Consistency: the same query, routed through the same context, produces the same answer. Every time. Not most times.
Traceability: every output carries its source. The specific document and section, not a summary or a confidence score.
Completeness: nothing critical was dropped in transit. The contraindication made it through. The pricing exception made it through. The unresolved onboarding issue made it through.
Remove any one of these and the contract fails. And the current breed of agents doesn't produce uncertain output when context is incomplete. It produces confident, well-structured output, every time. The gap is invisible until something downstream breaks completely.
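What does that contract look like when it's structural rather than aspirational? A minimal sketch, assuming a structured payload travels between agents instead of flat text; ContextPayload, SourceRef, and the checks are illustrative, not a standard:

```python
from dataclasses import dataclass
import hashlib, json

@dataclass(frozen=True)
class SourceRef:
    """Traceability: the specific document and section, not a summary."""
    system: str       # e.g. "SupportDesk AI"
    document_id: str
    section: str

@dataclass
class ContextPayload:
    query: str
    facts: dict[str, str]          # each fact keyed by name
    sources: dict[str, SourceRef]  # every fact must carry a source
    required_keys: frozenset[str]  # completeness manifest, set upstream

    def fingerprint(self) -> str:
        """Consistency: same context -> same hash, so drift is detectable."""
        blob = json.dumps(self.facts, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def validate(self) -> None:
        missing = self.required_keys - self.facts.keys()
        if missing:
            raise ValueError(f"Completeness violated; dropped in transit: {missing}")
        unsourced = self.facts.keys() - self.sources.keys()
        if unsourced:
            raise ValueError(f"Traceability violated; no provenance for: {unsourced}")
```

The point of validate() raising an error is the whole contract: an incomplete or unsourced handoff should fail loudly at the boundary, not arrive downstream as confident, well-structured output.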
Who will inherit this problem? Vendors? IT Teams?
At some point, the support tickets will start trickling in. IT gets the ticket. IT calls the vendor. The vendor pulls the logs. The logs show the agent responded. Status 200. Output clean. No errors recorded. That's where the conversation ends, because that's all the trail there is. There's no root-cause analysis, no attribution trail. Nobody can explain which handover dropped what, when, or why. Everything looks clean.
This is the problem IT leaders are about to inherit. It won't come from negligence or bad vendors. It will come from architectural limitations in how context is handed off between agents. Every handoff needs consistency, traceability, and completeness baked in before the first agent talks to the second. Before the denials spike and the tickets pile up.
Wait, haven't we been here before?
The context handover problem isn't new. You've just been calling it something else.

Context Handover - Across Different Systems
Application-Centric - the legacy you know well. Every application owned its own data. CRM had the customer record. Support desk had the tickets. Finance had the invoices. Integration meant passing data between systems and hoping it arrived intact. It usually didn't. That's why your ops team spent half their time chasing context across disconnected tools - but you could always trace back to what worked and what didn't.
Agentic Pipeline - this is where most orgs are today. Technically, teams swapped the applications for agents. The pipeline got "smarter". But because "context" is still owned and defined per agent, it's still passed as a flat, unstructured payload, still unverifiable end-to-end. The agents are more capable than the apps they replaced. The structural problem is identical. Except now, nobody knows which system dropped what.
Data-Product-First - this is one approach that solves it. Context defined once, governed centrally, served with full provenance at every handoff. The same query produces the same output. Every output carries its source. Nothing drops in transit. The handover contract is baked in.
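To put stage two and stage three side by side in code - a short sketch reusing the ContextPayload from the previous section (both function names are mine):

```python
# Assumes the ContextPayload / SourceRef sketch from the previous section.

def agentic_pipeline_handoff(context: str) -> str:
    """Stage two: a flat string crosses the boundary.
    Nothing to validate, nothing to trace, nothing to replay."""
    return context

def governed_handoff(payload: "ContextPayload") -> "ContextPayload":
    """Stage three: the contract is enforced at the boundary,
    before the receiving agent sees a single token of context."""
    payload.validate()                  # completeness + traceability, or it raises
    audit_line = payload.fingerprint()  # consistency: same context -> same hash
    print(f"handoff ok, fingerprint={audit_line}")
    return payload
```

Same pipeline shape; the only difference is whether the boundary can say no.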
Most organizations are in stage two and don't know it yet. The systems are working. The QBR decks are landing. The notes are getting generated. Everything looks fine - until it doesn't.
The Questions Worth Asking Your Team and Vendors
Whether it's a vendor system or something you built internally, these four questions hold:
1. How is context defined? Not in theory - in the actual payload that passes between agents. Is it a document? A summary? A structured object with source metadata attached?
2. How do source context and citation travel through a handoff? If Agent A retrieved something from a policy document, does that attribution reach Agent C - or does it get summarized away at step one?
3. How do you ensure completeness? What structural check verifies that the contraindication, the pricing exception, the unresolved issue, actually made it through - not just that an output was returned?
4. How do you ensure consistency? If you run the same query through the same context tomorrow, do you get the same answer? If not - does anyone know? (A minimal version of this check is sketched below.)
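Question 4 is the easiest one to turn into an actual test. A minimal sketch; run_query stands in for whatever callable fronts your agent or vendor system, so treat this as an evaluation-harness idea rather than anyone's API:

```python
import hashlib
from typing import Any, Callable

def consistency_check(run_query: Callable[[str, Any], Any],
                      query: str, context: Any, trials: int = 3) -> bool:
    """Same query over the same context -> same answer, every time?"""
    digests = {
        hashlib.sha256(str(run_query(query, context)).encode()).hexdigest()
        for _ in range(trials)
    }
    return len(digests) == 1  # more than one digest => nondeterministic answers
```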
To be fair - these aren't abstract architecture questions. They will help you tell the difference between a stack where AI gains actually compound and one where you bleed them all away.
If you read this far, let me know if this landed. Just reply "got it." If your team has already worked through these and I've overstated the problem, I'd genuinely like to hear that too. That's usually where the better conversation starts.
And if someone on your team is making AI architecture decisions right now - forward this before the next deployment conversation.
Best,
Vivek
