You typed a search subject into ChatGPT, Gemini, or Copilot. The output looked surprisingly thorough: a structured summary you thought could actually work.
But before you use that output as the basis of a compliance decision, it's worth understanding what happened behind the consumer interface. The pipeline that produced it made a series of invisible decisions, and for compliance purposes, those decisions matter.
When you type a query into ChatGPT or Gemini, you are not talking directly to the AI model. You're using a consumer product built around it. Between your question and the response sits an invisible process, and that process makes a lot of decisions you can't see and can't configure.
| 🖥️ What you see | 🧠 What's actually happening |
| --- | --- |
| You → Chat box → Query → Answer with citations | You → API Gateway → Safety Filters → Orchestrator → Prompt Templates → Tool Router → Web Search → Page Fetcher → Content Extractor → Token Truncator → Context Manager → Summariser → Reasoning Model → Output Safety Filter → Citation Formatter → Answer |
Every step in that process makes decisions about what to include, what to compress, and what to discard. None of those decisions are visible to you. None were designed with regulatory defensibility in mind.
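As a rough illustration, that chain can be sketched as a sequence of opaque transformations. The stage names below mirror the diagram; they are placeholders for this sketch, not any vendor's actual implementation, and each real stage can silently drop or reshape content.

```python
# Illustrative sketch only: stage names mirror the pipeline diagram,
# not any vendor's actual code. Each stage may filter, compress, or
# discard content without telling the user.

def make_stage(name):
    def stage(text):
        # A real stage transforms the content; here we just record the hop.
        return f"{text} -> {name}"
    return stage

STAGE_NAMES = [
    "safety_filter", "prompt_template", "tool_router", "page_fetcher",
    "content_extractor", "token_truncator", "summariser",
    "reasoning_model", "output_safety_filter", "citation_formatter",
]

def run_pipeline(query):
    """Pass a query through every invisible, non-configurable stage."""
    result = query
    for stage in (make_stage(n) for n in STAGE_NAMES):
        result = stage(result)
    return result
```

The user sees only the two ends of that call chain: the query going in and the formatted answer coming out.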
Every word an LLM reads is processed as "tokens": units of text roughly equivalent to a word or part of a word. Models have a limit on how many tokens they can hold in memory at once; this is their context window. When a consumer LLM visits a web page during a research task, it cannot hold the full text within that limit, so it compresses.
A several-thousand-word article about your subject becomes a fraction of its original length. The specific dates, the names of business associates, the corporate relationship referred to later in the source: these can be the tokens that get discarded. For casual research, that loss may be acceptable. For compliance work, those details are often the entire point.
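A minimal sketch of that compression, assuming a crude four-characters-per-token heuristic (real tokenisers vary, and real products summarise rather than simply cutting, but the effect is the same: whatever falls outside the budget is gone):

```python
# Sketch of context-window truncation. Assumes ~4 characters per token,
# a common rough heuristic; real tokenisers and summarisation differ.

def truncate_to_window(text: str, window_tokens: int) -> str:
    """Keep only what fits the context window; the rest is discarded."""
    max_chars = window_tokens * 4  # crude chars-per-token estimate
    return text[:max_chars]

# A long article where the adverse detail appears late in the text.
article = (
    "Background and early career of the subject. " * 40
    + "In 2014 the subject co-founded a holding company with an "
      "associate later named in a regulatory notice."
)

kept = truncate_to_window(article, window_tokens=100)
# The discarded tail may contain exactly the names and dates EDD needs.
detail_survived = "regulatory notice" in kept
```

Here the adverse finding sits past the token budget, so `detail_survived` is false: the model never saw it, and the summary it produces cannot mention it.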
Before the model looks at your query, it receives a set of hidden instructions called a system prompt, which defines how it must behave. In consumer AI tools, this system prompt can run to roughly 20 pages of rules the model must follow before it engages with your question.
Some of these rules create friction for compliance work:
| LLM consumer behaviour | Why it matters for EDD |
| --- | --- |
| Avoids making potentially defamatory statements about real people. | Adverse findings are exactly what EDD exists to surface; the model hedges what it should state clearly. |
| Cautious about politically sensitive content involving real people. | PEP investigations require direct reporting of political exposure; caution softens findings that could be facts. |
| Not designed to reproduce substantial passages from source material. | You cannot verify whether a source actually says what the model claims it says. |
These rules exist for good reasons in a consumer context. They create friction in a compliance context. The model has been instructed to be cautious about exactly the things you need it to be direct about.
The reproduction limit is worth dwelling on. Consumer LLMs are not designed to reproduce substantial passages from source material. If you ask what a court judgment says about your subject, you will get a paraphrase. If you ask what a regulatory notice says, you will get a summary. The passage that supports the finding is not something the tool is built to show you, which means you cannot verify whether the source actually says what the model claims it says.
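By contrast, verbatim sourcing is mechanically checkable. A minimal sketch, assuming you hold an archived copy of the source (the function names are hypothetical; a paraphrase, by definition, will fail this kind of check):

```python
# Sketch: check whether a claimed passage actually appears, verbatim,
# in an archived copy of the source. Function names are illustrative.

import re

def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so minor formatting
    differences don't defeat the comparison."""
    return re.sub(r"\s+", " ", text).strip().lower()

def passage_is_verbatim(claimed: str, archived_source: str) -> bool:
    """True only if the claimed passage appears word-for-word."""
    return normalise(claimed) in normalise(archived_source)

archived = "The tribunal found that the   respondent failed to disclose..."

passage_is_verbatim("The tribunal found that the respondent", archived)
# A paraphrase or hallucinated quote returns False:
passage_is_verbatim("The tribunal praised the respondent", archived)
```

With a consumer tool this check is impossible, because neither the verbatim passage nor the archived source is ever in your hands.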
The practical consequence is that consumer LLM outputs make poor compliance evidence. They are useful for orientation, for getting a quick picture of a subject before a deeper investigation. They are not suitable as the investigation itself. A report you cannot fully source, produced by a process you cannot describe, through a pipeline you cannot audit, is not a defensible basis for a risk decision.
DeepDive uses the same underlying AI models (Claude, GPT-4, and Gemini) but without the consumer guardrails. The platform is built specifically for investigative and compliance work: source data extraction tuned for surfacing adverse findings, prompts tuned for compliance objectives rather than consumer caution, and every output linked to a verified, archived source. The pipeline is visible, auditable, and designed from the ground up for regulatory defensibility.
If you want to try DeepDive in action on one of your cases, email info@deep-dive.com or book a discovery call with the team.