Is native LLM 'deep research' deep enough for due diligence?
Every major AI platform now offers a 'deep research' mode. You give it a brief; it browses the web and returns a structured report with sections, sub-headings, and citations. It reads like something an analyst might produce. If you're an AML analyst or investigator, this blog covers where these tools fit, and where they don't, when the work is due diligence on a named individual.
What is deep research?
Deep research is an impressive piece of engineering. When you submit a query, the tool doesn't simply search Google and paste back the top results. It breaks your question down into sub-tasks, runs multiple searches, reads the pages it finds, reasons about what it has read, identifies gaps, runs further searches to fill them, and synthesises everything into a structured report. The whole process runs autonomously and can take several minutes to complete.
Each major platform has its own architecture. ChatGPT Deep Research uses a reasoning model optimised for web browsing, which plans its strategy before it begins and adjusts as it goes. Gemini Deep Research leverages Google's large context window alongside native integration with Google Search. Claude Research uses a multi-agent approach, where a lead agent coordinates parallel sub-agents running searches simultaneously.
The outputs these tools produce are genuinely useful for background reading, competitive analysis, summarising a regulatory landscape, or getting up to speed on an unfamiliar jurisdiction. Deep research saves hours of manual work. That is what it was built for.
The question for investigators and compliance professionals isn't whether deep research is impressive. It is whether the process behind it produces reports that are accurate and complete enough to rely on for due diligence on a named individual. Here are four characteristics of LLM deep research — limited search coverage, no entity resolution, probabilistic citations, and lack of an audit trail — that can make the difference between insight and proof.
The coverage question
When a deep research tool returns its report, there is no visibility into which search queries were run or how it decided what was relevant.
Within the pages a deep research tool discovers, context-window constraints force the system to sample rather than exhaust. Many candidate URLs are identified, only a subset are fetched, and those are frequently read in truncated form rather than in full. The model may find the page, open the page, and still not read the part that brings material insight.
The consequence is straightforward: adverse information that exists on pages the research could have reached can be absent from the deep research report. A clean-looking report does not always mean the subject is clean; it can mean the search stopped before it reached the material that would have said otherwise. For enhanced due diligence (EDD) this matters because risk usually doesn't sit on the first page of Google results. It sits in the local newspaper report, the regional court listing, or the filing buried in a foreign-language disclosure.
Deep research tools rely on limited retrieval pipelines. They query a single search index, default to English-language queries, and do not systematically run native-language searches on the regional engines where much of the material insight is found.
For a subject with connections in Russia, China, Korea, or the Middle East, that is a material constraint. Corporate filings, court records, and local media often exist primarily in languages other than English and on platforms other than Google or Bing. Information available through native-script searches on regional engines such as Yandex, Baidu, or Naver is not reached.
Combining multiple search engines typically produces a 30–40% uplift in unique URLs discovered — which gives a sense of how much single-pipeline retrieval leaves behind.
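To make the uplift figure concrete, here is a minimal sketch of how you might measure it: merge the URL sets returned by several engines, deduplicate, and compare against a single-engine baseline. The engine names and result lists are illustrative, not real data.

```python
# Hypothetical sketch: measuring the unique-URL uplift from combining
# multiple search engines against a single-engine baseline.

def unique_url_uplift(baseline_results, extra_engine_results):
    """Fractional uplift in unique URLs when results from additional
    engines are merged with a single-engine baseline."""
    baseline = set(baseline_results)
    combined = baseline.union(*extra_engine_results)
    new_urls = combined - baseline
    return len(new_urls) / len(baseline)

# Illustrative result sets — real runs would come from live engine queries.
google = ["a.com/1", "b.com/2", "c.com/3", "d.com/4", "e.com/5"]
yandex = ["a.com/1", "f.ru/6"]   # overlaps on one URL, adds one new
baidu  = ["b.com/2", "g.cn/7"]   # overlaps on one URL, adds one new

uplift = unique_url_uplift(google, [yandex, baidu])
print(f"{uplift:.0%} more unique URLs")  # → 40% more unique URLs
```

In a production pipeline the same dedup step would also normalise URLs (scheme, tracking parameters, trailing slashes) before comparing, otherwise the uplift is overstated.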
The entity resolution gap
Deep research tools do not perform entity resolution. They search, read, and synthesise, treating what they find as if it relates to the same individual. They do not build profiles, cross-reference affiliations, or confirm subject identity across sources.
The effect is visible in two directions. In one, an embezzlement conviction belonging to a different person with the same name ends up in the report. In the other, the tool doesn't recognise name variations: Bob, Robert, and Bobby are the same individual, but without entity resolution they are treated as three different people, and findings recorded against the subject under a variant name are missed.
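The name-variant half of the problem is the more tractable one, and a deterministic approach can be sketched in a few lines: map known nicknames to a canonical form before comparing. The nickname table below is a tiny illustrative sample, not a real dataset, and real systems also handle transliteration, initials, and ordering.

```python
# Hypothetical sketch of deterministic name-variant matching.
# The nickname table is a small illustrative sample only.

NICKNAMES = {
    "bob": "robert",
    "bobby": "robert",
    "rob": "robert",
    "bill": "william",
    "will": "william",
}

def canonical(first_name: str) -> str:
    """Map a first name to its canonical form, if a variant is known."""
    key = first_name.strip().lower()
    return NICKNAMES.get(key, key)

def same_subject(name_a: str, name_b: str) -> bool:
    """Treat two 'First Last' names as one subject when the canonical
    first names and the last names both agree."""
    first_a, last_a = name_a.lower().rsplit(" ", 1)
    first_b, last_b = name_b.lower().rsplit(" ", 1)
    return canonical(first_a) == canonical(first_b) and last_a == last_b

print(same_subject("Bob Smith", "Robert Smith"))     # → True
print(same_subject("Bobby Smith", "Robert Smith"))   # → True
print(same_subject("Robert Jones", "Robert Smith"))  # → False
```

A tool without this step sees three distinct strings and never connects the finding under "Bobby Smith" back to the subject.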
False attribution and false omission are both risks when deep research is used without a targeted entity resolution phase. False positives annoy your team. False negatives destroy your firm.
The same gap extends to sanctions, PEP, and watchlist screening. Deep research tools may incidentally encounter mentions of sanctions in news coverage, but they do not systematically screen a subject against OFAC, HMT, EU, or UN consolidated lists, nor against PEP databases. A deep research report on an actively sanctioned individual may say nothing about the sanction, because the tool was never checking the list in the first place.
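The difference between incidental discovery and systematic screening can be shown in miniature: structured screening checks the subject against every entry in the list, every time. The list entries, matching method, and threshold below are all illustrative assumptions, not how any particular screening product works.

```python
# Hypothetical sketch of structured list screening: every subject is
# checked against a consolidated list, rather than relying on a sanction
# happening to surface in news coverage. Entries and threshold are
# illustrative only.

from difflib import SequenceMatcher

SANCTIONS_LIST = [
    {"name": "Ivan Petrov", "list": "OFAC SDN"},
    {"name": "Acme Trading LLC", "list": "EU Consolidated"},
]

def screen(subject: str, threshold: float = 0.85):
    """Return list entries whose names fuzzily match the subject."""
    hits = []
    for entry in SANCTIONS_LIST:
        score = SequenceMatcher(
            None, subject.lower(), entry["name"].lower()
        ).ratio()
        if score >= threshold:
            hits.append((entry["list"], entry["name"], round(score, 2)))
    return hits

print(screen("Ivan Petrov"))     # → [('OFAC SDN', 'Ivan Petrov', 1.0)]
print(screen("Maria Gonzalez"))  # → []
```

Real screening systems combine this with the name-variant handling above and with date-of-birth and nationality checks to keep false positives manageable, but the structural point holds: the list is always consulted.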
The citation risk
When a regulator asks to see methodology, the best answer is: what searches were run, what pages were retrieved, what was on those pages at the time, and how the subject was identified. Deep research tools don't provide this. The research phase is a black box. A link that resolves today may return different content tomorrow, or nothing at all. The source as it existed at the time of the investigation is not preserved.
This matters for more than regulatory response. It is the reason the three gaps above are consequential rather than manageable. Missing sources could be identified if you could see which searches were run. Wrong-person attributions could be caught if you could see how the subject was matched across sources. Fabricated citations could be flagged if each claim could be traced back to the source actually retrieved. Without visible working, the reviewer of the report is not in a position to catch the errors the system has made — and the whole point of commissioning the report is that the reader needs information they do not already have.
An investigation report is not just a document. It is a record of a process. When a client, regulator, or court asks you to demonstrate that your investigation was thorough, they are not asking what the report says. They are asking you to prove how you reached those conclusions, what you considered, and what standard you applied.
How DeepDive is engineered for investigations
Coverage: Seven search engines querying in native language and script, with full visibility into every search executed and every source retrieved.
Entity resolution: Purpose-built entity resolution that verifies your subject across sources and tests name variations deterministically, with structured screening against sanctions and PEP datasets as a foundation rather than an afterthought.
Citations: Every source in a DeepDive report is a page that was actually retrieved and preserved. Sources are not generated.
Audit trail: Every search term, language, engine, and decision is recorded, along with the rationale for why each source was included or excluded. This is the difference between "we searched" and "we can show you what we searched, what we found, what we rejected, and why."
If you want to try DeepDive in action on one of your cases, email info@deep-dive.com or book a discovery call with the team.