What do kitchen knives and LLMs have in common? A guide to the major LLMs

Written by Dave | Mar 18, 2026 4:50:11 PM

If your boss has told you to go and use AI to turbo-boost your output in compliance and investigation tasks, that’s about as helpful as telling a chef to use a knife. There are many different types, each suited to different tasks. In this blog, our Chief Product Officer examines the strengths of different LLMs and the challenges that lie ahead for the compliance professional looking to build their AI stack.

What AI is good at — and where humans must stay in control

Before we dive into which LLM is best for which task, let’s take a step back and consider where we mere humans still stand tall above the best AI.

✅ Where AI excels	❌ Where the human advantage counts
🔍 Data processing at scale — reads a 500-page prospectus in 90 seconds, without getting bored on page 347.	⚖️ Making final risk decisions — AI doesn't understand real-life consequences, and it doesn't carry professional liability. Humans do.
🔗 Pattern recognition across sources — spots the Cyprus company linked to the BVI entity ultimately owned by a Jersey trust, across thousands of records simultaneously.	🧠 Strategic judgement — AI can tell you a director has 15 other companies. It cannot always tell you whether that's suspicious or just a busy entrepreneur.
🌍 Multi-jurisdictional research — can read documents in a wide range of languages, though quality varies by language.	🔍 Spotting what the data doesn't say — context, instinct, and professional experience aren't in any LLM training dataset.

Before you even think about selecting the right Large Language Model, it’s important to have the right mental model: AI is your research assistant, not your replacement. You wouldn't let an intern sign a client letter.

A guide to the major models

Anthropic Claude — Safety-first by design. Claude uses what Anthropic calls Constitutional AI, which means it's built to reason carefully about edge cases, express uncertainty when appropriate, and handle sensitive material thoughtfully.

OpenAI GPT — The market leader. Vast ecosystem, extensive documentation, integrates with almost everything. Not always the best tool for a specific task, but the one everyone has and knows how to use.

Google Gemini — Standout for context window size and Google Workspace integration. Gemini can process enormous amounts of data at once — useful when you need to feed an entire case file into a single session.

Microsoft Copilot — The enterprise incumbent. Already embedded in Microsoft 365, which means it’s often the default choice. Powered under the hood by models from OpenAI and Anthropic, it's strong on document drafting, summarisation, and working across your existing files.

Mistral — The European option. French company, open-source models, deployable on your own infrastructure. For organisations where data sovereignty is non-negotiable — and for this audience, it should be — Mistral is worth serious consideration.

Beyond the major players, there are scores of other tools worth knowing. Here are just a few of them. Groq isn't a model at all: it's custom hardware that runs existing models at extraordinary speed, making it useful for high-volume work where throughput matters more than analytical depth. Cohere is enterprise-focused with strong retrieval capabilities. Meta's Llama is open-source and customisable for organisations with the technical resource to deploy it. Amazon Bedrock offers a multi-model platform for those already in the AWS ecosystem. Perplexity is optimised for search-style responses, which makes it useful for quick lookups but less so for structured investigation work.

The capability vs. cost vs. speed trade-off

A professional kitchen has a knife for every job. The same logic applies here. Before you choose a model, understand the nature of the task at hand.

Task	Recommended model	Why
Initial triage of materials	Claude Sonnet, GPT mini	Speed and volume matter; task is relatively straightforward
Document analysis & pattern finding	Claude Sonnet, GPT-5, Gemini Pro	Needs strong reasoning with structured data
Complex risk assessment	Claude Opus, GPT-5	High stakes; quality over speed
Report generation	Claude Sonnet, GPT-5	Strong writing, proper citation handling
Multi-language research	Gemini Pro, GPT-5	Multilingual capability and cultural context
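For teams wiring this up themselves, the table above can be captured as a simple lookup. The following is a minimal sketch, not a definitive implementation: the task keys and the `pick_model` helper are hypothetical names invented for illustration, and the model strings are the labels from the table, not real API model identifiers.

```python
# Routing table copied from the task/model table above.
# NOTE: these strings are the article's labels, not vendor API model IDs.
ROUTES = {
    "triage": ["Claude Sonnet", "GPT mini"],
    "document_analysis": ["Claude Sonnet", "GPT-5", "Gemini Pro"],
    "risk_assessment": ["Claude Opus", "GPT-5"],
    "report_generation": ["Claude Sonnet", "GPT-5"],
    "multi_language_research": ["Gemini Pro", "GPT-5"],
}


def pick_model(task: str, available: set[str]) -> str:
    """Return the first recommended model for `task` that you have access to."""
    if task not in ROUTES:
        raise KeyError(f"unknown task: {task}")
    for model in ROUTES[task]:
        if model in available:
            return model
    raise LookupError(f"no recommended model available for task: {task}")


# Example: a team that only has access to the GPT family.
chosen = pick_model("triage", {"GPT mini", "GPT-5"})
```

Keeping the routing in one table means that when a recommendation changes, you update a single entry rather than hunting through every workflow.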

💡 The pro tip: Use cheaper, faster models for initial review and premium models for final analysis. Don't burn expensive tokens sorting through thousands of search results. Filter first, then bring in the heavy artillery.
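Here is one way the "filter first" idea could look in practice. This is a sketch under stated assumptions: `two_stage_review`, `keyword_triage`, and `fake_premium` are hypothetical names, and the two functions passed in are stand-ins for real calls to a cheap model and a premium model.

```python
from typing import Callable, Iterable


def two_stage_review(
    documents: Iterable[str],
    cheap_is_relevant: Callable[[str], bool],
    premium_analyse: Callable[[str], str],
) -> list[str]:
    """Filter with the cheap model first, then analyse only what survives."""
    shortlisted = [doc for doc in documents if cheap_is_relevant(doc)]
    return [premium_analyse(doc) for doc in shortlisted]


# Stand-in for a cheap, fast model (hypothetical; a real one would be an API call).
def keyword_triage(doc: str) -> bool:
    return "sanction" in doc.lower()


# Stand-in for a premium model; records each call so we can count them.
premium_calls: list[str] = []


def fake_premium(doc: str) -> str:
    premium_calls.append(doc)
    return f"ANALYSIS: {doc}"


results = two_stage_review(
    ["Weather report", "Director appears on a sanctions list", "Sports news"],
    keyword_triage,
    fake_premium,
)
```

In this toy run, three documents go in but the premium stand-in is called only once. That is the whole point: the expensive model never sees the material the cheap one has already ruled out.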

Hallucinations, model versions, tokens & temperature: why managing all of this yourself is harder than it looks

Selecting the right model is only the start. The chat interface you are used to comes with a range of tools and optimisations for getting the most out of the LLM, and these do not exist in the API versions of the same models. To build and manage your own AI stack for compliance investigations, you would need to:

Become proficient in six or more different AI platforms, each with different APIs, billing systems, and capabilities
Know which model to use for each task — and switch between them mid-investigation
Master prompt engineering: the skill of phrasing questions in ways that produce reliable, accurate outputs
Tune "temperature" settings for each query type — controlling whether the model is precise and deterministic or more creative and expansive
Verify every output for hallucinations before it goes anywhere near a risk report
Understand context limits, data management, semantic search, GDPR, and deep source understanding
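To give a flavour of just one item on that list, here is a sketch of per-query temperature tuning. Everything in it is an assumption made for illustration: the query types, the numbers, and the `build_request` helper are hypothetical, and the valid temperature range differs between providers, so check your vendor's documentation before borrowing any value.

```python
# Hypothetical presets: illustrative starting points, not vendor recommendations.
# Lower values make output more precise and repeatable; higher values make it
# more varied and expansive.
TEMPERATURE_BY_QUERY = {
    "fact_extraction": 0.0,
    "risk_summary": 0.2,
    "report_drafting": 0.5,
    "brainstorm_search_terms": 0.9,
}


def build_request(query_type: str, prompt: str) -> dict:
    """Build a provider-neutral request dict with a per-task temperature."""
    # Unknown query types fall back to the most conservative setting.
    temperature = TEMPERATURE_BY_QUERY.get(query_type, 0.0)
    return {"prompt": prompt, "temperature": temperature}


request = build_request("fact_extraction", "List every director named in the filing.")
```

Defaulting to the lowest setting is a deliberate choice here: in compliance work, an unexpectedly creative answer is usually a worse failure than a dull one.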

How DeepDive handles this for analysts and researchers all over the world

DeepDive has been engineered to harness the power of AI so that EDD teams and investigators don’t have to become AI experts. The platform orchestrates multiple AI models automatically — using the right model for each stage of an investigation. Fast, cost-efficient models handle initial search and triage. Higher-capability models handle extraction, analysis, and report generation. Expert prompt engineering is embedded in the platform. Every output is linked to a verified source.

The result? Investigations that take hours rather than days, with a full audit trail and court-ready documentation — without your team needing to become AI engineers.

If you want to put DeepDive to the test, email info@deep-dive.com or book a discovery call with the team.
