
How DeepDive uses Entity Resolution to eliminate false positives

After DeepDive's NLP digests and structures information from hundreds of sources, a critical challenge remains: ensuring all this data actually pertains to the correct person. This is where our entity resolution system comes into play, filtering out false positives and building an accurate picture of your investigation subject.

Compliance analysts and MLROs know all too well the task of disambiguating entities with the same name, particularly when PEP and sanctions screening yields false positives. DeepDive eliminates the painstaking, swivel-chair work of cross-referencing multiple sources across multiple screens.

The challenges of manual entity resolution

Here's what makes manual disambiguation so challenging for compliance teams and investigators:

  • Name variation and commonality: Many names are shared by hundreds or thousands of individuals worldwide.
  • Limited context: Individual sources often provide insufficient details to disambiguate with certainty.
  • Cross-language confusion: Name variations across different languages and alphabets complicate entity identification.
  • Shifting identifiers: People change roles, locations, and affiliations over time.
  • Resource constraints: Thorough verification across multiple sources is time-intensive.

These challenges often result in either false positives (including information about the wrong person) or excessive caution (excluding potentially relevant information due to uncertainty).

DeepDive's intelligent entity resolution system addresses these challenges through five key capabilities:

 1. Graph-based clustering. DeepDive's proprietary entity resolution performs network link analysis to group related mentions:

  • Pattern recognition: Identifies consistent patterns that indicate the same individual across sources
  • Attribute matching: Compares names alongside locations, dates, affiliations, and other identifiers
  • Network analysis: Maps relationships between mentions to determine likely matches
  • Outlier detection: Flags mentions that significantly deviate from established patterns

This sophisticated approach goes far beyond simple name matching, using the rich context established by our NLP system to make intelligent connections.
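DeepDive's clustering is proprietary and its details aren't described here, but the general idea of grouping mentions through links can be illustrated with a minimal Python sketch. It assumes a hypothetical Mention record, links two mentions only when their names agree and they share at least one contextual attribute, and then treats each connected component of the resulting graph as one candidate individual, found with a union-find structure.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention of a person extracted from a single source."""
    source: str
    name: str
    # Contextual attributes, e.g. {"loc:London", "org:Acme"} (illustrative format).
    attributes: frozenset = field(default_factory=frozenset)


class UnionFind:
    """Disjoint-set structure used to merge linked mentions into clusters."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def should_link(a: Mention, b: Mention, min_shared: int = 1) -> bool:
    """Link two mentions only if the names agree AND they share enough context."""
    same_name = a.name.casefold() == b.name.casefold()
    return same_name and len(a.attributes & b.attributes) >= min_shared


def cluster_mentions(mentions: List[Mention], min_shared: int = 1) -> List[List[Mention]]:
    """Add an edge for every linked pair, then return the connected components."""
    uf = UnionFind(len(mentions))
    for i, j in combinations(range(len(mentions)), 2):
        if should_link(mentions[i], mentions[j], min_shared):
            uf.union(i, j)
    groups: Dict[int, List[Mention]] = {}
    for i, m in enumerate(mentions):
        groups.setdefault(uf.find(i), []).append(m)
    return list(groups.values())


mentions = [
    Mention("news_a", "John Smith", frozenset({"loc:London", "org:Acme"})),
    Mention("registry", "john smith", frozenset({"org:Acme", "dob:1970"})),
    Mention("news_b", "John Smith", frozenset({"loc:Sydney", "org:Surf Club"})),
]
clusters = cluster_mentions(mentions)
```

Here the first two mentions share "org:Acme" and end up in one cluster, while the Sydney mention stays separate. Because clusters are connected components, links are transitive: if A links to B and B links to C, all three are grouped even when A and C share nothing directly. A production system would use fuzzier name matching and weighted edges; exact case-insensitive comparison keeps the sketch short.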

 2. Multi-factor entity comparison. DeepDive uses multiple attributes to compare and match entities across different sources:

  • Biographical details: Birth dates, education history, and career milestones.
  • Geographical connections: Residential locations, business addresses, and travel patterns.
  • Organisational affiliations: Company roles, institutional connections, and professional memberships.
  • Relationship networks: Family members, business associates, and other consistent connections.
  • Temporal consistency: Chronological alignment of life events and activities.

By requiring multiple matching factors, the system dramatically reduces false positives while maintaining the widest possible pool of sources.
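The idea of requiring several independent factors to agree can be sketched as follows. This is an illustrative assumption rather than DeepDive's actual logic: profiles are hypothetical dictionaries of normalised values per factor, a match needs agreement on at least two factors, and a disagreeing hard identifier such as a birth date vetoes the match. Temporal consistency is left out because it needs interval logic rather than simple set overlap.

```python
from typing import Dict, List, NamedTuple, Set

# Hypothetical factor categories mirroring the list above; values are sets of
# normalised strings, e.g. {"London"} for "geographical".
FACTORS = ("biographical", "geographical", "organisational", "relationships")
# Hypothetical hard identifiers: if both profiles have one and they disagree,
# the profiles are treated as different people regardless of other matches.
HARD_IDENTIFIERS = ("birth_date",)

Profile = Dict[str, Set[str]]


class Comparison(NamedTuple):
    matching: List[str]   # factor categories with at least one shared value
    conflicts: List[str]  # hard identifiers that contradict each other
    is_match: bool


def compare_profiles(a: Profile, b: Profile, min_factors: int = 2) -> Comparison:
    """Require agreement on several independent factors and no hard conflicts."""
    matching = [f for f in FACTORS if a.get(f, set()) & b.get(f, set())]
    conflicts = [
        k for k in HARD_IDENTIFIERS
        if a.get(k) and b.get(k) and not (a[k] & b[k])
    ]
    return Comparison(matching, conflicts, len(matching) >= min_factors and not conflicts)


subject = {
    "birth_date": {"1970-03-01"},
    "geographical": {"London"},
    "organisational": {"Acme Ltd"},
}
candidate = {"geographical": {"London"}, "organisational": {"Acme Ltd", "Beta plc"}}
result = compare_profiles(subject, candidate)
```

A single shared factor, such as the same city, is not enough on its own, and a contradictory birth date overrides any number of soft matches. That asymmetry is one simple way to trade recall for fewer false positives.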

 3. Adversarial AI verification. Even sophisticated algorithms make mistakes, which is why DeepDive employs a multi-layered verification approach:

  • Semantic analysis: Large Language Models evaluate whether content conceptually relates to the search subject
  • Coherence assessment: The system verifies that the assembled profile presents a logically consistent picture
  • Dual-system validation: Independent AI systems cross-check each other's entity resolution decisions
  • Edge case detection: Special attention is given to borderline cases that might confuse standard algorithms

This verification layer acts as a crucial quality control mechanism, catching potential errors before they impact the investigation.
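The dual-system validation rule can be illustrated with a small sketch. The two checks below are deliberately trivial, hypothetical string tests standing in for independent AI systems. What the sketch shows is the agreement rule: content is accepted only when both checks accept it, rejected when both reject it, and routed to human review when they disagree.

```python
from enum import Enum
from typing import Any, Callable, Dict


class Verdict(Enum):
    ACCEPT = "accept"  # both checks agree the content is about the subject
    REJECT = "reject"  # both checks agree it is not
    REVIEW = "review"  # the checks disagree: a borderline case for a human


Subject = Dict[str, Any]
Check = Callable[[Subject, str], bool]


def name_check(subject: Subject, content: str) -> bool:
    """Toy stand-in for one verifier: does the content mention the subject's name?"""
    return str(subject["name"]).casefold() in content.casefold()


def affiliation_check(subject: Subject, content: str) -> bool:
    """Toy stand-in for an independent verifier: is a known affiliation mentioned?"""
    affiliations = subject.get("affiliations", [])
    return any(a.casefold() in content.casefold() for a in affiliations)


def dual_validate(
    subject: Subject,
    content: str,
    primary: Check = name_check,
    secondary: Check = affiliation_check,
) -> Verdict:
    """Combine two independent judgements; disagreement escalates to review."""
    first = primary(subject, content)
    second = secondary(subject, content)
    if first and second:
        return Verdict.ACCEPT
    if not first and not second:
        return Verdict.REJECT
    return Verdict.REVIEW


subject = {"name": "Jane Doe", "affiliations": ["Acme Ltd"]}
verdict = dual_validate(subject, "Jane Doe, CFO of Acme Ltd, said on Monday...")
```

Treating disagreement as a signal, rather than letting either system overrule the other, is what turns two imperfect checkers into a quality-control layer for edge cases.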

 4. Confidence scoring. DeepDive assigns confidence levels to each entity resolution decision:

  • Match strength: Quantifies the number and quality of matching attributes
  • Source reliability: Factors in the credibility of sources providing identifying information
  • Corroboration level: Weighs how many independent sources support the same conclusion
  • Distinguishing factors: Evaluates the presence of unique identifiers that differentiate similar individuals

These confidence scores provide transparency to analysts, allowing them to focus on high-confidence information.
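As an illustration only, the four signals above could be combined into a single score with a weighted sum. The Evidence fields, the weights, and the diminishing-returns formula for corroboration are all hypothetical assumptions, not DeepDive's published scoring model.

```python
from dataclasses import dataclass

# Hypothetical weights for the four signals listed above. They sum to 1.0,
# so with every signal in [0, 1] the combined score also stays in [0, 1].
WEIGHTS = {
    "match_strength": 0.40,
    "source_reliability": 0.25,
    "corroboration": 0.20,
    "distinguishing": 0.15,
}


@dataclass
class Evidence:
    matched_attributes: int      # attributes that agree between the records
    compared_attributes: int     # attributes that could be compared at all
    source_reliability: float    # 0..1 credibility of the identifying sources
    independent_sources: int     # independent sources supporting the same link
    has_unique_identifier: bool  # e.g. a shared company registry number


def confidence(e: Evidence) -> float:
    """Combine the signals into one weighted score between 0 and 1."""
    if not 0.0 <= e.source_reliability <= 1.0:
        raise ValueError("source_reliability must be between 0 and 1")
    if not 0 <= e.matched_attributes <= e.compared_attributes:
        raise ValueError("matched_attributes must be between 0 and compared_attributes")
    signals = {
        "match_strength": (
            e.matched_attributes / e.compared_attributes if e.compared_attributes else 0.0
        ),
        "source_reliability": e.source_reliability,
        # Each extra independent source adds less than the one before it.
        "corroboration": 1.0 - 0.5 ** e.independent_sources,
        "distinguishing": 1.0 if e.has_unique_identifier else 0.0,
    }
    return sum(WEIGHTS[k] * signals[k] for k in WEIGHTS)


score = confidence(Evidence(3, 4, 0.8, 2, True))
```

For example, 3 of 4 attributes matching, source reliability 0.8, two independent sources and a shared unique identifier give 0.4 × 0.75 + 0.25 × 0.8 + 0.2 × 0.75 + 0.15 × 1 = 0.8, up to floating-point rounding. Keeping the per-signal values visible, not just the total, is what makes such a score explainable to an analyst.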

 5. False positive removal. The final step is decisive filtering to ensure only relevant information remains:

  • Strict exclusion: Content not firmly linked to the correct individual is removed from the analysis
  • Cluster separation: Clear boundaries are established between the subject and similar individuals
  • Manual review options: Borderline cases can be flagged for human review when appropriate
  • Continuous learning: The system refines its approach based on feedback and verified outcomes

This disciplined approach ensures the resulting Body of Knowledge focuses exclusively on the correct individual, eliminating the noise and confusion of false matches.
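The filtering step can be sketched as a three-way triage on confidence scores, assuming hypothetical thresholds: items at or above keep_at stay in the analysis, items between review_at and keep_at are queued for human review, and everything else is excluded. The threshold values are placeholders, not DeepDive's settings.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Triage(NamedTuple):
    kept: List[str]      # firmly linked to the subject: stays in the analysis
    review: List[str]    # borderline: flagged for human review
    excluded: List[str]  # not firmly linked: removed as a likely false positive


def triage(
    scored: Iterable[Tuple[str, float]],
    keep_at: float = 0.8,
    review_at: float = 0.5,
) -> Triage:
    """Split scored items into kept, review and excluded buckets."""
    if not 0.0 <= review_at <= keep_at <= 1.0:
        raise ValueError("expected 0 <= review_at <= keep_at <= 1")
    kept: List[str] = []
    review: List[str] = []
    excluded: List[str] = []
    for item, score in scored:
        if score >= keep_at:
            kept.append(item)
        elif score >= review_at:
            review.append(item)
        else:
            excluded.append(item)
    return Triage(kept, review, excluded)


result = triage([("article_1", 0.95), ("article_2", 0.6), ("article_3", 0.2), ("article_4", 0.8)])
```

Setting review_at equal to keep_at removes the review band entirely and gives strict binary exclusion; widening the band sends more borderline cases to analysts.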

Beyond entity resolution...

DeepDive's entity resolution creates a filtered, verified foundation for the next stages:

  • Body of Knowledge creation through LLM-powered statement extraction
  • Confidence scoring of extracted statements against source reliability
  • Report generation with full source citations and structured sections
  • Interactive chatbot interrogation of the knowledge base

By transforming one of the most challenging aspects of investigations into a reliable, systematic process, DeepDive enables compliance teams and investigators to proceed with confidence that they're analysing the right person.

The result? Investigations that avoid costly mistakes stemming from identity confusion, deliver more accurate risk assessments, and save countless hours previously spent discounting false positives.

Want to read more? The Body of Knowledge: From resolved entity into insight