
RAGs to Riches

Theo Cox ・ Founding GTM, Wexler AI

In 2023, legal AI found its holy grail. Or so it thought.

Retrieval-Augmented Generation (RAG, for those who prefer their acronyms punchy) promised to solve the hallucination problem that had made lawyers understandably nervous about letting large language models anywhere near their briefs. The pitch was elegant: ground the AI in actual documents, and it stops making things up. No more invented cases. No more fictional judges. Just cited, verifiable answers.

For a while, it seemed to work. Legal research tools built on RAG could find relevant precedent, summarise contracts, and draft passable first cuts of standard documents. Semantic similarity, it turns out, is something these systems do rather well. They are, in effect, exceptionally good Swiss Army knives: versatile, competent across many tasks, genuinely useful.

But litigation, the actual work of disputes, is not a research problem. It is a reasoning problem. And reasoning, at the level disputes demand, requires a scalpel.

The questions that actually matter

Consider what high-stakes disputes require:

What did the defendant know, and when did they know it? This isn’t a query you answer with a single retrieval. It requires tracing information across dozens of documents: connecting an email from March to a board minute from April to testimony about a phone call in May.

Where do the accounts conflict? The CFO’s testimony on page 47 might contradict the email chain buried in exhibit 312. Surfacing that inconsistency requires systematic comparison across the entire documentary record, not isolated retrieval.

What is “the March meeting”? Fourteen references across 200 documents might use different language—“the board session,” “the Q1 review,” “the meeting with counsel”—to describe the same event. Or they might not. Resolving this requires entity tracking, not semantic similarity.

These are not questions you answer by finding text that looks relevant. They require connecting facts across a sprawling evidentiary landscape—tracking entities, mapping relationships, following temporal threads, and surfacing the inconsistencies that win or lose cases.
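
To make the entity-tracking point concrete, here is a deliberately minimal Python sketch of resolving “the March meeting.” The alias table is hand-built and hypothetical; a real system must infer these links from context rather than look them up.

```python
# Toy illustration only: what "entity tracking" means for "the March meeting".
# The alias table is hand-built and hypothetical; a real system must infer
# these links from context rather than look them up.
mentions = [
    ("doc_014", "the board session"),
    ("doc_067", "the Q1 review"),
    ("doc_131", "the meeting with counsel"),
    ("doc_188", "the March meeting"),
]

aliases = {
    "the board session": "2021-03 board meeting",
    "the q1 review": "2021-03 board meeting",
    "the meeting with counsel": "2021-03 board meeting",
    "the march meeting": "2021-03 board meeting",
}

by_event: dict[str, list[str]] = {}
for doc, phrase in mentions:
    event = aliases.get(phrase.lower(), phrase)  # unresolved mentions stand alone
    by_event.setdefault(event, []).append(doc)

print(by_event)
# {'2021-03 board meeting': ['doc_014', 'doc_067', 'doc_131', 'doc_188']}
```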

In this work, precision is the difference between winning and losing. “Mostly right” isn’t good enough.

The limits of sophisticated RAG

To be clear: RAG is not a poor system. For the right problems, it is genuinely powerful. The Swiss Army knife keeps getting sharper with each new level of sophistication.

The leading legal AI platforms (Harvey, Legora, and others) have built sophisticated implementations that go well beyond naive chunk-and-retrieve. Cascading models. Fine-tuned retrievers. Layered orchestration engines. These systems deliver real value for legal research, contract analysis, and document summarisation.

But this architecture constrains precision.

However sophisticated the retrieval, RAG systems still fundamentally operate by finding chunks of text that seem relevant to a query. They have no structural understanding of how Witness A relates to Document B relates to Event C. When you pose a complex question, the system must infer these connections on the fly.
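
For contrast, here is a minimal sketch of the chunk-and-retrieve pattern described above. The bag-of-words embed() is a toy stand-in for a neural encoder; the point is that ranking happens by surface similarity, and nothing in the system records that the three chunks describe one connected story.

```python
# Minimal chunk-and-retrieve sketch. embed() is a toy stand-in for a real
# embedding model; the mechanics (score chunks, return top match) are the point.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # bag of words, for illustration only

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "Board minutes, April: directors discussed the write-down.",
    "Email, March: 'best to keep this off the record for now.'",
    "Deposition excerpt: the CFO denies any such discussion took place.",
]

query = "what did the defendant know, and when?"
ranked = sorted(chunks, key=lambda c: cosine(embed(query), embed(c)), reverse=True)
# The system returns whichever chunk *looks* most similar to the query.
# No structure links the email to the minutes to the testimony.
print(ranked[0])
```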

Sometimes it gets lucky. Often, as a Stanford study found last year, it returns answers that are “incorrect but very convincing”—which, in litigation, is arguably worse than being obviously wrong.

When the fact pattern determines the outcome, versatility is not the virtue. Precision is.

A different architecture

This is the problem we set out to solve at Wexler.

Rather than retrieve and hope, we extract facts as discrete entities, map the relationships between them, and construct a structured representation of the evidentiary record. Think of it less as search and more as cartography: before you can navigate a landscape, someone has to draw the map.

The engineering is considerably harder. Our pipeline runs fifteen steps before a single fact surfaces in a chronology; among them (a sketch of the resulting structure follows the list):

  • Parsing documents across formats and jurisdictions
  • Extracting factual assertions as discrete entities
  • Resolving references (“the Company,” “Defendant,” “XYZ Corp”) to specific actors
  • Mapping temporal relationships between events
  • Clustering related facts across the document set
  • Enriching with contextual metadata
  • Analysing against user-supplied case theories
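
As a hedged illustration of what facts as discrete entities with explicit relationships might look like, here is a minimal Python sketch. The types and field names are assumptions made for this post, not Wexler’s actual schema.

```python
# Illustrative sketch of a structured fact record. Field names and types are
# hypothetical, not Wexler's actual schema.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Entity:
    canonical_name: str                              # e.g. "XYZ Corp"
    aliases: set[str] = field(default_factory=set)   # "the Company", "Defendant"

@dataclass
class Fact:
    assertion: str            # the factual claim, extracted as a discrete unit
    actors: list[Entity]      # resolved participants, not raw strings
    date: str | None          # ISO date, where one can be established
    source_doc: str           # provenance: where the assertion came from
    related: list[Fact] = field(default_factory=list)  # explicit, stored links

xyz = Entity("XYZ Corp", aliases={"the Company", "Defendant"})
email = Fact("CFO emailed the board about the write-down",
             actors=[xyz], date="2021-03-04", source_doc="Exhibit 12")
minute = Fact("Board minute records discussion of the write-down",
              actors=[xyz], date="2021-04-02", source_doc="Exhibit 47")
email.related.append(minute)  # the link is part of the record, not inferred at query time
```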

Each step introduces potential error, which is precisely why most platforms don’t attempt it. But the alternative—layering increasingly baroque workarounds onto an architecture designed for a different problem—is how you build systems that are impressive in demos and unreliable in practice.

It’s a scalpel, not a Swiss Army knife. Purpose-built for the precision that disputes demand.

The second act

The legal AI market is entering its second act. The opening number (“look, it can summarise documents!”) has finished. The question now is whether the underlying architecture can handle the work that actually matters.

For legal research and contract review, RAG-based systems will continue to improve. The techniques are well-understood, the benchmarks are established, and the major players are iterating rapidly.

But litigation demands something else. The painstaking work of building a factual record, identifying what each witness said about each event, and finding the inconsistencies that matter requires a different approach: one that treats relationships as first-class objects, not artifacts to be inferred.

RAG got us to the theatre. The next generation needs to deliver the performance.
