How AI Search Actually Works

Context

Setting the Scene:
LLMs in 2025

2025 was the year AI went mainstream. Here's the landscape everyone's building on.

Major LLMs released in 2025

And many more smaller releases, forks, and variants

7+ Major players competing

10M Largest context window (tokens)

671B Largest open-source model (params)

The traffic paradox

Website traffic is dropping — but the demand for clear, discoverable content has never been higher.

700M+

Weekly active users on ChatGPT

Source: OpenAI, 2025

72%

Of clients self-educate before ever connecting with you

Source: The Law Gazette, 2025

88%

Of global organisations report regular AI use in at least one business function

Source: McKinsey, 2025

2.5B+

Prompts sent to ChatGPT alone every day

Source: OpenAI, 2025

Where are people actually searching?

AI Tools

ChatGPT, Claude, Perplexity

Google AI Overviews

AI-generated summaries above results

Professional network research

Your Website

Still important — but shrinking

The 2025 release timeline

Jan

MiniMax

MiniMax-Text-01 launches with 456B parameters

Feb

OpenAI

GPT-4.5 & early o-series models improve speed and reasoning

Google

Gemini 2.0 goes multimodal across text, image, code, audio & video

Mar

Anthropic

Claude 4 Opus & Sonnet debut with deep reasoning and agent workflows

Google

Gemini 2.5 Pro arrives with 1M token context window

Apr

The major players

OpenAI

GPT-5

400K context · Multimodal · Codex

Closed

Anthropic

Claude 4

Deep reasoning · Agent workflows

Closed

Google

Gemini 2.5 Pro

1M context · Full multimodal

Closed

How LLMs Work:
An Explainer

From training to generation — the complete picture in plain English.

Training

Retrieval

Generation

Phase 1 — Building the Brain

The Analogy

Imagine teaching someone to become a universal expert. They read the world's biggest library — every book, article, and forum. But instead of memorising word-for-word, they build a massive mind map of how ideas connect.

01

Crawl Everything

Billions of pages of text scraped from books, articles, forums, and code.

02

Break Into Tokens

Text gets chopped into units the model can process — words, parts of words, punctuation.

03

Map Relationships

A high-dimensional web of connections is built. It stores patterns between ideas, not individual facts.

04

Freeze the Model

Training ends and the brain is frozen. Yesterday's news? The LLM doesn't know about it.

Think of it like your Spotify algorithm, but for all human knowledge. It doesn't store the songs; it stores the patterns of what goes together.
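To make the "Break Into Tokens" step concrete, here is a minimal sketch that splits a sentence into word and punctuation tokens. It is an illustration only: real models use learned subword tokenisers, and `simple_tokenise` is a hypothetical helper, not any vendor's method.

```python
import re

def simple_tokenise(text: str) -> list[str]:
    """Split text into lowercase word tokens and single punctuation marks.

    Assumption: a regex split stands in for the learned subword
    tokenisers that production LLMs actually use.
    """
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = simple_tokenise("Which Sydney firms handle antitrust?")
print(tokens)
```

Each token becomes a unit the model can count and relate to other tokens, which is what the next step builds on.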

See it in action: a legal prediction

The Question

“Which Sydney firms handle antitrust for the technology industry?”

→

Tokenised & Predicted

“[City] firms that handle antitrust for the technology industry often include [firm names]”

The LLM breaks the question into tokens, predicts the most likely answer based on patterns it learned during training, then fills in the firm names — but only firms whose content made those patterns clear.

Phase 2 — The Research Assistant

RAG: Retrieval Augmented Generation

The expert gets an upgrade: a research assistant with a smartphone. Before answering, the assistant searches the web, pulls the top articles, and hands them over. The expert reads those passages, combines them with what they already know, and gives a synthesised answer with citations.

Old Approach

Pure LLM

"I think Australia won, but I'm not 100% sure."

Frozen knowledge, no sources

vs

RAG Approach

LLM + Search

"Australia won 4-0 — here are the match reports."

Live data, citations included

1

Retrieve
Search the web for current info

→

2

Read
Extract relevant passages

→

3

Generate
Synthesise answer with citations
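The three steps above can be sketched in a few lines. This is a toy under stated assumptions: an in-memory list replaces web search, keyword overlap replaces real ranking, and a string template replaces the LLM. `Page`, `retrieve`, `generate`, and the example.com URLs are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    text: str

def retrieve(query: str, pages: list[Page], top_k: int = 2) -> list[Page]:
    """Step 1 and 2: rank pages by shared words and keep the best few."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(p.text.lower().split())), p) for p in pages]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]

def generate(passages: list[Page]) -> str:
    """Step 3: stitch passages into an answer with numbered citations.

    Assumption: a real system would hand the passages to an LLM here.
    """
    if not passages:
        return "No sources found."
    body = " ".join(f"{p.text} [{i}]" for i, p in enumerate(passages, 1))
    sources = ", ".join(f"[{i}] {p.url}" for i, p in enumerate(passages, 1))
    return f"{body} Sources: {sources}"

pages = [
    Page("https://example.com/match-report", "Australia won the match 4-0."),
    Page("https://example.com/weather", "The weather in Sydney was sunny."),
    Page("https://example.com/tickets", "Ticket prices rise next season."),
]
top = retrieve("who won the match", pages, top_k=1)
print(generate(top))
```

Note that only the page whose wording overlaps the question gets retrieved, so only that page can be cited.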

The Critical Bit — How Content Gets Found

1

The AI tool queries a search engine such as Google or Bing through its API under the hood

2

It typically reads only the top results (roughly 5–10), because speed matters

3

It tries to extract clear, structured information

4

If your content is a mess, it gets skipped

Person A

"Oh yeah mate, so you go down the hall, well actually it's more of a corridor, and there's a door on the left but that's the linen cupboard..."

AI skips this

vs

Person B

"Bathroom: second door on the left."

AI cites this
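One way to picture the Person A versus Person B difference is to measure how much of a passage is actually answer. The scoring below is a made-up heuristic for illustration; how real AI tools rank passages is not public, and `answer_density` is a hypothetical name.

```python
import re

def answer_density(passage: str, query_terms: set[str]) -> float:
    """Share of the passage's words that are distinct query terms.

    Assumption: a crude proxy for "clear and extractable", not a
    description of any real ranking system.
    """
    words = re.findall(r"[a-z']+", passage.lower())
    if not words:
        return 0.0
    matched = query_terms & set(words)
    return len(matched) / len(words)

person_a = (
    "Oh yeah mate, so you go down the hall, well actually it's more of a "
    "corridor, and there's a door on the left but that's the linen cupboard"
)
person_b = "Bathroom: second door on the left."
terms = {"bathroom", "door", "left"}

print(answer_density(person_a, terms))
print(answer_density(person_b, terms))
```

Person B scores far higher because nearly every word carries part of the answer, while Person A buries two relevant words in a long ramble.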

What Breaks AI Readability

These common content patterns make your expertise invisible to AI.

Jargon & Acronyms

“Our TMT practice leverages cross-jurisdictional expertise” — AI can't map this to what clients actually search for.

PDFs & Images

Essential text locked in untagged PDFs, infographics, or images that AI tools often can't read or index reliably.

Fragmented Structure

Bullet points without context. Key information scattered across dozens of pages instead of one clear source.

Assumed Knowledge

Content written for experts, not for the clients actually asking questions. No industry context or plain-language framing.

The fix: rewrite for AI discoverability

Before

Jargon-Heavy

“Our market-leading antitrust team advises on all aspects of corporate and M&A issues across all sectors.”

No clear audience, limited industry context, no explicit task framing

vs

After

AI-Ready

“We help companies in the technology sector navigate antitrust issues. This includes advising on cartels, antitrust litigation, regulatory compliance, and investigations.”

Specifies industry, uses plain language, gives concrete examples

How LLMs Actually Think

The Spotify Analogy

LLMs don't "understand" meaning — they predict patterns. Just like Spotify recommends Glass Animals because millions of Tame Impala listeners also liked them. Not because Spotify understands psychedelic rock.

Why this matters: If your content uses weird phrasing, the AI can't predict that "facilitates strategic commercial outcomes" means "we do M&A deals."

Content + Accessibility

What Makes Content
AI-Visible?

AI will only cite what it can reliably retrieve, parse, and trust. That comes down to two fundamentals.

Content

What you say, and how clearly you say it.

Accessibility

Whether the content is technically readable — by humans and machines.

Content determines whether you're worth citing.
Accessibility determines whether you're possible to cite.

A Tale of Two Firms

Both firms are the same size with the same practice areas, but they see very different outcomes in AI search.

Meet Jane Smith

General Counsel at a major tech company considering an acquisition. She's new to the market and doesn't know the key legal players. She opens ChatGPT to shortcut her research.

Firm A

Invisible to AI

Practice pages full of jargon and acronyms
Key expertise buried in PDFs and brochures
No plain-language descriptions of services
Content written for peers, not prospects

Result: Not surfaced by ChatGPT. Never made the shortlist. Didn't even know the RFP existed.

Firm B

Discovered by AI

Clear, crawlable pages per practice area
Plain-language explanations with named industries
Structured HTML with proper headings and schema
Answer-first content matching client questions

Result: Surfaced in ChatGPT's recommendation. Made the shortlist. Won the engagement.

1

Jane researches on ChatGPT

→

2

Firm B is surfaced; Firm A is missing

→

3

Shortlist created without Firm A

→

4

Firm B wins the RFP

This isn't hypothetical. In-house counsel are already asking AI for recommendations. Procurement teams are using AI to build firm shortlists. The firms that act now will be found first.

Five Signals AI Looks For

Consistency

Same terminology and structure across every page. Contradictions force the model to hedge or skip you entirely.

Specificity

Named industries, jurisdictions, outcomes. Generic statements are hard to differentiate and rarely get cited.

Completeness

Who, what, where, when, why — all on one page. Thin pages lead to partial answers or competitor citations.

Consolidation

One authoritative page per topic. Scattered duplicates create conflicting passages the model can't resolve.

Freshness

Dated, updated, and current. Stale content loses trust — newer content becomes the dominant passage.
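The Consolidation signal lends itself to a quick check: find pages that say nearly the same thing. The sketch below uses Jaccard word overlap as a rough stand-in for duplicate detection; the page paths, the sample copy, and the 0.6 threshold are assumptions chosen for illustration.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, from 0.0 (none) to 1.0 (identical)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)

def near_duplicates(pages: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str]]:
    """Return pairs of page paths whose text overlaps at or above the threshold."""
    return [
        (path_1, path_2)
        for (path_1, text_1), (path_2, text_2) in combinations(pages.items(), 2)
        if jaccard(text_1, text_2) >= threshold
    ]

# Hypothetical site content.
pages = {
    "/antitrust": "we advise technology companies on antitrust investigations",
    "/competition": "we advise technology companies on competition investigations",
    "/employment": "we help employers with workplace disputes",
}
print(near_duplicates(pages))
```

Pairs flagged this way are candidates for merging into one authoritative page, with the duplicate redirected.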

Accessibility = AI Readability

If a screen reader can't parse it, neither can an LLM.

Semantic HTML

Proper headings, lists, and landmarks help machines interpret each section.

Alt Text

Describing images gives AI indexable context it can't get from pixels.

Reading Order

Content that flows in DOM order, not just visual order. Reduces mis-parsing.

Tagged Documents

Properly tagged PDFs and readable tables are more likely to be extracted and reused.
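Two of the checks above, alt text and heading structure, can be sketched with Python's standard HTML parser. This is a minimal illustration, not a WCAG audit: `AccessibilityAudit` is a hypothetical class, and it strictly flags every empty `alt`, even though purely decorative images may legitimately use one.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class AccessibilityAudit(HTMLParser):
    """Collect images without alt text and headings that skip a level."""

    def __init__(self) -> None:
        super().__init__()
        self.issues: list[str] = []
        self._last_heading = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Strict check: empty or missing alt is reported.
        if tag == "img" and not (attributes.get("alt") or "").strip():
            self.issues.append(f"img missing alt text: {attributes.get('src', '?')}")
        if tag in HEADINGS:
            level = int(tag[1])
            if self._last_heading and level > self._last_heading + 1:
                self.issues.append(
                    f"heading jumps from h{self._last_heading} to h{level}"
                )
            self._last_heading = level

html = (
    "<h1>Antitrust</h1><h3>Technology sector</h3>"
    '<img src="team.png"><img src="logo.png" alt="Firm logo">'
)
audit = AccessibilityAudit()
audit.feed(html)
print(audit.issues)
```

The same problems that trip up a screen reader show up here as a short, fixable list.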

Building a Strategy

Your 6-Step
Visibility Framework

A repeatable process that fits into normal marketing work.

What Doesn't Work

These are guesses, not strategies.

“All you need to do is write in plain English”

“Add more FAQs”

“Put keywords in your headings”

Start with measurement. You can't fix what you can't measure. Audit your content, prioritise by impact, then restructure for AI discoverability.

1

Pick Your Battlefield

Choose 3–5 priority practice areas. Agree one geography per page set. Start where revenue matters most.

2

Build a Question Map

Use a tool like Content Intelligence to automatically generate thousands of real questions your audiences are asking — across every topic, intent, and persona — then test your content against them at scale.

3

Fix the Content

Quarterly refresh of top copy. Add proof points to bios. Create answer-first insight templates. Redirect duplicates.

4

Set Governance

Canonical naming system. Marketing owns clarity, practice leaders own accuracy, digital owns accessibility.

5

Test at Scale

You can't validate by asking ChatGPT two questions. You need hundreds of tests across multiple models, repeatedly.

6

Use a Tool to Measure

Manual testing is not realistic at this scale. A tool lets you scan, identify issues, prioritise fixes, and prove lift.
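To show what "test at scale" means in practice, here is a sketch of a harness that asks many questions across several models, repeatedly, and reports how often a firm is mentioned. Everything here is assumed for illustration: `mention_rate` is hypothetical, and `fake_ask` is a canned stub standing in for real model calls.

```python
from collections import defaultdict
from typing import Callable

def mention_rate(
    ask: Callable[[str, str], str],
    models: list[str],
    questions: list[str],
    firm: str,
    runs: int = 3,
) -> dict[str, float]:
    """Fraction of answers per model that mention `firm`."""
    total = len(questions) * runs
    hits: dict[str, int] = defaultdict(int)
    for model in models:
        for question in questions:
            for _ in range(runs):  # repeat, because answers vary between runs
                if firm.lower() in ask(model, question).lower():
                    hits[model] += 1
    return {m: (hits[m] / total if total else 0.0) for m in models}

def fake_ask(model: str, question: str) -> str:
    """Stub: a real harness would call each vendor's API here."""
    if model == "model-a":
        return "Firms to consider include Firm B."
    return "I could not find a clear recommendation."

questions = [
    "Which Sydney firms handle antitrust for the technology industry?",
    "Who advises tech companies on merger clearance?",
]
print(mention_rate(fake_ask, ["model-a", "model-b"], questions, "Firm B", runs=2))
```

Tracking this rate before and after a content change is one way to show lift rather than guess at it.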

The Content Strategy That Works

Four content pillars that make your expertise discoverable.

The Golden Brief

Plain-language explanations of each practice area, with dedicated, crawlable pages. No jargon, no ambiguity.

Face-Time

Rich partner and lawyer bios. AI engines love authoritative human profiles with named expertise and credentials.

Bread & Butter Content

FAQs and “what does this mean for me?” guides. Generative AI prefers answers over brochures.

Digital Footprint Hygiene

Clean URL structure, schema markup (especially for legal services), and updated sitemaps.
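As a rough example of schema markup for legal services, the sketch below builds a JSON-LD description using the schema.org `LegalService` type. The firm name, URL, and field values are placeholders, and `legal_service_jsonld` is a hypothetical helper; which properties matter most for AI tools is an open question.

```python
import json

def legal_service_jsonld(
    name: str, url: str, area_served: str, topics: list[str]
) -> dict:
    """Build a schema.org LegalService description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "LegalService",
        "name": name,
        "url": url,
        "areaServed": area_served,
        "knowsAbout": topics,
    }

# Placeholder values for illustration only.
markup = legal_service_jsonld(
    name="Firm B",
    url="https://example.com",
    area_served="Sydney",
    topics=["Antitrust litigation", "Regulatory compliance"],
)
print(json.dumps(markup, indent=2))
```

The printed JSON would typically sit inside a `<script type="application/ld+json">` tag on the relevant practice page.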

The Tool

Squiz Content Intelligence

Make your content discoverable for AI Search. The diagnostic layer that tells you what to fix first.

AI Search Visibility

Audit content structure for AI readability and discoverability.

Accessibility Auditing

Scan against WCAG 2.2 AA/AAA standards automatically.

Prioritise What Matters

Data-driven recommendations ranked by impact.

Works With Your Stack

Compatible with all written web content. No migration required.

1

Load Your URLs

→

2

Get Insights

→

3

Take Action

Product

Squiz Content Intelligence

Watch an overview of how Content Intelligence helps you measure and improve AI discoverability.

Squiz Content Intelligence on YouTube

Free Report

Free AI Visibility Report
& Competitor Review

See exactly how AI search platforms read your content today — and what to fix first.

Get started here with a content audit report

Learn more about Content Intelligence

Content Intelligence

See your AI readiness at a glance
