How AI Search Actually Works

No jargon. No fluff. Just clear analogies that make LLMs, RAG, and AI search click for everyone.

Squiz Content Intelligence — AI Search Readiness and Accessibility Health dashboard
How visible is your website to AI search right now? Find out in minutes with a free audit report.
Check my visibility →

Setting the Scene:
LLMs in 2025

2025 was the year AI went mainstream. Here's the landscape everyone's building on.

0
Major LLMs released in 2025

And many more smaller releases, forks, and variants

7+ Major players competing
10M Largest context window (tokens)
671B Largest open-source model (params)

The traffic paradox

Website traffic is dropping — but the demand for clear, discoverable content has never been higher.

700M+
Weekly active users on ChatGPT
Source: OpenAI, 2025
72%
Of clients self-educate before ever connecting with you
Source: The Law Gazette, 2025
88%
Of global organisations report regular AI use in at least one business function
Source: McKinsey, 2025
2.5B+
Prompts used every day on ChatGPT alone
Source: OpenAI, 2025

The 2025 release timeline

Jan

MiniMax-Text-01 launches with 456B parameters

Feb

GPT-4.5 & early o-series models improve speed and reasoning

Gemini 2.0 goes multimodal across text, image, code, audio & video

Mar

Gemini 2.5 Pro arrives with 1M token context window

Apr

Llama 4 Scout ships with 10M token context, the largest ever

Qwen 3 becomes the most popular open-weight model

May

Claude 4 Opus & Sonnet debut with deep reasoning and agent workflows

Jul

Grok 4 launches, strong in maths, coding, and reasoning

Aug

GPT-5 arrives: unified multimodal system with 400K token context

DeepSeek V3.1 ships with 671B params and top open-source benchmark scores

Nov

Gemini 3 pushes further into advanced reasoning and tool use

The major players

OpenAI
GPT-5
400K context · Multimodal · Codex
Closed
Anthropic
Claude 4
Deep reasoning · Agent workflows
Closed
Google
Gemini 2.5 Pro
1M context · Full multimodal
Closed
Meta
Llama 4
10M context · Scout & Maverick
Open Source
DeepSeek
V3.1 Terminus
671B params · Top benchmarks
Open Source
Alibaba
Qwen 3
Most popular open-weight model
Open Source
xAI
Grok 4
Maths & coding specialist
Closed

The takeaway: Open-source models matched or beat proprietary systems on key benchmarks. The playing field levelled — and content structure became the differentiator, not budget.

How LLMs Work:
An Explainer

From training to generation — the complete picture in plain English.

Training
Retrieval
Generation

Phase 1 — Building the Brain

The Analogy

Imagine teaching someone to become a universal expert. They read the world's biggest library — every book, article, and forum. But instead of memorising word-for-word, they build a massive mind map of how ideas connect.

01

Crawl Everything

Billions of pages of text scraped from books, articles, forums, and code.

02

Break Into Tokens

Text gets chopped into units the model can process — words, parts of words, punctuation.

03

Map Relationships

A web of connections with thousands of dimensions is built. It doesn't store facts; it stores patterns between ideas.

04

Freeze the Model

Training ends and the brain is frozen. Yesterday's news? The LLM doesn't know about it.

Think of it like your Spotify algorithm — but for all human knowledge. It doesn't store the songs, it stores the patterns of what goes together.
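The four steps above can be sketched in a few lines. This is a toy illustration only: real LLMs use subword tokenisers and neural networks rather than simple counts, and the names `tokenise` and `map_relationships` are made up for this example.

```python
import re
from collections import Counter, defaultdict


def tokenise(text: str) -> list[str]:
    """Step 02: chop text into lowercase word and punctuation tokens (toy scheme)."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())


def map_relationships(corpus: list[str]) -> dict[str, Counter]:
    """Step 03: count which token tends to follow which.

    The result holds patterns between tokens, not the original sentences.
    """
    follows: dict[str, Counter] = defaultdict(Counter)
    for document in corpus:
        tokens = tokenise(document)
        for current, following in zip(tokens, tokens[1:]):
            follows[current][following] += 1
    # Step 04: returning a plain dict stands in for "freezing" the model.
    return dict(follows)


# Step 01: a tiny stand-in for "crawl everything".
corpus = [
    "Antitrust lawyers advise technology companies.",
    "Antitrust lawyers advise on cartels.",
]
model = map_relationships(corpus)

print(tokenise("Antitrust lawyers advise."))
print(model["lawyers"].most_common(1))
```

Anything published after `map_relationships` has run is simply not in `model`, which is the "frozen brain" problem in miniature.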

See it in action: a legal prediction

The Question

“Which Sydney firms handle antitrust for the technology industry?”

Tokenised & Predicted

[City] firms that handle antitrust for the technology industry often include [firm names]

The LLM breaks the question into tokens, predicts the most likely answer based on patterns it learned during training, then fills in the firm names — but only firms whose content made those patterns clear.
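A rough way to picture "only firms whose content made those patterns clear" is co-occurrence scoring. The sketch below is a stand-in for learned patterns, not how an LLM is actually implemented, and the firm names and training snippets are hypothetical.

```python
from collections import Counter

# Hypothetical training snippets: the terms that appeared on a page,
# and the firm that page was about.
training_pages = [
    ({"sydney", "antitrust", "technology"}, "Firm B"),
    ({"sydney", "antitrust", "technology"}, "Firm B"),
    ({"sydney", "antitrust"}, "Firm C"),
    ({"cross-jurisdictional", "tmt"}, "Firm A"),
]


def predict_firms(question_terms: set[str], top_n: int = 2) -> list[str]:
    """Rank firms by how often their pages co-occurred with the question's terms."""
    scores: Counter = Counter()
    for page_terms, firm in training_pages:
        scores[firm] += len(page_terms & question_terms)
    # Firms with no overlap at all never make it into the answer.
    return [firm for firm, score in scores.most_common(top_n) if score > 0]


question = {"sydney", "firms", "antitrust", "technology"}
print(predict_firms(question))
```

Firm A may do exactly the same work, but its jargon never overlaps with the words in the question, so it scores zero and is left out.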

Phase 2 — The Research Assistant

RAG: Retrieval-Augmented Generation

The expert gets an upgrade: a research assistant with a smartphone. Before answering, the assistant searches the web, pulls the top articles, and hands them over. The expert reads those passages, combines them with what they already know, and gives a synthesised answer with citations.

Old Approach

Pure LLM

"I think Australia won, but I'm not 100% sure."

Frozen knowledge, no sources
vs
RAG Approach

LLM + Search

"Australia won 4-0 — here are the match reports."

Live data, citations included
1
Retrieve
Search the web for current info
2
Read
Extract relevant passages
3
Generate
Synthesise answer with citations
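The three steps can be sketched as a minimal pipeline. Everything here is an assumption for illustration: the in-memory `PAGES` list stands in for a live web search, word overlap stands in for real ranking, and `generate` uses a template where a real system would hand the passages to an LLM.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


# Hypothetical "web": a real system would call a search API here.
PAGES = [
    Page("https://example.com/match-report",
         "Australia won 4-0 in the final. The match was played on Sunday."),
    Page("https://example.com/recipes",
         "Slow-cooked lamb takes four hours."),
]


def terms(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, pages: list[Page], k: int = 1) -> list[Page]:
    """Step 1: keep the k pages sharing the most words with the query."""
    wanted = terms(query)
    ranked = sorted(pages, key=lambda p: len(wanted & terms(p.text)), reverse=True)
    return [p for p in ranked[:k] if wanted & terms(p.text)]


def read(query: str, page: Page) -> str:
    """Step 2: extract the single sentence that best matches the query."""
    wanted = terms(query)
    sentences = [s.strip() for s in page.text.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(wanted & terms(s)))


def generate(query: str, pages: list[Page]) -> str:
    """Step 3: combine the extracted passages into an answer with citations."""
    sources = retrieve(query, pages)
    if not sources:
        return "I'm not sure."
    return " ".join(f"{read(query, p)}. [{p.url}]" for p in sources)


print(generate("Who won the final, Australia?", PAGES))
```

Note what happens when nothing relevant is retrieved: the pipeline falls back to "I'm not sure", which is the pure-LLM situation described above.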

The Critical Bit — How Content Gets Found

1

AI assistants typically query a search engine such as Google or Bing under the hood

2

It usually reads only the top handful of results (roughly 5–10), because speed matters

3

It tries to extract clear, structured information

4

If your content is a mess, it gets skipped

Person A

"Oh yeah mate, so you go down the hall, well actually it's more of a corridor, and there's a door on the left but that's the linen cupboard..."

AI skips this
vs
Person B

"Bathroom: second door on the left."

AI cites this
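One way to see why Person B gets cited is to measure how much of a passage actually answers the question. The `answer_density` heuristic and the stopword list below are assumptions made up for this sketch; production systems use far richer relevance signals.

```python
# A small, hand-picked stopword list (an assumption for this example).
STOPWORDS = {"the", "is", "a", "of", "on", "it's", "so", "but", "and", "where", "to"}


def content_words(text: str) -> list[str]:
    """Lowercase words, punctuation stripped, stopwords removed."""
    words = (word.strip('.,:?!"').lower() for word in text.split())
    return [word for word in words if word and word not in STOPWORDS]


def answer_density(question: str, passage: str) -> float:
    """Share of the passage's content words that match the question."""
    wanted = set(content_words(question))
    words = content_words(passage)
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in wanted)
    return hits / len(words)


question = "Where is the bathroom?"
person_a = ("Oh yeah mate, so you go down the hall, well actually it's more of "
            "a corridor, and there's a door on the left but that's the linen cupboard")
person_b = "Bathroom: second door on the left."

print(answer_density(question, person_a))
print(answer_density(question, person_b))
```

Person A never names the thing being asked about, so the score is zero. Person B leads with it, which is the "answer-first" pattern in its simplest form.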

What Breaks AI Readability

These common content patterns make your expertise invisible to AI.

Jargon & Acronyms

“Our TMT practice leverages cross-jurisdictional expertise” — AI can't map this to what clients actually search for.

PDFs & Images

Essential text locked in untagged PDFs, infographics, or images that AI tools often struggle to read or index.

Fragmented Structure

Bullet points without context. Key information scattered across dozens of pages instead of one clear source.

Assumed Knowledge

Content written for experts, not for the clients actually asking questions. No industry context or plain-language framing.

The fix: rewrite for AI discoverability

Before

Jargon-Heavy

“Our market-leading antitrust team advises on all aspects of corporate and M&A issues across all sectors.”

No clear audience, limited industry context, no explicit task framing
vs
After

AI-Ready

“We help companies in the technology sector navigate antitrust issues. This includes advising on cartels, antitrust litigation, regulatory compliance, and investigations.”

Specifies industry, uses plain language, gives concrete examples

How LLMs Actually Think

The Spotify Analogy

LLMs don't "understand" meaning — they predict patterns. Just like Spotify recommends Glass Animals because millions of Tame Impala listeners also liked them. Not because Spotify understands psychedelic rock.

Why this matters: If your content uses weird phrasing, the AI can't predict that "facilitates strategic commercial outcomes" means "we do M&A deals."

Your competitor with better structure is getting cited instead of you. See exactly where you stand with a free AI visibility report.
Get my AI visibility report →

What Makes Content
AI-Visible?

AI will only cite what it can reliably retrieve, parse, and trust. That comes down to two fundamentals.

Content

What you say, and how clearly you say it.

Accessibility

Whether the content is technically readable — by humans and machines.

A Tale of Two Firms

Both firms are the same size with the same practice areas, but they get very different outcomes in AI search.

Meet Jane Smith

General Counsel at a major tech company considering an acquisition. She's new to the market and doesn't know the key legal players. She opens ChatGPT to shortcut her research.

Firm A

Invisible to AI

  • Practice pages full of jargon and acronyms
  • Key expertise buried in PDFs and brochures
  • No plain-language descriptions of services
  • Content written for peers, not prospects
Result: Not surfaced by ChatGPT. Never made the shortlist. Didn't even know the RFP existed.
Firm B

Discovered by AI

  • Clear, crawlable pages per practice area
  • Plain-language explanations with named industries
  • Structured HTML with proper headings and schema
  • Answer-first content matching client questions
Result: Surfaced in ChatGPT's recommendation. Made the shortlist. Won the engagement.
1

Jane researches on ChatGPT

2

Firm B is surfaced; Firm A is missing

3

Shortlist created without Firm A

4

Firm B wins the RFP

This isn't hypothetical. In-house counsel are already asking AI for recommendations. Procurement teams are using AI to build firm shortlists. The firms that act now will be found first.

Are you being found, or is your competition? Compare your AI visibility against competitors in your sector.
Check against the competition →

Five Signals AI Looks For

Consistency

Same terminology and structure across every page. Contradictions force the model to hedge or skip you entirely.

Specificity

Named industries, jurisdictions, outcomes. Generic statements are hard to differentiate and rarely get cited.

Completeness

Who, what, where, when, why — all on one page. Thin pages lead to partial answers or competitor citations.

Consolidation

One authoritative page per topic. Scattered duplicates create conflicting passages the model can't resolve.

Freshness

Dated, updated, and current. Stale content loses trust — newer content becomes the dominant passage.
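Of the five signals, consolidation is the easiest to check mechanically. The sketch below flags near-duplicate pages using Jaccard similarity over three-word shingles. The URLs, page text, and the 0.5 threshold are all hypothetical choices for illustration.

```python
from itertools import combinations


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """All runs of `size` consecutive words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(first: str, second: str) -> float:
    """Overlap between two texts: shared shingles divided by all shingles."""
    a, b = shingles(first), shingles(second)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(pages: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str]]:
    """Pairs of URLs whose text overlaps at or above the threshold."""
    return [
        (url_a, url_b)
        for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2)
        if jaccard(text_a, text_b) >= threshold
    ]


pages = {
    "/antitrust": "we help technology companies navigate antitrust issues in australia",
    "/competition-law": "we help technology companies navigate antitrust issues across australia",
    "/employment": "our employment team advises on workplace disputes and contracts",
}

print(near_duplicates(pages))
```

Pairs flagged this way are candidates for merging into one authoritative page, with a redirect from the duplicate.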

Accessibility = AI Readability

If a screen reader can't parse it, neither can an LLM.

Semantic HTML

Proper headings, lists, and landmarks help machines interpret each section.

Alt Text

Describing images gives AI indexable context it can't get from pixels.

Reading Order

Content that flows in DOM order, not just visual order. Reduces mis-parsing.

Tagged Documents

Properly tagged PDFs and readable tables are more likely to be extracted and reused.
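Two of these checks, heading structure and alt text, can be sketched with Python's built-in `html.parser`. The `AccessibilityAudit` class and the sample markup are illustrative assumptions, and this is nowhere near a full WCAG audit.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class AccessibilityAudit(HTMLParser):
    """Collects the heading outline and images that have no alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[str] = []
        self.images_missing_alt: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in HEADING_TAGS:
            self.headings.append(tag)
        elif tag == "img" and "alt" not in attributes:
            # An empty alt="" is valid for decorative images, so only a
            # missing attribute is flagged here.
            self.images_missing_alt.append(attributes.get("src") or "(no src)")

    def skipped_levels(self) -> list[tuple[int, int]]:
        """Places where the outline jumps more than one level, e.g. h1 to h3."""
        levels = [int(tag[1]) for tag in self.headings]
        return [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]


sample_html = """
<h1>Antitrust for technology companies</h1>
<h3>Who we help</h3>
<img src="team.jpg">
<img src="chart.png" alt="Bar chart of merger reviews by year">
"""

audit = AccessibilityAudit()
audit.feed(sample_html)
print(audit.headings)
print(audit.skipped_levels())
print(audit.images_missing_alt)
```

The same two problems that trip up a screen reader here, a skipped heading level and an image with no description, also leave a machine reader with less structure and less context to work with.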

Get your free AI visibility report. See how AI search platforms read your content today, and what to fix first.
Get your AI visibility report →

Your 6-Step
Visibility Framework

A repeatable process that fits into normal marketing work.

What Doesn't Work

These are guesses, not strategies.

“All you need to do is write in plain English”

“Add more FAQs”

“Put keywords in your headings”

Start with measurement. You can't fix what you can't measure. Audit your content, prioritise by impact, then restructure for AI discoverability.

1

Pick Your Battlefield

Choose 3–5 priority practice areas. Agree one geography per page set. Start where revenue matters most.

2

Build a Question Map

Use a tool like Content Intelligence to automatically generate thousands of real questions your audiences are asking — across every topic, intent, and persona — then test your content against them at scale.

3

Fix the Content

Quarterly refresh of top copy. Add proof points to bios. Create answer-first insight templates. Redirect duplicates.

4

Set Governance

Canonical naming system. Marketing owns clarity, practice leaders own accuracy, digital owns accessibility.

5

Test at Scale

You can't validate by asking ChatGPT two questions. You need hundreds of tests across multiple models, repeatedly.

6

Use a Tool to Measure

Manual testing is not realistic. A tool lets you scan, identify issues, prioritise fixes, and prove lift.
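Steps 5 and 6 boil down to asking many questions of many models, many times, and counting how often you appear. The harness below is a sketch under loud assumptions: the two "models" are hard-coded stubs rather than real API calls, and `citation_rate` is a made-up name, not a feature of any particular tool.

```python
from typing import Callable

# A "model" here is any function that takes a question and returns answer text.
Model = Callable[[str], str]


def citation_rate(brand: str, questions: list[str],
                  models: dict[str, Model], runs: int = 3) -> dict[str, float]:
    """Fraction of answers per model that mention the brand.

    Each question is asked `runs` times because real models can give
    different answers to the same prompt.
    """
    if not questions or runs < 1:
        raise ValueError("need at least one question and one run")
    results: dict[str, float] = {}
    for name, ask in models.items():
        answers = [ask(question) for question in questions for _ in range(runs)]
        hits = sum(brand.lower() in answer.lower() for answer in answers)
        results[name] = hits / len(answers)
    return results


# Hypothetical stubs standing in for real model APIs.
def stub_model_one(question: str) -> str:
    if "antitrust" in question.lower():
        return "Firm B advises technology companies on antitrust."
    return "No clear recommendation."


def stub_model_two(question: str) -> str:
    return "Consider Firm C."


questions = [
    "Which Sydney firms handle antitrust for technology?",
    "Who advises on employment disputes in Sydney?",
]
rates = citation_rate("Firm B", questions,
                      {"model-one": stub_model_one, "model-two": stub_model_two})
print(rates)
```

Re-running the same question set after each round of content fixes gives a before-and-after number, which is what "prove lift" means in practice.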

The Content Strategy That Works

Four content pillars that make your expertise discoverable.

The Golden Brief

Plain-language explanations of each practice area, with dedicated, crawlable pages. No jargon, no ambiguity.

Face-Time

Rich partner and lawyer bios. AI engines love authoritative human profiles with named expertise and credentials.

Bread & Butter Content

FAQs and “what does this mean for me?” guides. Generative AI prefers answers over brochures.

Digital Footprint Hygiene

Clean URL structure, schema markup (especially for legal services), and updated sitemaps.
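For the schema markup pillar, here is a minimal sketch that builds JSON-LD using the schema.org `LegalService` type. The firm details are hypothetical, and the choice of properties (`areaServed`, `knowsAbout`) is one reasonable selection, not a required set.

```python
import json


def legal_service_jsonld(name: str, url: str, description: str,
                         area_served: str, knows_about: list[str]) -> str:
    """Build a JSON-LD string describing a law firm as a schema.org LegalService."""
    markup = {
        "@context": "https://schema.org",
        "@type": "LegalService",
        "name": name,
        "url": url,
        "description": description,
        "areaServed": area_served,
        "knowsAbout": knows_about,
    }
    return json.dumps(markup, indent=2)


# Hypothetical firm details for illustration.
markup = legal_service_jsonld(
    name="Firm B",
    url="https://example.com/antitrust",
    description="We help companies in the technology sector navigate antitrust issues.",
    area_served="Sydney",
    knows_about=["cartels", "antitrust litigation", "regulatory compliance"],
)
print(markup)
```

The output would normally sit inside a `<script type="application/ld+json">` tag on the matching practice page, and it should say the same thing as the visible copy.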

Squiz Content Intelligence

Make your content discoverable for AI Search. The diagnostic layer that tells you what to fix first.

AI Search Visibility

Audit content structure for AI readability and discoverability.

Accessibility Auditing

Scan against WCAG 2.2 AA/AAA standards automatically.

Prioritise What Matters

Data-driven recommendations ranked by impact.

Works With Your Stack

Compatible with all written web content. No migration required.

1

Load Your URLs

2

Get Insights

3

Take Action

Ready to see it in action on your own website? Start your free content audit: no migration, no commitment.
Start my free audit →

Squiz Content Intelligence

Watch an overview of how Content Intelligence helps you measure and improve AI discoverability.

Squiz Content Intelligence on YouTube

Free AI Visibility Report
& Competitor Review

See exactly how AI search platforms read your content today — and what to fix first.

Get started here with a content audit report


Scan to Explore

Learn more about Content Intelligence


Content Intelligence

See your AI readiness at a glance
