Can You See Which AI Bots Are Reading Your Site?
AI crawlers like GPTBot and ClaudeBot are reading your website right now. Here's how to see them in your analytics — and how to make your site easy for AI search to understand.
We wrote recently about how AI is changing where your traffic comes from. That post was about being cited in AI answers. This one is about the flip side, which is happening on your server right now: AI crawlers are visiting your site and reading your content, and most owners have no idea.
It's worth being able to see them — and worth making sure they understand what they're reading.
Who's actually crawling
A handful of named crawlers do most of the AI reading on the web:
- GPTBot — OpenAI, the company behind ChatGPT
- ClaudeBot — Anthropic, the company behind Claude
- PerplexityBot — the AI search engine Perplexity
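- Google-Extended — not a separate crawler but a robots.txt token. Google fetches pages with its usual crawlers, and this token controls whether that content can be used to train and ground its Gemini models, so it won't appear in your logs under its own name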
They're doing two different jobs. Some visits gather content to train future models. Others happen live, when someone asks a question and the AI fetches pages to build an answer — and, if you're lucky, cites you. That second kind is the new shop window.
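Each of these crawlers announces itself by name in the User-Agent header of its requests, which is how you can pick them out. Below is a minimal Python sketch, assuming a simple case-insensitive substring match against the three tokens named above; `AI_CRAWLERS` and `identify_ai_crawler` are hypothetical names for illustration, not part of any library. Google-Extended is left out on purpose, because it has no user agent of its own.

```python
from typing import Optional

# Hypothetical lookup table: User-Agent token -> company operating the crawler.
# Illustrative only; new crawlers appear regularly, so extend it as needed.
AI_CRAWLERS = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
}


def identify_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the AI crawler token found in a User-Agent header, or None."""
    ua = user_agent.lower()
    for token in AI_CRAWLERS:
        # Case-insensitive substring match: crawlers embed their token
        # (e.g. "GPTBot/1.1") inside a longer browser-style string.
        if token.lower() in ua:
            return token
    return None


if __name__ == "__main__":
    # Example header in the style OpenAI documents for GPTBot.
    sample = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
              "compatible; GPTBot/1.1; +https://openai.com/gptbot)")
    token = identify_ai_crawler(sample)
    print(token, AI_CRAWLERS.get(token))  # GPTBot OpenAI
    print(identify_ai_crawler("Mozilla/5.0 (Windows NT 10.0)"))  # None
```

User-agent strings are easy to fake, so treat a match as a strong hint rather than proof. OpenAI, for one, publishes the IP ranges GPTBot crawls from if you need to verify a visit.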
Why you'd want to know
Most analytics tools treat all bots as noise and filter them out, and script-based tools often never see them at all, since crawlers rarely run the tracking script. So this activity is invisible by default. That's a shame, because it tells you things worth knowing:
- Whether you're visible to AI at all. If the major crawlers never appear, you're not part of the conversation AI is having with your potential customers.
- What they're reading. The pages AI crawlers favour are a clue to what's working — and which content is worth deepening.
- An early read on a new channel. Crawler interest tends to precede referral traffic from AI tools. Seeing it grow is a leading indicator, not a lagging one.
On the sites we manage, our built-in analytics surfaces this directly — which AI crawlers visited, how often, and which are sending real people back to you — rather than hiding it in the bot filter.
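If your platform doesn't surface this, your web server's raw access log holds the same information. The sketch below assumes the "combined" log format that Apache and nginx commonly use, and counts visits and requested pages per AI crawler. The `summarise` function, the token list and the regular expression are illustrative assumptions, so adjust them to match your own log format.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Illustrative token list (an assumption); extend it as new crawlers appear.
AI_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Assumes the "combined" log format:
# IP ident user [time] "METHOD /path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def summarise(lines: Iterable[str]) -> Tuple[Counter, Dict[str, Counter]]:
    """Count visits per AI crawler, and the pages each crawler requested."""
    visits: Counter = Counter()
    pages: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that don't fit the assumed format
        ua = match.group("ua").lower()
        for token in AI_TOKENS:
            if token.lower() in ua:
                visits[token] += 1
                pages[token][match.group("path")] += 1
                break  # count each request once
    return visits, pages
```

To run it over a real log, open the file (the path is a placeholder) with `open("access.log", encoding="utf-8", errors="replace")` and pass the file object to `summarise`. Then `visits.most_common()` lists crawlers by visit count, and `pages["GPTBot"].most_common(10)` shows what GPTBot read most.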
Making your site legible to AI
Seeing the crawlers is half of it. The other half is helping them understand your site, and there's a simple, emerging standard for that: an llms.txt file. It's a short Markdown file at the root of your site (/llms.txt) that acts as a map written for AI, pointing to your most important content so models don't have to guess. It's early, which is exactly why putting one in place now is a low-cost head start. We generate and maintain one automatically on the sites we look after.
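The llms.txt proposal (llmstxt.org) describes a simple layout: an H1 with the site name, a blockquote summary, then H2 sections listing links as `- [title](url): note`. Here's a minimal Python sketch that renders one from a list of pages. `build_llms_txt` and the example pages are hypothetical; on a CMS you would feed it from your real navigation or entries instead.

```python
from typing import Dict, List, Tuple

# (title, url, short note): placeholder data, not real pages.
Page = Tuple[str, str, str]


def build_llms_txt(site_name: str, summary: str,
                   sections: Dict[str, List[Page]]) -> str:
    """Render an llms.txt body in the layout the llmstxt.org proposal
    describes: H1 title, blockquote summary, H2 sections of Markdown links."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in pages:
            entry = f"- [{title}]({url})"
            if note:  # the note after the colon is optional
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_llms_txt(
        "Example Co",
        "Example Co builds accessible websites for charities.",
        {
            "Services": [
                ("Web design", "https://example.com/services/design",
                 "What we offer and how projects run"),
            ],
            "Case studies": [
                ("All case studies", "https://example.com/work", ""),
            ],
        },
    ))
```

Serve the result as plain text at /llms.txt, and regenerate it whenever key pages change so the map doesn't drift out of date.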
Beyond that, the fundamentals from our earlier post still do the heavy lifting: clean metadata, genuine schema markup, a logical structure, and not hiding your best content behind a gate that crawlers can't read.
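Schema markup is usually added to a page as a JSON-LD block inside a script tag. A minimal Python sketch, assuming an organisation home page; `organisation_json_ld` and the example values are placeholders. "Genuine" here means the properties should match what the page actually says.

```python
import json
from typing import List


def organisation_json_ld(name: str, url: str, same_as: List[str]) -> str:
    """Return a schema.org Organization description wrapped in a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,  # other profiles that identify the same organisation
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


if __name__ == "__main__":
    print(organisation_json_ld(
        "Example Co",
        "https://example.com",
        ["https://www.linkedin.com/company/example"],
    ))
```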
A note on control
Visibility cuts both ways. If you'd rather not have your content used for AI training, the same crawlers can largely be turned away through your site's robots.txt rules. GPTBot respects them, for instance, and Google honours a Google-Extended rule without it affecting your inclusion or ranking in Google Search. That's a strategic call, not a technical one: most organisations want to be found by AI search, but some have good reasons to hold content back. Either way, it should be a decision you've made on purpose, not a default you never saw.
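The robots rules live in a robots.txt file at the root of your site. The example below is one possible policy, not a recommendation: it turns GPTBot and Google-Extended away while leaving every other crawler alone. The sketch then uses Python's standard `urllib.robotparser` to check which crawlers the file would admit; `crawler_access` is a hypothetical helper. Python's parser uses simple prefix matching, so treat it as a local sanity check rather than a guarantee of how each crawler reads the file.

```python
from typing import Dict, Iterable
from urllib.robotparser import RobotFileParser

# Example policy (an assumption, not a recommendation): opt out of
# AI training while leaving every other crawler free to visit.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, agents: Iterable[str],
                   url: str) -> Dict[str, bool]:
    """Report, per crawler token, whether robots_txt allows fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "Google-Extended", "ClaudeBot", "PerplexityBot"]
    result = crawler_access(ROBOTS_TXT, agents, "https://example.com/about")
    for agent, allowed in result.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Keep in mind that robots.txt is a request, not a lock: reputable crawlers follow it, but nothing forces them to.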
Where to start
If you can't currently answer "is AI reading my site, and does it understand it?", that's the gap to close. We start with an audit — crawler visibility, your llms.txt and robots setup, metadata and schema — and fold the monitoring into the service so it's something you can see from then on. Email us at hello@mutual.agency.
Andrew is Technical Director at Mutual, a Craft CMS Partner agency. He has been building with Craft CMS since its public beta in 2012 — working through every major version from Craft 1 to Craft 5 — and has delivered over 100 sites for clients including Apple, Transparency International, and Arts University Bournemouth.
He writes about Craft CMS on the Mutual blog and has contributed to net Magazine. At Mutual, he leads development of Mutual One, a marketing platform built on Craft CMS as its foundation.
He has spoken about Craft CMS to undergraduate students at the University of Brighton and Canterbury Christ Church University, and appeared on the Devmode.fm podcast. He has also trained development teams at other agencies in working with the platform.