ROBOTS // crawler access
AI-bot access checker
Paste your robots.txt and find out in seconds whether the AI crawlers — GPTBot, ClaudeBot, PerplexityBot, Google-Extended and more — can reach your content.
$ runs in your browser · nothing stored · no signup
// paste robots.txt
Checks live as you type. No upload, nothing leaves your browser.
// results
Parse-mode only — checks path "/" using the robots.txt rules you pasted. Does not fetch your live robots.txt.
Not sure what to allow? See our guide on AI SEO services or get a free AI SEO audit.
What this checks
robots.txt is a plain-text file at the root of your site (yoursite.com/robots.txt) that tells
crawlers which paths they're allowed to fetch. For years it only mattered for search engines. Now it's the gatekeeper for the
AI crawlers too, and the stakes are high: if your robots.txt quietly blocks the AI bots, you disappear
from AI answers. No error, no warning, no ranking drop. You just stop getting cited by ChatGPT, Perplexity, and Google
AI Overviews, and most teams never notice. This tool reads the rules you paste and tells you, bot by bot, who can reach your
content and who's locked out.
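A bot-by-bot check of this kind can be sketched in Python with the standard library's urllib.robotparser. This is only an illustration, not the tool's own in-browser implementation: check_bots and AI_BOTS are hypothetical names, and Python's parser applies the first matching rule in a group rather than the longest match some crawlers use, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the table on this page.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "CCBot", "Bytespider", "Amazonbot",
    "Applebot-Extended",
]

def check_bots(robots_txt, bots=AI_BOTS, path="/"):
    """Return {bot: True/False} for whether each bot may fetch `path`.

    Hypothetical helper: parses pasted text only, never fetches a live file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}

# Example input: GPTBot is blocked, everyone else falls back to the * group.
pasted = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for bot, allowed in check_bots(pasted).items():
    print(f"{bot:18} {'allowed' if allowed else 'BLOCKED'}")
```

With this input only GPTBot is reported as blocked; every other bot in the list inherits the wildcard group.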
The bots that matter
Not all AI crawlers do the same job. The single most important distinction is training vs. search: some bots collect data to train models (you'll never see a referral from them), while others fetch your page live to answer a user's question right now and cite you. Get the rules wrong and you either give your content away for free or vanish from the answers that send traffic. Here's the field guide:
| User-agent | Operator | What it feeds | Type |
|---|---|---|---|
| GPTBot | OpenAI | Training data for future GPT models | Training |
| ChatGPT-User | OpenAI | Live fetches when a user's ChatGPT prompt browses the web | Search / browse |
| OAI-SearchBot | OpenAI | ChatGPT search index; surfaces and links your pages in answers | Search |
| ClaudeBot | Anthropic | Crawls content for Claude (training and product use) | Training / search |
| anthropic-ai | Anthropic | Anthropic's AI crawler (legacy/secondary agent) | Training / search |
| PerplexityBot | Perplexity | Builds Perplexity's index; cites you in Perplexity answers | Search |
| Google-Extended | Google | Controls Gemini / Vertex AI training only, not Search | Training |
| CCBot | Common Crawl | Open dataset that feeds many open-source & commercial models | Training |
| Bytespider | ByteDance | Crawls for ByteDance / TikTok AI products | Training |
| Amazonbot | Amazon | Crawls for Amazon products (Alexa answers, Amazon AI) | Search / AI |
| Applebot-Extended | Apple | Controls use of your content for Apple Intelligence training | Training |
Read that table once and the strategy gets obvious: blocking a training bot costs you nothing in traffic; blocking a search bot costs you citations and referrals. The bot worth understanding cold is Google-Extended — see below.
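The training-vs-search rule of thumb can be expressed as a small lookup. The sketch below copies the Type column from the table into a dictionary and flags which blocked bots may cost you citations; BOT_TYPES and citation_risk are hypothetical names used only for illustration.

```python
# Bot types copied from the table above.
BOT_TYPES = {
    "GPTBot": "Training",
    "ChatGPT-User": "Search / browse",
    "OAI-SearchBot": "Search",
    "ClaudeBot": "Training / search",
    "anthropic-ai": "Training / search",
    "PerplexityBot": "Search",
    "Google-Extended": "Training",
    "CCBot": "Training",
    "Bytespider": "Training",
    "Amazonbot": "Search / AI",
    "Applebot-Extended": "Training",
}

def citation_risk(blocked_bots):
    """Return the blocked bots whose type includes search.

    Blocking these may cost citations and referrals; bots not in the
    table are ignored rather than guessed at.
    """
    return [bot for bot in blocked_bots
            if "search" in BOT_TYPES.get(bot, "").lower()]

print(citation_risk(["GPTBot", "CCBot", "PerplexityBot"]))  # ['PerplexityBot']
```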
Google-Extended ≠ Google Search
This trips up almost everyone, so it gets its own section. Google-Extended only governs whether your content trains
Google's generative AI (Gemini and Vertex AI). It has zero effect on Googlebot, on Google Search indexing, or
on ranking. You can set Disallow: / for Google-Extended and your pages will still be crawled, indexed, and
ranked in ordinary Google Search exactly as before. Google-Extended is a robots.txt control token rather than a separate crawler, and it does a different job from Googlebot. Anyone who tells you
blocking Google-Extended will hurt your rankings is wrong.
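To see that the two tokens are matched independently, here is a minimal sketch using Python's standard urllib.robotparser. It only demonstrates how the rules parse by user-agent token; how Google itself applies Google-Extended is defined by Google's documentation, not by this parser.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks only the Google-Extended token.
rules = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Google-Extended", "/"))  # False
print(parser.can_fetch("Googlebot", "/"))        # True
```

The Google-Extended group does not apply to Googlebot, so ordinary crawling is unaffected by this rule.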
How to fix a blocked bot
robots.txt is just grouped User-agent / Allow / Disallow directives. Copy-paste one of
these into your robots.txt and adjust.
Allow GPTBot (let ChatGPT train on you):

User-agent: GPTBot
Allow: /

Block AI training, but keep AI search visibility:

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Still allow the AI search/browse bots
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

Block every AI crawler (nuclear option):

User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Amazonbot
User-agent: Applebot-Extended
Disallow: /
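Note that a missing rule means allowed: robots.txt is opt-out, not opt-in. If a bot has no group of its own, it follows your User-agent: * group,
so if that group allows crawling, that bot can crawl. To exclude one bot without blocking everyone else, you have to name it explicitly. A bot that does have its own group follows only that group and ignores the * rules.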
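The fallback behaviour described above can be checked with a short sketch, again assuming Python's standard urllib.robotparser as a stand-in for a real crawler. The helper name allowed and both sample files are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, bot, path="/"):
    """Hypothetical helper: may `bot` fetch `path` under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)

# GPTBot is named and blocked; every other bot falls back to the * group.
named_block = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""

# The * group blocks everyone, but PerplexityBot's own group overrides it.
wildcard_block = """\
User-agent: *
Disallow: /

User-agent: PerplexityBot
Allow: /
"""

print(allowed(named_block, "GPTBot"))                      # False
print(allowed(named_block, "ClaudeBot"))                   # True
print(allowed(named_block, "ClaudeBot", "/private/page"))  # False
print(allowed(wildcard_block, "PerplexityBot"))            # True
print(allowed(wildcard_block, "GPTBot"))                   # False
```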
Should you block AI crawlers? The trade-off
There's no universally right answer — only a trade-off you should make on purpose instead of by accident:
- Block training, allow search. The pragmatic default for most marketing sites. You don't hand your content to model training, but you stay quotable in ChatGPT, Perplexity, and AI Overviews — where the referral traffic lives.
- Allow everything. Maximum reach and citation surface. Sensible if your goal is brand visibility and you're relaxed about your content being in training sets.
- Block everything. Right for paywalled, proprietary, or licensed content you don't want machines reading at all — but understand you're opting out of the AI-answer ecosystem entirely.
Our AEO-flavoured view: for most businesses, being cited in AI answers is the new front page. Block the training bots if you care about your IP, but think hard before you block the search bots — that's where discovery is moving.
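The three strategies above can be turned into robots.txt text with a small generator. This is a sketch under stated assumptions: the bot lists cover only the seven agents used in the "block training, keep search" snippet on this page, and build_rules, group, and the strategy names are hypothetical. Extend the lists before relying on the output.

```python
# Assumption: these lists mirror the snippet on this page, not a complete registry.
TRAINING_BOTS = ["GPTBot", "Google-Extended", "CCBot", "Applebot-Extended"]
SEARCH_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]

def group(bots, rule):
    """One robots.txt group: several User-agent lines sharing a single rule."""
    return "\n".join([f"User-agent: {bot}" for bot in bots] + [rule])

def build_rules(strategy):
    """Return robots.txt text for one of three hypothetical strategy names."""
    if strategy == "block_training_allow_search":
        return (group(TRAINING_BOTS, "Disallow: /") + "\n\n"
                + group(SEARCH_BOTS, "Allow: /") + "\n")
    if strategy == "allow_everything":
        return group(TRAINING_BOTS + SEARCH_BOTS, "Allow: /") + "\n"
    if strategy == "block_everything":
        return group(TRAINING_BOTS + SEARCH_BOTS, "Disallow: /") + "\n"
    raise ValueError(f"unknown strategy: {strategy}")

print(build_rules("block_training_allow_search"))
```

Grouping several User-agent lines over one rule keeps the file short, the same layout the "nuclear option" snippet uses.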
One limitation to know
This tool runs in parse mode. It evaluates the robots.txt text you paste against the root path
("/") — it does not fetch your live robots.txt from the server. So paste the real
file (or the rules you're about to ship) to get an accurate read. Once your access is set, point the bots at your best content
with an llms.txt file — the natural next step. Want a human to audit the whole picture?
Grab a free AI SEO audit or see AI SEO services.
// the AI-readiness workflow
Use the whole toolbox
Each tool does one job. Run them in order and you’ve covered access, identity, structure and preview — the whole AI-readiness loop, in your browser.
01
Check the bots can reach you
Confirm GPTBot, ClaudeBot, PerplexityBot & Google-Extended aren’t blocked in robots.txt.
// you are here
02
Tell them who you are
Generate an llms.txt — a clean, structured brief for the AI crawlers.
03
Mark up your answers
Add FAQPage, Article or Organization JSON-LD — the data layer AI engines trust.
04
Preview your snippet
See your Google snippet and a stylised AI answer before it goes live.
// questions
FAQ
Is my site blocked from ChatGPT?
What is GPTBot?
Should I block AI crawlers?
Does blocking Google-Extended hurt my Google ranking?
robots.txt vs llms.txt: what is the difference?
// your move
Want this done for you?
Founder-led AI SEO — brand signals, citations, real organic growth. We’ll tell you straight whether it fits.