ROBOTS // crawler access

AI-bot access checker

Paste your robots.txt and find out in seconds whether the AI crawlers — GPTBot, ClaudeBot, PerplexityBot, Google-Extended and more — can reach your content.


$ runs in your browser · nothing stored · no signup

● AI-Bot Access Checker — runs in your browser

// paste robots.txt

Checks live as you type. No upload, nothing leaves your browser.

// results

Parse-mode only — checks path "/" using the robots.txt rules you pasted. Does not fetch your live robots.txt.

Not sure what to allow? See our guide on AI SEO services or get a free AI SEO audit.

What this checks

robots.txt is a plain-text file at the root of your site (yoursite.com/robots.txt) that tells crawlers which paths they're allowed to fetch. For years it only mattered for search engines. Now it's the gatekeeper for the AI crawlers too, and the stakes are simple: if your robots.txt quietly blocks the AI search bots, you disappear from their answers. No error, no warning, no ranking drop. You just stop getting cited by answer engines such as ChatGPT and Perplexity, and most teams never notice. This tool reads the rules you paste and tells you, bot by bot, who can reach your content and who's locked out.
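To make the bot-by-bot check concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. It is not the code behind this tool (that runs in your browser and isn't shown here); the function name check_access and the sample rules are illustrative assumptions. Python's parser takes the first matching rule within a group and matches agent names by substring, so on complex files it can disagree with what a given crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the table on this page.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "CCBot", "Bytespider", "Amazonbot",
    "Applebot-Extended",
]


def check_access(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return {bot: True if the pasted rules let it fetch `path`}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text only, no network fetch
    return {bot: parser.can_fetch(bot, path) for bot in AI_BOTS}


# Hypothetical pasted file: GPTBot is blocked, everyone else is allowed.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for bot, allowed in check_access(sample).items():
    print(f"{bot:<18} {'allowed' if allowed else 'BLOCKED'}")
```

Like the tool, the sketch only evaluates the text you hand it against one path, "/" by default.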

The bots that matter

Not all AI crawlers do the same job. The single most important distinction is training vs. search: some bots collect data to train models (you'll never see a referral from them), while others fetch your page live to answer a user's question right now and cite you. Get the split wrong and you either donate your content for free or vanish from the answers that send traffic. Here's the field guide:

User-agent	Operator	What it feeds	Type
`GPTBot`	OpenAI	Training data for future GPT models	Training
`ChatGPT-User`	OpenAI	Live fetches when ChatGPT browses the web to answer a user's prompt	Search / browse
`OAI-SearchBot`	OpenAI	ChatGPT search index — surfaces & links your pages in answers	Search
`ClaudeBot`	Anthropic	Crawls content for Claude (training and product use)	Training / search
`anthropic-ai`	Anthropic	Anthropic's AI crawler (legacy/secondary agent)	Training / search
`PerplexityBot`	Perplexity	Builds Perplexity's index — cites you in Perplexity answers	Search
`Google-Extended`	Google	Controls use of your content for Gemini / Vertex AI training and grounding; no effect on Search	Training
`CCBot`	Common Crawl	Open dataset that feeds many open-source & commercial models	Training
`Bytespider`	ByteDance	Crawls for ByteDance / TikTok AI products	Training
`Amazonbot`	Amazon	Crawls for Amazon products (Alexa answers, Amazon AI)	Search / AI
`Applebot-Extended`	Apple	Controls use of your content for Apple Intelligence training	Training

Read that table once and the strategy gets obvious: blocking a training bot costs you nothing in traffic; blocking a search bot costs you citations and referrals. The bot worth understanding cold is Google-Extended — see below.
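One way to act on that split is to group blocked bots by the Type column. The sketch below is an illustration, not part of the tool: the "mixed" label is an assumption for rows the table lists under more than one type, and blocked_by_type is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Types copied from the table above; "mixed" is this sketch's own label
# for rows listed as "Training / search" or "Search / AI".
BOT_TYPES = {
    "GPTBot": "training",
    "ChatGPT-User": "search",
    "OAI-SearchBot": "search",
    "ClaudeBot": "mixed",
    "anthropic-ai": "mixed",
    "PerplexityBot": "search",
    "Google-Extended": "training",
    "CCBot": "training",
    "Bytespider": "training",
    "Amazonbot": "mixed",
    "Applebot-Extended": "training",
}


def blocked_by_type(robots_txt: str, path: str = "/") -> dict[str, list[str]]:
    """Group the bots that cannot fetch `path` by what they feed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = {"training": [], "search": [], "mixed": []}
    for bot, kind in BOT_TYPES.items():
        if not parser.can_fetch(bot, path):
            blocked[kind].append(bot)
    return blocked


# Hypothetical rules: one training bot and one search bot are blocked.
rules = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: PerplexityBot\nDisallow: /\n"
report = blocked_by_type(rules)
if report["search"]:
    print("Search bots blocked (costs citations):", ", ".join(report["search"]))
if report["training"]:
    print("Training bots blocked:", ", ".join(report["training"]))
```

Anything that lands in the "search" list is where the citation cost described above shows up.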

Google-Extended ≠ Google Search

This trips up almost everyone, so it gets its own section. Google-Extended is a robots.txt control token, not a separate crawler: it governs whether content Google has crawled may be used to train its generative AI models (Gemini and Vertex AI) and to ground their answers. It has zero effect on Googlebot, on Google Search indexing, or on ranking. You can Disallow: / for Google-Extended and your pages will still be crawled, indexed, and ranked in ordinary Google Search exactly as before. The two tokens control different things. Anyone who tells you blocking Google-Extended will hurt your rankings is wrong.
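At the robots.txt level, the separation is easy to see: a group addressed to Google-Extended simply does not match Googlebot. The sketch below uses Python's urllib.robotparser as a stand-in parser. It demonstrates group matching only and says nothing about how Google handles content internally.

```python
from urllib.robotparser import RobotFileParser

# A file that blocks only the Google-Extended token.
rules = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

extended_allowed = parser.can_fetch("Google-Extended", "/")
googlebot_allowed = parser.can_fetch("Googlebot", "/")

print("Google-Extended allowed:", extended_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

With no group naming Googlebot and no User-agent: * group in the file, nothing restricts Googlebot here.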

How to fix a blocked bot

robots.txt is just grouped User-agent / Allow / Disallow directives. Copy-paste one of these into your robots.txt and adjust.

Allow GPTBot (let OpenAI train future models on your content):

User-agent: GPTBot
Allow: /

Block AI training, but keep AI search visibility:

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Still allow the AI search/browse bots
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

Block every AI crawler (nuclear option):

User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Amazonbot
User-agent: Applebot-Extended
Disallow: /

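Note that a missing rule means allowed: robots.txt is opt-out, not opt-in. A bot that has no group of its own follows your User-agent: * group, so it can crawl whatever that group allows, and if the file has no matching group at all it can crawl everything. A bot that does have its own group follows only that group and ignores User-agent: *. To exclude one bot while leaving the rest alone, you have to name it explicitly.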
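Group matching is easier to trust once you've seen it run. This is a small sketch with Python's urllib.robotparser; the helper name allowed and the three sample files are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, bot: str, path: str = "/") -> bool:
    """True if `bot` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


# Case 1: empty file, no groups at all.
EMPTY = ""

# Case 2: only a catch-all group that blocks everything.
STAR_BLOCK = "User-agent: *\nDisallow: /\n"

# Case 3: catch-all blocks, but one bot has its own group.
NAMED_OVERRIDE = (
    "User-agent: *\nDisallow: /\n\n"
    "User-agent: PerplexityBot\nAllow: /\n"
)

print(allowed(EMPTY, "GPTBot"))                  # no rule at all
print(allowed(STAR_BLOCK, "GPTBot"))             # falls back to the * group
print(allowed(NAMED_OVERRIDE, "PerplexityBot"))  # own group wins over *
print(allowed(NAMED_OVERRIDE, "GPTBot"))         # still falls back to *
```

The third case is the one to remember: once a bot has its own group, the catch-all rules no longer apply to it.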

Should you block AI crawlers? The trade-off

There's no universally right answer — only a trade-off you should make on purpose instead of by accident:

Block training, allow search. The pragmatic default for most marketing sites. You don't hand your content to model training, but you stay quotable in ChatGPT, Perplexity, and AI Overviews — where the referral traffic lives.
Allow everything. Maximum reach and citation surface. Sensible if your goal is brand visibility and you're relaxed about your content being in training sets.
Block everything. Right for paywalled, proprietary, or licensed content you don't want machines reading at all — but understand you're opting out of the AI-answer ecosystem entirely.

Our AEO-flavoured view: for most businesses, being cited in AI answers is the new front page. Block the training bots if you care about your IP, but think hard before you block the search bots — that's where discovery is moving.
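If you'd rather generate the file than hand-edit it, the three policies can be sketched as a small function. The policy names, the build_rules helper, and the choice to leave the mixed-purpose bots unnamed in the split policy are all assumptions of this sketch. The bot lists follow the ones used on this page.

```python
TRAINING_BOTS = ["GPTBot", "Google-Extended", "CCBot", "Applebot-Extended", "Bytespider"]
SEARCH_BOTS = ["ChatGPT-User", "OAI-SearchBot", "PerplexityBot"]
# Rows the table lists under more than one type.
MIXED_BOTS = ["ClaudeBot", "anthropic-ai", "Amazonbot"]


def group(agents: list[str], directive: str) -> str:
    """One robots.txt group: several User-agent lines sharing one rule."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append(f"{directive}: /")
    return "\n".join(lines)


def build_rules(policy: str) -> str:
    """Return robots.txt text for one of the three policies described above."""
    everyone = TRAINING_BOTS + SEARCH_BOTS + MIXED_BOTS
    if policy == "block_training_allow_search":
        groups = [group(TRAINING_BOTS, "Disallow"), group(SEARCH_BOTS, "Allow")]
    elif policy == "allow_all":
        groups = [group(everyone, "Allow")]
    elif policy == "block_all":
        groups = [group(everyone, "Disallow")]
    else:
        raise ValueError(f"unknown policy: {policy}")
    return "\n\n".join(groups) + "\n"


print(build_rules("block_training_allow_search"))
```

Append the output to your existing robots.txt rather than replacing it, so rules for ordinary search crawlers stay intact.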

One limitation to know

This tool runs in parse mode. It evaluates the robots.txt text you paste against the root path ("/") — it does not fetch your live robots.txt from the server. So paste the real file (or the rules you're about to ship) to get an accurate read. Once your access is set, point the bots at your best content with an llms.txt file — the natural next step. Want a human to audit the whole picture? Grab a free AI SEO audit or see AI SEO services.
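If you want the live file rather than a pasted copy, fetching it yourself takes a few lines. This is a hedged sketch using only the Python standard library. The helper names and the User-Agent string are made up for the example, and error handling is left out (an HTTP error status raises an exception).

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def robots_url(site: str) -> str:
    """Build the robots.txt URL for a site, e.g. 'https://example.com'.

    The scheme (https://) must be included; robots.txt always lives at
    the root, so any path in `site` is discarded.
    """
    return urljoin(site.rstrip("/") + "/", "/robots.txt")


def fetch_robots(site: str, timeout: float = 10.0) -> str:
    """Download the live robots.txt and return its text."""
    request = Request(
        robots_url(site),
        headers={"User-Agent": "robots-check-sketch/0.1"},  # hypothetical UA
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(robots_url("https://example.com/blog/"))
```

Calling fetch_robots("https://yoursite.com") returns text you can paste into the checker above.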

// the AI-readiness workflow

Use the whole toolbox

Each tool does one job. Run them in order and you’ve covered access, identity, structure and preview — the whole AI-readiness loop, in your browser.

Check the bots can reach you

Confirm GPTBot, ClaudeBot, PerplexityBot & Google-Extended aren’t blocked in robots.txt.

// you are here

Tell them who you are

Generate an llms.txt — a clean, structured brief for the AI crawlers.


Mark up your answers

Add FAQPage, Article or Organization JSON-LD — the data layer AI engines trust.


Preview your snippet

See your Google snippet and a stylised AI answer before it goes live.


// questions

FAQ

Is my site blocked from ChatGPT?

Paste your robots.txt above to find out. ChatGPT relies on three OpenAI user-agents: GPTBot (training), and ChatGPT-User plus OAI-SearchBot (live browsing and ChatGPT search). If your robots.txt has "User-agent: GPTBot" with "Disallow: /", your content is excluded from training; blocking ChatGPT-User or OAI-SearchBot is what keeps you out of live ChatGPT answers. Block all three and you are effectively invisible to ChatGPT.

What is GPTBot?

GPTBot is OpenAI's web crawler used to gather content for training future models. It is the one most people mean when they say "block ChatGPT," but it only governs training data — it does not control whether ChatGPT can browse to your page live. That is ChatGPT-User and OAI-SearchBot. You can allow GPTBot in robots.txt with a group: "User-agent: GPTBot" followed by "Allow: /".

Should I block AI crawlers?

It depends on what you are optimising for. Blocking AI training crawlers (GPTBot, Google-Extended, CCBot, Applebot-Extended, Bytespider) keeps your content out of model training sets — a reasonable stance for original IP or paywalled work. But blocking the AI search bots (ChatGPT-User, OAI-SearchBot, PerplexityBot) removes you from AI answers and the citations that send referral traffic. Many sites split the difference: block training, allow AI search, so they stay visible without donating their content to model training.

Does blocking Google-Extended hurt my Google ranking?

No. Google-Extended only controls whether your content is used to train Google's generative AI (Gemini and Vertex AI) and to ground AI features. It is completely separate from Googlebot, Google Search indexing, and ranking. You can disallow Google-Extended and your pages will still be crawled, indexed, and ranked in normal Google Search exactly as before. Google-Extended is a control token in robots.txt rather than a separate crawler, and rules addressed to it do not apply to Googlebot.

robots.txt vs llms.txt: what is the difference?

robots.txt is a permission file: it tells crawlers which paths they may or may not fetch. llms.txt is a curation file: a Markdown file at your site root that points AI systems to your most important, clean content so they can find and quote the right pages. robots.txt controls access; llms.txt guides attention. They work together — there is no point publishing an llms.txt if your robots.txt blocks the bots from reading the pages it lists.
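That last point is checkable: pull the links out of your llms.txt and test each one against your robots.txt. The sketch below assumes llms.txt uses ordinary Markdown links. The helper name blocked_links, the sample files, and the example.com URLs are hypothetical.

```python
import re
from urllib.robotparser import RobotFileParser

# Matches the URL part of a Markdown link: [title](url)
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def blocked_links(llms_txt: str, robots_txt: str, bot: str) -> list[str]:
    """Return the llms.txt links that `bot` is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    urls = MARKDOWN_LINK.findall(llms_txt)
    return [url for url in urls if not parser.can_fetch(bot, url)]


# Hypothetical files for illustration.
LLMS = """\
# Example Co

## Docs
- [Guide](https://example.com/docs/guide)
- [Pricing](https://example.com/pricing)
"""

ROBOTS = """\
User-agent: PerplexityBot
Disallow: /docs/
"""

print(blocked_links(LLMS, ROBOTS, "PerplexityBot"))
```

Any URL the function returns is a page you are recommending to a bot that you have also locked out.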

// your move

Want this done for you?

Founder-led AI SEO — brand signals, citations, real organic growth. We’ll tell you straight whether it fits.

Book a call → Free AI SEO audit
