ROBOTS // crawler access

AI-bot access checker

Paste your robots.txt and find out in seconds whether the AI crawlers — GPTBot, ClaudeBot, PerplexityBot, Google-Extended and more — can reach your content.

$ runs in your browser · nothing stored · no signup

// paste robots.txt

Checks live as you type. No upload, nothing leaves your browser.

// results

Parse-mode only — checks path "/" using the robots.txt rules you pasted. Does not fetch your live robots.txt.

Not sure what to allow? See our guide on AI SEO services or get a free AI SEO audit.

What this checks

robots.txt is a plain-text file at the root of your site (yoursite.com/robots.txt) that tells crawlers which paths they're allowed to fetch. For years it only mattered for search engines. Now it's the gatekeeper for the AI crawlers too, and the stakes are simple: if your robots.txt quietly blocks the AI bots, you disappear from AI answers. No error, no warning, no ranking drop. You just stop getting cited by ChatGPT and Perplexity, and most teams never notice. This tool reads the rules you paste and tells you, bot by bot, who can reach your content and who's locked out.
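For the curious, a bot-by-bot check like this can be sketched in a few lines of Python with the standard library's urllib.robotparser. This is a rough illustration, not this tool's actual code: the check_access helper and the sample rules are made up for the example, and Python's parser applies the first matching rule in a group, which can differ from the longest-match rule some crawlers use on more complex files.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens listed on this page; extend as needed.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "CCBot", "Bytespider", "Amazonbot",
    "Applebot-Extended",
]


def check_access(robots_txt, path="/"):
    """Return {bot: True/False} for whether each bot may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parses pasted text, no network call
    return {bot: parser.can_fetch(bot, path) for bot in AI_BOTS}


# Hypothetical pasted file: GPTBot is blocked, everyone else is allowed.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

report = check_access(sample)
for bot, allowed in report.items():
    print(f"{bot:<18} {'allowed' if allowed else 'BLOCKED'}")
```

With the sample rules, only GPTBot comes back blocked; every other bot falls through to the User-agent: * group.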

The bots that matter

Not all AI crawlers do the same job. The single most important distinction is training vs. search: some bots collect data to train models (you'll never see a referral from them), while others fetch your page live to answer a user's question right now and cite you. Allow the wrong one and you donate your content for free; block the wrong one and you vanish from the answers that send traffic. Here's the field guide:

| User-agent | Operator | What it feeds | Type |
| --- | --- | --- | --- |
| GPTBot | OpenAI | Training data for future GPT models | Training |
| ChatGPT-User | OpenAI | Live fetches when a user's ChatGPT prompt browses the web | Search / browse |
| OAI-SearchBot | OpenAI | ChatGPT search index; surfaces and links your pages in answers | Search |
| ClaudeBot | Anthropic | Crawls content for Claude (training and product use) | Training / search |
| anthropic-ai | Anthropic | Anthropic's AI crawler (legacy/secondary agent) | Training / search |
| PerplexityBot | Perplexity | Builds Perplexity's index; cites you in Perplexity answers | Search |
| Google-Extended | Google | Controls Gemini / Vertex AI training and grounding, not Search | Training |
| CCBot | Common Crawl | Open dataset that feeds many open-source and commercial models | Training |
| Bytespider | ByteDance | Crawls for ByteDance / TikTok AI products | Training |
| Amazonbot | Amazon | Crawls for Amazon products (Alexa answers, Amazon AI) | Search / AI |
| Applebot-Extended | Apple | Controls use of your content for Apple Intelligence training | Training |

Read that table once and the strategy gets obvious: blocking a training bot costs you nothing in traffic; blocking a search bot costs you citations and referrals. The bot worth understanding cold is Google-Extended — see below.
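To make that strategy concrete, here is a small Python sketch that groups blocked bots by type, so you can see at a glance whether a rule costs you anything. The BOT_TYPES mapping is a simplification of the table above ("mixed" is our own label for rows that list more than one type), and blocked_by_type is a hypothetical helper, not part of any library.

```python
from urllib.robotparser import RobotFileParser

# Simplified from the table above; "mixed" covers rows with more than one type.
BOT_TYPES = {
    "GPTBot": "training",
    "Google-Extended": "training",
    "CCBot": "training",
    "Bytespider": "training",
    "Applebot-Extended": "training",
    "ChatGPT-User": "search",
    "OAI-SearchBot": "search",
    "PerplexityBot": "search",
    "ClaudeBot": "mixed",
    "anthropic-ai": "mixed",
    "Amazonbot": "mixed",
}


def blocked_by_type(robots_txt, path="/"):
    """Group the bots that cannot fetch `path` by what they feed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = {"training": [], "search": [], "mixed": []}
    for bot, kind in BOT_TYPES.items():
        if not parser.can_fetch(bot, path):
            blocked[kind].append(bot)
    return blocked


# Hypothetical rules that block one training bot and one search bot.
sample = """\
User-agent: GPTBot
User-agent: PerplexityBot
Disallow: /
"""

blocked = blocked_by_type(sample)
if blocked["search"]:
    print("Costing you citations:", ", ".join(blocked["search"]))
if blocked["training"]:
    print("Kept out of training:", ", ".join(blocked["training"]))
```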

Google-Extended ≠ Google Search

This trips up almost everyone, so it gets its own section. Google-Extended only governs whether your content is used to train Google's generative AI (Gemini and Vertex AI) and to ground its answers. It has zero effect on Googlebot, on Google Search indexing, or on ranking. You can Disallow: / for Google-Extended and your pages will still be crawled, indexed, and ranked in ordinary Google Search exactly as before. Google-Extended is not even a separate crawler: it is a control token that Google reads from your robots.txt, while Google's existing crawlers keep doing the fetching. Anyone who tells you blocking Google-Extended will hurt your rankings is wrong.
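You can see the separation in how the rules are matched. The sketch below uses Python's standard robots.txt parser as a stand-in (an assumption: it is not Google's own matcher, though both match groups by user-agent token) to show that a Google-Extended group says nothing about Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Block only the Google-Extended token; no other group is present.
rules = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

extended_ok = parser.can_fetch("Google-Extended", "/")
googlebot_ok = parser.can_fetch("Googlebot", "/")

print("Google-Extended allowed:", extended_ok)   # False
print("Googlebot allowed:      ", googlebot_ok)  # True
```

The Google-Extended group simply never applies to Googlebot, so with no other rules in the file Googlebot falls back to the default: allowed.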

How to fix a blocked bot

robots.txt is just grouped User-agent / Allow / Disallow directives. Copy-paste one of these into your robots.txt and adjust.

Allow GPTBot (let OpenAI train its models on your content):

User-agent: GPTBot
Allow: /

Block AI training, but keep AI search visibility:

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Still allow the AI search/browse bots
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

Block every AI crawler (nuclear option):

User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Amazonbot
User-agent: Applebot-Extended
Disallow: /

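Note that a missing rule means allowed: robots.txt is opt-out, not opt-in. If a bot has no group of its own, it falls back to your User-agent: * group, and if that group allows crawling (or there is no such group), the bot can crawl. To exclude one bot without touching the rest, you have to name it explicitly. And once a bot has its own group, it follows that group alone and ignores the User-agent: * rules.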
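Here is a quick way to check the opt-out behaviour yourself, sketched with Python's standard urllib.robotparser. The rules and the /private/ path are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a general group plus one named bot.
rules = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# PerplexityBot has no group of its own, so it uses the * group.
perplexity_root = parser.can_fetch("PerplexityBot", "/")
perplexity_private = parser.can_fetch("PerplexityBot", "/private/report")

# GPTBot is named explicitly, so its own group applies.
gptbot_root = parser.can_fetch("GPTBot", "/")

# An empty robots.txt has no rules at all, so everything is allowed.
empty = RobotFileParser()
empty.parse([])
ccbot_on_empty = empty.can_fetch("CCBot", "/")

print(perplexity_root, perplexity_private, gptbot_root, ccbot_on_empty)
```

Only the named bot is locked out of the root path; the unnamed one is blocked only where the * group says so.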

Should you block AI crawlers? The trade-off

There's no universally right answer — only a trade-off you should make on purpose instead of by accident:

  • Block training, allow search. The pragmatic default for most marketing sites. You don't hand your content to model training, but you stay quotable in ChatGPT, Perplexity, and AI Overviews — where the referral traffic lives.
  • Allow everything. Maximum reach and citation surface. Sensible if your goal is brand visibility and you're relaxed about your content being in training sets.
  • Block everything. Right for paywalled, proprietary, or licensed content you don't want machines reading at all — but understand you're opting out of the AI-answer ecosystem entirely.

Our AEO-flavoured (answer engine optimisation) view: for most businesses, being cited in AI answers is the new front page. Block the training bots if you care about your IP, but think hard before you block the search bots, because that's where discovery is moving.
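If you want to turn one of those three stances into rules, a generator can be sketched like this. The policy names, the bot groupings (taken from this page's lists), and build_robots_txt are all our own assumptions for illustration; the output covers AI bots only, so merge it into your existing robots.txt rather than replacing the file.

```python
from urllib.robotparser import RobotFileParser

# Groupings follow this page; OTHER holds the bots listed as mixed-use.
TRAINING = ["GPTBot", "Google-Extended", "CCBot", "Applebot-Extended", "Bytespider"]
SEARCH = ["ChatGPT-User", "OAI-SearchBot", "PerplexityBot"]
OTHER = ["ClaudeBot", "anthropic-ai", "Amazonbot"]

# Hypothetical policy names mapped to the bots each one blocks.
POLICIES = {
    "allow_everything": [],
    "block_training": TRAINING,
    "block_everything": TRAINING + SEARCH + OTHER,
}


def build_robots_txt(policy):
    """Return robots.txt rules (AI bots only) for the chosen policy."""
    blocked = POLICIES[policy]
    if not blocked:
        return "User-agent: *\nAllow: /\n"
    # One group: several User-agent lines sharing a single Disallow.
    lines = [f"User-agent: {bot}" for bot in blocked]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


def can_fetch_root(robots_txt, bot):
    """Round-trip check: parse the generated rules and test the root path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, "/")


rules = build_robots_txt("block_training")
print(rules)
print("GPTBot:", can_fetch_root(rules, "GPTBot"))
print("PerplexityBot:", can_fetch_root(rules, "PerplexityBot"))
```

Parsing the generated text back is a cheap sanity check that the file does what you intended before you ship it.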

One limitation to know

This tool runs in parse mode. It evaluates the robots.txt text you paste against the root path ("/") — it does not fetch your live robots.txt from the server. So paste the real file (or the rules you're about to ship) to get an accurate read. Once your access is set, point the bots at your best content with an llms.txt file — the natural next step. Want a human to audit the whole picture? Grab a free AI SEO audit or see AI SEO services.
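If you would rather check the live file than paste it, the fetch step this tool skips could look roughly like the Python sketch below. The robots_url and fetch_robots_txt helpers and the User-Agent string are hypothetical, and the sketch leaves out the error handling a real checker needs (for instance, a 404 normally means "no rules, everything allowed").

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def robots_url(site):
    """Build the robots.txt URL for a site given as a domain or any page URL."""
    if "://" not in site:
        site = "https://" + site  # assume HTTPS when no scheme is given
    parts = urlsplit(site)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def fetch_robots_txt(site, timeout=10.0):
    """Download the live robots.txt as text (makes a network request)."""
    request = Request(
        robots_url(site),
        headers={"User-Agent": "robots-check-sketch/0.1"},  # made-up UA string
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# robots.txt always lives at the root, whatever page you start from.
print(robots_url("yoursite.com/blog/post"))
```

The text returned by fetch_robots_txt can then be pasted into the checker above, exactly as if you had copied it from the server by hand.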

// questions

FAQ

Is my site blocked from ChatGPT?
Paste your robots.txt above to find out. ChatGPT relies on three OpenAI user-agents: GPTBot (training), and ChatGPT-User plus OAI-SearchBot (live browsing and ChatGPT search). If your robots.txt has "User-agent: GPTBot" with "Disallow: /", your content is excluded from training; blocking ChatGPT-User or OAI-SearchBot is what keeps you out of live ChatGPT answers. Block all three and you are effectively invisible to ChatGPT.
What is GPTBot?
GPTBot is OpenAI's web crawler used to gather content for training future models. It is the one most people mean when they say "block ChatGPT," but it only governs training data — it does not control whether ChatGPT can browse to your page live. That is ChatGPT-User and OAI-SearchBot. You can allow GPTBot in robots.txt with a group: "User-agent: GPTBot" followed by "Allow: /".
Should I block AI crawlers?
It depends on what you are optimising for. Blocking AI training crawlers (GPTBot, Google-Extended, CCBot, Applebot-Extended, Bytespider) keeps your content out of model training sets — a reasonable stance for original IP or paywalled work. But blocking the AI search bots (ChatGPT-User, OAI-SearchBot, PerplexityBot) removes you from AI answers and the citations that send referral traffic. Many sites split the difference: block training, allow AI search, so they stay visible without donating their content to model training.
Does blocking Google-Extended hurt my Google ranking?
No. Google-Extended only controls whether your content is used to train Google's generative AI (Gemini and Vertex AI) and to ground the answers those products give. It is completely separate from Google Search indexing and ranking. You can disallow Google-Extended and your pages will still be crawled, indexed, and ranked in normal Google Search exactly as before. Google-Extended is a control token rather than a separate crawler, so Googlebot keeps crawling as usual.
robots.txt vs llms.txt: what is the difference?
robots.txt is a permission file: it tells crawlers which paths they may or may not fetch. llms.txt is a curation file: a Markdown file at your site root that points AI systems to your most important, clean content so they can find and quote the right pages. robots.txt controls access; llms.txt guides attention. They work together — there is no point publishing an llms.txt if your robots.txt blocks the bots from reading the pages it lists.
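That last point is easy to test. The Python sketch below pulls the Markdown links out of an llms.txt and flags any that a given bot is not allowed to fetch. It assumes the usual llms.txt layout of Markdown link lists and that every link points at your own site; the sample files and the blocked_llms_links helper are invented for the example.

```python
import re
from urllib.robotparser import RobotFileParser

# Matches Markdown links like [Title](https://yoursite.com/page).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def blocked_llms_links(llms_txt, robots_txt, bot):
    """Return the llms.txt links that `bot` may not fetch under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch only looks at the path part of each URL.
    return [url for url in LINK_RE.findall(llms_txt)
            if not parser.can_fetch(bot, url)]


# Hypothetical files for illustration.
llms = """\
# Example Site

## Docs
- [Pricing](https://yoursite.com/pricing): plans and prices
- [Guide](https://yoursite.com/guides/start): getting started
"""

robots = """\
User-agent: PerplexityBot
Disallow: /guides/
"""

conflicts = blocked_llms_links(llms, robots, "PerplexityBot")
print(conflicts)
```

Any URL in the result is a page you are recommending to AI systems while telling that bot to stay away from it.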

// your move

Want this done for you?

Founder-led AI SEO — brand signals, citations, real organic growth. We’ll tell you straight whether it fits.

// or send a message

Tell us about your site.

Drop your URL and we’ll give you an honest read — no pitch, no obligation. Prefer to talk live? Book a call →

// 30 min · intro, founder-to-founder

Book a call