Crawl budget is the number of URLs a search engine will fetch from your site in a given window — a product of how fast your server lets bots crawl and how much the engine actually wants to. For most small sites it’s a non-issue. On large, templated, or programmatic sites it’s the difference between your money pages getting recrawled weekly and rotting in “Discovered – currently not indexed” for months.
Crawl Budget
Crawl budget is the total number of URLs a search engine crawler will request from a site in a given period, set by the crawl rate limit (what the server can take) and crawl demand (what the engine wants), and shaped by site size, page quality, internal linking, and crawl directives.
What Crawl Budget Actually Is
Google describes crawl budget as the combination of two forces. Neither one is a dial you set directly; both are inferred from your site’s behavior.
- Crawl rate limit (crawl capacity). The maximum fetch rate Googlebot will use without hammering your server. Fast, error-free responses raise it; slow responses, timeouts, and 5xx errors drop it fast. Google would rather under-crawl than take your site down.
- Crawl demand. How badly the engine wants your URLs. Driven by popularity (links, traffic), perceived quality, and staleness. A URL nobody links to and nobody updates earns very little demand, so it gets crawled rarely — sometimes once and never again.
Crawl budget is not a ranking factor. You don’t rank higher because Google crawls more. You rank because pages get discovered, indexed, and refreshed — and crawl budget is the throughput limit on all three.
The trap is treating crawl budget as a thing you “have.” It’s an emergent property. You don’t top it up; you stop wasting it and you earn more of it by being fast and worth crawling. If you want the mechanics under the hood, see how web crawlers work and our breakdown of Googlebot.
Who Actually Needs to Care
Be honest about whether this applies to you before you spend a week on it.
| Site profile | Crawl budget a priority? | Why |
|---|---|---|
| Brochure / local site (< 1,000 URLs) | No | Google crawls it in full, easily |
| Mid-size blog / SaaS (1k–10k URLs) | Sometimes | Only if speed or duplication is bad |
| Large e-commerce / faceted catalog | Yes | Parameter URLs explode the URL count |
| Programmatic / pSEO (10k–1M+ URLs) | Critical | Discovery cadence decides what indexes |
| News / frequently updated | Yes | Freshness depends on recrawl speed |
Google’s own guidance: if your site has fewer than a few thousand URLs, it’ll generally be crawled efficiently without intervention. The crawl-budget conversation is for large, fast-changing, or heavily templated sites — exactly the kind of programmatic SEO work where one bad URL pattern multiplies into a million crawlable dead ends.
What Crawl Budget Is Spent On
Bots spend your budget on every URL they can reach, not just the ones you care about. Every wasted fetch is a money page that didn’t get recrawled. The usual culprits:
- Duplicate URLs — parameter variants (?sort=, ?ref=), session IDs, trailing-slash and case inconsistencies, HTTP/HTTPS and www splits.
- Faceted navigation — e-commerce filter combinations that explode into combinatorial URL counts.
- Thin pages — tag archives, empty categories, autogenerated thin content.
- Redirect chains — every hop in a redirect chain is a separate fetch.
- Soft 404s and error pages, plus infinite spaces such as calendar pages and internal search results.
When thousands of URLs sit stuck in Discovered – currently not indexed, crawl-budget waste is usually the suspect: Google found them, decided they weren’t worth the crawl, and parked them.
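To put a number on the faceted-navigation bullet above, here is a minimal sketch. It assumes a hypothetical category page where each filter is either unset or set to exactly one value; the facet names and value counts are invented for illustration.

```python
from math import prod

# Hypothetical facets for one category page: name -> number of selectable values.
FACETS = {"color": 12, "size": 8, "brand": 40, "price_band": 6, "sort": 4}


def facet_url_count(facets: dict[str, int]) -> int:
    """Count distinct filter URLs when each facet is unset or set to one value."""
    # Each facet contributes (values + 1) choices; the +1 is "filter not applied".
    return prod(n + 1 for n in facets.values())


if __name__ == "__main__":
    total = facet_url_count(FACETS)
    print(f"{total:,} crawlable URL variants for a single category")
```

With these made-up numbers, five modest filters turn one category into 167,895 reachable URLs, and only one of them is the clean category page. Multi-select filters would push the count higher still.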
How to Optimize Your Crawl Budget
This is the part everyone wants. The order matters — fix discovery and waste before you touch crawl-rate throttling.
1. Cut the waste first
Most “crawl budget problems” are duplication problems wearing a costume.
- Consolidate duplicates with rel="canonical" and pick one URL format (one casing, one trailing-slash rule, one protocol/host).
- Block genuinely useless paths (internal APIs, tracking endpoints, infinite filter spaces) in robots.txt. Remember: robots.txt blocks crawling, not indexing. To keep a page out of the index, let it be crawled and serve noindex instead. See the meta robots settings and the X-Robots-Tag header for non-HTML files.
- Prune or consolidate thin pages; 301 the dead ones into stronger equivalents.
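Picking one URL format is easier to enforce when the rule lives in code. The sketch below is one possible normalizer, not a standard: the host, the trailing-slash rule, and the list of ignorable parameters are all assumptions you would replace with your own site's conventions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed conventions (hypothetical): HTTPS, www host, lowercase paths, trailing slash.
CANONICAL_HOST = "www.example.com"
# Parameters assumed never to change page content; adjust for your own site.
IGNORED_PARAMS = {"ref", "sort", "sessionid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Map any variant of a URL to the single form you want crawled."""
    parts = urlsplit(url)
    path = parts.path.lower() or "/"
    if not path.endswith("/"):
        path += "/"  # naive: a real site would exempt file paths such as .pdf
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # stable parameter order, so ?a=1&b=2 and ?b=2&a=1 collapse
    return urlunsplit(("https", CANONICAL_HOST, path, urlencode(kept), ""))


if __name__ == "__main__":
    print(canonical_url("http://example.com/Shoes?sort=price&ref=home"))
```

The same function can feed the rel="canonical" tag, internal link generation, and the sitemap, so all three agree on which URL is the real one.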
2. Control faceted navigation and parameters
On catalogs this is usually the whole game. Disallow low-value filter combinations, canonicalize the rest to the clean category URL, and avoid linking to parameter URLs in your internal navigation in the first place. Don’t let crawlers find a URL you never wanted indexed.
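Before shipping a Disallow rule, it helps to test it against real URL samples. The sketch below uses Python's standard-library robots.txt parser; the rules and URLs are placeholders, not recommendations for any real site. One caveat: the standard-library parser does plain prefix matching, so wildcard patterns that Googlebot understands may not be evaluated the same way here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the paths are placeholders for your own low-value spaces.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /api/
Disallow: /filter/
"""


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given rules would stop this user agent from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    sample = [
        "https://www.example.com/search?q=boots",
        "https://www.example.com/shoes/",
        "https://www.example.com/filter/color-red/",
    ]
    for url in blocked_urls(ROBOTS_TXT, sample):
        print("blocked:", url)
```

Run it against a list of your money pages as well. If any of them show up as blocked, the rule is too broad.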
3. Tighten architecture and internal linking
Crawlers follow links. Flat, well-linked sites get crawled efficiently; deep, orphaned ones don’t.
- Keep important pages within 2–3 clicks of the homepage.
- Use contextual internal links and a clean silo structure to funnel crawl equity toward priority URLs. Our site structure guide goes deeper.
- Hunt down and link orphaned content — a page with zero internal links is nearly invisible to crawlers.
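Click depth and orphans can both be checked with a breadth-first walk from the homepage. The sketch below assumes you already have an internal link graph (from a crawler export, for example); the graph shown is invented, and "orphan" here means any known page that cannot be reached from the homepage.

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
LINKS = {
    "/": ["/shoes/", "/blog/"],
    "/shoes/": ["/shoes/boots/"],
    "/shoes/boots/": ["/shoes/boots/chelsea/"],
    "/shoes/boots/chelsea/": ["/shoes/boots/chelsea/brown/"],
    "/blog/": [],
    "/old-landing/": ["/shoes/"],  # exists, but nothing links to it
}


def click_depths(links: dict[str, list[str]], start: str = "/") -> dict[str, int]:
    """Breadth-first search: minimum number of clicks from the start page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def audit(links: dict[str, list[str]], max_depth: int = 3) -> tuple[list[str], list[str]]:
    """Return (pages deeper than max_depth, pages unreachable from the homepage)."""
    depths = click_depths(links)
    known = set(links) | {t for targets in links.values() for t in targets}
    too_deep = sorted(page for page, depth in depths.items() if depth > max_depth)
    orphans = sorted(known - set(depths))
    return too_deep, orphans


if __name__ == "__main__":
    deep, orphaned = audit(LINKS)
    print("too deep:", deep)
    print("orphaned:", orphaned)
```

On this toy graph the brown Chelsea boot page sits four clicks deep and /old-landing/ is unreachable. Both are candidates for a new internal link or for pruning.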
4. Keep sitemaps honest
Your XML sitemap is a priority signal, not a dumping ground. List only canonical, indexable, 200-status URLs. Strip out redirected, noindexed, and non-canonical entries, and keep lastmod accurate so freshness signals mean something.
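The eligibility rule above is simple enough to enforce at generation time. This is a minimal sketch that assumes you can supply each page's status code, canonical target, noindex flag, and last-modified date; the `Page` record and the sample pages are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:  # hypothetical record; populate it from your CMS or crawler
    url: str
    status: int
    canonical: str
    noindex: bool
    lastmod: str  # W3C date, e.g. "2025-01-10"


def sitemap_eligible(page: Page) -> bool:
    """Only 200-status, indexable, self-canonical URLs belong in the sitemap."""
    return page.status == 200 and not page.noindex and page.canonical == page.url


def build_sitemap(pages: list[Page]) -> str:
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if not sitemap_eligible(page):
            continue
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = page.url
        ET.SubElement(node, "lastmod").text = page.lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    base = "https://www.example.com"
    pages = [
        Page(f"{base}/shoes/", 200, f"{base}/shoes/", False, "2025-01-10"),
        Page(f"{base}/shoes/?sort=price", 200, f"{base}/shoes/", False, "2025-01-10"),
        Page(f"{base}/old/", 301, f"{base}/old/", False, "2024-06-01"),
        Page(f"{base}/cart/", 200, f"{base}/cart/", True, "2025-01-09"),
    ]
    print(build_sitemap(pages))
```

Of the four sample pages only /shoes/ survives: the parameter variant is non-canonical, /old/ redirects, and /cart/ is noindexed.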
5. Make the server fast
Crawl rate limit is directly tied to response time. Reduce TTFB, enable caching, serve over HTTP/2 or HTTP/3, and put a CDN in front of static assets. Faster responses let Googlebot fetch more URLs in the same crawl window, and they protect crawlability when traffic spikes.
6. Fix errors and redirect chains
Monitor for 4xx/5xx in Search Console, repair or 301 broken pages, and collapse multi-hop redirects to a single hop. A chain of three redirects costs four fetches to deliver one page: three redirect responses plus the final 200.
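Collapsing chains is mechanical once you have the redirect map in one place. The sketch below assumes a simple source-to-target dictionary exported from your server config; the paths are invented. Each hop it counts is one extra request a crawler makes before it reaches the final page.

```python
# Hypothetical redirect map: source path -> target path (all 301s).
REDIRECTS = {
    "/old-a/": "/old-b/",
    "/old-b/": "/old-c/",
    "/old-c/": "/new/",
}


def resolve(path: str, redirects: dict[str, str], max_hops: int = 10) -> tuple[str, int]:
    """Follow redirects from path; return (final destination, number of hops)."""
    hops = 0
    seen = {path}
    while path in redirects:
        path = redirects[path]
        hops += 1
        if path in seen or hops > max_hops:
            raise ValueError(f"redirect loop or overlong chain at {path}")
        seen.add(path)
    return path, hops


def flatten(redirects: dict[str, str]) -> dict[str, str]:
    """Rewrite every source so it points straight at its final destination."""
    return {source: resolve(source, redirects)[0] for source in redirects}


if __name__ == "__main__":
    print(resolve("/old-a/", REDIRECTS))
    print(flatten(REDIRECTS))
```

After flattening, every old path points directly at /new/, so each one costs a single redirect hop instead of up to three.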
7. Throttle bots only as a last resort
Crawl-delay and rate caps are blunt instruments, and Googlebot ignores the Crawl-delay directive in robots.txt anyway. If your server is overloaded, fix the server; don’t tell Google to come less often, because “less often” can mean fresh content sits undiscovered.
How to Measure It
You can’t optimize what you don’t watch. The signal lives in two places:
- Crawl Stats report (Search Console) — total crawl requests, average response time, and the breakdown by response code and file type. Rising response time or 5xx share means your crawl capacity is shrinking.
- Server log files — the ground truth. Logs show exactly which URLs bots hit, how often, and which they ignore. This is where you catch budget burned on parameter junk that GSC summaries hide.
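A first pass over the logs does not need special tooling. The sketch below assumes the common "combined" access-log format and matches on the user-agent string only; the sample lines are made up. User agents can be spoofed, so a production audit should also verify that the requests really come from Google.

```python
import re
from collections import Counter

# Matches the request, status, and final quoted field (user agent) of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Made-up sample lines using documentation IP addresses.
SAMPLE_LOG = f"""\
192.0.2.10 - - [10/Jan/2025:10:00:00 +0000] "GET /shoes/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"
192.0.2.10 - - [10/Jan/2025:10:00:02 +0000] "GET /shoes/?sort=price HTTP/1.1" 200 5090 "-" "{GOOGLEBOT_UA}"
192.0.2.10 - - [10/Jan/2025:10:00:04 +0000] "GET /shoes/?sort=price&ref=nav HTTP/1.1" 200 5090 "-" "{GOOGLEBOT_UA}"
192.0.2.55 - - [10/Jan/2025:10:00:05 +0000] "GET /shoes/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"
192.0.2.10 - - [10/Jan/2025:10:00:06 +0000] "GET /old/ HTTP/1.1" 404 312 "-" "{GOOGLEBOT_UA}"
"""


def googlebot_summary(log_text: str) -> tuple[Counter, Counter]:
    """Count Googlebot hits on clean vs. parameter URLs, and by status code."""
    hits, statuses = Counter(), Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match["agent"]:
            continue  # skip unparseable lines and non-Googlebot traffic
        hits["parameter" if "?" in match["path"] else "clean"] += 1
        statuses[match["status"]] += 1
    return hits, statuses


if __name__ == "__main__":
    hits, statuses = googlebot_summary(SAMPLE_LOG)
    print(dict(hits), dict(statuses))
```

In this sample half of Googlebot's four requests go to parameter variants of one page. That ratio, tracked over time for your own logs, says more than the raw crawl total.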
Validate spot-checks with our guide on how to check when Google last crawled a page. Track indexation rate, response time, and recrawl latency for priority URLs — not vanity crawl totals.
No dashboard theater. “Pages crawled went up 40%” means nothing if the extra crawls hit junk. The only metric that matters is whether your money pages get discovered and refreshed faster.
For large templated sites where this is existential, our programmatic SEO service and AI SEO services treat crawl efficiency as a build-time constraint, not a cleanup chore.
Frequently Asked Questions
What is crawl budget in SEO?
Crawl budget is the number of URLs a search engine will fetch from your site in a given period. It’s set by your crawl rate limit (how fast your server lets bots crawl) and crawl demand (how much the engine wants your pages). It governs how quickly content gets discovered, indexed, and refreshed.
Is crawl budget a ranking factor?
No. Crawl budget doesn’t directly affect rankings — Google has confirmed this. It affects discovery and freshness: how fast new pages get indexed and how often existing ones get recrawled. Waste it on junk URLs and your important pages may be crawled rarely, indirectly costing you visibility on what actually ranks.
Does my small website need to worry about crawl budget?
Almost certainly not. Google says sites under a few thousand URLs are generally crawled efficiently without any intervention. Crawl budget becomes a real concern only on large e-commerce catalogs, news sites, and programmatic sites with tens of thousands of URLs or heavy faceted navigation and parameter duplication.
How do I check my crawl budget in Google Search Console?
Open the Crawl Stats report under Settings. It shows total crawl requests over time, average server response time, and a breakdown by response code, file type, and Googlebot type. Rising response times or a growing share of 5xx errors signal your crawl capacity is dropping. Server logs give the full URL-level picture.
Does robots.txt save crawl budget?
Partly. Disallowing useless paths in robots.txt stops bots fetching them, which conserves budget. But it doesn’t remove already-indexed URLs — and blocked URLs can still appear in search without a snippet. To keep a page out of the index, allow crawling and serve a noindex directive instead of blocking it.