To unindex pages from search engines, you serve a crawlable noindex signal (meta tag or X-Robots-Tag header), or you delete the URL and return a 410/404 — then you wait for a recrawl. The single most common mistake is blocking a URL in robots.txt and expecting it to drop: that prevents crawling, which means Google never sees the noindex and the URL can stay indexed indefinitely.
How to Unindex Pages From Search Engines
Unindexing is the deliberate removal of a URL from a search engine’s index using a crawlable noindex directive, an HTTP X-Robots-Tag header, or a 410/404 status, so the page no longer appears in organic results.
This is one of those tasks that looks like a one-liner and quietly burns a quarter of someone’s traffic when done wrong. We see it constantly on programmatic and faceted sites: thousands of low-value URLs blocked in robots.txt, all of them still sitting in the index with a “no information available” snippet. The fix is mechanical once you understand which signal does what.
Noindex vs robots.txt: the distinction that breaks everyone
These two are not interchangeable, and conflating them is the root cause of most failed deindexing.
- noindex tells search engines “you may crawl this, but do not index it.” It removes the URL from results. It requires the page to be crawlable.
- robots.txt Disallow tells crawlers “do not fetch this URL at all.” It controls crawling, not indexing. A disallowed URL can still be indexed (URL-only, no snippet) if other pages link to it.
The trap: if you Disallow a URL and put noindex on it, the crawler is blocked from ever fetching the page, so it never reads the noindex. The page stays indexed. To deindex with noindex, the URL must be open in robots.txt so the crawler can re-read it. Once the URL has dropped, you can optionally add a Disallow to save crawl budget — but only after removal is confirmed.
Rule of thumb: robots.txt is a crawl-savings tool, not a removal tool. If a URL is already indexed, robots.txt alone will not get it out — and may actively prevent the signal that would.
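A quick way to catch the trap before it bites is to test whether a URL you plan to noindex is even fetchable under your current robots.txt. The sketch below uses Python's standard-library `urllib.robotparser`; the robots.txt content and the example.com URLs are made up for illustration. Note that this parser does simple prefix matching and does not replicate Google's wildcard handling, so treat it as a first-pass check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_readable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if `user_agent` may fetch `url`, i.e. a noindex on it can be read."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks the very section we want deindexed.
ROBOTS = """\
User-agent: *
Disallow: /filters/
"""

for url in ("https://example.com/filters/red", "https://example.com/products/red"):
    readable = noindex_is_readable(ROBOTS, url)
    print(url, "-> noindex readable" if readable else "-> BLOCKED, noindex will never be seen")
```

Any URL reported as blocked needs its Disallow rule lifted before a noindex on it can take effect.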
The methods, ranked by what they actually do
| Method | Stops crawling? | Removes from index? | Best for |
|---|---|---|---|
| noindex meta tag | No | Yes (after recrawl) | HTML pages you keep but hide |
| X-Robots-Tag header | No | Yes (after recrawl) | PDFs, images, non-HTML, bulk |
| 410 Gone | Drops on recrawl | Yes (fast) | Permanently deleted pages |
| 404 Not Found | Drops on recrawl | Yes (slower) | Deleted pages |
| robots.txt Disallow | Yes | No (can stay URL-only) | Saving crawl budget post-removal |
| Canonical tag | No | Soft consolidation only | Duplicate/near-duplicate variants |
| Auth / password | Yes | Yes (no access) | Staging, private, members-only |
| GSC Removals tool | No | ~6-month temporary hide | Emergencies, while a real fix lands |
Meta noindex tag (the default for HTML)
Add this to the <head> of any page you want gone:
<meta name="robots" content="noindex, follow">
Use follow so link equity still flows through the page while it’s deindexing. The page must not be blocked in robots.txt, or Google can’t read this. After it drops, you can switch to noindex, nofollow or block it. This is the cleanest option when you’re keeping the page live for users but want it out of search.
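If you want to confirm that a template actually emits the tag, a small parser over the rendered HTML is enough. This is a minimal sketch using the standard-library `html.parser`; the class and function names are our own, and it only inspects `<meta name="robots">` and `<meta name="googlebot">` tags, treating `none` as equivalent to `noindex, nofollow`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def has_meta_noindex(html: str) -> bool:
    """True if the page carries a meta robots noindex (or the shorthand 'none')."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & {"noindex", "none"})


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_meta_noindex(page))
```

It checks the HTML as served, so a tag injected later by client-side JavaScript would not be seen by this sketch.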
X-Robots-Tag HTTP header (non-HTML and bulk)
For files where there’s no <head> to edit — PDFs, images, JSON, generated downloads — set the directive at the server level:
# nginx: noindex all PDFs
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, noarchive";
}

# Apache: noindex a single file (requires mod_headers)
<Files "private-report.pdf">
    Header set X-Robots-Tag "noindex"
</Files>
This also scales for deindexing entire patterns of URLs without touching templates. It’s the same directive family as the meta tag — see our deeper write-ups on the X-Robots-Tag and meta robots advanced settings for the full syntax.
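Once the header is deployed, it is worth verifying what the server really sends. The function below is a simplified sketch of how one might interpret `X-Robots-Tag` values, including the user-agent-scoped form such as `googlebot: noindex`. The function name and the list of valued directives are our own assumptions, and the parsing is deliberately looser than what a search engine does.

```python
# Directives that carry their own "name: value" syntax, so a colon after them
# does not indicate a user-agent scope.
VALUED = ("unavailable_after", "max-snippet", "max-image-preview", "max-video-preview")


def header_says_noindex(values, user_agent="googlebot"):
    """Return True if any X-Robots-Tag value applies noindex to `user_agent`.

    `values` is a list of raw header values (a response may carry several).
    """
    for value in values:
        head, sep, rest = value.partition(":")
        scope = head.strip().lower()
        # "googlebot: noindex" style: the text before the colon names a crawler.
        if sep and "," not in head and not scope.startswith(VALUED):
            if scope != user_agent.lower():
                continue  # directive is aimed at a different crawler
            value = rest
        tokens = {token.strip().lower() for token in value.split(",")}
        if tokens & {"noindex", "none"}:
            return True
    return False


print(header_says_noindex(["noindex, noarchive"]))
print(header_says_noindex(["bingbot: noindex"]))
```

In practice you would feed it the header values from a real response for the PDF or image in question.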
Delete and return 410 (or 404)
If a page is genuinely gone, the honest signal is 410 Gone. Google treats 410 slightly more decisively than 404 and tends to drop those URLs faster. A 404 works too, just on a longer timeline. Either way, do not redirect a deleted URL to your homepage — that’s a soft 404 waiting to happen, and Google may ignore it. Read more on handling Not found (404) cleanly.
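For illustration, here is what "return 410 at the server" can look like in application code. This is a minimal WSGI sketch, not a production setup: the paths in `REMOVED_PATHS` are hypothetical, and on a real site the same rule would more likely live in the web server or CDN configuration.

```python
# Hypothetical set of permanently deleted paths.
REMOVED_PATHS = {"/old-campaign", "/discontinued-product"}


def app(environ, start_response):
    """Answer 410 Gone for removed paths instead of redirecting them to the homepage."""
    path = environ.get("PATH_INFO", "/")
    if path in REMOVED_PATHS:
        status, body = "410 Gone", b"This page has been permanently removed.\n"
    else:
        status, body = "200 OK", b"OK\n"
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_of(path):
    """Call the app with a minimal fake environ and return the status line."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    app({"PATH_INFO": path}, start_response)
    return captured["status"]


print(status_of("/old-campaign"))
print(status_of("/"))
```

To try it locally, the `app` callable can be served with `wsgiref.simple_server.make_server` from the standard library.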
Canonical tags (consolidation, not removal)
A rel="canonical" is a hint, not a removal command. It’s the right tool for near-duplicate variants (print versions, tracking parameters, sort orders) where you want signals consolidated onto the primary URL — but it won’t reliably pull a page out of the index on its own. If a variant is genuinely thin or unwanted, pair canonicalization with noindex, or fix the issue at the source. This is also how Google’s “duplicate without user-selected canonical” status gets resolved.
Authentication and the GSC Removals tool
Password / HTTP auth is the only method that guarantees a crawler never sees content — use it for staging and private resources, and make sure those URLs aren’t sitting in a sitemap or linked publicly. The Google Search Console Removals tool hides a URL for roughly six months; treat it as an emergency tourniquet (leaked PII, an accidental publish) while the real fix — noindex or a 410 — propagates. It is not a permanent solution on its own.
The sequence we actually run
When we deindex pages at scale, the order matters more than any single tactic:
1. If it’s sensitive or leaked, restrict access immediately (auth) and fire a GSC removal request to buy time.
2. For pages you’re keeping, add a crawlable noindex (meta or X-Robots-Tag) and confirm the URL is not blocked in robots.txt.
3. For pages you’re deleting, return 410 (or 404) at the server.
4. Pull the URLs out of your XML sitemap and remove internal links pointing to them; a clean sitemap speeds re-evaluation. See XML sitemaps and SEO.
5. Wait for a recrawl. Nothing happens until Googlebot revisits. You can check when Google last crawled a page to gauge timing.
6. Only after confirmed removal, add a robots.txt Disallow to stop wasting crawl budget on those paths.
7. Monitor GSC’s Pages report and server logs; if URLs reappear, the signal was misconfigured, so recheck step 2.
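The ordering above can be expressed as a small decision function, which is handy when you are triaging thousands of URLs from a crawl export. This is a sketch under our own assumptions: the `UrlState` fields are hypothetical names for data you would have to collect yourself, and the emergency path for sensitive content is left out because it is a manual step.

```python
from dataclasses import dataclass


@dataclass
class UrlState:
    keep_live: bool              # True: page stays for users; False: page is deleted
    robots_blocked: bool = False
    has_noindex: bool = False
    status_code: int = 200
    in_sitemap: bool = False
    still_indexed: bool = True


def next_action(s: UrlState) -> str:
    """Return the next step for one URL, following the order of the sequence above."""
    if s.still_indexed and s.robots_blocked:
        return "unblock in robots.txt so the removal signal can be crawled"
    if s.keep_live and not s.has_noindex:
        return "add a crawlable noindex"
    if not s.keep_live and s.status_code not in (404, 410):
        return "return 410 (or 404)"
    if s.in_sitemap:
        return "remove from sitemap and internal links"
    if s.still_indexed:
        return "wait for recrawl"
    if not s.robots_blocked:
        return "optionally add a robots.txt Disallow"
    return "monitor"


print(next_action(UrlState(keep_live=True, robots_blocked=True)))
```

The robots.txt check comes first on purpose: no other signal matters while the crawler is locked out.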
Why deindexing matters in the AI-Overviews era
Index hygiene is no longer just about rankings. Pages eligible for organic results are also eligible to be surfaced — and quoted — inside AI Overviews and other generative answers. A bloated index full of thin, duplicate, or stale URLs gives both classic ranking and AI synthesis more low-quality surface area to misrepresent your brand. Trimming junk URLs is now a quality-control move for what the model can say about you, not only what ranks. On large programmatic sites this is exactly the kind of cleanup we bake into Core pSEO builds and ongoing AI SEO services.
Privacy-era reality: if a URL exposed personal data, deindexing from your own console is not enough; you may also need a legal/GDPR delisting request to the search engine, separate from technical removal. Technical noindex removes the URL from your own indexed footprint; it doesn’t compel third parties who copied the content to take it down.
Common ways this goes wrong
- Blocking in robots.txt to deindex. The number-one error. The crawler can’t read your noindex, so the URL lingers. Open it, let it drop, then block.
- Expecting instant results. Deindexing is recrawl-gated. Low-priority URLs can take weeks. Patience or a removal request, not panic.
- Redirecting deleted pages to the homepage. Reads as a soft 404 and may keep the old URL alive.
- Leaving deindexed URLs in the sitemap. You’re telling Google “index this” and “don’t index this” at the same time. Pick one.
- Trusting canonical to force removal. It’s a hint. For unwanted thin pages, use noindex and tighten the underlying thin content.
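The sitemap conflict in particular is easy to detect mechanically. The sketch below parses a standard XML sitemap with `xml.etree.ElementTree` and lists any URL that also appears in your set of deindexed URLs. The sample sitemap and URLs are invented, and the function assumes a plain `<urlset>` file rather than a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_conflicts(sitemap_xml: str, deindexed_urls: set) -> list:
    """Return sitemap URLs that are also marked for deindexing (mixed signals)."""
    root = ET.fromstring(sitemap_xml)
    listed = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in listed if url in deindexed_urls]


# Hypothetical sitemap that still lists a URL we have noindexed.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/filters/red</loc></url>
</urlset>"""

print(sitemap_conflicts(SITEMAP, {"https://example.com/filters/red"}))
```

Anything it returns should come out of the sitemap before you wait on a recrawl.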
Frequently Asked Questions
Should I use noindex or robots.txt to remove a page from Google?
Use noindex to remove an already-indexed page — it must stay crawlable so Google can read the directive. Use robots.txt Disallow only to prevent crawling and save crawl budget, never as a removal tool. Blocking in robots.txt stops Google from seeing your noindex, so the URL can stay indexed.
How long does it take to unindex a page?
It depends on recrawl frequency, not a fixed clock. High-priority URLs may drop within days; low-value or rarely-crawled pages can take several weeks. Nothing happens until Googlebot revisits the page and reads the new signal. To force urgency, submit a temporary removal in Google Search Console while the permanent noindex or 410 propagates.
Does robots.txt prevent a page from appearing in Google?
Not reliably. robots.txt blocks crawling, not indexing. A disallowed URL can still appear as a bare, snippet-less listing if other pages link to it, because Google indexes the URL without fetching its content. To keep a page out of results entirely, use a crawlable noindex directive or password-protect it — robots.txt alone is not enough.
What’s the difference between 404 and 410 for unindexing?
Both signal a removed page and both eventually deindex the URL. 410 Gone declares the removal permanent and tends to be processed faster by Google; 404 Not Found is more ambiguous and clears on a slower timeline. For pages you’ve deliberately deleted forever, return 410. Either is far better than redirecting to an unrelated page.
How do I noindex pages in WordPress?
Use an SEO plugin (Yoast, Rank Math, or AIOSEO) to toggle “noindex” per post, per page, or per content type — it injects the crawlable meta robots tag for you. Confirm the URL isn’t also disallowed in robots.txt. Our noindex URLs in WordPress guide covers the per-template and bulk pitfalls in detail.