Glossary

Understanding The X-Robots-Tag: How It Affects Your SEO And Website Ranking

The X-Robots-Tag is a powerful HTTP header that lets you control how search engines index and crawl non-HTML resources—like PDFs, images, and APIs—so you can prevent unwanted indexing, manage crawl budget, and protect sensitive content. Understanding how to implement and use directives such as noindex, nofollow, and noarchive via X-Robots-Tag helps you fine-tune SEO for all file types, improve site visibility, and ensure search engines treat non-HTML assets the way you intend.

X-Robots-Tag

X-Robots-Tag: an HTTP response header that instructs search engine robots how to index or crawl a resource (supports directives like noindex, nofollow, none, noarchive, nosnippet, unavailable_after, max-snippet, max-image-preview, max-video-preview), usable for HTML and non-HTML files and set per-response or per-file via server configuration.

What Is the X-Robots-Tag?

X-Robots-Tag is an HTTP response header that tells search engine crawlers how to treat a specific resource (indexing, following links, caching, snippets, previews). Unlike the meta robots tag, which only works inside HTML pages, X-Robots-Tag can be applied to any file served over HTTP—PDFs, images, videos, JSON, APIs, and binary files—because it’s delivered with the server response for that resource.
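

For example, a PDF served with the header might return a response whose headers include (values illustrative):

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow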



Common directives supported include:



  • noindex

  • nofollow

  • none (equivalent to noindex + nofollow)

  • noarchive

  • nosnippet

  • unavailable_after

  • max-snippet

  • max-image-preview

  • max-video-preview


Multiple directives can be comma-separated in a single header or split across multiple headers.



Where it’s set:



  • Server configuration (Apache .htaccess, Nginx config)

  • Application code that sends HTTP responses

  • CDN or edge rules



Example header forms:


X-Robots-Tag: noindex, nofollow
X-Robots-Tag: noindex
X-Robots-Tag: max-image-preview:large
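
The same directives can also be split across repeated headers, or scoped to a single crawler by prefixing its user agent name:

X-Robots-Tag: noindex
X-Robots-Tag: noarchive
X-Robots-Tag: googlebot: nofollow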


Why use it:



  • Control indexing of non-HTML assets

  • Enforce consistent rules across file types

  • Protect sensitive or duplicate resources

  • Manage crawl budget for large sites or media-heavy pages



Interaction with other signals: if both a meta robots tag and an X-Robots-Tag header apply to the same URL, search engines combine the directives and honor the most restrictive one. For non-HTML resources, the X-Robots-Tag header is the only way to deliver these directives, because it travels with the HTTP response itself.

How to Use the X-Robots-Tag

When and where to apply



  • Use X-Robots-Tag on non-HTML files (PDFs, images, videos, API responses) or when you need per-response control that HTML meta robots cannot provide.

  • Apply globally, by file type, by path, or per response depending on goals (block indexing, prevent snippets, control previews, set unavailable_after).



Common directives and combiners



  • noindex — prevent indexing

  • nofollow — prevent following links on the resource

  • none — equivalent to noindex, nofollow

  • noarchive — prevent cached snapshot

  • nosnippet — prevent text/video/image snippets

  • unavailable_after: — expire the resource from the index after the date

  • max-snippet:, max-image-preview:, max-video-preview: — limit text snippet length, image preview size, and video preview duration

  • Combine with commas: X-Robots-Tag: noindex, noarchive, nosnippet



Examples by server / platform



Apache (.htaccess or server config)



  • For a single file (requires mod_headers; place in the .htaccess of the file's directory):


<Files "file.pdf">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>


  • For all PDFs:



<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, noarchive"
</FilesMatch>



Nginx



  • For a specific location or file type:


location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, noarchive";
}


  • For a single file:


location = /files/secret.pdf {
    add_header X-Robots-Tag "noindex, nofollow";
}


IIS (web.config)



  • Add a response header:
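

A typical site-wide rule, assuming IIS 7+ and the standard customHeaders element in web.config:

<configuration>
  <system.webServer>
    <httpProtocol>
      <customHeaders>
        <add name="X-Robots-Tag" value="noindex, noarchive" />
      </customHeaders>
    </httpProtocol>
  </system.webServer>
</configuration>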


  • Use URL Rewrite or conditions to target file types/paths.



PHP / application-level



  • Send the header conditionally:


header('X-Robots-Tag: noindex, noarchive');


  • Use runtime logic to set per-response directives (user session, auth, query parameter).
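

A rough sketch of that kind of conditional logic; the staging-host check is only an illustration, and the header must be sent before any output:

// Keep a staging host out of the index while leaving production untouched.
if (($_SERVER['HTTP_HOST'] ?? '') === 'staging.example.com') {
    header('X-Robots-Tag: noindex, nofollow');
}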



AWS S3 / CloudFront



  • S3 by itself cannot serve a true X-Robots-Tag header: user-defined object metadata is returned with an x-amz-meta- prefix (for example x-amz-meta-x-robots-tag), which crawlers do not read as a robots directive.

  • Add the header in front of S3 instead, typically with CloudFront: attach a response headers policy with a custom header (behavior → Response headers policy → Add X-Robots-Tag: noindex, noarchive), or set it from a CloudFront Function or Lambda@Edge.



Netlify



  • _headers file:


/files/*.pdf
X-Robots-Tag: noindex, noarchive


  • Or use Netlify Edge Functions to set headers dynamically.
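

A minimal Edge Function sketch, assuming Netlify's current Edge Functions API; the file name, path pattern, and directives are illustrative:

// netlify/edge-functions/x-robots.js
export default async (request, context) => {
  // Let the request continue, then copy the response so its headers can be modified.
  const response = await context.next();
  if (!new URL(request.url).pathname.endsWith(".pdf")) {
    return response;
  }
  const modified = new Response(response.body, response);
  modified.headers.set("X-Robots-Tag", "noindex, noarchive");
  return modified;
};

// Limit the function to the paths that need it.
export const config = { path: "/files/*" };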



Cloudflare



  • Use Transform Rules or Workers to add/modify X-Robots-Tag per path or content type.
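

For illustration, a minimal Workers sketch using the modules syntax; the .pdf check and directive values are assumptions:

export default {
  async fetch(request) {
    // Forward the request to the origin, then decide whether to decorate the response.
    const response = await fetch(request);
    if (!new URL(request.url).pathname.endsWith(".pdf")) {
      return response;
    }
    // Responses from fetch() have immutable headers, so copy into a mutable Response.
    const modified = new Response(response.body, response);
    modified.headers.set("X-Robots-Tag", "noindex, noarchive");
    return modified;
  },
};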



CDN considerations



  • Ensure the CDN preserves or injects the header consistently; prefer edge rules for uniform behavior.

  • Avoid conflicting headers between origin and edge — the final response header wins.



Per-file vs per-response decisions



  • Per file (server config) for static assets and consistent behavior.

  • Per response (application) when control depends on user state, auth, or dynamic conditions.



Testing and verification



  • Use curl -I <url> and check the response headers for the X-Robots-Tag value.

  • Use browser DevTools Network tab to inspect response headers.

  • Use Google Search Console: the URL Inspection tool shows whether indexing is allowed and which noindex signal it detected (meta tag or X-Robots-Tag header), and the Page indexing (Coverage) report lists URLs excluded by noindex.

  • Use third-party header checkers and log analysis to ensure crawlers respect directives.
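

For example (the URL is a placeholder):

curl -sI https://www.example.com/files/report.pdf | grep -i x-robots-tag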



Best practices



  • Prefer X-Robots-Tag for non-HTML assets; use meta robots for HTML.

  • Avoid accidental noindex: test after deploying header rules.

  • Use noindex for sensitive or duplicate non-HTML assets; use noarchive/nosnippet to protect content previews.

  • Combine with robots.txt and authentication where appropriate: robots.txt disallows crawling but not indexing of URLs discovered via links, while X-Robots-Tag noindex actually prevents indexing. Keep in mind that a crawler must be able to fetch the URL to see the header, so do not block crawling of resources you want removed from the index.

  • Document header rules and keep them discoverable for future site changes.



Examples quick reference



  • Block indexing of PDFs sitewide: X-Robots-Tag: noindex on *.pdf

  • Prevent cached copies and snippets for a video: X-Robots-Tag: noarchive, nosnippet

  • Expire an old asset after a date: X-Robots-Tag: unavailable_after: Wed, 31 Dec 2025 23:59:59 GMT

  • Limit snippet size: X-Robots-Tag: max-snippet:50



Rollback and monitoring



  • Remove or change headers carefully; monitor index status and search traffic after changes.

  • Track server logs and Search Console to confirm crawler behavior and index changes.


When Should You Use the X-Robots-Tag?



  1. Use server-level control over indexing and crawling for non-HTML files.

    • Apply to PDFs, images, videos, XML, JSON, ZIP, and other assets that cannot include meta robots tags.

    • Example header: X-Robots-Tag: noindex.




  2. Block indexing of dynamically generated or programmatically served content.

    • Use on pages created by back-end scripts or APIs where editing HTML is not feasible.

    • Useful for temporary or environment-specific pages (staging, testing, internal tools).




  3. Control search engine behavior for entire file types or directories.

    • Set responses via server configuration (Apache, Nginx) to cover many files without altering each file.

    • Example: Block all PDFs in /private/ with a server rule.




  4. Prevent indexing while still allowing crawling and link following.

    • Combine directives as needed: noindex, follow, noarchive, nosnippet, unavailable_after.

    • Example: X-Robots-Tag: noindex, noarchive.




  5. Apply precise, conditional control with HTTP headers only.

    • Use on 200 responses for resources you do not want indexed; ensure directives are present on final responses, not only redirects.

    • For 404/410 responses, consider site goals—search engines typically drop 404/410 content naturally; the header can reinforce this.




  6. Control indexing across different user agents.

    • Use user-agent-specific directives if different crawlers need different rules.

    • Example: X-Robots-Tag: googlebot: noindex.




  7. Implement temporary measures or phased rollouts.

    • Apply the header for limited-time suppression (e.g., product launch delays, legal takedowns).

    • Remove the header when ready and verify via indexing tools.




  8. When not to use the X-Robots-Tag (use alternatives).

    • Do not use it as the primary method for HTML pages when you can edit meta robots tags directly; meta tags are clearer and easier for editors.

    • Do not rely on it to block access: the header controls indexing, not crawling or access. Use robots.txt to stop crawling and authentication to restrict access.

    • Avoid using only noindex on pages that should still pass link equity via internal links—noindex may stop the page from passing signals reliably.




  9. Best practices.

    • Test headers using curl or header checkers and verify in Google Search Console (URL Inspection) and Bing Webmaster Tools.

    • Ensure the header appears on the final HTTP response, not only on intermediate redirects.

    • Use server configuration management (templates, environment variables) to avoid accidental site-wide noindex.

    • Document and audit header rules regularly to prevent unintended SEO impacts.