THE SAVAGE LAB  // field study

Is Keyword Difficulty Accurate? We Audited 30 SERPs

A psychometric validity test of Keyword Difficulty: across 30 SERPs and 241 ranking pages, KD predicts page-level links well (ρ=0.91) but barely tracks domain authority (ρ=0.36), and all but one of the 'easy' keywords were still wall-to-wall DR 72–94.

By Dr Blaze

Sample: 30 keywords · 241 ranking pages
Dataset: Ahrefs KD + top-10 SERP link profiles (DR, referring domains)
Snapshot: 2026-06-27

Keyword Difficulty (KD) is the number most SEOs sort by before they decide what to write. So we treated it like any other measurement instrument and asked the only question that matters about a metric: is it valid — does it measure what people use it to decide? We pulled Ahrefs KD for 30 keywords across nine niches, then pulled the real backlink profiles of every page ranking in their top 10 (241 ranking pages in total), and tested KD against the ground truth.

The short answer: KD is a valid measure of one thing, and most people read it as a different thing. It predicts the page-level link counts of ranking results very closely (Spearman ρ = 0.91). But it barely tracks the domain authority of who actually ranks (ρ = 0.36), and that gap is exactly where it misleads a new site.

This is a companion to measuring SEO like a psychometrician: the discipline of asking what a score actually measures before you trust it.

Why “is KD accurate?” is the wrong question

“Accurate” assumes we agree on what KD is trying to estimate. We don’t. In measurement terms, a score has construct validity when it measures the construct you claim it does. KD is marketed as “how hard it is to rank for this keyword” — but Ahrefs computes it primarily from the number of referring domains pointing at the pages currently ranking. Those are two different constructs:

  • The construct KD computes: how many referring domains point at the winning pages.
  • The construct you use it for: whether your site can realistically rank.

When a metric is named after the second thing but measures the first, you get Goodhart’s law with a UI: people optimize the number, not the reality behind it. So we didn’t ask “is KD accurate?” We asked “what does KD actually predict, and what does it miss?”

Method (reproducible)

  • Sample: 30 keywords spanning SEO/marketing, finance, health, software, home/DIY, travel, ecommerce, B2B, and general how-to — deliberately chosen across the difficulty range, not cherry-picked. Full list in the table below.
  • Ground truth: for each keyword we pulled the top 10 organic results and their Ahrefs Domain Rating (DR) and referring domains (page-level). That’s 241 ranking pages with link data.
  • Tools & scope: Ahrefs, US, snapshot pulled 2026-06-27. This audits Ahrefs’ KD specifically; a cross-tool reliability check (e.g. vs Semrush) is the obvious next study.
  • Stats: Spearman ρ (rank correlation, robust to the heavy skew of link counts) and Pearson r. Per keyword we used the median DR and median referring domains of the top 10.
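The rank-correlation step described above can be sketched in a few lines. This is a minimal illustration, not the study's actual analysis script: the function names are ours, and the input is only the six KD ≤ 20 rows published in the Finding 3 table, so it will not reproduce the full-sample figures of 0.91 and 0.36.

```python
from statistics import mean

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rho is Pearson computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Per-keyword medians for the six KD <= 20 keywords in the Finding 3 table.
kd        = [1, 6, 10, 12, 16, 17]
median_rd = [2, 14, 15, 18, 21, 34]   # median page-level referring domains
median_dr = [17, 78, 72, 94, 82, 73]  # median Domain Rating of the top 10

rho_refdomains = spearman(kd, median_rd)
rho_dr = spearman(kd, median_dr)
print(f"KD vs median referring domains: rho = {rho_refdomains:.2f}")
print(f"KD vs median DR:                rho = {rho_dr:.2f}")
```

Even on this small subset the same shape shows up: KD orders the keywords by page-level links far more faithfully than by domain authority.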

Limitations are real and stated up front: n = 30 is a sample, not a census; DR and referring-domain counts are themselves Ahrefs estimates; a SERP snapshot is a moment in time; and correlation isn’t causation. None of that changes the central, robust pattern.

Finding 1 — KD is not random. Give it its due.

Against page-level referring domains, KD does its job: Spearman ρ = 0.91 (Pearson on log-scaled refdomains also 0.91). Sort 30 keywords by KD and you’ve very nearly sorted them by how many backlinks the ranking pages carry. Anyone telling you KD is “meaningless” is overcorrecting. As a relative sorter of page-level link competition, it works.
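For readers who want to see what "Pearson on log-scaled refdomains" means in practice, here is a small sketch. It assumes a base-10 log (the base does not change r) and again uses only the six KD ≤ 20 rows from the Finding 3 table, so the result is illustrative and is not the study's 0.91.

```python
import math
from statistics import mean

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Six KD <= 20 keywords from the Finding 3 table.
kd        = [1, 6, 10, 12, 16, 17]
median_rd = [2, 14, 15, 18, 21, 34]

# Link counts are heavily right-skewed (the band table runs from 2 to 10,236),
# so take logs before a linear correlation to stop the largest SERPs dominating.
log_rd = [math.log10(v) for v in median_rd]
r_log = pearson(kd, log_rd)
print(f"Pearson r, KD vs log10(median referring domains): {r_log:.2f}")
```

The log transform matters most on the full range, where a handful of SERPs with thousands of referring domains would otherwise swamp the rest.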

Finding 2 — But it barely predicts who ranks

Now test KD against the Domain Rating of the pages in the top 10 — the thing that actually decides whether a brand-new site is in the conversation. The correlation collapses: Spearman ρ = 0.36 (Pearson r = 0.50).

Here’s the stat that should change how you read KD: for 29 of the 30 keywords, the median DR of the top 10 sat between 72 and 94 — regardless of KD. A KD-12 SERP and a KD-90 SERP were both, typically, walls of DR-85 domains. KD moved; the domain authority you’re up against did not.

Finding 3 — The “easy” keywords that aren’t

This is where the construct mismatch bites. With one exception, every keyword we sampled with KD ≤ 20 (the ones a tool flags green, “go for it”) was still dominated by high-authority domains:

| Keyword | Ahrefs KD | Median DR of top 10 | Median referring domains (page-level) | Weakest page (min DR) |
|---|---|---|---|---|
| things to do in lisbon | 1 | 17 | 2 | 2 |
| fractional cfo | 6 | 78 | 14 | 43 |
| how to unclog a drain | 10 | 72 | 15 | 40 |
| best protein powder | 12 | 94 | 18 | 85 |
| best mattress | 16 | 82 | 21 | 55 |
| air fryer recipes | 17 | 73 | 34 | 30 |

“Best protein powder” scores KD 12 — easy! — because the ranking pages have few backlinks each (median 18 referring domains). But the domains ranking have a median DR of 94, and the weakest page in the entire top 10 is DR 85. There is no soft underbelly. KD told you “easy”; the SERP is a fortress.

The one genuine exception proves the rule: “things to do in lisbon” (KD 1) really was low-authority — median DR 17, weakest page DR 2. It was also the single lowest KD in the set. Everywhere from KD 6 upward, “low difficulty” meant “the winning pages don’t have many page-level links,” not “a new domain can rank here.”
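The screen applied in this finding can be expressed as a simple filter. The sketch below uses the six rows from the table above; the `SerpSummary` type, the function name, and the median-DR floor of 72 (the lower edge of the DR 72–94 range reported here) are illustrative assumptions, not a published rule.

```python
from dataclasses import dataclass

@dataclass
class SerpSummary:
    keyword: str
    kd: int
    median_dr: int
    median_rd: int
    min_dr: int

# Rows copied from the KD <= 20 table.
ROWS = [
    SerpSummary("things to do in lisbon", 1, 17, 2, 2),
    SerpSummary("fractional cfo", 6, 78, 14, 43),
    SerpSummary("how to unclog a drain", 10, 72, 15, 40),
    SerpSummary("best protein powder", 12, 94, 18, 85),
    SerpSummary("best mattress", 16, 82, 21, 55),
    SerpSummary("air fryer recipes", 17, 73, 34, 30),
]

def is_authority_walled(serp, median_dr_floor=72):
    """Assumed cutoff: a SERP counts as walled if its median DR is >= the floor."""
    return serp.median_dr >= median_dr_floor

low_kd = [s for s in ROWS if s.kd <= 20]
walled = [s.keyword for s in low_kd if is_authority_walled(s)]
open_serps = [s.keyword for s in low_kd if not is_authority_walled(s)]

print(f"{len(walled)} of {len(low_kd)} low-KD SERPs are authority-walled")
print("Exceptions:", open_serps)
```

Run on these rows, the filter leaves "things to do in lisbon" as the only low-KD SERP that is not walled, matching the exception described above.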

Finding 4 — Same KD, wildly different SERPs (low precision)

A reliable instrument gives similar readings for similar things. KD’s spread within a band is enormous:

| Ahrefs KD band | Keywords | Median referring domains across those SERPs |
|---|---|---|
| 0–19 | 6 | 2 – 34 |
| 40–59 | 10 | 42 – 253 |
| 60–79 | 5 | 22 – 775 |
| 80–100 | 8 | 655 – 10,236 |

In the 60–79 band, the real link competition ranged 35×. “Magnesium benefits” (KD 63) needed a median of just 22 referring domains; “marketing automation” (KD 71) needed 775. Two numbers that look adjacent in your tool describe completely different fights.
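The spread figure is just the ratio of the largest to the smallest per-keyword median within a band. A quick check using the ranges from the band table (the dictionary layout and variable names are ours):

```python
# (lowest, highest) median referring domains per KD band, from the band table.
BAND_RANGES = {
    "0-19": (2, 34),
    "40-59": (42, 253),
    "60-79": (22, 775),
    "80-100": (655, 10236),
}

# Spread = highest median / lowest median within the same band.
spread = {band: hi / lo for band, (lo, hi) in BAND_RANGES.items()}

for band, ratio in spread.items():
    print(f"KD {band}: {ratio:.1f}x spread in median referring domains")
```

The 60–79 band comes out at roughly 35×, and no band in the table is tighter than about 6×.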

Finding 5 — Weak pages almost never break in

If low KD meant “winnable,” you’d expect weak pages scattered through easy SERPs. You don’t see it. Only 6 of the 30 SERPs contained even a single top-10 page under DR 40, and only 4 contained one under DR 30. For a DR-10 site, “find a low-KD keyword” is not a strategy: the low-KD SERP is usually still locked.

What KD actually measures vs. what you think it measures

| | What KD measures | What you use it for |
|---|---|---|
| Construct | Page-level referring domains of ranking pages | “Can my site rank for this?” |
| Validity | High (ρ = 0.91) for that construct | Low (ρ = 0.36) for domain-level difficulty |

Failure mode: Calls authority-walled SERPs “easy”
Who it hurts: New / low-DR sites most of all

KD isn’t broken. It’s a valid measure of the wrong variable for the decision most people make with it. It answers “how link-heavy are the winning pages?” and gets read as “is this winnable for me?” For a DR-90 brand those questions nearly converge. For a new site they’re opposite — which is why KD systematically understates difficulty exactly for the sites that can least afford the mistake.

What to do instead

Use KD as a coarse relative sort, then read the SERP it’s summarizing:

  1. Look at the DR of who actually ranks, not just KD. If the median is 80+, links are the real gate, whatever KD says.
  2. Find the weakest page in the top 10. If nothing is under ~DR 40, a new site isn’t getting in on content alone — that’s the link-building reality, not a content problem.
  3. For low-authority sites, hunt for genuine gaps — SERPs with low-DR pages present — which usually live in the true long tail, not in “low KD” head terms. This is the wedge behind programmatic SEO done right.
  4. If the SERP is authority-walled, the lever is being citable, not just ranking — see generative engine optimization and AI Overviews, where brand signals can matter more than page-level links.
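The checklist above can be turned into a small triage helper. This is a sketch under stated assumptions: the function name and labels are ours, the thresholds (median DR 80, weakest page DR 40) are the rough cutoffs from steps 1 and 2 exposed as parameters, and the three DR lists are made-up examples, not SERPs from the study.

```python
from statistics import median

def triage_serp(top10_dr, walled_median=80, weak_page_dr=40):
    """Classify a SERP from the Domain Ratings of its top-10 pages."""
    med = median(top10_dr)
    weakest = min(top10_dr)
    if weakest < weak_page_dr:
        label = "gap"               # step 3: a low-DR page already ranks
    elif med >= walled_median:
        label = "authority-walled"  # step 1: median DR is 80+
    else:
        label = "link-gated"        # step 2: nothing under ~DR 40
    return {"median_dr": med, "min_dr": weakest, "label": label}

# Hypothetical top-10 DR lists, for illustration only.
fortress = [94, 93, 92, 91, 90, 90, 89, 88, 86, 85]
middling = [78, 75, 72, 70, 68, 66, 60, 55, 48, 43]
long_tail = [88, 75, 60, 52, 41, 35, 30, 22, 12, 2]

for name, drs in [("fortress", fortress), ("middling", middling), ("long_tail", long_tail)]:
    print(name, triage_serp(drs))
```

Checking for a weak page first is a deliberate choice: a single low-DR page in the top 10 is the signal step 3 says to hunt for, even when the median is high.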

A metric is a tool, not a verdict. Read what it measures, not the color of the cell.

Frequently Asked Questions

Is Keyword Difficulty accurate?

It depends what you mean by accurate. In our 30-SERP test, Ahrefs KD predicted the page-level referring-domain counts of ranking results very well (Spearman ρ = 0.91), so as a relative measure of page-level link competition it’s accurate. But it predicted the domain authority of who actually ranks poorly (ρ = 0.36), so as an answer to “can my site rank for this?” it’s frequently misleading — especially for new or low-authority sites.

Does a low Keyword Difficulty score mean a keyword is easy to rank for?

Not reliably. Every keyword we sampled with KD ≤ 20 (except one) was still dominated by DR 72–94 domains, with no weak pages to displace. Low KD often just means the winning pages have few backlinks each — the domains ranking can still be far out of reach for a new site. Always check the actual Domain Ratings in the top 10 before trusting a low KD.

What should I use instead of Keyword Difficulty?

Don’t discard KD — pair it with the SERP. After sorting by KD, look at the median Domain Rating of the top 10 and the weakest page’s DR. If nothing in the top 10 is below roughly DR 40, the keyword is link-gated regardless of its KD score. For low-authority sites, prioritize SERPs that actually contain weak pages, and lean on citability (AI/answer-engine visibility) where page-level links dominate.

How is Keyword Difficulty calculated?

Methodologies are proprietary and differ by tool, but Ahrefs’ KD is derived primarily from the number of referring domains pointing to the pages currently ranking in the top 10. That’s why it tracks page-level link counts so closely and domain authority so loosely — it’s largely a function of the former by construction.


Method note: Ahrefs data, US, top 10 organic results, pulled 2026-06-27. n = 30 keywords / 241 ranking pages. Per-keyword figures are medians of the top 10. KD here refers to Ahrefs’ Keyword Difficulty. A cross-tool reliability comparison is planned.
