Keyword Difficulty (KD) is the number most SEOs sort by before they decide what to write. So we treated it like any other measurement instrument and asked the only question that matters about a metric: is it valid — does it measure what people use it to decide? We pulled Ahrefs KD for 30 keywords across nine niches, then pulled the real backlink profiles of every page ranking in their top 10 (241 ranking pages in total), and tested KD against the ground truth.
The short answer: KD is a valid measure of one thing, and most people read it as a different thing. It tracks the page-level link counts of ranking results very closely (Spearman ρ = 0.91). But it only weakly tracks the domain authority of who actually ranks (ρ = 0.36), and that gap is exactly where it misleads a new site.
This is a companion to measuring SEO like a psychometrician: the discipline of asking what a score actually measures before you trust it.
Why “is KD accurate?” is the wrong question
“Accurate” assumes we agree on what KD is trying to estimate. We don’t. In measurement terms, a score has construct validity when it measures the construct you claim it does. KD is marketed as “how hard it is to rank for this keyword” — but Ahrefs computes it primarily from the number of referring domains pointing at the pages currently ranking. Those are two different constructs:
- The construct KD computes: how many referring domains point at the winning pages.
- The construct you use it for: whether your site can realistically rank.
When a metric is named after the second thing but measures the first, you get Goodhart’s law with a UI: people optimize the number, not the reality behind it. So we didn’t ask “is KD accurate?” We asked “what does KD actually predict, and what does it miss?”
Method (reproducible)
- Sample: 30 keywords spanning SEO/marketing, finance, health, software, home/DIY, travel, ecommerce, B2B, and general how-to — deliberately chosen across the difficulty range, not cherry-picked. Full list in the table below.
- Ground truth: for each keyword we pulled the top 10 organic results and their Ahrefs Domain Rating (DR) and referring domains (page-level). That’s 241 ranking pages with link data.
- Tools & scope: Ahrefs, US, snapshot pulled 2026-06-27. This audits Ahrefs’ KD specifically; a cross-tool reliability check (e.g. vs Semrush) is the obvious next study.
- Stats: Spearman ρ (rank correlation, robust to the heavy skew of link counts) and Pearson r. Per keyword we used the median DR and median referring domains of the top 10.
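The two statistics named in the Stats bullet can be sketched in plain Python. This is a minimal illustration, not the analysis script behind the reported figures: the helper names (`average_ranks`, `pearson`, `spearman`) are our own, and the input is only the six low-KD rows from the Finding 3 table, so it will not reproduce the full-sample ρ = 0.91 or ρ = 0.36.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Six-row subset only (KD <= 20 rows from the Finding 3 table).
kd = [1, 6, 10, 12, 16, 17]
median_refdomains = [2, 14, 15, 18, 21, 34]
median_dr = [17, 78, 72, 94, 82, 73]

rho_links = spearman(kd, median_refdomains)
rho_dr = spearman(kd, median_dr)
# Pearson on log-scaled link counts, as described in the Stats bullet.
r_log_links = pearson(kd, [math.log10(v) for v in median_refdomains])

print(round(rho_links, 2), round(rho_dr, 2))  # prints 1.0 0.49
```

Even on this tiny subset the shape matches the article's pattern: KD orders the rows perfectly by page-level referring domains, and much more loosely by median DR. Ranking first is what makes Spearman robust to the heavy skew of link counts.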
Limitations are real and stated up front: n = 30 is a sample, not a census; DR and referring-domain counts are themselves Ahrefs estimates; a SERP snapshot is a moment in time; and correlation isn’t causation. None of that changes the central, robust pattern.
Finding 1 — KD is not random. Give it its due.
Against page-level referring domains, KD does its job: Spearman ρ = 0.91 (Pearson on log-scaled refdomains also 0.91). Sort 30 keywords by KD and you’ve very nearly sorted them by how many backlinks the ranking pages carry. Anyone telling you KD is “meaningless” is overcorrecting. As a relative sorter of page-level link competition, it works.
Finding 2 — But it barely predicts who ranks
Now test KD against the Domain Rating of the pages in the top 10 — the thing that actually decides whether a brand-new site is in the conversation. The correlation collapses: Spearman ρ = 0.36 (Pearson r = 0.50).
Here’s the stat that should change how you read KD: for 29 of the 30 keywords, the median DR of the top 10 sat between 72 and 94 — regardless of KD. A KD-12 SERP and a KD-90 SERP were both, typically, walls of DR-85 domains. KD moved; the domain authority you’re up against did not.
Finding 3 — The “easy” keywords that aren’t
This is where the construct mismatch bites. With one exception, every keyword we sampled with KD ≤ 20 (the ones a tool flags green, “go for it”) was still dominated by high-authority domains:
| Keyword | Ahrefs KD | Median DR of top 10 | Median referring domains (page-level) | Weakest page (min DR) |
|---|---|---|---|---|
| things to do in lisbon | 1 | 17 | 2 | 2 |
| fractional cfo | 6 | 78 | 14 | 43 |
| how to unclog a drain | 10 | 72 | 15 | 40 |
| best protein powder | 12 | 94 | 18 | 85 |
| best mattress | 16 | 82 | 21 | 55 |
| air fryer recipes | 17 | 73 | 34 | 30 |
“Best protein powder” scores KD 12 — easy! — because the ranking pages have few backlinks each (median 18 referring domains). But the domains ranking have a median DR of 94, and the weakest page in the entire top 10 is DR 85. There is no soft underbelly. KD told you “easy”; the SERP is a fortress.
The one genuine exception proves the rule: “things to do in lisbon” (KD 1) really was low-authority — median DR 17, weakest page DR 2. It was also the single lowest KD in the set. Everywhere from KD 6 upward, “low difficulty” meant “the winning pages don’t have many page-level links,” not “a new domain can rank here.”
Finding 4 — Same KD, wildly different SERPs (low precision)
A precise instrument gives similar readings only to similar things. Within a single KD band, the link competition behind the score varies enormously:
| Ahrefs KD band | Keywords | Per-SERP median referring domains (min – max) |
|---|---|---|
| 0–19 | 6 | 2 – 34 |
| 40–59 | 10 | 42 – 253 |
| 60–79 | 5 | 22 – 775 |
| 80–100 | 8 | 655 – 10,236 |
In the 60–79 band, the per-SERP median ranged from 22 to 775 referring domains, a 35× spread. “Magnesium benefits” (KD 63) had a median of just 22 referring domains; “marketing automation” (KD 71) had 375, roughly 17× more. Two numbers that look adjacent in your tool describe completely different fights.
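The within-band spread is simple arithmetic on the table above. The sketch below recomputes it for each band; the `bands` dictionary just restates the table's minimum and maximum values, and `spread_ratio` is a name we introduce for illustration.

```python
# (min, max) of per-SERP median referring domains, copied from the band table.
bands = {
    "0-19": (2, 34),
    "40-59": (42, 253),
    "60-79": (22, 775),
    "80-100": (655, 10236),
}

def spread_ratio(low, high):
    """How many times larger the heaviest SERP in a band is than the lightest."""
    return high / low

ratios = {band: spread_ratio(low, high) for band, (low, high) in bands.items()}

for band, ratio in ratios.items():
    print(f"KD {band}: {ratio:.1f}x")
# KD 0-19: 17.0x
# KD 40-59: 6.0x
# KD 60-79: 35.2x
# KD 80-100: 15.6x
```

No band is tight: even the narrowest one spans about 6×, so a KD value on its own leaves the underlying link count uncertain by close to an order of magnitude.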
Finding 5 — Weak pages almost never break in
If low KD meant “winnable,” you’d expect weak pages scattered through easy SERPs. You don’t see it. Only 6 of the 30 SERPs contained even a single top-10 page under DR 40, and only 4 contained one under DR 30. For a DR-10 site, “find a low-KD keyword” is not a strategy: the low-KD SERP is usually still locked.
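The weak-page check is a threshold count over each SERP's weakest top-10 DR. As a small illustration, the sketch below runs it on the six low-KD rows from the Finding 3 table only, not on the full 30-SERP sample, so its counts differ from the 6-of-30 and 4-of-30 figures. The function name `serps_with_weak_page` is our own.

```python
# Weakest top-10 DR per keyword, copied from the Finding 3 table (KD <= 20 rows).
min_dr_by_keyword = {
    "things to do in lisbon": 2,
    "fractional cfo": 43,
    "how to unclog a drain": 40,
    "best protein powder": 85,
    "best mattress": 55,
    "air fryer recipes": 30,
}

def serps_with_weak_page(min_dr_by_keyword, threshold):
    """Keywords whose top 10 contains at least one page strictly under `threshold` DR."""
    return [kw for kw, min_dr in min_dr_by_keyword.items() if min_dr < threshold]

under_40 = serps_with_weak_page(min_dr_by_keyword, 40)
under_30 = serps_with_weak_page(min_dr_by_keyword, 30)

print(under_40)  # ['things to do in lisbon', 'air fryer recipes']
print(under_30)  # ['things to do in lisbon']
```

Note the strict comparison: a weakest page at exactly DR 40 or DR 30 does not count as "under" the threshold.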
What KD actually measures vs. what you think it measures
| | What KD measures | What you use it for |
|---|---|---|
| Construct | Page-level referring domains of ranking pages | “Can my site rank for this?” |
| Validity | High (ρ = 0.91) for that construct | Low (ρ = 0.36) for domain-level difficulty |
| Failure mode | — | Calls authority-walled SERPs “easy” |
| Who it hurts | — | New / low-DR sites most of all |
KD isn’t broken. It’s a valid measure of the wrong variable for the decision most people make with it. It answers “how link-heavy are the winning pages?” and gets read as “is this winnable for me?” For a DR-90 brand those questions nearly converge. For a new site they’re opposite — which is why KD systematically understates difficulty exactly for the sites that can least afford the mistake.
What to do instead
Use KD as a coarse relative sort, then read the SERP it’s summarizing:
- Look at the DR of who actually ranks, not just KD. If the median is 80+, domain authority is the real gate, whatever KD says.
- Find the weakest page in the top 10. If nothing is under ~DR 40, a new site isn’t getting in on content alone — that’s the link-building reality, not a content problem.
- For low-authority sites, hunt for genuine gaps — SERPs with low-DR pages present — which usually live in the true long tail, not in “low KD” head terms. This is the wedge behind programmatic SEO done right.
- If the SERP is authority-walled, the lever is being citable, not just ranking — see generative engine optimization and AI Overviews, where brand signals can matter more than page-level links.
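The first two checks in the list above can be turned into a rough triage helper. This is a sketch under stated assumptions: the function name `triage_serp`, the verdict labels, and the example DR lists are hypothetical, and the thresholds (median DR 80+, weakest page under roughly DR 40) are the article's rules of thumb, not calibrated cutoffs.

```python
from statistics import median

def triage_serp(top10_dr, wall_median=80, weak_page_dr=40):
    """Classify a SERP from the Domain Ratings of its top-10 pages.

    Thresholds are rule-of-thumb assumptions taken from the checklist above.
    """
    med = median(top10_dr)
    weakest = min(top10_dr)
    if weakest < weak_page_dr:
        verdict = "gap present"        # at least one low-DR page already ranks
    elif med >= wall_median:
        verdict = "authority-walled"   # high-DR domains throughout, no weak page
    else:
        verdict = "link-gated"         # no weak page, but a less extreme median
    return {"median_dr": med, "weakest_dr": weakest, "verdict": verdict}

# Hypothetical DR lists for two SERPs (illustrative values, not measured data).
walled = triage_serp([94, 93, 91, 90, 90, 89, 88, 87, 86, 85])
open_gap = triage_serp([78, 74, 60, 55, 41, 33, 25, 17, 9, 2])

print(walled)    # {'median_dr': 89.5, 'weakest_dr': 85, 'verdict': 'authority-walled'}
print(open_gap)  # {'median_dr': 37.0, 'weakest_dr': 2, 'verdict': 'gap present'}
```

The helper checks for a weak page first because the presence of a low-DR page in the top 10 is the signal a low-authority site is actually hunting for, regardless of the median.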
A metric is a tool, not a verdict. Read what it measures, not the color of the cell.
Frequently Asked Questions
Is Keyword Difficulty accurate?
It depends what you mean by accurate. In our 30-SERP test, Ahrefs KD predicted the page-level referring-domain counts of ranking results very well (Spearman ρ = 0.91), so as a relative measure of page-level link competition it’s accurate. But it predicted the domain authority of who actually ranks poorly (ρ = 0.36), so as an answer to “can my site rank for this?” it’s frequently misleading — especially for new or low-authority sites.
Does a low Keyword Difficulty score mean a keyword is easy to rank for?
Not reliably. With one exception, every keyword we sampled with KD ≤ 20 had a top 10 whose median DR was between 72 and 94, with few or no weak pages to displace. Low KD often just means the winning pages have few backlinks each; the domains ranking can still be far out of reach for a new site. Always check the actual Domain Ratings in the top 10 before trusting a low KD.
What should I use instead of Keyword Difficulty?
Don’t discard KD; pair it with the SERP. After sorting by KD, look at the median Domain Rating of the top 10 and the weakest page’s DR. If nothing in the top 10 is below roughly DR 40, the keyword is link-gated regardless of its KD score. For low-authority sites, prioritize SERPs that actually contain weak pages, and lean on citability (AI/answer-engine visibility) where high-authority domains dominate.
How is Keyword Difficulty calculated?
Methodologies are proprietary and differ by tool, but Ahrefs’ KD is derived primarily from the number of referring domains pointing to the pages currently ranking in the top 10. That’s why it tracks page-level link counts so closely and domain authority so loosely — it’s largely a function of the former by construction.
Method note: Ahrefs data, US, top 10 organic results, pulled 2026-06-27. n = 30 keywords / 241 ranking pages. Per-keyword figures are medians of the top 10. KD here refers to Ahrefs’ Keyword Difficulty. A cross-tool reliability comparison is planned. If you want this kind of measurement rigor applied to your own SEO, talk to us.