Measuring SEO Like a Psychometrician

Most SEO measurement is methodologically naive. A psychometrician’s guide to measuring SEO: construct vs. proxy, measurement error, validity, and Goodhart’s Law.

By Dr Blaze

I’m not an SEO guy. I’m a psychometrician — I spent years building and validating instruments that try to measure things you can’t see directly, like cognitive ability or personality, and proving they measure what they claim to. So when I started looking at how marketers measure SEO, I had the same reaction a structural engineer has watching someone build a deck with drywall screws. The thing might stand up. It’s also held together by a misunderstanding of what the parts are for.

Here’s the blunt version: most SEO measurement is methodologically naive. Not stupid — naive. People track rankings, traffic, impressions, and a dashboard full of numbers, and they treat those numbers as if they were the thing they care about, measured cleanly. They’re not. Every one of those numbers is a noisy proxy for something you can’t observe directly, carrying more error than anyone admits, often measuring something other than what its name implies. A psychometrician doesn’t see a “traffic chart.” A psychometrician sees a measurement model — and immediately starts asking the questions that SEO people skip.

This post is for two kinds of reader. If you’re a founder or operator drowning in SEO dashboards and quietly suspecting that none of it connects to revenue, you’re right, and I’ll tell you why. If you’re an SEO who wants to measure your own work more rigorously than your competitors, this is the toolkit they don’t have. Either way: the discipline that makes measurement trustworthy is a hundred years old, it’s called classical test theory, and almost nobody in SEO has heard of it.

Your Metrics Are Proxies, Not the Thing You Care About

Start with the most important distinction in measurement, the one everything else hangs off: the difference between a construct and an indicator.

A construct is the thing you actually care about but can’t measure directly. In psychometrics it’s “verbal reasoning” or “conscientiousness” — real, consequential, invisible. In SEO, the construct is almost always qualified demand turning into revenue. That’s what the business wants. Nobody has ever wanted “a ranking.” They want customers, and rankings are a story they tell themselves about where customers come from.

An indicator is something observable you use to estimate the construct, because you can’t touch the construct itself. Rankings, organic traffic, impressions, click-through rate — these are indicators. Proxies. They correlate with qualified demand, sometimes strongly, but they are not it, and the gap between them is where almost every measurement mistake in this industry lives.

This matters because of a rule that’s easy to say and hard to internalize: optimizing the proxy is not the same as moving the construct. You can triple your traffic with a viral listicle that attracts nobody who’ll ever buy. You can climb three positions on a keyword that no customer types. The proxy moved; the construct didn’t budge. If you’re measuring the proxy and reporting it as success, you’ve built a beautiful instrument that’s pointed at the wrong thing. A page’s organic CTR can look excellent precisely because it’s pulling clicks from people who bounce in four seconds — high on the indicator, zero on the construct.

The fix isn’t to stop tracking indicators. You have to — the construct is invisible, indicators are all you get. The fix is to never forget they’re indicators. Hold them loosely. Treat every chart as an estimate of something else, not the thing itself.

Every SEO Metric Carries Measurement Error — and You’re Treating Noise as Signal

Here’s the equation that should be tattooed on every SEO dashboard. Classical test theory says any observed score is the sum of two parts:

Observed = True score + Error

Or, as it’s written in every measurement textbook, X = T + E. The number you see (X) is the real value you want (T) plus measurement error (E) — random noise from the imperfect instrument. You never observe T directly. You observe X and you infer T, and the quality of that inference depends entirely on how much E is in the mix. Reliability, formally, is the proportion of your observed variance that’s actually true-score variance rather than error. A reliable instrument has little E. An unreliable one is mostly noise wearing a number’s clothing.
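To make the decomposition concrete, here is a minimal Python sketch rather than a real dataset. It assumes we somehow know the true scores, which in practice you never do. The page values and error terms are invented for illustration, and the errors are deliberately chosen to average zero and be uncorrelated with the true scores, so the variances add up the way classical test theory assumes.

```python
import math
import statistics

# Hypothetical true daily clicks for five pages (T). In real life T is unobservable.
true_scores = [10, 20, 30, 40, 50]
# Hypothetical measurement error per page (E): mean zero, uncorrelated with T.
errors = [2, -4, 4, -4, 2]

# What the dashboard shows: X = T + E
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = statistics.pvariance(true_scores)
var_observed = statistics.pvariance(observed)

# Reliability: the share of observed variance that is true-score variance.
reliability = var_true / var_observed

# Standard error of measurement: the typical size of E, in the metric's own units.
sem = math.sqrt(var_observed) * math.sqrt(1 - reliability)

print(f"reliability = {reliability:.3f}")  # reliability = 0.947
print(f"SEM = {sem:.2f}")                  # SEM = 3.35
```

In this toy case about 95% of the observed variance is real, and a typical reading is off by roughly 3.3 clicks. An instrument with large daily wobble relative to real differences would score far lower.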

Now look at your SEO metrics through that lens, because every single one is an X, not a T:

  • Rank trackers. Your tracked position swings day to day — personalization, location, device, datacenter, SERP feature shuffles, A/B tests Google is running on its own results. A keyword that “dropped from 4 to 7 overnight” mostly didn’t. That’s E. Daily rank wobble is overwhelmingly measurement error, and reacting to it is reacting to noise.
  • Google Search Console. GSC is not ground truth; it’s a filtered, privacy-protected sample. Google omits “anonymized” queries — those issued by too few people — so the visible query rows routinely don’t sum to the chart total. Ahrefs analyzed 22 billion clicks across nearly 900,000 properties and found 46.77% of clicks were anonymized in April 2025. Nearly half your query-level data is structurally invisible. That’s not a glitch; it’s the instrument’s design. Every conclusion you draw from the query table is drawn from the half you can see.
  • Attribution. Every attribution window is a modeling assumption dressed as a fact. Change the lookback from 7 days to 30 and “SEO’s contribution” changes — same reality, different instrument reading. The number didn’t measure truth more or less; you swapped the ruler.
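The attribution point is easy to see with a toy example. The sketch below uses four invented customer journeys and a hypothetical helper, `organic_conversions`, that credits a conversion to organic only if it happened within the lookback window after the last organic click. It is an illustration of the idea, not how any particular analytics tool implements attribution.

```python
from datetime import date

# Hypothetical journeys: (last organic click, conversion date). Invented data.
journeys = [
    (date(2025, 3, 1), date(2025, 3, 3)),    # converts 2 days after the click
    (date(2025, 3, 1), date(2025, 3, 12)),   # 11 days
    (date(2025, 3, 5), date(2025, 3, 30)),   # 25 days
    (date(2025, 3, 10), date(2025, 4, 20)),  # 41 days
]


def organic_conversions(journeys, lookback_days):
    """Count conversions credited to organic under a given lookback window."""
    return sum(
        1
        for click, conversion in journeys
        if 0 <= (conversion - click).days <= lookback_days
    )


for window in (7, 30):
    print(window, organic_conversions(journeys, window))
# 7 1
# 30 3
```

The customers and their behavior are identical in both runs. Only the ruler changed, and “SEO’s contribution” tripled.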

None of this means the metrics are useless. It means they’re estimates with error bars, and almost everyone reports them as point values with no error at all. The single most common analytical sin in SEO is reacting to a movement that’s smaller than the measurement error of the instrument that produced it. You wouldn’t trust a bathroom scale that’s only accurate to ±5 kg to tell you that you gained 200 grams. SEO people do the equivalent every morning before coffee.

Validity: Are You Measuring What You Think You’re Measuring?

Reliability asks whether your instrument is consistent. Validity asks the harder question: does it measure what you claim it measures? An instrument can be perfectly reliable and completely invalid — a scale that consistently reads 4 kg heavy is reliable and wrong. SEO is full of reliable, invalid metrics — when I audited Keyword Difficulty across 30 SERPs, that’s exactly what I found: it reliably tracks the page-level links of ranking results, yet is largely invalid as the “can I rank for this?” signal people read it as.

Take “organic traffic.” It’s a real, repeatable count. It’s also frequently invalid as a measure of qualified demand, because the people arriving may not be your ICP at all — informational searchers who’ll never buy, scrapers, off-target queries you happen to rank for. The metric is honest about what it counts; it’s just not counting what you think.

The sharpest example is GSC’s average position, the metric people read most carelessly. It’s a single number averaged across every query, location, device, and personalization context your page appeared in, and it blends branded searches (where you rank #1) with the unbranded terms you actually compete for. Per Google’s own documentation, it’s the average of the topmost position your page held across all those impressions. So your “average position 6” can be a fiction stitched from #1 brand queries and #20 commercial ones. As an indicator of the construct (“how visible am I for terms that matter”), it has almost no validity in aggregate. It only becomes valid when you decompose it to a single query and a single context.
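Here is a small sketch of how an “average position 6” can come out of data in which no query ranks anywhere near sixth. The rows are invented, the query names are hypothetical, and the code assumes the aggregate behaves like an impression-weighted mean of per-query positions, which is a simplification of how the report is built.

```python
# Hypothetical GSC-style rows: (query, impressions, average position, is_branded).
rows = [
    ("acme", 1400, 1.0, True),
    ("crm software pricing", 300, 18.0, False),
    ("best crm for startups", 200, 23.0, False),
]


def weighted_position(rows):
    """Impression-weighted average position across the given rows."""
    impressions = sum(r[1] for r in rows)
    return sum(r[1] * r[2] for r in rows) / impressions


blended = weighted_position(rows)
branded = weighted_position([r for r in rows if r[3]])
unbranded = weighted_position([r for r in rows if not r[3]])

print(blended, branded, unbranded)  # 6.0 1.0 20.0
```

The blended figure describes neither group. Splitting by brand is the first decomposition worth making; splitting further by query, device, and country gets closer to a number you can attach a claim to.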

This is why I’m wary of any SEO who reports an aggregate number without telling me what it’s an aggregate of. Validity isn’t a property of the metric — it’s a property of the metric plus the claim you’re attaching to it. “Traffic is up 40%” is meaningless until you say what traffic is standing in for. If it’s standing in for revenue, prove the link. If it can’t, stop reporting it as if it could. This is also why I lean on search engine visibility — share of the SERP real estate that matters — over raw rank: it’s a closer, more valid operationalization of the construct than a position number divorced from whether anyone searches the term.

Goodhart’s Law: When the Metric Becomes the Target

There’s a failure mode so reliable it has a name. In 1997 the anthropologist Marilyn Strathern compressed an idea from the economist Charles Goodhart into one sentence that should be printed on every KPI deck: “When a measure becomes a target, it ceases to be a good measure.”

The mechanism is simple. A metric works as a proxy because, under normal conditions, it correlates with the construct. The moment you make the metric a target — reward it, optimize hard for it, pay a bonus on it — people (and algorithms, and content teams) start moving the metric directly instead of moving the construct underneath it. The correlation that made it useful breaks, because the behavior that produced the correlation gets replaced by behavior that produces the number.

SEO is a museum of Goodhart’s Law:

  • Make “keyword density” a target and you get keyword-stuffed pages that rank for nothing and read like a ransom note.
  • Make “number of backlinks” a target and you get link farms and exchanges — backlinks that move the count and not the authority the count was supposed to estimate.
  • Make “word count” a target because long content “ranks better” and you get 2,000 words of padding around 300 words of substance.
  • Make “rankings” the target and someone games rankings — for queries no customer searches.

In measurement terms this is close to criterion contamination: the act of optimizing for the indicator corrupts the indicator’s relationship to the construct. The metric stops being a thermometer and becomes a thing you’re heating directly. And the tell is always the same: the number goes up while the business goes nowhere. If your traffic graph is climbing and your pipeline is flat, you may not have an attribution problem. You may have a Goodhart problem: you turned a proxy into a target and it dutifully stopped measuring anything.

You Can’t Cleanly A/B Test SEO — Causal Inference Without a Control

Here’s the part that genuinely separates SEO measurement from the lab work I’m used to. In a proper experiment you have a control group: two groups identical except for the one thing you changed, so any difference in outcome is caused by your change. That’s how you earn the word “caused.”

SEO has no clean control, and the reason is structural: there is only one Google index, and it’s a single shared environment changing under everyone at once. You change a title tag, rankings move, and you want to say your change caused it. But in the same window Google may have shipped a core update, a competitor may have published, seasonality may have shifted, and the SERP layout may have changed. You have a hundred confounds and one observation. Correlation between your change and the ranking move is not, on its own, evidence that the change caused it. It might be the algorithm update, and you have no way, from a single before/after, to tell them apart. Most “this tactic works, look at my graph” SEO advice is exactly this mistake: a confounded before/after sold as a causal claim. Real causal research demands that you rule out the alternatives, and a single timeline can’t do that.

You can’t run a true experiment, but you’re not helpless — you just have to drop to quasi-experimental designs and be honest that they’re weaker:

  • Page-cohort tests. Apply a change to a matched set of similar pages, leave a comparable set untouched as a holdout, and compare the cohorts. Not a true control, but far better than a single-page before/after. (This is the closest SEO gets to honest A/B testing, and it’s why naive page-level A/B tests of SEO changes usually mislead.)
  • Time-based tests with a counterfactual. Build a forecast from the pre-change trend, then compare what actually happened against it. If the lift appears only where and when your change landed, and not on the holdout, your causal story gets stronger.
  • Pre-registration of expectations. Write down the hypothesis before you look. It’s the cheapest defense against the human talent for finding a causal story in any chart after the fact.

None of these are bulletproof. The point is to know how weak your evidence is, attach the right amount of confidence to it, and stop narrating coincidences as mechanisms.
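The page-cohort idea above can be sketched as a simple difference-in-differences: subtract the holdout’s change from the treated cohort’s change, so that anything that hit both cohorts (an algorithm update, seasonality) cancels out. The click numbers are invented and `diff_in_diff` is a hypothetical helper. The estimate is only as good as the assumption that the two cohorts would have moved in parallel without your change.

```python
from statistics import mean

# Hypothetical weekly organic clicks per page, before and after the change.
treated_before = [100, 120, 80]   # pages that got the change
treated_after = [130, 150, 110]
holdout_before = [90, 110, 100]   # comparable pages left untouched
holdout_after = [105, 125, 115]


def diff_in_diff(t_before, t_after, h_before, h_after):
    """Treated cohort's change minus the holdout's change."""
    treated_change = mean(t_after) - mean(t_before)
    holdout_change = mean(h_after) - mean(h_before)
    return treated_change - holdout_change


naive_lift = mean(treated_after) - mean(treated_before)
adjusted_lift = diff_in_diff(
    treated_before, treated_after, holdout_before, holdout_after
)

print(naive_lift, adjusted_lift)
```

With these numbers the naive before/after shows a lift of 30 clicks per page, but the holdout rose by 15 without any change at all, so the defensible estimate is 15. Half of the apparent win was the environment moving.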

How to Actually Measure SEO — Construct First, Error Bars Always

Enough diagnosis. Here’s how I’d have you measure SEO if I were building the instrument:

Define the construct before you pick a single metric. Write down, in one sentence, the thing you actually care about — almost always qualified pipeline or revenue from organic. Every metric below it has to justify its existence as an indicator of that. If a number can’t trace a plausible path to the construct, it’s decoration. Delete it from the report.

Instrument leading indicators, not just lagging ones. Revenue is a lagging indicator — true but slow, and by the time it moves, the cause is months cold. Leading indicators move first: indexation and crawl health, qualified (not total) impressions for your money terms, the share of arriving traffic that reaches a real conversion step, assisted conversions in your conversion funnel. Build the chain from leading to lagging so you can act on the early signal and confirm with the slow one.

Report estimates with error bars. Stop reacting to single days. Use rolling windows, smooth the noise, and define — in advance — how big a move has to be before it counts as signal rather than wobble. If you can’t distinguish the movement from the instrument’s measurement error, it isn’t a result. It’s weather.
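A rough sketch of what that looks like in practice, using only the standard library. The data is invented, the helper names are hypothetical, and the cutoff of two standard deviations is an assumption you should set for your own tolerance; the point is that the rule is fixed before the new number arrives.

```python
from statistics import mean, stdev


def rolling_mean(values, window):
    """Trailing moving average; the first window-1 points are dropped."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Hypothetical daily clicks, smoothed with a 3-day window for charting.
daily = [90, 110, 100, 120, 80]
smoothed = rolling_mean(daily, 3)

# Hypothetical weekly organic clicks from a quiet baseline period.
baseline_weeks = [100, 104, 97, 101, 99, 103, 96, 100]

# Decide the rule BEFORE looking at new data: a move must exceed
# 2 standard deviations of normal week-to-week variation (an assumed cutoff).
baseline_mean = mean(baseline_weeks)
threshold = 2 * stdev(baseline_weeks)


def is_signal(new_week):
    """True only if the new week falls outside the pre-set noise band."""
    return abs(new_week - baseline_mean) > threshold


print(smoothed)
print(is_signal(104), is_signal(112))  # False True
```

Here the noise band is roughly ±5.5 clicks around a baseline of 100, so a week at 104 is weather and a week at 112 is worth investigating.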

Triangulate. No single instrument is trustworthy alone, so corroborate. GSC plus server logs plus analytics plus actual sales conversations. When three independent, differently-flawed instruments agree, your estimate of the construct gets real. When they disagree, you’ve learned something more useful than any one of them claimed.

Kill the vanity proxies. Any metric you track that you can’t connect to the construct, and can’t connect to a decision you’d make differently based on its value, is a vanity metric. It costs attention and buys nothing. Cut it.

That’s the whole method: measure the construct, instrument the leading indicators, treat every number as an estimate with error around it, and never let a proxy quietly promote itself to a target.

Want a Measurement-Literate SEO Partner?

Here’s the uncomfortable thing about everything above: most SEO agencies don’t think this way. Not because they’re lazy — because nobody trained them to. They were trained to make the proxies go up, and they’re genuinely good at it. They’ll show you climbing rankings and rising traffic and a dashboard that glows green, and none of it will be a lie. It just won’t answer the only question that matters, which is whether the construct moved.

I think that way because I was trained to, before I ever touched SEO. When we run fractional SEO for a client, the first thing we build isn’t a content calendar — it’s the measurement model: what’s the construct, which indicators estimate it, how much error is in each, and what movement would actually count as evidence. The SEO work is downstream of getting the measurement honest. If you’ve ever looked at a green dashboard and felt a quiet dread that none of it connects to revenue, that instinct is correct, and it’s the right reason to want a partner who measures the thing instead of the proxy.

Frequently Asked Questions

What does “measure SEO like a psychometrician” actually mean?

It means treating your marketing funnel as a measurement model, not a scoreboard. A psychometrician asks what underlying construct (usually revenue or qualified demand) each metric estimates, how much error the metric carries, and whether it’s valid for the claim you’re attaching to it — instead of treating rankings and traffic as the goal itself.

Why are rankings and traffic considered proxy metrics, not real goals?

Because nobody’s business actually wants a ranking or a pageview — they want customers and revenue. Rankings, traffic, and impressions are observable indicators that correlate with qualified demand, but they aren’t it. You can grow any of them while revenue stays flat, which is the clearest sign you’re optimizing a proxy instead of the construct underneath it.

Can you A/B test SEO the way you A/B test a landing page?

Not cleanly. There’s only one shared Google index, so you can’t hold everything constant except your change — algorithm updates, seasonality, and competitors all confound the result. The honest substitute is quasi-experimental: page-cohort tests with a holdout, time-based tests against a forecast, and pre-registering what you expect before you look.

What is measurement error in SEO and why does it matter?

Measurement error is the noise between a metric’s reading and the true value it’s estimating. Classical test theory writes it as Observed = True score + Error. Rank trackers wobble daily, Google Search Console hides nearly half of query-level click data, and attribution windows are assumptions, so day-to-day movements are usually error, not signal, and reacting to them wastes effort.

What SEO metrics should I actually track?

Start from the construct (usually organic-sourced revenue or qualified pipeline), then track leading indicators that plausibly move ahead of it: indexation health, qualified impressions for money terms, the share of organic visitors who reach a conversion step, and assisted conversions. Report them with rolling windows, triangulate across tools, and delete any metric you can’t tie to a decision.
