The Measurement Crisis

Mihir Wagle 6 min read
metrics · ai · judgment · counterfactual · product management

AI didn't create a skills crisis for PMs; it exposed a measurement crisis that was already there.

Nikhyl Singhal called the split a while back on Lenny's podcast: information movers versus builders. The PMs who synthesize, summarize, and package versus the ones who frame problems and drive decisions. His argument was that AI would eat the first group. He was right. It's happening now. Spec writing, backlog grooming, competitive teardowns, stakeholder updates. Any decent AI tool does these in minutes. The floor on legible output just went to zero.

The standard advice is: good, now spend your time on the hard stuff. Problem framing. Customer intuition. Strategy. Executive influence.

This is directionally correct and practically useless.

The dot ball problem

Baseball had its version of this. The Moneyball revolution was essentially: stop paying for batting average, start paying for on-base percentage. Measure the walk, not just the hit. The player who didn't make an out created value that the old system couldn't see.

Cricket figured it out even earlier. The best bowlers are measured not just by wickets taken but by dot balls bowled.1 Nothing happened. That was the plan. The bowler chose a specific delivery, to a specific spot, with a specific field, and the batter scored nothing. Do that six times in a row and you've bowled a maiden over. The pressure compounds. Mistakes follow. The wicket that eventually falls three overs later traces back to the dots that built the pressure, but only the wicket shows up in the highlights.

Product management hasn't learned this yet. OKR systems reward what shipped. The PM who kills a bad feature before engineering spends a quarter on it created enormous value. There's no artifact. No launch metric. No demo. The PM who ships a mediocre feature with a polished spec and a clean dashboard looks productive by every measure the org tracks.

AI makes this worse. When the floor on artifact quality rises, everyone's specs look good. Everyone's dashboards are clean. The delta between a strong PM and a weak one shifts entirely to judgment. But the system measuring PMs hasn't moved. It's still counting wickets, not dots.

This isn't speculation. In a 2023 field experiment at Boston Consulting Group, Dell'Acqua and colleagues found that AI lifted bottom-half consultants' performance by 43% while top performers gained only 17%.2 Noy and Zhang at MIT found AI erased half the performance gap between strong and weak writers. Brynjolfsson, Li, and Raymond found the same pattern in customer service: the bottom quintile caught up, the top saw negligible gains. The floor rose. The ceiling didn't. If your performance system can't distinguish between "got better with AI" and "was already good," you're now promoting the wrong people and you can't tell.

The shelf life of "safe" skills

Here's the part the LinkedIn discourse won't touch: the "hard stuff" has an expiration date too.

I keep a compiled knowledge base at work. Competitive intelligence, pricing models, architectural decisions, all structured for an LLM to query. When I have a strategic hypothesis, I test it against that base in minutes. The ones that survive get advanced. The ones that don't get killed before they waste anyone's time. A year ago that cycle took days of research and three meetings. Now the bottleneck isn't the analysis. It's having the hypothesis worth testing.

Scale that pattern. An agent with full organizational context, customer call transcripts, and usage telemetry will close the strategy gap faster than people expect. Competitive analysis with an agent that can ingest every earnings call, pricing page, and developer forum post simultaneously isn't a five-year-away problem. It's already starting.

The skills that actually resist commoditization aren't skills in the traditional sense. Taste. Conviction under uncertainty. Willingness to be wrong in public and adjust without theatrics. Consider: an agent can generate five strategic options and score them on available data. It cannot walk into a room where a VP is championing the worst one and change their mind without making them feel stupid. That's not analysis. It's social manipulation in service of good judgment, and no one knows how to teach it.

What this means if you're a PM

Stop optimizing for artifact quality. Your specs were already good enough before AI. Now they're free. The marginal return on a better-formatted PRD is zero.

Start building a track record of decisions, not deliverables. Document what you recommended, what the alternatives were, what you killed and why. Make the illegible legible. Not because the org is asking for it yet, but because the ones who figure out how to make judgment visible before the system demands it will have a massive head start when it does.
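There's no standard format for this; any structure that captures the counterfactual works. One illustrative shape (field names are mine, not a convention):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DecisionRecord:
    when: date
    decision: str                      # what you recommended
    alternatives: list[str]            # what you considered and rejected
    killed: list[str] = field(default_factory=list)  # what you stopped, and why
    rationale: str = ""

log = [
    DecisionRecord(
        when=date(2025, 3, 4),
        decision="Ship usage-based pricing to the SMB tier only",
        alternatives=["All tiers at once", "Pilot with five design partners"],
        killed=["Enterprise rollout: migration risk outweighed Q2 revenue"],
        rationale="SMB churn data showed price sensitivity; enterprise did not.",
    ),
]

# Six months later, the review question isn't "what shipped?" but
# "which calls held up?" -- the log is what makes that answerable.
for rec in log:
    print(rec.when, rec.decision, "| killed:", rec.killed)
```

The deliverable here isn't the record itself. It's the ability to audit your own judgment later, which is exactly what shipped-feature metrics can't do.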

The uncomfortable version of this: if the way you explain your job to people relies on the work that just got commoditized, the discomfort you're feeling isn't about AI. It's about identity. That's worth sitting with, because the PMs who move fastest will be the ones who've already let go of it.

What this means if you run a PM org

Your measurement system is the problem. Not your PMs.

If you're still evaluating PMs primarily on shipped features, roadmap coverage, and stakeholder satisfaction scores, you're selecting for the person with the best AI-generated artifacts. You'll promote them. Your strategy will get worse. You won't know why.

The organizations that figure out how to measure decision quality, not output quality, get a talent advantage that compounds. Everyone else will watch their best judgment-oriented PMs leave for places that can see them.

This isn't a PM problem. It's a leadership problem wearing PM clothes.

The real crisis

Every knowledge worker profession built its credentialing and promotion systems on legible output. Polished decks, clean analyses, well-structured communications. AI just made all of that free.

AEI's Brent Orrell calls this "de-skilling the knowledge economy": routine knowledge work gets automated while the value shifts to tacit, subjective, and intuitive judgment.3 Most organizations haven't caught up.

The crisis isn't automation. It's that performance infrastructure assumes effort correlates with value. That assumption held when producing a good deliverable actually required skill and time. It doesn't anymore. And nobody has a replacement ready.

"Measure judgment" is easy to say. Almost impossible to operationalize. How do you score the decision that didn't get made? How do you attribute the meeting where someone changed an executive's mind? How do you distinguish the PM who shaped the strategy from the one who just happened to be in the room?

The organizations that solve this first win. Not because they'll have better AI tools. Because they'll be able to see who's actually good.

Footnotes

  1. In cricket, a dot ball is a delivery where the batter scores no runs. A maiden over is a set of six consecutive dot balls. A wicket is getting a batter out. Elite T20 bowlers are now evaluated on dot ball percentage as heavily as wickets taken or runs conceded.

  2. Dell'Acqua, McFowland, Mollick et al., "Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality," Harvard Business School Working Paper, 2023.

  3. Orrell, "De-Skilling the Knowledge Economy," American Enterprise Institute, June 2025.
