Lucidity

Calibration for Executives: Why Your Confidence Probably Lies

Brier scores, forecasting discipline, and the hidden cost of overconfidence.

Apr 4, 2026 · 7 min read

Here is a test that will take you thirty seconds and a little courage. Write down ten things you believe will happen in the next ninety days: a deal closing, a hire accepting, a quarter landing, a competitor moving. For each, write the probability you assign, anywhere from 0.5 to 0.99. Seal the list. Open it in ninety days.

If your 90%-confident predictions come true about 90% of the time, you are calibrated. If they come true closer to 65% of the time, you are not. You are miscalibrated in a specific direction, overconfident, and you are in very good company, because that is the default state of almost every executive who has not deliberately trained otherwise.

Brier scores and why they beat your gut

The standard measure of forecasting accuracy is the Brier score, proposed by Glenn Brier in 1950. You square the difference between your probability and the outcome (1 for happened, 0 for didn't), then average across forecasts. Lower is better; a perfect forecaster scores 0, chance on a 50/50 scores 0.25, and maximally wrong scores 1.

The Brier score is blunt but honest. It punishes both overconfidence and underconfidence. It rewards the discipline of saying 70% when you mean 70%, not 90% to sound decisive or 55% to hedge. And critically, it produces a number you can track over time, which converts vague self-assessment into an actual feedback loop.
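The calculation is simple enough to fit in a few lines. A minimal sketch, with hypothetical forecasts made up for illustration:

```python
# Each forecast is (stated probability, outcome): 1 if it happened, 0 if not.
# These example forecasts are hypothetical.
forecasts = [(0.9, 1), (0.7, 0), (0.8, 1), (0.6, 1), (0.95, 0)]

def brier_score(forecasts):
    """Mean squared difference between stated probability and outcome.

    0 is perfect; always saying 50% on coin flips scores 0.25;
    maximally wrong (certain of the thing that didn't happen) scores 1.
    """
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

print(brier_score(forecasts))
```

Note that the confident misses dominate the score: the 0.95 forecast that failed contributes 0.9025 on its own, more than the other four forecasts combined.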

Tetlock's finding, and why it shouldn't surprise anyone

Philip Tetlock spent twenty years collecting 82,361 predictions from 284 experts on political and economic questions. The aggregate result: experts were, on average, barely better than random chance, and well below simple statistical models. But within the pool, a subset, about 2%, performed consistently and substantially better. Tetlock called them "superforecasters."

The superforecasters were not smarter, better connected, or more senior. They shared one habit: they treated their beliefs as probabilistic, actively hunted for disconfirming evidence, updated their forecasts in small increments when new information arrived, and kept score. They were metacognitively honest about their own track record in a way the other 98% weren't.

That 98% includes most of the people who run companies.

Why executives run miscalibrated

Three forces conspire.

Selection. You got to your seat by making bold calls that worked out. The calls that didn't work out are either rationalized, forgotten, or attributed to circumstance. Your internal record of "times I was right" is heavily filtered.

Incentive. Organizations reward confidence. A leader who says "I'm about 65% sure" in a board meeting sounds wobbly, even when 65% is the correct number. A leader who says "we've got this" sounds decisive, even when "we've got this" is a 40% claim dressed up.

No feedback loop. Most decisions at the executive level have fuzzy, delayed, confounded outcomes. Six months later, the thing worked or didn't, and a dozen other factors were in play. Without a dated journal with pre-committed probabilities, you cannot learn. The signal is too weak and too late.

What to do about it

Start a decision journal. Not a notebook of reflections but a specific format: decision, date, probability, rationale, outcome, score. Aim for ten entries a month for three months. At ninety days, calculate your Brier score and compare your 70%-confidence buckets to their actual hit rates.
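The bucket comparison at the ninety-day mark can be sketched in a few lines. This assumes hypothetical journal entries recorded as (stated probability, outcome); the grouping-by-tenths is one reasonable choice, not a standard:

```python
from collections import defaultdict

# Hypothetical journal entries: (stated probability, outcome 1/0).
journal = [
    (0.9, 1), (0.9, 1), (0.9, 0), (0.9, 1), (0.9, 0),
    (0.7, 1), (0.7, 0), (0.7, 1),
    (0.6, 0), (0.6, 1),
]

def calibration_buckets(entries, width=0.1):
    """Group forecasts by stated probability, return actual hit rate per bucket."""
    buckets = defaultdict(list)
    for p, outcome in entries:
        buckets[round(round(p / width) * width, 2)].append(outcome)
    return {b: sum(hits) / len(hits) for b, hits in sorted(buckets.items())}

for stated, actual in calibration_buckets(journal).items():
    print(f"stated {stated:.0%} -> actual {actual:.0%}")
```

With the sample entries above, the 90% bucket lands at a 60% hit rate: exactly the overconfidence gap the journal is built to expose.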

Almost everyone who does this discovers the same thing in the first round: their 90%-confident calls come true about 70% of the time. The gap closes with practice. The consequence of closing it (smaller bets at the right times, less over-commitment to dying strategies, more willingness to update in the face of bad news) is where the real compounding lives.

Calibration isn't a parlor trick. It's the infrastructure under every other decision skill. Without it, metacognition has nowhere to stand.