Engine · MDA

How We Correct Showdown Bias (and the Sample Sizes We Use Per Node)

June 18, 2026 · 9 min read

If you have ever looked at population data with a skeptical eye, one objection comes first, and it is the right one: you only see the cards that reach showdown. Most hands end with a fold and the cards never get revealed. So how can any MDA tool honestly claim to know the range the pool plays? This is the single most important technical question about mass data analysis, and almost nobody in this space answers it in public. We are going to.

What showdown bias actually is

In a typical online cash pool, only a minority of hands that see a flop ever reach showdown. Someone folds first the vast majority of the time. That means if you try to reconstruct a player's range from the hands you literally saw at showdown, you are not looking at a random sample of their holdings. You are looking at a filtered one.

The filter is not neutral. Hands that reach showdown skew toward the holdings strong enough to call down or to barrel all the way to the river. The folds, the give-ups, and the check-and-surrender lines are systematically missing from what you observe. Build a model directly on that filtered sample and you will overestimate how value-heavy a range is and underestimate everything that quietly folded along the way.

The key point: showdown bias is a selection effect, not random noise. You cannot fix a selection effect by collecting more hands, because every new hand passes through the same filter. More data alone makes you more confident in a biased number. The fix has to be structural.
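To make the selection effect concrete, here is a minimal simulation sketch. Every number in it (the true share of strong holdings and the two showdown probabilities) is invented for illustration and is not taken from the GTOKiller pipeline. With these assumed inputs, the share of strong hands among revealed hands converges to 0.18 / 0.285, about 63%, even though the true share is 30%. Adding hands only tightens the estimate around the wrong number.

```python
import random

def observed_strong_share(n_hands, true_strong_share=0.30,
                          p_showdown_strong=0.60, p_showdown_weak=0.15,
                          seed=0):
    """Share of strong holdings among the hands that reached showdown.

    All default parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)
    shown_strong = 0
    shown_total = 0
    for _ in range(n_hands):
        strong = rng.random() < true_strong_share
        # The filter: strong holdings reach showdown far more often.
        p_show = p_showdown_strong if strong else p_showdown_weak
        if rng.random() < p_show:
            shown_total += 1
            shown_strong += strong
    return shown_strong / shown_total

# More hands reduce the noise, but the estimate stays far above the true 0.30.
for n in (1_000, 10_000, 100_000):
    print(n, round(observed_strong_share(n), 3))
```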

The distinction that solves most of the problem: actions vs. ranges

Here is the insight that does most of the work, and that almost every casual treatment of MDA misses. There are two completely different things you can measure at a node, and only one of them is touched by showdown bias.

1. Action frequencies (fully observable, zero bias)

How often the pool checks, bets a given size, raises, or folds at a node. You see every one of these for every hand, whether it goes to showdown or not. You always observe what a player did, even when you never see their cards. The overfold to a c-bet, the underbluff on the river, the sizing tell: all of it is directly observable and immune to showdown bias by construction.

2. Range composition (only revealed at showdown)

Which specific holdings take each action. This is the only place showdown bias lives, because the actual cards are revealed only when a hand goes to showdown.

GTOKiller's exploitative engine is driven first by the action-frequency layer, the unbiased one. The bulk of the EV against a real pool comes from attacking observable behavior: a pool that folds 47% to a c-bet when it should fold 50% is a measurable, bias-free leak. You do not need to see a single hand at showdown to know that frequency, and you do not need to see it to exploit it.
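As a sketch of why the action layer needs no cards, the snippet below measures a fold frequency from raw counts and attaches a Wilson score interval to it. The counts are hypothetical, and the Wilson interval is our choice for this illustration, not a statement about which interval the engine uses internally.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

# Hypothetical counts: 4,700 folds in 10,000 observed c-bet spots.
folds, spots = 4_700, 10_000
low, high = wilson_interval(folds, spots)
print(f"fold frequency {folds / spots:.2%}, interval {low:.2%} to {high:.2%}")
```

With these assumed counts the whole interval sits below 50%, so the deviation from the 50% reference is distinguishable from noise without a single revealed hand.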

How we correct the range estimation

For the layer that genuinely depends on revealed cards (range composition), we do not take showdown hands at face value. Three corrections, applied in order:

Reweighting by showdown propensity

A holding that almost always reaches showdown is overrepresented in the revealed sample; a holding that rarely shows down is underrepresented. We model the probability that a given line reaches showdown and reweight the observed hands accordingly, so the estimated range reflects what was actually played, not just what we happened to see.
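The reweighting idea can be sketched as inverse-propensity weighting: each revealed holding counts for one over its probability of reaching showdown. The holding classes, counts, and propensities below are hypothetical, and the sketch assumes the propensity model is already fitted; how that model is estimated is not covered here.

```python
def reweight_by_propensity(showdown_counts, showdown_propensity):
    """Estimate range shares from revealed hands via inverse-propensity weights.

    showdown_counts: revealed hands per holding class.
    showdown_propensity: assumed P(reaches showdown) per holding class.
    """
    weighted = {}
    for holding, count in showdown_counts.items():
        p = showdown_propensity[holding]
        if not 0 < p <= 1:
            raise ValueError(f"propensity for {holding!r} must be in (0, 1]")
        # A class seen rarely at showdown stands in for many unseen hands.
        weighted[holding] = count / p
    total = sum(weighted.values())
    return {holding: w / total for holding, w in weighted.items()}

# Hypothetical node: value hands dominate what is revealed at showdown.
counts = {"value": 180, "bluff_catcher": 90, "air": 15}
propensity = {"value": 0.60, "bluff_catcher": 0.30, "air": 0.05}
estimated = reweight_by_propensity(counts, propensity)
print({holding: round(share, 3) for holding, share in estimated.items()})
```

In this made-up example the raw showdown sample is 63% value hands, yet after reweighting all three classes come out at one third each, because the classes that rarely show down are scaled back up.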

Cross-street consistency

A river range has to be consistent with the action frequencies already measured on the earlier streets, which are unbiased. A turn betting range cannot contain more combos than the observed turn bet frequency allows. We constrain each street's range estimate to the upstream frequencies, anchoring the biased layer to the unbiased one.
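One simple way to express that constraint is a cap: the combos assigned to a betting range cannot exceed the combos arriving at the node times the observed bet frequency. The sketch below rescales proportionally when the cap is exceeded. The proportional rescaling, the function name, and all numbers are assumptions for illustration, not a description of the production solver.

```python
def constrain_to_upstream(bet_range_combos, arriving_combos, observed_bet_freq):
    """Cap an estimated betting range at what the observed frequency allows."""
    if not 0 <= observed_bet_freq <= 1:
        raise ValueError("observed_bet_freq must be in [0, 1]")
    cap = arriving_combos * observed_bet_freq
    total = sum(bet_range_combos.values())
    if total <= cap:
        # Already consistent with the unbiased action layer.
        return dict(bet_range_combos)
    scale = cap / total
    return {holding: combos * scale for holding, combos in bet_range_combos.items()}

# Hypothetical turn node: 200 combos arrive, the pool bets 40% of the time,
# so at most 80 combos can be in the betting range.
estimate = {"sets": 9.0, "top_pair": 36.0, "draws": 45.0}  # sums to 90
constrained = constrain_to_upstream(estimate, arriving_combos=200, observed_bet_freq=0.40)
print({holding: round(combos, 2) for holding, combos in constrained.items()})
```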

Bound, never invent

Where the showdown sample is too thin to estimate composition with confidence, we do not guess. We fall back to what the action layer can support and widen the uncertainty, or we drop the node entirely. We never paper over a gap with what theory would predict.
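As a sketch, the rule reduces to a three-way decision with no branch that ever returns a theoretical default. The function name, the labels, and both minimum counts are hypothetical placeholders; the real thresholds are dynamic.

```python
def resolve_node(action_obs, showdown_obs,
                 min_action_obs=1_000, min_showdown_obs=200):
    """Decide which layer, if any, is allowed to back a node's estimate.

    Thresholds are illustrative placeholders, not pipeline values.
    """
    if action_obs < min_action_obs:
        # Not even the unbiased layer is supported: drop the node.
        return "dropped"
    if showdown_obs < min_showdown_obs:
        # Fall back to action frequencies and report wider uncertainty.
        return "action_only"
    return "range_and_action"

print(resolve_node(action_obs=50_000, showdown_obs=4_000))
print(resolve_node(action_obs=50_000, showdown_obs=40))
print(resolve_node(action_obs=300, showdown_obs=10))
```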

The sample sizes we actually use, per node

A complete decision tree in GTOKiller is roughly 500,000 nodes. Those nodes are not equal. High-frequency spots, the ones that move your monthly winrate, are backed by millions of observations. Rare runouts have far fewer. Treating both the same way would be the easiest way to smuggle a biased or noisy number into a solution.

Parameter	Current pipeline
Tree size	~500,000 nodes per complete decision tree
Data window	Rolling 2 years (older hands are discarded)
Refresh cadence	Full reprocess every 6 months
Sample threshold	Dynamic per spot: range-sensitive nodes require more

The threshold is dynamic, not a single fixed number. It scales with how much a given estimate leans on the showdown-revealed layer. A node whose recommendation rests mostly on action frequencies clears a lower bar, because that layer is unbiased. A node whose recommendation depends on knowing the exact range composition has to clear a much higher one, because that is where the bias lives. A node that does not clear its bar is discarded, not filled in with theory.
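A minimal sketch of such a dynamic bar, assuming a linear scale between two endpoints. The `showdown_dependence` score (0 for a purely action-driven node, 1 for a fully range-driven one), the linear form, and both endpoint values are hypothetical; the text above only states that the bar rises with reliance on the showdown layer.

```python
def required_sample(showdown_dependence,
                    base_threshold=1_000, range_threshold=20_000):
    """Minimum observations for a node, rising with showdown dependence.

    showdown_dependence: assumed score in [0, 1].
    Endpoint thresholds are illustrative placeholders.
    """
    if not 0.0 <= showdown_dependence <= 1.0:
        raise ValueError("showdown_dependence must be in [0, 1]")
    span = range_threshold - base_threshold
    return round(base_threshold + showdown_dependence * span)

def clears_bar(n_obs, showdown_dependence):
    """True if the node has enough observations to be kept."""
    return n_obs >= required_sample(showdown_dependence)

# The same 5,000 observations keep an action-driven node but not a range-driven one.
print(clears_bar(5_000, showdown_dependence=0.1))
print(clears_bar(5_000, showdown_dependence=0.9))
```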

What we discard, and why that is a feature

This is the part most tools will not say out loud. If a spot does not have enough sample, we would rather tell you "we do not have enough data here" than invent a frequency to fill the screen. That choice has visible consequences, and we think they are the right ones:

A rare spot may show up with fewer available actions than you expected. If the pool only ever used one bet size in a node, we show that one size, because it is the only one we can verify.

The product is honest about its limits. Every number you see is anchored to real observations, never to interpolation. We would rather cover fewer spots completely than every spot halfway.
