Probabilistic Sharpe ratio: testing whether a Sharpe is statistically distinguishable from a target

A reported Sharpe ratio of 1.5 looks impressive. The probabilistic Sharpe ratio answers a different question: how confident can the investor be that the strategy's true Sharpe is at least 1.0, given the sample size, the return distribution's skewness and kurtosis, and the statistical noise inherent in any finite-sample estimate? It is the formal test of whether a Sharpe is statistically distinguishable from a chosen reference level.

What the probabilistic Sharpe ratio is

Bailey and López de Prado introduced the probabilistic Sharpe ratio (PSR) in The Sharpe Ratio Efficient Frontier (2012) as a correction to the headline Sharpe. The standard Sharpe ratio is a point estimate that ignores both the precision of the estimate (sample size) and the shape of the return distribution that produced it. Two strategies with identical Sharpe ratios in their backtests can warrant very different levels of confidence if one was estimated from 36 monthly observations of a heavily skewed series and the other from 360 observations of a near-normal series.

The PSR collapses these factors into a single probability between 0 and 1: the probability that the true Sharpe ratio of the strategy exceeds a chosen reference level. A PSR of 0.95 against a reference of 1.0 means there is a 95% probability that the strategy's true Sharpe is above 1.0, given the observed sample.

How it works

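The PSR is computed from four inputs: the observed Sharpe ratio, the sample size, the skewness of the return distribution, and its kurtosis. The formula is PSR(Sharpe*) = Φ((Sharpe − Sharpe*) × √(n − 1) / √(1 − γ × Sharpe + (κ − 1) / 4 × Sharpe²)), where Sharpe* is the reference level, γ is skewness, κ is kurtosis (3 for a normal distribution, not excess kurtosis), n is the sample size, and Φ is the standard normal CDF. Both Sharpe and Sharpe* are expressed at the frequency of the return observations (for example monthly, not annualised). For normally distributed returns (γ = 0, κ = 3) the denominator reduces to √(1 + Sharpe² / 2).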
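The formula translates directly into a few lines of Python. The sketch below is a minimal illustration, not a reference implementation: the function name and the example inputs are hypothetical, it assumes per-period (non-annualised) Sharpe ratios, and it takes κ to be plain kurtosis (3 for a normal distribution), so excess kurtosis would need 3 added before being passed in.

```python
from math import sqrt
from statistics import NormalDist


def probabilistic_sharpe_ratio(sharpe, reference, n, skew=0.0, kurtosis=3.0):
    """Probability that the true Sharpe exceeds `reference`.

    sharpe, reference: per-period Sharpe ratios (same frequency as the returns)
    n: number of return observations
    skew: skewness of the returns
    kurtosis: plain kurtosis (3.0 for a normal distribution), not excess
    """
    variance_term = 1.0 - skew * sharpe + (kurtosis - 1.0) / 4.0 * sharpe ** 2
    if n < 2 or variance_term <= 0.0:
        raise ValueError("need n >= 2 and a positive variance term")
    z = (sharpe - reference) * sqrt(n - 1) / sqrt(variance_term)
    return NormalDist().cdf(z)


# Hypothetical inputs: monthly Sharpe of 0.30 over 36 months, tested against 0.
psr_normal = probabilistic_sharpe_ratio(0.30, 0.0, 36)
psr_skewed = probabilistic_sharpe_ratio(0.30, 0.0, 36, skew=-2.0, kurtosis=10.0)
print(f"near-normal returns:                   {psr_normal:.3f}")
print(f"negatively skewed, fat-tailed returns: {psr_skewed:.3f}")
```

With the same observed Sharpe and sample size, the skewed, fat-tailed case returns the lower probability, which is the adjustment the formula is designed to make.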

The intuition is that the standard error of a Sharpe estimate depends on both the sample size and the higher moments of the distribution. Negative skewness and excess kurtosis both inflate the standard error—meaning a strategy with the same observed Sharpe but worse tail behaviour has a less reliable estimate. The PSR correctly penalises strategies whose return distributions look attractive in the headline number but come from samples that under-represent the true tail risk.
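To make the effect of the higher moments concrete, the denominator of the PSR can be read as an approximate standard error of the Sharpe estimate. The sketch below assumes that reading, uses a hypothetical function name and hypothetical inputs, and again treats kurtosis as plain kurtosis (3 for normal returns).

```python
from math import sqrt


def sharpe_standard_error(sharpe, n, skew=0.0, kurtosis=3.0):
    """Approximate standard error of a per-period Sharpe estimate.

    Rearranged from the PSR formula:
    sqrt((1 - skew*SR + (kurtosis - 1)/4 * SR^2) / (n - 1)).
    `kurtosis` is plain kurtosis (3.0 for normal returns), not excess.
    """
    variance_term = 1.0 - skew * sharpe + (kurtosis - 1.0) / 4.0 * sharpe ** 2
    return sqrt(variance_term / (n - 1))


# Hypothetical case: monthly Sharpe of 0.30 estimated from 36 observations.
se_normal = sharpe_standard_error(0.30, 36)
se_tail_risk = sharpe_standard_error(0.30, 36, skew=-2.0, kurtosis=10.0)
print(f"near-normal returns:  {se_normal:.3f}")     # about 0.173
print(f"skew -2, kurtosis 10: {se_tail_risk:.3f}")  # about 0.227
```

In this example the standard error grows by roughly a third when the returns are negatively skewed and fat-tailed, even though the observed Sharpe and the sample size are unchanged.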

The reference level Sharpe* is a choice. A common convention tests whether the true Sharpe exceeds 0.5 or 1.0 (round-number anchors). For comparing strategies, a more useful variant uses the second strategy's observed Sharpe as the reference, answering "how confident are we that strategy A is genuinely better than strategy B?" instead of "is each one above some absolute floor?"
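The two ways of choosing the reference can be put side by side. The following sketch uses made-up monthly statistics for two hypothetical strategies; the helper name `psr`, the dictionaries, and the conversion of an annualised floor by dividing by √12 are all assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def psr(sharpe, reference, n, skew=0.0, kurtosis=3.0):
    """PSR with per-period Sharpe inputs; kurtosis is plain (normal = 3)."""
    variance_term = 1.0 - skew * sharpe + (kurtosis - 1.0) / 4.0 * sharpe ** 2
    return NormalDist().cdf((sharpe - reference) * sqrt(n - 1) / sqrt(variance_term))


# Hypothetical monthly statistics for two strategies over the same 60 months.
strategy_a = {"sharpe": 0.35, "skew": -0.8, "kurtosis": 5.0}
strategy_b = {"sharpe": 0.25, "skew": 0.1, "kurtosis": 3.2}
n_months = 60

# Absolute test: is each strategy's true Sharpe above a fixed floor?
floor = 0.5 / sqrt(12)  # annualised 0.5 expressed per month
for name, s in (("A", strategy_a), ("B", strategy_b)):
    value = psr(s["sharpe"], floor, n_months, s["skew"], s["kurtosis"])
    print(f"{name} vs floor: {value:.3f}")

# Relative test: A's own moments, with B's observed Sharpe as the reference.
a_beats_b = psr(strategy_a["sharpe"], strategy_b["sharpe"], n_months,
                strategy_a["skew"], strategy_a["kurtosis"])
print(f"A vs B's observed Sharpe: {a_beats_b:.3f}")
```

One caveat with the relative test: it treats B's observed Sharpe as a fixed number, so it ignores the estimation error in B's own Sharpe and any correlation between the two return series.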

What the evidence shows

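Bailey and López de Prado's empirical work showed that backtested strategies with impressive reported Sharpe ratios can still produce PSR values below conventional confidence levels such as 0.95 against a reference Sharpe of 1.0, meaning that despite the headline number, the underlying evidence does not support a strong claim that the strategy's true Sharpe exceeds 1.0. Because the argument of Φ has the same sign as Sharpe − Sharpe*, the PSR is above 0.5 whenever the observed Sharpe exceeds the reference; the question is whether it clears the chosen confidence level. The gap between the observed Sharpe and the statistically defensible Sharpe measures the estimation risk that comes from a short or non-normal sample.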

The PSR is most informative for short-horizon backtests of strategies with non-normal return distributions. A two-year backtest of a volatility-selling strategy can produce a Sharpe ratio of 3.0, but the PSR against a reference of 1.0 can still fall short of a 0.95 confidence level once the sample size and the negative skew of the strategy's return distribution are accounted for. The headline Sharpe is misleading; the PSR reveals it.
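A worked version of the two-year example shows the size of the adjustment. Everything numeric below is an assumption chosen for illustration: 24 monthly observations, an annualised Sharpe of 3.0 converted to a monthly figure by dividing by √12, and a skewness of −2 with kurtosis of 10 standing in for a volatility-selling return profile.

```python
from math import sqrt
from statistics import NormalDist


def psr(sharpe, reference, n, skew=0.0, kurtosis=3.0):
    """PSR with per-period Sharpe inputs; kurtosis is plain (normal = 3)."""
    variance_term = 1.0 - skew * sharpe + (kurtosis - 1.0) / 4.0 * sharpe ** 2
    return NormalDist().cdf((sharpe - reference) * sqrt(n - 1) / sqrt(variance_term))


periods_per_year = 12
n_obs = 24                                  # two years of monthly returns
observed = 3.0 / sqrt(periods_per_year)     # annualised 3.0 as a monthly Sharpe
reference = 1.0 / sqrt(periods_per_year)    # annualised 1.0 as a monthly Sharpe

psr_if_normal = psr(observed, reference, n_obs)
psr_short_vol = psr(observed, reference, n_obs, skew=-2.0, kurtosis=10.0)

print(f"assuming normal returns:        {psr_if_normal:.3f}")  # about 0.99
print(f"assuming skew -2, kurtosis 10:  {psr_short_vol:.3f}")  # about 0.91
```

Under these assumptions the same headline Sharpe of 3.0 passes a 0.95 confidence bar when returns are treated as normal and fails it once the assumed skew and fat tails are included. The PSR stays above 0.5 in both cases because the observed Sharpe is above the reference.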

For longer-horizon backtests of strategies with more nearly normal returns, the gap between the standard Sharpe and the PSR narrows. A 30-year backtest of a diversified equity strategy with near-normal returns produces a PSR close to 1.0 against any reference that sits meaningfully below the observed Sharpe. The metric is therefore most useful as a discount on the headline Sharpe in precisely the cases where the headline is most likely to be misleading.
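The effect of track-record length can be seen by holding the observed Sharpe fixed and varying only the number of observations. The sketch below assumes normal returns, monthly data, a hypothetical annualised Sharpe of 0.5, and a reference of zero; none of these values come from the studies cited above.

```python
from math import sqrt
from statistics import NormalDist


def psr(sharpe, reference, n, skew=0.0, kurtosis=3.0):
    """PSR with per-period Sharpe inputs; kurtosis is plain (normal = 3)."""
    variance_term = 1.0 - skew * sharpe + (kurtosis - 1.0) / 4.0 * sharpe ** 2
    return NormalDist().cdf((sharpe - reference) * sqrt(n - 1) / sqrt(variance_term))


observed = 0.5 / sqrt(12)  # annualised 0.5 expressed as a monthly Sharpe

# Same observed Sharpe, increasingly long monthly track records.
psr_by_years = {}
for years in (2, 5, 10, 30):
    psr_by_years[years] = psr(observed, 0.0, years * 12)
    print(f"{years:>2} years: PSR vs 0 = {psr_by_years[years]:.3f}")
```

The probability rises steadily with the length of the record and is above 0.99 at 30 years, while the two-year figure is far from conclusive for the same observed Sharpe.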

Limitations and trade-offs

The PSR assumes that the return distribution is stable over the sample. The skewness and kurtosis estimates that feed the formula are themselves noisy, and the estimates from a small sample are particularly unreliable, which is exactly the case where the PSR matters most. The metric is more robust than the standard Sharpe but does not eliminate the underlying small-sample problem.
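A small simulation gives a feel for how noisy the skewness input is. The sketch below is an illustration under simple assumptions: it draws samples from a standard normal distribution, whose true skewness is zero, so any non-zero estimate is pure sampling noise. The function names, the seed, and the sample sizes of 24 and 240 are arbitrary choices.

```python
import random
from statistics import fmean, pstdev


def sample_skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    mu = fmean(xs)
    m2 = fmean([(x - mu) ** 2 for x in xs])
    m3 = fmean([(x - mu) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def skewness_spread(n, trials=1000, seed=0):
    """Std dev of the skewness estimate across simulated normal samples of size n."""
    rng = random.Random(seed)
    estimates = [
        sample_skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return pstdev(estimates)


spread_24 = skewness_spread(24)     # e.g. two years of monthly returns
spread_240 = skewness_spread(240)   # e.g. twenty years of monthly returns
print(f"spread of skewness estimates, n=24:  {spread_24:.2f}")
print(f"spread of skewness estimates, n=240: {spread_240:.2f}")
```

For normal data the spread shrinks roughly in proportion to 1/√n, so the short sample's skewness estimate is several times noisier than the long sample's. Real return series with fat tails tend to make the problem worse, not better.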

The PSR also does not address backtest overfitting in the broader sense. A strategy designed by selecting from many candidates will have a higher observed Sharpe than the typical strategy from the same family, simply because of selection. The PSR corrects for distributional and sample-size effects on a single strategy's Sharpe; it does not adjust for the fact that the strategy was chosen from many. The deflated Sharpe ratio (Bailey and López de Prado, 2014) extends the framework to address this multiple-testing problem.

Like all summary statistics, the PSR collapses the full strategy evaluation into a single number. It is most useful as a complement to other diagnostics—drawdown profile, regime-by-regime performance, factor decomposition—rather than as a standalone verdict.

Probabilistic Sharpe ratio in pfolio

The probabilistic Sharpe ratio is not currently displayed in pfolio Insights. The standard Sharpe ratio, return distribution statistics (skewness, kurtosis), and the underlying return series are all available; the probabilistic adjustment can be computed externally from these inputs.
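As a rough guide to the external calculation, the sketch below computes the PSR directly from a return series. It is not a pfolio feature or API: the function name and the short list of returns are hypothetical, the returns are assumed to be per-period excess returns, and all moments are simple population estimates. If the kurtosis figure taken from a tool is excess kurtosis, 3 should be added before using it in the formula.

```python
from math import sqrt
from statistics import NormalDist, fmean


def psr_from_returns(excess_returns, reference=0.0):
    """PSR from per-period excess returns; `reference` is a per-period Sharpe."""
    n = len(excess_returns)
    mu = fmean(excess_returns)
    centred = [r - mu for r in excess_returns]
    m2 = fmean([c ** 2 for c in centred])   # population variance
    m3 = fmean([c ** 3 for c in centred])
    m4 = fmean([c ** 4 for c in centred])
    sharpe = mu / sqrt(m2)
    skew = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2                 # plain kurtosis (normal = 3)
    variance_term = 1.0 - skew * sharpe + (kurtosis - 1.0) / 4.0 * sharpe ** 2
    z = (sharpe - reference) * sqrt(n - 1) / sqrt(variance_term)
    return NormalDist().cdf(z)


# Hypothetical stand-in for an exported series of monthly excess returns.
monthly_excess_returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04, 0.00, 0.01]
result = psr_from_returns(monthly_excess_returns, reference=0.0)
print(f"PSR vs 0: {result:.3f}")  # about 0.91 for this toy series
```

Eight observations are far too few for a meaningful estimate; the series is kept short only so the arithmetic is easy to check by hand.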

Disclaimer
This article constitutes advertising within the meaning of Art. 68 FinSA and is for informational purposes only. It does not constitute investment advice. Investments involve risks, including the potential loss of capital.