This is the sixth in my Hedge Fund Hacks series. In this post, I hammer home how little we know about the expected returns of a hedge fund or a managed futures strategy because we have so little data. You will learn just how large the uncertainty is and what you can do about it.

## Glimpses Through the Fog

I remember hiking with my wife in North Wales in the Spring of 2003. Wales tends to be cloudy and wet. We were on our way up the Bristly Ridge towards Glyder Fach when the clouds descended. I wanted to find Castell-y-Gwynt (Castle of the Winds) so we could have lunch on the long cantilevered stone pictured to the right.

Castell-y-Gwynt lies on a long broad ridge studded with weird rocky outcrops and edged with cliffs. After 20 minutes of searching in 30 yard visibility, we had glimpsed enough features to place ourselves on the map and find our lunch spot.

Trying to figure out anything about hedge fund performance reminds me of this experience: all you have to go on are glimpses through the fog.

## Predicting Returns: Nearly Everything is Hidden

If we are using past performance data to make allocation decisions, we should understand how much is hidden from us. Once a month we get an account statement – a single data point that crystallizes an unfathomable number of interactions within a complex system. Like a hiker struggling to identify landmarks in the fog, our task is to try to deduce the overall lay of the land from just a few scattered glimpses.

To illustrate the point, this article focuses on the simplest measure of performance: the mean monthly return. It is the best estimate of future performance and a key input to the most popular method of portfolio construction: mean-variance optimization. As a result, our estimate of monthly returns, through the portfolio optimization process, drives our ultimate success or failure.

## Sampling and Inference

Using a sample to make an estimate of some characteristic of the population is called statistical inference. In this case:

- The “characteristic” we are interested in is the average monthly return.
- The “population” (hidden) is all the possible monthly returns that might arise from a hedge fund manager’s trading decisions.
- The “sample” is the manager’s track record.

Most people intuitively understand that the larger the sample (the longer the track record), the more likely the sample mean is to resemble the population mean (the expected value). But there are a couple of quite subtle things to understand here:

- The mean of the larger sample is **not necessarily** closer to the “true” value; it is just **more likely** to be closer.
- We only approach the “expected value” of returns if we get to play the game an infinite number of times. How many times will we get to play the game? Once!
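The first point is easy to check with a quick simulation. The sketch below is an illustration rather than part of the original experiment: it assumes normally distributed monthly returns with a 1% mean and 4.6% volatility (illustrative values), and counts how often the mean of a 120 month sample lands closer to the true mean than the mean of a 12 month sample.

```r
set.seed(42)

# Illustrative assumptions, not taken from any real track record
mu    <- 0.01    # "true" mean monthly return
sigma <- 0.046   # monthly volatility
nSims <- 20000   # number of simulated pairs of track records

# One row per simulated track record; rowMeans gives each sample mean
shortMeans <- rowMeans(matrix(rnorm(nSims * 12,  mu, sigma), nrow = nSims))
longMeans  <- rowMeans(matrix(rnorm(nSims * 120, mu, sigma), nrow = nSims))

# Fraction of pairs in which the longer record is closer to the truth
longCloser <- mean(abs(longMeans - mu) < abs(shortMeans - mu))
cat("120 month mean closer than 12 month mean:",
    round(100 * longCloser), "% of the time\n")
```

Under these assumptions the longer record should win roughly four times out of five, which still leaves a sizeable minority of cases in which the short track record happens to be the more accurate one.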

## Estimated vs. Actual Returns: Theoretical Approach

In the world of hedge fund analysis, we have a sample of a manager’s monthly returns from which to estimate future monthly returns: it is the manager’s track record. Our job is to estimate the unknown distribution of his future returns.

In this experiment we are going to come at it from the opposite direction. Let’s say we are omniscient and we know precisely the characteristics of the distribution of a manager’s returns:

- Normally distributed.
- Average monthly return, mu, is 1%.
- Monthly volatility (standard deviation), sigma, is 4.6%.

Note: This is equivalent to a Sharpe Ratio of 0.75 at 0% risk-free rate.

We are going to explore just how much variation there could be in the track record due to chance alone.

### Experiment

We can calculate the solution directly. When you take samples of size n from the distribution above, the averages of the samples are themselves normally distributed with mean mu and standard deviation sigma / sqrt(n), where “n” is the sample size (the length of the track record).
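As a sanity check on that formula, here is a minimal sketch that compares the closed-form result with a brute-force simulation. It uses the mu and Sharpe Ratio from the setup above; the number of simulated track records (`nSims`) and the variable names are my own assumptions for illustration.

```r
set.seed(1)

mu     <- 0.01                    # true mean monthly return
sharpe <- 0.75                    # annualized Sharpe Ratio at 0% risk-free rate
sigma  <- sqrt(12) * mu / sharpe  # implied monthly volatility, about 4.6%
months <- c(12, 60, 120, 240)     # track record lengths
nSims  <- 10000                   # assumption: simulated track records per length

# Closed form: sample means are Normal(mu, sigma / sqrt(n))
stdErr     <- sigma / sqrt(months)
closedForm <- pnorm(0.015, mu, stdErr) - pnorm(0.005, mu, stdErr)

# Simulation: draw nSims track records of each length and average each one
simulated <- sapply(months, function(n) {
  sampleMeans <- rowMeans(matrix(rnorm(nSims * n, mu, sigma), nrow = nSims))
  mean(sampleMeans >= 0.005 & sampleMeans <= 0.015)
})

print(data.frame(months,
                 stdErrPct  = round(100 * stdErr, 2),
                 closedForm = round(closedForm, 2),
                 simulated  = round(simulated, 2)))
```

The `closedForm` column is the fraction of track records whose average falls between +0.5% and +1.5% per month; the `simulated` column should agree with it to within sampling noise.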

Below is a plot showing what the sampling distributions look like if we have track records of 12, 60, 120, and 240 months. The shaded area covers the range +0.5% to +1.5% per month. The legend tells you how much of the area under the curve is shaded for each plot.


### Discussion

#### How Confident Can You Be?

As you would expect, the longer the track record the narrower the spread of sample average monthly returns. For the 12 month track record, only 29% of the randomly generated track records have an average monthly return within 0.5% of the true value. 12 month track records generated at random from the exact same manager can have wildly different monthly returns. We have to go all the way to a 240 month track record before almost all (91%) of the random track-records are between 0.5% and 1.5%.

How confident can you be about any manager’s future performance? Not very!

#### How Long a Track Record Is Long Enough?

If you recall from the math above, the standard deviation of the sampling distribution varies inversely with the square root of the sample size. This means there are diminishing returns to increasing sample size.

Going from 12 to 24 months reduces the spread by 29%. Increasing sample size from 60 to 72 months reduces the spread by less than 9%. The cost you incur as you raise the bar on minimum track record is a reduced universe of managers from which to choose. The adjacent table shows this effect. By the time you are trading off 72 months vs 60 months, you are reducing the spread by 9%, but the universe of funds is also 12% smaller.

**Track Record vs. Managers**

| TR (months) | Spread | Change in Spread | Managers | Change in Managers |
|---|---|---|---|---|
| 12 | 2.00% | - | 3147 | - |
| 24 | 1.41% | -29% | 2982 | -5% |
| 36 | 1.15% | -18% | 2780 | -7% |
| 48 | 1.00% | -13% | 2519 | -9% |
| 60 | 0.89% | -11% | 2252 | -11% |
| 72 | 0.82% | -9% | 1977 | -12% |

TR: Minimum track record length in months.

Spread: Standard deviation of the sample means.

Managers: Number exceeding minimum TR.

Change: Percent change from row above.
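The Spread column and its Change column follow directly from the square-root rule. Here is a short sketch that reproduces them, taking the 2.00% spread at 12 months from the first row of the table as given (the variable names are mine):

```r
trackRecords <- c(12, 24, 36, 48, 60, 72)  # minimum track record in months
baseSpread   <- 2.00                       # spread (%) at 12 months, from the table

# Spread shrinks with the square root of the track record length
spread <- baseSpread / sqrt(trackRecords / trackRecords[1])

# Percent change from the row above; the first row has nothing to compare to
pctChange <- c(NA, 100 * (spread[-1] / spread[-length(spread)] - 1))

print(data.frame(TR        = trackRecords,
                 Spread    = round(spread, 2),
                 PctChange = round(pctChange)))
```

Note that the percent changes depend only on the ratio of track record lengths, not on the manager's volatility.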

If you are looking to invest in a single manager you need a long track record. If you have the resources to allocate to multiple managers, you can work with shorter track records. The odds of all the managers out-performing purely by luck go down as the number of managers goes up.
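To put rough numbers on that last point, here is a back-of-the-envelope sketch. It rests on simplifying assumptions that real managers will not satisfy exactly: the managers are independent, each track record is equally likely to over-state or under-state the true mean, and all are run at the same 20% annualized volatility with 36 months of history.

```r
# Assumptions for illustration only (see the text above)
nManagers   <- c(1, 2, 5, 10, 20)
monthlyVol  <- 0.20 / sqrt(12)   # 20% annualized volatility, monthly terms
trackMonths <- 36

# Chance that every manager's track record over-states its true mean
pAllOverstated <- 0.5 ^ nManagers

# Standard error of the average of the managers' sample means
seSingle    <- monthlyVol / sqrt(trackMonths)
sePortfolio <- seSingle / sqrt(nManagers)

print(data.frame(nManagers,
                 pAllOverstated = signif(pAllOverstated, 3),
                 sePortfolioPct = round(100 * sePortfolio, 2)))
```

Correlation between managers would weaken both effects, so treat these figures as a best case.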

Anything less than 36 months, and you are in what I think of as the Venture Capital zone – traditional portfolio construction rules do not apply. This is the domain of Hedge Fund Seeders.

#### Side Note: Emerging Manager Premium

I have seen various papers attempting to verify the existence of, and quantify, the “emerging manager premium”. My suspicion is that managers who randomly produce above average results in the short term will get funding, and over time their “true” performance will assert itself. This is a manifestation of reversion to the mean (the subject of a future post). Nobody is going to risk allocating to a manager who is under-performing their peers! This selection bias eliminates weak initial track records from the field, and makes it appear that managers’ performance drops off over time!

### R Code

Below you will find a copy of the code I used to implement the experiment described above and generate the charts. It also includes a variation on the experiment in which the track record length is held constant while the volatility of the underlying returns is varied. Enjoy!

```r
# Random Track Records
# Item 2: Estimated Returns vs Track Record Length / Sharpe Ratios
if (TRUE) {  # Toggle: set to FALSE to skip this item

  seriesMonths <- c(12, 60, 120, 240)         # Range of months to test
  seriesSharpes <- c(0.25, 0.50, 0.75, 1.00)  # Range of Sharpe Ratios to test
  monthsConstant <- 2  # Index into seriesMonths to use when testing Sharpe Ratios
  sharpeConstant <- 3  # Index into seriesSharpes to use when testing months
  expReturn <- 0.01    # Expected monthly return for sampling
  lowerTail <- 0.005   # For shaded area on plot
  upperTail <- 0.015   # For shaded area on plot

  # Constant vol, different series length
  # Closed form solution for monthly average return
  vol <- sqrt(12) * expReturn / seriesSharpes[sharpeConstant] # Convert Sharpe to vol.
  colors <- rainbow(length(seriesMonths))
  xAxis <- seq(-0.05 + expReturn, 0.05 + expReturn, 0.0001)
  yVals <- sapply(seriesMonths, function(months) dnorm(x=xAxis, expReturn, vol / sqrt(months)))
  yMax <- max(yVals)
  shaded <- vector("numeric", length(seriesMonths))

  # Plot combined chart
  dev.new()
  par(oma=c(0, 0, 1, 0))
  plot(xAxis, yVals[, 1], type="l", lwd=2, ylim=c(0, yMax), col=colors[1],
       ylab="Density", xlab="Sample Average Monthly Returns")
  # Shade under the tallest curve (the longest track record)
  polygon(c(lowerTail, xAxis[xAxis >= lowerTail & xAxis <= upperTail], upperTail),
          c(0, yVals[xAxis >= lowerTail & xAxis <= upperTail, length(seriesMonths)], 0),
          col="lightgrey", border="lightgrey")
  for (i in 1:length(seriesMonths)){
    lines(xAxis, yVals[ , i], lwd=2, col=colors[i])
    # Percentage of the sampling distribution between lowerTail and upperTail
    belowLower <- pnorm(lowerTail, expReturn, vol / sqrt(seriesMonths[i]))
    belowUpper <- pnorm(upperTail, expReturn, vol / sqrt(seriesMonths[i]))
    shaded[i] <- 100 * round(belowUpper - belowLower, 2)
  }
  legend(x="topleft", legend=paste0(seriesMonths, " Months (", shaded, "% Shaded)"),
         bty="n", lwd=2, col=colors)
  mtext(paste0("Average Monthly Returns (", seriesSharpes[sharpeConstant], " Sharpe Ratio)"),
        line=-1, side=3, outer=TRUE, cex=1.5)
  mtext(paste0("Shaded Areas >=", 100 * lowerTail, "% and <= ", 100 * upperTail,
               "% vs. true value of ", 100 * expReturn, "%"),
        line=-2.5, side=3, outer=TRUE, cex=1)

  # Plot separate charts
  dev.new()
  par(mfrow=c(2, 2), oma=c(0, 0, 3, 0))
  for (i in 1:length(seriesMonths)){
    plot(xAxis, yVals[ , i], ylim=c(0, yMax), type="l", col=colors[i],
         main=paste0(seriesMonths[i], " Months Track Record"),
         ylab="Density", xlab="Sample Avg. Mo. Returns")
    polygon(c(lowerTail, xAxis[xAxis >= lowerTail & xAxis <= upperTail], upperTail),
            c(0, yVals[xAxis >= lowerTail & xAxis <= upperTail, i], 0),
            col="lightgrey", border="lightgrey")
    lines(xAxis, yVals[ , i], lwd=2, col=colors[i])
    text(min(xAxis), yMax * 0.9,
         paste0(shaded[i], "% Shaded"), pos=4)
  }
  mtext(paste0("Average Monthly Returns (", seriesSharpes[sharpeConstant], " Sharpe Ratio)"),
        line=1, side=3, outer=TRUE, cex=1.5)
  mtext(paste0("Shaded Areas >=", 100 * lowerTail, "% and <= ", 100 * upperTail,
               "% vs. true value of ", 100 * expReturn, "%"),
        line=-1, side=3, outer=TRUE, cex=1)

  # Constant series length, different vol
  vols <- sqrt(12) * expReturn / seriesSharpes # Convert Sharpes to vols
  colors <- rainbow(length(vols))
  xAxis <- seq(-0.05 + expReturn, 0.05 + expReturn, 0.0001)
  yVals <- sapply(vols,
                  function(volatility) dnorm(x=xAxis,
                                             expReturn,
                                             volatility / sqrt(seriesMonths[monthsConstant])))
  yMax <- max(yVals)

  # Plot combined chart
  dev.new()
  par(oma=c(0, 0, 1, 0))
  plot(xAxis, yVals[ , 1], type="l", lwd=2, ylim=c(0, yMax), col=colors[1],
       ylab="Density", xlab="Average Monthly Returns")
  # Shade under the tallest curve (the highest Sharpe Ratio, lowest vol)
  polygon(c(lowerTail, xAxis[xAxis >= lowerTail & xAxis <= upperTail], upperTail),
          c(0, yVals[xAxis >= lowerTail & xAxis <= upperTail, length(vols)], 0),
          col="lightgrey", border="lightgrey")
  for (i in 1:length(vols)){
    lines(xAxis, yVals[ , i], lwd=2, col=colors[i])
    belowLower <- pnorm(lowerTail, expReturn, vols[i] / sqrt(seriesMonths[monthsConstant]))
    belowUpper <- pnorm(upperTail, expReturn, vols[i] / sqrt(seriesMonths[monthsConstant]))
    shaded[i] <- 100 * round(belowUpper - belowLower, 2)
  }
  legend(x="topleft",
         legend=paste0(seriesSharpes, " Sharpe Ratio (", shaded, "% Shaded)"),
         bty="n", lwd=2, col=colors)
  mtext(paste0("Average Monthly Returns (", seriesMonths[monthsConstant], " Months)"),
        line=-1, side=3, outer=TRUE, cex=1.5)
  mtext(paste0("Shaded Areas >=", 100 * lowerTail, "% and <= ", 100 * upperTail,
               "% vs. true value of ", 100 * expReturn, "%"),
        line=-2.5, side=3, outer=TRUE, cex=1)

  # Plot 4 panel chart
  dev.new()
  par(mfrow=c(2, 2), oma=c(0, 0, 3, 0))
  for (i in 1:length(vols)){
    plot(xAxis, yVals[ , i], ylim=c(0, yMax), type="l", col=colors[i],
         main=paste0(seriesSharpes[i], " Sharpe Ratio"),
         ylab="Density", xlab="Average Monthly Returns")
    polygon(c(lowerTail, xAxis[xAxis >= lowerTail & xAxis <= upperTail], upperTail),
            c(0, yVals[xAxis >= lowerTail & xAxis <= upperTail, i], 0),
            col="lightgrey", border="lightgrey")
    lines(xAxis, yVals[ , i], lwd=2, col=colors[i])
    text(min(xAxis), yMax * 0.9, paste0(shaded[i], "% Shaded"), pos=4)
  }
  mtext(paste0("Average Monthly Returns (", seriesMonths[monthsConstant], " Months)"),
        line=1, side=3, outer=TRUE, cex=1.5)
  mtext(paste0("Shaded Areas >=", 100 * lowerTail, "% and <= ", 100 * upperTail,
               "% vs. true value of ", 100 * expReturn, "%"),
        line=-1, side=3, outer=TRUE, cex=1)
}
```

## Length of Track Record: Real Hedge Fund Data

The examples given above suffer from what Nassim Taleb calls “The Ludic Fallacy”. We have used game-like rules to make a point, and the real world doesn’t necessarily follow these rules. I am going to show the real-world impact of this sample-size problem using data from the Eurekahedge database.

### The Data

I have a version of the Eurekahedge database that I have cleaned following the guidelines in my Hedge Fund Data Hygiene article. I search through the data to find any program with at least 20 years, 240 months, of data. Using 240 months of returns gives a very good estimate of each manager’s average monthly return. I also filter out programs with Sharpe Ratios above 2, as they tend to be fixed income strategies with very low volatility. There are 212 programs meeting these criteria.

In order to reveal the effect of the sample size alone, I re-normalize the returns for each program by dividing by the standard deviation of returns and scaling back up to an annualized 20% volatility. So all 212 sets of returns have the same “known” 20% volatility.
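The re-normalization step is a one-liner. Here is a minimal sketch, using a synthetic return series in place of the Eurekahedge data; the function name `rescaleToVol` and the synthetic parameters are assumptions for illustration only.

```r
# Rescale a series of monthly returns to a target annualized volatility.
# Hypothetical helper, not taken from the original analysis code.
rescaleToVol <- function(returns, targetAnnualVol = 0.20) {
  returns / sd(returns) * targetAnnualVol / sqrt(12)
}

set.seed(7)
rawReturns <- rnorm(240, mean = 0.008, sd = 0.03)  # synthetic 20 year record
scaled     <- rescaleToVol(rawReturns)

cat("Annualized vol before:", round(sd(rawReturns) * sqrt(12), 3), "\n")
cat("Annualized vol after: ", round(sd(scaled) * sqrt(12), 3), "\n")
```

Because every return is multiplied by the same constant, the ratio of mean to standard deviation (and therefore the Sharpe Ratio at a 0% risk-free rate) is unchanged by the rescaling.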

The following two charts give a good overview of the data I am using. Notice that the global mean monthly return is 1.25% and the average Sharpe Ratio at 0% risk-free rate is 0.75. These values informed the choices I made in the earlier “theoretical” experiment.

### The Experiment

The following set of steps is carried out for four different sample sizes: 12, 24, 48, and 96 months. These sample sizes do not match those used in the theoretical experiment due to re-sampling constraints. The minimum track record is 240 months, and the maximum is 274. Clearly, sampling 240 months from a 240 month track-record is not going to be informative.

For each of the sets of data, I calculate the overall average monthly return. Then I create 10,000 random track records by sampling the raw data set. The sampling is with replacement so each month’s return can be selected repeatedly. The sample returns can be picked from any month in the return series; I am not picking a block of consecutive returns. This ensures a representative collection of returns, unbiased by economic or market conditions.

In order to simulate the process of managing asset allocation with a risk budget, each simulated track record is re-scaled to 20% annual volatility. For each sample, I calculate the mean and subtract it from the average monthly return for the set. These are the sample errors and a negative value implies the sample’s average monthly return over-stated the average monthly return of the raw data. I call these negative sample errors “Negative Surprises”: if you expected a 1% return and got 0.5% that would be a negative surprise.

The results are displayed as histograms on the following 4 panel plot. There is one for each sample size, and each covers 2.12 million sample error values (212 programs x 10,000 simulations each). I also display the proportion of sample errors that are negative surprises, the average of the negative surprises, and the 20% quantile of distribution of the errors.
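For readers who want to try the procedure themselves, here is a sketch of the re-sampling steps for a single program. The Eurekahedge data is not reproduced here, so the sketch assumes a synthetic 240 month return series as a stand-in; the helper `rescaleToVol` and all variable names are hypothetical.

```r
set.seed(123)

# Hypothetical helper: rescale monthly returns to a target annualized vol
rescaleToVol <- function(returns, targetAnnualVol = 0.20) {
  returns / sd(returns) * targetAnnualVol / sqrt(12)
}

# Synthetic stand-in for one program's 240 month history (an assumption)
program     <- rescaleToVol(rnorm(240, mean = 0.0125, sd = 0.0577))
trueMean    <- mean(program)      # overall average monthly return
sampleSizes <- c(12, 24, 48, 96)  # simulated track record lengths
nSims       <- 10000              # random track records per sample size

results <- t(sapply(sampleSizes, function(n) {
  sampleErrors <- replicate(nSims, {
    draw <- sample(program, n, replace = TRUE)  # any month, with replacement
    trueMean - mean(rescaleToVol(draw))         # negative = sample over-stated
  })
  c(months       = n,
    propNegative = mean(sampleErrors < 0),
    avgNegative  = mean(sampleErrors[sampleErrors < 0]),
    q20          = unname(quantile(sampleErrors, 0.20)))
}))

print(round(results, 4))
```

Running the same loop over every program in a cleaned database, and pooling the sample errors, would give the inputs for the histograms described above.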

### Discussion

The global average monthly return of the entire set of 212 programs when normalized to 20% annualized volatility is 1.25%. The expected negative surprises range from -1.64% to -0.53% for 12 month through 96 month simulated track records. We will overestimate future returns about 50% of the time, and when we do, we can expect to have over-estimated by an amount similar in size to the returns themselves!

As we should expect from the theoretical discussion above, the expected negative surprise for the 96 month track records is around 1/sqrt(8) times the value for the 12 month track records. That’s a considerable improvement: the expected negative surprise has declined by about 70%. It certainly would not do for a physics experiment, but if a manager with a 96 month track record is showing over 1% per month average we can be confident that his “true” return is in positive territory!
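The arithmetic behind that comparison, using the two expected negative surprises quoted in this section, can be checked in a few lines:

```r
surprise12 <- -1.64  # expected negative surprise (%), 12 month track records
surprise96 <- -0.53  # expected negative surprise (%), 96 month track records

theoreticalRatio <- 1 / sqrt(96 / 12)       # square-root rule, about 0.35
observedRatio    <- surprise96 / surprise12 # about 0.32
pctDecline       <- 100 * (1 - observedRatio)

cat("Theoretical ratio:", round(theoreticalRatio, 2), "\n")
cat("Observed ratio:   ", round(observedRatio, 2), "\n")
cat("Observed decline: ", round(pctDecline), "%\n")
```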

## Conclusions and Recommendations

So all is well. We have shown that if we allocate to managers with long track records we can be confident our expectations will be met.

Not so fast.

Our task is not simply to find managers with a positive expectation, we have to build a portfolio. Returns, volatilities, and covariances will all feed into our decision-making process. Every one of those values has errors and they will tend to compound. How can we trust the results of our efforts?

Here we arrive at a seemingly insurmountable conundrum: To get big enough samples of managers’ performances to reliably estimate their true characteristics, we need very long track records. Perhaps as long as 20 years. Markets and the way managers trade them will have changed over the course of that track record: the series is not stationary! I wrote an article demonstrating this for Winton Capital. Not only that, we will have very few managers to choose from and many may be closed to new investors.

### What does all this mean in practical terms?

- The confidence interval for any hedge fund performance statistic is embarrassingly wide.
- The longer a manager’s track record the more confidence you can have in the performance statistics up to a point.
- Even with long track records (e.g. 10+ years) your confidence is not particularly high.
- The longer the track record the more likely performance will have changed along the way.
- The longer the track record you insist on, the fewer managers you will have to choose from.
- For short track records you know next to nothing about likely future performance.

### What’s the hack here?

- Use calculation or re-sampling techniques as above to force yourself to confront the uncertainty of any statistic presented or calculated.
- You can effectively increase the amount of data you have and thus increase your confidence by allocating to more managers – it is unlikely that they all punched above their weight.
- Use additional data to improve confidence (e.g. volatility, correlation, benchmarks, and qualitative factors).
- When allocating to managers with short track records, think like a venture capitalist: Place many small equal risk bets.

Following up on the first hack above, I will publish future articles exploring these same concepts applied to compound annual growth rate, volatility, and correlation, and how this impacts portfolio optimization.

Delving deeper into the second hack, the following three articles are planned:

- Adding more programs as an alternative to favoring longer track records.
- Correcting for reversion to the mean to better manage your performance expectations.
- Exploring consistency of manager performance from one time period to the next.

As always, feel free to send me a connection request on LinkedIn, and share this article any place you think it might be of interest.

Photo by Laurice Manaligod on Unsplash
