I hate getting a result I don’t expect, but I like to cross-check every calculation I do, especially when I am relying on the calculation procedures contained in an R package. This is a recipe for “agita”.

I was recently exploring R’s autocorrelation function, acf(x, lag.max), applying it to the returns series for the last 5 years of Winton’s diversified fund (try it; you might find it interesting). I cross-checked the results using R’s correlation function, cor(x, y), with y set to a lagged version of x, i.e. y(t) = x(t - lag). I could not get the results to match: they were close, but not exactly the same.
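
A minimal sketch of that cross-check, written in Python rather than R and run on a made-up toy series rather than the Winton returns. The helper names `lagged_pair` and `pearson` are hypothetical, for illustration only. Lagging by k drops k values from each end, so the two copies overlap in n - k pairs; in R the equivalent call would be something like `cor(x[-(1:k)], x[1:(n - k)])`.

```python
import math

def lagged_pair(x, k):
    """Line up x(t) with x(t - k); each copy loses k values, leaving n - k pairs."""
    if not 0 < k < len(x) - 1:
        raise ValueError("need 0 < k < len(x) - 1 so at least two pairs remain")
    # (series minus its first k terms, series minus its last k terms)
    return x[k:], x[:len(x) - k]

def pearson(a, b):
    """Plain sample correlation: each input uses its own mean and n - 1 standard deviation."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (m - 1)
    sa = math.sqrt(sum((u - ma) ** 2 for u in a) / (m - 1))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b) / (m - 1))
    return cov / (sa * sb)

x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]   # toy series, not real fund returns
lead, lag = lagged_pair(x, 1)
print(lead, lag)            # [3.0, 2.0, 5.0, 4.0, 6.0] [1.0, 3.0, 2.0, 5.0, 4.0]
print(pearson(lead, lag))   # about 0.3
```

Because each copy is centred and scaled by its own statistics, this is what a general-purpose correlation function does with the lagged pair.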

The R documentation cites “Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer-Verlag.” as the source for the code, but I don’t have it. So I had to do some sleuthing…

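The standard formula for the correlation coefficient of two series $x$ and $y$, each of length $n$, is:

$$r_{xy} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\sigma_x\,\sigma_y}$$

where $\bar{x}$ and $\bar{y}$ are the means and $\sigma_x$ and $\sigma_y$ the standard deviations (divisor $n - 1$) of $x$ and $y$.
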
So how might this formula have to change when we correlate a series with a lagged copy of itself?

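Let’s start by changing $x_i$ and $y_i$ to $x_i$ and $x_{i-k}$ to make it obvious that the values are all drawn from the SAME series. We also have to change the range of the summation, and therefore the divisor in front of it: each lagged copy of the series has $(n - k)$ terms, so the divisor becomes $(n - k - 1)$:

$$\rho_k = \frac{\frac{1}{n-k-1}\sum_{i=k+1}^{n}\left(x_i-\bar{x}_1\right)\left(x_{i-k}-\bar{x}_k\right)}{\sigma_{x_1}\,\sigma_{x_k}}$$

where $x_1$ and $x_k$ are the original series minus its first $k$ terms and the original series minus its last $k$ terms, respectively, $\bar{x}_1$, $\bar{x}_k$, $\sigma_{x_1}$ and $\sigma_{x_k}$ are their means and standard deviations, and $\rho_k$ is shorthand for the autocorrelation coefficient at lag $k$. This is what cor(x, y) computes on the lagged pair.

The first thing that jumps out is that we are using different means and standard deviations for $x_1$ and $x_k$, yet they are subsets of the same series, so we ought to be able to use the mean $\bar{x}$ and standard deviation $\sigma$ of the entire series $x$ for BOTH. This takes us to:

$$\rho_k = \frac{\frac{1}{n-k-1}\sum_{i=k+1}^{n}\left(x_i-\bar{x}\right)\left(x_{i-k}-\bar{x}\right)}{\sigma_k^2}$$

I used $\sigma_k$ to indicate that we have to make a correction to sigma for the length of the series. $\sigma$ is the standard deviation for the full series of length $n$ (divisor $n - 1$, as in R's sd()), so $\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2 = (n-1)\sigma^2$, but the series we are correlating have length $(n - k)$. Spreading that same full-series sum of squares over the $(n - k - 1)$ degrees of freedom of the shorter series scales the variance by $(n-1)/(n-k-1)$, and the standard deviation by its square root:

$$\sigma_k = \sigma\sqrt{\frac{n-1}{n-k-1}}$$

Substituting this into our calculation, the $(n - k - 1)$ factors cancel and we get:

$$\rho_k = \frac{\sum_{i=k+1}^{n}\left(x_i-\bar{x}\right)\left(x_{i-k}-\bar{x}\right)}{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}$$
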
The results of this formula match the results of the acf(x, lag.max) function!
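
As a check on the algebra, here is a minimal Python sketch (not R's source code; the function names and the toy series are hypothetical). It computes the lag-k cross-products about the full-series mean, summed over the overlap and divided by the full-series sum of squared deviations, once directly and once step by step with the 1/(n - k - 1) factor and a length-corrected sigma, taken here as sigma * sqrt((n - 1)/(n - k - 1)). The two should agree at every lag up to floating-point rounding.

```python
import math

def acf_direct(x, k):
    """Overlap cross-products about the full-series mean / full-series sum of squares."""
    n = len(x)
    if not 0 <= k < n - 1:
        raise ValueError("need 0 <= k < len(x) - 1")
    xbar = sum(x) / n
    cross = sum((x[i] - xbar) * (x[i - k] - xbar) for i in range(k, n))
    ss = sum((v - xbar) ** 2 for v in x)
    return cross / ss

def acf_via_corrected_sigma(x, k):
    """Same estimate built step by step: full-series mean and sigma (n - 1 divisor),
    1/(n - k - 1) in front of the overlap sum, and the assumed length correction
    sigma_k = sigma * sqrt((n - 1) / (n - k - 1)) in the denominator."""
    n = len(x)
    if not 0 <= k < n - 1:
        raise ValueError("need 0 <= k < len(x) - 1")
    xbar = sum(x) / n
    sigma = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
    sigma_k = sigma * math.sqrt((n - 1) / (n - k - 1))
    cross = sum((x[i] - xbar) * (x[i - k] - xbar) for i in range(k, n))
    return (cross / (n - k - 1)) / sigma_k ** 2

x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]   # toy series, not real fund returns
for k in range(1, 4):
    print(k, acf_direct(x, k), acf_via_corrected_sigma(x, k))
```

For k = 1 both functions return 0.1 on this toy series, whereas a plain Pearson correlation of the same lagged pair (each copy with its own mean and standard deviation) works out to 0.3. On a long return series the gap is typically much smaller, which fits results that are close but not exactly the same.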

I understand the differences and they seem reasonable. Agita gone away. Whew!
