I hate getting a result I don’t expect, but I like to cross-check every calculation I do, especially when I am relying on the calculation procedures contained in an R package. This is a recipe for “agita”.

I was recently exploring R’s autocorrelation function, acf(x, lag.max), applying it to the returns series for the last 5 years of Winton’s diversified fund (try it; you might find it interesting). I cross-checked the results using R’s correlation function, cor(x, y), with y set to a lagged version of x, i.e. y(t) = x(t - lag). I could not get the results to match: they were close, but not exactly the same.
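Here is a minimal sketch of that cross-check. The Winton returns aren't reproduced here, so the series is simulated with rnorm() (an assumption; substitute your own returns vector for x), and the names lag_max, acf_vals and cor_vals are illustrative. The lag-0 entry that acf() always returns (it is exactly 1) is dropped so the two columns line up by lag.

```r
# Sketch: compare acf() with cor() on a lagged copy of the same series.
# Assumption: simulated monthly returns stand in for the real fund data.
set.seed(42)
x <- rnorm(60, mean = 0.005, sd = 0.03)  # ~5 years of monthly returns (simulated)
n <- length(x)
lag_max <- 5

# acf() returns a (lag_max + 1) x 1 x 1 array; drop the lag-0 entry
acf_vals <- as.numeric(acf(x, lag.max = lag_max, plot = FALSE)$acf)[-1]

# cor() of x(t) against x(t - k): drop the first k and the last k terms respectively
cor_vals <- sapply(1:lag_max, function(k) cor(x[(k + 1):n], x[1:(n - k)]))

# Side by side: close, but not identical
print(round(cbind(lag = 1:lag_max, acf = acf_vals, cor = cor_vals), 4))
```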

The R documentation cites “Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer-Verlag.” as the source for the code, but I don’t have it. So I had to do some sleuthing…

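The standard formula for the correlation coefficient of two series, where $\sigma_x$ and $\sigma_y$ are the sample standard deviations (the $n-1$ divisor that R's sd() uses):

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(x_i-\bar{x})(y_i-\bar{y})}{\sigma_x\,\sigma_y}$$
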
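As a quick sanity check, the formula can be transcribed by hand and compared with cor(). The helper pearson() is hypothetical (just the equation above written out) and the data are simulated:

```r
# Sketch: the textbook Pearson formula, transcribed directly.
# sd() uses the n - 1 divisor, matching the 1/(n - 1) factor in front of the sum.
pearson <- function(x, y) {
  n <- length(x)
  sum((x - mean(x)) * (y - mean(y))) / ((n - 1) * sd(x) * sd(y))
}

set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)   # a second series correlated with x
all.equal(pearson(x, y), cor(x, y))  # TRUE
```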
So what are the possible ways in which this formula could be different when investigating the correlation of a series with itself, lagged?

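Let’s start by changing $x_i$ and $y_i$ to $x_i$ and $x_{i-k}$ to make it obvious that the values are all drawn from the SAME series. We also have to change the range of the summation, and therefore the denominator preceding it: lagging a series of length $n$ by $k$ leaves $n-k$ overlapping pairs, so the denominator becomes $(n-k-1)$:

$$\rho_k = \frac{1}{n-k-1}\sum_{i=k+1}^{n}\frac{(x_i-\bar{x}_1)(x_{i-k}-\bar{x}_k)}{\sigma_1\,\sigma_k}$$

where x1 and xk are the original series minus the first $k$ terms and the original series minus the last $k$ terms, respectively, and $\bar{x}_1, \sigma_1$ and $\bar{x}_k, \sigma_k$ are their means and standard deviations. $\rho_k$ (rho k) is shorthand for the autocorrelation coefficient at lag $k$. This is exactly what cor() computes on the two lagged copies.

The first thing that jumps out is that we are using different means and standard deviations for x1 and xk, but they are subsets of the same series, so we ought to be able to use the mean $\bar{x}$ and standard deviation $\sigma$ of the entire series for BOTH. This takes us to:

$$\rho_k = \frac{1}{n-k-1}\sum_{i=k+1}^{n}\frac{(x_i-\bar{x})(x_{i-k}-\bar{x})}{\sigma'^2}$$

I used $\sigma'$ to indicate that we have to make a correction to sigma for the length of the series. Sigma is the standard deviation of the full series of length $n$ (with the $n-1$ divisor), but the series we are correlating have length $n-k$. Since the sum of squared deviations grows with the number of terms, sigma has to be scaled by the square root of the ratio of the degrees of freedom, so that $(n-k-1)\,\sigma'^2$ equals the full-series sum of squares, $(n-1)\,\sigma^2$:

$$\sigma' = \sigma\sqrt{\frac{n-1}{n-k-1}}$$

Substituting this into our calculation gives:

$$\rho_k = \frac{\sum_{i=k+1}^{n}(x_i-\bar{x})(x_{i-k}-\bar{x})}{(n-1)\,\sigma^2} = \frac{\sum_{i=k+1}^{n}(x_i-\bar{x})(x_{i-k}-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
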
The results of this formula match the results of the acf(x, lag.max) function!
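The sketch below walks through each step of the derivation numerically on a simulated series (an assumption; x1, xk and the rho_step* names are illustrative). The first version is plain cor() on the two lagged copies; the second swaps in the full-series mean and sigma without any correction; the third scales sigma by sqrt((n - 1) / (n - k - 1)) so the lagged normalization uses the full-series sum of squares. If the reasoning holds, the third should equal acf()'s value at lag k.

```r
# Sketch: each step of the lagged-correlation derivation, checked against acf().
# Assumption: simulated data; any numeric series without NAs should behave the same.
set.seed(7)
x <- rnorm(60)
n <- length(x)
k <- 3

x1 <- x[(k + 1):n]  # series minus the first k terms
xk <- x[1:(n - k)]  # series minus the last k terms
xbar <- mean(x)
sigma <- sd(x)      # full-series standard deviation (n - 1 divisor)

# Step 1: separate means and SDs for the two sub-series (this is just cor())
rho_step1 <- sum((x1 - mean(x1)) * (xk - mean(xk))) /
  ((n - k - 1) * sd(x1) * sd(xk))

# Step 2: full-series mean and sigma, no length correction yet
rho_step2 <- sum((x1 - xbar) * (xk - xbar)) / ((n - k - 1) * sigma^2)

# Step 3: corrected sigma, scaled by sqrt((n - 1) / (n - k - 1))
sigma_prime <- sigma * sqrt((n - 1) / (n - k - 1))
rho_step3 <- sum((x1 - xbar) * (xk - xbar)) / ((n - k - 1) * sigma_prime^2)

# acf() output starts at lag 0, so lag k sits at position k + 1
rho_acf <- as.numeric(acf(x, lag.max = k, plot = FALSE)$acf)[k + 1]

all.equal(rho_step1, cor(x1, xk))                         # TRUE
all.equal(rho_step3, rho_acf)                             # TRUE
all.equal(rho_step2 / rho_step3, (n - 1) / (n - k - 1))   # TRUE
```

The last comparison shows that once the full-series mean and sigma are used, the only remaining difference from acf() is that degrees-of-freedom ratio, which is close to 1 when k is small relative to n.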

I understand the differences and they seem reasonable. Agita gone away. Whew!

