The server called by the java script that implemented my LaTex equations has gone bye-bye. So I am trying to fix past posts so equations render in a legible way – apologies. In the mean time …

I have been getting better acquainted with Monte Carlo simulations. They play an important role in implementing Vince’s LSPM. Since I plan on writing my own version of LSPM, I better get a good understanding of the approach.

A good introduction is to use the “classic” example of estimating pi using the area of a circle. So, imagine a circle of radius, r = 1, centered on the origin. The area of the circle in the first quadrant (x, y >=0) is pi/4. Imagine a square of side = 1, bottom left corner at the origin. The area of the square is 1. If we randomly sprinkle N points over the square, N.pi/4 of them, on average, will end up inside the circle and N.(1 – pi/4) will end up outside. N = 1,000 looks like this:

Monte Carlo techniques are often discussed in terms of integrations. In this case we are integrating the function:

f(x, y) = \left\{\begin{matrix} \textup{0,  } x^{2} + y^{2} > r^{2} \\ \textup{1,  } x^{2} + y^{2} \leq r^{2} \end{matrix}\right.
\frac{\pi }{4} = \int_{y = 0}^{1} \int_{x = 0}^{1} f \left ( x, y \right ) dx.dy

only we are using an approximation:

\frac{\pi}{4} \approx \frac{1}{N} \sum_{i=1}^{N} f \left ( x_{i}, y_{i} \right ) \textup{ where } x_{i},y_{i} \sim \textit{U} \left [ 0, 1 \right ]

How accurate will this estimate of pi/4 be? Well, dropping a point at random into the x, y plane is a Bernoulli trial with a probability of p = pi/4 of being inside the circle. We will estimate p using N trials, so the standard deviation of the distribution of the means will be inline sigma = sqrt{frac{p.left ( 1 – p right )}{N}} allowing us to derive a confidence interval along with our estimate of pi/4.

On average, p = pi/4. So, for an estimate built from 10,000 points, we would expect a sd in the estimate of pi to be 4 x sqrt((pi/4)*(1 – pi/4)/10000) = 0.01642. To see if this is how the process comes out, we should try it out many, many times take the average and sd of the estimates of pi.

Below is an R script that generates a random set of trial x, y coordinates (uniformly distributed between zero and one) of size “trials”. Then it tests each pair of coordinates to see if they are inside or outside the circle. It totals all those inside the circle by the total number of trials to estimate pi/4. It takes 4 lines of code. Finally, the estimate is multiplied by 4 to get an estimate for pi itself.

The script repeats the process “runs” times so we can get a distribution of estimates for pi. Most of the script is to generate the three plots in this blog post: an example of one test (above), a distribution of estimates of pi and a qq-plot of the distribution of estimates to show that it is normally distributed (at the end of the post).

# Ian Rayner July 2011
# Estimating pi using MC simulation – uniform dist’n
# equation of circle of radius r centered on origin is
# x^2 + y^2 = r^2
# So x ~ U(0,1) and y ~ U(0,1), if x^2 + y^2 > r^2 then
# point is outside circle.
# ratio of points inside circle to total points = pi / 4

trials <- 1000
radius <- 1
runs <- 10000
plot.on <- T

pi.est <- vector(mode=”numeric”, length=runs)

for (i in 1:runs){
x.loc <- runif(trials, 0, radius)
y.loc <- runif(trials, 0, radius)
test <- (x.loc^2 + y.loc^2) <= radius^2
pi.est[i] <- sum(test) / trials

pi.est <- 4 * pi.est

w <- max(abs(pi.est – pi))
breaks=pi + seq(-w, w, l=17)
norm.counts <- runs * pnorm(breaks, mean(pi.est), sd(pi.est))
norm.counts <- norm.counts[-1] – norm.counts[-length(norm.counts)]if (plot.on){
plot(x.loc, y.loc, main=”Example Samples”, pch=”.”)
lines(x=seq(0, radius, l=1001),
y=sqrt(radius^2 – seq(0, radius, l=1001)^2),
col=”red”, lwd=2)

histo <- hist(pi.est, breaks=breaks, plot=F)
plot(histo, ylim=c(0, max(histo$counts, norm.counts)),
main=”Histogram of estimates of pi”,
sub=paste(“RED = pi, BLUE = mean =”, round(mean(pi.est), 5),
” sd = “, round(sd(pi.est), 5),
” trials = “, trials, sep=””),
xlab=”Estimate of Pi”)
abline(v=pi, col=”red”, lwd=4)
abline(v=mean(pi.est), col=”blue”, lwd=2)
points(histo$mids, norm.counts,
qqplot(norm.counts, histo$counts,
main=”Q-Q Plot Counts for Est Pi vs Normal Dist’n”)
abline(0, 1)

Below are the results of running the script. Each MC simulation involves 10,000 trials. The script runs the simulation 1,000 times. The left chart below shows a histogram of the results of each run. pi is marked on the x-axis in red. The blue line and blue dots represent the normal approximation to the expected distribution using the mean and the sd of all 1,000 runs. On the right is a Q-Q plot comparing the actual distribution to the normal distribution using the mean and sd.

The mean of the runs is 3.14195 vs pi = 3.14159; an error of 0.01%. The sd of the distribution was 0.01632 vs 0.01642 calculated above; an error of 0.6%. So the MC simulation performs as expected. But can we do better? It is immediately apparent that we are testing a far larger area than we need to. The entire area of interest (where the curve runs) is in the top right half of the square. Maybe we should try a different distribution for the population of test points than a uniform distribution on the unit square. The next post will take this step.

Share This


If you found this post informative, please share it!

New Commodity Pool Launches

Please provide your name and email address so we can send you our monthly compilation of new commodity pools registered with NFA.

You can unsubscribe at any time.

Thank you! Your file will be available after you have completed a two-step confirmation process. Check your in-box for step 1.

Biggest Hedge Funds By $AUM

Please provide your name and email address so we can send you our monthly compilation of biggest hedge funds by $AUM as reported on the SEC's Form ADV.

You can unsubscribe at any time.

Thank you! Your file will be available after you have completed a two-step confirmation process. Check your in-box for step 1.