Inspired by reader Sluggo I am going to take a look at the issue of convergence. I will use the simple Monte Carlo method to estimate π introduced in my previous post. We are going to briefly explore how the estimate of π converges on the true value as we increase the size of our trial (the number of random samples we use).

If we know how accuracy varies with the number of trials, we can better design our method to achieve the accuracy we desire without having to iterate; we can better manage the trade-off between accuracy and absorbing computing resources.

I plan to explore the relationship between the accuracy of the estimate of the integral and the fraction of the search space occupied by the integral in a future post.

Skip the following recap if you already read the prior post:

We are trying to estimate the value of pi by sprinkling, at random, points over a unit square. The unit square contains a quarter-circle of unit radius centered at (0,0). This sets us up so the circle occupies 50% of the square. We sum the points inside the circle (under the curve) and divide by the total number of points to arrive at a fraction, p. This fraction should equal π/4, so π=4.p.

Another way to look at this is we are completing the following integration:

\frac{\pi }{4} = \int_{y = 0}^{1} \int_{x = 0}^{1}f\left ( x, y \right ) dx dy \textit{ where }
f(x, y) = \left\{\begin{matrix}  \textup{0, } x^{2} + y^{2} > r^{2}\\ \textup{1, } x^{2} + y^{2} \leq r^{2}  \end{matrix}\right.

only we are using an approximation:

\frac{\pi }{4} \approx \frac{1}{N} \sum_{i=1}^{N} f\left(x_{i}, y_{i} \right) \textup{ where } x_{i},y_{i} \sim \textit{U} \left[ 0, 1 \right]

Pretty straightforward!

Each time we drop a point onto the unit square we are completing a Bernoulli trial: the point is either inside the circle or it is not with a probability equal to the ratio of the area of the quarter-circle to the area of the square (p = 1/2 π/4). It is this probability we are trying to estimate.

If we have sufficient trials and p is sufficiently far away from 0 and 1, then we can use the normal approximation to the binomial distribution in which the standard deviation (the root mean squared error) of the estimate of p is sqrt{p.left ( 1 – p right ) / {N}} where N is the number of trials. So we should expect the error in our estimate of p to decline with the square root of N: to reduce our error by 50% we need to quadruple the size of our sample.

ASIDE: There are various rules of thumb that purport to tell you what “sufficient trials” and “far away from 0 and 1” mean. For example n.p > 5 AND n.(1 – p) > 5. In the end it depends on how accurate you want your final estimate to be. Consider a simulation of 100 trials estimating a fraction of 0.05. If you repeat this simulation over and over the distribution of outcomes will NOT be approximately normal since zero is the lowest number of points possible in the smaller partition, yet it complies with the rule of thumb.

I wrote a script in R to demonstrate these effects. I try out MC simulations with various numbers of trials from 2 to 100,000. For each simulation I estimate the error in the estimate of pi using the normal approximation. I make an alternate estimate of the error using the binomial distribution itself and calculating the span between the 15.87 percentile and the 84.13 percentile (1 sd either side of the mean). I run each MC simulation 1000 times and average out the estimates of the errors. Finally, I calculate the theoretical values for the errors in the estimates of pi for the various numbers of trials.

All is then plotted on log-log charts and we see that, when N is large enough, we get what we expect – a straight line with a -1/2 slope:

Error vs trials for normal distribution.
Error vs. trials Bernoulli distribution.

The Normal version shows the problem with using the normal approximation for very small trials – the actual error differs widely from the expected error. The Bernoulli version shows a lot of jitter for low number of trials, but theory pretty much agrees with practice.

The jitter is to do with percentiles and low population size. For example, if you only do 2 trials, there are only 3 possible outcomes (both in, both out, one in and the other out). 25% of the time the estimate for π is zero, 50% it is 2 and 25% it is 4.  So the 15th percentile has a value 2 and the 85th has a value 4 so the estimated error is (4 – 2)/2 = 1!  Since one would never do a MC simulation of this size, don’t invest time understanding this part of either chart, it was included for the sake of completeness.

To make use of the normal approximation in order to figure out an appropriate number of trials, we need an estimate of p to start with. Start with a simulation using, say, 1000 trials. This allows us to make an initial estimate of p and determine a confidence interval. Let’s say we estimate p to be 0.785, and thus estimate the error in p to be sqrt(0.785 x 0.215 / 1000) = 0.013. This gives us a 95% confidence interval of 0.760 to 0.810. Our error will be worst if the “true” value is 0.76. Further, let’s say we want the standard error in the estimate of p to be 0.0025, thus:

\sqrt{\frac{0.76.\left ( 1 - 0.76 \right )}{N}} \leq 0.0025 \Rightarrow N \geq 29,184

This is equivalent to an error of 0.01 in the estimate of π (π = 4.p => error in π = 4 x error in p; 4 x 0.0025 = 0.01). Looking at the charts above we see that to get errors of the order of 10^-2 we need trials in the order of 10^4. We would run our MC simulation using, say, 30,000 trials. Finally we would check that our expected error is indeed within tolerance requirements.

ps: Can someone tell me how best to post an R script for easy download? This would be much easier than posting the script in a text box which requires tons of editing for appearance.

EDIT: strike through above – initially I set the radius sqrt(2/π) then thought better of it, forgot to change the text.

EDIT: typos, expanded a couple of calculations to make them easier to follow.

EDIT: clarified (I hope) the derivation of the charts.

New Commodity Pool Launches

Please provide your name and email address so we can send you our monthly compilation of new commodity pools registered with NFA.

You can unsubscribe at any time.

Thank you! Your file will be available after you have completed a two-step confirmation process. Check your in-box for step 1.

Biggest Hedge Funds By $AUM

Please provide your name and email address so we can send you our monthly compilation of biggest hedge funds by $AUM as reported on the SEC's Form ADV.

You can unsubscribe at any time.

Thank you! Your file will be available after you have completed a two-step confirmation process. Check your in-box for step 1.

Share This


If you found this post informative, please share it!