Friday, 17 May 2013

Monte Carlo Integration 2

In the previous post we talked about the simplest Monte Carlo procedure: throwing a bunch of points at your area and seeing which ones land inside. Now we are going to make things a bit more mathematical. By the way if you think the equations don't look good, I have been having trouble getting Mathjax to work with blogger, so I am using codecogs. This looks okay in Firefox but not so good in Chrome, and god help you if you are using IE. Getting the equations to look pretty is a process.

Let's try to evaluate the definite integral $I = \int_a^b f(x) dx$ by taking the interval $[a,b]$ and breaking it up into $N$ sub-regions. Each sub-region has width $\Delta = (b-a)/N$ and mid point $x_i = a + i \Delta / 2$. In a sub-region the function is almost constant so we can approximate it by its value at the mid-point. The integral evaluates the area under the curve $f(x)$ and we are approximating the area as a sum of rectangles, $I = \frac{b-a}{N} \sum_i^N f( x_i )$ click the image below to see how this gets better as $N$ increases. The function is $0.2 + 0.2 \sin(\pi x) sin(2 \pi x) + 0.3 sin(\pi x)$ which was chosen for no reason whatsoever.

This is nothing other than the mean value theorem, which says the integral is equal to the width of the interval times the average value, $\langle f \rangle$, of the function in the interval, to evaluate the integral we use $\langle f \rangle \approx \bar{f} = \frac{1}{N} \sum_i^N f( x_i ).$ We have used a regular grid to evaluate the average of $f(x)$, we could use a more complicated grid to give us a better estimate with fewer points, leading to Simpson's rule and so on. This is called quadrature. Instead we will try to evaluate the average by random sampling.

The Law of Large Numbers says that the sample average converges almost surely to the true average. While this seems kind of obvious eg. flipping a coin an infinite number of times will give on average 50% heads and 50% tails, the proof is long. Accepting that it is true, by sampling random points in the interval $[a,b]$ and using them to evaluate the function we can calculate the average which is equal to the true average for large enough number of samples $N$, $\langle f \rangle \approx \frac{1}{N} \sum_i^N f( x_i )$ where now the points $x_i$ are randomly distributed in $[a,b]$. This is the simplest kind of Monte Carlo integration. If instead of a one dimensional integral we have a multi-dimensional one then the formula for $\bar{f}$ is the same as long as $x_i$ is now interpreted as a random point in the multidimensional integration region, which has volume $V$, $I = \frac{V}{N} \sum_i^N f( x_i ).$ Evaluating the same integral as before by Monte Carlo looks like:

As you can see, this converges much more slowly, so why would this be a good idea? The quadrature scheme introduced above with rectangles is called the midpoint rule, and the error we introduce is proportional to $h^{d+2}$. With a total of $N$ points we can have $N^{1/d}$ little boxes which would give an error $N^{-2/d}$. In the Monte Carlo case? If we assume that evaluating the integral many times would give us a Gaussian distribution of values around the true value (which we will prove, or not, in the next post) we estimate the error from the sample variance: $\sigma^2 = \frac{1}{N-1} \sum_i^N \left( f(x_i) - \bar{f} \right)^2.$ This is $N-1$ instead of $N$ in order to be an unbiased estimate, it's a (slightly) complicated story, for any reasonable value of $N$ you would use the difference is not important. The variance of the integral is then $\frac{1}{N-1} \sum_i^N \left( \frac{ V ( f(x_i) - \bar{f} ) }{N} \right)^2 = \frac{ V^2 \sigma^2 }{N}.$ and the error goes like the square root of this: $\frac{ V \sigma }{\sqrt{N} }.$ The key thing to notice is that this just depends on the number of sample points, not the dimension of the integral.

So now the thing to do is evaluate the same integral using the same number of points for Monte Carlo and the midpoint rule and see which one does better. Assume that the integration region is the unit cube in d dimensions, that is the range $[0,1]$ for each axis. We will use the integral, $\frac{3}{d} \int dx_1 ... dx_d \left( x_1^2 + ... + x_d^2\right) = 1$ as our multidimensional integral example. The (click it) animation below plots the difference between the answer and the estimate from the midpoint rule versus the difference between the answer and the estimate from Monte Carlo.

Input number of dimensions 1 < d < 20:

Plotted is the log of the number of samples versus the log of the error. If $dI = AN^p$ then the log-log plot has slope $p$, the steeper the curve the faster the error is reduced. When $d>4$ the error from Monte Carlo for a given $N$ tends to be smaller. Any quadrature scheme gives an error proportional to $N^{-q/d}$; for some $q$ so there will always be a $d$ for which Monte Carlo is better. In some cases the number of dimensions (in the phase space of a statistical mechanical system for example) is absolutely enormous, $10^{80}$ or larger. No quadrature will work and Monte Carlo, although a not a good choice, is the only choice.