It’s been about five months since I said this post was in draft, so it’s about time I reined in my perfectionism and published the damn thing.

Since this is a graduate student blog at the moment, it seems reasonable I should write a bit more about what I’m learning at any particular time. Late last year, our department had a talk by Erik van Doorn, from the University of Twente, which looked at birth-death processes, and asymptotic results for how quickly they can be expected to converge to their equilibrium distribution. It was an overview of his paper Representations for the decay parameter of a birth-death process based on the Courant-Fischer theorem, to appear in the Journal of Applied Probability.

A good deal of the talk introduced and used basic results on orthogonal polynomials, so I went to see if any of my books mentioned the subject. It turned out there was a chapter on them in Approximation Theory and Methods by Michael Powell – a book that’s been on my bookshelf for about five years but hardly been read – regarding their use in Gaussian quadratures. The following is mostly spliced together from that chapter, and my undergraduate notes on Numerical Analysis.

Interpolation

Before we talk about quadratures, it’s best if we start with interpolation. Say we have some function over some interval, where we can take a few sample values, with no measurement error, but we have no explicit formula and can’t afford to sample it everywhere. We thus would like to use our sample values to fit an approximating function to the whole interval. One simple way to do this is to try to fit a polynomial through the sample points. We can do this by assigning each sample point a Lagrange polynomial

$l_k(x) = \prod_{n \neq k} \frac{x-x_n} {x_k-x_n} ,$

with value one at that sample point and zero at all the others. For example, if we take our sample points at -1,-0.5,0,0.5, and 1, then the Lagrange polynomials are those in the plot below. There’s a light grey line at one to help check they are equal to one or zero at the sample points.

Our fitted curve will then just be a sum of these Lagrange polynomials, multiplied by their corresponding sample value, so we get a polynomial passing through all the sample points, and estimate the function $f(x)$ as

$\hat{f}(x) = \sum_k f(x_k) l_k(x) .$

This gives a curve that passes through all the interpolation points with the smallest-order polynomial possible. It works well for estimating functions that are, indeed, polynomials, but for other functions it can run into problems. In particular, there are cases where the difference between the interpolation curve and the true function at certain points increases when we increase the number of sample points, so we can’t necessarily improve the approximation by adding points. There’s also the question of where to sample the original function, if we have control over that. I’ll pass over these issues, and move on to integration.

Now say that, instead of approximating a function with some samples, we want to approximate a function’s integral by sampling its value at a few points, or

$\int_a^b f(x) \, \textrm{d} x \simeq \omega_0f(x_0) +\omega_1f(x_1) +\ldots +\omega_nf(x_n) .$

If we want to focus on making the integration accurate when $f$ is a low-order polynomial, the quadrature with $n+1$ sample points is exact for polynomials up to order $n$ if we set the weights as

$\omega_k = \int_a^b l_k(x) \, \textrm{d} x.$

In other words, a quadrature is equivalent to fitting an interpolation curve, and integrating over it. For example, if we’re integrating a function over the interval $[-1,1] ,$ we could simply take one sample, with weight one. This would give a quadrature of $2f(x_0) ,$ which is exact for any zero-order, constant function, regardless of the position of $x_0.$

We could take samples at the endpoints, to get the quadrature $\frac{1} {2} f(-1) +\frac{1} {2} f(1) ,$ and we can set $f(x)$ to be constant, or proportional to $x,$ to see the result for first-order polynomials is exact.

We could also take the endpoints and the midpoint. Then we have $\frac{1} {3} f(-1) +\frac{4} {3} f(0) +\frac{1} {3} f(1) ,$ which is exact for polynomials up to order two.

However, occasionally we stumble onto a quadrature that does a little better than expected. For the first quadrature above, since our interval is symmetric around zero, if we let $x_0=0$ any first-order term will be antisymmetric around this midpoint, so this quadrature is exact for first-order polynomials too. Similarly, the second quadrature is exact for quadratic terms, but the third quadrature can still only deal with quadratics, and can’t handle cubics.

Considering what happened when we placed the sample points for the first quadrature at zero, we might guess this is something to do with where we place our sample points. If so, how should we place our sample points, and what’s the highest-order function we can exactly integrate with any set number of samples? To answer this, we can use orthogonal polynomials.

Orthogonal polynomials

We say two vectors are orthogonal when their inner product is equal to zero. For example, if the inner product is simply the dot product, then

$\langle x,y \rangle = \sum_{k=1}^n x_ky_k =0,$

and so vectors are orthogonal if they are perpendicular to each other.

We have a similar definition and example for orthogonal polynomials, but now we choose an inner product that integrates over an interval instead of summing over two vectors’ dimensions:

$\langle f,g \rangle = \int_a^b f(x)g(x)\,\textrm{d} x =0.$

We can then choose a sequence of polynomials with increasing order that are all orthogonal to each other. For example, we can start the sequence with $f_0(x)=1,$ or some multiple of it. We then seek a first-order polynomial $f_1(x)=Ax+B$ such that

$\int_a^b 1\times(Ax+B) \,\textrm{d} x =\frac{A} {2} (b^2-a^2) +B(b-a) =0.$

This can be any multiple of $x-(b+a) /2 .$ In many cases we wish the orthogonal polynomials to have be orthonormal, i.e. $\langle f_k,f_k\rangle =1,$ so for the above we require

$\int_a^b C_0^2 \,\textrm{d}x = C_0^2 (b-a) = 1,$

\begin{aligned} \int_a^b C_1^2(x-(b+a)/2)^2 \,\textrm{d}x &= C_1^2 \frac{1}{3} \left[\left(b-\frac{b+a}{2}\right)^3 -\left(a-\frac{b+a}{2}\right)^3\right] \\&= C_1^2 \frac{2} {3} \left(\frac{b-a} {2}\right)^3 \\&=1,\end{aligned}

and so on, giving a condition for the value of each scaling factor $C_k.$ We can then find the next term by looking for a second-order polynomial that is orthogonal to $1$ and $x,$ and so on. In the case where $a=-1$ and $b=1$ this gives a simple sequence of polynomials that begins with

$f_0(x)=\frac{1}{\sqrt{2}} ,\, f_1(x)=\sqrt{\frac{3}{2}} x,$

$f_2(x)=\sqrt{\frac{5}{2}} (\frac{3}{2} x^2-\frac{1}{2}) ,\, f_3(x) =\sqrt{\frac{7}{2}} (\frac{5}{2}x^3-\frac{3}{2}x),\ldots$

This is an orthonormal version of the Legendre polynomials.

Since any polynomial can then be expressed as a linear combination of members of this sequence, each polynomial in the sequence is also orthogonal to any polynomial with lower order. So, for example, $f_3$ is orthogonal to all constant, linear, and quadratic polynomials.

To be continued
The next post will explain why these orthogonal polynomials help us decide on interpolation points.

After a few more posts, I’m planning to return to quadratures to talk about something I’ve mentioned on other topics: since the above procedure for quadratures gives a point estimate, we have no idea of how much uncertainty we have in our estimate. I’m therefore going to talk a bit about Bayesian quadratures. In particular, I’m going to start with a 1988 paper by Persi Diaconis called “Bayesian Data Analysis”, and fill in the gaps for those of us whose knowledge of integrated Brownian motion isn’t quite up to speed.