# heavytailed

## July 10, 2014

### On testing statistical libraries

Filed under: Uncategorized — Tags: , , , — heavytailed @ 7:54 pm

I have been frustrated for a very long time about the lack of good libraries for computing linear mixed models. The typical R packages (lme, lmer, etc) don’t quite provide enough access to the underlying functionality: I’ve got covariance matrices for my random effects in hand, and can’t specify a single variable “growth ~ fertilizer + rain|field” that will generate the matrix. I’ve been repurposing statistical tools (GCTA) for this purpose, and providing fake GRMs which are just the covariance matrices I want to use for random effects. It works, but it’s not anywhere near an actual library: it’s a command-line tool.

So: I forked statsmodels and started to develop my own. They’ve got a very nice framework for MLE into which LMM/GLMM fits very well (REML/AI-REML will be more of a square peg/round hole situation), and the implementation took about an hour. But the problem is: how do you seriously test a new entry into a statistical package? There are lots of problems here — lme and lmer don’t always agree on effect estimates, and any regularization applied is not documented, so using “truth” for testing in this case is just not possible. We can test degenerate cases (no random effects) – but here, too, decisions about regularization can make testing tricky. At what point are you “close enough”? I’m talking here specifically about testing correctness, as what I want to test is part of a library of methods for statistical inference, and not statistical software per se. That is, concerns about dealing with nasty data or missing values(pdf) fall slightly upstream of this core.

The answer I settled on: test the pieces. If the likelihood is correct, the Jacobian is correct, the Hessian is correct, and the optimizer is doing the right things (and if the system isn’t sharing state in some weird way between these pieces), we can be reasonably sure that the MLE coming out the other end is right. So how to test these pieces?

Testing likelihood

Likelihood is usually pretty easy to test. The function itself typically has a closed form, even if it’s in terms of “special” functions. In this case, being a multivariate normal, the test is even easier. We can  directly evaluate the likelihood by a simple transformation of the data: remove the proposed mean, multiply by the square root of the proposed covariance (one can use Cholesky decomposition to do this), and then the resulting data can be evaluated as independent draws from a normal distribution. It’s reassuring when this very orthogonal approach produces the same likelihood value.

Testing the Jacobian

This is really the point of the whole post. Say you’ve got a closed-form solution for your Jacobian, how do you test that it’s correct? You can re-implement it in the testing file, but then how can you be sure you’ve not been wrong twice? You can verify that it’s zero at the known maximum, but it’s not really hard to get that wrong (what if, for instance, you got the sign wrong?)

The central maxim of testing mathematical functions is test your invariants. For instance, a great way to test an implementation of a binomial coefficient approximation is by verifying that Newton’s identity still holds:

$\sum_k\binom{m}{k}(1 + z)^k = (1 + z)^m \;\;\; z \in [0, 1)$

In the case of the Jacobian, we can test that the Taylor expansion is accurate (or: test that the implementation of the closed-form solution matches the numerically-derived Jacobian). In particular, we care about a system of Taylor expansions about a test point $x$, and we let $f_k = f(x)$ while $f_{k+r} = f(x + r\epsilon)$ for a small $\epsilon$. So for instance we might have

$f_{k-2} = f_k - 2\epsilon f'_k+ \frac{(2\epsilon)^2}{2!}f''_k - \frac{(2\epsilon)^3}{3!}f'''_k + \frac{(2\epsilon)^4}{4!}f^{(4)}_k - \dots$

$f_{k-1} = f_k - \epsilon f'_k + \frac{\epsilon^2}{2!}f''_k - \frac{\epsilon^3}{3!}f'''_k + \frac{\epsilon^4}{4!}f^{(4)}_k - \dots$

$f_{k+1} = f_k + \epsilon f'_k + \frac{\epsilon^2}{2!}f''_k + \frac{\epsilon^3}{3!}f'''_k + \frac{\epsilon^4}{4!}f^{(4)}_k + \dots$

$f_{k+2} = f_k + 2\epsilon f'_k + \frac{(2\epsilon)^2}{2!}f''_k - \frac{(2\epsilon)^3}{3!}f'''_k + \frac{(2\epsilon)^4}{4!}f^{(4)}_k + \dots$

A clever individual can remove the effect of the second-derivative (in fact, all even terms) by differencing, for instance

$D_1 = f_{k+1} - f_{k-1} = 2\epsilon f'_k + 2\frac{\epsilon^3}{3!}f^{(3)}_k + 2\frac{\epsilon^5}{5!}f^{(5)}_k + O(\epsilon^7)$

$D_2 = f_{k+2} - f_{k-2} = 4\epsilon f'_k + 2\frac{(2\epsilon)^3}{3!}f^{(3)}_k + 2\frac{(2\epsilon)^5}{5!}f^{(5)}_k + O(\epsilon^7)$

The pattern being $D_j = \sum_{r=1}^\infty 2 \cdot \frac{(j\epsilon)^{(2r-1)}}{(2r-1)!}f^{(2r-1)}_k$. This becomes powerful when arranged into a matrix, for instance, a 9-th power approximation would be

$\left(\begin{array}{c}D_1 \\ D_2 \\ D_3 \\ D_4\end{array}\right) = \left(\begin{array}{cccc} 2 & 2 & 2 & 2 \\ 4 & 16 & 64 & 256 \\ 6 & 54 & 486 & 4374 \\ 8 & 128 & 2048 & 32768\end{array}\right)\left(\begin{array}{c} \epsilon f'_k \\ \frac{\epsilon^3}{3!} f^{(3)}_k \\ \frac{\epsilon^5}{5!} f^{(5)}_k \\ \frac{\epsilon^7}{7!} f^{(7)}_k\end{array}\right) + O(\epsilon^9) = A_4 g_4(\epsilon, f) + O(\epsilon^9)$

This can be extended indefinitely to arbitrary precision. One might argue that a low-order approximation with very small $\epsilon$ should work, but the problem is that if the likelihood function is not changing very rapidly, evaluating it at a very small $\epsilon$ away may produce the same value, so we need wider spacing to ensure that the evaluation really does produce different values; but we still want very accurate approximations to the jacobian. Because we’d like to have exactitude at around 15 decimal places (machine epsilon is typically $\sim 2^{-52} \approx 2 \times 10^{-16}$ , taking the approximation to $D_7$ which is $O(\epsilon^{13})$ meets this requirement for $\epsilon = 0.01$. The error rate depends also on the higher-order derivatives. In the case of the normal likelihood, these derivatives are very much bounded.

A direct formula for $f'_k$ can be obtained by multiplying together the appropriate widgets; first, $D$ can be written as

$D_3 = \left(\begin{array}{ccccccc} 0 & 0 & -1 & 0 & +1 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 & +1 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 & +1\end{array}\right)\left(\begin{array}{c}f_{k-3} \\ f_{k-2} \\ f_{k-1} \\ f_k \\ f_{k+1} \\ f_{k+2} \\ f_{k+3}\end{array}\right)$

(The pattern should be obvious). Given this, a direct formula for $f'_k \approx f'(x)$ is:

$O(\epsilon^{2n-1}) + f'_k = \frac{\mathbf{e}_1^\dagger(A_n^{-1} D_n )f_n}{\epsilon}$

where $D_n$ is the $n$th-order differencing matrix, $A_n$ is the $n$th-order Taylor difference matrix (defined above), $\mathbf{e}_1$ is a $n \times 1$ vector with a 1 in the first position only, and zeros everywhere else, and $f_n$ is the vector of function evaluations at $f(x - n\epsilon), f(x - (n-1)\epsilon), \dots, f(x + (n-1)\epsilon), f(x + n\epsilon)$.  The numerator can be precomputed for very fast evaluation. In fact, for the 13th-order approximation, the quadrature vector is

$q_{7} = \frac{1}{360360} \left(\begin{array}{c} -15 \\ 245 \\ -1911 \\ 9555 \\ -35035 \\ 105105 \\ -315315 \\ 0 \\ 315315 \\ -105105 \\ 35035 \\ -9555 \\ 1911 \\ -245 \\ 15\end{array}\right)$

The approximate partial derivative in the direction of $x + \epsilon$ would then be $\frac{q_{7}^\dagger f_7}{||\epsilon||}$.

Update: Here’s a plot of the error of this for the function $f(x) = \sin(\pi * x^2)$


q7 = 1/360360 * c(-15, 245, -1911, 9555, -35035, 105105, -315315, 0, 315315, -105105, 35035, -9555, 1911, -245, 15)

x = 1:10000/2000

f <- function(x) { sin(pi * x^2)}

df <- function(x) { cos(pi * x^2) * 2 * pi * x}

err <- sapply(1:9000, function(i){t(q7) %*% f(x)[i + 1:15]/(x[2] - x[1]) - df(x)[i + 8]})

plot(x[8 + 1:9000], err, type='l', ylab='absolute error', bty='n', xlab='x')



So pretty low error. It blows up towards the end, but then again, so does $f^{(13)}(x)$.