# heavytailed

## July 16, 2012

### On the joint testing of Score statistics

Filed under: Uncategorized — heavytailed @ 8:25 pm

A recent (and continuing) hot topic in the field of statistical genetics is testing for association between rare variation and a particular trait of interest, be it a disease (e.g. schizophrenia) or a quantitative trait (e.g. height). A swarm of different statistical tests has been proposed, ranging from 0-1 comparisons to a resurgence of Neyman's C-alpha statistic. While researching this post, I stumbled upon a nice treatment of artificial regressions that makes clear the link between C-alpha tests and other tests which can be applied in the setting of a generalized linear model.

While the inspiration for this particular post derives from the field of statistical genetics, the subject is applicable to any situation where the effects are hypothesized to derive from infrequent events with different sources. In practice, this arises whenever you think the response is explained by many sparse explanatory variables (i.e. with few non-mean observations) rather than a handful of dense regressors. Within this setting, the standard model:

$Y = \beta_1X_1 + \beta_2X_2 + \dots + \gamma^\dagger Z = \beta^\dagger X + \gamma^\dagger Z$

lacks the power to say anything substantial about $\beta_1,\beta_2,\dots$, as $X_1,\dots$ are all sparse. It's probably too obvious to bring up, but this is a different kind of sparseness from that which is amenable to Least Angle Regression or the LASSO: it's not so much that the truly explanatory variables are a sparse subset of all the variables, it's that the number of effective observations of each explanatory variable (those that differ substantially from the mean, say) is very small.

Every burden test makes a substitution of the following form:

$Y = \beta^\dagger X + \gamma^\dagger Z \approx \tau S(X,\cdot) + \gamma^\dagger Z$

Where the notation $f(\cdot)$ indicates that the function $f$ can potentially take other parameters, whether endogenous or exogenous. Note that this is a kind of “approximate regression” in the sense that if $S(X,\cdot) \propto \beta^\dagger X$, then the ensuing test (LRT/Wald/Score/C-alpha) is optimally powered against the null. The entire space of burden tests is subsumed by the form of $S$, and as pointed out by Lin and Tang, the vast majority of such tests take $S(X) = \xi^\dagger X$ and differ only in the choice of $\xi$. To my knowledge, CAST (which treats multiple rare variants as redundant) and SKAT-O (which allows for constant interaction terms) are the only extant models with a non-linear form. The choice of test from this point forward can vary, but because it is possible for $|X| \gg |Z|$, the score statistic is typically used. As geneticists want to control for covariates (like gender), MLE is nearly always used to fit the null model $Y = \gamma^\dagger Z$, and in this case the C-alpha statistic is exactly (not just asymptotically) equivalent to the Score statistic.
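To make the linear form $S(X) = \xi^\dagger X$ concrete, here is a minimal numpy sketch of a weighted burden score statistic. The data are simulated under the null, and the weights $\xi_j = 1/\sqrt{p_j(1-p_j)}$ are one common (Madsen–Browning-style) choice, used here purely for illustration; everything in the setup is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n samples, m rare variants, intercept + one covariate.
n, m = 1000, 20
maf = rng.uniform(0.001, 0.01, size=m)                 # rare minor-allele frequencies
X = rng.binomial(2, maf, size=(n, m))                  # sparse genotype matrix
Z = np.column_stack([np.ones(n), rng.normal(size=n)])  # covariate design
Y = Z @ np.array([1.0, 0.5]) + rng.normal(size=n)      # null: no variant effect

# One common linear choice of S(X) = X @ xi: weight each variant by
# 1/sqrt(p(1-p)), so rarer variants count more (Madsen-Browning-style).
p = np.clip(X.mean(axis=0) / 2, 1 / (2 * n), None)     # avoid division by zero
xi = 1.0 / np.sqrt(p * (1 - p))
S = X @ xi                                             # collapsed burden score per sample

# Fit the null model by OLS and form the score statistic
# U = (Y - Z @ gamma_hat)^T S.
gamma_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ gamma_hat
U = resid @ S
```

Any other choice of $\xi$ (uniform weights, functional annotations, etc.) just swaps out the `xi` line; the score statistic itself is unchanged in form.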

But what if you want to test multiple hypotheses about the same underlying data; say different possibly overlapping subsets of sparse vectors of $X$, or differing functional forms of $S(X,\cdot)$? In the world of genetic association burden testing, one may want to test the hypothesis that all variants within a gene contribute to a trait, and the hypothesis that particular functional classes contribute — clearly highly related hypotheses. The nice thing about the Score statistic is that it depends not at all on the alternate hypothesis, but only on the null:

$U = (Y-g(\gamma^\dagger Z))^\dagger S(X,\cdot)$

where all of the above are vectors. The extension to multiple hypotheses is very straightforward: simply allow $S$ to be a matrix $[S^{(1)},\dots,S^{(k)}]$ and then

$\mathbf{U} = (Y-g(\gamma^\dagger Z))^\dagger \textbf{S}(X,\cdot)$
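The multivariate extension can be sketched directly: stack the per-hypothesis score vectors as columns of $\mathbf{S}$ and the residual inner product yields one score per hypothesis. The example below (entirely hypothetical data) tests the genetics pair mentioned above: all variants in a gene versus a “functional” subset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: n samples, m rare variants, covariates Z, null trait Y.
n, m = 1000, 20
X = rng.binomial(2, 0.01, size=(n, m))
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = Z @ np.array([1.0, -0.3]) + rng.normal(size=n)

# Two related hypotheses about the same gene: (1) all variants carry burden,
# (2) only a (hypothetical) "functional" subset does.
functional = np.zeros(m, dtype=bool)
functional[:8] = True
S = np.column_stack([
    X.sum(axis=1),                 # S^(1): unweighted burden over all variants
    X[:, functional].sum(axis=1),  # S^(2): burden over the functional subset
])

# Null fit, then the joint score vector U = (Y - Z @ gamma_hat)^T S:
# one entry per hypothesis. The entries are correlated, since the
# functional subset is nested inside the full set.
gamma_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
U = (Y - Z @ gamma_hat) @ S
print(U.shape)  # (2,)
```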

While this is derived in the context of linear models, it needn't be confined to that setting. In particular, there could be cases with small sample sizes wherein it would be preferable to test a set of a priori hypotheses about a parameter (while fitting others), rather than fitting all parameters (for instance, testing the distance between the means of a mixture of Gaussians while fitting the variance). Because the columns of $\textbf{S}(X,\cdot)$ may be correlated, the joint statistic may be distributed very differently from the product of the marginal distributions (that is, from treating each $S^{(i)}(X,\cdot)$ as independent). For OLS, the joint distribution (assuming the error term can be characterized by mean and variance alone) is a straightforward calculation, remembering that we only want to fit the null:

$\hat{\gamma} = (Z^\dagger Z)^{-1}Z^\dagger Y$

so that, substituting $Y = Z\gamma + \epsilon$:

$\hat{\gamma} = \gamma + (Z^\dagger Z)^{-1}Z^\dagger\epsilon \implies Y - Z\hat{\gamma} = \epsilon - Z(Z^\dagger Z)^{-1}Z^\dagger\epsilon \equiv \epsilon - \tilde{\epsilon}$

Then write, letting $\tilde{\mathbf{S}} = \mathbf{S}^\dagger$:

$\mathrm{Cov}(\mathbf{U}) = \mathbb{E}\left[\tilde{\mathbf{S}}(Y - Z\hat{\gamma})(Y - Z\hat{\gamma})^\dagger\tilde{\mathbf{S}}^\dagger\right]$

Expanding and canceling:

$=\mathbb{E}\left[\tilde{\mathbf{S}}(\epsilon - \tilde{\epsilon})(\epsilon - \tilde{\epsilon})^\dagger\tilde{\mathbf{S}}^\dagger\right]$

Which yields, since $\mathbb{E}\left[(\epsilon - \tilde{\epsilon})(\epsilon - \tilde{\epsilon})^\dagger\right] = \sigma^2\left(I - Z(Z^\dagger Z)^{-1}Z^\dagger\right)$ by idempotence of the projection:

$\mathrm{Cov}(\mathbf{U}) = \sigma^2\left(\mathbf{S}^\dagger\mathbf{S} - \mathbf{S}^\dagger Z(Z^\dagger Z)^{-1}Z^\dagger\mathbf{S}\right)$
The more general derivation (e.g. including link functions) relies on the specifics of Fisher's score test, which I'll post as a comment or perhaps as another post. There's a particular trick involved that's somewhat interesting. With the variance in hand, and the knowledge that the null distribution is asymptotically multivariate normal, your set of hypotheses can be jointly tested without the significant loss of power incurred by treating them as separate tests and correcting by some standard method.
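A sketch of the joint test, assuming the asymptotic multivariate-normal null: with covariance $\Sigma$ in hand, $T = \mathbf{U}\Sigma^{-1}\mathbf{U}^\dagger$ is asymptotically chi-square with $k$ degrees of freedom. The setup below is hypothetical; it uses $k = 2$ so the chi-square survival function has the closed form $e^{-T/2}$, and estimates $\sigma^2$ from the null residuals.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical null data with two correlated score columns, as above.
n = 500
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
X = rng.binomial(2, 0.05, size=(n, 10))
S = np.column_stack([X.sum(axis=1), X[:, :4].sum(axis=1)])
Y = Z @ np.array([1.0, 0.5]) + rng.normal(size=n)

# Score vector under the null fit, and its analytic covariance
# (sigma^2 estimated from the null residuals).
gamma_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ gamma_hat
sigma2 = resid @ resid / (n - Z.shape[1])
H = Z @ np.linalg.solve(Z.T @ Z, Z.T)
cov_U = sigma2 * (S.T @ S - S.T @ H @ S)

# Joint test: T = U Sigma^{-1} U^T is asymptotically chi-square with
# k = 2 degrees of freedom; for 2 df the survival function is exp(-T/2).
U = resid @ S
T = U @ np.linalg.solve(cov_U, U)
p_joint = np.exp(-T / 2)
```

Because the test accounts for the correlation between the two score columns, it avoids the conservativeness of, say, a Bonferroni correction applied to the two marginal tests.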