For my Ph.D. written qualifier (to demonstrate that I’m capable of pursuing a Ph.D., I suppose), I have opted to develop algorithms for estimating fraternity coefficients (also known as dominance coefficients). While the coancestry (a.k.a. kinship) coefficient has received significant attention since Peter Visscher’s seminal 2006 paper, the fraternity coefficient has received far less. There are fairly good reasons for this:

**1)** Equation (11) in Visscher’s paper gives the correlation between coancestry and fraternity as

This means that, for a typical random effects model, the coancestry and fraternity matrices differ, in expectation, by a multiplicative constant, and the model is therefore, in expectation, close to non-identifiable. **In siblings**.

**2)** While “unrelated” individuals will share some small fraction of their genome IBD by chance, one needs to effectively square that chance to obtain the fraction of their genome where *both* alleles are shared IBD. That said, the fact that coancestry estimation in unrelated individuals has proven so effective suggests that fraternity estimation may be valuable.

**3)** For common human diseases, very few genetic variants with dominance effects have been identified. I apologize that my reference for this is an unhelpful Google Scholar search; I am officially presenting the lack of findings as evidence that there’s very little to find (such is the nature of publication bias).
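Reasons 1 and 2 can be made a little more concrete with a single-locus toy calculation. The sketch below is not a reproduction of Visscher’s Equation (11); it only assumes non-inbred individuals and uses the textbook single-locus IBD probabilities for full siblings (1/4, 1/2, 1/4 for sharing 0, 1, 2 alleles), plus the standard non-inbred expressions for coancestry and fraternity in terms of the four parental coancestries. All function and variable names are made up for illustration.

```python
from fractions import Fraction
from math import sqrt

# Single-locus IBD distribution for non-inbred full siblings:
# number of alleles shared IBD -> probability.
SIB_IBD = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def expect(f, dist=SIB_IBD):
    """Expectation of f(k) over an IBD-state distribution."""
    return sum(p * f(k) for k, p in dist.items())


def shared_fraction(k):
    """Additive sharing: fraction of the two alleles shared IBD."""
    return Fraction(k, 2)


def both_shared(k):
    """Dominance sharing: 1 if both alleles are shared IBD, else 0."""
    return Fraction(int(k == 2))


def sib_correlation_squared():
    """Squared single-locus correlation between the two sharing measures."""
    mean_a, mean_d = expect(shared_fraction), expect(both_shared)
    var_a = expect(lambda k: shared_fraction(k) ** 2) - mean_a ** 2
    var_d = expect(lambda k: both_shared(k) ** 2) - mean_d ** 2
    cov = expect(lambda k: shared_fraction(k) * both_shared(k)) - mean_a * mean_d
    return cov ** 2 / (var_a * var_d)


def coancestry(t_mm, t_ff, t_mf, t_fm):
    """Coancestry of two individuals: mean of the four parental coancestries."""
    return (t_mm + t_ff + t_mf + t_fm) / 4


def fraternity(t_mm, t_ff, t_mf, t_fm):
    """Fraternity of two non-inbred individuals from parental coancestries.

    t_mm: mother-mother, t_ff: father-father, t_mf and t_fm: the cross pairs.
    """
    return t_mm * t_ff + t_mf * t_fm


if __name__ == "__main__":
    r2 = sib_correlation_squared()
    print("sibling correlation^2:", r2, "-> r =", round(sqrt(r2), 4))

    # Full siblings share both parents; a non-inbred parent has
    # coancestry 1/2 with itself, and the parents are unrelated.
    half = Fraction(1, 2)
    print("full-sib fraternity:", fraternity(half, half, 0, 0))

    # "Unrelated" pair: every parental pair has the same tiny coancestry t.
    t = Fraction(1, 1000)
    print("coancestry:", coancestry(t, t, t, t), "fraternity:", fraternity(t, t, t, t))
```

At a single locus the sibling correlation comes out as the square root of 2/3, about 0.82, which is already high before any genome-wide averaging. For the "unrelated" pair, a coancestry of t gives a fraternity of 2t², which is the squaring described in reason 2.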

Furthermore, the basic analysis has already been done and was published by the Visscher lab this past March. Using the homozygous covariance sharing matrix (see here or here for derivations), Zhu and Yang demonstrate that across ~80 human traits (although not diseases), dominance contributes little, if anything, to trait heritability.

So why bother? Well, there are a few reasons, none of which is particularly good on its own, but which together make a reasonable case for choosing this as a topic for what is effectively glorified coursework that is, per requirement, entirely unrelated to my actual thesis.

First, the methods above assume that individuals are not inbred. At all. At the limit where we care about estimating the very low probability of two individuals being homozygous for the same allele, it seems strange to assume that their parents could only share alleles across pairs, but not within pairs. And there are populations (comprising perhaps 14% of all of humanity) where consanguineous marriage is common, and the prevalence of neurogenetic disorders is increased in these populations. Since Hardy-Weinberg disequilibrium is expected in such pedigrees, it is entirely possible that dominance effects which have little impact in other populations contribute somewhat more substantially in these clades.

Second, the observation that dominance is not a significant factor for trait heritability applied to a number of quantitative physical traits, but the analysis was not extended to diseases, and in particular to complex diseases with two-hit or multi-hit hypotheses. Requiring two copies of a gene (or two copies of multiple genes) to be affected by mutation is the very definition of dominance.

Third, just as genome-wide heritability estimates can be improved (or made more accurate) by restricting the calculation to fixed segments (e.g. coding, regulatory), the same is true of fraternity coefficients, and doing so may yield interesting results.

Finally, the methods used for fraternity estimation estimate an allelic covariance, but *not* the identity-by-descent probability itself. The latter is more interesting to me at the moment because it reflects a classical genetic idea, but more importantly because it requires (in conjunction with inbreeding) multi-allelic sites (or full haplotypes) to calculate. I’ve never had a good opportunity to deal directly with statistical phasing (mostly because there are great algorithms out there already), but the fact that the haplotype assignment is, in this case, a *nuisance parameter* gives me a good excuse to finally implement an HMM.

Over the next few weeks, I’ll develop this idea into a mathematical estimator, then into a full algorithm, and finally test it on simulated and real data. We’ll see if it works.
