In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unmeasured correlation between observations from different timepoints.[1][2]
Regression coefficient (beta) estimates from the Liang-Zeger GEE are consistent, unbiased, and asymptotically normal even when the working correlation is misspecified, under mild regularity conditions. GEE is more efficient than generalized linear models (GLMs) in the presence of high autocorrelation.[1] When the true working correlation is known, consistency does not require the assumption that missing data are missing completely at random.[1] Huber-White standard errors improve the efficiency of Liang-Zeger GEE in the presence of serial autocorrelation but may remove the marginal interpretation. GEE estimates the average response over the population ("population-averaged" effects) with Liang-Zeger standard errors, and the response in individuals with Huber-White standard errors, also known as "robust standard error" or "sandwich variance" estimates.[3] Based on a limited literature review, Huber-White GEE has been used since 1997, and Liang-Zeger GEE dates to the 1980s.[4] Several independent formulations of these standard error estimators contribute to GEE theory. Placing the independent standard error estimators under the umbrella term "GEE" may exemplify abuse of terminology.
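The difference between a model-based variance and a "sandwich variance" estimate can be illustrated with a minimal sketch. It covers only a special case chosen for simplicity: an identity link with an independence working correlation, where the GEE point estimate coincides with ordinary least squares. The function names, the simulated data, and the absence of any small-sample correction are all assumptions of this illustration, not part of any cited method.

```python
import math
import random


def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def simulate_clusters(n_clusters=200, size=5, seed=0):
    """Hypothetical data: y = 1 + 2*x + u + noise, with x and u shared within a cluster."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        x = rng.gauss(0.0, 1.0)  # cluster-level covariate
        u = rng.gauss(0.0, 1.0)  # shared effect that induces within-cluster correlation
        clusters.append([(x, 1.0 + 2.0 * x + u + rng.gauss(0.0, 1.0))
                         for _ in range(size)])
    return clusters


def gee_independence(clusters):
    """Fit y = b0 + b1*x with identity link and independence working correlation.

    clusters: list of clusters, each a list of (x, y) pairs.
    Returns (beta, naive_cov, robust_cov); both covariances are 2x2 nested lists.
    """
    # "Bread": inverse of the sum over clusters of X_i^T X_i.
    xtx = [[0.0, 0.0], [0.0, 0.0]]
    xty = [0.0, 0.0]
    n_obs = 0
    for cluster in clusters:
        for x, y in cluster:
            row = (1.0, x)
            for a in range(2):
                xty[a] += row[a] * y
                for b in range(2):
                    xtx[a][b] += row[a] * row[b]
            n_obs += 1
    bread = inv2(xtx)
    beta = [bread[a][0] * xty[0] + bread[a][1] * xty[1] for a in range(2)]

    # "Meat": sum over clusters of the outer product of each cluster's score X_i^T r_i.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    rss = 0.0
    for cluster in clusters:
        score = [0.0, 0.0]
        for x, y in cluster:
            resid = y - beta[0] - beta[1] * x
            rss += resid * resid
            score[0] += resid
            score[1] += x * resid
        for a in range(2):
            for b in range(2):
                meat[a][b] += score[a] * score[b]

    phi = rss / (n_obs - 2)  # residual variance used by the model-based estimate
    naive = [[phi * bread[a][b] for b in range(2)] for a in range(2)]
    robust = matmul2(matmul2(bread, meat), bread)  # bread * meat * bread
    return beta, naive, robust


if __name__ == "__main__":
    beta, naive, robust = gee_independence(simulate_clusters())
    print("slope estimate:", beta[1])
    print("model-based SE:", math.sqrt(naive[1][1]))
    print("sandwich SE:   ", math.sqrt(robust[1][1]))
```

In this simulated setting the observations within a cluster are positively correlated and the covariate is constant within each cluster, so the model-based standard error, which treats all observations as independent, should understate the uncertainty relative to the sandwich estimate.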
GEEs belong to a class of regression techniques referred to as semiparametric because they rely on specification of only the first two moments. They are a popular alternative to the likelihood-based generalized linear mixed model, which is more at risk of losing consistency when the variance structure is misspecified.[5] The trade-off for obtaining consistent regression coefficient estimates under variance-structure misspecification is a loss of efficiency, which yields inflated Wald test p-values because the standard errors have higher variance than the optimal ones.[6] GEEs are commonly used in large epidemiological studies, especially multi-site cohort studies, because they can handle many types of unmeasured dependence between outcomes.
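The reliance on the first two moments can be made concrete by writing out the estimating equation in its commonly stated form. The notation below is an assumption of this sketch and is not taken from the text above: $Y_i$ is the response vector for cluster $i$, $\mu_i(\beta)$ its mean under the link $g(\mu_{ij}) = x_{ij}^\top \beta$, $v(\cdot)$ the variance function, $\phi$ a dispersion parameter, and $R(\alpha)$ the working correlation matrix.

```latex
% Estimating equation: only the mean \mu_i and the working covariance V_i are specified.
U(\beta) = \sum_{i=1}^{N} D_i^\top V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^\top},
\qquad
V_i = \phi \, A_i^{1/2} R(\alpha) \, A_i^{1/2},
\qquad
A_i = \operatorname{diag}\bigl( v(\mu_{ij}) \bigr).

% Sandwich ("robust") covariance estimate of \hat\beta.
\widehat{\operatorname{Cov}}(\hat\beta) = B^{-1} M B^{-1},
\qquad
B = \sum_{i=1}^{N} D_i^\top V_i^{-1} D_i,
\qquad
M = \sum_{i=1}^{N} D_i^\top V_i^{-1}
    \bigl( Y_i - \hat\mu_i \bigr) \bigl( Y_i - \hat\mu_i \bigr)^\top
    V_i^{-1} D_i.
```

No full likelihood for $Y_i$ appears in these expressions. If $V_i$ happens to equal the true covariance of $Y_i$, then $M$ and $B$ have the same expectation and the sandwich reduces to the model-based form $B^{-1}$.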