# An empirical study of covariate adjustment and confounder-selection strategies in logistic regression

Logistic regression is commonly used in epidemiological research because it allows the analyst to adjust for additional covariates when estimating the relationship between an exposure and a disease. One type of covariate, known to epidemiologists as a confounder of the exposure-disease relationship, is of particular concern. The effects of covariate adjustment in logistic regression are often assumed to be identical to those observed in classical linear regression, but others have suggested that the two differ with respect to the bias, precision, and efficiency of the adjusted exposure estimate. The first objective of the present study was to demonstrate empirically the effects of covariate adjustment in logistic regression analyses of cohort data and to examine the distribution of the difference between the crude and adjusted estimates. The second objective was to compare four strategies that have been used to identify confounders and other covariates for adjustment in logistic regression models: the percent change-in-estimate method; the significance test of the covariate; the significance test of the difference between the crude and adjusted exposure effect estimates; and the significance test of the product of the exposure-covariate and covariate-disease effect estimates. The results of the simulation study suggest that the type I error rate for testing the exposure estimate is often inflated when any of the selection strategies is used, supporting the notion that a priori information should be used whenever it is available. In the absence of such information, however, a selection strategy is desirable. If the goal of the analysis is to adjust for the covariates that yield the least biased and most efficient exposure estimate, then the percent change-in-estimate strategy performs best at a sample size of 100, and the significance test of the covariate performs best at a sample size of 500.
If the goal of the analysis is to identify covariates that change the exposure estimate specifically through confounding or mediating effects, then the significance test of the product of the exposure-covariate and covariate-disease effect estimates is more appropriate.
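Three of the selection criteria compared above reduce to simple calculations on fitted coefficients and their standard errors. The Python sketch below illustrates them, assuming the estimates (log odds ratios and standard errors) have already been obtained from fitted models; model fitting itself is not shown. Several choices here are assumptions rather than details from this study: the 10% change threshold and α = 0.05 are common conventions, the percent change is computed relative to the adjusted estimate (some analysts divide by the crude estimate instead), and the product test uses the first-order Sobel standard error. All function and constant names are hypothetical. The test of the difference between crude and adjusted estimates is omitted because its standard error requires the covariance of the two estimates.

```python
import math

CHANGE_THRESHOLD_PCT = 10.0  # assumed conventional cut-off, not from this study
ALPHA = 0.05                 # assumed significance level


def percent_change_in_estimate(beta_crude: float, beta_adjusted: float) -> float:
    """Percent change in the exposure log odds ratio after adjustment.

    Assumption: the change is expressed relative to the adjusted estimate.
    """
    return 100.0 * abs(beta_crude - beta_adjusted) / abs(beta_adjusted)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def wald_test_covariate(beta_cov: float, se_cov: float) -> float:
    """Wald test of the covariate coefficient in the adjusted model."""
    return two_sided_p(beta_cov / se_cov)


def product_test(a: float, se_a: float, b: float, se_b: float) -> float:
    """Sobel-type test of a*b.

    a: exposure-covariate effect estimate, b: covariate-disease effect estimate.
    Assumes a and b are approximately independent and normally distributed.
    """
    se_ab = math.sqrt(b * b * se_a * se_a + a * a * se_b * se_b)
    return two_sided_p(a * b / se_ab)


if __name__ == "__main__":
    # Hypothetical estimates, for illustration only.
    pct = percent_change_in_estimate(0.80, 0.65)
    print(f"change-in-estimate: {pct:.1f}% -> adjust: {pct >= CHANGE_THRESHOLD_PCT}")
    p_cov = wald_test_covariate(0.40, 0.10)
    print(f"covariate test p = {p_cov:.4f} -> adjust: {p_cov < ALPHA}")
    p_prod = product_test(0.5, 0.1, 0.4, 0.1)
    print(f"product test p = {p_prod:.4f} -> adjust: {p_prod < ALPHA}")
```

With these hypothetical inputs, a crude log odds ratio of 0.80 and an adjusted one of 0.65 differ by about 23%, so the covariate would be flagged for adjustment under a 10% rule. Note that in logistic regression a change in estimate can arise even without confounding, which is one reason the product test targets confounding or mediation more directly.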