# Type I and type II error in hierarchical analysis of variance using logistic regression for dichotomous dependent variables

## Description

Hierarchical models seek to explain relations between individual-level and aggregate-level data. A new method of analysis for hierarchical data in which the outcome variable is dichotomous has been developed (Pullum, 1991). This method partitions the variance into a within-cluster component and a between-cluster component, and then uses logistic regression models to explain the variance in the two components in a unified equation.

Two questions arise. First, how do the number of contexts and the number of individuals within those contexts affect the power of Pullum's model under different degrees of dependence between the outcome variable and the predictors at both the individual level and the cluster level? Second, how does the power of Pullum's model compare with the power of conventional logistic regression models?

To answer these questions, this study uses simulation methodology in which random data sets are generated with constant total sample sizes of 1000 and 500 but varying numbers of clusters and of individuals per cluster. The simulations cover varying degrees of dependence between a dichotomous dependent variable and dichotomous individual-level and cluster-level predictors. These data sets are then analyzed using both Pullum's technique and conventional logistic regression, and the improvement chi-square statistic is determined for each term in the two models. Each configuration is simulated 1000 times, and the probability of rejecting the null hypothesis (that there is no association between outcome and predictor) is calculated. These probabilities are used to determine the Type I and Type II error rates for the two models.

Results show that the two methods do not differ when there are adequate numbers of individuals per cluster and adequate numbers of clusters.
However, the hierarchical analysis is more conservative than the conventional method in detecting individual-level and interaction effects when there are few individuals per cluster, and in detecting cluster-level effects for the individual-level variable when there are few clusters.
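
The conventional logistic regression arm of a simulation like this can be sketched in a few lines. The sketch below is illustrative only and rests on several assumptions not stated in the abstract: a single dichotomous individual-level predictor, no cluster-level effect, a 0.05 significance level, and hypothetical function names (`simulate_counts`, `improvement_chi2`, `rejection_rate`). It uses the fact that, for one dichotomous predictor, the improvement chi-square of a logistic regression over the intercept-only model equals the likelihood-ratio statistic G² of the 2x2 table. It does not reproduce Pullum's variance partition.

```python
import math
import random

# 0.95 quantile of the chi-square distribution with 1 degree of freedom.
# Assumption: a 0.05 significance level (the abstract does not state one).
CRITICAL_VALUE = 3.841


def simulate_counts(n_clusters, n_per_cluster, p_y_given_x, rng, p_x=0.5):
    """Draw one data set and return its 2x2 table, indexed counts[x][y].

    Assumption: no cluster effect, so clusters only fix the total sample
    size (n_clusters * n_per_cluster), as in the constant-total design.
    """
    counts = [[0, 0], [0, 0]]
    for _ in range(n_clusters * n_per_cluster):
        x = 1 if rng.random() < p_x else 0
        y = 1 if rng.random() < p_y_given_x[x] else 0
        counts[x][y] += 1
    return counts


def improvement_chi2(counts):
    """Likelihood-ratio statistic G^2 for a 2x2 table (1 df).

    Equals the improvement chi-square of a logistic regression with one
    dichotomous predictor over the intercept-only model.
    """
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [counts[0][j] + counts[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = counts[i][j]
            if observed > 0:  # empty cells contribute nothing
                expected = row[i] * col[j] / n
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2


def rejection_rate(n_reps, n_clusters, n_per_cluster, p_y_given_x, seed=0):
    """Proportion of replications in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        counts = simulate_counts(n_clusters, n_per_cluster, p_y_given_x, rng)
        if improvement_chi2(counts) > CRITICAL_VALUE:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    # No association: the rejection rate estimates the Type I error rate.
    print(rejection_rate(1000, 50, 20, (0.5, 0.5), seed=1))
    # Association present: the rejection rate estimates power,
    # and one minus it estimates the Type II error rate.
    print(rejection_rate(1000, 50, 20, (0.4, 0.6), seed=2))
```

Passing equal outcome probabilities for both predictor values simulates the null hypothesis, so the returned proportion estimates the Type I error rate. Passing unequal probabilities yields an estimate of power. Varying `n_clusters` and `n_per_cluster` while holding their product fixed mirrors the constant-total-sample design, although in this simplified sketch the split has no effect because no cluster-level term is modeled.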