Critical values of Mahalanobis' D-Squared for detection of multivariate outliers
Description
The question of how to detect multivariate outliers has presented both philosophical and statistical problems. The method most widely used for the detection of multivariate outliers is Mahalanobis' D-Squared statistic ($D^2$), commonly viewed as analogous to a univariate standard score. $D^2$ is simple to calculate, and its asymptotic distribution is known to be the Chi-square distribution (Barnett, 1976, 1978a, 1978b, 1979; Beckman & Cook, 1983; Hawkins, 1974, 1980). Additionally, $D^2$ or $D^2$/df is available from major statistical packages such as BMDP and SPSSX. The distribution of $D^2$ values calculated using the sample centroid and variance-covariance matrix is thought to be mathematically intractable (Barnett, 1984; Wilks, 1963). Some researchers have suggested the use of ordered Chi-square (Barnett, 1984; Beckman & Cook, 1983; Hawkins, 1980) or ordinary Chi-square (Comrey, 1984; Rasmussen, 1988; Tabachnick & Fidell, 1983) critical values for evaluation of $D^2$ in the detection of outliers. Tables of ordered Chi-square critical values are not available; thus, it was necessary to compute these values for the present study. This study examined the fit of $D^2$ with the Chi-square and ordered Chi-square distributions via Monte Carlo methods and determined that neither provided accurate critical values. Consequently, critical values were generated empirically. The resulting tables of critical values cover the largest 25% of ordered $D^2$ for the conditions resulting from a full factorial cross of numbers of subjects (20, 30, 40, 50, 100, 200, 300, 500, 1000) and numbers of variables (2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 50).
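As an illustration of the kind of computation described above, the following is a minimal sketch, not the study's actual procedure. It computes $D^2$ from the sample centroid and variance-covariance matrix and then estimates, by simulation, a critical value for the largest $D^2$ in a sample. The function names, the multivariate normal null model, the number of replications, and the 0.05 level are all assumptions made for the example; the abstract does not specify them.

```python
import numpy as np


def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row from the sample centroid."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # unbiased (n - 1) sample covariance
    # Solve cov @ Z = centered.T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)


def empirical_critical_value(n, p, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo estimate of the upper-alpha critical value of the
    largest D^2 among n cases on p variables.

    Assumption (hypothetical setup): data are drawn from a standard
    multivariate normal. Because D^2 is computed from the sample mean
    and covariance, it is unchanged by shifting or rescaling the data,
    so the identity covariance loses no generality under normality.
    """
    rng = np.random.default_rng(seed)
    largest = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p))
        largest[r] = mahalanobis_d2(X).max()
    return np.quantile(largest, 1.0 - alpha)


if __name__ == "__main__":
    # One cell of the design: 20 subjects, 2 variables.
    print(empirical_critical_value(n=20, p=2))
```

Replacing `.max()` with a sorted slice of the $D^2$ values would extend the same idea to other ranks, such as the largest 25% of ordered values that the tables cover.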