“Nonparametric Tests to Detect Relationship between Variables in the Presence of Heteroscedastic Treatment Effects”

Abstract: Statistical tools for detecting nonlinear relationships between variables are needed in many areas of practice. This dissertation first presents a test of independence between a response variable, which can be discrete or continuous, and a continuous covariate after adjusting for heteroscedastic treatment effects. The method first augments each data pair in every treatment with a fixed number of nearest neighbors as pseudo-replicates. A test statistic is then constructed by taking the difference of two quadratic forms equivalent to the average lagged correlations between the response and nearest-neighbor local estimates of the conditional mean of the response given the covariate for each treatment level. Using such differences eliminates the need to estimate any nonlinear regression function and consequently reduces the computational time. The asymptotic distribution of the proposed test statistic is obtained under the null hypothesis of independence. Although using a fixed number of nearest neighbors makes inference considerably more difficult than allowing the number of nearest neighbors to go to infinity, the parametric standardizing rate is obtained for the proposed test statistic. Numerical studies show that the new test procedure not only maintains the intended type I error rate, but also has robust power to detect nonlinear dependency in the presence of outliers that might result from highly skewed distributions.
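
As an illustration of the augmentation step only, the following is a minimal sketch and not the dissertation's implementation. It assumes that neighbors are chosen by distance in the covariate within each treatment level and that each observation counts among its own neighbors; the function name `augment_with_neighbors`, the window size `k`, and the toy data are hypothetical.

```python
import numpy as np


def augment_with_neighbors(x, y, k):
    """For one treatment level, attach to each (x_j, y_j) the responses of the
    k observations whose covariate values are closest to x_j (the point itself
    included, by assumption), to serve as pseudo-replicates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if not 1 <= k <= len(x):
        raise ValueError("k must be between 1 and the number of observations")
    # Pairwise absolute distances between covariate values.
    dist = np.abs(x[:, None] - x[None, :])
    # Indices of the k smallest distances in each row. With distinct covariate
    # values the point itself (distance 0) comes first; the stable sort breaks
    # any remaining ties in favor of the lower index.
    idx = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return y[idx]  # shape (n, k): row j holds the pseudo-replicates for x_j


# Hypothetical toy data with two treatment levels.
x_by_trt = {0: [1.0, 4.0, 5.0, 9.0], 1: [2.0, 3.0, 8.0]}
y_by_trt = {0: [1.0, 2.0, 3.0, 4.0], 1: [5.0, 6.0, 7.0]}
augmented = {
    t: augment_with_neighbors(x_by_trt[t], y_by_trt[t], k=3) for t in x_by_trt
}
```

Keeping `k` fixed as the sample size grows mirrors the fixed number of nearest neighbors described above; the test statistic itself is not reproduced here.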

The second part of the dissertation discusses the theory and numerical studies for testing the nonparametric hypotheses of no covariate-treatment interaction and no main covariate effect, based on a decomposition of the conditional mean regression function, which is potentially nonlinear. A similar test, but with the number of pseudo-replicates going to infinity, was discussed in Wang and Akritas (2006) for effects defined through a decomposition of the conditional distribution function. Consequently, their test statistics have a slow convergence rate. In addition, their hypotheses are not suitable to describe the behavior of the observations. The model and tests developed in the second part of this dissertation overcome the limitations of the tests in Wang and Akritas (2006) in both convergence rate and computational speed. Using an approach similar to that of parts one and two, the last part of the dissertation develops theory and numerical studies to test for no covariate-treatment interaction, no simple covariate effect, and no main covariate effect when the number of factor levels $(a)$ and thus the number of covariate values $(n_i)$ are large.