Variable Selection and Estimation for Error-Prone Regression Models
High-dimensional data have become increasingly common in recent years. Two challenges that frequently accompany such data are measurement error and the presence of irrelevant variables. It is well known that ignoring measurement error or falsely including irrelevant variables can lead to substantial bias and misleading conclusions. As a result, adjusting for measurement error effects and performing variable selection are crucial when constructing regression models. In this study, I accommodate error-prone and non-informative covariates and consider regression models with two types of complex responses: (1) binary responses subject to misclassification and (2) incomplete responses induced by censoring. To deal with measurement error, either a corrected estimating function or the simulation-extrapolation (SIMEX) procedure is proposed. The boosting algorithm is then employed to perform variable selection. Unlike conventional regularization methods, the boosting method does not require handling non-differentiable penalty functions and is applicable to general estimating functions. Moreover, theoretical results, including convergence of the boosting algorithm, consistency, and asymptotic normality, are established to justify the validity of the proposed method. Numerical studies are conducted to assess the performance of the proposed method, and the findings show that it corrects for measurement error effects and accurately detects important variables.
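As a rough illustration of the SIMEX idea mentioned above, the following Python sketch applies it to the simplest possible setting: a linear regression with a single covariate observed with additive normal error of known standard deviation. It is not the procedure developed in this study for misclassified or censored responses. The function name simex_slope, the grid of lambda values, the number of replicates, and the quadratic extrapolant are all illustrative assumptions.

```python
import numpy as np


def simex_slope(w, y, sigma_u, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0),
                n_rep=200, seed=0):
    """Illustrative SIMEX estimate of a simple linear regression slope.

    Assumes (hypothetically) that w = x + u with u ~ N(0, sigma_u^2)
    and that sigma_u is known. Returns (simex_estimate, naive_estimate).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)

    mean_slopes = []
    for lam in lambdas:
        slopes = []
        for _ in range(n_rep):
            # Simulation step: inflate the error variance to (1 + lam) * sigma_u^2.
            w_lam = w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.shape)
            # Naive least-squares slope that ignores measurement error.
            slopes.append(np.polyfit(w_lam, y, 1)[0])
        mean_slopes.append(np.mean(slopes))

    # Extrapolation step: fit a quadratic in lam and evaluate it at lam = -1,
    # which corresponds to zero measurement error.
    coef = np.polyfit(lambdas, mean_slopes, 2)
    return float(np.polyval(coef, -1.0)), float(mean_slopes[0])
```

In this sketch the naive slope is attenuated toward zero, and the extrapolated value typically moves it back toward the true slope. The quadratic extrapolant is only an approximation, so some residual bias usually remains.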