Date of Award
Spring 2025
Project Type
Dissertation
Program or Major
Statistics
Degree Name
Doctor of Philosophy
First Advisor
Qi Zhang
Second Advisor
Linyuan Li
Third Advisor
Pei Geng
Abstract
Rare features are predictor variables with excessively low rates of nonzeros. Rare features are common in settings where data are represented through one-hot encoding, such as text mining data or genomic data. They pose problems for classic regression techniques because their effect estimates are unstable, and the problem is compounded when the dimension of the feature space is high. Yan and Bien (2020) explored methods for aggregating rare features in high dimensions by leveraging side information about relations between features that can be organized as a tree. Whereas their work is restricted to standard Gaussian regression, we address the rare feature aggregation problem in the Generalized Linear Model (GLM) setting. We also explore a more general graph structure by considering bipartite graph representations of known group memberships of effects.
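The aggregation idea can be illustrated with a small sketch. This is not the estimator developed in the dissertation. It assumes binary one-hot features and a hypothetical group-membership matrix `A` (a bipartite graph between features and groups, with `A[j, k] = 1` when feature `j` belongs to group `k`). Summing the features that share a group produces aggregated features with higher nonzero rates. Tying coefficients through β = Aγ, a reparametrization in the spirit of the one used by Yan and Bien (2020), makes the linear predictor Xβ equal to (XA)γ. The helper names `nonzero_rate` and `aggregate` are illustrative only.

```python
import numpy as np


def nonzero_rate(X):
    """Fraction of rows in which each column (feature) is nonzero."""
    return (X != 0).mean(axis=0)


def aggregate(X, A):
    """Sum features within each group.

    A is a hypothetical p x g binary membership matrix (a bipartite graph
    between p features and g groups): A[j, k] = 1 if feature j is in group k.
    Column k of X @ A counts how many group-k features are active in each row.
    """
    return X @ A


# Toy one-hot-style data: 6 observations, 4 rare binary features.
X = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])

# Hypothetical groups: features {0, 1} -> group 0, features {2, 3} -> group 1.
A = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
])

print(nonzero_rate(X))                 # each feature is nonzero in 1 of 6 rows
print(nonzero_rate(aggregate(X, A)))   # each group is nonzero in 2 of 6 rows

# If coefficients are tied through beta = A @ gamma, the linear predictor
# X @ beta equals the aggregated design times gamma.
gamma = np.array([0.5, -1.0])
beta = A @ gamma
assert np.allclose(X @ beta, aggregate(X, A) @ gamma)
```

Because the membership matrix is arbitrary, the same construction covers overlapping groups, where a feature belongs to several groups. A tree is a special case in which each column of `A` marks the leaves below one node.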
Recommended Citation
Duckett, Matthew, "Logistic Regression and Cox Hazard Modeling with Sparse High Dimensional Data via Elastic Net Regularization and Graph-Guided Aggregation" (2025). Doctoral Dissertations. 2910.
https://scholars.unh.edu/dissertation/2910