Date of Award

Spring 2025

Project Type

Dissertation

Program or Major

Statistics

Degree Name

Doctor of Philosophy

First Advisor

Qi Zhang

Second Advisor

Linyuan Li

Third Advisor

Pei Geng

Abstract

Rare features are predictor variables with excessively low rates of nonzero values. They commonly arise in settings where data are quantified through one-hot encoding, such as text mining or genomic data. Rare features pose problems for classic regression techniques because their effect estimates are unstable, and the problem is compounded when the dimension of the feature space is high. Yan and Bien (2020) explored methods for aggregating rare features in high dimensions by leveraging side information about relations between features that can be organized as a tree graph. While their work is restricted to standard Gaussian regression, we aim to address the rare feature aggregation problem in the Generalized Linear Model (GLM) setting. Additionally, we explore the use of a more general graph structure by considering bipartite graph representations of known group memberships of effects.
