Date of Award

Spring 2022

Project Type


Program or Major

Public Policy

Degree Name

Master of Science

First Advisor

Semra Aytur

Second Advisor

John McInally

Third Advisor

Brian Paciotti


This thesis uses health care domain knowledge, statistical techniques, and machine learning methods to conduct an exploratory real-world evidence study of the patients of the Health Care for the Homeless of Manchester, NH (HCHM) clinics, in collaboration with academic and clinic partners and the public and community health stakeholders who support their work. By constructing and analyzing a multivariate feature set from a sample of anonymized patient data covering January 1, 2018, through December 31, 2019, I hope to use machine learning methods to accurately represent the 2,265 HCHM clinic patients who experienced homelessness or housing insecurity during that period. By collaborating regularly with analytics and clinical experts at HCHM, I hope to accurately describe the clinics’ service populations and help staff identify care gaps, enabling the enrichment of future interventions for homeless people in the primary care setting. By engaging in strategic science (Bunnell, Ryan & Kent, 2021), I hope to reduce bias in the study of this vulnerable population. The study period pre-dates the COVID-19 pandemic and is designed to provide a baseline analysis that will allow future comparisons of HCH patients’ sub-population characteristics and health care needs before, during, and after the pandemic.

The Introduction outlines the public health crisis of homelessness in the United States, connects the goal of providing care for people experiencing homelessness with the ongoing work of ensuring health equity, introduces the National Health Care for the Homeless Council and its care paradigm, and describes the care provided by the Manchester, NH clinics within the city context. The Data chapter describes the data sources used to create the aggregated data set and the safeguards put in place to protect the privacy and dignity of the people whose medical records were used in the study. The Feature Development section details the data set cleaning process and the development of the multivariate features, including local weather-based features and ICD-10 code-based condition categories specific to the challenges of persons experiencing homelessness. The Description chapter provides descriptive statistics for the patient sample and outlines the health risks of clinic patients. The modeling goal was to use the full feature set, without removing outliers, to describe the variation in characteristics of clinic patients and group them into meaningful sub-populations by their utilization patterns. The Modeling section discusses the evolution of the model in detail, describes the dimension reduction and clustering algorithms applied to partition the data into service groups with specific characteristics, and explains how those characteristics could be discovered. The Service Groups chapter outlines the relationships between the discovered clusters and the patient service groups validated by HCH partners. The Discussion and Limitations chapter summarizes and expands on how the insights from this study may help the clinics, the community, the clients, and the health care system provide future care to people experiencing homelessness and advance health equity. It then discusses the limitations of the data, features, approach, and algorithms used in the study, and touches on the generalizability of the study and on how ethics and bias considerations in research and algorithmic use were applied here. The thesis concludes by recommending directions for building upon this work in the future.