Date of Award

Spring 2024

Project Type

Dissertation

Program or Major

Mathematics

Degree Name

Doctor of Philosophy

First Advisor

Kevin M Short

Second Advisor

Gregory P Chini

Third Advisor

Mark Lyon

Abstract

Neural network applications are everywhere in our lives today. We can now design and train large neural networks with billions of parameters for a multitude of complex tasks. However, explaining the theoretical underpinnings of their success across such a wide range of applications remains extremely challenging. This dissertation focuses on the mathematical foundations of neural networks.

In Chapter 2, we investigate the error function of fully connected neural networks with the rectified linear unit (ReLU) activation function. We prove that for any linear neural network with a single hidden layer, all critical points are global minima. For nonlinear neural networks, however, some of the critical points of the error function arise from the least-squares solutions of certain local fitting problems, an observation that leads to a better understanding of how local minima occur.
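For concreteness, the error function studied here is presumably the standard sum-of-squares training error; a minimal sketch of the single-hidden-layer setup, with notation that is illustrative rather than the dissertation's own:

```latex
% Squared-error loss for a single-hidden-layer network (assumed form).
% Data (x_i, y_i), i = 1, ..., N; weights W_1, W_2; ReLU \sigma(t) = \max(t, 0).
E(W_1, W_2) = \frac{1}{2} \sum_{i=1}^{N} \bigl\| W_2\, \sigma(W_1 x_i) - y_i \bigr\|^2
```

In the linear case, \(\sigma\) is replaced by the identity, so \(E\) reduces to a least-squares problem in the product \(W_2 W_1\).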

Chapter 3 first gives a straightforward proof of the universal approximation theorem for ReLU neural networks. Building on this proof, we propose a tensor-product-based method for converting a neural network trained with sigmoid activation functions directly into a ReLU neural network, and we provide upper bounds on the approximation error of this method.
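One standard ingredient in such proofs, sketched here only as an illustration (the dissertation's argument may differ), is that a single-hidden-layer ReLU network reproduces piecewise-linear hat functions exactly, and sums of hat functions approximate any continuous function on a compact interval:

```latex
% Hat function on nodes a < b < c built from three ReLU units \sigma(t) = \max(t, 0):
\Lambda_{a,b,c}(x) = \frac{\sigma(x - a)}{b - a}
  - \left( \frac{1}{b - a} + \frac{1}{c - b} \right) \sigma(x - b)
  + \frac{\sigma(x - c)}{c - b}
% which equals 0 for x <= a, rises linearly to 1 at x = b, and returns to 0 for x >= c.
```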

In Chapter 4, we introduce the concept of the null space of nonlinear maps and analyze the null spaces of different types of neural networks, revealing inherent, exploitable weaknesses. We present an application of this null space analysis to image steganography, with experiments on several common datasets. The technique enables us to hide one image within another so that the neural network classifies the combined image based on the hidden image.
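As an illustration only (the precise definition is the dissertation's; this is the natural analogue of the kernel of a linear map), one can think of the null space as the set of input components that leave the network's output unchanged:

```latex
% Illustrative definition for a network f : R^n -> R^m, by analogy with the linear case:
\mathcal{N}(f) = \{\, z \in \mathbb{R}^n \;:\; f(x + z) = f(x) \ \text{for all inputs } x \,\}
% Content placed along such directions is invisible to the classifier, so one might
% arrange for the visible cover image to lie (approximately) in this set while the
% hidden image carries the information that drives the classification.
```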

Chapter 5 introduces the concept of intrinsic dimension: the minimum dimension of a linear subspace needed to effectively approximate a nonlinear relationship between input and output data. Through two designed experiments, we observe that the first hidden layer of a fully connected neural network typically learns the most important subspace, corresponding to the intrinsic dimension of a data set. Based on these observations, we also discuss potential applications, including quantifying overfitting. A minimal numerical sketch of this idea, using a toy weight matrix and hypothetical names rather than the dissertation's experiments, appears below: the subspace captured by the first layer can be read off from the singular values of its weight matrix.
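```python
# Illustrative sketch (not the dissertation's code): estimate the subspace captured
# by the first hidden layer of a fully connected network from the singular values of
# its weight matrix W1 (shape: hidden_dim x input_dim). The "effective rank" below is
# one simple proxy for the learned intrinsic dimension; the threshold and the toy data
# are hypothetical choices.
import numpy as np

def effective_rank(W1: np.ndarray, energy: float = 0.99) -> int:
    """Smallest number of right singular directions of W1 whose squared
    singular values capture the requested fraction of the total energy."""
    s = np.linalg.svd(W1, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)

def principal_subspace(W1: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis (k x input_dim) of the top-k right singular subspace,
    i.e. the input directions the first layer is most sensitive to."""
    _, _, Vt = np.linalg.svd(W1, full_matrices=False)
    return Vt[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for a trained first-layer weight matrix: inputs live in R^50,
    # but only a 3-dimensional input subspace actually matters.
    basis = rng.standard_normal((3, 50))
    W1 = rng.standard_normal((64, 3)) @ basis + 0.01 * rng.standard_normal((64, 50))
    k = effective_rank(W1)
    print("estimated intrinsic dimension:", k)                    # expected: about 3
    print("subspace basis shape:", principal_subspace(W1, k).shape)
```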

The dissertation concludes with a discussion of these findings and an outline of future work.
