Date of Award

Fall 2007

Project Type

Dissertation
Program or Major

Computer Science

Degree Name

Doctor of Philosophy

First Advisor

Philip J Hatcher


GlySpy is a suite of algorithms used to determine the structure of glycans. Glycans, which are orderly aggregations of monosaccharides such as glucose, mannose, and fucose, are often attached to proteins and lipids, and provide a wide range of biological functions. Previous biomolecule-sequencing algorithms have operated on linear polymers such as proteins or DNA but, because glycans form complicated branching structures, new approaches are required. GlySpy uses data derived from sequential mass spectrometry (MSn), in which a precursor molecule is fragmented to form products, each of which may then be fragmented further, gradually disassembling the glycan. GlySpy resolves the structures of the original glycans by examining these disassembly pathways.

The four main components of GlySpy are: (1) OSCAR (the Oligosaccharide Subtree Constraint Algorithm), which accepts analyst-selected MSn disassembly pathways and produces a set of plausible glycan structures; (2) IsoDetect, which reports the MSn disassembly pathways that are inconsistent with a set of expected structures, and which therefore may indicate the presence of alternative isomeric structures; (3) IsoSolve, which attempts to assign the branching structures of multiple isomeric glycans found in a complex mixture; and (4) Intelligent Data Acquisition (IDA), which provides automated guidance to the mass spectrometer operator, selecting glycan fragments for further MSn disassembly.

This dissertation provides a primer for the underlying interdisciplinary topics (carbohydrates, glycans, MSn, and so on) and also presents a survey of the relevant literature with a focus on currently available tools. Each of GlySpy's four algorithms is described in detail, along with results from their application to biologically-derived glycan samples. A summary enumerates GlySpy's contributions, which include de novo glycan structural analysis, favorable performance characteristics, interpretation of higher-order MSn data, and the automation of both data acquisition and analysis.