Geometric Sparsity in High Dimension
 Publication Year:
 2012

 Bepress 84

 Bepress 8
 Repository URL:
 https://scholar.colorado.edu/math_gradetds/15
 Author(s):
 Tags:
 Geometry; Highdimensional data; Noise; Sparsity; Applied Mathematics
thesis / dissertation description
While typically complex and highdimensional, modern data sets often have a concise underlying structure. This thesis explores the sparsity inherent in the geometric structure of many highdimensional data sets.Constructing an efficient parametrization of a large data set of points lying close to a smooth manifold in high dimension remains a fundamental problem. One approach, guided by geometry, consists in recovering a local parametrization (a chart) using the local tangent plane. In practice, the data are noisy and the estimation of a lowdimensional tangent plane in high dimension becomes ill posed. Principal component analysis (PCA) is often the tool of choice, as it returns an optimal basis in the case of noisefree samples from a linear subspace. To process noisy data, PCA must be applied locally, at a scale small enough such that the manifold is approximately linear, but at a scale large enough such that structure may be discerned from noise.We present an approach that uses the geometry of the data to guide our definition of locality, discovering the optimal balance of this noisecurvature tradeoff. Using eigenspace perturbation theory, we study the stability of the subspace estimated by PCA as a function of scale, and bound (with high probability) the angle it forms with the true tangent space. By adaptively selecting the scale that minimizes this bound, our analysis reveals the optimal scale for local tangent plane recovery. Additionally, we are able to accurately and efficiently estimate the curvature of the local neighborhood, and we introduce a geometric uncertainty principle quantifying the limits of noisecurvature perturbation for tangent plane recovery. An algorithm for partitioning a noisy data set is then studied, yielding an appropriate scale for practical tangent plane estimation.Next, we study the interaction of sparsity, scale, and noise from a signal decomposition perspective. Empirical Mode Decomposition is a timefrequency analysis tool for nonstationary data that adaptively defines modes based on the intrinsic frequency scales of a signal. A novel understanding of the scales at which noise corrupts the otherwise sparse frequency decomposition is presented. The thesis concludes with a discussion of future work, including applications to image processing and the continued development of sparse representation from a geometric perspective.