Dimension Estimation using Random Connection Models
In statistics we often want to discover (or sometimes impose) structure on observed data, and dimension plays a crucial role in this task. The setting I consider in this talk is the following: some high-dimensional data has been collected, but it (potentially) lives in a lower-dimensional space, whose dimension is called the intrinsic dimension of the dataset. We assume access only to a certain graph in which each vertex represents an observation, and there is an edge between two vertices if the corresponding observations are close in some metric. The goal is to estimate the intrinsic dimension of the high-dimensional dataset from this graph alone. I will give conditions under which the dimension can be estimated consistently, together with bounds on the probability of correctly recovering an integer dimension. I will also show some numerical results and compare our estimators with competing approaches from the literature.
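To make the setting concrete, here is a toy sketch (not the estimator of the talk) of the kind of problem being solved: data with intrinsic dimension 2 is embedded linearly in a 10-dimensional ambient space, an epsilon-neighborhood graph is formed from pairwise distances, and a simple correlation-dimension style baseline recovers the dimension from edge counts at two radii. All names and parameter choices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n points drawn uniformly from a 2-D square,
# then mapped linearly into a 10-D ambient space.
n, d_intrinsic, d_ambient = 500, 2, 10
latent = rng.uniform(size=(n, d_intrinsic))
basis = rng.standard_normal((d_intrinsic, d_ambient))
X = latent @ basis  # observations in R^10, intrinsic dimension 2

# All pairwise Euclidean distances (upper triangle only).
diffs = X[:, None, :] - X[None, :, :]
dist_matrix = np.sqrt((diffs ** 2).sum(axis=-1))
iu = np.triu_indices(n, k=1)
dists = dist_matrix[iu]

# Epsilon-graph: an edge joins two observations whose distance is <= eps.
# Counting edges at radii eps and 2*eps gives a crude dimension estimate:
# the number of close pairs should scale like eps^d, so the log-ratio
# of the two edge counts (base 2) approximates d.
eps = np.quantile(dists, 0.01)
edges_eps = np.sum(dists <= eps)
edges_2eps = np.sum(dists <= 2 * eps)
d_hat = np.log(edges_2eps / edges_eps) / np.log(2)
print(f"estimated intrinsic dimension: {d_hat:.2f}")
```

This baseline already illustrates the key point of the abstract: the pairwise distances themselves can be discarded, and only the adjacency structure of the epsilon-graph (its edge counts, here at two scales) is used to infer the dimension.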
Dimensionality reduction techniques (e.g., PCA, manifold learning) usually rely on knowledge of the intrinsic dimension. Knowing the dimension also helps in avoiding the curse of dimensionality. From a computational perspective, the dimension of a dataset affects the amount of space needed to store the data (compressibility), and the speed of algorithms is commonly affected by the dimension of the input data as well.
This is joint work with Michel Mandjes.
KdVI meeting room, room F3.20