Leverage Scores in Large-Scale Machine Learning and Data Analysis
Petros Drineas
Rensselaer Polytechnic Institute
Tuesday, February 18
2:30 p.m., Lecture Hall
The Singular Value Decomposition (SVD) of matrices and the related Principal Components Analysis (PCA) are workhorses of machine learning and data analysis. They express a matrix in terms of its singular vectors (eigenvectors), which are optimal in many ways; however, these vectors are linear combinations of all the input data, and thus lack an intuitive physical interpretation, which is also of interest in many machine learning and data analysis applications. Motivated by applications of PCA and SVD to the analysis of population genetics data, we will discuss the notion of leverage scores: a simple statistic that reveals which columns/rows of a matrix lie in the subspace spanned by the top principal components (left/right singular vectors). We will then use leverage scores to present matrix decompositions that express the structure of a matrix in terms of actual columns (and/or rows) of the matrix. Such decompositions are easier to interpret in applications, since the selected columns and rows are subsets of the data. We will also discuss extensions of leverage scores that reveal influential entries of a matrix.
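As a concrete sketch of the statistic described above (not part of the talk itself): given the truncated SVD, the leverage score of a row is the squared Euclidean norm of the corresponding row of the top-k left singular vectors, and likewise for columns with the right singular vectors. The matrix and the rank k below are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative data matrix: rows = samples, columns = features.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))

k = 5  # number of top singular vectors to keep (assumed for illustration)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Row leverage scores: squared norms of the rows of U_k (top-k left singular vectors).
row_leverage = np.sum(U[:, :k] ** 2, axis=1)

# Column leverage scores: squared norms of the columns of V_k^T.
col_leverage = np.sum(Vt[:k, :] ** 2, axis=0)

# Because the singular vectors are orthonormal, the scores sum to k.
print(round(row_leverage.sum(), 6))  # → 5.0
print(round(col_leverage.sum(), 6))  # → 5.0
```

Rows or columns with large leverage scores are the ones that a CUR-style decomposition would select as interpretable proxies for the top principal components.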