Scalable linear algebra and neural network theory
While deep learning methods have no doubt transformed certain applications of machine learning (ML), such as Computer Vision (CV) and Natural Language Processing (NLP), their promised impact on many other areas has yet to materialize. The reason for this is the flip side of why they have succeeded where they have. In the applications where deep learning has made the most remarkable progress, researchers have adopted the following strategy: gather large quantities of data; train a Neural Network (NN) model using stochastic first-order methods; and implement and deploy the model in a user-facing industrial application. There are many well-known limitations to this general approach, ranging from the need for large quantities of data and enormous compute resources to issues of interpretability and robustness.

This research aims to address a central technical issue underlying this approach: while linear algebraic techniques are central to the design and use of modern NN models, current methodology uses linear algebra in relatively superficial ways, e.g., matrix multiplication for computing stochastic gradient directions and backpropagating errors in a scalable manner. With stronger control over the complementary algorithmic and statistical foundations of linear algebraic methods, we will have a much smaller theory-practice gap and a more practical theory to guide NN practice in a broad range of applications beyond CV and NLP.
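To make the "superficial use" point concrete, the following is a minimal sketch (not drawn from this project) showing how both the forward pass and backpropagation of a small two-layer network reduce to a handful of matrix multiplications; all array names, sizes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n examples, d features, h hidden units, k outputs (illustrative sizes).
n, d, h, k = 256, 32, 64, 10
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, k))

# Two-layer network parameters.
W1 = rng.standard_normal((d, h)) / np.sqrt(d)
W2 = rng.standard_normal((h, k)) / np.sqrt(h)

# Forward pass: matrix multiplications plus an elementwise nonlinearity.
Z1 = X @ W1             # (n, h)
A1 = np.maximum(Z1, 0)  # ReLU
Yhat = A1 @ W2          # (n, k)

# Squared-error loss gradients, backpropagated -- again just matrix
# multiplications (with transposes) and an elementwise mask.
dYhat = (Yhat - Y) / n          # (n, k)
dW2 = A1.T @ dYhat              # (h, k)
dA1 = dYhat @ W2.T              # (n, h)
dZ1 = dA1 * (Z1 > 0)            # ReLU mask
dW1 = X.T @ dZ1                 # (d, h)

# One (stochastic-)gradient step.
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2
```

In this sense, linear algebra serves mainly as the computational workhorse for gradient evaluation, rather than informing the algorithmic or statistical design of the model itself.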
This research is in collaboration with researchers at Stanford University and is funded by a grant from the National Science Foundation (NSF).