Previous Work: The Berkeley Data Analysis System
In this project, researchers at ICSI are extending recent work on randomized algorithms for matrix-based machine learning problems and applying it to the computational infrastructure recently developed at the AMPLab, UC Berkeley. One challenge in large-scale machine learning is that MapReduce/Hadoop performs poorly on the iterative algorithms that are common in matrix-based machine learning. Examples include common algorithms for least-squares approximation, least absolute deviations approximation, and low-rank matrix approximation. Spark is a software framework developed at the AMPLab to address these and other issues, and efforts are already underway at the AMPLab to implement more sophisticated matrix and graph algorithms within that framework. ICSI scientists will collaborate with researchers from the AMPLab and IBM to implement randomized matrix algorithms in this framework and apply them to realistic machine learning use cases.
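To make the idea of a randomized matrix algorithm concrete, here is a minimal sketch of least-squares approximation by uniform row sampling, written in plain Python. It is an illustration only and not the project's implementation: the function names, the synthetic data, and the choice of uniform sampling are all assumptions made for this example. Randomized methods used in practice typically rely on more careful schemes such as random projections or non-uniform sampling, and they run on distributed matrices rather than Python lists.

```python
import random


def solve_least_squares(rows):
    """Fit y ~ a*x + b by solving the 2x2 normal equations directly."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    det = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


def sampled_least_squares(rows, sample_size, rng):
    """Solve the same problem on a uniform random subset of the rows.

    Uniform sampling is the simplest possible sketch (an assumption of
    this example); the smaller problem is then solved exactly.
    """
    sample = rng.sample(rows, sample_size)
    return solve_least_squares(sample)


def make_data(n, rng):
    """Hypothetical synthetic data: y = 2x + 1 plus small Gaussian noise."""
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_data(2000, rng)
    print("full solve:   ", solve_least_squares(data))
    print("sampled solve:", sampled_least_squares(data, 200, rng))
```

The sampled solve touches only a tenth of the rows yet should return coefficients close to the full solve. Trading a small, controlled loss in accuracy for a much smaller problem is what makes randomized approaches attractive at large scale.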