Ming Yan receives grant from NSF Division of Mathematical Sciences as Principal Investigator
- Mar 26, 2017
In the last two decades, the size of data sets in a large number of areas has grown quickly. In many applications of machine learning, the training data sets are massive and may be collected and stored at different locations. Learning a model from these data sets places high demands on algorithms for computation, memory, and data transfer. Asynchronous parallel algorithms address these large-scale problems by using high performance computing while reducing communication and idle time. They can perform much better than synchronous parallel algorithms, especially when the number of cores is large. However, theoretical analysis of the convergence and convergence rates of these algorithms is still under investigation.
In this proposal, the PI will develop fast and robust generic asynchronous parallel stochastic frameworks with provable convergence for solving large-scale fixed point problems that have applications in a large number of areas. One objective is to develop asynchronous stochastic algorithms for finding a zero point of a random operator, of the sum of a random operator and a deterministic operator, and of the sum of two random operators, and to show the convergence of these algorithms. Another objective is to couple coordinate updates into these asynchronous stochastic algorithms and show their convergence. The last objective is to implement these algorithms and develop software that helps people without knowledge of parallel computing run asynchronous algorithms. The research in solving fixed point problems is motivated by problems in various computational sciences and engineering, and its development benefits all these fields by providing fast and robust algorithms. Areas impacted by the proposed work include machine learning, optimization, optimal control, statistics, finance, signal and image processing, compressive sensing, as well as other lines of research involving large data sets and distributed data.
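To make the idea of asynchronous coordinate updates for a fixed point problem more concrete, here is a minimal single-process sketch in Python. It is not the PI's algorithm or software. It only simulates asynchrony by letting each update read a possibly outdated copy of the iterate, and it uses a deterministic operator (a Jacobi map for a small diagonally dominant linear system) rather than a random one. The names `jacobi_component` and `async_coordinate_fixed_point`, the matrix `A`, the vector `b`, and the delay model are all assumptions chosen for illustration.

```python
import random

# Hypothetical example problem: solve A x = b, whose solution is x = [1, 1, 1].
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]


def jacobi_component(i, x):
    """Return the i-th component of the Jacobi map T(x).

    A fixed point x = T(x) solves A x = b. Because A is strictly
    diagonally dominant, T is a contraction in the max norm.
    """
    off_diag = sum(A[i][j] * x[j] for j in range(len(x)) if j != i)
    return (b[i] - off_diag) / A[i][i]


def async_coordinate_fixed_point(T_i, x0, num_updates, max_delay, seed=0):
    """Simulate asynchronous coordinate updates for x = T(x).

    Each step picks a random coordinate i and sets x[i] = T_i(x_hat),
    where x_hat is a stale copy of the iterate that is at most
    `max_delay` updates old (an assumed bounded-delay model).
    """
    rng = random.Random(seed)
    x = list(x0)
    history = [list(x0)]  # recent iterates, newest last
    for _ in range(num_updates):
        delay = rng.randint(0, min(max_delay, len(history) - 1))
        x_hat = history[-1 - delay]      # possibly outdated read
        i = rng.randrange(len(x))        # coordinate chosen at random
        x[i] = T_i(i, x_hat)             # update one coordinate only
        history.append(list(x))
        if len(history) > max_delay + 1:
            history.pop(0)               # keep only the delay window
    return x


if __name__ == "__main__":
    x = async_coordinate_fixed_point(jacobi_component, [0.0, 0.0, 0.0],
                                     num_updates=2000, max_delay=5)
    print(x)
```

In a real asynchronous implementation the stale reads would come from several cores updating shared memory without waiting for each other. The simulated delay above only stands in for that behavior, and the convergence seen here relies on the contraction property of this particular example.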