Published Jan. 12, 2016
The National Science Foundation has awarded ICES Professor Inderjit Dhillon $1.2 million to develop scalable and sophisticated machine learning methods to analyze big data.
"With an ever-growing ability to collect and archive data, massive data sets are becoming increasingly common," said Dhillon, a professor of computer science. "These data sets are often too big to fit into the main memory of a single computer. So there is a great need for developing scalable and sophisticated machine learning methods for their analysis."
Over the next four years, Dhillon's research through the ICES Center for Big Data Analytics will further develop NOMAD, his open source software that enables practitioners in different application areas to quickly solve big data problems.
Previous attempts have sought to distribute big data computation across multiple machines. However, the stochastic optimization and inference algorithms that are so effective for large-scale machine learning appear to be inherently sequential, which makes distributing them across machines impractically slow.
By contrast, Dhillon, who directs the ICES Center for Big Data Analytics, has begun developing a novel "nomadic" framework that overcomes this barrier.
"This will be done by showing that many modern machine learning problems have a certain 'double separability' property," he said. "The aim is to exploit this property to develop convergent, asynchronous, distributed, and fault-tolerant algorithms that are well suited to the commodity hardware prevalent on today's cloud computing platforms."
Specifically, over a four-year period, Dhillon's group will develop: (1) parallel stochastic optimization algorithms for the multi-machine cloud computing setting; (2) theoretical guarantees of convergence; (3) open source code under a permissive license; and (4) applications of these techniques to a variety of problem domains, such as topic models and mixture models.
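The article does not spell out what "double separability" means. As the term is commonly used, an objective is doubly separable when it can be written as a sum of terms that each depend on only one row parameter and one column parameter. The Python sketch below illustrates why that matters for parallelism, using matrix factorization as an assumed example problem. The function names, learning rate, and data are hypothetical and chosen for illustration; this is not the NOMAD code. It shows that two stochastic updates sharing no row or column index touch disjoint parameters, so they can be applied in either order (or at the same time on different machines) with the same result.

```python
import random

def sgd_step(W, H, i, j, rating, lr=0.1, reg=0.01):
    """One SGD step on a single term f_ij(w_i, h_j) of an assumed
    matrix-factorization loss. Only row i of W and row j of H change."""
    w_i, h_j = list(W[i]), list(H[j])
    err = rating - sum(w * h for w, h in zip(w_i, h_j))
    W[i] = [w + lr * (err * h - reg * w) for w, h in zip(w_i, h_j)]
    H[j] = [h + lr * (err * w - reg * h) for w, h in zip(w_i, h_j)]

def run(updates, W0, H0):
    """Apply a sequence of (i, j, rating) updates to copies of W0 and H0."""
    W = [list(row) for row in W0]
    H = [list(row) for row in H0]
    for i, j, rating in updates:
        sgd_step(W, H, i, j, rating)
    return W, H

random.seed(0)
W0 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]  # 4 row factors
H0 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(5)]  # 5 column factors

# Two hypothetical observations that share neither a row nor a column index.
a = (0, 1, 4.0)
b = (2, 3, 1.5)

# Because the updates touch disjoint parameters, the order does not matter.
order_independent = run([a, b], W0, H0) == run([b, a], W0, H0)
print(order_independent)
```

Updates that do share a row or column index would read and write the same parameters, which is why a distributed scheme has to control which worker handles which indices at any given moment.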