Speaker: Anastasios Kyrillidis, Noah Harding Assistant Professor, Computer Science Department, Rice University
Date and venue: Friday, April 15, 3-5 pm, ΤΜΗΥΠ, Amphitheater Γ
Abstract: Non-convex optimization lies at the heart of many engineering applications with far-reaching societal impact, especially through the wave triggered by machine learning and artificial intelligence: physics, healthcare, biology, software engineering, chemistry and materials science, among other areas. However, given the lack of theory, practitioners often resort to trial-and-error procedures, leading to heuristics. Characterizing when such heuristics turn out to be provable algorithms is a pressing need for the scientific community, and indeed for society as a whole. This talk will focus on two settings where non-convex optimization, combined with theory, accelerates problem solving.

In the first part, I will focus on the problem of low-rank matrix inference in large-scale settings. Such problems appear in fundamental applications such as structured inference, recommendation systems, and multi-label classification. I will introduce a novel theoretical framework for analyzing the performance of non-convex first-order methods, often used as heuristics in practice. These methods lead to computational gains over classic convex approaches, but their analysis remains open for most problems. The talk will provide precise theoretical guarantees, answering the long-standing question of why such non-convex techniques behave well in practice for a wide class of problems.

In the second part, we will change gears. Distributed machine learning (ML) can bring more computational resources to bear than single-machine learning, thus enabling reductions in training time. In practice, though, distributed ML is challenging when distribution is mandatory rather than chosen by the practitioner. I will describe a new approach to distributed fully connected neural network learning, called independent subnet training (IST), to handle these cases. In IST, the original network is decomposed into a set of narrow subnetworks with the same depth. These subnetworks are then trained locally before parameters are exchanged to produce new subnets, and the training cycle repeats. Such a naturally “model parallel” approach limits memory usage by storing only a portion of the network parameters on each device. Additionally, workers are not required to share data (subnet training is local and independent), and communication volume and frequency are reduced because the original network is decomposed into independent subnets. These properties allow IST to cope with distributed data, slow interconnects, or limited device memory, making it a suitable approach for cases of mandatory distribution. The talk will present results on MLPs, ResNets, CNNs, GCNs, and efficient pretraining tasks, as well as some theoretical guarantees.
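To give a concrete flavor of the first part, below is a minimal illustrative sketch (not the speaker's specific algorithm) of a non-convex first-order method for low-rank matrix inference: instead of a convex relaxation over a full matrix, the unknown is parameterized as U V^T and plain gradient descent is run on the factors. All sizes and hyperparameters here are assumptions for illustration.

    # Minimal sketch of factored gradient descent for low-rank approximation.
    # Illustrative only: sizes, step size, and initialization are assumptions.
    import numpy as np

    def factored_gradient_descent(M, rank, steps=2000, lr=1e-3, seed=0):
        """Fit M (m x n) with a product U @ V.T, where U: m x rank, V: n x rank."""
        rng = np.random.default_rng(seed)
        m, n = M.shape
        U = rng.standard_normal((m, rank))
        V = rng.standard_normal((n, rank))
        for _ in range(steps):
            R = U @ V.T - M          # residual of the current estimate
            grad_U = R @ V           # gradient of 0.5 * ||U V^T - M||_F^2 w.r.t. U
            grad_V = R.T @ U         # ... and w.r.t. V
            U -= lr * grad_U
            V -= lr * grad_V
        return U, V

    # Toy example: recover a synthetic rank-3 matrix from its full observation.
    rng = np.random.default_rng(1)
    M = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 80))
    U, V = factored_gradient_descent(M, rank=3)
    print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))   # relative error

The non-convexity comes from the bilinear factorization; the talk's guarantees concern methods of this type rather than this exact toy routine.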
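As a companion to the IST description above, here is a minimal single-process sketch of the decompose / train-locally / exchange cycle for a one-hidden-layer ReLU network, with workers simulated by a plain loop. The network sizes, the dropout-style rescaling of subnet outputs, and the plain-gradient local update are illustrative assumptions, not the exact algorithm from the talk.

    # Minimal IST-style sketch: partition hidden units into narrow subnets,
    # train each subnet locally, then write parameters back and repeat.
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hidden, d_out, n_workers = 20, 64, 1, 4

    # Full network parameters (one hidden layer, ReLU, regression head).
    W1 = rng.standard_normal((d_in, d_hidden)) * 0.1    # input  -> hidden
    W2 = rng.standard_normal((d_hidden, d_out)) * 0.1   # hidden -> output

    X = rng.standard_normal((256, d_in))
    y = np.sin(X[:, :1])                                # toy regression target

    def local_sgd(W1_s, W2_s, X, y, scale, steps=20, lr=1e-2):
        """Train one narrow subnet (MSE loss) for a few local gradient steps."""
        for _ in range(steps):
            H = np.maximum(X @ W1_s, 0.0)               # hidden activations
            pred = scale * (H @ W2_s)                   # rescale so the narrow subnet
                                                        # approximates the full output
            err = scale * (pred - y) / len(X)
            grad_W2 = H.T @ err
            grad_H = err @ W2_s.T
            grad_W1 = X.T @ (grad_H * (H > 0))
            W1_s -= lr * grad_W1
            W2_s -= lr * grad_W2
        return W1_s, W2_s

    for it in range(10):                                # outer IST-style rounds
        # Randomly partition the hidden units into disjoint subnets, one per
        # "worker" (workers are simulated here by a loop over the groups).
        groups = np.array_split(rng.permutation(d_hidden), n_workers)
        for g in groups:
            W1_sub, W2_sub = local_sgd(W1[:, g].copy(), W2[g, :].copy(),
                                       X, y, scale=n_workers)
            # Exchange step: write the locally updated weights back into the model.
            W1[:, g], W2[g, :] = W1_sub, W2_sub
        loss = 0.5 * np.mean((np.maximum(X @ W1, 0.0) @ W2 - y) ** 2)
        print(f"round {it}: full-model loss {loss:.4f}")

In a real deployment each group would live on a different device, so only the subnet parameters are communicated at the exchange step and no training data is shared between workers.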
- Optimization for machine learning
- Convex and non-convex algorithms and analysis
- Large-scale optimization
- Any problem that involves a math-driven criterion and requires an efficient method for its solution.