Ignacio Cano, Markus Weimer, Dhruv Mahajan, Carlo Curino and Giovanni Matteo Fumarola


In large organizations, data is “born” in data centers all around the world. Learning requires a global view of such data. This new class of geo-distributed machine learning (GDML) applications needs to cope with: 1) scarce and expensive cross-data center bandwidth, and 2) growing privacy concerns that are pushing for stricter data sovereignty regulations.

In this paper, we formalize this problem, show that the current state of the art lacks proper support for GDML applications, and propose an initial system and algorithm that perform training in a geo-distributed fashion. Our empirical evaluation confirms the general validity of our approach, but many research challenges remain open.


An extended version of the paper is available as arXiv:1603.09035.


  title={Towards Geo-Distributed Machine Learning},
  author={Cano, Ignacio and Weimer, Markus and Mahajan, Dhruv and Curino, Carlo and Matteo Fumarola, Giovanni},
  booktitle={Learning Systems Workshop at NIPS 2015},