Scaling Federated Training on Distributed Infrastructures
TL;DR: A presentation on the system challenges of deploying federated training infrastructure, with practical guidelines for addressing them.
Abstract:
Federated learning has become the de facto standard for decentralised, privacy-preserving DNN training. Nonetheless, scaling federated learning remains challenging, whether in a simulation on a large cluster of machines or when transitioning from a simulated environment to a production environment with real mobile devices.
In this talk, we will first discuss the practical differences between scaling federated learning through simulations, on the cloud, and on real mobile and IoT devices. We will then cover practical considerations for deploying and scaling these systems under each of these setups, from targeting different hardware and deploying on a large number of machines all the way to modelling data distributions and optimizing the federated learning strategy parameters to improve resource utilization and enable large-scale experiments.