{"id":530,"date":"2020-08-20T16:17:26","date_gmt":"2020-08-20T16:17:26","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/08\/20\/introduction-to-federated-learning\/"},"modified":"2020-08-20T16:17:26","modified_gmt":"2020-08-20T16:17:26","slug":"introduction-to-federated-learning","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/08\/20\/introduction-to-federated-learning\/","title":{"rendered":"Introduction to Federated Learning"},"content":{"rendered":"<div id=\"post-\">\n<p>There are over 5 billion mobile device users all over the world. Such users generate massive amounts of data\u2014via cameras, microphones, and other sensors like accelerometers\u2014which can, in turn, be used for building intelligent applications. Such data is then collected in data centers for training machine\/deep learning models in order to build intelligent applications.<\/p>\n<p>However, due to data privacy concerns and bandwidth limitations, common centralized learning techniques aren\u2019t appropriate\u2014users are much less likely to share data, and thus the data will be only available on the devices.<br \/>This is where\u00a0<strong>federated learning<\/strong>\u00a0comes into play. According to Google\u2019s research paper titled,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1602.05629\" rel=\"noopener noreferrer\" target=\"_blank\">Communication-Efficient Learning of Deep Networks from Decentralized Data<\/a>\u00a0[1], the researchers provide the following high-level definition of federated learning:<\/p>\n<blockquote>\n<p>\n<em>A learning technique that allows users to collectively reap the bene\ufb01ts of shared models trained from [this] rich data, without the need to centrally store it. 
We term our approach Federated Learning, since the learning task is solved by a loose federation of participating devices (which we refer to as\u00a0<\/em><strong><em>clients<\/em><\/strong><em>) which are coordinated by a\u00a0<\/em><strong><em>central server<\/em><\/strong><em>.<\/em>\n<\/p>\n<\/blockquote>\n<p>The outline of the article is as follows:<\/p>\n<ul>\n<li>Data is Available Everywhere\n<\/li>\n<li>What is Federated Learning?\n<\/li>\n<li>Steps for Federated Learning\n<\/li>\n<li>Properties of Problems Solved using Federated Learning\n<\/li>\n<li>Federated Averaging Algorithm\n<\/li>\n<\/ul>\n<p>Let\u2019s get started.<\/p>\n<p>\u00a0<\/p>\n<h3>Data is Available Everywhere<\/h3>\n<p>\u00a0<br \/>In the data era, data is a primary requirement for building intelligent applications. Where and how, then, can we get it? The good news is that data is available everywhere\u2014the bad news is that much of it is inaccessible.<\/p>\n<p>Mobile, embedded, and sensor-laden IoT devices are major sources of data nowadays. Because they are used frequently and are at hand nearly all the time, mobile devices are the primary source of such data.<\/p>\n<p>According to a recent\u00a0<a href=\"https:\/\/www.gsma.com\/mobileeconomy\" rel=\"noopener noreferrer\" target=\"_blank\">GSMA Mobile Economy<\/a>\u00a0report, the number of mobile users reached 5.2 billion in 2019 and was expected to increase to 5.8 billion by 2025. Out of the 5.2 billion mobile users, 3.8 billion are connected to the internet.<\/p>\n<p><img alt=\"Image for post\" class=\"aligncenter\" src=\"https:\/\/i.ibb.co\/ngF03zM\/gad-federated-0.jpg\" width=\"100%\"><br \/>\u00a0<\/p>\n<p>This means a couple of things: internet connectivity indicates increased data generation, and connected users will expect intelligent applications that give them better experiences. 
The\u00a0<a href=\"https:\/\/www.gsma.com\/mobileeconomy\" rel=\"noopener noreferrer\" target=\"_blank\">report<\/a>\u00a0also shows that 12 billion IoT devices were in use in 2019, a number expected to grow to 24.6 billion by 2025, driven in part by smart buildings.<\/p>\n<p>According to a\u00a0<a href=\"https:\/\/www.pewresearch.org\/global\/2019\/02\/05\/smartphone-ownership-is-growing-rapidly-around-the-world-but-not-always-equally\" rel=\"noopener noreferrer\" target=\"_blank\">Pew Research Center<\/a>\u00a0report, the majority of these mobile devices are smartphones. The following figure shows, for a number of countries, the percentage of adults who use smartphones:<\/p>\n<p><img alt=\"Image for post\" class=\"aligncenter\" src=\"https:\/\/i.ibb.co\/x2ZZcj5\/gad-federated-1.png\" width=\"70%\"><\/p>\n<p>The existence of such large numbers of data generators means data is indeed available everywhere. Each click by a mobile user adds more data about what interests that user, and this data can be used to build intelligent applications with better user experiences.<\/p>\n<p>To make use of users\u2019 private data without compromising their privacy,\u00a0<strong>federated learning<\/strong>\u00a0comes into action.<\/p>\n<p>\u00a0<\/p>\n<h3>What is Federated Learning?<\/h3>\n<p>\u00a0<br \/>Federated learning is a type of learning introduced by Google in 2016 in a paper titled\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1602.05629\" rel=\"noopener noreferrer\" target=\"_blank\">Communication-Efficient Learning of Deep Networks from Decentralized Data<\/a>\u00a0[1]. Building on the definition quoted at the beginning of the article, let\u2019s look at federated learning in more detail.<\/p>\n<p>Federated learning avoids centralized data collection and model training. In a traditional machine learning pipeline, data is collected from different sources (e.g. mobile devices) and stored in a central location (i.e. a data center). 
Once all the data is available at the center, a single machine learning model is trained on it. Because the data must be moved from the users\u2019 devices to a central location for building and training the model, this approach is called\u00a0<strong>centralized learning<\/strong>.<\/p>\n<p><strong>Federated learning<\/strong>, on the other hand, trains multiple machine learning models on the mobile devices (referred to as clients) and then combines the results of all such models into a single model that resides at a server. Thus, a model is trained on the devices themselves using ground-truth data, and only the trained model is shared with the server. This way, users\u2019 data is leveraged to build machine\/deep learning models while the data itself stays private.<\/p>\n<p>In other words, federated learning benefits from users\u2019 data without compromising their privacy. The raw data remains on the users\u2019 devices and is never moved to a data center; instead, a model built from this data is sent to the server.<\/p>\n<p>With federated learning, the user\u2019s data is not uploaded to the server, so there is no\u00a0<strong>direct<\/strong>\u00a0access to the data, but there is still the possibility of the data being accessed indirectly. 
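<\/p>
<p>To make this contrast concrete, here is a minimal toy sketch (NumPy only; the data, the linear model, and all names are illustrative, not part of any federated learning library). In centralized learning, the raw data leaves the devices; in federated learning, only model weights do.<\/p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Three devices, each holding private data for the same linear task.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

# Centralized learning: raw data is uploaded and pooled at the server.
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
w_central = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

# Federated learning: each client fits locally; only weights leave the device.
local_ws = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in clients]
w_federated = np.mean(local_ws, axis=0)

# Both estimates land close to true_w, but in the federated case the
# server never saw X or y.
```

<p>In a real deployment, local training uses gradient-based updates rather than a closed-form solve; the closed-form version just keeps the sketch short.<\/p>
<p>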
How privacy can still be broken in federated learning will be discussed in a later post.<\/p>\n<p>\u00a0<\/p>\n<h3>Steps for Federated Learning<\/h3>\n<p>\u00a0<br \/>In theory, federated learning is fairly simple and can be summarized in the following steps:<\/p>\n<ol>\n<li>A generic (shared) model is trained server-side.\n<\/li>\n<li>A number of clients are selected for training on top of the generic model.\n<\/li>\n<li>The selected clients download the model.\n<\/li>\n<li>The generic model is trained on the devices, leveraging the users\u2019 private data, using an optimization algorithm such as stochastic gradient descent.\n<\/li>\n<li>A summary of the changes made to the model (i.e. the weights of the trained neural network) is sent to the server.\n<\/li>\n<li>The server aggregates the updates from all devices to improve the shared model. Update aggregation is done using a new algorithm called the\u00a0<strong>federated averaging algorithm<\/strong>.\n<\/li>\n<li>The process of sending the generic model to the mobile devices and updating it according to the received summaries is repeated.\n<\/li>\n<\/ol>\n<p>The previous steps are summarized in the next figure, based on a\u00a0<a href=\"https:\/\/ai.googleblog.com\/2017\/04\/federated-learning-collaborative.html\" rel=\"noopener noreferrer\" target=\"_blank\">blog post by Google research scientists<\/a>:<\/p>\n<ul>\n<li>\n<strong>A<\/strong>. Your phone personalizes the model locally, based on your usage.\n<\/li>\n<li>\n<strong>B<\/strong>. 
Many users\u2019 updates are aggregated.\n<\/li>\n<li>\n<strong>C<\/strong>. A change to the shared model is made according to the aggregated updates, after which the procedure is repeated.\n<\/li>\n<\/ul>\n<p><img alt=\"Image for post\" class=\"aligncenter\" src=\"https:\/\/i.ibb.co\/MMQgYj6\/gad-federated-2.png\" width=\"100%\"><\/p>\n<p>You can also watch\u00a0<a href=\"https:\/\/youtu.be\/gbRJPa9d-VU\" rel=\"noopener noreferrer\" target=\"_blank\">this video from Google<\/a>, which summarizes federated learning.<\/p>\n<p>\u00a0<\/p>\n<h3>Properties of Problems Solved using Federated Learning<\/h3>\n<p>\u00a0<br \/>According to the Google research paper [1], the ideal problems for federated learning have three properties. The first property is:<\/p>\n<blockquote>\n<p>\n<em>Training on real-world data from mobile devices provides a distinct advantage over training on proxy data that\u2019s generally available in the data center.<\/em>\n<\/p>\n<\/blockquote>\n<p>When a single machine learning model is created at the server, it uses data from different users to create one generic model. Because users vary in how they use their mobile devices, the model must be generic enough to cope with that variety.<\/p>\n<p>The user experience, however, is enhanced not by a generic model but by a customized model that seems tailored specifically to the user. Such personalization is achieved using federated learning and can give the feeling that the device was made just for the user.<\/p>\n<p>The second property:<\/p>\n<blockquote>\n<p>\n<em>This data is privacy sensitive or large in size (compared to the size of the model), so it is preferable not to log it to the data center purely for the purpose of model training (in service of the focused collection principle).<\/em>\n<\/p>\n<\/blockquote>\n<p>It isn\u2019t practical to ask the user to upload large amounts of data to create a generic model at the server. 
Uploading adds extra costs for the user. Also, users are likely to refuse to upload private data to help build a model, especially in applications that handle sensitive information. In cases where the data is private or large in scale, federated learning is a better option than centralized learning.<\/p>\n<p>This large data size also introduces a new challenge for federated learning: mobile devices\u2019 resources are limited, and working with large amounts of data consumes time and, in turn, power.<\/p>\n<p>To help with this, Google provides\u00a0<a href=\"https:\/\/www.tensorflow.org\/federated\" rel=\"noopener noreferrer\" target=\"_blank\">TensorFlow Federated<\/a>, an open-source framework for machine learning and other computations on decentralized data. In practice, training only takes place when the device is idle, plugged into a charger, and on a free wireless connection.<\/p>\n<p>The third property is:<\/p>\n<blockquote>\n<p>\n<em>For supervised tasks, labels on the data can be inferred naturally from user interaction.<\/em>\n<\/p>\n<\/blockquote>\n<p>According to the Google research paper [1], here are two problems that fit the previous three properties:<\/p>\n<blockquote>\n<p>\n1. Image classification for predicting which photos are most likely to be viewed multiple times in the future or shared.<br \/>2. Language models which can be used to improve voice recognition and text entry on touch-screen keyboards by improving decoding, next-word-prediction, and even predicting whole replies.\n<\/p>\n<\/blockquote>\n<p>In supervised learning, a model is trained using labeled data, so the model knows the labels of all training samples. For federated learning to work in a supervised setting, the labels of the user\u2019s private data must be available. 
Here\u2019s the explanation from the Google research paper:<\/p>\n<blockquote>\n<p>\n<em>The labels for the previous 2 problems are directly available: entered text is self-labeled for learning a language model, and photo labels can be defined by natural user interaction with their photo app (which photos are deleted, shared, or viewed).<\/em>\n<\/p>\n<\/blockquote>\n<p>\u00a0<\/p>\n<h3>Federated Averaging Algorithm<\/h3>\n<p>\u00a0<br \/>As discussed above, the server aggregates the changes (i.e. weights) received from all the devices. How is this aggregation applied? Using a new algorithm called the federated averaging algorithm.<\/p>\n<p>The devices train the generic neural network model using gradient descent, and the trained weights are sent back to the server. The server then averages all such updates to produce the final weights. The following pseudocode shows how the federated averaging algorithm works.<\/p>\n<p><img alt=\"Image for post\" class=\"aligncenter\" src=\"https:\/\/i.ibb.co\/ZKW0bQ5\/gad-federated-3.jpg\" width=\"100%\"><\/p>\n<p>At the server,\u00a0<code>K<\/code>\u00a0clients are selected, indexed by the variable\u00a0<code>k<\/code>. In parallel, each client updates the generic model weights according to the\u00a0<code>ClientUpdate()<\/code>\u00a0function, which returns the trained weights\u00a0<code>w<\/code>\u00a0to the server. Finally, the server takes the average of all weights\u00a0<code>w<\/code>\u00a0received from the\u00a0<code>K<\/code>\u00a0clients. 
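<\/p>
<p>Here is a minimal sketch of that loop (NumPy only; the linear model, the data, and all names are illustrative, not the paper\u2019s implementation). Note that the pseudocode in the figure weights each client\u2019s result by its share of the total samples rather than taking a plain mean:<\/p>

```python
import numpy as np

rng = np.random.default_rng(1)

def client_update(w, X, y, lr=0.1, epochs=5):
    # ClientUpdate(k, w): a few epochs of gradient descent on local data
    # (here, the gradient of mean squared error for a linear model).
    for _ in range(epochs):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

# K = 3 clients with differently sized private datasets.
true_w = np.array([1.0, 3.0])
datasets = []
for n_k in (50, 80, 120):
    X = rng.normal(size=(n_k, 2))
    datasets.append((X, X @ true_w + rng.normal(scale=0.05, size=n_k)))

w = np.zeros(2)                        # the shared (generic) model at the server
n = sum(len(y) for _, y in datasets)   # total sample count across clients
for _ in range(20):                    # one communication round per iteration
    local = [client_update(w, X, y) for X, y in datasets]
    # Server aggregation: average weighted by each client's sample count.
    w = sum(len(y) / n * w_k for (_, y), w_k in zip(datasets, local))

# After 20 rounds, w should sit close to true_w.
```

<p>When every client holds the same number of samples, this weighted average reduces to the simple mean described in the text.<\/p>
<p>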
This average is regarded as the new set of weights for the generic model.<\/p>\n<p>\u00a0<\/p>\n<h3>What\u2019s Next<\/h3>\n<p>\u00a0<br \/>This article introduced federated learning, a training method for machine learning models that leverages ground-truth data generated on end devices (e.g. mobile phones) to update a generic, shared model that\u2019s distributed to those devices. The article summarized the federated learning pipeline in seven steps, from preparing a generic model through receiving the trained\/updated versions from the mobile devices.<\/p>\n<p>A primary motivation behind federated learning is to keep the data private and share only a model trained on that data. Unfortunately, this privacy can still be broken, which I\u2019ll discuss in my next article.<\/p>\n<p>\u00a0<br \/><b>References<\/b><\/p>\n<ol>\n<li>McMahan, H. Brendan, et al. \u201cCommunication-efficient learning of deep networks from decentralized data.\u201d\u00a0<em>arXiv preprint arXiv:1602.05629<\/em>\u00a0(2016).\n<\/li>\n<\/ol>\n<p>\u00a0<br \/><b>Bio: <a href=\"https:\/\/www.linkedin.com\/in\/ahmedfgad\/\" target=\"_blank\" rel=\"noopener noreferrer\">Ahmed Gad<\/a><\/b> received his B.Sc. degree in information technology, with honors, from the Faculty of Computers and Information (FCI), Menoufia University, Egypt, in July 2015. For ranking first in his faculty, he was recommended to work as a teaching assistant at an Egyptian institute in 2015, and then in 2016 as a teaching assistant and researcher in his faculty. His current research interests include deep learning, machine learning, artificial intelligence, digital signal processing, and computer vision.<\/p>\n<p><a href=\"https:\/\/heartbeat.fritz.ai\/introduction-to-federated-learning-40eb122754a2\" target=\"_blank\" rel=\"noopener noreferrer\">Original<\/a>. 
Reposted with permission.<\/p>\n<p><b>Related:<\/b><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2020\/08\/introduction-federated-learning.html<\/p>\n","protected":false},"author":0,"featured_media":531,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/530"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=530"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/530\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/531"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=530"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=530"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=530"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}