{"id":2124,"date":"2020-09-29T03:45:39","date_gmt":"2020-09-29T03:45:39","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/09\/29\/popular-activation-functions-in-neural-networks\/"},"modified":"2020-09-29T03:45:39","modified_gmt":"2020-09-29T03:45:39","slug":"popular-activation-functions-in-neural-networks","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/09\/29\/popular-activation-functions-in-neural-networks\/","title":{"rendered":"Popular Activation Functions In Neural Networks"},"content":{"rendered":"<div id=\"tve_editor\" data-post-id=\"6265\">\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-174d49c3b74\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/09\/1-Activation-functions-in-neural-networks.png?resize=613%2C368&amp;ssl=1\" class=\"tve_image wp-image-6275\" alt=\"Activation functions in neural networks\" data-id=\"6275\" width=\"613\" data-init-width=\"750\" height=\"368\" data-init-height=\"450\" title=\"Activation functions in neural networks\" loading=\"lazy\" data-width=\"613\" data-height=\"368\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-6275\" alt=\"Activation functions in neural networks\" data-id=\"6275\" width=\"613\" data-init-width=\"750\" height=\"368\" data-init-height=\"450\" title=\"Activation functions in neural networks\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/09\/1-Activation-functions-in-neural-networks.png?resize=613%2C368&amp;ssl=1\" data-width=\"613\" data-height=\"368\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\" data-css=\"tve-u-174d49c3b7f\">\n<p dir=\"ltr\">In the <a href=\"https:\/\/dataaspirant.com\/neural-network-basics\/\" target=\"_blank\" rel=\"noopener noreferrer\">neural network introduction<\/a> article, we have discussed the basics of neural networks. This article focus is on <strong>different types<\/strong> of activation functions using in building neural networks.\u00a0<\/p>\n<p dir=\"ltr\">In the deep learning literate or in <a href=\"https:\/\/dataaspirant.com\/top-rated-data-science-courses\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\">neural network online courses<\/a>, these activation functions are popularly called <strong>transfer functions<\/strong>.<\/p>\n<p dir=\"ltr\">The main focus of this article is to give you a complete overview of various activation functions and their properties. 
We'll also see how to implement them in Python.

So let's begin by understanding what an activation function is. If you remember decision trees, at each node the [decision tree algorithm](https://dataaspirant.com/how-decision-tree-algorithm-works/) has to make a decision about how to split the data further; activation functions play a similar decision-making role in neural networks.

## What Is an Activation Function?

The name is largely self-explanatory: as it suggests, an activation function decides whether to alert, or fire, the neurons/nodes in a neural network.

If we treat these functions as black boxes, the way we treat many [classification algorithms](https://dataaspirant.com/classification-clustering-alogrithms/), they take an input and return an output, which the neural network passes on to the next nodes of the network.

Activation functions are vital components of neural networks: they help the network learn the intricate patterns in the training data, which in turn helps in predicting unseen data.

In mathematical terms, each neuron computes the weighted sum of its inputs plus a bias, and the activation function is applied to this sum to decide whether the neuron fires. A minimal sketch below makes this idea concrete.

Computing this weighted sum is closely related to the [linear regression concept](https://dataaspirant.com/2017/02/15/simple-linear-regression-python-without-any-machine-learning-libraries/).

During training, the network tunes its parameters through gradient-based optimization, usually gradient descent, and the activation function's output is what gets propagated forward through the network.

Activation functions are often referred to as transfer functions in the deep learning research literature.
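To make the weighted-sum idea concrete, here is a minimal sketch of a single neuron's forward pass in Python; the function name `neuron_forward` and the example numbers are ours for illustration:

```python
import numpy as np

def neuron_forward(inputs, weights, bias, activation):
    """Compute the weighted sum of inputs plus bias, then apply the activation."""
    z = np.dot(weights, inputs) + bias  # the "weighted sum of inputs and bias"
    return activation(z)

# Example with a sigmoid activation (discussed later in this article)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
output = neuron_forward(np.array([0.5, -1.2, 3.0]),
                        np.array([0.8, 0.1, -0.4]),
                        bias=0.2,
                        activation=sigmoid)
print(output)  # a value in (0, 1) indicating how strongly the neuron fires
```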
Activation functions are also expected to satisfy a set of properties. Let's discuss them.

## Properties of Activation Functions

![Properties of activation functions](https://i1.wp.com/dataaspirant.com/wp-content/uploads/2020/09/2-Properties-of-activation-functions.png)

An activation function should be:

- Computationally inexpensive
- Differentiable
- Zero centered

### Computationally Inexpensive

The activation function's computation has to be very cheap, as it directly impacts the neural network's training time.

Complicated neural network architectures such as the convolutional neural network (CNN) and the recurrent neural network (RNN) have many parameters to optimize.

This optimization requires computing the activation functions at every layer. If the activation functions are computationally expensive, it will take a very long time to obtain the optimized weights at each layer of the network.

So a key property an activation function should satisfy is computational inexpensiveness.

### Differentiable

The second fundamental property is differentiability.

Activation functions have to be differentiable. Some simple activations, such as the binary step function, are not differentiable; but to learn the complex patterns in the training data through gradient-based optimization, the activation function needs a usable derivative.

Now another question arises:

> Why do activation functions need to be differentiable?

If you remember, in the [neural networks introduction](https://dataaspirant.com/neural-network-basics/) article we explained a concept called backpropagation.
Using backpropagation, the network calculates the errors it made previously and uses this information to update the weights so as to reduce the overall network error.

To perform this, the network uses the gradient descent approach, which needs the derivative of the activation function.

### Zero Centered

The output of the activation function should ideally be zero centered. If it is not, the gradients computed for the weights all tend to share the same sign, which makes the updates zig-zag back and forth and slows down optimization.

We have discussed the key properties of activation functions; now let's discuss the various categories of these functions.

## Activation Function Categories

![Activation Function Categories](https://i0.wp.com/dataaspirant.com/wp-content/uploads/2020/09/3-Activation-Function-Categories.png)

At a high level, activation functions fall into 3 categories:

- Binary step functions
- Linear activation functions
- Nonlinear activation functions

## Binary Step Functions

The simplest activation function is the step function. The output depends on a threshold value: if the input is greater than the threshold, the output is 1; otherwise the output is 0.

In other words, if the input value is above the threshold the node fires; otherwise it does not.

This is similar to the way [logistic regression predicts](https://dataaspirant.com/2017/03/02/how-logistic-regression-model-works/) a binary target class. A sketch of the step function, together with its plot, is shown below.
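Here is a minimal sketch of the binary step function in Python, assuming NumPy and Matplotlib are available (the helper name `binary_step` is ours):

```python
import numpy as np
import matplotlib.pyplot as plt

def binary_step(x, threshold=0.0):
    """Return 1 where the input exceeds the threshold, else 0."""
    return np.where(x > threshold, 1, 0)

# Plot the function over a small range of inputs
x = np.linspace(-10, 10, 500)
plt.plot(x, binary_step(x))
plt.title("Binary step activation function")
plt.xlabel("Input")
plt.ylabel("Output")
plt.show()
```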
In the resulting graph, we consider the threshold value to be zero.

As the name says, this activation function can be used for [binary classification](https://dataaspirant.com/implement-logistic-regression-model-python-binary-classification/); however, it cannot be used in situations where you have multiple classes to deal with.

#### Why is it used?

Some cases call for a function that applies a hard threshold: either the output is precisely a single value, or it is not.

The other functions we look at in this article have an intrinsically probabilistic output, i.e. a higher decimal output implies a greater probability of being 1 (a high output).

The step function does away with this, opting for a definite high or low output depending on some threshold T on the input.

However, the step function is discontinuous and therefore non-differentiable, so in practice it is not used with backpropagation.

## Linear Activation Functions

The linear activation function is the simplest form of activation. If you use linear activation functions everywhere, your whole neural network ends up being a regression.

Not convinced?

Just think:

> What does the network become if we simply use linear activation functions?

Each layer applies a linear function to the output of the previous layer, and combining linear functions, whether by summing or composing them, still yields a linear function.

This makes the network equivalent to a [regression equation](https://dataaspirant.com/2014/12/20/linear-regression-implementation-in-python/).

Linear activations are only needed when you're considering a [regression problem](https://dataaspirant.com/simple-linear-regression-python-without-any-machine-learning-libraries/), and only in the last layer.

#### Why is it used?

If there's a situation where we want a node to give its output without applying any threshold, then the identity (linear) function is the way to go.

The linear function is not used in the hidden layers.
We must use non-linear transfer functions in the hidden layer nodes, or else the network as a whole can only ever represent a linear function of its input. A sketch of the linear (identity) activation appears after the pros and cons below.

#### Pros

- The output value is not binary.
- Multiple neurons can be connected together; if more than one fires, we can take the maximum to make the decision.

#### Cons

- The derivative is constant, which means it is of no use with gradient descent.
- Weight changes during backpropagation depend on this constant derivative rather than on the actual input variable.
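A minimal sketch of the linear (identity) activation and its constant derivative (the helper names are ours):

```python
import numpy as np

def linear(x, slope=1.0):
    """Linear (identity-style) activation: f(x) = slope * x."""
    return slope * x

def linear_derivative(x, slope=1.0):
    """The derivative is constant, independent of the input."""
    return np.full_like(x, slope)

x = np.linspace(-2.0, 2.0, 5)
print(linear(x))             # the output simply mirrors the input
print(linear_derivative(x))  # constant everywhere: no useful gradient signal
```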
Both the binary step function and the linear activation function are not popular in complex, modern deep learning architectures; the nonlinear activation functions are the ones mostly used.

So let's discuss the various nonlinear activation functions.

## Nonlinear Activation Functions

There are numerous non-linear activation functions; in this article we mainly focus on the ones below.

- Sigmoid function
- Tanh function
- Gaussian
- ReLU
- Leaky ReLU

Let's start with the sigmoid function.

### Sigmoid Function

The [sigmoid activation function](https://dataaspirant.com/difference-between-softmax-function-and-sigmoid-function/) is sometimes referred to as the logistic function or squashing function in the literature.

#### Why is it used?

This function maps the input to a value between 0 and 1 (but never exactly 0 or 1). This means the output of the node will be a high signal (if the input is positive) or a low one (if the input is negative).

The simplicity of its derivative allows us to efficiently perform backpropagation without any fancy packages or approximations. The fact that this function is smooth, continuous, monotonic, and bounded means that backpropagation will work well.

The sigmoid's natural threshold is 0.5, meaning that any input that maps to a value above 0.5 is considered high (or 1) in binary terms.

Similar to the sigmoid, the softmax function can be used for [multi-class classification problems](https://dataaspirant.com/multinomial-logistic-regression-model-works-machine-learning/).
You can look at the key differences by reading the [softmax vs sigmoid](https://dataaspirant.com/difference-between-softmax-function-and-sigmoid-function/) article.

#### Pros

- The output mapped between 0 and 1 is easy to interpret.
- The gradient is quick to compute.
- It has a smooth gradient.

#### Cons

- At the ends of the sigmoid curve, the Y values respond very little to changes in X; this is known as the vanishing gradient problem.
- Sigmoids saturate and kill gradients.
- Optimization becomes hard because the output is not zero centered.
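A minimal sketch of the sigmoid and its derivative in NumPy (the helper names are ours):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Convenient closed form: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))             # saturates near 0 and 1 at the extremes
print(sigmoid_derivative(x))  # gradient vanishes at the extremes
```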
### Hyperbolic Tangent Function

The hyperbolic tangent function, known as the tanh function, is a smoother, zero-centered function whose range lies between -1 and 1.

#### Why is it used?

This is a very similar function to the sigmoid and has many of the same properties; even its derivative is straightforward to compute. However, this function maps the input to any value between -1 and 1 (but not inclusive of those).

In effect, this allows us to apply a penalty to the node (a negative output) rather than just have the node not fire at all.

This function has a natural threshold of 0, meaning that any input value greater than 0 is considered high (or 1) in binary terms.

Again, the fact that this function is smooth, continuous, monotonic, and bounded means that backpropagation will work well. The subsequent functions don't have all these properties, which makes them more difficult to use in backpropagation.

#### Pros

- Efficient, since its outputs in the middle layers are centered on 0, ranging between -1 and 1.

A sketch of tanh follows.
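A minimal sketch relying on NumPy's built-in `np.tanh` (the helper names are ours):

```python
import numpy as np

def tanh(x):
    """Hyperbolic tangent: squashes input into (-1, 1), zero centered."""
    return np.tanh(x)

def tanh_derivative(x):
    """Closed form: 1 - tanh(x)**2."""
    return 1.0 - np.tanh(x) ** 2

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(tanh(x))             # negative inputs map to genuinely negative outputs
print(tanh_derivative(x))  # gradient is largest at zero, vanishing at the extremes
```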
Now the question is:

> What is the difference between sigmoid and hyperbolic tangent?

They both achieve a similar mapping; both are continuous, smooth, monotonic, and differentiable, but they give out different values.

For a sigmoid function, a large negative input generates an almost-zero output. This lack of output will affect all subsequent weights in the network, which may not be desirable: it effectively stops the next nodes from learning.

In contrast, the tanh function supplies -1 for large negative values, maintaining the output of the node and allowing subsequent nodes to learn from it.

### Gaussian Function

#### Why is it used?

The Gaussian function is an even function, so it gives the same output for equally positive and negative input values. It gives its maximal output when there is no input and a decreasing output with increasing distance from zero.

We can perhaps imagine this function being used in a node where the input feature is less likely to contribute to the final result. A sketch follows below.
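One common form of the Gaussian activation is f(x) = exp(-x^2); the exact form is an assumption here, since the article does not pin one down:

```python
import numpy as np

def gaussian(x):
    """Gaussian activation: exp(-x**2), peaking at 1 when the input is 0."""
    return np.exp(-x ** 2)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(gaussian(x))  # symmetric: f(-x) == f(x), decaying away from zero
```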
### Rectified Linear Unit (ReLU)

ReLU is widely used in convolutional neural networks. As the function is just the maximum of the input and zero, it is easy to compute, does not saturate for positive inputs, and therefore does not cause the vanishing gradient problem there.

You may come across this activation function when [extracting the text from handwritten images](https://dataaspirant.com/handwritten-digits-recognition-tensorflow-python/).

#### Why is it used?

The ReLU represents a nearly linear function and therefore preserves the properties of linear models that made them easy to optimize with gradient-descent methods.

This function rectifies inputs less than zero, forcing them to zero, while keeping a constant gradient of 1 for positive inputs, thereby avoiding the vanishing gradient problem observed in the earlier types of activation function.

#### Pros

- Easy to implement and quick to compute.
- It avoids and rectifies the vanishing gradient problem.

#### Cons

- Problematic when we have lots of negative inputs, since the output is then always 0, leading to the death of the neuron.
- It is not zero centered, and it suffers from the "dying ReLU" problem.

A sketch of ReLU follows.
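Here is a minimal NumPy sketch of ReLU and its derivative (the helper names are ours):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return np.maximum(0.0, x)

def relu_derivative(x):
    """Gradient is 1 for positive inputs and 0 for negative inputs."""
    return np.where(x > 0, 1.0, 0.0)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))             # negatives are clipped to zero
print(relu_derivative(x))  # zero gradient on the negative side ("dying ReLU")
```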
### Leaky ReLU

Leaky ReLU is a variant of ReLU. Instead of being 0 when z < 0, a leaky ReLU allows a small, non-zero, constant gradient.

#### Why is it used?

As said before, Leaky ReLU is a variant of ReLU: for negative inputs it outputs alpha times the input, where alpha is a hyperparameter generally set to 0.01. Leaky ReLU solves the "dying ReLU" problem to some extent.

Observe that if we set alpha to 1, Leaky ReLU becomes the linear function f(x) = x and is of no use. Hence, the value of alpha is never set close to 1. If alpha is instead learned separately for each neuron, we get parametric ReLU, or PReLU. A sketch of Leaky ReLU follows.
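A minimal NumPy sketch of Leaky ReLU (the helper name is ours):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for positive inputs, alpha * x for negative inputs."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(leaky_relu(x))  # negatives are scaled by alpha instead of clipped to 0
```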
The activation functions are not limited to these, but we have discussed the ones widely used in industry.

The figure below shows the different types of activation functions.

![Different activation functions (Source: Wikipedia)](https://i2.wp.com/dataaspirant.com/wp-content/uploads/2020/09/4-different-activation-functions.png)

## Conclusion

To conclude, we have provided a comprehensive summary of the activation functions used in deep learning.

Activation functions improve the network's ability to learn the patterns in data, thereby automating the process of feature detection and justifying their use in the hidden layers of neural networks.
src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/tensorflow-course.png?resize=176%2C176&amp;ssl=1\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-174d49c3a5d\" data-recalc-dims=\"1\"><br \/>\n<span class=\"tve-image-overlay\"><\/span><\/span><\/div>\n<h4 class=\"\" data-css=\"tve-u-174d49c3a3e\">Learn Deep Learning With Tensorflow<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbt3pyfd\" data-css=\"tve-u-17481b95e2d\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode thrv-content-box tve-elem-default-pad dynamic-group-kbt3pwhk\" data-css=\"tve-u-174d49c3a4e\">\n<div class=\"tve-cb\">\n<div class=\"thrv_wrapper tve_image_caption dynamic-group-kbt3pu4z\" data-css=\"tve-u-174d49c3a5e\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/tcb_content_templates\/\/images\/mega_menu_img_06-e1592987561232.jpg?resize=176%2C176\" class=\"tve_image wp-image-60932\" alt data-id=\"60932\" width=\"176\" data-init-width=\"400\" height=\"176\" data-init-height=\"400\" title=\"mega_menu_img_06\" loading=\"lazy\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-174d49c3a5f\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-60932\" alt=\"\" data-id=\"60932\" width=\"176\" data-init-width=\"400\" height=\"176\" data-init-height=\"400\" title=\"mega_menu_img_06\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/tcb_content_templates\/\/images\/mega_menu_img_06-e1592987561232.jpg?resize=176%2C176\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-174d49c3a5f\" data-recalc-dims=\"1\"><br \/>\n<span class=\"tve-image-overlay\"><\/span><\/span><\/div>\n<h4 class=\"\" data-css=\"tve-u-174d49c3a46\">Python Deep Learning Specialization<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/dataaspirant.com\/popular-activation-functions-neural-networks\/<\/p>\n","protected":false},"author":0,"featured_media":2125,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/2124"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=2124"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/2124\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/2125"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=2124"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=2124"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=2124"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}