{"id":1642,"date":"2020-09-17T03:44:21","date_gmt":"2020-09-17T03:44:21","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/09\/17\/introduction-to-neural-network-basics\/"},"modified":"2020-09-17T03:44:21","modified_gmt":"2020-09-17T03:44:21","slug":"introduction-to-neural-network-basics","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/09\/17\/introduction-to-neural-network-basics\/","title":{"rendered":"Introduction to Neural Network Basics"},"content":{"rendered":"<div data-css=\"tve-u-17497e6f348\">\n<p>This is the first part of a series of blog posts on simple neural networks. The basics of neural networks can be found all over the internet. Many of those articles cover the same ground, each written slightly differently.\u00a0<\/p>\n<p dir=\"ltr\">Here we take a different approach: to build a deep understanding of neural networks, we explain\u00a0<strong>each building block<\/strong> needed to construct one.<\/p>\n<p dir=\"ltr\">We will narrow things down to the very basic concepts you need to build a neural network. The knowledge you gain in this article will help you understand various <strong>deep learning model architectures<\/strong> in the long run.\u00a0<\/p>\n<\/div>\n<div>\n<p dir=\"ltr\">A <strong>pocket calculator<\/strong> can beat most of us in an arithmetic contest, but it will never improve its speed or accuracy, no matter how much it practices.\u00a0<\/p>\n<h4 class=\"\">In short:<\/h4>\n<blockquote class=\"\"><p><strong>It doesn&#8217;t learn.<\/strong><\/p><\/blockquote>\n<p dir=\"ltr\">For example, every time I press its square-root button, it computes exactly the same function in exactly the same way. The pocket calculator is not learning.<\/p>\n<p dir=\"ltr\">But how could a machine learn?\u00a0<\/p>\n<p dir=\"ltr\">By changing the function it computes. Our brains learn based on the same idea, and far more efficiently. 
Before delving deeper into how neural networks can learn, let&#8217;s first understand how they can compute.<\/p>\n<p dir=\"ltr\">In deep learning, the function being computed is called a neural network model; in the broader machine learning literature it\u2019s called a <a href=\"https:\/\/dataaspirant.com\/random-forest-algorithm-machine-learing\/\" target=\"_blank\" rel=\"noopener noreferrer\">machine learning model<\/a>.<\/p>\n<p dir=\"ltr\">Compared with machine learning models such as <a href=\"https:\/\/dataaspirant.com\/implement-logistic-regression-model-python-binary-classification\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\">logistic regression<\/a>, <a href=\"https:\/\/dataaspirant.com\/decision-tree-classifier-implementation-in-r\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\">decision trees<\/a>, and <a href=\"https:\/\/dataaspirant.com\/random-forest-classifier-python-scikit-learn\/\" target=\"_blank\" rel=\"noopener noreferrer\">random forests<\/a>, deep learning models are <strong>completely different<\/strong> in the way they learn from data.<\/p>\n<p dir=\"ltr\">Now let\u2019s see how neural networks learn from the data we feed them.<\/p>\n<h2 id=\"t-1600276275451\" class=\"\">Introduction to Neural Networks<\/h2>\n<p dir=\"ltr\">A neural network is simply a group of <strong>interconnected neurons<\/strong> that are able to influence each other\u2019s behavior.\u00a0<\/p>\n<p dir=\"ltr\">Your brain contains about as many neurons as there are stars in our galaxy. On average, each of these neurons is connected to about a thousand other neurons via junctions called <strong>synapses<\/strong>.<\/p>\n<p dir=\"ltr\">We can schematically draw a neural network as a collection of dots representing neurons, connected by lines representing synapses, as shown in the figure below.\u00a0<\/p>\n<\/div>\n<div>\n<p dir=\"ltr\">Real-world neurons are very complicated. 
However, AI researchers have shown that neural networks can still attain <strong>human-level<\/strong> performance on many remarkably complex tasks.\u00a0<\/p>\n<p dir=\"ltr\">Examples include <a href=\"https:\/\/dataaspirant.com\/handwritten-digits-recognition-tensorflow-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">handwritten text recognition<\/a> and identifying cancerous tumors.<\/p>\n<p dir=\"ltr\">This holds even if one ignores all these complexities and replaces real biological neurons with extremely simple simulated ones that are all identical and obey very simple rules.\u00a0<\/p>\n<p dir=\"ltr\">Currently the most popular model for such an artificial neural network represents the state of each neuron by a single number and the strength of each synapse by a single number.<\/p>\n<p dir=\"ltr\">In this model, each neuron updates its state at regular time steps by simply <strong>averaging together<\/strong> the inputs from all connected neurons, weighting them by the synaptic strengths, optionally adding a constant, and then applying what\u2019s called an <strong>activation function<\/strong> to the result to compute its next state.\u00a0<\/p>\n<h3 id=\"t-1600276832658\" class=\"\">Activation Functions<\/h3>\n<p dir=\"ltr\">The easiest way to use a neural network as a function is to make it <strong>feedforward<\/strong>, with information flowing only in one direction.<\/p>\n<p dir=\"ltr\">In case you like math, here are two popular choices of activation function:<\/p>\n<ul class=\"\">\n<li>Sigmoid Function<\/li>\n<li>Ramp Function<\/li>\n<\/ul>\n<p dir=\"ltr\">These are the sigmoid function \u00a0<a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Csigma%20%3D%20%5Cfrac%7B1%7D%7B1%20%2B%20e%5E%7B-x%7D%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" 
data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/A8ffwl45StpOF9WKFyaiXdoZdmnITG1d1EYK3MdMdmyHt9YSpRJK9d9W9CE5nWWvUabhtWzx9OsybDS0KeB3KAutz2zRvQMPNS-tCPLRFe-2e6l8vVotfMKN_OTqDT0J_yx62MTN\" loading=\"lazy\" width=\"84\" height=\"33\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/A8ffwl45StpOF9WKFyaiXdoZdmnITG1d1EYK3MdMdmyHt9YSpRJK9d9W9CE5nWWvUabhtWzx9OsybDS0KeB3KAutz2zRvQMPNS-tCPLRFe-2e6l8vVotfMKN_OTqDT0J_yx62MTN\" width=\"84\" height=\"33\"><\/a> and the ramp function \u0192(x) = max{0, x}, although it\u2019s been proven that almost any function will suffice as long as it\u2019s not linear (a straight line).<\/p>\n<p dir=\"ltr\">One famous model, the Hopfield network, uses:<\/p>\n<blockquote class=\"\"><p>\n<em>\u0192(x) = -1 if x &lt; 0 and \u0192(x) = 1 if x &gt;= 0.<\/em>\u00a0<\/p><\/blockquote>\n<p dir=\"ltr\">If the neuron states are stored in a vector, then the network is updated by simply <strong>multiplying<\/strong> that vector by a matrix storing the synaptic couplings and then applying the function <em><strong>\u0192<\/strong><\/em> to all elements.<\/p>\n<p dir=\"ltr\">Simple neural networks are universal in the sense that they can compute any function arbitrarily accurately by simply adjusting those synapse strength numbers accordingly.<\/p>\n<p dir=\"ltr\">When I first learned about neural networks, I was mystified by how something so simple could compute something arbitrarily complicated.\u00a0<\/p>\n<p dir=\"ltr\">For example, how can you compute even something as simple as multiplication, when all you\u2019re allowed to do is compute weighted sums and apply a single fixed function?\u00a0<\/p>\n<p dir=\"ltr\">The figure below shows how this works: a mere four neurons can multiply two arbitrary numbers together, and a single neuron can multiply three bits together.<\/p>\n<\/div>\n<div>\n<p dir=\"ltr\">Now let\u2019s see a \u201chello world\u201d example of neural networks.<\/p>\n<p dir=\"ltr\">Suppose that we wish to 
classify megapixel grayscale images into two categories, say cats and dogs. If each of the million pixels can take one of, say, <strong>256<\/strong> values, then there are <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=256%5E%7B1000000%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/JEq0WozDJQ6w2vD5JGF1ChyZu1RTVD4ueow9O3-myDVNFc8q00cOIeK5VOmIxQNCltgRq5wHygt8wOQDK4NwkUR9HoYKxyqCuCaYN7ETGpGbtl6jhLixDvhwa7Q3iBvpVIiJoxYK\" loading=\"lazy\" width=\"64\" height=\"13\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/JEq0WozDJQ6w2vD5JGF1ChyZu1RTVD4ueow9O3-myDVNFc8q00cOIeK5VOmIxQNCltgRq5wHygt8wOQDK4NwkUR9HoYKxyqCuCaYN7ETGpGbtl6jhLixDvhwa7Q3iBvpVIiJoxYK\" width=\"64\" height=\"13\"><\/a> possible images.\u00a0<\/p>\n<p dir=\"ltr\">For each one, we wish to compute the probability that it depicts a cat. This means that an arbitrary function that inputs a picture and outputs a probability is defined by a list of <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=256%5E%7B1000000%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/vNVv_6bwOmeun5WQR9MfdyFP1HeDnCmQJCIJiNXp1mm2BGq4eOM2tV-jaB5cSU8SoJ7DtjVbs7fqKl23lEZf40EHEVNEba8GcbRc9jGBdg6jgL-oZHumO5wbgvntczNIBwa_TyXF\" loading=\"lazy\" width=\"64\" height=\"13\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/vNVv_6bwOmeun5WQR9MfdyFP1HeDnCmQJCIJiNXp1mm2BGq4eOM2tV-jaB5cSU8SoJ7DtjVbs7fqKl23lEZf40EHEVNEba8GcbRc9jGBdg6jgL-oZHumO5wbgvntczNIBwa_TyXF\" width=\"64\" height=\"13\"><\/a>\u00a0 probabilities, i.e., way more numbers than there are atoms in our universe (about <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=10%5E%7B78%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" 
data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/98Si7iw2w2S3DL1S7nmnwVPbIX94D5afeggamcQ6uhGv0PfNtl32cQrNaicbH8zhQf205Teii-mAAFTIF276L-c9SKvvUhrstc2ojNpc2zLdsAGD21OhZdYE6c5rgXEHNXPdbUVH\" loading=\"lazy\" width=\"25\" height=\"15\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/98Si7iw2w2S3DL1S7nmnwVPbIX94D5afeggamcQ6uhGv0PfNtl32cQrNaicbH8zhQf205Teii-mAAFTIF276L-c9SKvvUhrstc2ojNpc2zLdsAGD21OhZdYE6c5rgXEHNXPdbUVH\" width=\"25\" height=\"15\"><\/a>).<\/p>\n<p dir=\"ltr\">Now we have an idea of how neural networks work. To put it simply:<\/p>\n<blockquote class=\"\"><p><strong>\u201cFire together, wire together\u201d<\/strong><\/p><\/blockquote>\n<p dir=\"ltr\">Let\u2019s see the math behind neural networks.<\/p>\n<h2 id=\"t-1600276832659\" class=\"\">The math behind neural networks<\/h2>\n<p dir=\"ltr\">At each node in the hidden and output layers of the neural network (NN), an activation function is applied.\u00a0<\/p>\n<p dir=\"ltr\">The activation function can also be called a transfer function. This function takes in the outputs of the nodes in the previous layer, each multiplied by a weight. 
The weights that come out of one node can all be different; that is, they will activate different neurons.\u00a0<\/p>\n<p dir=\"ltr\">The transfer function can take many forms; we will first look at the sigmoid transfer function, as it is the traditional choice.<\/p>\n<h3 id=\"t-1600276832660\" class=\"\">Sigmoid Function<\/h3>\n<\/div>\n<div>\n<p dir=\"ltr\">Here we will refer to the following indices:<\/p>\n<p dir=\"ltr\">i &#8211; the <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=i%5E%7B%5Ctext%7Bth%7D%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/qIvIrTvl40xhXrXee-Ls0scgxEWUI5cODYE-UOq3-9oUfQgBuwggcLRyOIid9QWdj68GpP3KRL8d_3QN-4Pi_QfxFJqovk9Isqgo0-8YxjrmiUEzElgmO8m3voP-Z9iLSWeTsYkL\" loading=\"lazy\" width=\"16\" height=\"15\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/qIvIrTvl40xhXrXee-Ls0scgxEWUI5cODYE-UOq3-9oUfQgBuwggcLRyOIid9QWdj68GpP3KRL8d_3QN-4Pi_QfxFJqovk9Isqgo0-8YxjrmiUEzElgmO8m3voP-Z9iLSWeTsYkL\" width=\"16\" height=\"15\"><\/a> node of the <strong>input layer<\/strong> I.<\/p>\n<p dir=\"ltr\">j &#8211; the <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=j%5E%7B%5Ctext%7Bth%7D%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/1U4-T_vcqArCUuksOf-8iHyh55lcinwmU6uoDbL3L61rW3ePFxBuCxNZskIf8zPAhUDj4FHsYC-eIm55tvWRgjVudv-HefFjFpHNGMb1jD3U5G4RS2L2UkDpB9MyZJ1TL-EPe4UE\" loading=\"lazy\" width=\"17\" height=\"17\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/1U4-T_vcqArCUuksOf-8iHyh55lcinwmU6uoDbL3L61rW3ePFxBuCxNZskIf8zPAhUDj4FHsYC-eIm55tvWRgjVudv-HefFjFpHNGMb1jD3U5G4RS2L2UkDpB9MyZJ1TL-EPe4UE\" width=\"17\" height=\"17\"><\/a>\u00a0 node of the <strong>hidden layer<\/strong> J.<\/p>\n<p dir=\"ltr\">k &#8211; the <a 
href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=k%5E%7B%5Ctext%7Bth%7D%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/OjQxMDO3_yp4_XiyhaZE1CBp7C5Usa4U6DoAfCGRDN8hVM8_5B3dw3YMLaaZipvAooA7pnzLXXBjFRanaCj5Na72RWPyPGYWwQDFX9srxVZC3Jx9EiXAhnLTYHids4PQH920h_l5\" loading=\"lazy\" width=\"19\" height=\"15\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/OjQxMDO3_yp4_XiyhaZE1CBp7C5Usa4U6DoAfCGRDN8hVM8_5B3dw3YMLaaZipvAooA7pnzLXXBjFRanaCj5Na72RWPyPGYWwQDFX9srxVZC3Jx9EiXAhnLTYHids4PQH920h_l5\" width=\"19\" height=\"15\"><\/a> node of the <strong>output layer<\/strong> K.<\/p>\n<p dir=\"ltr\">The activation function at a <strong>node j <\/strong>in the hidden layer takes the value:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cbegin%7Balign%7D%20x_%7Bj%7D%20%26%3D%20%5Cxi_%7B1%7D%20w_%7B1j%7D%20%2B%20%5Cxi_%7B2%7D%20w_%7B2j%7D%20%5C%5C%20%26%3D%20%5Csum_%7Bi%20%5Cin%20I%7D%20%5Cxi_%7Bi%7D%20w_%7Bi%20j%7D%5Cend%7Balign%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/DlL3iHpUz1KUBou8ArfvtbQQjcADBjnRKTx824u6blpTkPNkrCwRwGoJJL927Umaer-Jtq9kh_mwx3NFpbbGEMof7wZQO1t5aAjeyvwL1VdzJBR140bK8OqFBvakJwKcjE-f9UB_\" loading=\"lazy\" width=\"148\" height=\"117\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/DlL3iHpUz1KUBou8ArfvtbQQjcADBjnRKTx824u6blpTkPNkrCwRwGoJJL927Umaer-Jtq9kh_mwx3NFpbbGEMof7wZQO1t5aAjeyvwL1VdzJBR140bK8OqFBvakJwKcjE-f9UB_\" width=\"148\" height=\"117\"><\/a><\/p>\n<p dir=\"ltr\">Where <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cxi_%7Bi%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" 
data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/pwpXqFfGMXdJxPPyBYK8FySDaxBQJJIDNSCHmmEE-e_nhk-My6Roi1iBYLaOEueuioYEb8GT7ydShV28x5RY52yzTnX7bNYNeU5_QoUTtz60wOUCa5M2rOT7mA53fi7PQd1Qi7ev\" loading=\"lazy\" width=\"9\" height=\"15\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/pwpXqFfGMXdJxPPyBYK8FySDaxBQJJIDNSCHmmEE-e_nhk-My6Roi1iBYLaOEueuioYEb8GT7ydShV28x5RY52yzTnX7bNYNeU5_QoUTtz60wOUCa5M2rOT7mA53fi7PQd1Qi7ev\" width=\"9\" height=\"15\"><\/a> is the value of the <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=i%5E%7Bth%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/Qo8seOuw57SfrC5mfSRge__8fRQKKqJp19IKVGStiwxWXHr_hiln4qKeNcuQLC2GiUEPlzb5mrTYV3RAggCcDeY3Dq1HoEIuR1MKbyDgdxvQMXDhubHtJS7NuaJl7O2v0zBH5fN3\" loading=\"lazy\" width=\"16\" height=\"15\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/Qo8seOuw57SfrC5mfSRge__8fRQKKqJp19IKVGStiwxWXHr_hiln4qKeNcuQLC2GiUEPlzb5mrTYV3RAggCcDeY3Dq1HoEIuR1MKbyDgdxvQMXDhubHtJS7NuaJl7O2v0zBH5fN3\" width=\"16\" height=\"15\"><\/a> input node and <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=w_%7Bij%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/ZeUpyrh5rnxhM9uL_MI-1dG9jJVT7yaRtQR0FOJ6rzD9QWx61dnIsNis55s7aQNg3zxQapUoI8TBeHzXgRHeOA9wiTwUMWhjS3o_n8j4jzk0mJxL5CPEmkuWuBb3fkdHPyDxJwPk\" loading=\"lazy\" width=\"20\" height=\"12\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/ZeUpyrh5rnxhM9uL_MI-1dG9jJVT7yaRtQR0FOJ6rzD9QWx61dnIsNis55s7aQNg3zxQapUoI8TBeHzXgRHeOA9wiTwUMWhjS3o_n8j4jzk0mJxL5CPEmkuWuBb3fkdHPyDxJwPk\" width=\"20\" height=\"12\"><\/a> is the weight of the connection between the <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=i%5E%7Bth%7D#0\" class=\"hasimg\"><img 
src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/GpawIPYZ_k6cezvSlSsnwXWOF224aFRs8hvNAmg8Aha1byoU9J2tfW2lanaJt-qU2lwm0gK4ffT77RARIRM64BVOBRC4gXeOYvVQ0J_IcFiMAkPaMxVWM1hcm5apvun72XPRk9gc\" loading=\"lazy\" width=\"16\" height=\"15\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/GpawIPYZ_k6cezvSlSsnwXWOF224aFRs8hvNAmg8Aha1byoU9J2tfW2lanaJt-qU2lwm0gK4ffT77RARIRM64BVOBRC4gXeOYvVQ0J_IcFiMAkPaMxVWM1hcm5apvun72XPRk9gc\" width=\"16\" height=\"15\"><\/a> input node and the <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=j%5E%7Bth%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/XQH226Up0Yt_7ekgm0CjsgkQPXa6CfepgZGED5JutYRho28IaHiXGCeMLi_7Jfwu0G8xNFkqcKTbSkUzHD686tgH5LmArLorlaxlVXScQySytn5kL2IRDMJS-QcWgnSFm5hey1kC\" loading=\"lazy\" width=\"17\" height=\"17\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/XQH226Up0Yt_7ekgm0CjsgkQPXa6CfepgZGED5JutYRho28IaHiXGCeMLi_7Jfwu0G8xNFkqcKTbSkUzHD686tgH5LmArLorlaxlVXScQySytn5kL2IRDMJS-QcWgnSFm5hey1kC\" width=\"17\" height=\"17\"><\/a> hidden node.\u00a0<\/p>\n<h4 class=\"\">In short:\u00a0<\/h4>\n<p dir=\"ltr\">At each hidden layer node, multiply each input value by the weight of the connection it arrives on, then add the products together.\u00a0<\/p>\n<p dir=\"ltr\">We apply the activation function to\u00a0 <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=x_j#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/joS3BbzB3SdK99L4VhBrSWcUk-0gEzR2_E_TliLvRQ3WUFEMJtmF1aQXOTzqe1SHxZC7sIU36hCSeAIs_Ss6iXSxfutjX_sSjTYe8eJICc5axiU41GyXdhCBUV9eZDc91mWgbyE9\" loading=\"lazy\" width=\"12\" height=\"12\"><img loading=\"lazy\" 
src=\"https:\/\/lh4.googleusercontent.com\/joS3BbzB3SdK99L4VhBrSWcUk-0gEzR2_E_TliLvRQ3WUFEMJtmF1aQXOTzqe1SHxZC7sIU36hCSeAIs_Ss6iXSxfutjX_sSjTYe8eJICc5axiU41GyXdhCBUV9eZDc91mWgbyE9\" width=\"12\" height=\"12\"><\/a> at the <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=j%5E%7Bth%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/XQH226Up0Yt_7ekgm0CjsgkQPXa6CfepgZGED5JutYRho28IaHiXGCeMLi_7Jfwu0G8xNFkqcKTbSkUzHD686tgH5LmArLorlaxlVXScQySytn5kL2IRDMJS-QcWgnSFm5hey1kC\" loading=\"lazy\" width=\"17\" height=\"17\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/XQH226Up0Yt_7ekgm0CjsgkQPXa6CfepgZGED5JutYRho28IaHiXGCeMLi_7Jfwu0G8xNFkqcKTbSkUzHD686tgH5LmArLorlaxlVXScQySytn5kL2IRDMJS-QcWgnSFm5hey1kC\" width=\"17\" height=\"17\"><\/a>\u00a0 hidden node and get:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5Cmathcal%7BO%7D_%7Bj%7D%20%26%3D%20%5Csigma(x_%7Bj%7D)#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/oZc4aN-mUyoye9JMFrQNFQrjmwjrVr9L9jYR9FpJfUAwBKQQpQNZsp5-SwIH6jSldtoXSgZtkbMEPPcGAyZisKYy0aZv_mDyxhNYbLbgnpYAvlK_MJTSHugjiZIuANwjJUsHO1RY\" loading=\"lazy\" width=\"76\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/oZc4aN-mUyoye9JMFrQNFQrjmwjrVr9L9jYR9FpJfUAwBKQQpQNZsp5-SwIH6jSldtoXSgZtkbMEPPcGAyZisKYy0aZv_mDyxhNYbLbgnpYAvlK_MJTSHugjiZIuANwjJUsHO1RY\" width=\"76\" height=\"16\"><\/a><\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%26%3D%20%5Csigma(%20%20%5Cxi_%7B1%7D%20w_%7B1j%7D%20%2B%20%5Cxi_%7B2%7D%20w_%7B2j%7D)#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" 
data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/_wSinxxXfnm7kEhDP1Z4rJNRqapH2K-_X_UO-fksVJ_Jecf_S9LFOWYw6ANqCYYRmbtdmMGDEEY1xCngiX-vL3-s2thvy1mlNFNs_o_Yhi1eRPFluwq7Pudd9P47D1lM03OZEYYp\" loading=\"lazy\" width=\"129\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/_wSinxxXfnm7kEhDP1Z4rJNRqapH2K-_X_UO-fksVJ_Jecf_S9LFOWYw6ANqCYYRmbtdmMGDEEY1xCngiX-vL3-s2thvy1mlNFNs_o_Yhi1eRPFluwq7Pudd9P47D1lM03OZEYYp\" width=\"129\" height=\"16\"><\/a><\/p>\n<p dir=\"ltr\">\u00a0<a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cmathcal%7BO%7D_%7Bj%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/9Aa87hkPGMcI9WKEsCSTasH566kYa-kQkImALRQUYDmuX1tQbRzccdBqziQ-WHr7-v4lVPawvPBb1VWhSHA2WMZNppICfC4BFkQNW5PWw7Gg6mHLTr5ziu_EzW9TN4shKdHaKjm9\" loading=\"lazy\" width=\"16\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/9Aa87hkPGMcI9WKEsCSTasH566kYa-kQkImALRQUYDmuX1tQbRzccdBqziQ-WHr7-v4lVPawvPBb1VWhSHA2WMZNppICfC4BFkQNW5PWw7Gg6mHLTr5ziu_EzW9TN4shKdHaKjm9\" width=\"16\" height=\"16\"><\/a> is the output of the <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=j%5E%7Bth%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/XQH226Up0Yt_7ekgm0CjsgkQPXa6CfepgZGED5JutYRho28IaHiXGCeMLi_7Jfwu0G8xNFkqcKTbSkUzHD686tgH5LmArLorlaxlVXScQySytn5kL2IRDMJS-QcWgnSFm5hey1kC\" loading=\"lazy\" width=\"17\" height=\"17\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/XQH226Up0Yt_7ekgm0CjsgkQPXa6CfepgZGED5JutYRho28IaHiXGCeMLi_7Jfwu0G8xNFkqcKTbSkUzHD686tgH5LmArLorlaxlVXScQySytn5kL2IRDMJS-QcWgnSFm5hey1kC\" width=\"17\" height=\"17\"><\/a> hidden node. 
This is calculated for each of the j nodes in the hidden layer.\u00a0 The resulting outputs now become the input for the next layer in the network.\u00a0<\/p>\n<p dir=\"ltr\">In our case, this is the final output layer. So for each of the k nodes in K:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5Cmathcal%7BO%7D_%7Bk%7D%20%26%3D%20%5Csigma(x_%7Bk%7D)#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/XwUYp1m2EoOV0H3HWpyCkRQq09Sz52eHhy-ihGsGHdC8bZr0bCq8AcKNFvKLNP-PArhtwz106GIGWXpLOAuVALCGCRTgeU1WFX7iXgVTzWaVHcAIjxpmL65jyPyiBmOTIkFavSiD\" loading=\"lazy\" width=\"77\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/XwUYp1m2EoOV0H3HWpyCkRQq09Sz52eHhy-ihGsGHdC8bZr0bCq8AcKNFvKLNP-PArhtwz106GIGWXpLOAuVALCGCRTgeU1WFX7iXgVTzWaVHcAIjxpmL65jyPyiBmOTIkFavSiD\" width=\"77\" height=\"16\"><\/a><\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%26%3D%20%5Csigma%20%5Cleft(%20%5Csum_%7Bj%20%5Cin%20J%7D%20%20%5Cmathcal%7BO%7D_%7Bj%7D%20w_%7Bjk%7D%20%20%5Cright)#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/xiwLqatmCp0ErCyQjBez9JAI4tS8fk5Mxg7dzlb-vzf1ff9LObD9tVoze-3cfIT91dygSgppz-tb_8rgnTlBbyazJ-aC1zKL6RPLz3yImhh4C0xjNUEWF8ilVE6c2Dc42hJy3N6q\" loading=\"lazy\" width=\"120\" height=\"49\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/xiwLqatmCp0ErCyQjBez9JAI4tS8fk5Mxg7dzlb-vzf1ff9LObD9tVoze-3cfIT91dygSgppz-tb_8rgnTlBbyazJ-aC1zKL6RPLz3yImhh4C0xjNUEWF8ilVE6c2Dc42hJy3N6q\" width=\"120\" height=\"49\"><\/a><\/p>\n<p dir=\"ltr\">This is the end of the feed-forward pass. 
So how well did our network do at getting the correct result \u00a0<a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cmathcal%7BO%7D_%7Bk%7D%3F#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/f47Ap8_2oEXBMjbPTLkZMigoTfZ2gXwX9Yljnij1S2T2fufv_Xd-Wqj6tckpJrQpQOCXWkbxLRSa3Oqzo42zcdHxZpxIddHEmiAq1G2ARrGmNgi59l-MCrrPYTkC_eSR2q-hfprg\" loading=\"lazy\" width=\"25\" height=\"13\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/f47Ap8_2oEXBMjbPTLkZMigoTfZ2gXwX9Yljnij1S2T2fufv_Xd-Wqj6tckpJrQpQOCXWkbxLRSa3Oqzo42zcdHxZpxIddHEmiAq1G2ARrGmNgi59l-MCrrPYTkC_eSR2q-hfprg\" width=\"25\" height=\"13\"><\/a><\/p>\n<p dir=\"ltr\">Because this is the training phase of our network, the true results are known, so we can calculate the error.<\/p>\n<h2 id=\"t-1600276832662\" class=\"\">What is Error<\/h2>\n<p dir=\"ltr\">We measure error at the end of each forward pass. This allows us to quantify how well our network has performed in getting the correct output. Evaluation comes once the neural network has been built. 
We can use the <a href=\"https:\/\/dataaspirant.com\/six-popular-classification-evaluation-metrics-in-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\">various evaluation metrics<\/a> to measure the performance of the model.<\/p>\n<p dir=\"ltr\">Let\u2019s define <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=t_%7Bk%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/snoLrXm-hO3P4thproLyIHKQZe3Q-5VYL64llaFsaszat2hOtA8FwsAIkAqqmTnsAJzWJDMA7RCB3vZOfoMKNo7EMTPSavZTifzORPA5dTLc2ACc4z4sTl_lR1nyROeR6WTYcHM1\" loading=\"lazy\" width=\"11\" height=\"12\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/snoLrXm-hO3P4thproLyIHKQZe3Q-5VYL64llaFsaszat2hOtA8FwsAIkAqqmTnsAJzWJDMA7RCB3vZOfoMKNo7EMTPSavZTifzORPA5dTLc2ACc4z4sTl_lR1nyROeR6WTYcHM1\" width=\"11\" height=\"12\"><\/a> as the expected or target value of the <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=k%5E%7Bth%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/HxQJbKtj3NhnP5q0mze7PYCs3ZexTv87sdPXqIid3O5PBssC63NMOWSZOcEs-NFPq3fanTEB4oXVquHxPco4ndt0wvm8varHRnOBRJZP5FVrheO0IKqhx9-UCyQY0SZ6nWyd2XXX\" loading=\"lazy\" width=\"17\" height=\"15\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/HxQJbKtj3NhnP5q0mze7PYCs3ZexTv87sdPXqIid3O5PBssC63NMOWSZOcEs-NFPq3fanTEB4oXVquHxPco4ndt0wvm8varHRnOBRJZP5FVrheO0IKqhx9-UCyQY0SZ6nWyd2XXX\" width=\"17\" height=\"15\"><\/a> node of the output layer K then the <strong>error E<\/strong> on the entire output is:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Ctext%7BE%7D%20%3D%20%5Cfrac%7B1%7D%7B2%7D%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%5E%7B2%7D#0\" class=\"tve-froala hasimg\"><img 
src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/yC_kQYx-uqnEaTexGxkiPJjP7NymQ58BtSHbLVtimrNMr_5l9c9N2Lf_U57xUzVFoMzOsuUPdTY4WerQhXnQwJIlRRnECLhyISqwDRrIm6-2-wmsaatBewk3R6XWV_rrZKEYegaZ\" loading=\"lazy\" width=\"144\" height=\"41\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/yC_kQYx-uqnEaTexGxkiPJjP7NymQ58BtSHbLVtimrNMr_5l9c9N2Lf_U57xUzVFoMzOsuUPdTY4WerQhXnQwJIlRRnECLhyISqwDRrIm6-2-wmsaatBewk3R6XWV_rrZKEYegaZ\" width=\"144\" height=\"41\"><\/a><\/p>\n<p dir=\"ltr\">Good! Now how does this help us?\u00a0<\/p>\n<p dir=\"ltr\">Our aim here is to find a way to tune our network such that when we do a forward pass of the input data, the output is exactly what we know it should be. But we can\u2019t change the input data, so there are only two things we can change:<\/p>\n<ol class=\"\">\n<li>The weights going into the activation function.\u00a0<\/li>\n<li>The activation function itself.\u00a0<\/li>\n<\/ol>\n<p dir=\"ltr\">The second option will be covered in a separate blog post, since there are a lot of activation functions, but the magic of neural networks is all about the <strong>weights<\/strong>.<\/p>\n<p dir=\"ltr\">Getting each weight, i.e. the strength of each connection between nodes, to just the right value is what <strong>backpropagation<\/strong> is all about. 
We\u2019ll look at the backpropagation algorithm in the next section.\u00a0<\/p>\n<p dir=\"ltr\">But let\u2019s go ahead and set it up by considering the following:<\/p>\n<blockquote class=\"\"><p>How much of this <strong>error E<\/strong> has come from each of the <strong>weights<\/strong> in the network?<\/p><\/blockquote>\n<p dir=\"ltr\">What is the proportion of the error coming from each of the <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=W_%7Bjk%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/oW7PEjkGV8ujEfaid883KNFAoc6rymmRRK6Kh2u2qcgmiTNH6s-Lg6afvd4DCEoxqTDCJieIr_GUTUwefCHPnzA-XAJz6bhGJYAN8yI4CFaCG1W3iItaHdDKCqoKE4-NahZpHnyC\" loading=\"lazy\" width=\"25\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/oW7PEjkGV8ujEfaid883KNFAoc6rymmRRK6Kh2u2qcgmiTNH6s-Lg6afvd4DCEoxqTDCJieIr_GUTUwefCHPnzA-XAJz6bhGJYAN8yI4CFaCG1W3iItaHdDKCqoKE4-NahZpHnyC\" width=\"25\" height=\"16\"><\/a> connections between the nodes in the layer J and the output layer K. 
In mathematical terms:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cfrac%7B%5Cpartial%7B%5Ctext%7BE%7D%7D%7D%7B%5Cpartial%7BW_%7Bjk%7D%7D%7D%20%3D%20%20%5Cfrac%7B%5Cpartial%7B%7D%7D%7B%5Cpartial%7BW_%7Bjk%7D%7D%7D%20%20%5Cfrac%7B1%7D%7B2%7D%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%5E%7B2%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/744grS0eJKLKhnPEDKvGJle3mZUFKJc2vkcv_8pt8UYQUzS4M-XSNdg2NQ2QxL_IiceK0u5X4xqik5AUBhibrpxcHPmAHjCYgEHPWsZtOB-raVqtYX_SR4dqwUfQkmYPBPqbB4oW\" loading=\"lazy\" width=\"212\" height=\"43\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/744grS0eJKLKhnPEDKvGJle3mZUFKJc2vkcv_8pt8UYQUzS4M-XSNdg2NQ2QxL_IiceK0u5X4xqik5AUBhibrpxcHPmAHjCYgEHPWsZtOB-raVqtYX_SR4dqwUfQkmYPBPqbB4oW\" width=\"212\" height=\"43\"><\/a><\/p>\n<p dir=\"ltr\">The derivative of the error function w.r.t weights is then:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cfrac%7B%5Cpartial%7B%5Ctext%7BE%7D%7D%7D%7B%5Cpartial%7BW_%7Bjk%7D%7D%7D%20%20%3D%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%20%5Cmathcal%7BO%7D_%7Bk%7D%20%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bk%7D%20%20%5Cright)%20%5Cmathcal%7BO%7D_%7Bj%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/hlKwFCkSPOet4LWVS0OoYvHCdwEmiRTf4gLLatgKP8J7wV_Jz7Mfxa4FNfE0CMSx3-xG_Jq_Ay53j9Oo6Z5a7xAmXOOQ_QFJD9SdreTXMl5I-7n_AwckZtgMkx9Q3cuAeA7mBJtl\" loading=\"lazy\" width=\"232\" height=\"39\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/hlKwFCkSPOet4LWVS0OoYvHCdwEmiRTf4gLLatgKP8J7wV_Jz7Mfxa4FNfE0CMSx3-xG_Jq_Ay53j9Oo6Z5a7xAmXOOQ_QFJD9SdreTXMl5I-7n_AwckZtgMkx9Q3cuAeA7mBJtl\" width=\"232\" 
height=\"39\"><\/a><\/p>\n<p dir=\"ltr\">We group the terms involving k and define:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cdelta_%7Bk%7D%20%3D%20%5Cmathcal%7BO%7D_%7Bk%7D%20%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bk%7D%20%20%5Cright)%20%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)#0\" class=\"hasimg tve-froala\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/862vNzosET2ZmBROb-pW-Ip-KuKn8PIAGkDbsFz4eFTc4NgCLXaN69vBBKuJatItVdmBPfHKfQjwL-nw2i_pMflSE0v86sNVeFCpzSctNNfxSviG6I1sY_SehvaJ5mLHg4AUTQ8G\" loading=\"lazy\" width=\"185\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/862vNzosET2ZmBROb-pW-Ip-KuKn8PIAGkDbsFz4eFTc4NgCLXaN69vBBKuJatItVdmBPfHKfQjwL-nw2i_pMflSE0v86sNVeFCpzSctNNfxSviG6I1sY_SehvaJ5mLHg4AUTQ8G\" width=\"185\" height=\"16\"><\/a><\/p>\n<p dir=\"ltr\">And therefore:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cfrac%7B%5Cpartial%7B%5Ctext%7BE%7D%7D%7D%7B%5Cpartial%7BW_%7Bjk%7D%7D%7D%20%20%3D%20%5Cmathcal%7BO%7D_%7Bj%7D%20%5Cdelta_%7Bk%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/XeSFfK-EFs53XkN7d8y9X7_lzHif-p8C30YEbfNY3IyluL4iWdiDhzO4v7om0qSybmEfZMmEfG-Ghe5N4NIWIllgcdlooRGDoCNeSQG-hRn6CdhLdsfMgeHpIpnVm38hdnuAnOyX\" loading=\"lazy\" width=\"92\" height=\"39\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/XeSFfK-EFs53XkN7d8y9X7_lzHif-p8C30YEbfNY3IyluL4iWdiDhzO4v7om0qSybmEfZMmEfG-Ghe5N4NIWIllgcdlooRGDoCNeSQG-hRn6CdhLdsfMgeHpIpnVm38hdnuAnOyX\" width=\"92\" height=\"39\"><\/a><\/p>\n<p dir=\"ltr\">So we have an expression for the amount of error, called \u2018<strong>delta<\/strong>\u2019 (<a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cdelta_%7Bk%7D#0\" class=\"hasimg\"><img 
src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/G8bltReGYxP9DxlwzOqOHk5s1qA6bLMkKfnZdCUKkCEZiewcqerewvXDOQ64acgzANggTqFtccB13kVvpotlbUXj2nbt__yNzvpFtmkA9Uj2t6-U1FJJEGk9uKkvedPnZyT247bh\" loading=\"lazy\" width=\"12\" height=\"13\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/G8bltReGYxP9DxlwzOqOHk5s1qA6bLMkKfnZdCUKkCEZiewcqerewvXDOQ64acgzANggTqFtccB13kVvpotlbUXj2nbt__yNzvpFtmkA9Uj2t6-U1FJJEGk9uKkvedPnZyT247bh\" width=\"12\" height=\"13\"><\/a>). But how does this help us to improve our network? We need to back propagate the error.<\/p>\n<p dir=\"ltr\">When calculating the errors, special care needs to be taken with the form of the loss function, because neural networks will <a href=\"https:\/\/dataaspirant.com\/handle-overfitting-deep-learning-models\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\">tend to overfit the data<\/a> if the data we provide is not diverse enough.<\/p>\n<p dir=\"ltr\">Even though we have various ways to <a href=\"https:\/\/dataaspirant.com\/data-augmentation-techniques-deep-learning\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\">create more diversified data<\/a> with the available data, it&#8217;s still worth keeping this in mind.<\/p>\n<h2 id=\"t-1600276832663\" class=\"\">How Back Propagation Works<\/h2>\n<p dir=\"ltr\">Backpropagation takes the error function, uses it to calculate the error on the current layer, and then updates the weights into that layer by some amount.\u00a0<\/p>\n<p dir=\"ltr\">So far we\u2019ve looked at the error on the output layer. What about the hidden layer?\u00a0<\/p>\n<p dir=\"ltr\">This also has an error, but the error here depends on the output layer\u2019s error too because this is where the difference between the target and output <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cmathcal%7BO%7D_%7Bk%7D#0\" class=\"hasimg\"><img 
src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/xISpGBpsT_xtxyFthySZgDmClRJQ2H0PytmMXt0MTxQf49CIfiSQYPQLfIje8d1eyDagMdI43XQM3PxY9ZZZhhvN57Zw8gcd_tRevN1ePhWLsiOcHpJvV07dnvFRZzhZhjmp2yYY\" loading=\"lazy\" width=\"17\" height=\"13\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/xISpGBpsT_xtxyFthySZgDmClRJQ2H0PytmMXt0MTxQf49CIfiSQYPQLfIje8d1eyDagMdI43XQM3PxY9ZZZhhvN57Zw8gcd_tRevN1ePhWLsiOcHpJvV07dnvFRZzhZhjmp2yYY\" width=\"17\" height=\"13\"><\/a> can be calculated.<\/p>\n<p dir=\"ltr\">Let\u2019s have a look at the error on the weights of the hidden layer <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=W_%7Bij%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/TPmdLxHkuw6RmaA-IBQzH9e0Ih6xD3rxoBFjacp_QezkvZV6wC5uqR3RQZAxEJ3LndKExKAWs2d2kisysjH9gZlG04rZwJGs9VnpNqysjCBmGD05dZXN9SDMGB6MpSVlfe4etSNX\" loading=\"lazy\" width=\"23\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/TPmdLxHkuw6RmaA-IBQzH9e0Ih6xD3rxoBFjacp_QezkvZV6wC5uqR3RQZAxEJ3LndKExKAWs2d2kisysjH9gZlG04rZwJGs9VnpNqysjCBmGD05dZXN9SDMGB6MpSVlfe4etSNX\" width=\"23\" height=\"16\"><\/a>:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cfrac%7B%5Cpartial%7B%5Ctext%7BE%7D%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%3D%20%20%5Cfrac%7B%5Cpartial%7B%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%20%5Cfrac%7B1%7D%7B2%7D%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%5E%7B2%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" 
data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/a33eQ6SHStUQZPwmAMqdK8SUyGLxoG32ndmFqY_pYWIrjBWi5h3PPxy7wKtjY91GhbhOxV2Vmf0O16Ba2eVRt2VrtXjVGQXWmLBPVkWydh3A8az9cTMpapKV5chXwRvZpzxXSv5u\" loading=\"lazy\" width=\"208\" height=\"43\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/a33eQ6SHStUQZPwmAMqdK8SUyGLxoG32ndmFqY_pYWIrjBWi5h3PPxy7wKtjY91GhbhOxV2Vmf0O16Ba2eVRt2VrtXjVGQXWmLBPVkWydh3A8az9cTMpapKV5chXwRvZpzxXSv5u\" width=\"208\" height=\"43\"><\/a><\/p>\n<p dir=\"ltr\">Now, unlike before, we cannot just drop the summation as the derivative is not directly acting on a subscript k in the summation. We should be careful to note that the output from every node in J is actually connected to each of the nodes in K so the summation should stay.\u00a0<\/p>\n<p dir=\"ltr\">But we can still use the same tricks as before: let\u2019s use the power rule again and move the derivative inside (because the summation is finite):<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5Cfrac%7B%5Cpartial%7B%5Ctext%7BE%7D%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%26%3D%20%20%5Cfrac%7B1%7D%7B2%7D%20%5Ctimes%202%20%5Ctimes%20%5Cfrac%7B%5Cpartial%7B%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%20%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%20%20%5Cmathcal%7BO%7D_%7Bk%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/EpX3UhHELOQJ0EQK3NZnAK_HHw9BiD3IT5NYUWlwsC0rOj2FxCFgwUIMwynEd5kqI_FIX8JBzZUVRytnq1wMzK9VVlxlOFaFniZ5qRuNaF5LAFqaMn7iXtEWVRHaICVpDVQle87-\" loading=\"lazy\" width=\"272\" height=\"43\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/EpX3UhHELOQJ0EQK3NZnAK_HHw9BiD3IT5NYUWlwsC0rOj2FxCFgwUIMwynEd5kqI_FIX8JBzZUVRytnq1wMzK9VVlxlOFaFniZ5qRuNaF5LAFqaMn7iXtEWVRHaICVpDVQle87-\" width=\"272\" 
height=\"43\"><\/a>\u00a0<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%26%3D%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%20%5Cfrac%7B%5Cpartial%7B%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%5Cmathcal%7BO%7D_%7Bk%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/Q5SA45zv-3kfRMuaz2f-V5foMwwBWXPhV3gAIxyPG2sDVh3JY99hMuPKPApiYZlP1YRlVA5GPyRSn4DwAH2yYbl_g_6KL_pYo3nIClzglaIt1a5ZF6svSxRtHxH7jiPasxbpkY38\" loading=\"lazy\" width=\"168\" height=\"43\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/Q5SA45zv-3kfRMuaz2f-V5foMwwBWXPhV3gAIxyPG2sDVh3JY99hMuPKPApiYZlP1YRlVA5GPyRSn4DwAH2yYbl_g_6KL_pYo3nIClzglaIt1a5ZF6svSxRtHxH7jiPasxbpkY38\" width=\"168\" height=\"43\"><\/a><\/p>\n<p dir=\"ltr\">Again, we substitute \u00a0<a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cmathcal%7BO%7D_%7Bk%7D%20%3D%20%5Csigma(%20x_%7Bk%7D)#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/-15o3VnjAXs5IGpItHcMeLm366K_cyVaPDs6DpHhFps68QT76bP_mMMiU00qLhqdIeNNPvBmjEdIZzv-5Ro885wg3mdkw_WHoj5_Nx-p4O5hMfW7abHH_ZmfZnIR7FV4BAGUDBGU\" loading=\"lazy\" width=\"77\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/-15o3VnjAXs5IGpItHcMeLm366K_cyVaPDs6DpHhFps68QT76bP_mMMiU00qLhqdIeNNPvBmjEdIZzv-5Ro885wg3mdkw_WHoj5_Nx-p4O5hMfW7abHH_ZmfZnIR7FV4BAGUDBGU\" width=\"77\" height=\"16\"><\/a> and its derivative and revert back to our output notation:<\/p>\n<p dir=\"ltr\"><a 
href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5Cfrac%7B%5Cpartial%7B%5Ctext%7BE%7D%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%26%3D%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%20%5Cfrac%7B%5Cpartial%7B%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20(%5Csigma(x_%7Bk%7D)%20)#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/hAMWV0mEDmV6tyr2gyKJVfU3wBQ3932slCgOsMfbqeeydCPpszVeoJ00Z3APXMkJHgwPTVpYWv5IQpfo4Wt5tL_8or9AxmKkxD6lnQ993EeqDl7i3KYVdhLy_0ymPC3gOBA3xMB1\" loading=\"lazy\" width=\"240\" height=\"43\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/hAMWV0mEDmV6tyr2gyKJVfU3wBQ3932slCgOsMfbqeeydCPpszVeoJ00Z3APXMkJHgwPTVpYWv5IQpfo4Wt5tL_8or9AxmKkxD6lnQ993EeqDl7i3KYVdhLy_0ymPC3gOBA3xMB1\" width=\"240\" height=\"43\"><\/a><\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%26%3D%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%20%5Csigma(x_%7Bk%7D)%20%5Cleft(%201%20-%20%5Csigma(x_%7Bk%7D)%20%5Cright)%20%5Cfrac%7B%5Cpartial%7B%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20(x_%7Bk%7D)#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/Q6gQa_WrDtJLZ1iee3H64RCFu5tSJ7bDWpH_3uxrW2vhe-nHygBky2fI4kNu8oJsPNCNdOf9_uryoZs9nFct697Tp5uDW_FlBeO3Wh3AsXzGZkuB-QtZwBKY3a1zJM1p-jnylLH8\" loading=\"lazy\" width=\"300\" height=\"43\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/Q6gQa_WrDtJLZ1iee3H64RCFu5tSJ7bDWpH_3uxrW2vhe-nHygBky2fI4kNu8oJsPNCNdOf9_uryoZs9nFct697Tp5uDW_FlBeO3Wh3AsXzGZkuB-QtZwBKY3a1zJM1p-jnylLH8\" width=\"300\" height=\"43\"><\/a><\/p>\n<p dir=\"ltr\"><a 
href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%26%3D%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cright)%20%5Cfrac%7B%5Cpartial%7B%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20(x_%7Bk%7D)#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/wIct6vp_JJGl-xL_X6zEhjPIUrpemN8VbxZKDQcj-cdJ-hWmSBvc5yNjyiRFZIcthsNTHKGBrEy9a3a5iu-4HU7PNB2o37SwokyBlgtRbl7w0H84MVZ5jpbnYfuNUPo6d5_al4yV\" loading=\"lazy\" width=\"263\" height=\"43\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/wIct6vp_JJGl-xL_X6zEhjPIUrpemN8VbxZKDQcj-cdJ-hWmSBvc5yNjyiRFZIcthsNTHKGBrEy9a3a5iu-4HU7PNB2o37SwokyBlgtRbl7w0H84MVZ5jpbnYfuNUPo6d5_al4yV\" width=\"263\" height=\"43\"><\/a><\/p>\n<p dir=\"ltr\">This still looks familiar from the output layer derivative, but now we\u2019re struggling with the derivative of the input to k i.e. 
<a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=x_k#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/OO5HwOymTcLe3i9NJBMiE4ZN60hMvfoMlvhN7THOF60eFspmkda7BbDln-MTZujVQty8NCJUCah0X4onVjr8nH20NW_4jsMHzBhsag9sP5jsJfeEx23vODaCisOOF4n6dMSfxyqY\" loading=\"lazy\" width=\"15\" height=\"9\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/OO5HwOymTcLe3i9NJBMiE4ZN60hMvfoMlvhN7THOF60eFspmkda7BbDln-MTZujVQty8NCJUCah0X4onVjr8nH20NW_4jsMHzBhsag9sP5jsJfeEx23vODaCisOOF4n6dMSfxyqY\" width=\"15\" height=\"9\"><\/a> w.r.t the weights from I to J.<\/p>\n<p dir=\"ltr\">Let\u2019s use the <strong>chain rule to break<\/strong> apart this derivative in terms of the output from J:\u2028<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cfrac%7B%5Cpartial%7B%20x_%7Bk%7D%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%3D%20%5Cfrac%7B%5Cpartial%7B%20x_%7Bk%7D%7D%7D%7B%5Cpartial%7B%5Cmathcal%7BO%7D_%7Bj%7D%7D%7D%5Cfrac%7B%5Cpartial%7B%5Cmathcal%7BO%7D_%7Bj%7D%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/QP9hvOVLrDzv8IDtyBSG-gaYRlU1O_AnT1I5YbKsSAtSV-IBAxTAmCs5FRsaz01eWlpuIhobyqrv5LaJDcVC2tDzlM2SUBwlyO66LA3I3UBfUC7D0IGhbvUcwAWivNMQrbeD8Yfi\" loading=\"lazy\" width=\"125\" height=\"39\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/QP9hvOVLrDzv8IDtyBSG-gaYRlU1O_AnT1I5YbKsSAtSV-IBAxTAmCs5FRsaz01eWlpuIhobyqrv5LaJDcVC2tDzlM2SUBwlyO66LA3I3UBfUC7D0IGhbvUcwAWivNMQrbeD8Yfi\" width=\"125\" height=\"39\"><\/a><\/p>\n<p dir=\"ltr\">The change of the input to the <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=k%5E%7Bth%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" 
data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/KLMuR2i1PsncfczQ_8u6YzOndXTgRJS8IpRUpV8SEM3eLmtsUIUhpyUTXzqyu1EaYjwV0D3rU76OIOjJzopujeIcfcgzX1tYQrzovVjIQjdart6CIOlYrB2DP6NpEY-OGTdsgY6-\" loading=\"lazy\" width=\"20\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/KLMuR2i1PsncfczQ_8u6YzOndXTgRJS8IpRUpV8SEM3eLmtsUIUhpyUTXzqyu1EaYjwV0D3rU76OIOjJzopujeIcfcgzX1tYQrzovVjIQjdart6CIOlYrB2DP6NpEY-OGTdsgY6-\" width=\"20\" height=\"16\"><\/a> node with respect to the output from the <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=j%5E%7Bth%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/OJ8CKr99a8WvXgQs1qNDBJQyzYT1TnZR0uxxGq_qOLZ9Wxzwb_2OP3PE9HDYRwC6Hu71Lfb3YX0OOLkG6eoE5SyycwIA796diH89ZidTftxonjNeV1GPS1-sbOHd8LJTTLhU7t7d\" loading=\"lazy\" width=\"20\" height=\"19\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/OJ8CKr99a8WvXgQs1qNDBJQyzYT1TnZR0uxxGq_qOLZ9Wxzwb_2OP3PE9HDYRwC6Hu71Lfb3YX0OOLkG6eoE5SyycwIA796diH89ZidTftxonjNeV1GPS1-sbOHd8LJTTLhU7t7d\" width=\"20\" height=\"19\"><\/a> node comes down to a product with the weights <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=W_%7Bjk%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/RTfv2JrOG8qYF9lstT5xU_YgpZ_4xjvts1UJj5jAtV4b9C6KJyCYjkbAkuwadHyyGgzwiHXURSPGBjVU5l4vg0Lbi0mo1Z9k-G2V_KfUhvKgEAPMLUicHdhti2xc05t7Cq7gY-nL\" loading=\"lazy\" width=\"28\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/RTfv2JrOG8qYF9lstT5xU_YgpZ_4xjvts1UJj5jAtV4b9C6KJyCYjkbAkuwadHyyGgzwiHXURSPGBjVU5l4vg0Lbi0mo1Z9k-G2V_KfUhvKgEAPMLUicHdhti2xc05t7Cq7gY-nL\" width=\"28\" height=\"16\"><\/a>.<\/p>\n<p dir=\"ltr\">Therefore this derivative just becomes the weights W<sub>jk<\/sub>. 
The remaining derivative has nothing to do with the subscript k, so we\u2019re free to move it outside the summation. Let\u2019s put it at the beginning:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5Cfrac%7B%5Cpartial%7B%5Ctext%7BE%7D%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%26%3D%20%5Cfrac%7B%5Cpartial%7B%5Cmathcal%7BO%7D_%7Bj%7D%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cright)%20W_%7Bjk%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/sku0l8ioOf-BkJJrlcpaccpbYR8VOjf1Ji7OkJW1fkFPt589n4_1pUUvm81GlCtXz-W5B4ZPbAOVMKdkiIkrmgJYfNjjV9GeY8HCGhgkeT2spOC_r9oS3OphC2Nz9xyti4PNaZCh\" loading=\"lazy\" width=\"335\" height=\"47\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/sku0l8ioOf-BkJJrlcpaccpbYR8VOjf1Ji7OkJW1fkFPt589n4_1pUUvm81GlCtXz-W5B4ZPbAOVMKdkiIkrmgJYfNjjV9GeY8HCGhgkeT2spOC_r9oS3OphC2Nz9xyti4PNaZCh\" width=\"335\" height=\"47\"><\/a><\/p>\n<p dir=\"ltr\">Let\u2019s finish the derivatives, remembering that the output of the node j is just <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cmathcal%7BO%7D_%7Bj%7D%20%3D%20%5Csigma(x_%7Bj%7D)#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/a_VUwqkTuLV32mzluQwRD4jhHYyJnlYoPt4dOTkVopFZDD_zPuJM5MnegJCY0WttzFplF980pReVigmRLcQHgcDueH_4K80vLW1F3xdB6Z8hJSpJxGMV8xU_O5UFyn20uxRc0adA\" loading=\"lazy\" width=\"83\" height=\"17\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/a_VUwqkTuLV32mzluQwRD4jhHYyJnlYoPt4dOTkVopFZDD_zPuJM5MnegJCY0WttzFplF980pReVigmRLcQHgcDueH_4K80vLW1F3xdB6Z8hJSpJxGMV8xU_O5UFyn20uxRc0adA\" 
width=\"83\" height=\"17\"><\/a> and we know the derivative of this function too:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5Cfrac%7B%5Cpartial%7B%5Ctext%7BE%7D%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%26%3D%20%5Cfrac%7B%5Cpartial%7B%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%5Csigma(x_%7Bj%7D)%20%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cright)%20W_%7Bjk%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/A0efAEneUs9ha5AaIvAN7zSa5p12H18twWATIRPfpjxJoPjpom1H84xu1AqIQRr0ZvPBd09wR2e9K8iYjRdVlHYsjjkfubBaxmF_K3MyGo9ucHijVsy8s5tv92PdhR-AnME-uDpm\" loading=\"lazy\" width=\"376\" height=\"47\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/A0efAEneUs9ha5AaIvAN7zSa5p12H18twWATIRPfpjxJoPjpom1H84xu1AqIQRr0ZvPBd09wR2e9K8iYjRdVlHYsjjkfubBaxmF_K3MyGo9ucHijVsy8s5tv92PdhR-AnME-uDpm\" width=\"376\" height=\"47\"><\/a><\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%26%3D%20%5Csigma(x_%7Bj%7D)%20%5Cleft(%201%20-%20%5Csigma(x_%7Bj%7D)%20%5Cright)%20%20%5Cfrac%7B%5Cpartial%7Bx_%7Bj%7D%20%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cright)%20W_%7Bjk%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/-RYSDmMd_9iTkaV0aOTKJcOT7NpP6GJeDczvc4yH-VxvCoTI8_OHN8yXcqEAkXj33TNNLuAKQfSlOya5M4_tPfCiL8u_z2oHKQZcnzGtzI4G2PqP472RqfleH98f2CXDjVFY0YJ3\" loading=\"lazy\" width=\"420\" height=\"47\"><img loading=\"lazy\" 
src=\"https:\/\/lh6.googleusercontent.com\/-RYSDmMd_9iTkaV0aOTKJcOT7NpP6GJeDczvc4yH-VxvCoTI8_OHN8yXcqEAkXj33TNNLuAKQfSlOya5M4_tPfCiL8u_z2oHKQZcnzGtzI4G2PqP472RqfleH98f2CXDjVFY0YJ3\" width=\"420\" height=\"47\"><\/a><\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%26%3D%20%5Cmathcal%7BO%7D_%7Bj%7D%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bj%7D%20%5Cright)%20%20%5Cfrac%7B%5Cpartial%7Bx_%7Bj%7D%20%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cright)%20W_%7Bjk%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/M4CS3xJtVJTk_R9e4n2ORHo4LLZ7rQqsfvTEBGQnFatAodiZyXx9hQi30JBNrRhBhi--CmWFMuka4eEYQyX7-QEJiEBdzacp6TrwchNe58mRtAlakgIELbuU3T3XKAvIy2lfFNFL\" loading=\"lazy\" width=\"380\" height=\"47\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/M4CS3xJtVJTk_R9e4n2ORHo4LLZ7rQqsfvTEBGQnFatAodiZyXx9hQi30JBNrRhBhi--CmWFMuka4eEYQyX7-QEJiEBdzacp6TrwchNe58mRtAlakgIELbuU3T3XKAvIy2lfFNFL\" width=\"380\" height=\"47\"><\/a><\/p>\n<p dir=\"ltr\">The final derivative is straightforward too: the derivative of the input to j w.r.t. the weights is just the output of the previous node, which in our case is <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cmathcal%7BO%7D_%7Bi%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/gFcYacXbYvVk5TjdfeiNDnejM_9NSYv0C4CRfl1dIz2r_YN1rWzSYCPmQlkUdU9AjfbrzD0WK32Vgu93Y6jNM1JR4bTUxYR3QoNY-8f6ADICGwzilzvrsX8R75D6qZo0kA50ly1y\" loading=\"lazy\" width=\"16\" height=\"15\"><img loading=\"lazy\" 
src=\"https:\/\/lh6.googleusercontent.com\/gFcYacXbYvVk5TjdfeiNDnejM_9NSYv0C4CRfl1dIz2r_YN1rWzSYCPmQlkUdU9AjfbrzD0WK32Vgu93Y6jNM1JR4bTUxYR3QoNY-8f6ADICGwzilzvrsX8R75D6qZo0kA50ly1y\" width=\"16\" height=\"15\"><\/a>.<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5Cfrac%7B%5Cpartial%7B%5Ctext%7BE%7D%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%26%3D%20%5Cmathcal%7BO%7D_%7Bj%7D%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bj%7D%20%5Cright)%20%20%5Cmathcal%7BO%7D_%7Bi%7D%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bk%7D%20%5Cright)%20W_%7Bjk%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/m7boAk8petNpVHbHidsqzJFalOpOGhcvDhdijvdELshJAIFOLHt-QB4RLwPu0edeBgaUEMCSWljjXzyIWRJng4th2Oul_JQsihISgCBuiilhqtxO_wvtS7hBFOmsNgoCwEFhqs1j\" loading=\"lazy\" width=\"404\" height=\"47\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/m7boAk8petNpVHbHidsqzJFalOpOGhcvDhdijvdELshJAIFOLHt-QB4RLwPu0edeBgaUEMCSWljjXzyIWRJng4th2Oul_JQsihISgCBuiilhqtxO_wvtS7hBFOmsNgoCwEFhqs1j\" width=\"404\" height=\"47\"><\/a><\/p>\n<p dir=\"ltr\">Almost there! 
Recall that we defined \u03b4<sub>k<\/sub> earlier; let\u2019s substitute that in:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5Cfrac%7B%5Cpartial%7B%5Ctext%7BE%7D%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%26%3D%20%5Cmathcal%7BO%7D_%7Bj%7D%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bj%7D%20%5Cright)%20%20%5Cmathcal%7BO%7D_%7Bi%7D%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cdelta_%7Bk%7D%20W_%7Bjk%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/y3cVDVbDP_55Y7k22Kj9jtaswZmIngdjdc2hOHJKKmeMaZ42tjks_2WP_LwqxMUGEAPyi3t87SD2uof1DHbr3V4RqCCkJftAQyv_CNURNFKGIVNJHfDP0sOL4-sUp7CQopWy4YhZ\" loading=\"lazy\" width=\"252\" height=\"47\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/y3cVDVbDP_55Y7k22Kj9jtaswZmIngdjdc2hOHJKKmeMaZ42tjks_2WP_LwqxMUGEAPyi3t87SD2uof1DHbr3V4RqCCkJftAQyv_CNURNFKGIVNJHfDP0sOL4-sUp7CQopWy4YhZ\" width=\"252\" height=\"47\"><\/a><\/p>\n<p dir=\"ltr\">To clean this up, we now define the <strong>\u2018delta\u2019<\/strong> for our hidden layer by grouping everything except O<sub>i<\/sub>, that is, \u03b4<sub>j<\/sub> = O<sub>j<\/sub>(1 \u2212 O<sub>j<\/sub>) \u03a3<sub>k<\/sub> \u03b4<sub>k<\/sub>W<sub>jk<\/sub>:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cdelta_%7Bj%7D%20%3D%20%5Cmathcal%7BO%7D_%7Bi%7D%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bj%7D%20%5Cright)%20%20%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cdelta_%7Bk%7D%20W_%7Bjk%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/FyqAzbCZxDBLczwCro4TecuS84gaTVptPfxvt-nGbz6HMumlWuy1Ow_LxriQ4ezKiUh4gzFZa8VevMDx5P-75CR95ZNZT5Ko_axXhHuoZIfLX0-pQM6y0o0bF7jqWjfC4UsgTpC5\" loading=\"lazy\" width=\"201\" height=\"39\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/FyqAzbCZxDBLczwCro4TecuS84gaTVptPfxvt-nGbz6HMumlWuy1Ow_LxriQ4ezKiUh4gzFZa8VevMDx5P-75CR95ZNZT5Ko_axXhHuoZIfLX0-pQM6y0o0bF7jqWjfC4UsgTpC5\" width=\"201\" 
height=\"39\"><\/a><\/p>\n<p dir=\"ltr\">That\u2019s the amount of error on each of the <strong>weights<\/strong> going into our hidden layer:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cfrac%7B%5Cpartial%7B%5Ctext%7BE%7D%7D%7D%7B%5Cpartial%7BW_%7Bij%7D%7D%7D%20%20%3D%20%5Cmathcal%7BO%7D_%7Bi%7D%20%5Cdelta_%7Bj%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/Mc88l2DHhYkD3i6LO8M4DYvI-k1uUhuBBJVYyuv6WjH4MqBwyMFQHagTBWKOOPY4FLRSzRokP85EGE_WKhj8hQHYMn3WrI5y34nvd1SDG5b47VY9Iz8vwx8BOy5IWTB4Sy6niu92\" loading=\"lazy\" width=\"87\" height=\"39\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/Mc88l2DHhYkD3i6LO8M4DYvI-k1uUhuBBJVYyuv6WjH4MqBwyMFQHagTBWKOOPY4FLRSzRokP85EGE_WKhj8hQHYMn3WrI5y34nvd1SDG5b47VY9Iz8vwx8BOy5IWTB4Sy6niu92\" width=\"87\" height=\"39\"><\/a><\/p>\n<h2 class=\"\" id=\"t-1600276832664\">What is Bias<\/h2>\n<p dir=\"ltr\">Let&#8217;s remind ourselves what happened inside our <strong>hidden layer nodes<\/strong>:<\/p>\n<ol class=\"\">\n<li>Each feature <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cxi_%7Bi%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/o1kdaSDgrkeLn6U5acbpCVKa6afZl6zhyTZwLekCTCKJaWkLfIDfuWdx3AYVjywgLiqNHMp63VZWOLh_aZFtjYke4ON-3hpVB3pNDjwUGmI2g_y2nK4iyAMScAEKyqnNs8y0l16H\" loading=\"lazy\" width=\"11\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/o1kdaSDgrkeLn6U5acbpCVKa6afZl6zhyTZwLekCTCKJaWkLfIDfuWdx3AYVjywgLiqNHMp63VZWOLh_aZFtjYke4ON-3hpVB3pNDjwUGmI2g_y2nK4iyAMScAEKyqnNs8y0l16H\" width=\"11\" height=\"16\"><\/a> from the input layer I is multiplied by some weight <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=w_%7Bij%7D#0\" class=\"tve-froala hasimg\"><img 
src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/LEVHeMKHrf5YyOfppm0oMvQPXEIvdqa87-QOVoJ3zrz2ibCD3ueCJGb9qX8FczlPtnjsYAnHZrazlP0NDbVh25GjB0wp6LaAes4-A38q4VEEE2FMWik_RQH_N6YrqL5yiMycGqXD\" loading=\"lazy\" width=\"20\" height=\"12\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/LEVHeMKHrf5YyOfppm0oMvQPXEIvdqa87-QOVoJ3zrz2ibCD3ueCJGb9qX8FczlPtnjsYAnHZrazlP0NDbVh25GjB0wp6LaAes4-A38q4VEEE2FMWik_RQH_N6YrqL5yiMycGqXD\" width=\"20\" height=\"12\"><\/a>.<\/li>\n<li>These are added together to get <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=x_%7Bi%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/ojIxkR8KNcLioAJaAQ7KfZ8oALokANlgMcotpJ3xfVpmI7uNhhNkWdPcBFQxUlFPczs4Ug82IK6Urx-vfqsrQSbtUA3j_11QogsQS83H3znriumB3Gto-4GTf6z2mMyRcTJQcwsU\" loading=\"lazy\" width=\"12\" height=\"11\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/ojIxkR8KNcLioAJaAQ7KfZ8oALokANlgMcotpJ3xfVpmI7uNhhNkWdPcBFQxUlFPczs4Ug82IK6Urx-vfqsrQSbtUA3j_11QogsQS83H3znriumB3Gto-4GTf6z2mMyRcTJQcwsU\" width=\"12\" height=\"11\"><\/a> the total, weighted input from the nodes in I.\u00a0<\/li>\n<li>\n<a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=x_%7Bi%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/1SfVfVMiGtxrOPM-pRQzG4xrTZoxGxuoMWVkIv2PzBexlT_9PlBV-nfNdU1c0no1ys0LTt-R2i9F2bXwIWW5Z_XcycFck1jEsb0_SofGE0yJc6zzT-yXRgDTcgWFoqRD6_ncE-30\" loading=\"lazy\" width=\"12\" height=\"11\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/1SfVfVMiGtxrOPM-pRQzG4xrTZoxGxuoMWVkIv2PzBexlT_9PlBV-nfNdU1c0no1ys0LTt-R2i9F2bXwIWW5Z_XcycFck1jEsb0_SofGE0yJc6zzT-yXRgDTcgWFoqRD6_ncE-30\" width=\"12\" height=\"11\"><\/a> is passed 
through the activation or transfer function, <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%7B%5Csigma%7D(x_%7Bi%7D)#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/uD-Kinhir2zLPRfGfizE2RijukZKnj9iXiosmEtvI5xV8q_ituVKP0_97bINV_bJ8eu0s6liAqL_WCZg7EGzEDPA7yrKZzulVykzYZ-99wuA05kiKT09kiDxv-xTxHSeilA_tqyb\" loading=\"lazy\" width=\"36\" height=\"17\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/uD-Kinhir2zLPRfGfizE2RijukZKnj9iXiosmEtvI5xV8q_ituVKP0_97bINV_bJ8eu0s6liAqL_WCZg7EGzEDPA7yrKZzulVykzYZ-99wuA05kiKT09kiDxv-xTxHSeilA_tqyb\" width=\"36\" height=\"17\"><\/a>.<\/li>\n<li>This gives the output <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cmathcal%7BO%7D_%7Bj%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/_uMWvrKXgAhH6DNH4DtQfVC0LWdjJAGdP9UXNb86uHKM3Qih7LWmqI5wPZdts3D_NfHKEb-U-SBFsNEHof9hRG-kim-SKmrfjw7rUbAiwDhOq4AGkaqwlUvL-ZcoKVcaNZssJnp_\" loading=\"lazy\" width=\"17\" height=\"17\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/_uMWvrKXgAhH6DNH4DtQfVC0LWdjJAGdP9UXNb86uHKM3Qih7LWmqI5wPZdts3D_NfHKEb-U-SBFsNEHof9hRG-kim-SKmrfjw7rUbAiwDhOq4AGkaqwlUvL-ZcoKVcaNZssJnp_\" width=\"17\" height=\"17\"><\/a> for each of the j nodes in hidden layer J.<\/li>\n<li>\n<a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cmathcal%7BO%7D_%7Bj%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/gCTPNkEymP9klxoOHtJU-6XkpcMphjYCyQaEUwLuRa8lV7ooEdfw-lp6Kgs-BgfFAnfHJ51x0gctJZSpxTSAPdiuVuFuDWmn1uIjC0zstck9jdEV4orHxKkNhD-3LYke0xT8ZCB9\" loading=\"lazy\" width=\"17\" height=\"17\"><img loading=\"lazy\" 
src=\"https:\/\/lh4.googleusercontent.com\/gCTPNkEymP9klxoOHtJU-6XkpcMphjYCyQaEUwLuRa8lV7ooEdfw-lp6Kgs-BgfFAnfHJ51x0gctJZSpxTSAPdiuVuFuDWmn1uIjC0zstck9jdEV4orHxKkNhD-3LYke0xT8ZCB9\" width=\"17\" height=\"17\"><\/a> from each of the J nodes becomes <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cxi_%7Bj%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/Vyuj4dCcBHebsATGUmzc9ffagirVVj53L8n85cvozmz3vaBCAMeM6lKtwEVMSMQGfGVji2c_3HDwK3KBS8-6Qih_fCIZPV8RjffSO0-1x7x3GCPiEPUVidkn8zb6MrzbpNzay_ZJ\" loading=\"lazy\" width=\"12\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/Vyuj4dCcBHebsATGUmzc9ffagirVVj53L8n85cvozmz3vaBCAMeM6lKtwEVMSMQGfGVji2c_3HDwK3KBS8-6Qih_fCIZPV8RjffSO0-1x7x3GCPiEPUVidkn8zb6MrzbpNzay_ZJ\" width=\"12\" height=\"16\"><\/a> for the next layer.<\/li>\n<\/ol>\n<p dir=\"ltr\">When we talk about the bias term in neural networks, we are actually talking about an additional parameter that is included in the summation of <strong>step 2<\/strong> above.\u00a0<\/p>\n<p dir=\"ltr\">The bias term is usually denoted with the symbol <strong>\u03b8 (theta)<\/strong>. Its function is to act as a threshold for the activation (transfer) function.<\/p>\n<p dir=\"ltr\">The bias comes from an extra node that is given the value of <strong>1<\/strong> and is not connected to anything else. As such, the derivative of the node\u2019s weighted input with respect to the bias term is just a constant, 1.<\/p>\n<p dir=\"ltr\">This allows us to just think of the bias term as an output from the node with the value of 1. 
The bias term is <strong>updated<\/strong> later, during backpropagation, to change the threshold at which the node fires.<\/p>\n<p dir=\"ltr\">Let\u2019s update the equation for <a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=x_%7Bi%7D#0\" class=\"hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/wwXRhMaOgsEcFopfDRwkoVWDh_LxgsY_GuXpMP7ZR7H0FjhzQrXSE-wC-LeNwxVycoMscJNKxdYcRAUUq4HzPlfH0tcAVSUdPIXmnvhjez7msggClCgVxIuCHhpMqKCZ0WT0QCkF\" loading=\"lazy\" width=\"12\" height=\"11\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/wwXRhMaOgsEcFopfDRwkoVWDh_LxgsY_GuXpMP7ZR7H0FjhzQrXSE-wC-LeNwxVycoMscJNKxdYcRAUUq4HzPlfH0tcAVSUdPIXmnvhjez7msggClCgVxIuCHhpMqKCZ0WT0QCkF\" width=\"12\" height=\"11\"><\/a> to include the bias term:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=x_%7Bi%7D%20%26%3D%20%5Cxi_%7B1j%7D%20w_%7B1j%7D%20%2B%20%5Cxi_%7B2j%7D%20w_%7B2j%7D%20%2B%20%5Ctheta_%7Bj%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/RLu9E3CHnk8P4zGh7VDsO6i3UlytN6XFG5TyBpLKdc_pbLcw4ytdEaS7qHjdQdaxRLGGuNQANozHiIXvYMno9Efb1v4X9_CFaFdYKSGy2RWTCl9ajm_Xb4fxCAViKmBjGKHnRrx_\" loading=\"lazy\" width=\"187\" height=\"17\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/RLu9E3CHnk8P4zGh7VDsO6i3UlytN6XFG5TyBpLKdc_pbLcw4ytdEaS7qHjdQdaxRLGGuNQANozHiIXvYMno9Efb1v4X9_CFaFdYKSGy2RWTCl9ajm_Xb4fxCAViKmBjGKHnRrx_\" width=\"187\" height=\"17\"><\/a><\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5Csigma(%20x_%7Bi%7D%20)%20%26%3D%20%5Csigma%20%5Cleft(%20%5Csum_%7Bi%20%5Cin%20I%7D%20%5Cleft(%20%5Cxi_%7Bij%7D%20w_%7Bij%7D%20%5Cright)%20%2B%20%5Ctheta_%7Bj%7D%20%5Cright)#0\" class=\"tve-froala hasimg\"><img 
src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh5.googleusercontent.com\/kTmbXfl2oUGuHNSQFJ0E47q3l4QrERvq6Cj6ZDtDkC1R_zNODZT0Ef0TOFfSmfOjLXt2F9atUPmyiGS3nR6GQBZyb8S7SC039hB6wZocAi5_RhGNUrmLo5FcfC4SzTb1PYWTL7oG\" loading=\"lazy\" width=\"220\" height=\"53\"><img loading=\"lazy\" src=\"https:\/\/lh5.googleusercontent.com\/kTmbXfl2oUGuHNSQFJ0E47q3l4QrERvq6Cj6ZDtDkC1R_zNODZT0Ef0TOFfSmfOjLXt2F9atUPmyiGS3nR6GQBZyb8S7SC039hB6wZocAi5_RhGNUrmLo5FcfC4SzTb1PYWTL7oG\" width=\"220\" height=\"53\"><\/a><\/p>\n<p dir=\"ltr\">Now we have all the pieces needed to understand neural networks. Note that the bias we are talking about here is completely different from the bias in the <strong>bias-variance<\/strong> tradeoff in machine learning.<\/p>\n<h2 class=\"\" id=\"t-1600276832665\">Conclusion<\/h2>\n<p dir=\"ltr\">We\u2019ve got the initial outputs from the feed-forward pass, we have the equations for the delta terms (how much the error depends on each of the weights), and we know we need to update our bias terms too.<\/p>\n<p dir=\"ltr\">So what does the whole algorithm look like?<\/p>\n<p class=\"dir=\">1. Input the data into the network and feed it forward.\u00a0<\/p>\n<p class=\"dir=\">2. 
For each of the output nodes calculate:<\/p>\n<p class=\"dir=\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cdelta_%7Bk%7D%20%3D%20%5Cmathcal%7BO%7D_%7Bk%7D%20%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bk%7D%20%20%5Cright)%20%20%5Cleft(%20%5Cmathcal%7BO%7D_%7Bk%7D%20-%20t_%7Bk%7D%20%5Cright)#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/25boS0-esmbsPUGn87VlmzovkQwsv_U8-h395FGOfMVdg0msrRKe85Kmgc26D0mtGKdfbROuwe3ypUP6AU_h8QvjLz0OkCog5oP6nzYWG7iqLlfZ5-cbsTCrbjesJJNXeiE3Crhq\" loading=\"lazy\" width=\"203\" height=\"17\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/25boS0-esmbsPUGn87VlmzovkQwsv_U8-h395FGOfMVdg0msrRKe85Kmgc26D0mtGKdfbROuwe3ypUP6AU_h8QvjLz0OkCog5oP6nzYWG7iqLlfZ5-cbsTCrbjesJJNXeiE3Crhq\" width=\"203\" height=\"17\"><\/a><\/p>\n<p class=\"dir=\">3. For each of the hidden layer nodes calculate:\u00a0<\/p>\n<p class=\"dir=\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Cdelta_%7Bj%7D%20%3D%20%5Cmathcal%7BO%7D_%7Bi%7D%20%5Cleft(%201%20-%20%5Cmathcal%7BO%7D_%7Bj%7D%20%5Cright)%20%20%20%5Csum_%7Bk%20%5Cin%20K%7D%20%5Cdelta_%7Bk%7D%20W_%7Bjk%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/kQk72QJlGldFh9If8DmIB8ocCX_RgcreqJOt0AervpgYSyUTZamybcrkckJReG-2D88QvXwpEAFlkC8asc41JK9qFhBJaHkVFtLpZ5rnUpY4TY_2kqjur0ZVcRAvPsrUNPAcCFV8\" loading=\"lazy\" width=\"201\" height=\"39\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/kQk72QJlGldFh9If8DmIB8ocCX_RgcreqJOt0AervpgYSyUTZamybcrkckJReG-2D88QvXwpEAFlkC8asc41JK9qFhBJaHkVFtLpZ5rnUpY4TY_2kqjur0ZVcRAvPsrUNPAcCFV8\" width=\"201\" height=\"39\"><\/a><\/p>\n<p class=\"dir=\">4. 
Calculate the changes that need to be made to the weights and bias terms:<\/p>\n<p class=\"dir=\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5CDelta%20W%20%26%3D%20-%5Ceta%20%5C%20%5Cdelta_%7Bl%7D%20%5C%20%5Cmathcal%7BO%7D_%7Bl-1%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/61tBEEY3wGMbDIheDhHQi2rfwGSlhH4tQIRlkrkNSQ5a-nv5tmwQB6tAcX6WNbaPLz-OrKsGAb41PQQmOOrS7lNK9WZg2QD3MXY_g3e_HUYnvRBuOrsFN9QZhtNGakWcfHMFFo1l\" loading=\"lazy\" width=\"136\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/61tBEEY3wGMbDIheDhHQi2rfwGSlhH4tQIRlkrkNSQ5a-nv5tmwQB6tAcX6WNbaPLz-OrKsGAb41PQQmOOrS7lNK9WZg2QD3MXY_g3e_HUYnvRBuOrsFN9QZhtNGakWcfHMFFo1l\" width=\"136\" height=\"16\"><\/a><\/p>\n<p class=\"dir=\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5CDelta%5Ctheta%20%26%3D%20-%5Ceta%20%5C%20%5Cdelta_%7Bl%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh4.googleusercontent.com\/WohqPx5f9f85src-qy1CWNMwc6uXNQBjjO0ncwNSTvMvAFgaBhVCMr7rc1hd5w4RfijDbxscnBmVjevePKpriDWFSU1wuEXSOD5y6XnH3et5a78Lj_CJHmGGnx7Bwd5at-0z8xJP\" loading=\"lazy\" width=\"85\" height=\"16\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/WohqPx5f9f85src-qy1CWNMwc6uXNQBjjO0ncwNSTvMvAFgaBhVCMr7rc1hd5w4RfijDbxscnBmVjevePKpriDWFSU1wuEXSOD5y6XnH3et5a78Lj_CJHmGGnx7Bwd5at-0z8xJP\" width=\"85\" height=\"16\"><\/a><\/p>\n<p class=\"dir=\">5. 
Update the weights and biases across the network:<\/p>\n<p class=\"dir=\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=W%20%2B%20%5CDelta%20W%20%26%5Crightarrow%20W#0\" class=\"tve-froala fr-basic hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/P3363HX-NSXlBSGTN94CKrYS9wnYqP60vQ0Ag2oPBr4gvCKhTID0xXYNfftHUB00FW3QW84PWuSULaxxPlK2JWsnlYhvFhxiJaMsn_Y8HwZd17WVN1A_Q69ev63iockD8IcQtTZU\" loading=\"lazy\" width=\"120\" height=\"13\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/P3363HX-NSXlBSGTN94CKrYS9wnYqP60vQ0Ag2oPBr4gvCKhTID0xXYNfftHUB00FW3QW84PWuSULaxxPlK2JWsnlYhvFhxiJaMsn_Y8HwZd17WVN1A_Q69ev63iockD8IcQtTZU\" width=\"120\" height=\"13\"><\/a><\/p>\n<p class=\"dir=\"><a href=\"https:\/\/latex-staging.easygenerator.com\/eqneditor\/editor.php?latex=%5Ctheta%20%2B%20%5CDelta%5Ctheta%20%26%5Crightarrow%20%5Ctheta#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/EJhWhYM5cTU1oqHLa0Zgz68LYy4vnYPs2_tEqXABdkWHedFNR5v2QNeAym7tM_MsIoEwOSJBMzbqBGGZy7LIOuvo54v8h-ZoWWCXv2DBcKU4FAJaOpl7wKXjp0cajVQAbbhpJECv\" loading=\"lazy\" width=\"88\" height=\"13\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/EJhWhYM5cTU1oqHLa0Zgz68LYy4vnYPs2_tEqXABdkWHedFNR5v2QNeAym7tM_MsIoEwOSJBMzbqBGGZy7LIOuvo54v8h-ZoWWCXv2DBcKU4FAJaOpl7wKXjp0cajVQAbbhpJECv\" width=\"88\" height=\"13\"><\/a><\/p>\n<p dir=\"ltr\">This algorithm is looped over and over until the error between the output and the target values is below some set threshold. Depending on the size of the network i.e. 
the number of layers and the number of nodes per layer, it can take a long time to complete one \u2018epoch\u2019, or run-through, of this algorithm.<\/p>\n<p dir=\"ltr\">In the next article, we\u2019ll discuss different types of activation functions. If you have FOMO (\u201cfear of missing out\u201d), please <a href=\"https:\/\/www.facebook.com\/dataaspirant\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\">follow us<\/a>.\u00a0<\/p>\n<p dir=\"ltr\">If you like the article, share it; if not, tell us. Be like a neural network: <strong>learn from mistakes<\/strong>.\u00a0<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/dataaspirant.com\/neural-network-basics\/<\/p>\n","protected":false},"author":0,"featured_media":1643,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1642"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=1642"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1642\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/1643"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=1642"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=1642"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=1642"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org
\/{rel}","templated":true}]}}