{"id":748,"date":"2020-08-27T14:44:47","date_gmt":"2020-08-27T14:44:47","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/08\/27\/4-ways-to-improve-your-tensorflow-model-key-regularization-techniques-you-need-to-know\/"},"modified":"2020-08-27T14:44:47","modified_gmt":"2020-08-27T14:44:47","slug":"4-ways-to-improve-your-tensorflow-model-key-regularization-techniques-you-need-to-know","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/08\/27\/4-ways-to-improve-your-tensorflow-model-key-regularization-techniques-you-need-to-know\/","title":{"rendered":"4 ways to improve your TensorFlow model \u2013 key regularization techniques you need to know"},"content":{"rendered":"<div id=\"post-\">\n<div class=\"author-link\"><b>By <a href=\"https:\/\/www.kdnuggets.com\/author\/ahmad-anis\" title=\"Posts by Ahmad Anis\" rel=\"author\">Ahmad Anis<\/a>, Machine learning and Data Science Student.<\/b><\/div>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/0*rwwgUuNXEoHNUIpy\" width=\"90%\"><\/p>\n<p><em>Photo by\u00a0<a class=\"bt gj gk gl gm gn\" href=\"https:\/\/unsplash.com\/@oowgnuj?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">Jungwoo Hong<\/a>\u00a0on\u00a0<a class=\"bt gj gk gl gm gn\" href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">Unsplash<\/a>.<\/em><\/p>\n<h3>Reguaralization<\/h3>\n<p>According to Wikipedia,<\/p>\n<blockquote>\n<p><em>In mathematics, statistics, and computer science, particularly in machine learning and inverse problems,\u00a0<strong>regularization<\/strong>\u00a0is the process of adding information in order to solve an ill-posed problem or to prevent overfitting.<\/em><\/p>\n<\/blockquote>\n<p>This means that we add some extra information in order to solve a problem and to prevent overfitting.<\/p>\n<p>Overfitting simply 
means that our machine learning model is trained on some data and works extremely well on that data, but fails to generalize to new, unseen examples.<\/p>\n<p>We can see overfitting in this simple example:<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/499\/1*ukHiBr6Oh1y5saNcdHwWzA.png\" width=\"90%\"><\/p>\n<p><em><a href=\"http:\/\/mlwiki.org\/index.php\/Overfitting\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/mlwiki.org\/index.php\/Overfitting<\/a><\/em><\/p>\n<p>Here, the fit is strictly attached to our training examples. This results in poor performance on test\/dev sets and good performance on the training set.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/566\/1*Voqdid9UxmJJ7fusuKXmeQ.png\" width=\"90%\"><\/p>\n<p><em><a href=\"http:\/\/mlwiki.org\/index.php\/Overfitting\">http:\/\/mlwiki.org\/index.php\/Overfitting<\/a><\/em><\/p>\n<p>So in order to improve the performance of the model, we use different regularization techniques. There are several, but we will discuss the 4 main ones.<\/p>\n<ol>\n<li><strong>L1 Regularization<\/strong><\/li>\n<li><strong>L2 Regularization<\/strong><\/li>\n<li><strong>Dropout<\/strong><\/li>\n<li><strong>Batch Normalization<\/strong><\/li>\n<\/ol>\n<p>I will briefly explain how these techniques work and how to implement them in TensorFlow 2.<\/p>\n<p>In order to get good intuition about how and why they work, I refer you to Professor Andrew Ng\u2019s lectures on these topics, easily available on YouTube.<\/p>\n<p><em>First, I will code a model without regularization, then I will show how to improve it by adding different regularization techniques. 
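The baseline can be sketched as follows; this is a minimal reconstruction matching the layer sizes in the summary, and it assumes the IRIS data is loaded via scikit-learn and that the layers use the same activations as the regularized models later in the article:

```python
# Hypothetical reconstruction of the unregularized baseline (model1).
# The data loading and activation choices are assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, to_categorical(iris.target), test_size=0.2, random_state=0)

model1 = Sequential([
    Dense(512, activation='tanh', input_shape=X_train[0].shape),
    Dense(512 // 2, activation='tanh'),
    Dense(512 // 4, activation='tanh'),
    Dense(512 // 8, activation='tanh'),
    Dense(32, activation='relu'),
    Dense(3, activation='softmax'),
])
model1.compile(optimizer='sgd', loss='categorical_crossentropy',
               metrics=['acc', 'mse'])
# hist = model1.fit(X_train, y_train, epochs=350, batch_size=128,
#                   validation_data=(X_test, y_test), verbose=2)
model1.summary()  # total params: 177,219 (4*512+512 + 512*256+256 + ...)
```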
We will use the IRIS data set to show that using regularization improves the same model a lot.<\/em><\/p>\n<p>\u00a0<\/p>\n<h3>Model without Regularization<\/h3>\n<p>\u00a0<\/p>\n<p><strong>Code:<\/strong><\/p>\n<p><strong>model1.summary()<\/strong><\/p>\n<div>\n<pre>Model: \"sequential\"\r\n_________________________________________________________________\r\nLayer (type)                 Output Shape              Param #   \r\n=================================================================\r\ndense_6 (Dense)              (None, 512)               2560      \r\n_________________________________________________________________\r\ndense_7 (Dense)              (None, 256)               131328    \r\n_________________________________________________________________\r\ndense_8 (Dense)              (None, 128)               32896     \r\n_________________________________________________________________\r\ndense_9 (Dense)              (None, 64)                8256      \r\n_________________________________________________________________\r\ndense_10 (Dense)             (None, 32)                2080      \r\n_________________________________________________________________\r\ndense_11 (Dense)             (None, 3)                 99        \r\n=================================================================\r\nTotal params: 177,219\r\nTrainable params: 177,219\r\nNon-trainable params: 0\r\n_________________________________________________________________\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>After training the model, if we evaluate the model using the following code in TensorFlow, we can find our\u00a0<em>accuracy<\/em>,\u00a0<em>loss<\/em>, and\u00a0<em>mse<\/em> on the test set.<\/p>\n<div>\n<pre>loss1, acc1, mse1 = model1.evaluate(X_test, y_test)\r\nprint(f\"Loss is {loss1},\\nAccuracy is {acc1*100},\\nMSE is {mse1}\")\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" 
src=\"https:\/\/miro.medium.com\/max\/730\/1*gb_kV5IKFrxhbBkQbsBUcQ.png\" width=\"90%\"><\/p>\n<p>Let\u2019s check the plots for Validation Loss and Training Loss.<\/p>\n<div>\n<pre>import matplotlib.pyplot as plt\r\nplt.style.use('ggplot')\r\nplt.plot(hist.history['loss'], label = 'loss')\r\nplt.plot(hist.history['val_loss'], label='val loss')\r\nplt.title(\"Loss vs Val_Loss\")\r\nplt.xlabel(\"Epochs\")\r\nplt.ylabel(\"Loss\")\r\nplt.legend()\r\nplt.show()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/523\/1*Cmrr1jmoATtyYk848i4oAw.png\" width=\"90%\"><\/p>\n<p>Here, we can see that validation loss is gradually increasing after\u00a0<strong>\u2248\u00a0<\/strong>60 epochs as compared to training loss. This shows that our model is overfitted.<\/p>\n<p>And similarly, for model accuracy plot,<\/p>\n<div>\n<pre>plt.plot(hist.history['acc'], label = 'acc')\r\nplt.plot(hist.history['val_acc'], label='val acc')\r\nplt.title(\"acc vs Val_acc\")\r\nplt.xlabel(\"Epochs\")\r\nplt.ylabel(\"acc\")\r\nplt.legend()\r\nplt.show()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/504\/1*91bxbH5jtobdqiUPn07u6g.png\" width=\"90%\"><\/p>\n<p>This again shows that validation accuracy is low as compared to training accuracy, which again shows signs of overfitting.<\/p>\n<p>\u00a0<\/p>\n<h3>L1 Regularization:<\/h3>\n<p>\u00a0<\/p>\n<p>A commonly used Regularization technique is L1 regularization, also known as Lasso Regularization.<\/p>\n<p>The main concept of L1 Regularization is that we have to penalize our weights by adding absolute values of weight in our loss function, multiplied by a regularization parameter lambda\u00a0<strong>\u03bb,\u00a0<\/strong>where\u00a0<strong>\u03bb\u00a0<\/strong>is manually tuned to be greater than 0.<\/p>\n<p>The equation for L1 is<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-large\" 
src=\"https:\/\/miro.medium.com\/max\/473\/1*S-nRr8nlqpqxWXKbY6spPQ.png\" width=\"378\" height=\"85\"><\/p>\n<p><em>Image Credit:\u00a0<a href=\"https:\/\/towardsdatascience.com\/intuitions-on-l1-and-l2-regularisation-235f2db4c261#2a1f\" target=\"_blank\" rel=\"noopener noreferrer\">Towards Data Science<\/a>.<\/em><\/p>\n<p><strong>Tensorflow Code:<\/strong><\/p>\n<p>Here, we added an extra parameter\u00a0<em>kernel_regularizer<\/em>, which we set it to \u2018l1\u2019 for L1 Regularization.<\/p>\n<p>Let\u2019s Evaluate and plot the model now.<\/p>\n<div>\n<pre>loss2, acc2, mse2 = model2.evaluate(X_test, y_test)\r\nprint(f\"Loss is {loss2},nAccuracy is {acc2 * 100},nMSE is {mse2}\")\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/729\/1*id_EVu7sssjeAjn2li892Q.png\" width=\"90%\"><\/p>\n<p>Hmmm, Accuracy is pretty much the same, let\u2019s check the plots to get better intuition.<\/p>\n<div>\n<pre>plt.plot(hist2.history[\u2018loss\u2019], label = \u2018loss\u2019)\r\nplt.plot(hist2.history[\u2018val_loss\u2019], label=\u2019val loss\u2019)\r\nplt.title(\u201cLoss vs Val_Loss\u201d)\r\nplt.xlabel(\u201cEpochs\u201d)\r\nplt.ylabel(\u201cLoss\u201d)\r\nplt.legend()\r\nplt.show()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/503\/1*kdO46k4idHdp4GQUGNjBUQ.png\" width=\"90%\"><\/p>\n<p>And for Accuracy,<\/p>\n<div>\n<pre>plt.figure(figsize=(15,8))\r\nplt.plot(hist2.history['acc'], label = 'acc')\r\nplt.plot(hist2.history['val_acc'], label='val acc')\r\nplt.title(\"acc vs Val_acc\")\r\nplt.xlabel(\"Epochs\")\r\nplt.ylabel(\"acc\")\r\nplt.legend()\r\nplt.show()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*pG1pr6TN4mszgGnc2evzNA.png\" width=\"90%\"><\/p>\n<p>Well, quite an improvement, I guess, because over validation loss is not increasing that much 
as it was previously, but validation accuracy is not increasing much. Let\u2019s add\u00a0l1\u00a0to more layers to check if it improves the model or not.<\/p>\n<div>\n<pre>model3 = Sequential([\r\n    Dense(512, activation='tanh', input_shape = X_train[0].shape, kernel_regularizer='l1'),\r\n    Dense(512\/\/2, activation='tanh', kernel_regularizer='l1'),\r\n    Dense(512\/\/4, activation='tanh', kernel_regularizer='l1'),\r\n    Dense(512\/\/8, activation='tanh', kernel_regularizer='l1'),\r\n    Dense(32, activation='relu', kernel_regularizer='l1'),\r\n    Dense(3, activation='softmax')\r\n])\r\nmodel3.compile(optimizer='sgd',loss='categorical_crossentropy', metrics=['acc', 'mse'])\r\nhist3 = model3.fit(X_train, y_train, epochs=350, batch_size=128, validation_data=(X_test,y_test), verbose=2)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>After training, let\u2019s evaluate the model.<\/p>\n<div>\n<pre>loss3, acc3, mse3 = model3.evaluate(X_test, y_test)\r\nprint(f\"Loss is {loss3},\\nAccuracy is {acc3 * 100},\\nMSE is {mse3}\")\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/730\/1*wn_VNSXciHX7qwmPEISKZw.png\" width=\"90%\"><\/p>\n<p>Well, the accuracy is quite improved now; it jumped from 92% to 94%. 
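Note that the string shorthand 'l1' uses the Keras default factor of 0.01; to tune λ manually, as described above, you can pass a regularizer object instead. A minimal sketch (the λ value here is just an example):

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense

lam = 0.01  # regularization strength lambda, tuned manually (> 0)

# Equivalent to kernel_regularizer='l1', but with an explicit lambda:
layer = Dense(512, activation='tanh',
              kernel_regularizer=tf.keras.regularizers.l1(lam))

# The penalty added to the loss is lambda * sum(|w|):
w = tf.constant([1.0, -2.0, 3.0])
penalty = tf.keras.regularizers.l1(lam)(w)  # ~ 0.01 * (1 + 2 + 3) = 0.06
```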
Let\u2019s check the plots.<\/p>\n<p><strong>Loss<\/strong><\/p>\n<div>\n<pre>plt.figure(figsize=(15,8))\r\nplt.plot(hist3.history['loss'], label = 'loss')\r\nplt.plot(hist3.history['val_loss'], label='val loss')\r\nplt.title(\"Loss vs Val_Loss\")\r\nplt.xlabel(\"Epochs\")\r\nplt.ylabel(\"Loss\")\r\nplt.legend()\r\nplt.show()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*UTbTdBtefpnNHlKHdul0jw.png\" width=\"90%\"><\/p>\n<p>Now, both lines are approximately overlapping, which means that our model is performing just as well on the test set as it was on the training set.<\/p>\n<p><strong>Accuracy<\/strong><\/p>\n<div>\n<pre>plt.figure(figsize=(15,8))\r\nplt.plot(hist3.history['acc'], label = 'acc')\r\nplt.plot(hist3.history['val_acc'], label='val acc')\r\nplt.title(\"acc vs Val_acc\")\r\nplt.xlabel(\"Epochs\")\r\nplt.ylabel(\"acc\")\r\nplt.legend()\r\nplt.show()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*m_zGAe-sPhwbWIyUOR2Zag.png\" width=\"90%\"><\/p>\n<p>And we can see that the validation loss of the model is not increasing compared to the training loss, and validation accuracy is also increasing.<\/p>\n<p>\u00a0<\/p>\n<h3>L2 Regularization<\/h3>\n<p>\u00a0<\/p>\n<p>L2 Regularization is another regularization technique, which is also known as\u00a0<strong>Ridge regularization<\/strong>. 
In L2 regularization, we add the squared magnitudes of the weights to our loss function as a penalty.<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/474\/1*IbdA-PCx51P2SmY3eHGubg.png\" width=\"379\" height=\"90\"><\/p>\n<p><em>Image Credit:\u00a0<a href=\"https:\/\/towardsdatascience.com\/intuitions-on-l1-and-l2-regularisation-235f2db4c261#2a1f\" target=\"_blank\" rel=\"noopener noreferrer\">Towards Data Science<\/a>.<\/em><\/p>\n<p><strong>Tensorflow Code:<\/strong><\/p>\n<div>\n<pre>model5 = Sequential([\r\n    Dense(512, activation='tanh', input_shape = X_train[0].shape, kernel_regularizer='l2'),\r\n    Dense(512\/\/2, activation='tanh'),\r\n    Dense(512\/\/4, activation='tanh'),\r\n    Dense(512\/\/8, activation='tanh'),\r\n    Dense(32, activation='relu'),\r\n    Dense(3, activation='softmax')\r\n])\r\nmodel5.compile(optimizer='sgd',loss='categorical_crossentropy', metrics=['acc', 'mse'])\r\nhist5 = model5.fit(X_train, y_train, epochs=350, batch_size=128, validation_data=(X_test,y_test), verbose=2)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>After training, let\u2019s evaluate the model.<\/p>\n<div>\n<pre>loss5, acc5, mse5 = model5.evaluate(X_test, y_test)\r\nprint(f\"Loss is {loss5},\\nAccuracy is {acc5 * 100},\\nMSE is {mse5}\")\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>And the output is<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/715\/1*QG_wzJ6qIXY7kk7yV74QbA.png\" width=\"90%\"><\/p>\n<p>Here we can see that validation accuracy is 97%, which is quite good. Let\u2019s plot for more intuition.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*pzirnqp9eJKe_UNxjJ0XkQ.png\" width=\"90%\"><\/p>\n<p>Here we can see that we are not overfitting our data. 
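As with 'l1', the 'l2' shorthand uses the Keras default factor of 0.01; a quick sketch of the squared-magnitude penalty from the equation above (the λ value is again just an example):

```python
import tensorflow as tf

lam = 0.01  # lambda, the regularization strength
reg = tf.keras.regularizers.l2(lam)

# L2 (Ridge) adds lambda * sum(w^2) to the loss:
w = tf.constant([1.0, -2.0, 3.0])
penalty = reg(w)  # ~ 0.01 * (1 + 4 + 9) = 0.14
```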
Let\u2019s plot accuracy.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*FZeNK8yEaUV7c2OSf5bt6w.png\" width=\"90%\"><\/p>\n<p>Adding \u201cL2\u201d Regularization in just 1 layer has improved our model a lot.<\/p>\n<p>Let\u2019s now add\u00a0<strong>L2<\/strong>\u00a0in all other layers.<\/p>\n<div>\n<pre>model6 = Sequential([\r\n    Dense(512, activation='tanh', input_shape = X_train[0].shape, kernel_regularizer='l2'),\r\n    Dense(512\/\/2, activation='tanh', kernel_regularizer='l2'),\r\n    Dense(512\/\/4, activation='tanh', kernel_regularizer='l2'),\r\n    Dense(512\/\/8, activation='tanh', kernel_regularizer='l2'),\r\n    Dense(32, activation='relu', kernel_regularizer='l2'),\r\n    Dense(3, activation='softmax')\r\n])\r\nmodel6.compile(optimizer='sgd',loss='categorical_crossentropy', metrics=['acc', 'mse'])\r\nhist6 = model6.fit(X_train, y_train, epochs=350, batch_size=128, validation_data=(X_test,y_test), verbose=2)\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Here we have L2 in all layers. 
After training, let\u2019s evaluate it.<\/p>\n<div>\n<pre>loss6, acc6, mse6 = model6.evaluate(X_test, y_test)\r\nprint(f\"Loss is {loss6},\\nAccuracy is {acc6 * 100},\\nMSE is {mse6}\")\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/701\/1*8yuHfTOykx9HxwE93y7LWw.png\" width=\"90%\"><\/p>\n<p>Let\u2019s plot to get more intuition.<\/p>\n<div>\n<pre>plt.figure(figsize=(15,8))\r\nplt.plot(hist6.history['loss'], label = 'loss')\r\nplt.plot(hist6.history['val_loss'], label='val loss')\r\nplt.title(\"Loss vs Val_Loss\")\r\nplt.xlabel(\"Epochs\")\r\nplt.ylabel(\"Loss\")\r\nplt.legend()\r\nplt.show()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*ah_UcmODu7Gj9WLGWe3UIw.png\" width=\"90%\"><\/p>\n<p>And for accuracy<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*6rYgi_F_VErXpzmWVxHj3g.png\" width=\"90%\"><\/p>\n<p>We can see that this model is also good and not overfitting to the dataset.<\/p>\n<p>\u00a0<\/p>\n<h3>Dropout<\/h3>\n<p>\u00a0<\/p>\n<p>Another common way to avoid overfitting is by using the Dropout technique. The main idea behind using dropout is that we randomly turn off some neurons in our layer based on some probability. 
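A quick sketch of that behavior: at training time, Keras' Dropout zeroes units with the given rate and scales the survivors by 1 / (1 - rate) (inverted dropout), while at inference time it is a no-op:

```python
import tensorflow as tf

tf.random.set_seed(0)  # fix which units get dropped, for reproducibility
layer = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 8))

out_infer = layer(x, training=False)  # inference: no-op, all ones
# Training: each unit is zeroed with probability 0.5, and surviving
# units are scaled by 1 / (1 - 0.5) = 2 so the expected sum is unchanged.
out_train = layer(x, training=True)   # values are either 0.0 or 2.0
```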
You can learn more about how it works from Professor Ng\u2019s video\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=ARq74QuavAo\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/p>\n<p>Let\u2019s code it in Tensorflow.<\/p>\n<p>All previous imports are the same, and we are just adding an extra import here.<\/p>\n<p>In order to implement Dropout, all we have to do is add a\u00a0<em>Dropout\u00a0<\/em>layer from\u00a0<em>tf.keras.layers<\/em>\u00a0and set a dropout rate in it.<\/p>\n<div>\n<pre>import tensorflow as tf\r\nmodel7 = Sequential([\r\n    Dense(512, activation='tanh', input_shape = X_train[0].shape),\r\n    tf.keras.layers.Dropout(0.5), #dropout with 50% rate\r\n    Dense(512\/\/2, activation='tanh'),\r\n    Dense(512\/\/4, activation='tanh'),\r\n    Dense(512\/\/8, activation='tanh'),\r\n    Dense(32, activation='relu'),\r\n    Dense(3, activation='softmax')\r\n])\r\nmodel7.compile(optimizer='sgd',loss='categorical_crossentropy', metrics=['acc', 'mse'])\r\nhist7 = model7.fit(X_train, y_train, epochs=350, batch_size=128, validation_data=(X_test,y_test), verbose=2)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>After training, let\u2019s evaluate it on the test set.<\/p>\n<div>\n<pre>loss7, acc7, mse7 = model7.evaluate(X_test, y_test)\r\nprint(f\"Loss is {loss7},\\nAccuracy is {acc7 * 100},\\nMSE is {mse7}\")\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*0tZfsIC0IaHQEOFgmoK_hw.png\" width=\"90%\"><\/p>\n<p>And wow, our results are very promising; we reached 97% on our test set. 
Let\u2019s plot the loss and accuracy for better intuition.<\/p>\n<div>\n<pre>plt.figure(figsize=(15,8))\r\nplt.plot(hist7.history['loss'], label = 'loss')\r\nplt.plot(hist7.history['val_loss'], label='val loss')\r\nplt.title(\"Loss vs Val_Loss\")\r\nplt.xlabel(\"Epochs\")\r\nplt.ylabel(\"Loss\")\r\nplt.legend()\r\nplt.show()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*_UumgoKb-_At-xQNjRdPrw.png\" width=\"90%\"><\/p>\n<p>Here, we can see that our model is performing better on validation data as compared to training data, which is good news.<\/p>\n<p>Let\u2019s plot accuracy now.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*ST9H9EjCc37L7359BmvFyg.png\" width=\"90%\"><\/p>\n<p>And we can see that our model is performing better on the validation dataset as compared to the training set.<\/p>\n<p>Let\u2019s add more dropout layers to see how our model performs.<\/p>\n<div>\n<pre>model8 = Sequential([\r\n    Dense(512, activation='tanh', input_shape = X_train[0].shape),\r\n    tf.keras.layers.Dropout(0.5),\r\n    Dense(512\/\/2, activation='tanh'),\r\n    tf.keras.layers.Dropout(0.5),\r\n    Dense(512\/\/4, activation='tanh'),\r\n    tf.keras.layers.Dropout(0.5),\r\n    Dense(512\/\/8, activation='tanh'),\r\n    tf.keras.layers.Dropout(0.3),\r\n    Dense(32, activation='relu'),\r\n    Dense(3, activation='softmax')\r\n])\r\nmodel8.compile(optimizer='sgd',loss='categorical_crossentropy', metrics=['acc', 'mse'])\r\nhist8 = model8.fit(X_train, y_train, epochs=350, batch_size=128, validation_data=(X_test,y_test), verbose=2)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Let\u2019s evaluate it.<\/p>\n<div>\n<pre>loss8, acc8, mse8 = model8.evaluate(X_test, y_test)\r\nprint(f\"Loss is {loss8},\\nAccuracy is {acc8 * 100},\\nMSE is {mse8}\")\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" 
src=\"https:\/\/miro.medium.com\/max\/875\/1*XAbz8eiP0g40o2mOkacPUg.png\" width=\"90%\"><\/p>\n<p>This model is also very good, as it is performing 98% on the test set. Let\u2019s plot to get better intuitions.<\/p>\n<div>\n<pre>plt.figure(figsize=(15,8))\r\nplt.plot(hist8.history['loss'], label = 'loss')\r\nplt.plot(hist8.history['val_loss'], label='val loss')\r\nplt.title(\"Loss vs Val_Loss\")\r\nplt.xlabel(\"Epochs\")\r\nplt.ylabel(\"Loss\")\r\nplt.legend()\r\nplt.show()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*JwI7MVCDQ-0qqKtHtZ8pjA.png\" width=\"90%\"><\/p>\n<p>And we can see that adding more dropout layers makes the model perform slightly less good while training, but on the validation set, it is performing really well.<\/p>\n<p>Let\u2019s plot the accuracy now.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*vptpLPMYReRUB6xiSfGMvg.png\" width=\"90%\"><\/p>\n<p>And we can see the same pattern here, that our model is not performing as good while training, but when we are evaluating it, it is performing really good.<\/p>\n<p>\u00a0<\/p>\n<h3>Batch Normalization<\/h3>\n<p>\u00a0<\/p>\n<p>The main idea behind batch normalization is that we normalize the input layer by using several techniques (<em>sklearn.preprocessing.StandardScaler<\/em>) in our case, which improves the model performance, so if the input layer is benefitted by normalization, why not normalize the hidden layers, which will improve and fasten learning even further.<\/p>\n<p>To learn maths and get more intuition about it, I will redirect you again to Professor NG\u2019s lecture\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=tNIpEZLv_eg\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=nUUqwaxLnWs\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/p>\n<p>To add it in your TensorFlow model, 
just add\u00a0<em>tf.keras.layers.BatchNormalization()<\/em>\u00a0after your layers.<\/p>\n<p>Let\u2019s see the code.<\/p>\n<div>\n<pre>model9 = Sequential([\r\n    Dense(512, activation='tanh', input_shape = X_train[0].shape),\r\n    Dense(512\/\/2, activation='tanh'),\r\n    tf.keras.layers.BatchNormalization(),\r\n    Dense(512\/\/4, activation='tanh'),\r\n    Dense(512\/\/8, activation='tanh'),\r\n    Dense(32, activation='relu'),\r\n    Dense(3, activation='softmax')\r\n])\r\nmodel9.compile(optimizer='sgd',loss='categorical_crossentropy', metrics=['acc', 'mse'])\r\nhist9 = model9.fit(X_train, y_train, epochs=350,  validation_data=(X_test,y_test), verbose=2)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>You may have noticed that I have removed the\u00a0<em>batch_size<\/em>\u00a0argument. This is because passing a\u00a0batch_size\u00a0while using only\u00a0<em>tf.keras.layers.BatchNormalization()<\/em>\u00a0as regularization resulted in really poor model performance. I have tried to find the reason for this on the internet, but I could not find it. You can also change the optimizer from\u00a0<em>sgd\u00a0<\/em>to\u00a0<em>rmsprop\u00a0<\/em>or\u00a0<em>adam\u00a0<\/em>if you really want to use\u00a0batch_size\u00a0while training.<\/p>\n<p>After training, let\u2019s evaluate the model.<\/p>\n<div>\n<pre>loss9, acc9, mse9 = model9.evaluate(X_test, y_test)\r\nprint(f\"Loss is {loss9},\\nAccuracy is {acc9 * 100},\\nMSE is {mse9}\")\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*Q62tURQ19xIhhIyBJvAQoQ.png\" width=\"90%\"><\/p>\n<p>Validation accuracy with a single Batch Normalization layer is not as good as with the other techniques. 
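To see what a single BatchNormalization layer actually does, here is a small sketch: in training mode it standardizes each feature using the batch mean and variance, then applies a learnable scale (γ, initially 1) and shift (β, initially 0):

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.constant([[1.0], [2.0], [3.0], [4.0]])  # one feature, batch of 4

# Training mode: output = (x - batch_mean) / sqrt(batch_var + epsilon),
# scaled by gamma (1) and shifted by beta (0) at initialization,
# so the normalized output has approximately zero mean.
out = bn(x, training=True)
mean = float(tf.reduce_mean(out))  # ~ 0
```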
Let\u2019s plot the loss and acc for better intuition.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*SWz3Y1FuTaCY5Uqed7Eiqw.png\" width=\"90%\"><\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*4pwJdAX68V4Zbm9zXlZbFg.png\" width=\"90%\"><\/p>\n<p>Here we can see that our model is not performing as well on the validation set as on the training set. Let\u2019s add normalization to all the layers to see the results.<\/p>\n<div>\n<pre>model11 = Sequential([\r\n    Dense(512, activation='tanh', input_shape = X_train[0].shape),\r\n    tf.keras.layers.BatchNormalization(),\r\n    Dense(512\/\/2, activation='tanh'),\r\n    tf.keras.layers.BatchNormalization(),\r\n    Dense(512\/\/4, activation='tanh'),\r\n    tf.keras.layers.BatchNormalization(),\r\n    Dense(512\/\/8, activation='tanh'),\r\n    tf.keras.layers.BatchNormalization(),\r\n    Dense(32, activation='relu'),\r\n    tf.keras.layers.BatchNormalization(),\r\n    Dense(3, activation='softmax')\r\n])\r\nmodel11.compile(optimizer='sgd',loss='categorical_crossentropy', metrics=['acc', 'mse'])\r\nhist11 = model11.fit(X_train, y_train, epochs=350,  validation_data=(X_test,y_test), verbose=2)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Let\u2019s evaluate it.<\/p>\n<div>\n<pre>loss11, acc11, mse11 = model11.evaluate(X_test, y_test)\r\nprint(f\"Loss is {loss11},\\nAccuracy is {acc11 * 100},\\nMSE is {mse11}\")\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*QSGoSL_ft5cDjZPC_q5r7Q.png\" width=\"90%\"><\/p>\n<p>By adding Batch Normalization in every layer, we achieved good accuracy. 
Let\u2019s plot the loss and accuracy.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*B6NUlzAZhQQ04cmmjddGGw.png\" width=\"90%\"><\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*cQd7oD2EEb7xA7EWNz2LKg.png\" width=\"90%\"><\/p>\n<p>By plotting accuracy and loss, we can see that our model is still performing better on the training set than on the validation set, but its performance is improving.<\/p>\n<p>\u00a0<\/p>\n<h3>Outcome:<\/h3>\n<p>\u00a0<\/p>\n<p>This article was a brief introduction to using different regularization techniques in TensorFlow. If you want more theory, I would suggest Courses 2 and 3 of the Deep Learning Specialization on Coursera to learn more about regularization.<\/p>\n<p>You also have to learn when to use which technique, and when and how to combine different techniques, in order to produce really fruitful results.<\/p>\n<p>Hopefully, now you have an idea of how to implement different regularization techniques in TensorFlow 
2.<\/p>\n<p>\u00a0<\/p>\n<p><b>Related:<\/b><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2020\/08\/tensorflow-model-regularization-techniques.html<\/p>\n","protected":false},"author":0,"featured_media":749,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/748"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=748"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/748\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/749"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=748"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=748"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=748"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}