{"id":8338,"date":"2021-06-15T23:20:33","date_gmt":"2021-06-15T23:20:33","guid":{"rendered":"https:\/\/wealthrevelation.com\/data-science\/2021\/06\/15\/beginners-guide-to-debugging-tensorflow-models\/"},"modified":"2021-06-15T23:20:33","modified_gmt":"2021-06-15T23:20:33","slug":"beginners-guide-to-debugging-tensorflow-models","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2021\/06\/15\/beginners-guide-to-debugging-tensorflow-models\/","title":{"rendered":"Beginners Guide to Debugging TensorFlow Models"},"content":{"rendered":"<div id=\"post-\">\n<div class=\"author-link\"><b>By <a href=\"https:\/\/www.kdnuggets.com\/author\/ahmad-anis\" title=\"Posts by Ahmad Anis\" rel=\"author\">Ahmad Anis<\/a>, Machine learning and Data Science Student.<\/b><\/div>\n<p><!-- post_author Ahmad Anis -->  <\/p>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-full wp-image-128654\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/beginners-guide-debugging-tensorflow-models.jpg\" alt=\"\" width=\"90%\"><\/p>\n<p><em>Photo by\u00a0<a href=\"https:\/\/unsplash.com\/@fotograw?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener\">Dmitriy Demidov<\/a>\u00a0on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener\">Unsplash<\/a>.<\/em><\/p>\n<p>TensorFlow is one of the most popular deep learning frameworks, and it is easy to learn. This article covers the most common errors a beginner can face while learning TensorFlow, why they occur, and how to solve them. We will discuss the solutions and also what experts on StackOverflow say about them.<\/p>\n<p>\u00a0<\/p>\n<h3>Example 1: Wrong Input Shape for CNN Layer<\/h3>\n<p>\u00a0<\/p>\n<p>Suppose you are building a Convolutional Neural\u00a0Network. If you are familiar with the theory of CNNs, you know that a 2D CNN takes a complete image as its input.
And a complete image has 3 color channels: red, green, and blue. So the shape of a normal image is (height, width, color channels). But a grayscale image normally has shape (height, width), with the color channel excluded, as shown in the code.<\/p>\n<div>\n<pre>model = Sequential([\r\n    Conv2D(32, 5, input_shape=(28,28), activation='relu'),\r\n    Flatten(),\r\n    Dense(10, activation='softmax')\r\n])\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Now, if you train this model, you will get an error.<\/p>\n<div>\n<pre>ValueError: Input 0 of layer conv2d_1 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (None, 28, 28)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>This is because we are passing an input shape of (28,28), and TensorFlow adds 1 extra dimension for the batch size, so the error message says that it found\u00a0<em>ndim=3<\/em>, but the CNN expected\u00a0<em>min_ndim=4<\/em>: 3 for the image dimensions and 1 for the batch size.<\/p>\n<p>You can solve this error by changing the input shape of the first CNN layer to (28,28,1) and reshaping your inputs before passing them to the CNN.<\/p>\n<div>\n<pre>X_train = X_train.reshape(number_of_rows, 28, 28, 1)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>This will change your input from (number_of_rows, height, width) to (number_of_rows, height, width, color_channel), where color_channel is equal to 1, showing that it is a grayscale image. Now your CNN is ready to train. You can check this StackOverflow question for more\u00a0<a href=\"https:\/\/stackoverflow.com\/questions\/47665391\/keras-valueerror-input-0-is-incompatible-with-layer-conv2d-1-expected-ndim-4\" target=\"_blank\" rel=\"noopener\">details<\/a>.<\/p>\n<p>\u00a0<\/p>\n<h3>Example 2: Negative Dimension Size<\/h3>\n<p>\u00a0<\/p>\n<p>This is one of the most common errors new practitioners run into when playing with CNNs or other models that change the shape of the input after each layer. 
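<\/p>
<p>Before dissecting a failing model, it helps to know the arithmetic that determines how each layer changes the input size. The helper below is a hypothetical sketch (not part of TensorFlow's API) that computes a Conv2D layer's output height or width:<\/p>

```python
# Hypothetical helper: output height/width of a single Conv2D layer.
# Keras defaults assumed: stride 1 and padding='valid' (no zero padding).
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    # Standard convolution arithmetic: floor((n + 2p - k) / s) + 1
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(28, 5))  # 24: a 5x5 kernel shrinks 28 to 24
```

<p>With stride 1 and no padding, every 5x5 convolution shrinks each spatial dimension by 4, which is exactly the pattern in the model below.<\/p>
<p>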
The output shape of a CNN layer depends on several factors, such as the number of filters, the kernel size, the padding type, and the stride. Let's say you have a model:<\/p>\n<div>\n<pre>model = Sequential([\r\n    Conv2D(32, 5, input_shape=(28,28,1), activation='relu'),\r\n    Conv2D(32, 5, activation='relu'),\r\n    Conv2D(32, 5, activation='relu'),\r\n    Conv2D(32, 5, activation='relu'),\r\n    Conv2D(32, 5, activation='relu'),\r\n    Conv2D(32, 5, activation='relu'),\r\n])\r\n\r\nmodel.summary()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/731\/1*CR3EQuUUR4mC4JpPRo41Tg.png\" width=\"90%\"><\/p>\n<p>You can see that the input size is getting smaller and smaller, and if you add any more CNN layers, a dimension will become negative and raise a negative dimension error. So you need to understand how tuning different CNN parameters affects your output shape. The error message after adding another CNN layer is a long traceback, but the relevant part of it is:<\/p>\n<div>\n<pre>ValueError: Negative dimension size caused by subtracting 5 from 4 for '{{node conv2d_16\/Conv2D}} = Conv2D[T=DT_FLOAT, data_format=\"NHWC\", dilations=[1, 1, 1, 1], explicit_paddings=[], padding=\"VALID\", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](Placeholder, conv2d_16\/Conv2D\/ReadVariableOp)' with input shapes: [?,4,4,32], [5,5,32,32].\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>There are different solutions, such as changing the padding or stride, changing the number of layers, and tuning other hyperparameters while keeping each layer's output shape in mind. 
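<\/p>
<p>One sketch of a possible fix: with <em>padding='same'<\/em>, each convolution keeps the spatial size of its input, so stacking more layers no longer shrinks it toward zero (the layer count here is illustrative):<\/p>

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D

# padding='same' zero-pads the input so the output height/width match the
# input, which removes the shrinking that caused the negative dimension error.
model = Sequential([
    Conv2D(32, 5, input_shape=(28, 28, 1), padding='same', activation='relu'),
    Conv2D(32, 5, padding='same', activation='relu'),
    Conv2D(32, 5, padding='same', activation='relu'),
])
model.summary()  # every conv layer reports shape (None, 28, 28, 32)
```

<p>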
You can have a look at detailed discussions on StackOverflow\u00a0<a href=\"https:\/\/stackoverflow.com\/questions\/41651628\/negative-dimension-size-caused-by-subtracting-3-from-1-for-conv2d\" target=\"_blank\" rel=\"noopener\">here<\/a>\u00a0and\u00a0<a href=\"https:\/\/stackoverflow.com\/questions\/45645276\/negative-dimension-size-caused-by-subtracting-3-from-1-for-conv2d-2-convolution\/45647715#45647715\" target=\"_blank\" rel=\"noopener\">here<\/a>, where several good solutions, with reasons, are offered.<\/p>\n<p>\u00a0<\/p>\n<h3>Example 3: Wrong Output Shape<\/h3>\n<p>\u00a0<\/p>\n<p>Another common error beginners face is having the wrong number of nodes in the last layer. In the last layer of any neural network, the number of nodes must equal the number of classes you have or the number of outputs you want. For example, in a regression task, you normally have 1 node in the output layer because you need a single continuous value as output. In a classification task, you have <em>n<\/em> nodes in the output layer, where <em>n<\/em> is the total number of unique classes.<\/p>\n<p>Let's say you have 10 unique classes in your example, but you specify 9 in your output layer as follows:<\/p>\n<div>\n<pre>model = Sequential([\r\n    Conv2D(32, 5, input_shape=(28,28,1), activation='relu'),\r\n    MaxPool2D((2,2)),\r\n    Conv2D(32,3, activation='relu'),\r\n    Flatten(),\r\n    Dense(9, activation='softmax')\r\n])\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Now, when you train your model after compiling it, it will raise an error.<\/p>\n<div>\n<pre>model.compile('adam','categorical_crossentropy', ['acc'])\r\n\r\nmodel.fit(X_train, y_train, epochs=3)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Error Message:<\/p>\n<div>\n<pre>ValueError: Shapes (32, 10) and (32, 9) are incompatible\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>As the 
error message says, the shapes are not compatible: the labels provide 10 classes per sample, but the model outputs only 9 values. You can have a look at\u00a0<a href=\"https:\/\/stackoverflow.com\/questions\/61742556\/valueerror-shapes-none-1-and-none-2-are-incompatible\" target=\"_blank\" rel=\"noopener\">this<\/a>\u00a0thread on StackOverflow for more details regarding this error.<\/p>\n<p>\u00a0<\/p>\n<h3>Example 4: Unknown Loss Function<\/h3>\n<p>\u00a0<\/p>\n<p>This error, as the name shows, occurs when you use a loss function that does not exist in TensorFlow.<\/p>\n<p>Let's say you compile a model:<\/p>\n<div>\n<pre>model.compile('adam','sparse_categoricalcrossentropy', ['acc'])\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Here you have used the\u00a0<em>sparse_categoricalcrossentropy<\/em>\u00a0loss function, which does not exist because of a spelling mistake (the correct name is\u00a0<em>sparse_categorical_crossentropy<\/em>), and a lot of beginners make similar spelling mistakes. The tricky part is that you will not get the error message on compilation. Instead, you will get it when you fit the model.<\/p>\n<div>\n<pre>model.fit(X_train, y_train, epochs=1)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Now you will receive a long error message, which you can trace to find the useful information in it.<\/p>\n<div>\n<pre>ValueError Traceback (most recent call last)\r\n---&gt; 11 model.fit(X_train, y_train, epochs=1)\r\n#Long Error traceback\r\nValueError: Unknown loss function: sparse_categoricalcrossentropy\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<h3>Example 5: Shape Mismatch<\/h3>\n<p>\u00a0<\/p>\n<p>This is a very common error message. It appears when you use a function, a layer, or anything similar that expects a specific shape, but the shape you pass is different from the required one. 
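<\/p>
<p>(As an aside on the previous example: a misspelled loss name can be caught before training, because tf.keras.losses.get resolves the string identifier immediately. A sketch:)<\/p>

```python
import tensorflow as tf

# losses.get() turns a string identifier into a loss function, so a typo
# fails right here instead of several lines later inside model.fit().
loss = tf.keras.losses.get('sparse_categorical_crossentropy')  # resolves

try:
    tf.keras.losses.get('sparse_categoricalcrossentropy')  # missing underscore
except ValueError as err:
    print('caught the typo early:', err)
```

<p>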
All of these generate a related error message, namely\u00a0<em>ValueError: Shape mismatch<\/em>.<\/p>\n<p>Let's see an example.<\/p>\n<p>Suppose your output labels are in one-hot matrix format.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/0*VMj6xUYWqvFn45NJ\" width=\"90%\"><\/p>\n<p><em>Credits: Medium.<\/em><\/p>\n<p>For a simple classification task, these output labels require the\u00a0<em>categorical_crossentropy<\/em>\u00a0loss function; if you instead pass\u00a0<em>sparse_categorical_crossentropy<\/em>, which is used for label-encoded output labels, you will get a shape mismatch error.<\/p>\n<div>\n<pre>model = Sequential([\r\n    Conv2D(32, 5, input_shape=(28,28,1), activation='relu'),\r\n    MaxPool2D((2,2)),\r\n    Conv2D(32,3, activation='relu'),\r\n    Flatten(),\r\n    Dense(10, activation='softmax')\r\n])\r\nmodel.compile('adam','sparse_categorical_crossentropy', ['acc'])\r\nmodel.fit(X_train, y_train, epochs=1)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Since our\u00a0<em>y_train<\/em>\u00a0is in one-hot matrix format, this generates an error.<\/p>\n<div>\n<pre>ValueError: Shape mismatch: The shape of labels (received (320,)) should equal the shape of logits except for the last dimension (received (32, 10)).\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>This error means that the sparse categorical cross-entropy loss function expects the labels as a single vector of integers, while we are passing a one-hot matrix.<\/p>\n<p>\u00a0<\/p>\n<h3>Example 6: Wrong Loss Function<\/h3>\n<p>\u00a0<\/p>\n<p>This is not an error but a mistake: your model's performance does not improve, and it gives very bad results. While there can be many different reasons for it, a common one is using the wrong loss function. 
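<\/p>
<p>(Closing out the previous example: that mismatch has two fixes. Either keep the one-hot labels and switch the loss to <em>categorical_crossentropy<\/em>, or convert the labels to integers so that <em>sparse_categorical_crossentropy<\/em> works. A sketch of the conversion, with hypothetical labels:)<\/p>

```python
import numpy as np

# Hypothetical one-hot labels for 10 classes, shape (3, 10).
y_onehot = np.eye(10)[[3, 1, 4]]

# argmax along the class axis recovers integer labels, shape (3,),
# which is the format sparse_categorical_crossentropy expects.
y_labels = np.argmax(y_onehot, axis=1)
print(y_labels)  # [3 1 4]
```

<p>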
For example, in classification tasks, you are supposed to use cross-entropy or a related loss function; if you use a loss function that is not suitable for classification, your model will not improve.<\/p>\n<div>\n<pre>model = Sequential([\r\n    Conv2D(32, 5, input_shape=(28,28,1), activation='relu'),\r\n    MaxPool2D((2,2)),\r\n    Conv2D(32,3, activation='relu'),\r\n    Flatten(),\r\n    Dense(10)\r\n])\r\nmodel.compile('rmsprop','mae', ['accuracy'])\r\nmodel.fit(X_train, y_train, epochs=3)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Here we are using the mean absolute error loss function instead of a cross-entropy loss function, and as a result, our model is not learning.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*69l5zfe6jMOP5QVU2W1F-g.png\" width=\"90%\"><\/p>\n<p>You can see that the accuracy is stuck at 10% and the loss is not improving either. If you are stuck at a similar stage where your model is not improving, I suggest you go back through every step where you specified something and think about what could be stopping the model from training correctly, because you are not going to see any error message in this case. 
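<\/p>
<p>A sketch of the fix for the model above, assuming integer class labels: swap <em>mae<\/em> for a cross-entropy loss. Since the final Dense(10) layer has no softmax, its outputs are raw logits, so the loss is told to apply softmax internally:<\/p>

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

model = Sequential([
    Conv2D(32, 5, input_shape=(28, 28, 1), activation='relu'),
    MaxPool2D((2, 2)),
    Conv2D(32, 3, activation='relu'),
    Flatten(),
    Dense(10),  # raw logits, no activation
])

# from_logits=True makes the loss apply softmax to the logits itself.
model.compile('rmsprop',
              tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              ['accuracy'])
```

<p>With a suitable loss, the accuracy should start moving away from the 10% chance level within the first epochs.<\/p>
<p>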
You can also ask questions in any community, such as r\/learnmachinelearning on Reddit or StackOverflow.<\/p>\n<p>\u00a0<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2021\/06\/beginners-guide-debugging-tensorflow-models.html<\/p>\n","protected":false},"author":0,"featured_media":8339,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8338"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8338"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8338\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8339"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=8338"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8338"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=8338"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}