{"id":4217,"date":"2020-10-15T03:48:22","date_gmt":"2020-10-15T03:48:22","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/10\/15\/difference-between-r-squared-and-adjusted-r-squared\/"},"modified":"2020-10-15T03:48:22","modified_gmt":"2020-10-15T03:48:22","slug":"difference-between-r-squared-and-adjusted-r-squared","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/10\/15\/difference-between-r-squared-and-adjusted-r-squared\/","title":{"rendered":"Difference Between R-Squared and Adjusted R-Squared"},"content":{"rendered":"<div id=\"tve_editor\" data-post-id=\"6475\">\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17529df8710\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/1-Difference-between-R-Squared-and-Adjusted-R-Squared.png?resize=626%2C315&amp;ssl=1\" class=\"tve_image wp-image-6479\" alt=\"Difference between R-Squared and Adjusted R-Squared\" data-id=\"6479\" width=\"626\" data-init-width=\"1024\" height=\"315\" data-init-height=\"516\" title=\"Difference between R-Squared and Adjusted R-Squared\" loading=\"lazy\" data-width=\"626\" data-height=\"315\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-6479\" alt=\"Difference between R-Squared and Adjusted R-Squared\" data-id=\"6479\" width=\"626\" data-init-width=\"1024\" height=\"315\" data-init-height=\"516\" title=\"Difference between R-Squared and Adjusted R-Squared\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/1-Difference-between-R-Squared-and-Adjusted-R-Squared.png?resize=626%2C315&amp;ssl=1\" data-width=\"626\" data-height=\"315\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\" data-css=\"tve-u-17529df871a\">\n<p dir=\"ltr\">While building regression 
algorithms, a common question is <strong>how to evaluate regression models<\/strong>. Even though various statistics exist for quantifying a regression model\u2019s performance, the most straightforward ones are R-Squared and Adjusted R-Squared.<\/p>\n<p dir=\"ltr\">People tend to use R-Squared, but the catch is that R-Squared alone is not a good measure for evaluating regression models. This is where the hero \ud83d\ude42, Adjusted R-Squared, comes in.<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\" data-css=\"tve-u-17529df875d\">\n<p dir=\"ltr\">Even in <a href=\"https:\/\/dataaspirant.com\/recommends\/ds-courses\/educative-machine-learning-interveiw\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">data science interviews<\/a>, a frequently asked question is<\/p>\n<blockquote class=\"\"><p>Could you please explain the key difference between <strong>R-Squared and Adjusted R-Squared<\/strong>?<\/p><\/blockquote>\n<p dir=\"ltr\">Do you know the answer to that? <img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/s.w.org\/images\/core\/emoji\/13.0.0\/svg\/263a.svg\" role=\"img\" class=\"emoji\" alt=\"\u263a\"><\/p>\n<p><img role=\"img\" class=\"emoji\" alt=\"\u263a\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/13.0.0\/svg\/263a.svg\"><br 
\/>\n\u00a0\u00a0<\/p>\n<p dir=\"ltr\">You are crystal clear about R-Squared but forgot about Adjusted R-Squared, right? Don\u2019t worry, these concepts are a bit confusing. All we need is a refresh on the concepts, at least before you <a href=\"https:\/\/dataaspirant.com\/how-to-get-first-job-data-scientist\/\" target=\"_blank\" class=\"tve-froala fr-basic\" rel=\"noopener noreferrer\">start looking for a new data scientist job<\/a>.<\/p>\n<p dir=\"ltr\">This article is an ideal place for this.<\/p>\n<p dir=\"ltr\">We hope you are aware of R-Squared. If you are not, stay tuned till the end of this article. You will learn all the topics along with the key differences between R-Squared and Adjusted R-Squared.<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\" data-css=\"tve-u-17529df8760\">\n<p dir=\"ltr\">Let\u2019s start by understanding the key concepts in regression. These concepts are not <a href=\"https:\/\/dataaspirant.com\/linear-regression\/\" target=\"_blank\" rel=\"noopener noreferrer\">about the regression algorithm<\/a> itself. They are the basic building blocks that help in understanding the key difference between R-Squared and Adjusted R-Squared at a much deeper level.<\/p>\n<p dir=\"ltr\">Why wait, let\u2019s start!<\/p>\n<h2 id=\"t-1602725251932\" class=\"\">Basic regression concepts<\/h2>\n<p dir=\"ltr\">Just like <a href=\"https:\/\/dataaspirant.com\/classification-clustering-alogrithms\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\">machine learning classification algorithms<\/a>, regression models have various evaluation methods. 
These evaluation methods are completely different from the <a href=\"https:\/\/dataaspirant.com\/six-popular-classification-evaluation-metrics-in-machine-learning\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\">classification evaluation methods<\/a>.<\/p>\n<p dir=\"ltr\">The aim of any regression model evaluation method is to show how the <strong>residuals are distributed<\/strong>. The way the residuals are used in the formulas changes from one evaluation method to another.<\/p>\n<p dir=\"ltr\">To understand R-Squared and Adjusted R-Squared we need to know the basic concepts below. In other words, we need answers to the following questions.<\/p>\n<ul class=\"\">\n<li>What is Error\/Residuals?<\/li>\n<li>What is the residual sum of squares?<\/li>\n<li>What is the total sum of squares?<\/li>\n<\/ul>\n<p dir=\"ltr\">Let\u2019s start the discussion with residuals.<\/p>\n<h3 id=\"t-1602725251933\" class=\"\">What is Error\/Residuals?<\/h3>\n<p dir=\"ltr\">Suppose we have two line equations. If someone asks, <\/p>\n<blockquote class=\"\"><p>Are the two equations<strong> different or not<\/strong>?\u00a0<\/p><\/blockquote>\n<p dir=\"ltr\">How do we answer that? <\/p>\n<p dir=\"ltr\">The simplest way is to plot the two equations and see whether the corresponding data points on the two lines deviate from each other.<\/p>\n<p dir=\"ltr\">Now let\u2019s come to the regression model. When we <a href=\"https:\/\/dataaspirant.com\/simple-linear-regression-python-without-any-machine-learning-libraries\/\" target=\"_blank\" rel=\"noopener noreferrer\">build regression models<\/a>, we will have two lines.<\/p>\n<ol class=\"\">\n<li>Line plotted using the actual data<\/li>\n<li>Line plotted using the forecasted data<\/li>\n<\/ol>\n<p dir=\"ltr\">To add more context to the discussion, let\u2019s say we are forecasting <strong>sales<\/strong> for a product using the historical sales data. 
In our case one line is the actual sales graph and the other is the forecasted sales graph. <\/p>\n<p dir=\"ltr\">The difference between an individual actual sales value and the corresponding forecasted sales value is called the <strong>residual<\/strong> or error.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17529e43516\">\n<span class=\"tve_image_frame\"><img class=\"tve_image wp-image-6486\" alt=\"Residual Graph\" data-id=\"6486\" width=\"626\" data-init-width=\"1024\" height=\"397\" data-init-height=\"649\" title=\"Residual Graph\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/2-Residual-Graph.png?resize=626%2C397&amp;ssl=1\" data-width=\"626\" data-height=\"397\" data-recalc-dims=\"1\"><\/span>\n<p class=\"thrv-inline-text wp-caption-text\">Residual Graph<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\">In the above graph, the line is the actual sales graph and the <strong>blue dots<\/strong> are the forecasted sales. The difference between the actual sales and the forecasted sales is the residual for each individual point. 
\u00a0In the image, these are represented by the <strong>dotted lines<\/strong>.\u00a0<\/p>\n<p dir=\"ltr\">The sum of all the residuals is called the <strong>total error<\/strong>.<\/p>\n<h3 id=\"t-1602725251934\" class=\"\">What is Residual Sum of Squares?<\/h3>\n<p dir=\"ltr\">To calculate the total error we just sum all the residuals. If we square each individual residual and then sum them, the result is called the <strong>residual sum of squares<\/strong>.<\/p>\n<p dir=\"ltr\">This value helps us understand how close the forecasted sales line is to the actual sales line. In regression terms, it tells us how accurate the fitted regression model is on the training dataset.<\/p>\n<p dir=\"ltr\">If you are not aware of the <a href=\"https:\/\/dataaspirant.com\/classification-and-prediction\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\">difference between classification algorithms<\/a> and regression algorithms, it&#8217;s worth spending time to understand that first.<\/p>\n<p dir=\"ltr\">Why can\u2019t we simply use the total error instead of the residual sum of squares?<\/p>\n<p dir=\"ltr\">Using the residual sum of squares has <strong>two<\/strong> main advantages.<\/p>\n<ol class=\"\">\n<li>Handles overestimation and underestimation.<\/li>\n<li>Helps in penalizing the high residuals.<\/li>\n<\/ol>\n<p dir=\"ltr\">If the above two advantages do not make sense yet, let us break them down.<\/p>\n<h4 class=\"\">Handles the overestimation and underestimation<\/h4>\n<p dir=\"ltr\">Suppose the actual sale value is 30 and the forecasted value is 18. As the residual formula says,\u00a0<\/p>\n<p dir=\"ltr\">The difference between the actual and forecasted value is the residual.\u00a0<\/p>\n<p dir=\"ltr\">The residual value is 30 \u2013 18 = <strong>12<\/strong>.\u00a0<\/p>\n<p dir=\"ltr\">Suppose the actual sale value is 8 and the forecasted value is 20. 
In this case the residual value is 8 \u2013 20 = <strong>-12<\/strong>.<\/p>\n<p dir=\"ltr\">The first example is a case of <strong>underestimation<\/strong> and the second example is a case of <strong>overestimation<\/strong>. If we sum up these two residuals, the result will be 0.<\/p>\n<p dir=\"ltr\">Does it mean our <strong>actual and forecasted<\/strong> values are the same?<\/p>\n<p dir=\"ltr\"><strong>No, right?<\/strong>\u00a0<\/p>\n<p dir=\"ltr\">To overcome this we use the sum of the squared residuals rather than just their sum.<\/p>\n<p dir=\"ltr\">If we apply the squared sum to these two examples, the result is completely different.<\/p>\n<p dir=\"ltr\">The first squared residual is <strong>144<\/strong> and the second squared residual is also 144. So the residual sum of squares is <strong>288<\/strong>.<\/p>\n<h4 class=\"\">Helps in penalizing the high residuals<\/h4>\n<p dir=\"ltr\">Now let\u2019s understand the penalization part.<\/p>\n<p dir=\"ltr\">In the <a href=\"https:\/\/dataaspirant.com\/ensemble-methods-bagging-vs-boosting-difference\/\" target=\"_blank\" rel=\"noopener noreferrer\">bagging Vs boosting ensemble method<\/a> article we explained how boosting gives misclassified samples a higher weight than correctly classified samples.<\/p>\n<p dir=\"ltr\">In regression, a simple way to get a similar effect is to square the error term.\u00a0<\/p>\n<p dir=\"ltr\">Let\u2019s consider the two pairs of actual and forecasted sales below.<\/p>\n<ul class=\"\">\n<li dir=\"ltr\">\n<strong>Data point 01:<\/strong>\n<ul>\n<li>Actual value: 45<\/li>\n<li>Forecasted value: 45.6<\/li>\n<li>Error: -0.6<\/li>\n<li>Squared Error: 0.36<\/li>\n<\/ul>\n<\/li>\n<li dir=\"ltr\">\n<strong>Data point 02:<\/strong>\n<ul>\n<li>Actual value: 45<\/li>\n<li>Forecasted value: 35<\/li>\n<li>Error: 10<\/li>\n<li>Squared Error: 100<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p dir=\"ltr\">Look at the above results. 
When the error is small (below 1 in absolute value), squaring it makes the error even <strong>smaller<\/strong>. Whereas when the error is of considerable size, squaring the error <strong>magnifies<\/strong> it.<\/p>\n<p dir=\"ltr\">This is an ideal way to see where our regression model is failing. It helps in focusing the optimization on the data points with those magnified errors.<\/p>\n<p dir=\"ltr\">Mathematically, below is the formula for the residual sum of squares.<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Cleft(%7By%7D_%7Bi%7D-%5Cwidehat%7By%7D%5Cright)%5E%7B2%7D#0\" class=\"tve-froala hasimg\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/R9WBKQ4kDBoX9fLGikTrB5VkhJAxsL9BFnIIJ1zGCjMxsO8rIomQiGPfqWsiQJu0cvH6xFSrbM6tHTrfCfrXacgIv_k2zB3toS84BovM4BRK1-ZZANZFWqBgTVfNtmUbGnEOOygT\" width=\"84\" height=\"44\"><\/a><\/p>\n<p dir=\"ltr\"><strong>Where:<\/strong>\u00a0<\/p>\n<ul class=\"\">\n<li>y<sub>i<\/sub> is the actual value of the i-th data point<\/li>\n<li>\u0177 is the forecasted value for that data point<\/li>\n<li>n is the number of data points<\/li>\n<\/ul>\n<p dir=\"ltr\">In an upcoming section of this article we will use a residual sum of squares function to calculate the RSS value on dummy data.<\/p>\n<h3 id=\"t-1602725251935\" class=\"\">What is the Total Sum of Squares?<\/h3>\n<p dir=\"ltr\">Now let\u2019s have a look at the total sum of squares. In the earlier discussion we explained the residual sum of squares. That value says how closely the prediction line, or model, is in line with the actual sales data points.<\/p>\n<p dir=\"ltr\">In other words, the residual sum of squares explains <strong>how the forecasted sales values are deviating from the actual sales values<\/strong>. 
This is more like a statistic on external values.\u00a0<\/p>\n<p dir=\"ltr\">How about a statistic on the internal data points, in our case the actual sales data points? We can check how the <strong>sales are deviating from the average sales<\/strong>. This concept is known as the total sum of squares.<\/p>\n<p dir=\"ltr\">In the residual sum of squares we subtract the forecasted sales value from the actual sales value. Whereas in the total sum of squares we subtract the <strong>average sales<\/strong>, or the mean sales value, from the actual sales value.\u00a0<\/p>\n<p dir=\"ltr\">Below is the formula for the total sum of squares.<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Cleft(y_%7Bi%7D-%5Cbar%7By%7D%5Cright)%5E%7B2%7D#0\" class=\"hasimg\"><img loading=\"lazy\" src=\"https:\/\/lh4.googleusercontent.com\/4G2qMmvypZnGFGh11AKKEhIBUIcT0bQ_6jcsKPtHvNTESg8dVJgQ19TKSQkScQNt17zxIHlCedxDkoJGLCgu6A_m_pPXv8eavlb5fW2R_OUioaJMF5DpN0vs8R9RbF6w33UM1_mW\" width=\"84\" height=\"44\"><\/a><\/p>\n<p dir=\"ltr\"><strong>Where:<\/strong><\/p>\n<ul class=\"\">\n<li>y<sub>i<\/sub> is the actual value of the i-th data point<\/li>\n<li>y\u0304 is the mean of all the actual values<\/li>\n<li>n is the number of data points<\/li>\n<\/ul>\n<p dir=\"ltr\">If we hold for a second and think about this: unlike the residual sum of squares, there is no separate value to subtract for each actual sales value, because the mean of the actual values is the same for <strong>all the sales<\/strong> data points.<\/p>\n<p dir=\"ltr\">So, to calculate the total sum of squares, all we need to do is take each actual sales value and subtract the average sales value from it. Take the square of that value and sum all those values. 
This gives us the total sum of squares.<\/p>\n<p dir=\"ltr\">I hope the above explanation is clear. If it is not, we can have a look at the sales data below. We will be calculating the residual sum of squares and the total sum of squares.<\/p>\n<h3 id=\"t-1602725251936\" class=\"\">Residual Sum of Square and Total Sum of Square Example<\/h3>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17529e72ea7\">\n<span class=\"tve_image_frame\"><img class=\"tve_image wp-image-6491\" alt=\"Calculating Residual Sum of Squares\" data-id=\"6491\" width=\"626\" data-init-width=\"1024\" height=\"359\" data-init-height=\"588\" title=\"Calculating Residual Sum of Squares\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/3-Calculating-Residual-Sum-of-Squares.png?resize=626%2C359&amp;ssl=1\" data-width=\"626\" data-height=\"359\" data-recalc-dims=\"1\"><\/span>\n<p class=\"thrv-inline-text wp-caption-text\">Calculating Residual Sum of Squares<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">Now let\u2019s understand how we calculate the residual sum of squares and the total sum of squares for this data.<\/p>\n<p dir=\"ltr\">In the above dataset, we have the actual sales and forecasted sales values. Using these we calculated the residuals, each of which is just the difference between the actual sales and forecasted sales. 
Then we square each residual.<\/p>\n<p dir=\"ltr\">At the end we just sum all the squared residuals. This gives us the residual sum of squares.\u00a0<\/p>\n<p dir=\"ltr\">In the same way let\u2019s compute the total sum of squares.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17529e81f01\">\n<span class=\"tve_image_frame\"><img class=\"tve_image wp-image-6494\" alt=\"Calculating Total Sum Of Squares\" data-id=\"6494\" width=\"626\" data-init-width=\"1024\" height=\"290\" data-init-height=\"474\" title=\"Calculating Total Sum Of Squares\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/4-Calculating-Total-Sum-Of-Squares.png?resize=626%2C290&amp;ssl=1\" data-width=\"626\" data-height=\"290\" data-recalc-dims=\"1\"><\/span>\n<p class=\"thrv-inline-text wp-caption-text\">Calculating Total Sum Of Squares<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">In the above dataset we have the actual sales data points. Using the actual sales values we computed the mean of sales, which is just the average of all the sales. Then for each sales value we take the difference from the mean sales value. Next we square the result. 
\u00a0<\/p>\n<p dir=\"ltr\">The sum of all these values is the total sum of squares.<\/p>\n<h2 id=\"t-1602725251937\" class=\"\">R-Squared Explanation<\/h2>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17529e91408\">\n<span class=\"tve_image_frame\"><img class=\"tve_image wp-image-6497\" alt=\"What is R-Squared ?\" data-id=\"6497\" width=\"626\" data-init-width=\"1024\" height=\"397\" data-init-height=\"649\" title=\"What is R-Squared ?\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/5-What-is-R-Squared-.png?resize=626%2C397&amp;ssl=1\" data-width=\"626\" data-height=\"397\" data-recalc-dims=\"1\"><\/span>\n<p class=\"thrv-inline-text wp-caption-text\">What is R-Squared ?<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\">By now we are ready to understand R-Squared. We will use both the residual sum of squares and the total sum of squares to compute the R-Squared value.\u00a0<\/p>\n<p dir=\"ltr\">This will be much clearer in the <strong>R-squared formula<\/strong> section.<\/p>\n<p dir=\"ltr\">The calculated R-Squared explains how well the <a href=\"https:\/\/dataaspirant.com\/linear-regression-implementation-in-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">regression model fit for the actual data points<\/a> is. Some literature says the R-squared value ranges from 0 to 1. 
Some literature reports the value as a percentage, ranging from 0 to 100. Whatever the range, the <strong>max value <\/strong>says the regression model fits very close to the actual values.<\/p>\n<p dir=\"ltr\">R-squared is treated as a measure of how much of the <strong>variance is explained by the model<\/strong>. For the ideal regression model the R-Squared value should be near 1.<\/p>\n<p dir=\"ltr\">Now let\u2019s look at the R-Squared formula and see how it calculates the value for any given actual and forecasted values.<\/p>\n<h3 id=\"t-1602725251938\" class=\"\">R-Squared Formula<\/h3>\n<p dir=\"ltr\">Below is the actual formula for calculating the R-Squared value.<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Ctext%20R%5E%7B2%7D%3D%5Cfrac%7BTSS-RSS%7D%7BTSS%7D#0\" class=\"hasimg tve-froala\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/9kfTPOQa9qIKOblnXu1NsI5AbfF-tQ6TRVIcAsCMLw-X3RpDJ4dXIuNWo8PUJ37t1ke5khT5PmMofcU4hpCemgcgriB2x55vobFg7jUd__i5APik1Q8-hTfb0d-XW4DwEuYtH4ol\" width=\"128\" height=\"33\"><\/a><\/p>\n<p dir=\"ltr\">We can simplify the above formula further.<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Ctext%20R%5E%7B2%7D%3D1-%5Cfrac%7BRSS%7D%7BTSS%7D#0\" class=\"hasimg\"><img loading=\"lazy\" 
src=\"https:\/\/lh6.googleusercontent.com\/ND-krBspLLcRTr7_hBlWVO3M2zYaABuux68YmZrmgW__mvS5w6UCZYMyGELH6BwoNcGJdIsYlbkKIumLg8BZ7jQQqu2IJqB2XAy6R10JvBb6JYvqWltus0ACXZ9XPdWHYFM8xjGX\" width=\"103\" height=\"33\"><\/a><\/p>\n<p dir=\"ltr\">Where:<\/p>\n<ul class=\"\">\n<li>\n<strong>RSS:<\/strong> Residual Sum of Squares<\/li>\n<li>\n<strong>TSS:<\/strong> Total Sum of Squares<\/li>\n<\/ul>\n<p dir=\"ltr\">The above is the simplified version for calculating the R-squared value. It uses both the residual sum of squares and the total sum of squares.<\/p>\n<p dir=\"ltr\">The formula is easy to remember. <\/p>\n<p dir=\"ltr\">All we are doing is taking the <strong>fraction<\/strong> of RSS over TSS and then subtracting that value from 1. For the ideal model the RSS value will be <strong>zero<\/strong>, so the R^2 value will be 1. In other words, for a regression model to be considered good, it should get an R-squared value <strong>near one<\/strong>.<\/p>\n<h3 id=\"t-1602725251939\" class=\"\">Calculating R-Squared In Python<\/h3>\n<p dir=\"ltr\">We are going to use the data below for all the calculations in this article.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17529ea883e\">\n<span class=\"tve_image_frame\"><img class=\"tve_image wp-image-6500\" alt=\"Sales data for calculating both r-squared and adjusted r-squared\" 
data-id=\"6500\" width=\"626\" data-init-width=\"1024\" height=\"216\" data-init-height=\"353\" title=\"Sales data for calculating both r-squared and adjusted r-squared\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/7-sales-data-for-calculating-both-r-squared-and-adjusted-r-squared.png?resize=626%2C216&amp;ssl=1\" data-width=\"626\" data-height=\"216\" data-recalc-dims=\"1\"><\/span>\n<p class=\"thrv-inline-text wp-caption-text\">Sales data for calculating both r-squared and adjusted r-squared<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p>Let\u2019s see how we can calculate the R-squared value using Python.<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">The approach is to create one function for the residual sum of squares and one for the total sum of squares, and then use those functions to calculate the R-squared value as 1 \u2013 RSS \/ TSS.<\/p>\n<p dir=\"ltr\">To cross-check the implementation, we ran it on the sales data shown before and got the same results. The residual sum of squares is <strong>189<\/strong> and the total sum of squares is <strong>1704.4<\/strong>.<\/p>\n<p dir=\"ltr\">For this data, we get an R-squared of 1 \u2013 189 \/ 1704.4, which is about <strong>0.89<\/strong>.<\/p>\n<h3 id=\"t-1602725251940\" class=\"\">Limitation of R-Squared<\/h3>\n<p dir=\"ltr\">If you look closely at the R-Squared formula, it does not take the <strong>number of features<\/strong> used into account. There is <strong>no component<\/strong> that changes with the number of features used in the regression model. 
The R-squared value will stay the same or go higher whenever we include more features in the regression model (at least for a least-squares fit evaluated on its own training data).\u00a0<\/p>\n<p dir=\"ltr\">Compare this with classification evaluation metrics: for classification models we can&#8217;t depend completely on the <a href=\"https:\/\/dataaspirant.com\/confusion-matrix-sklearn-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">confusion matrix<\/a> alone. The same applies here, and there is a key reason why we should not consider just the R-squared for regression models.<\/p>\n<p dir=\"ltr\">In the above graph we show how the sales growth is impacted by the advertisement spend. In this case we are considering only the advertisement spend as a feature for forecasting the sales growth.\u00a0<\/p>\n<p dir=\"ltr\">However, if we include multiple features, such as price_reduction, sales_season, etc., then the regression model\u2019s R-squared value will be the same as before (with only advertisement spend) or higher, even though we cannot be sure the newly added features actually help in forecasting the sales.<\/p>\n<p dir=\"ltr\">The above explanation may not be fully clear yet. 
Don&#8217;t worry: while explaining the key difference between R-squared and adjusted R-squared, we are going to learn this with a <strong>sales growth case<\/strong> study.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17529edb7d8\">\n<span class=\"tve_image_frame\"><img class=\"tve_image wp-image-6504\" alt=\"R-Squared Performance While Increasing the Features\" data-id=\"6504\" width=\"626\" data-init-width=\"1024\" height=\"339\" data-init-height=\"555\" title=\"R-Squared Performance While Increasing the Features\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/6-R-Squared-Performance-While-Increasing-the-Features.png?resize=626%2C339&amp;ssl=1\" data-width=\"626\" data-height=\"339\" data-recalc-dims=\"1\"><\/span>\n<p class=\"thrv-inline-text wp-caption-text\">R-Squared Performance While Increasing Features<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">The above image shows how the R-Squared value behaves as we increase the number of features. 
Even though we are not sure whether the extra features help improve the model accuracy, the R-Squared value still increases as features are added.<\/p>\n<p dir=\"ltr\">The above result is just a <strong>manually created<\/strong> one, to show how the R-squared value changes as the number of features increases. We haven&#8217;t built any <a href=\"https:\/\/dataaspirant.com\/random-forest-classifier-python-scikit-learn\/\" target=\"_blank\" rel=\"noopener noreferrer\">fancy machine learning model<\/a> yet.<\/p>\n<p dir=\"ltr\">This limitation can be overcome with the Adjusted R-Squared value.<\/p>\n<p dir=\"ltr\">The key thing to note here is that when you have multiple features in the regression model, it is generally better to use the Adjusted R-Squared value rather than just the R-Squared value.<\/p>\n<h2 id=\"t-1602725251941\" class=\"\">Adjusted R-Squared Explanation<\/h2>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17529eeb9e4\">\n<span class=\"tve_image_frame\"><img class=\"tve_image wp-image-6507\" alt=\"What is adjusted r-squared\" data-id=\"6507\" width=\"626\" data-init-width=\"1024\" height=\"397\" data-init-height=\"649\" title=\"What is adjusted r-squared\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/8-What-is-adjusted-r-squared.png?resize=626%2C397&amp;ssl=1\" data-width=\"626\" data-height=\"397\" data-recalc-dims=\"1\"><\/span>\n<p 
class=\"thrv-inline-text wp-caption-text\">What is adjusted r-squared<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\">By now we are aware of the limitation of R-Squared; using the adjusted R-squared we can overcome it.<\/p>\n<p dir=\"ltr\">The adjusted R-Squared value tells us whether adding a new feature improves the performance of the model or not. It penalizes every added feature, so it rises only when the new feature improves the fit by more than would be expected by chance.<\/p>\n<h3 id=\"t-1602725251942\" class=\"\">Adjusted R-Squared Formula<\/h3>\n<p dir=\"ltr\"><a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=%5Ctext%20%7BAdjusted%20%7D%20R%5E%7B2%7D%3D1-%5Cfrac%7B%5Cleft(1-R%5E%7B2%7D%5Cright)(N-1)%7D%7BN-p-1%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh3.googleusercontent.com\/yDM9AYoupBqM3N7gBG1mzQ2i_SbQy-uQmBL3WgxzhJ_MwVNzMWUuYp1HCGTdMeYqPohWqwnFi72fOmQi632nCWaz2ToZL2CuBy5kLWW-tJ0-Oe1PTrTyH2H3B5DNnoUL8exH1Bev\" loading=\"lazy\" width=\"256\" height=\"39\"><img loading=\"lazy\" src=\"https:\/\/lh3.googleusercontent.com\/yDM9AYoupBqM3N7gBG1mzQ2i_SbQy-uQmBL3WgxzhJ_MwVNzMWUuYp1HCGTdMeYqPohWqwnFi72fOmQi632nCWaz2ToZL2CuBy5kLWW-tJ0-Oe1PTrTyH2H3B5DNnoUL8exH1Bev\" width=\"256\" height=\"39\"><\/a><\/p>\n<p dir=\"ltr\"><strong>Where:<\/strong><\/p>\n<ul class=\"\">\n<li>\n<a href=\"https:\/\/www.codecogs.com\/eqnedit.php?latex=R%5E%7B2%7D#0\" class=\"tve-froala hasimg\"><img src=\"https:\/\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif\" data-lazy-src=\"https:\/\/lh6.googleusercontent.com\/8lQe529DiFFlIkHjj7lbE5heoJxs9kdEj6UyaDKm7UyngT3rl_GGqNHbQq0DSH8Z1emE-_dvgGxULrsnrs0bxLA6hVZc9pR0TNS0qBm3cSvrnLcLp7asp4fDE6_2_hN_0SKKrwkg\" loading=\"lazy\" width=\"16\" height=\"13\"><img loading=\"lazy\" src=\"https:\/\/lh6.googleusercontent.com\/8lQe529DiFFlIkHjj7lbE5heoJxs9kdEj6UyaDKm7UyngT3rl_GGqNHbQq0DSH8Z1emE-_dvgGxULrsnrs0bxLA6hVZc9pR0TNS0qBm3cSvrnLcLp7asp4fDE6_2_hN_0SKKrwkg\" 
width=\"16\" height=\"13\"><\/a> is the R-squared value<\/li>\n<li>N is the number of samples<\/li>\n<li>p is the number of features used<\/li>\n<\/ul>\n<p dir=\"ltr\">In the sales data, we have <strong>3 features<\/strong> (email campaign spend, google adwords spend, and season) and 10 observations.\u00a0<\/p>\n<p dir=\"ltr\">For this sales data, <strong>p is 3<\/strong> if we use all 3 features to build a regression model, and <strong>N is 10<\/strong> because we have 10 observations. With these values the formula becomes 1 - (1 - R-squared) * 9 \/ 6, so an R-squared of 0.3, for example, would give an adjusted R-squared of 1 - 0.7 * 1.5 = -0.05.<\/p>\n<p dir=\"ltr\">In the next section, let\u2019s use this formula to calculate the adjusted R-squared value.<\/p>\n<h3 id=\"t-1602725251943\" class=\"\">Calculating Adjusted R-Squared in Python<\/h3>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">Here we reuse the functions we created previously and pass the calculated r-squared value to the adjusted r-squared function to get the adjusted r-squared value.<\/p>\n<h2 id=\"t-1602725251944\" class=\"\">Difference Between R-Squared and Adjusted R-Squared Methods<\/h2>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17529f1a1b7\">\n<span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/9-Difference-between-R-square-and-Adjusted-R-Squared.png?resize=626%2C0&amp;ssl=1\" class=\"tve_image wp-image-6512\" alt=\"Difference between R-square and Adjusted R-Squared\" data-id=\"6512\" width=\"626\" data-init-width=\"1024\" height=\"0\" data-init-height=\"611\" title=\"Difference between R-square and Adjusted R-Squared\" loading=\"lazy\" data-width=\"626\" data-height=\"0\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-6512\" alt=\"Difference between R-square and Adjusted R-Squared\" data-id=\"6512\" width=\"626\" data-init-width=\"1024\" height=\"0\" 
data-init-height=\"611\" title=\"Difference between R-square and Adjusted R-Squared\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/9-Difference-between-R-square-and-Adjusted-R-Squared.png?resize=626%2C0&amp;ssl=1\" data-width=\"626\" data-height=\"0\" data-recalc-dims=\"1\"><\/span><\/p>\n<p class=\"thrv-inline-text wp-caption-text\">Difference between R-square and Adjusted R-Squared<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\">We have seen how r-squared and adjusted r-squared are calculated individually. But we have not yet seen where r-squared fails and how adjusted r-squared captures that failure. To understand that, let\u2019s take the sales data.<\/p>\n<h3 id=\"t-1602725251945\" class=\"\">Advertisement vs Sales Growth Case Study<\/h3>\n<p dir=\"ltr\">To illustrate the limitations of r-squared, we consider the data below, which is the same sales data with the <strong>dummy_forecast_value<\/strong> removed. 
We will use different combinations of features to <a href=\"https:\/\/dataaspirant.com\/linear-regression-implementation-in-python\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\">build the regression models<\/a> and compare the behaviour of r-squared and adjusted r-squared.<\/p>\n<p dir=\"ltr\">You can download the dataset below from our <a href=\"https:\/\/github.com\/saimadhu-polamuri\/DataAspirant_codes\/tree\/master\/r-squared-vs-adjusted-r-squared\" target=\"_blank\" rel=\"noopener noreferrer\">GitHub account<\/a>.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17529f26a9e\">\n<span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/10-Sales-data-for-regression-model.png?resize=626%2C292&amp;ssl=1\" class=\"tve_image wp-image-6514\" alt=\"Sales data for regression model\" data-id=\"6514\" width=\"626\" data-init-width=\"1024\" height=\"292\" data-init-height=\"477\" title=\"Sales data for regression model\" loading=\"lazy\" data-width=\"626\" data-height=\"292\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-6514\" alt=\"Sales data for regression model\" data-id=\"6514\" width=\"626\" data-init-width=\"1024\" height=\"292\" data-init-height=\"477\" title=\"Sales data for regression model\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/10-Sales-data-for-regression-model.png?resize=626%2C292&amp;ssl=1\" data-width=\"626\" data-height=\"292\" data-recalc-dims=\"1\"><\/span><\/p>\n<p class=\"thrv-inline-text wp-caption-text\">Sales data for regression model<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">We have <strong>3 features<\/strong>:<\/p>\n<ol class=\"\">\n<li>email campaign spend<\/li>\n<li>google adwords 
spend<\/li>\n<li>season<\/li>\n<\/ol>\n<p dir=\"ltr\">The target is the <strong>sales<\/strong> value. We are going to build 3 models with the feature combinations below.<\/p>\n<ul class=\"\">\n<li class=\"dir=\">\n<strong>Model 01: <\/strong><\/p>\n<ul>\n<li>Features: email campaign spend, google adwords spend\u00a0<\/li>\n<li>Target: sales<\/li>\n<\/ul>\n<\/li>\n<li class=\"dir=\">\n<strong>Model 02: <\/strong><\/p>\n<ul>\n<li>Features: google adwords spend, season\u00a0<\/li>\n<li>Target: sales<\/li>\n<\/ul>\n<\/li>\n<li class=\"dir=\">\n<strong>Model 03: <\/strong><\/p>\n<ul>\n<li>Features: email campaign spend, google adwords spend, season\u00a0<\/li>\n<li>Target: sales<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3 id=\"t-1602725251946\" class=\"\">Calculate R-Squared and Adjusted R-Squared In Python<\/h3>\n<p dir=\"ltr\">We are going to implement 3 functions: model1, model2, and model3. For each model we will compute both the r-squared and adjusted r-squared values.<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p>We have placed the results of the 3 models in tabular form for better understanding.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17529f47d8a\">\n<span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/11-R-Squared-Vs-Adjusted-R-Squared-Comparison.png?resize=626%2C328&amp;ssl=1\" class=\"tve_image wp-image-6518\" alt=\"R-Squared Vs Adjusted R-Squared Comparison\" data-id=\"6518\" width=\"626\" data-init-width=\"1024\" height=\"328\" data-init-height=\"537\" title=\"R-Squared Vs Adjusted R-Squared Comparison\" loading=\"lazy\" data-width=\"626\" data-height=\"328\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-6518\" alt=\"R-Squared Vs Adjusted R-Squared Comparison\" data-id=\"6518\" width=\"626\" data-init-width=\"1024\" height=\"328\" 
data-init-height=\"537\" title=\"R-Squared Vs Adjusted R-Squared Comparison\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/10\/11-R-Squared-Vs-Adjusted-R-Squared-Comparison.png?resize=626%2C328&amp;ssl=1\" data-width=\"626\" data-height=\"328\" data-recalc-dims=\"1\"><\/span><\/p>\n<p class=\"thrv-inline-text wp-caption-text\">R-Squared Vs Adjusted R-Squared Comparison<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\">For model 01 we get an r-squared value of 0.3 and an adjusted r-squared value of 0.1, which means the model is <strong>not good enough<\/strong> for forecasting sales values.<\/p>\n<p dir=\"ltr\">As a next step we took a second feature set to build the regression model, but the model 02 results are not promising either. In fact, they are worse than the model 01 results.\u00a0<\/p>\n<p dir=\"ltr\">In the last iteration, we took all the features of model 01 and added the new feature from model 02.<\/p>\n<p dir=\"ltr\">We know that model 02 does not perform well, so we should expect a low r-squared and adjusted r-squared. But the model 03 <strong>r-squared<\/strong> is higher than the model 01 r-squared value.\u00a0<\/p>\n<p dir=\"ltr\">This is the limitation of r-squared. The adjusted r-squared value, by contrast, is much lower than the model 01 adjusted r-squared value, which is more reasonable. The other thing to note is that the r-squared value of a least-squares fit with an intercept, measured on the training data, ranges between 0 and 1, whereas the adjusted r-squared value can be negative.<\/p>\n<h4 class=\"\">Story in short:\u00a0<\/h4>\n<p dir=\"ltr\">Always <strong>consider<\/strong> the adjusted r-squared as the evaluation metric unless we build a model with a single feature. 
In that case the penalty is at its smallest, so r-squared and adjusted r-squared will be close, though not exactly the same unless the fit is perfect.<\/p>\n<h3 id=\"t-1602725251947\" class=\"\">Which method should we use?<\/h3>\n<p dir=\"ltr\">By now you know the answer to this question. If you don\u2019t, please read the article again. Just kidding. We should always consider the adjusted r-squared value as the evaluation metric for regression problems.<\/p>\n<h2 id=\"t-1602725251948\" class=\"\">Additional Internal Resources<\/h2>\n<p dir=\"ltr\">Below we have listed must-read related articles; if you have time, please go through them.\u00a0<\/p>\n<h2 id=\"t-1602725251949\" class=\"\">Conclusion<\/h2>\n<p dir=\"ltr\">In this article we learned about the residual sum of squares and total sum of squares calculations. We used them to calculate the r-squared and adjusted r-squared values. Below are the key points to keep in mind.<\/p>\n<ul class=\"\">\n<li>Always consider the <strong>adjusted r-squared value as the evaluation<\/strong> metric for regression problems, in preference to the r-squared value.<\/li>\n<li>The r-squared value ranges from <strong>0 to 1,<\/strong> whereas the adjusted r-squared value can be <strong>negative<\/strong> too.<\/li>\n<\/ul>\n<p>You can get the complete code of this article in the <a href=\"https:\/\/github.com\/saimadhu-polamuri\/DataAspirant_codes\/tree\/master\/r-squared-vs-adjusted-r-squared\" target=\"_blank\" rel=\"noopener noreferrer\">dataaspirant GitHub account<\/a>. 
Feel free to fork.<\/p>\n<\/div>\n<h4 class=\"\">Recommended Machine Learning Courses<\/h4>\n<div class=\"thrv_wrapper thrv-page-section thrv-lp-block\" data-inherit-lp-settings=\"1\" data-css=\"tve-u-17529df85e9\" data-keep-css_id=\"1\">\n<div class=\"tve-page-section-in tve_empty_dropzone  \" data-css=\"tve-u-17481b960b8\">\n<div class=\"thrv_wrapper thrv-columns dynamic-group-kbt3q0q7\" data-css=\"tve-u-17481b95e2b\">\n<div class=\"tcb-flex-row v-2 tcb--cols--3 tcb-medium-no-wrap tcb-mobile-wrap m-edit\" data-css=\"tve-u-17529df85ea\">\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbt3pyfd\" data-css=\"tve-u-17481b95e2d\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode thrv-content-box tve-elem-default-pad dynamic-group-kbt3pwhk\" data-css=\"tve-u-17529df8601\">\n<div class=\"tve-cb\">\n<div class=\"thrv_wrapper tve_image_caption dynamic-group-kbt3pu4z\" data-css=\"tve-u-17529df8604\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/deeplearning-course.jpg?resize=176%2C176&amp;ssl=1\" class=\"tve_image wp-image-5170\" alt=\"Deep Learning python\" data-id=\"5170\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"deeplearning-course\" loading=\"lazy\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-17529df8605\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-5170\" alt=\"Deep Learning python\" data-id=\"5170\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"deeplearning-course\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/deeplearning-course.jpg?resize=176%2C176&amp;ssl=1\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-17529df8605\" data-recalc-dims=\"1\"><br \/>\n<span 
class=\"tve-image-overlay\"><\/span><\/span><\/div>\n<h4 class=\"\" data-css=\"tve-u-17529df85ec\">Machine Learning A to Z Course<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbt3pyfd\" data-css=\"tve-u-17481b95e2d\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode thrv-content-box tve-elem-default-pad dynamic-group-kbt3pwhk\" data-css=\"tve-u-17529df8602\">\n<div class=\"tve-cb\">\n<div class=\"thrv_wrapper tve_image_caption dynamic-group-kbt3pu4z\" data-css=\"tve-u-17529df8611\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/machine_learning_interview_prep.png?resize=176%2C176&amp;ssl=1\" class=\"tve_image wp-image-4799\" alt=\"machine learning interview prep\" data-id=\"4799\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"machine_learning_interview_prep\" loading=\"lazy\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-17529df8612\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-4799\" alt=\"machine learning interview prep\" data-id=\"4799\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"machine_learning_interview_prep\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/machine_learning_interview_prep.png?resize=176%2C176&amp;ssl=1\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-17529df8612\" data-recalc-dims=\"1\"><br \/>\n<span class=\"tve-image-overlay\"><\/span><\/span><\/div>\n<h4 class=\"\" data-css=\"tve-u-17529df85f4\">Python Data Science Specialization Course<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbt3pyfd\" data-css=\"tve-u-17481b95e2d\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode 
thrv-content-box tve-elem-default-pad dynamic-group-kbt3pwhk\" data-css=\"tve-u-17529df8603\">\n<div class=\"tve-cb\">\n<div class=\"thrv_wrapper tve_image_caption dynamic-group-kbt3pu4z\" data-css=\"tve-u-17529df8613\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/educative-machine-learning.png?resize=176%2C176&amp;ssl=1\" class=\"tve_image wp-image-4693\" alt=\"educative-machine-learning\" data-id=\"4693\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"educative-machine-learning\" loading=\"lazy\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-17529df8614\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-4693\" alt=\"educative-machine-learning\" data-id=\"4693\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"educative-machine-learning\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/educative-machine-learning.png?resize=176%2C176&amp;ssl=1\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-17529df8614\" data-recalc-dims=\"1\"><br \/>\n<span class=\"tve-image-overlay\"><\/span><\/span><\/div>\n<h4 class=\"\" data-css=\"tve-u-17529df85fb\">Complete Supervised Learning 
Algorithms<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/dataaspirant.com\/difference-between-r-squared-and-adjusted-r-squared\/<\/p>\n","protected":false},"author":0,"featured_media":4218,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/4217"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=4217"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/4217\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/4218"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=4217"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=4217"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=4217"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}