{"id":1087,"date":"2020-09-07T02:07:02","date_gmt":"2020-09-07T02:07:02","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/09\/07\/machine-learning-to-enhance-cost-effective-decision-making-by-housing-developers\/"},"modified":"2020-09-07T02:07:02","modified_gmt":"2020-09-07T02:07:02","slug":"machine-learning-to-enhance-cost-effective-decision-making-by-housing-developers","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/09\/07\/machine-learning-to-enhance-cost-effective-decision-making-by-housing-developers\/","title":{"rendered":"Machine Learning to Enhance Cost-Effective Decision Making by Housing Developers"},"content":{"rendered":"<div>\n<p>Ames is a city in Story County, Iowa, United States, located approximately 30 miles north of Des Moines in central Iowa. It is best known as the home of Iowa State University, with leading agriculture, design, engineering, and veterinary medicine colleges. This project aims to help house builders in Ames, Iowa identify house features that\u00a0<\/p>\n<h2>Data<\/h2>\n<p>The Ames housing data set consists of about\u00a02500\u00a0house sale records from\u00a02006 to 2010, with 81 columns including sale price as the target variable. 
We first analyzed the missing values and applied the necessary imputation.<\/p>\n<figure class=\"wp-block-gallery columns-1 is-cropped\">\n<ul class=\"blocks-gallery-grid\">\n<li class=\"blocks-gallery-item\">\n<figure><a class=\"attachment-link\" href=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66668\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/impute.png-091355-wGTSVMIb-300x165.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/impute.png-091355-wGTSVMIb-600x330.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/impute.png-091355-wGTSVMIb-768x422.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/impute.png-091355-wGTSVMIb-1024x563.png 1024w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/impute.png-091355-wGTSVMIb-1536x845.png 1536w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/impute.png-091355-wGTSVMIb-2048x1126.png 2048w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/impute.png-091355-wGTSVMIb.png 2293w\" loading=\"lazy\" width=\"2293\" height=\"1261\" alt=\"\" data-id=\"66668\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/impute.png-091355-wGTSVMIb.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66668\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/impute.png-091355-wGTSVMIb.png\" data-sizes=\"(max-width: 2293px) 100vw, 2293px\" class=\"wp-image-66668 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"2293\" height=\"1261\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/impute.png-091355-wGTSVMIb.png\" alt=\"\" 
data-id=\"66668\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/impute.png-091355-wGTSVMIb.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66668\" class=\"wp-image-66668\"><\/a><\/figure>\n<\/li>\n<\/ul>\n<\/figure>\n<p>These are the methods we used to impute missing values for each column:<\/p>\n<ul>\n<li>\n<span>For <\/span><i><span>Pool Quality Rating<\/span><\/i><span>, <\/span><i><span>Miscellaneous<\/span><\/i><span>, <\/span><i><span>Alley<\/span><\/i><span>, <\/span><i><span>Fence<\/span><\/i><span>, <\/span><span>and <\/span><i><span>Fireplace<\/span><\/i> <i><span>Quality<\/span><\/i><span>: fill missing values with <\/span><b>None<\/b><span>.<\/span>\n<\/li>\n<li>\n<span>For <\/span><i><span>Electrical<\/span><\/i><span>, <\/span><i><span>Zoning Classification<\/span><\/i><span>, <\/span><i><span>Utilities<\/span><\/i><span>, <\/span><i><span>Home Functionality<\/span><\/i><span>, <\/span><i><span>Type of Sale<\/span><\/i><span>, <\/span><i><span>Kitchen Quality<\/span><\/i><span>, and both<\/span> <span>the <\/span><i><span>Exterior Covering<\/span><\/i><span> features: fill with the <\/span><b>mode<\/b><span>, the most frequent value.<\/span>\n<\/li>\n<li>\n<span>For the <\/span><i><span>Lot<\/span><\/i> <i><span>Frontage<\/span><\/i><span> feature: fill with the <\/span><b>median<\/b><span> value of the house&#8217;s neighborhood.<\/span>\n<\/li>\n<\/ul>\n<h2>Exploratory Data Analysis<\/h2>\n<p>We created a new column, price per square foot, by dividing the sale price by the gross living area of the house. 
We used this variable to analyze the interaction with house features.<\/p>\n<h3>Overall and Kitchen Quality<\/h3>\n<p>How does the overall quality and kitchen quality influence price per square foot?<\/p>\n<figure class=\"wp-block-gallery columns-2 is-cropped\"><\/figure>\n<ul>\n<li>As the overall quality of the house increases,\u00a0 the price per square foot increases.<\/li>\n<li>As the kitchen quality increases, the price per square foot increases.<\/li>\n<\/ul>\n<h3>Bedroom and Bathroom Combinations<\/h3>\n<p>Which combinations of bedrooms and bathrooms have the highest price per square foot?<\/p>\n<figure class=\"wp-block-gallery columns-1 is-cropped\">\n<ul class=\"blocks-gallery-grid\">\n<li class=\"blocks-gallery-item\">\n<figure><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/bedbath.png-928150-IigBik6S-300x177.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/bedbath.png-928150-IigBik6S-600x355.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/bedbath.png-928150-IigBik6S-768x454.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/bedbath.png-928150-IigBik6S-1024x605.png 1024w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/bedbath.png-928150-IigBik6S-1536x908.png 1536w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/bedbath.png-928150-IigBik6S-2048x1210.png 2048w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/bedbath.png-928150-IigBik6S.png 2146w\" loading=\"lazy\" width=\"2146\" height=\"1268\" alt=\"\" data-id=\"66678\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/bedbath.png-928150-IigBik6S.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66678\" 
data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/bedbath.png-928150-IigBik6S.png\" data-sizes=\"(max-width: 2146px) 100vw, 2146px\" class=\"wp-image-66678 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"2146\" height=\"1268\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/bedbath.png-928150-IigBik6S.png\" alt=\"\" data-id=\"66678\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/bedbath.png-928150-IigBik6S.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66678\" class=\"wp-image-66678\"><\/figure>\n<\/li>\n<\/ul>\n<\/figure>\n<ul>\n<li>The price per square foot of a house was higher when the difference between bedrooms and bathrooms is 1 or less.<\/li>\n<\/ul>\n<p>\u00a0<\/p>\n<h3>Central AC and Fireplace<\/h3>\n<p>Does Central Air Conditioning or a Fireplace affect price per square foot?<\/p>\n<figure class=\"wp-block-gallery columns-1 is-cropped\">\n<ul class=\"blocks-gallery-grid\">\n<li class=\"blocks-gallery-item\">\n<figure><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/ac.png-874098-jqLJ8jip-300x163.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/ac.png-874098-jqLJ8jip-600x327.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/ac.png-874098-jqLJ8jip-768x418.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/ac.png-874098-jqLJ8jip-1024x557.png 1024w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/ac.png-874098-jqLJ8jip-1536x836.png 1536w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/ac.png-874098-jqLJ8jip-2048x1115.png 2048w, 
https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/ac.png-874098-jqLJ8jip.png 2260w\" loading=\"lazy\" width=\"2260\" height=\"1230\" alt=\"\" data-id=\"66681\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/ac.png-874098-jqLJ8jip.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66681\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/ac.png-874098-jqLJ8jip.png\" data-sizes=\"(max-width: 2260px) 100vw, 2260px\" class=\"wp-image-66681 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"2260\" height=\"1230\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/ac.png-874098-jqLJ8jip.png\" alt=\"\" data-id=\"66681\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/ac.png-874098-jqLJ8jip.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66681\" class=\"wp-image-66681\"><\/figure>\n<\/li>\n<\/ul>\n<\/figure>\n<ul>\n<li>Both features increase the price per square foot. 
But if you had to choose just one, Central Air tends to give a higher value, which is not surprising given that Ames&#8217;s average high temperature is in the 80s for the months of June, July, and August.<\/li>\n<\/ul>\n<h3>Age of the house when sold<\/h3>\n<p>How does the age of a house influence price per square foot?<\/p>\n<figure class=\"wp-block-gallery columns-1 is-cropped\">\n<ul class=\"blocks-gallery-grid\">\n<li class=\"blocks-gallery-item\">\n<figure><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/age.png-889273-GYImytlf-300x187.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/age.png-889273-GYImytlf-600x374.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/age.png-889273-GYImytlf-768x479.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/age.png-889273-GYImytlf-1024x639.png 1024w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/age.png-889273-GYImytlf-1536x958.png 1536w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/age.png-889273-GYImytlf.png 2020w\" loading=\"lazy\" width=\"2020\" height=\"1260\" alt=\"\" data-id=\"66684\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/age.png-889273-GYImytlf.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66684\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/age.png-889273-GYImytlf.png\" data-sizes=\"(max-width: 2020px) 100vw, 2020px\" class=\"wp-image-66684 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"2020\" height=\"1260\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/age.png-889273-GYImytlf.png\" alt=\"\" data-id=\"66684\" 
data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/age.png-889273-GYImytlf.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66684\" class=\"wp-image-66684\"><\/figure>\n<\/li>\n<\/ul>\n<\/figure>\n<ul>\n<li>As the age of the house increases, the price per square foot increases.<\/li>\n<\/ul>\n<h3>Garage Capacity and Driveway<\/h3>\n<p>Do garage capacity or a paved driveway result in a higher price per square foot?<\/p>\n<figure class=\"wp-block-gallery columns-1 is-cropped\">\n<ul class=\"blocks-gallery-grid\">\n<li class=\"blocks-gallery-item\">\n<figure><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/drive.png-648774-bFi22wla-300x162.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/drive.png-648774-bFi22wla-600x325.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/drive.png-648774-bFi22wla-768x416.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/drive.png-648774-bFi22wla-1024x554.png 1024w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/drive.png-648774-bFi22wla-1536x831.png 1536w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/drive.png-648774-bFi22wla-2048x1108.png 2048w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/drive.png-648774-bFi22wla.png 2227w\" loading=\"lazy\" width=\"2227\" height=\"1205\" alt=\"\" data-id=\"66687\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/drive.png-648774-bFi22wla.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66687\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/drive.png-648774-bFi22wla.png\" data-sizes=\"(max-width: 2227px) 100vw, 
2227px\" class=\"wp-image-66687 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"2227\" height=\"1205\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/drive.png-648774-bFi22wla.png\" alt=\"\" data-id=\"66687\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/drive.png-648774-bFi22wla.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66687\" class=\"wp-image-66687\"><\/figure>\n<\/li>\n<\/ul>\n<\/figure>\n<ul>\n<li>Houses with a paved driveway and a three-car garage had a higher price per square foot.<\/li>\n<\/ul>\n<p>\u00a0<\/p>\n<p>We will use machine learning models to predict the sale price and select important house features. We start by checking the response variable, sale price.<\/p>\n<p>The graph on the left shows that the sale price is right-skewed. The graph on the right shows the sale price after a log transformation, which brings its distribution closer to normal. 
We will use the log-transformed sale price in fitting all machine learning models.<\/p>\n<figure class=\"wp-block-gallery columns-1 is-cropped\">\n<ul class=\"blocks-gallery-grid\">\n<li class=\"blocks-gallery-item\">\n<figure><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/log2.png-715192-LQllsW1e-300x123.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/log2.png-715192-LQllsW1e-600x246.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/log2.png-715192-LQllsW1e-768x315.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/log2.png-715192-LQllsW1e-1024x420.png 1024w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/log2.png-715192-LQllsW1e-1536x630.png 1536w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/log2.png-715192-LQllsW1e-2048x840.png 2048w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/log2.png-715192-LQllsW1e.png 2130w\" loading=\"lazy\" width=\"2130\" height=\"874\" alt=\"\" data-id=\"66720\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/log2.png-715192-LQllsW1e.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66720\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/log2.png-715192-LQllsW1e.png\" data-sizes=\"(max-width: 2130px) 100vw, 2130px\" class=\"wp-image-66720 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"2130\" height=\"874\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/log2.png-715192-LQllsW1e.png\" alt=\"\" data-id=\"66720\" 
data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/log2.png-715192-LQllsW1e.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66720\" class=\"wp-image-66720\"><\/figure>\n<\/li>\n<\/ul>\n<\/figure>\n<h3>Stepwise regression using BIC<\/h3>\n<ul>\n<li>In stepwise regression, we begin with a model that has only an intercept term, then sequentially add the predictor that most improves the fit according to the Bayesian Information Criterion (BIC). Smaller BIC values indicate better-fitting models.<\/li>\n<\/ul>\n<p>We first check the assumptions of Multiple Linear Regression: normality, linearity, constant variance, independent errors, and lack of multicollinearity.<\/p>\n<p>Based on the plots, the assumptions for multiple linear regression were met.<\/p>\n<figure class=\"wp-block-gallery columns-2 is-cropped\"><\/figure>\n<p>The adjusted R-squared of the model is <strong>0.9263<\/strong>. The following features were selected based on their importance.<\/p>\n<p><span>OverallQual, GrLivArea, Neighborhood, BsmtFinSF1, <\/span><span>OverallCond, ageWhenSold, TotalBsmtSF, BldgType, GarageCars, <\/span><span>Fireplaces, SaleCondition, CentralAir, LotArea, Condition2, <\/span><span>KitchenQual, BsmtExposure, YearRemodAdd, ScreenPorch, <\/span><span>MSZoning, Functional, BsmtFullBath, EnclosedPorch, HeatingQC, <\/span><span>PavedDrive, bb_diff, BsmtFinSF2<\/span><\/p>\n<h3>Penalized Linear Regression<\/h3>\n<p>The original MLR model is unbiased, but it often has very high variance induced by multicollinearity among the features. To address this problem, we will use Lasso and Ridge penalized linear models. With a suitably chosen lambda, Lasso and Ridge balance the bias-variance trade-off. 
To find the most suitable lambda, we will use sklearn&#8217;s GridSearchCV.<\/p>\n<figure class=\"wp-block-gallery columns-1 is-cropped\">\n<ul class=\"blocks-gallery-grid\">\n<li class=\"blocks-gallery-item\">\n<figure><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/penal.png-398069-QSb5AuDQ-300x168.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/penal.png-398069-QSb5AuDQ-600x336.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/penal.png-398069-QSb5AuDQ-768x431.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/penal.png-398069-QSb5AuDQ-1024x574.png 1024w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/penal.png-398069-QSb5AuDQ-1536x861.png 1536w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/penal.png-398069-QSb5AuDQ-2048x1148.png 2048w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/penal.png-398069-QSb5AuDQ.png 2071w\" loading=\"lazy\" width=\"2071\" height=\"1161\" alt=\"\" data-id=\"66704\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/penal.png-398069-QSb5AuDQ.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66704\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/penal.png-398069-QSb5AuDQ.png\" data-sizes=\"(max-width: 2071px) 100vw, 2071px\" class=\"wp-image-66704 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"2071\" height=\"1161\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/penal.png-398069-QSb5AuDQ.png\" alt=\"\" data-id=\"66704\" 
data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/penal.png-398069-QSb5AuDQ.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66704\" class=\"wp-image-66704\"><\/figure>\n<\/li>\n<\/ul>\n<\/figure>\n<ul>\n<li>The mean cross-validated score is 0.905 for lasso and 0.904 for ridge. This is the best score obtained on the training data.<\/li>\n<li>The performance score is 0.926 for lasso and 0.924 for ridge. This is the score of the model on the test data.<\/li>\n<li>The test scores are slightly higher than the cross-validated scores, which suggests the models underfit a little, but this is not a big concern.<\/li>\n<\/ul>\n<h3>Random Forest Regressor<\/h3>\n<p>Random forest is an enhancement of bagging that builds a large collection of de-correlated trees and then averages them. It introduces randomness into the individual tree generation process: at each tree node, the loss function is minimized only over a randomly chosen subset of features. To tune the parameters, we will use GridSearchCV.<\/p>\n<figure class=\"wp-block-gallery columns-1 is-cropped\">\n<ul class=\"blocks-gallery-grid\">\n<li class=\"blocks-gallery-item\">\n<figure><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/random2.png-885941-wAruwz4k-300x135.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/random2.png-885941-wAruwz4k-600x270.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/random2.png-885941-wAruwz4k-768x346.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/random2.png-885941-wAruwz4k-1024x461.png 1024w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/random2.png-885941-wAruwz4k-1536x692.png 1536w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/random2.png-885941-wAruwz4k-2048x922.png 2048w, 
https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/random2.png-885941-wAruwz4k.png 2216w\" loading=\"lazy\" width=\"2216\" height=\"998\" alt=\"\" data-id=\"66717\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/random2.png-885941-wAruwz4k.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66717\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/random2.png-885941-wAruwz4k.png\" data-sizes=\"(max-width: 2216px) 100vw, 2216px\" class=\"wp-image-66717 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"2216\" height=\"998\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/random2.png-885941-wAruwz4k.png\" alt=\"\" data-id=\"66717\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/random2.png-885941-wAruwz4k.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66717\" class=\"wp-image-66717\"><\/figure>\n<\/li>\n<\/ul>\n<\/figure>\n<ul>\n<li>The mean cross-validation score is 0.892 and the performance score is 0.91. This is slightly lower than the linear models.<\/li>\n<li>With the random forest model, the most important feature is the overall quality of the house then the gross living area, and first-floor square footage.<\/li>\n<\/ul>\n<p>\u00a0<\/p>\n<h3>Gradient Boosting Regressor<\/h3>\n<p>Boosting is a very general sequential ensemble technique which<br \/>aggregates many weak learners to produce a strong learner. 
It differs from the parallel ensembling in that it produces a strong learner in a sequential way.\u00a0 Iteratively, the kth weak learner makes use of the previous k-1 weak learners\u2019 outcome to make its own educated guess.\u00a0 GridSearchCV will be used to tune the parameters.<\/p>\n<figure class=\"wp-block-gallery columns-1 is-cropped\">\n<ul class=\"blocks-gallery-grid\">\n<li class=\"blocks-gallery-item\">\n<figure><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/gradient2.png-837473-JaEEOduv-300x136.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/gradient2.png-837473-JaEEOduv-600x272.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/gradient2.png-837473-JaEEOduv-768x348.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/gradient2.png-837473-JaEEOduv-1024x465.png 1024w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/gradient2.png-837473-JaEEOduv-1536x697.png 1536w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/gradient2.png-837473-JaEEOduv-2048x929.png 2048w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/gradient2.png-837473-JaEEOduv.png 2228w\" loading=\"lazy\" width=\"2228\" height=\"1011\" alt=\"\" data-id=\"66718\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/gradient2.png-837473-JaEEOduv.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66718\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/gradient2.png-837473-JaEEOduv.png\" data-sizes=\"(max-width: 2228px) 100vw, 2228px\" class=\"wp-image-66718 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"2228\" height=\"1011\" 
src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/gradient2.png-837473-JaEEOduv.png\" alt=\"\" data-id=\"66718\" data-full-url=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/09\/randy-pantinople\/gradient2.png-837473-JaEEOduv.png\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=66718\" class=\"wp-image-66718\"><\/figure>\n<\/li>\n<\/ul>\n<\/figure>\n<ul>\n<li>The mean cross-validation score is 0.910 and the performance score is 0.929. This performance score is the highest among all models we tested.<\/li>\n<li>Overall quality is the most important feature. It is followed by gross living area and first-floor square footage.<\/li>\n<li>Most of the important features were similar to those of the random forest model.<\/li>\n<\/ul>\n<p>We believe that lasso regression is sufficient to model the data set. Although the gradient boosting regressor performs slightly better, the gain is not enough to justify the cost of running the model.<\/p>\n<h2>Recommendations:<\/h2>\n<ul>\n<li>\n<strong>Quality house<\/strong><br \/>A one-point increase in overall quality rating <strong>increases<\/strong> sale price by <strong>0.06%.<\/strong>\n<\/li>\n<li>\n<strong>Excellent kitchen condition<\/strong><br \/>A home with excellent kitchen quality sells for <strong>0.063%<\/strong> <strong>more<\/strong> than other homes.<\/li>\n<li>\n<strong>Choose central air over a fireplace<\/strong><br \/>Central air gives a <strong>0.056%<\/strong> <strong>higher<\/strong> sale price compared with other houses.<\/li>\n<li>\n<strong>Pave the driveway<\/strong><br \/>A home with a paved driveway sells for <strong>0.048% more<\/strong> than a home without one.<\/li>\n<li>\n<strong>Limit the difference between bedrooms and bathrooms to 1<\/strong><br \/>For every one-room increase in the difference between the number of bedrooms 
and the number of bathrooms, the average sale price <strong>drops<\/strong> <strong>0.01%<\/strong>.<\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/nycdatascience.com\/blog\/student-works\/machine-learning-to-enhance-cost-effective-decision-making-by-housing-developers\/<\/p>\n","protected":false},"author":0,"featured_media":1088,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1087"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=1087"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1087\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/1088"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=1087"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=1087"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=1087"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}