{"id":8009,"date":"2020-12-29T00:26:21","date_gmt":"2020-12-29T00:26:21","guid":{"rendered":"https:\/\/healinglifespan.com\/data-science\/2020\/12\/29\/popular-feature-selection-methods-in-machine-learning\/"},"modified":"2020-12-29T00:26:21","modified_gmt":"2020-12-29T00:26:21","slug":"popular-feature-selection-methods-in-machine-learning","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/12\/29\/popular-feature-selection-methods-in-machine-learning\/","title":{"rendered":"Popular Feature Selection Methods in Machine Learning"},"content":{"rendered":"<div id=\"tve_editor\" data-post-id=\"8185\">\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a39f2791\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/1-Feature-Selection-Method.png?resize=626%2C0&amp;ssl=1\" class=\"tve_image wp-image-8187\" alt=\"Feature Selection Method\" data-id=\"8187\" width=\"626\" data-init-width=\"750\" height=\"0\" data-init-height=\"450\" title=\"Feature Selection Method\" loading=\"lazy\" data-width=\"626\" data-height=\"0\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8187\" alt=\"Feature Selection Method\" data-id=\"8187\" width=\"626\" data-init-width=\"750\" height=\"0\" data-init-height=\"450\" title=\"Feature Selection Method\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/1-Feature-Selection-Method.png?resize=626%2C0&amp;ssl=1\" data-width=\"626\" data-height=\"0\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\" data-css=\"tve-u-176a39f2799\">\n<p dir=\"ltr\">Feature selection is the key influence factor for building accurate <a href=\"https:\/\/dataaspirant.com\/category\/machine-learning-2\/\" target=\"_blank\" 
class=\"tve-froala\" rel=\"noopener noreferrer\"><strong>machine learning models<\/strong><\/a>. For any given dataset, a machine learning model learns the <strong>mapping between<\/strong> the input features and the target variable.\u00a0<\/p>\n<p dir=\"ltr\">Then, for new data where the target is unknown, the model can predict the target variable.\u00a0<\/p>\n<p dir=\"ltr\">In machine learning, many factors affect the <a href=\"https:\/\/dataaspirant.com\/six-popular-classification-evaluation-metrics-in-machine-learning\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\"><strong>performance of a model<\/strong><\/a>, including the features it is trained on.\u00a0<\/p>\n<p dir=\"ltr\">Sometimes the features of a dataset, in their raw form, do not provide the best information for training and prediction.<\/p>\n<p dir=\"ltr\">Therefore, it is beneficial to discard the conflicting and unnecessary features from our dataset using <strong>feature selection methods<\/strong>, also called feature selection techniques.<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_tw_qs tve_clearfix\" data-url=\"https:\/\/twitter.com\/intent\/tweet\" data-via=\"\" data-use_custom_url=\"\" data-css=\"tve-u-176a39f27d6\">\n<div class=\"thrv_tw_qs_container\">\n<div class=\"thrv_tw_quote\">\n<p>Learn the popular feature selection methods to build accurate models. 
#machinelearning #datascience #python #featureselection\u00a0<\/p>\n<\/p><\/div>\n<p>\n\t\t\t<span><br \/>\n\t\t\t\t<i><\/i><br \/>\n\t\t\t\t<span class=\"thrv_tw_qs_button_text thrv-inline-text tve_editable\">Click to Tweet<\/span><br \/>\n\t\t\t<\/span>\n\t\t<\/p>\n<\/p><\/div>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\" data-css=\"tve-u-176a39f27d8\">\n<p dir=\"ltr\">In machine learning, we define a feature as:<\/p>\n<blockquote class=\"\"><p><strong>An individual measurable property or characteristic of a phenomenon under observation.<\/strong>\u00a0<\/p><\/blockquote>\n<p dir=\"ltr\">Each feature or column represents a measurable piece of data that can be used for analysis. Examples of feature variables are:\u00a0<\/p>\n<ul class=\"\">\n<li>Name,<\/li>\n<li>Age,\u00a0<\/li>\n<li>Gender,\u00a0<\/li>\n<li>Education qualification,\u00a0<\/li>\n<li>Salary, etc.<\/li>\n<\/ul>\n<p dir=\"ltr\">If you look at the above features from the point of view of a machine learning model, <strong>names<\/strong> won\u2019t add any significant information.\u00a0<\/p>\n<p dir=\"ltr\">We have various techniques to <a href=\"https:\/\/dataaspirant.com\/word-embedding-techniques-nlp\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>convert text data to numerical form<\/strong><\/a>. But in this case the <strong>name feature<\/strong> is not helpful.<\/p>\n<p dir=\"ltr\">We can remove such features manually, but the nonsignificant features are not necessarily text data. 
They could be <strong>numerical features<\/strong> too.<\/p>\n<blockquote class=\"\"><p>How do we remove those features before going to the modeling phase?<\/p><\/blockquote>\n<p dir=\"ltr\">This is where <strong>feature selection<\/strong> comes in: it helps identify the key features for building the model.<\/p>\n<p dir=\"ltr\">We can define the feature selection process as follows:<\/p>\n<blockquote class=\"\"><p>\u201cThe method of reducing the number of input variables during the development of a predictive model.\u201d<\/p><\/blockquote>\n<p dir=\"ltr\"><strong>OR<\/strong><\/p>\n<blockquote class=\"\"><p>\u201cFeature selection is a process of automatic selection of a subset of relevant features or variables from a set of all features, used in the process of model building.\u201d<\/p><\/blockquote>\n<p dir=\"ltr\">Other names for feature selection are variable selection and attribute selection.<\/p>\n<p dir=\"ltr\">With it, we can select the variables or features in our data that are most useful for building accurate models.<\/p>\n<blockquote class=\"\"><p>So how can we pick the best features out of all the available features? 
<\/p><\/blockquote>\n<p dir=\"ltr\">To achieve that, we have various feature selection methods.\u00a0<\/p>\n<p dir=\"ltr\">In this article, we will explore the feature selection methods that we can use to identify the best features for our machine learning model.<\/p>\n<p dir=\"ltr\">After reading this article, you will know the following:<\/p>\n<ul class=\"\">\n<li>The two main types of feature selection techniques are supervised and unsupervised, and the supervised methods are further classified into wrapper, filter, and intrinsic methods.<\/li>\n<li>Filter-based feature selection methods use statistical techniques to score the dependence or correlation between the input variables and the target, and the scores are then used to choose the most relevant features.<\/li>\n<li>Statistical measures must be carefully chosen for feature selection on the basis of the data type of the input variable and the response (output) variable.<\/li>\n<\/ul>\n<p dir=\"ltr\">Before we start learning, let\u2019s look at the topics you will learn in this article. 
Only if you read the complete article \ud83d\ude42<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\" data-css=\"tve-u-176a39f27da\">\n<h2 id=\"t-1609062922951\" class=\"\">Why is Feature Selection Important?<\/h2>\n<p dir=\"ltr\">Feature Selection is one of the key concepts in machine learning, which highly impacts the <a href=\"https:\/\/dataaspirant.com\/six-popular-classification-evaluation-metrics-in-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>model\u2019s performance<\/strong><\/a>.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3a66b23\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/2-Why-Feature-Selection-Is-Important.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-8197\" alt=\"Why Feature Selection Is Important\" data-id=\"8197\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Why Feature Selection Is Important\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8197\" alt=\"Why Feature Selection Is Important\" data-id=\"8197\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Why Feature Selection Is Important\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/2-Why-Feature-Selection-Is-Important.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">Irrelevant and misleading data features can <strong>negatively impact<\/strong> the performance of our machine learning model. 
That is why feature selection and data cleaning should be the first steps of model design.\u00a0<\/p>\n<p dir=\"ltr\">Feature selection methods reduce the number of <strong>input variables\/features<\/strong> to those that are considered useful for predicting the target.\u00a0<\/p>\n<p dir=\"ltr\">So, the primary focus of feature selection is to:<\/p>\n<blockquote class=\"\"><p>\u201c<strong>Remove<\/strong> non-informative or redundant predictors from our machine learning model.\u201d<\/p><\/blockquote>\n<p dir=\"ltr\">Some predictive modeling problems contain a large number of variables that require a large amount of system memory and therefore slow down the development and training of the models.\u00a0<\/p>\n<p dir=\"ltr\">Feature selection matters when building a machine learning model because:<\/p>\n<ul class=\"\">\n<li>It <strong>improves the accuracy<\/strong> with which the model predicts the target variable on unseen data.<\/li>\n<li>It <strong>reduces<\/strong> the computational cost of the model.<\/li>\n<li>It improves the <strong>understandability<\/strong> of the model by removing the unnecessary features so that it becomes more interpretable.<\/li>\n<\/ul>\n<h2 id=\"t-1609062922952\" class=\"\">Benefits of Feature Selection<\/h2>\n<p dir=\"ltr\">Having irrelevant features in your data can <strong>decrease the accuracy<\/strong> of many models, especially <a href=\"https:\/\/dataaspirant.com\/linear-regression\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>linear algorithms<\/strong><\/a> like linear and <a href=\"https:\/\/dataaspirant.com\/how-logistic-regression-model-works\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>logistic regression<\/strong><\/a>.<\/p>\n<p dir=\"ltr\">The benefits of performing feature selection before training the model are as follows:<\/p>\n<ul class=\"\">\n<li><strong>Reduction in Model Overfitting:<\/strong> Less redundant data implies less 
opportunity to make noise-based decisions.<\/li>\n<li><strong>Improvement in Accuracy:<\/strong> Less misleading data implies improved modeling accuracy.<\/li>\n<li><strong>Reduction in Training Time: <\/strong>Less data implies that algorithms train faster.<\/li>\n<\/ul>\n<h3 id=\"t-1609062922953\" class=\"\">Difference Between Supervised and Unsupervised Methods<\/h3>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3a918ff\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/3-Supervised-Learning-Example.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-8201\" alt=\"Supervised Learning Example\" data-id=\"8201\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Supervised Learning Example\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8201\" alt=\"Supervised Learning Example\" data-id=\"8201\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Supervised Learning Example\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/3-Supervised-Learning-Example.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\">We can think of the feature selection methods in terms of <a href=\"https:\/\/dataaspirant.com\/supervised-and-unsupervised-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>supervised and unsupervised<\/strong><\/a> methods.<\/p>\n<p dir=\"ltr\">The methods that attempt to discover the relationship between the input variables (also called independent variables) and the target 
variable are termed supervised methods.\u00a0<\/p>\n<p dir=\"ltr\">They aim to identify the relevant features for building a highly accurate model, and they rely on the availability of labeled data.\u00a0<\/p>\n<p dir=\"ltr\">Examples of supervised learning algorithms are:<\/p>\n<p dir=\"ltr\">The methods that do not require any <strong>labeled<\/strong> data linking the input variables to the output variable are termed <a href=\"https:\/\/dataaspirant.com\/supervised-and-unsupervised-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>unsupervised methods<\/strong><\/a>.\u00a0<\/p>\n<p dir=\"ltr\">They find interesting patterns in <strong>unlabelled data<\/strong> and score all data dimensions based on various criteria such as variance, entropy, and the ability to preserve local similarity.\u00a0<\/p>\n<p dir=\"ltr\">For example,<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/dataaspirant.com\/k-means-clustering-algorithm\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Clustering<\/strong><\/a> can be used for customer segmentation, to understand the different customer groups around which marketing and business strategies are built.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3aafd56\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/4-Unsupervised-Learning-Example.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-8204\" alt=\"Unsupervised Learning Example\" data-id=\"8204\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Unsupervised Learning Example\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8204\" alt=\"Unsupervised Learning Example\" data-id=\"8204\" width=\"626\" 
data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Unsupervised Learning Example\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/4-Unsupervised-Learning-Example.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">Unsupervised feature selection methods <strong>don\u2019t<\/strong> consider the target variable; an example is removing redundant variables using correlation.\u00a0<\/p>\n<p dir=\"ltr\">In contrast, supervised feature selection techniques make use of the target variable; an example is removing irrelevant and misleading variables.<\/p>\n<h2 id=\"t-1609062922954\" class=\"\">Supervised Feature Selection Methods<\/h2>\n<p dir=\"ltr\">Supervised feature selection methods are further classified into three categories.\u00a0<\/p>\n<ol class=\"\">\n<li>Wrapper method,\u00a0<\/li>\n<li>Filter method,\u00a0<\/li>\n<li>Intrinsic method<\/li>\n<\/ol>\n<h3 id=\"t-1609062922955\" class=\"\">Wrapper Feature Selection Methods<\/h3>\n<p dir=\"ltr\">The wrapper methods create several models with different subsets of the input features. 
The features that result in the best-performing model, according to the chosen performance metric, are then selected.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3ad7caa\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/5-Wrapper-Feature-Selection-Method.png?resize=626%2C257&amp;ssl=1\" class=\"tve_image wp-image-8206\" alt=\"Wrapper Feature Selection Method\" data-id=\"8206\" width=\"626\" data-init-width=\"1024\" height=\"257\" data-init-height=\"421\" title=\"Wrapper Feature Selection Method\" loading=\"lazy\" data-width=\"626\" data-height=\"257\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8206\" alt=\"Wrapper Feature Selection Method\" data-id=\"8206\" width=\"626\" data-init-width=\"1024\" height=\"257\" data-init-height=\"421\" title=\"Wrapper Feature Selection Method\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/5-Wrapper-Feature-Selection-Method.png?resize=626%2C257&amp;ssl=1\" data-width=\"626\" data-height=\"257\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">The wrapper methods are unconcerned with the variable types, though they can be computationally expensive.<\/p>\n<p dir=\"ltr\">A well-known example of a wrapper feature selection method is <strong>Recursive Feature Elimination<\/strong> (RFE).\u00a0<\/p>\n<p dir=\"ltr\">RFE evaluates multiple models, recursively removing the least important predictor variables, to find the combination that maximizes the model\u2019s performance.\u00a0\u00a0<\/p>\n<h3 id=\"t-1609062922956\" class=\"\">Filter Feature Selection Methods<\/h3>\n<p dir=\"ltr\">The filter feature selection methods make use of statistical techniques to evaluate the relationship between each 
independent input variable and the output (target) variable, and assign a <strong>score<\/strong> to each feature.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3b00617\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/6-Filter-Feature-Selection-Method.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-8208\" alt=\"Filter Feature Selection Method\" data-id=\"8208\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Filter Feature Selection Method\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8208\" alt=\"Filter Feature Selection Method\" data-id=\"8208\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Filter Feature Selection Method\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/6-Filter-Feature-Selection-Method.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\">The scores are then used to <strong>filter <\/strong>the input variables\/features, keeping those that we will use in our model.<\/p>\n<p dir=\"ltr\">The filter methods evaluate the significance of the feature variables based only on their inherent characteristics, without involving any learning algorithm.\u00a0<\/p>\n<p dir=\"ltr\">These methods are computationally inexpensive and faster than the wrapper methods. 
<\/p>\n<p dir=\"ltr\">The filter methods may provide worse results than wrapper methods if the data is insufficient to model the statistical correlation between the feature variables.<\/p>\n<p dir=\"ltr\">Unlike wrapper methods, the filter methods are less <a href=\"https:\/\/dataaspirant.com\/handle-overfitting-deep-learning-models\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\"><strong>prone to overfitting<\/strong><\/a>. They are used extensively on high-dimensional data.\u00a0<\/p>\n<p dir=\"ltr\">Wrapper methods, by contrast, have a prohibitive computational cost on such data.<\/p>\n<h3 id=\"t-1609062922957\" class=\"\">Embedded or Intrinsic Feature Selection Methods<\/h3>\n<p dir=\"ltr\">The machine learning models that have feature selection naturally incorporated as part of learning the model are termed embedded or intrinsic feature selection methods.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3b2a03b\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/7-Intrinsic-Feature-Selection-Method.png?resize=626%2C311&amp;ssl=1\" class=\"tve_image wp-image-8209\" alt=\"Intrinsic Feature Selection Method\" data-id=\"8209\" width=\"626\" data-init-width=\"1024\" height=\"311\" data-init-height=\"509\" title=\"Intrinsic Feature Selection Method\" loading=\"lazy\" data-width=\"626\" data-height=\"311\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8209\" alt=\"Intrinsic Feature Selection Method\" data-id=\"8209\" width=\"626\" data-init-width=\"1024\" height=\"311\" data-init-height=\"509\" title=\"Intrinsic Feature Selection Method\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/7-Intrinsic-Feature-Selection-Method.png?resize=626%2C311&amp;ssl=1\" data-width=\"626\" 
data-height=\"311\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\">Some models have built-in feature selection, which means that the model includes only the predictors that help maximize accuracy.\u00a0<\/p>\n<p dir=\"ltr\">In this scenario, the machine learning model chooses the best representation of the data.<\/p>\n<p dir=\"ltr\">Examples of algorithms that make use of embedded methods are penalized regression models.<\/p>\n<p dir=\"ltr\">Some of these machine learning models are naturally resistant to non-informative predictors. <\/p>\n<p dir=\"ltr\">Lasso and rule-based models like decision trees intrinsically conduct feature selection.<\/p>\n<p dir=\"ltr\">Feature selection is related to dimensionality reduction, but the two are different. Both seek to reduce the number of variables or features in the dataset, but there is a subtle difference between them.\u00a0<\/p>\n<p dir=\"ltr\">Let\u2019s look at the difference in detail.<\/p>\n<ul class=\"\">\n<li class=\"class=\"><strong>Feature selection<\/strong> simply keeps some of the given features and excludes the others. It includes and excludes the attributes in the data without changing them.<\/li>\n<li class=\"class=\"><strong>Dimensionality reduction<\/strong> transforms the features into a lower dimension. 
It reduces the number of attributes by creating new combinations of attributes.<\/li>\n<\/ul>\n<p dir=\"ltr\">The examples of dimensionality reduction methods are\u00a0<\/p>\n<ul class=\"\">\n<li>Principal Component Analysis,<\/li>\n<li>Singular Value Decomposition.<\/li>\n<\/ul>\n<h2 id=\"t-1609062922958\" class=\"\">Feature Selection with Statistical Measures<\/h2>\n<p dir=\"ltr\">We can use correlation type statistical measures between input and output variables, which can then be used as the basis for filter feature selection.\u00a0<\/p>\n<p dir=\"ltr\">The choice of statistical measures highly depends upon the variable data types.<\/p>\n<p dir=\"ltr\">Common variable data types include:<\/p>\n<ul class=\"\">\n<li>Numerical such as height<\/li>\n<li>Categorical such as a label\u00a0<\/li>\n<\/ul>\n<p dir=\"ltr\">Both of the variable data types are subdivided into many categories, which are as under:<\/p>\n<p dir=\"ltr\">Numerical variables are divided into the following:<\/p>\n<ul class=\"\">\n<li>Integer Variables<\/li>\n<li>Float Variables<\/li>\n<\/ul>\n<p dir=\"ltr\">On the other hand, categorical variables are divided into the following:<\/p>\n<ul class=\"\">\n<li>Boolean Variables<\/li>\n<li>Nominal Variables<\/li>\n<li>Ordinal Variables<\/li>\n<\/ul>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3bb0362\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/8-Categorical-Feature-Categories.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-8212\" alt=\"Categorical Feature Categories\" data-id=\"8212\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Categorical Feature Categories\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8212\" 
alt=\"Categorical Feature Categories\" data-id=\"8212\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Categorical Feature Categories\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/8-Categorical-Feature-Categories.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">We will be considering the two <strong>categories of variables<\/strong>, i.e., numerical and categorical, for both the input and the output.<\/p>\n<p dir=\"ltr\">The variables that are provided as input to the model are termed input variables. In feature selection, the input variables are the ones whose number we wish to reduce.<\/p>\n<p dir=\"ltr\">In contrast, output variables are those that the model is trained to predict.\u00a0 They are also termed response variables.<\/p>\n<p dir=\"ltr\">The type of the response variable generally indicates the type of predictive modeling problem being performed. 
For example:<\/p>\n<ol class=\"\">\n<li>A numerical output variable indicates a regression predictive modeling problem.<\/li>\n<li>A categorical output variable indicates a classification predictive modeling problem.<\/li>\n<\/ol>\n<h3 id=\"t-1609062922959\" class=\"\">Univariate Feature Selection<\/h3>\n<p dir=\"ltr\">In filter-based feature selection, the statistical measures are calculated considering only a single input variable at a time together with the target (output) variable.\u00a0<\/p>\n<p dir=\"ltr\">These statistical measures are termed univariate statistical measures, which means that the interaction between input variables is not considered in the filtering process.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3bda56c\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/9-Univariate-Feature-Selection.png?resize=626%2C469&amp;ssl=1\" class=\"tve_image wp-image-8214\" alt=\"Univariate Feature Selection\" data-id=\"8214\" width=\"626\" data-init-width=\"640\" height=\"469\" data-init-height=\"480\" title=\"Univariate Feature Selection\" loading=\"lazy\" data-width=\"626\" data-height=\"469\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8214\" alt=\"Univariate Feature Selection\" data-id=\"8214\" width=\"626\" data-init-width=\"640\" height=\"469\" data-init-height=\"480\" title=\"Univariate Feature Selection\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/9-Univariate-Feature-Selection.png?resize=626%2C469&amp;ssl=1\" data-width=\"626\" data-height=\"469\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">Univariate feature selection selects the best features on the basis of univariate statistical tests. 
\u00a0We compare each feature to the target variable to determine whether there is a statistically significant relationship between them.\u00a0<\/p>\n<p dir=\"ltr\">A common univariate test is the <strong>analysis of variance<\/strong> (ANOVA). The majority of these techniques are univariate, which means that they evaluate each predictor in isolation.\u00a0<\/p>\n<p dir=\"ltr\">The existence of correlated predictors increases the possibility of selecting significant but redundant predictors. Consequently, too many predictors may be chosen, which gives rise to collinearity problems.\u00a0<\/p>\n<p dir=\"ltr\">In univariate feature selection methods, we examine each feature individually to determine the strength of its relationship with the response variable.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3bfe156\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/10-Univariate-Feature-Selection-Types.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-8216\" alt=\"Univariate Feature Selection Types\" data-id=\"8216\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Univariate Feature Selection Types\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8216\" alt=\"Univariate Feature Selection Types\" data-id=\"8216\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Univariate Feature Selection Types\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/10-Univariate-Feature-Selection-Types.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p 
dir=\"ltr\">The technique used to evaluate the input-output relationship depends on the combination of variable types:<\/p>\n<ul class=\"\">\n<li>Numerical Input &amp; Numerical Output<\/li>\n<li>Numerical Input &amp; Categorical Output<\/li>\n<li>Categorical Input &amp; Numerical Output<\/li>\n<li>Categorical Input &amp; Categorical Output<\/li>\n<\/ul>\n<p dir=\"ltr\">Let&#8217;s discuss each of these in detail.<\/p>\n<h4 class=\"\">Numerical Input &amp; Numerical Output<\/h4>\n<p dir=\"ltr\" id=\"t-1609062922960\">This is a regression predictive modeling problem with numerical input variables.<\/p>\n<p dir=\"ltr\">Common techniques include using a correlation coefficient, such as:<\/p>\n<ul class=\"\">\n<li>Pearson\u2019s correlation coefficient for a linear correlation<\/li>\n<li>Rank-based methods, such as Spearman\u2019s rank coefficient, for a nonlinear correlation.<\/li>\n<\/ul>\n<h4 class=\"\">Numerical Input &amp; Categorical Output<\/h4>\n<p dir=\"ltr\">This is a classification predictive modeling problem with numerical input variables. 
It is the most common example of a classification problem.\u00a0<\/p>\n<p dir=\"ltr\">Here again, the common techniques are correlation-based, though they must take the categorical target into account.<\/p>\n<p dir=\"ltr\">The techniques are as follows:<\/p>\n<ul class=\"\">\n<li>Analysis of variance (<strong>ANOVA<\/strong>) for a linear correlation<\/li>\n<li><strong>Kendall\u2019s rank coefficient<\/strong> for a nonlinear correlation, assuming that the categorical variable is ordinal.<\/li>\n<\/ul>\n<h4 class=\"\">Categorical Input &amp; Numerical Output<\/h4>\n<p dir=\"ltr\">This is an unusual example of a regression predictive modeling problem, one with categorical input variables.\u00a0<\/p>\n<p dir=\"ltr\">We can use the same \u201cNumerical Input, Categorical Output\u201d methods as discussed above but in <strong>reverse<\/strong>.<\/p>\n<h4 class=\"\">Categorical Input &amp; Categorical Output<\/h4>\n<p dir=\"ltr\">This is a classification predictive modeling problem with categorical input variables.<\/p>\n<p dir=\"ltr\">The following techniques are used in this predictive modeling problem.<\/p>\n<ul class=\"\">\n<li>Chi-Squared test\u00a0<\/li>\n<li>Mutual Information<\/li>\n<\/ul>\n<p dir=\"ltr\">The chi-squared test is the most common measure of association for categorical data. 
It tests whether there is a significant difference between the observed and the expected frequencies of two categorical variables.<\/p>\n<p dir=\"ltr\">Under the null hypothesis, there is no association between the two variables, so a significant difference is evidence of an association.<\/p>\n<p dir=\"ltr\">To apply the chi-squared test to the relationship between the features in the dataset and the target variable, the following conditions must be met:<\/p>\n<ul class=\"\">\n<li>The variables under consideration must be categorical.<\/li>\n<li>The variables must be sampled independently.<\/li>\n<li>Each cell of the contingency table must have an expected frequency greater than 5.<\/li>\n<\/ul>\n<p dir=\"ltr\">To summarize the above concepts, the image below gives an overview of these methods.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3c1c65c\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/11-Univariate-Feature-Selection-Methods.png?resize=626%2C517&amp;ssl=1\" class=\"tve_image wp-image-8218\" alt=\"Univariate Feature Selection Methods\" data-id=\"8218\" width=\"626\" data-init-width=\"1024\" height=\"517\" data-init-height=\"845\" title=\"Univariate Feature Selection Methods\" loading=\"lazy\" data-width=\"626\" data-height=\"517\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8218\" alt=\"Univariate Feature Selection Methods\" data-id=\"8218\" width=\"626\" data-init-width=\"1024\" height=\"517\" data-init-height=\"845\" title=\"Univariate Feature Selection Methods\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/11-Univariate-Feature-Selection-Methods.png?resize=626%2C517&amp;ssl=1\" data-width=\"626\" data-height=\"517\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper 
thrv_text_element\">\n<h2 id=\"t-1609062922961\" class=\"\">Feature Selection Strategies<\/h2>\n<p dir=\"ltr\">When building a <a href=\"https:\/\/dataaspirant.com\/support-vector-machine-classifier-implementation-r-caret-package\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>machine learning model<\/strong><\/a> in real life, it is uncommon for all variables in the dataset to be useful for building the model.<\/p>\n<p dir=\"ltr\">Adding redundant variables reduces the overall accuracy and the generalization capability of the model. Adding more and more variables also increases the complexity of the model.<\/p>\n<p dir=\"ltr\">This section covers two additional considerations when using filter-based feature selection:<\/p>\n<ul class=\"\">\n<li>Selection Method<\/li>\n<li>Transform Variables<\/li>\n<\/ul>\n<h3 id=\"t-1609062922962\" class=\"\">Selection Method<\/h3>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3c3a4a9\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/12-Selection-Method-Workflow.png?resize=479%2C551&amp;ssl=1\" class=\"tve_image wp-image-8221\" alt=\"Selection Method Workflow\" data-id=\"8221\" width=\"479\" data-init-width=\"891\" height=\"551\" data-init-height=\"1024\" title=\"Selection Method Workflow\" loading=\"lazy\" data-width=\"479\" data-height=\"551\" data-css=\"tve-u-176a4404b90\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8221\" alt=\"Selection Method Workflow\" data-id=\"8221\" width=\"479\" data-init-width=\"891\" height=\"551\" data-init-height=\"1024\" title=\"Selection Method Workflow\" loading=\"lazy\" 
src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/12-Selection-Method-Workflow.png?resize=479%2C551&amp;ssl=1\" data-width=\"479\" data-height=\"551\" data-css=\"tve-u-176a4404b90\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">Once a statistic has been calculated for each input (independent) variable against the target (dependent) variable, the scikit-learn library provides a variety of filtering methods for selecting features.<\/p>\n<p dir=\"ltr\">The most commonly used methods are:<\/p>\n<ul class=\"\">\n<li>Selecting the top k variables, using scikit-learn\u2019s SelectKBest class.<\/li>\n<li>Selecting the variables in the top percentile, using scikit-learn\u2019s SelectPercentile class.<\/li>\n<\/ul>\n<h3 id=\"t-1609062922963\" class=\"\">Transform Variables<\/h3>\n<p dir=\"ltr\">Variables can be transformed from one type to another in order to access different statistical measures.<\/p>\n<p dir=\"ltr\">For example, we can transform a categorical variable into an ordinal variable. 
We can also transform a numerical variable into a discrete one (for example, by binning) and see how the results change.<\/p>\n<p dir=\"ltr\">In short, we can transform the data to meet the requirements of a given test, then try it and compare the results.<\/p>\n<h2 id=\"t-1609062922964\" class=\"\">Which Feature Selection Method is the Best?<\/h2>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3c5946e\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/13-Feature-Selection-Method-Workflow.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-8223\" alt=\"Feature Selection Method Workflow\" data-id=\"8223\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Feature Selection Method Workflow\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8223\" alt=\"Feature Selection Method Workflow\" data-id=\"8223\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Feature Selection Method Workflow\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/13-Feature-Selection-Method-Workflow.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">No single feature selection method can be regarded as the best. 
More generally, there is no best machine learning algorithm and no best set of input variables.<\/p>\n<p dir=\"ltr\">Instead, we need to discover which feature selection method works best for our specific problem through careful, systematic experimentation.<\/p>\n<p dir=\"ltr\">So, we try a range of models on different subsets of features, chosen using various statistical measures, and find out what works best for the problem at hand.<\/p>\n<h2 id=\"t-1609062922965\" class=\"\">Feature Selection Implementations<\/h2>\n<p dir=\"ltr\">The following section gives worked examples of feature selection for a regression problem and a classification problem.<\/p>\n<h3 id=\"t-1609062922966\" class=\"\">Feature Selection For Regression Models<\/h3>\n<p dir=\"ltr\">The following code depicts the feature selection for the <a href=\"https:\/\/dataaspirant.com\/linear-regression-implementation-in-python\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>regression problem<\/strong><\/a> with numerical inputs and numerical outputs.<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv-columns\">\n<div class=\"tcb-flex-row v-2 tcb--cols--1\">\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col\">\n<div class=\"thrv_wrapper thrv_text_element\">\n<p>You can download the dataset from this <a href=\"https:\/\/www.kaggle.com\/iabhishekofficial\/mobile-price-classification\">Kaggle dataset<\/a> page. Please download the training dataset. 
Running the above code generates the following output:<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3d3448c\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/14-Regression-Feature-Selection-Scores.png?resize=404%2C404&amp;ssl=1\" class=\"tve_image wp-image-8227\" alt=\"Regression Feature Selection Scores\" data-id=\"8227\" width=\"404\" data-init-width=\"602\" height=\"404\" data-init-height=\"602\" title=\"Regression Feature Selection Scores\" loading=\"lazy\" data-width=\"404\" data-height=\"404\" data-css=\"tve-u-176a44258e9\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8227\" alt=\"Regression Feature Selection Scores\" data-id=\"8227\" width=\"404\" data-init-width=\"602\" height=\"404\" data-init-height=\"602\" title=\"Regression Feature Selection Scores\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/14-Regression-Feature-Selection-Scores.png?resize=404%2C404&amp;ssl=1\" data-width=\"404\" data-height=\"404\" data-css=\"tve-u-176a44258e9\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">We used the <strong>chi-squared<\/strong> statistical test, which requires non-negative feature values, and used the SelectKBest class to select the top 10 features for our model from the Mobile Price Range Prediction dataset.<\/p>\n<p dir=\"ltr\">When we run the above example:<\/p>\n<ul class=\"\">\n<li>A regression dataset is created.<\/li>\n<li>Feature selection is defined.<\/li>\n<li>Feature selection is applied to the regression dataset.<\/li>\n<li>We get a subset of selected input features.<\/li>\n<\/ul>\n<h3 id=\"t-1609062922967\" class=\"\">Classification Feature Selection<\/h3>\n<p dir=\"ltr\">The following code depicts the feature selection for the 
classification problem with numerical inputs and categorical outputs.<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p>The output of the above code is as follows:<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176a3d8e311\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/15-Classification-Feature-Selection-Scores.png?resize=609%2C351&amp;ssl=1\" class=\"tve_image wp-image-8231\" alt=\"Classification Feature Selection Scores\" data-id=\"8231\" width=\"609\" data-init-width=\"1014\" height=\"351\" data-init-height=\"584\" title=\"Classification Feature Selection Scores\" loading=\"lazy\" data-width=\"609\" data-height=\"351\" data-css=\"tve-u-176a3daf8a3\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-8231\" alt=\"Classification Feature Selection Scores\" data-id=\"8231\" width=\"609\" data-init-width=\"1014\" height=\"351\" data-init-height=\"584\" title=\"Classification Feature Selection Scores\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/15-Classification-Feature-Selection-Scores.png?resize=609%2C351&amp;ssl=1\" data-width=\"609\" data-height=\"351\" data-css=\"tve-u-176a3daf8a3\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\">We obtained the importance of each feature from the model\u2019s feature importance property. 
This property assigns a score to each feature.<\/p>\n<p dir=\"ltr\">The higher a feature\u2019s score, the more relevant it is to our response variable.<\/p>\n<p dir=\"ltr\">When we run the above example:<\/p>\n<ul class=\"\">\n<li>A classification dataset is created.<\/li>\n<li>Feature selection is defined.<\/li>\n<li>Feature selection is applied to the classification dataset.<\/li>\n<li>We get a subset of selected input features.<\/li>\n<\/ul>\n<h2 id=\"t-1609062922968\" class=\"\">What Next?<\/h2>\n<p dir=\"ltr\">Don\u2019t limit yourself to the two code examples above. Try experimenting with the other feature selection methods we explained.<\/p>\n<p dir=\"ltr\">As a cross-check, build any <a href=\"https:\/\/dataaspirant.com\/category\/machine-learning-2\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\"><strong>machine learning model<\/strong><\/a> without applying feature selection, then apply a feature selection method, rebuild the model, and compare the accuracy.<\/p>\n<p dir=\"ltr\">For <a href=\"https:\/\/dataaspirant.com\/classification-clustering-alogrithms\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>classification problems<\/strong><\/a>, you can use the standard <a href=\"https:\/\/dataaspirant.com\/six-popular-classification-evaluation-metrics-in-machine-learning\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\"><strong>classification evaluation metrics<\/strong><\/a>. 
For simple cases, you can measure the performance of the model with a <a href=\"https:\/\/dataaspirant.com\/confusion-matrix-sklearn-python\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\"><strong>confusion matrix<\/strong><\/a>.<\/p>\n<p dir=\"ltr\">For regression problems, you can check the <a href=\"https:\/\/dataaspirant.com\/difference-between-r-squared-and-adjusted-r-squared\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>R-squared and Adjusted R-squared<\/strong><\/a> measures.<\/p>\n<h2 id=\"t-1609062922969\" class=\"\">Conclusion<\/h2>\n<p dir=\"ltr\">In this article, we explained the importance of feature selection methods when building machine learning models.<\/p>\n<p dir=\"ltr\">We learned how to choose statistical measures for filter-based feature selection with numerical and categorical data.<\/p>\n<p dir=\"ltr\">Apart from this, we covered the following:<\/p>\n<ul class=\"\">\n<li>Feature selection techniques are either supervised or unsupervised. 
The supervised methods are further classified into the <strong>filter, wrapper, and intrinsic methods<\/strong>.<\/li>\n<li>Statistical measures are used by filter-based feature selection to score the correlation or dependence between input variables and the output or response variable.<\/li>\n<li>Statistical measures for feature selection must be carefully chosen on the basis of the data type of the input variable and the output variable.<\/li>\n<\/ul>\n<\/div>\n<h4 class=\"\">Recommended Machine Learning Courses<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"thrv_wrapper thrv-page-section thrv-lp-block\" data-inherit-lp-settings=\"1\" data-css=\"tve-u-176a39f2694\" data-keep-css_id=\"1\">\n<div class=\"tve-page-section-in tve_empty_dropzone  \" data-css=\"tve-u-17481b960b8\">\n<div class=\"thrv_wrapper thrv-columns dynamic-group-kbt3q0q7\" data-css=\"tve-u-17481b95e2b\">\n<div class=\"tcb-flex-row v-2 tcb--cols--3 tcb-medium-no-wrap tcb-mobile-wrap m-edit\" data-css=\"tve-u-176a39f2695\">\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbt3pyfd\" data-css=\"tve-u-17481b95e2d\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode thrv-content-box tve-elem-default-pad dynamic-group-kbt3pwhk\" data-css=\"tve-u-176a39f26ac\">\n<div class=\"tve-cb\">\n<div class=\"thrv_wrapper tve_image_caption dynamic-group-kbt3pu4z\" data-css=\"tve-u-176a39f26af\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/supervised-learning.png?resize=176%2C176&amp;ssl=1\" class=\"tve_image wp-image-4696\" alt=\"supervised learning\" data-id=\"4696\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"supervised learning\" loading=\"lazy\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-176a39f26b0\" data-recalc-dims=\"1\"><img class=\"tve_image 
wp-image-4696\" alt=\"supervised learning\" data-id=\"4696\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"supervised learning\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/supervised-learning.png?resize=176%2C176&amp;ssl=1\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-176a39f26b0\" data-recalc-dims=\"1\"><span class=\"tve-image-overlay\"><\/span><\/span><\/div>\n<h4 class=\"\" data-css=\"tve-u-176a39f2697\">Complete Supervised Learning Algorithms<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbt3pyfd\" data-css=\"tve-u-17481b95e2d\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode thrv-content-box tve-elem-default-pad dynamic-group-kbt3pwhk\" data-css=\"tve-u-176a39f26ad\">\n<div class=\"tve-cb\">\n<div class=\"thrv_wrapper tve_image_caption dynamic-group-kbt3pu4z\" data-css=\"tve-u-176a39f26bb\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/deeplearning-course.jpg?resize=176%2C176&amp;ssl=1\" class=\"tve_image wp-image-5170\" alt=\"Deep Learning python\" data-id=\"5170\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"deeplearning-course\" loading=\"lazy\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-176a39f26bc\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-5170\" alt=\"Deep Learning python\" data-id=\"5170\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"deeplearning-course\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/deeplearning-course.jpg?resize=176%2C176&amp;ssl=1\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-176a39f26bc\" data-recalc-dims=\"1\"><span 
class=\"tve-image-overlay\"><\/span><\/span><\/div>\n<h4 class=\"\" data-css=\"tve-u-176a39f269f\">Python Data Science Specialization Course<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbt3pyfd\" data-css=\"tve-u-17481b95e2d\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode thrv-content-box tve-elem-default-pad dynamic-group-kbt3pwhk\" data-css=\"tve-u-176a39f26ae\">\n<div class=\"tve-cb\">\n<div class=\"thrv_wrapper tve_image_caption dynamic-group-kbt3pu4z\" data-css=\"tve-u-176a39f26bd\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/datascience-spelization.jpg?resize=176%2C176&amp;ssl=1\" class=\"tve_image wp-image-4307\" alt=\"Data Science Full Specialization\" data-id=\"4307\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"datascience spelization\" loading=\"lazy\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-176a39f26bf\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-4307\" alt=\"Data Science Full Specialization\" data-id=\"4307\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"datascience spelization\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/datascience-spelization.jpg?resize=176%2C176&amp;ssl=1\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-176a39f26bf\" data-recalc-dims=\"1\"><span class=\"tve-image-overlay\"><\/span><\/span><\/div>\n<h4 class=\"\" data-css=\"tve-u-176a39f26a6\">A to Z Machine Learning with 
Python<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/dataaspirant.com\/feature-selection-methods-machine-learning\/<\/p>\n","protected":false},"author":0,"featured_media":8010,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8009"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8009"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8009\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8010"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=8009"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8009"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=8009"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}