{"id":8015,"date":"2020-12-29T00:26:25","date_gmt":"2020-12-29T00:26:25","guid":{"rendered":"https:\/\/healinglifespan.com\/data-science\/2020\/12\/29\/how-k-means-clustering-algorithm-works\/"},"modified":"2020-12-29T00:26:25","modified_gmt":"2020-12-29T00:26:25","slug":"how-k-means-clustering-algorithm-works","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/12\/29\/how-k-means-clustering-algorithm-works\/","title":{"rendered":"How K-Means Clustering Algorithm Works"},"content":{"rendered":"<div id=\"tve_editor\" data-post-id=\"7742\">\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-1764884152a\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/1-K-means-Clustering.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-7747\" alt=\"K-means Clustering\" data-id=\"7747\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"K-means Clustering\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7747\" alt=\"K-means Clustering\" data-id=\"7747\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"K-means Clustering\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/1-K-means-Clustering.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\" data-css=\"tve-u-17648841533\">\n<p dir=\"ltr\">\u00a0 In today\u2019s world, where machine learning models implementation is so easy to find anywhere over the internet. 
It becomes paramount for all machine learning enthusiasts to get their hands dirty with the underlying topics.\u00a0<\/p>\n<p dir=\"ltr\">Many fascinating topics come up in <a href=\"https:\/\/dataaspirant.com\/supervised-and-unsupervised-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>supervised and unsupervised learning<\/strong><\/a>, or even reinforcement learning. But my favorite is the k-means clustering algorithm.\u00a0<\/p>\n<p dir=\"ltr\">As the name suggests, it is a <strong>clustering algorithm<\/strong>.<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\" data-css=\"tve-u-17648841572\">\n<p dir=\"ltr\">Even if you don\u2019t know what clustering is, that is ok.<\/p>\n<p dir=\"ltr\">By the end of this article, you will learn everything you need to know about <strong>k-means clustering<\/strong>.<\/p>\n<p dir=\"ltr\">After reading this article, you don\u2019t need to brush up on k-means clustering topics before attending any <a href=\"https:\/\/dataaspirant.com\/how-to-get-first-job-data-scientist\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>data scientist job interview<\/strong><\/a>.<\/p>\n<p dir=\"ltr\">Excited to learn \ud83d\ude42<\/p>\n<p dir=\"ltr\">We are too \ud83d\ude42<\/p>\n<p dir=\"ltr\">Great, before starting the article, let\u2019s look at the topics 
you are going to learn in this article. Only if you read the complete article \ud83d\ude00<\/p>\n<p dir=\"ltr\">I am not kidding. It\u2019s true. It will give you a better idea about the entire article flow.\u00a0<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\" data-css=\"tve-u-17648841575\">\n<p dir=\"ltr\">Let\u2019s split the k-means clustering into two parts,\u00a0<\/p>\n<p dir=\"ltr\">Let\u2019s learn <a href=\"https:\/\/dataaspirant.com\/hierarchical-clustering-r\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>clustering first<\/strong><\/a>. Then we will use this knowledge to understand k-means clustering.<\/p>\n<h2 class=\"\" id=\"t-1607534314798\">What is clustering?<\/h2>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-1764896c149\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/2-What-is-Clustering.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-7754\" alt=\"What is Clustering\" data-id=\"7754\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"What is Clustering\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7754\" alt=\"What is Clustering\" data-id=\"7754\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"What is Clustering\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/2-What-is-Clustering.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">A cluster is a group of similar entities that are kept <strong>together<\/strong>. 
Their similarity is decided by the features they possess and by how closely those features match the features of the other entities.\u00a0<\/p>\n<p dir=\"ltr\">Let\u2019s say we have <strong>two points<\/strong> in a 2-D graph. Using the Euclidean distance, we can measure how close these two points are to each other.\u00a0<\/p>\n<p dir=\"ltr\">Likewise, using <a href=\"https:\/\/dataaspirant.com\/five-most-popular-similarity-measures-implementation-in-python\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>various similarity measures<\/strong><\/a>, we can find how close\/similar the data points are.\u00a0<\/p>\n<p dir=\"ltr\">All similar data points form clusters or groups. Creating these clusters in a <strong>meaningful<\/strong> way is called clustering.<\/p>\n<p dir=\"ltr\">In the <a href=\"https:\/\/dataaspirant.com\/category\/machine-learning-2\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>machine learning world<\/strong><\/a>, clustering is the process in which we segregate a heap of data points into clusters on the basis of their features.\u00a0<\/p>\n<p dir=\"ltr\">We will discuss these features in the upcoming sections.\u00a0<\/p>\n<h3 id=\"t-1607534314799\" class=\"\">Clustering Real Life Example<\/h3>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-1764897b761\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/3-Clustering-Example.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-7757\" alt=\"Clustering Example\" data-id=\"7757\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Clustering Example\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7757\" alt=\"Clustering Example\" data-id=\"7757\" width=\"626\" 
data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Clustering Example\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/3-Clustering-Example.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\">One good real-life example of clustering is the <strong>world map<\/strong>. In the image above, each color represents a cluster. These clusters are created based on meaningful similarities.\u00a0<\/p>\n<p dir=\"ltr\">For now, let\u2019s say this similarity is <strong>distance<\/strong>. If you take any place in a cluster, it is closer to the center of that cluster than to the center of any other cluster.\u00a0<\/p>\n<p dir=\"ltr\">This is one of the main rules for creating clusters with centroid-based clustering algorithms such as k-means.<\/p>\n<p dir=\"ltr\">Any point in the cluster should be close to that cluster\u2019s center and far from any other cluster.<\/p>\n<p dir=\"ltr\">In a more technical way, we can say the <strong>intra distance<\/strong> between points of the same cluster should be smaller than the <strong>inter distance<\/strong> between points of different clusters.<\/p>\n<ul class=\"\">\n<li><strong>Intra Distance: <\/strong>Distance between points of the same cluster.<\/li>\n<li><strong>Inter Distance:<\/strong> Distance between points of different clusters.<\/li>\n<\/ul>\n<p dir=\"ltr\">If the above statements are not clear, please go to the <strong>how to evaluate clusters section<\/strong> of this article. We provide a visual example there.<\/p>\n<p dir=\"ltr\">We hope the above sentence is clear by now. If not, read this sentence again. 
Once you have given the complete reading of the article.<\/p>\n<h3 id=\"t-1607534314800\" class=\"\">How is clustering different from classification?<\/h3>\n<p dir=\"ltr\">As a <a href=\"https:\/\/dataaspirant.com\/for-beginners\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>data science beginner<\/strong><\/a>, the difference between clustering and classification is confusing. So as the initial step, let us understand the fundamental difference between <a href=\"https:\/\/dataaspirant.com\/classification-clustering-alogrithms\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\"><strong>classification and clustering<\/strong><\/a>.\u00a0<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17648995311\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/4-Difference-Between-Clustering-and-Classification.png?resize=626%2C329&amp;ssl=1\" class=\"tve_image wp-image-7760\" alt=\"Difference Between Clustering and Classification\" data-id=\"7760\" width=\"626\" data-init-width=\"1024\" height=\"329\" data-init-height=\"538\" title=\"Difference Between Clustering and Classification\" loading=\"lazy\" data-width=\"626\" data-height=\"329\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7760\" alt=\"Difference Between Clustering and Classification\" data-id=\"7760\" width=\"626\" data-init-width=\"1024\" height=\"329\" data-init-height=\"538\" title=\"Difference Between Clustering and Classification\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/4-Difference-Between-Clustering-and-Classification.png?resize=626%2C329&amp;ssl=1\" data-width=\"626\" data-height=\"329\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">For example,<\/p>\n<p 
dir=\"ltr\">let us say we have four categories:\u00a0<\/p>\n<ol class=\"\">\n<li>Dog\u00a0<\/li>\n<li>Cat<\/li>\n<li>Shark<\/li>\n<li>Goldfish<\/li>\n<\/ol>\n<p dir=\"ltr\">In this scenario, clustering would make <strong>2 clusters.<\/strong> The one who lives on land and the other one lives in water. <\/p>\n<p dir=\"ltr\">So the entities of the first cluster would be dogs and cats. Similarly, for the second cluster, it would be sharks and goldfishes.\u00a0<\/p>\n<p dir=\"ltr\">But in classification, it would <a href=\"https:\/\/dataaspirant.com\/implement-multinomial-logistic-regression-python\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>classify the four categories<\/strong><\/a> into four different classes. One for each category. So dogs would be classified under the class dog, and similarly, it would be for the rest.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176489a8853\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/5-Clustering-Vs-Classification-Example.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-7763\" alt=\"Clustering Vs Classification Example\" data-id=\"7763\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Clustering Vs Classification Example\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7763\" alt=\"Clustering Vs Classification Example\" data-id=\"7763\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Clustering Vs Classification Example\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/5-Clustering-Vs-Classification-Example.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" 
data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\"><a href=\"https:\/\/dataaspirant.com\/classification-and-prediction\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>In classification<\/strong><\/a>, we have labels that tell us whether each prediction is right or wrong, and the model learns from that feedback. This is what makes it a <a href=\"https:\/\/dataaspirant.com\/supervised-and-unsupervised-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>supervised learning algorithm<\/strong><\/a>.\u00a0<\/p>\n<p dir=\"ltr\">But in clustering, even though the entities differ, we cannot assign them to named classes because we don\u2019t have labels for them. And that is why clustering is an <a href=\"https:\/\/dataaspirant.com\/supervised-and-unsupervised-learning\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\"><strong>unsupervised learning algorithm<\/strong><\/a>.<\/p>\n<p dir=\"ltr\">In real life, we can expect high volumes of data <strong>without labels<\/strong>. Because of this, clustering techniques help in many real-world situations. Let us look at some of them.<\/p>\n<h2 id=\"t-1607534314801\" class=\"\">Clustering Applications<\/h2>\n<p dir=\"ltr\">Some common clustering applications are listed below.<\/p>\n<h3 id=\"t-1607534314802\" class=\"\">Recommendation Engines<\/h3>\n<p>Clustering is widely used in <a href=\"https:\/\/dataaspirant.com\/recommendation-engine-part-1\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>recommendation engines<\/strong><\/a> to cluster users by their likes and dislikes.<\/p>\n<h3 id=\"t-1607534314803\" class=\"\">Image Segmentation\u00a0<\/h3>\n<p>It groups pixels with similar values and segments them out from the rest of the image.<\/p>\n<h3 id=\"t-1607534314804\" class=\"\">Customer Segmentation<\/h3>\n<p>People with similar choices are clustered and studied in one category. 
It helps the firm in ways like promoting things to the right audience, taking the right feedback.<\/p>\n<h2 id=\"t-1607534314805\" class=\"\">Various Clustering Algorithms<\/h2>\n<p dir=\"ltr\">There are various clustering algorithms. Usage is dependent on their use cases. Below are the listed clustering algorithms.<\/p>\n<ul class=\"\">\n<li>KMeans<\/li>\n<li>DBSCAN<\/li>\n<li>Agglomerative Clustering<\/li>\n<li>Gaussian Mixture Models<\/li>\n<li>Spectral Clustering<\/li>\n<\/ul>\n<p dir=\"ltr\">But in this article, we are focusing exclusively on K-Means algorithm. \u00a0<\/p>\n<h2 id=\"t-1607534314806\" class=\"\">How K-Means Clustering Works<\/h2>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176489da7f5\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/6-How-K-means-Clustering-Works.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-7771\" alt=\"How K-means Clustering Works\" data-id=\"7771\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"How K-means Clustering Works\" loading=\"lazy\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7771\" alt=\"How K-means Clustering Works\" data-id=\"7771\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"How K-means Clustering Works\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/6-How-K-means-Clustering-Works.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<p dir=\"ltr\">Let us understand the basic intuition of this <a href=\"https:\/\/dataaspirant.com\/supervised-and-unsupervised-learning\/\" 
target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\"><strong>unsupervised learning algorithm<\/strong><\/a>.<\/p>\n<h4 class=\"\">The intuition of the algorithm<\/h4>\n<p dir=\"ltr\">Let us start by understanding what the \u201ck\u201d in K-means stands for. K is a free parameter that sets the <strong>number of clusters<\/strong> we want to form from the given data points.<\/p>\n<p dir=\"ltr\">From everything above, we know that a cluster should contain only entities that are similar to each other.\u00a0<\/p>\n<p dir=\"ltr\">The same holds for K-means clustering. It is a clustering algorithm that aims to have similar entities in one cluster.<\/p>\n<p dir=\"ltr\">Well, you may ask, how does this algorithm decide whether an entity belongs to a cluster or not?\u00a0<\/p>\n<p dir=\"ltr\">The answer is that it calculates the distance from each data point to the centroid of every cluster, assigns the point to the nearest centroid, and aims to minimize the sum of the squared distances between each data point and the centroid of its cluster.<\/p>\n<p dir=\"ltr\">In short, it uses <strong><a href=\"https:\/\/dataaspirant.com\/five-most-popular-similarity-measures-implementation-in-python\/\" target=\"_blank\" class=\"tve-froala\" rel=\"noopener noreferrer\">similarity measures<\/a> <\/strong>to decide that.<\/p>\n<p dir=\"ltr\">One small thing that we need to understand is that the more clusters we use, the smaller the sum of the distances of all the data points from their <strong>centroid<\/strong>.<\/p>\n<p dir=\"ltr\">This is because, with an <strong>increase<\/strong> in the number of clusters, each cluster holds fewer data points and every point tends to lie closer to its centroid.<\/p>\n<p dir=\"ltr\">And at the point where the number of clusters is equal to the number of data points, the sum of distances becomes zero because each centroid is the data point itself!<\/p>\n<p dir=\"ltr\">Now let us see how it works. 
Please refer to the image below for all the steps.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-176489ee814\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/7-K-means-Clustering-Example.png?resize=626%2C667&amp;ssl=1\" class=\"tve_image wp-image-7774\" alt=\"K-means Clustering Example\" data-id=\"7774\" width=\"626\" data-init-width=\"1412\" height=\"667\" data-init-height=\"1504\" title=\"K-means Clustering Example\" loading=\"lazy\" data-width=\"626\" data-height=\"667\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7774\" alt=\"K-means Clustering Example\" data-id=\"7774\" width=\"626\" data-init-width=\"1412\" height=\"667\" data-init-height=\"1504\" title=\"K-means Clustering Example\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/7-K-means-Clustering-Example.png?resize=626%2C667&amp;ssl=1\" data-width=\"626\" data-height=\"667\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<h4 class=\"\">Step 1<\/h4>\n<p dir=\"ltr\">Here we have a few data points that we want to cluster. We start by picking the number of clusters we want for this case.<\/p>\n<p dir=\"ltr\">Let us select 2 for this instance. Then, for each cluster, we randomly select a point and treat it as the centroid of that cluster.<\/p>\n<h4 class=\"\">Step 2<\/h4>\n<p dir=\"ltr\">We have successfully marked the centers of these clusters. Now we mark every point with the color of its nearest centroid, based on its distance from each centroid.<\/p>\n<h4 class=\"\">Step 3<\/h4>\n<p dir=\"ltr\">After marking all the data points, we compute the centroid of each cluster again. We do this because we initially picked the centroids randomly. 
Recomputing them corrects any error introduced by that random choice.<\/p>\n<p dir=\"ltr\">The centroid of a cluster is the mean of its data points: each coordinate of the centroid is the average of that coordinate over all the points in the cluster.<\/p>\n<h4 class=\"\">Step 4<\/h4>\n<p dir=\"ltr\">Since the recomputed centroids are not the same as before, we repeat the process and again find the points nearest to each centroid.<\/p>\n<h4 class=\"\">Step 5<\/h4>\n<p dir=\"ltr\">Now we have a new result. One may ask: when should we stop recomputing the centroids and reassigning the data points? We repeat until the positions of the centroids no longer change.<\/p>\n<h4 class=\"\">Step 6<\/h4>\n<p dir=\"ltr\">We have marked the two clusters.<\/p>\n<p dir=\"ltr\">In this case, it was easy, so we got the result in only 2 iterations.<\/p>\n<p dir=\"ltr\">We also mentioned the <strong>random initialization<\/strong> that we rely on. 
The problem with it is that it can leave us with some really bad clusters that <strong>won\u2019t<\/strong> be of any use.\u00a0<\/p>\n<h2 id=\"t-1607534314807\" class=\"\">How To Evaluate Clusters<\/h2>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17648a0b24a\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/8-Intra-Cluster-Distance-and-Inter-Cluster-Distance.png?resize=626%2C366&amp;ssl=1\" class=\"tve_image wp-image-7778\" alt=\"Intra Cluster Distance and Inter Cluster Distance\" data-id=\"7778\" width=\"626\" data-init-width=\"1024\" height=\"366\" data-init-height=\"599\" title=\"Intra Cluster Distance and Inter Cluster Distance\" loading=\"lazy\" data-width=\"626\" data-height=\"366\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7778\" alt=\"Intra Cluster Distance and Inter Cluster Distance\" data-id=\"7778\" width=\"626\" data-init-width=\"1024\" height=\"366\" data-init-height=\"599\" title=\"Intra Cluster Distance and Inter Cluster Distance\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/8-Intra-Cluster-Distance-and-Inter-Cluster-Distance.png?resize=626%2C366&amp;ssl=1\" data-width=\"626\" data-height=\"366\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">Let us also understand different evaluation metrics for clustering. <a href=\"https:\/\/dataaspirant.com\/six-popular-classification-evaluation-metrics-in-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Classification evaluation metrics<\/strong><\/a> help us understand how well the model performs on unseen data. 
In the same way, we have ways to measure the quality of the clusters we create.<\/p>\n<p dir=\"ltr\">Of the many available, we will discuss <strong>2 criteria<\/strong> for evaluating clusters.<\/p>\n<ol class=\"\">\n<li>Inertia<\/li>\n<li>Dunn Index<\/li>\n<\/ol>\n<h3 class=\"\" id=\"t-1607534314809\">Inertia<\/h3>\n<p dir=\"ltr\">If you recall, we have discussed that it is very important for us to have similar entities in our cluster. Inertia measures this: it is the sum of the squared distances from each data point to the centroid of its cluster. The lower the inertia, the more compact the clusters.<\/p>\n<h3 class=\"\" id=\"t-1607534314810\">Dunn Index<\/h3>\n<p>Here comes the concept of inter and intra cluster distance. Intra cluster distance, which inertia captures, is the distance between the data points inside one cluster. Inter cluster distance means the distance between 2 different clusters.<\/p>\n<p dir=\"ltr\">So the <strong>Dunn index<\/strong> is the ratio of the minimum inter cluster distance to the maximum intra cluster distance.<\/p>\n<p dir=\"ltr\">The higher the value of the Dunn index, the better separated the clusters are.<\/p>\n<h2 id=\"t-1607534314808\" class=\"\">How K-Means++ Clustering Works<\/h2>\n<p dir=\"ltr\">To pull ourselves out of this random initialization trap, we have k-means++.<\/p>\n<p dir=\"ltr\">Let us see how it works.<\/p>\n<ol class=\"\">\n<li>Just like K-means, we select a centroid randomly, but with a twist: K-means selects the centroids randomly for all the clusters, whereas here we randomly select the centroid of only one cluster.<\/li>\n<li>Now we compute the distance of every data point from that centroid.<\/li>\n<li>Next we select the centroid of the second cluster by looking at how far each data point is from the centroids chosen so far; far-away points are preferred. 
Standard k-means++ does not simply take the farthest point; it samples the next centroid with probability proportional to the squared distance from the nearest centroid already chosen, so distant points are strongly favored.<\/li>\n<li>Repeat the above steps until the desired number (k) of centroids has been selected.<\/li>\n<\/ol>\n<p dir=\"ltr\">We will look at the implementation of this and, along with it, at how to decide the right number of clusters.<\/p>\n<h2 id=\"t-1607534314811\" class=\"\">Key Differences Between K-Means and K-Means++<\/h2>\n<\/div>\n<div class=\"thrv_wrapper thrv-page-section thrv-lp-block\" data-inherit-lp-settings=\"1\" data-css=\"tve-u-17648a3944c\" data-keep-css_id=\"1\">\n<div class=\"tve-page-section-in tve_empty_dropzone  \" data-css=\"tve-u-17648a396f9\">\n<div class=\"thrv_wrapper thrv-columns dynamic-group-kbulxqe6\" data-css=\"tve-u-17648a3944d\">\n<div class=\"tcb-flex-row v-2 tcb--cols--2\" data-css=\"tve-u-17648a3944e\">\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbulxl9a\" data-css=\"tve-u-17648a3944f\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode thrv-content-box tve-elem-default-pad dynamic-group-kbulxc3q\" data-css=\"tve-u-17648a39450\">\n<div class=\"tve-cb\">\n<h4 class=\"\">K-Means<\/h4>\n<div class=\"thrv_wrapper thrv-styled_list dynamic-group-kbulx7a0\" data-icon-code=\"icon-check\" data-css=\"tve-u-17648a39454\">\n<ul class=\"tcb-styled-list\">\n<li class=\"thrv-styled-list-item dynamic-group-kbulwyg8\" data-css=\"tve-u-17648a39455\"><span class=\"thrv-advanced-inline-text tve_editable tcb-styled-list-icon-text tcb-no-delete tcb-no-save dynamic-group-kbulwoj9\" data-css=\"tve-u-17648a39457\">Takes less time to implement.<\/span><\/li>\n<li class=\"thrv-styled-list-item dynamic-group-kbulwyg8\" data-css=\"tve-u-17648a39455\"><span class=\"thrv-advanced-inline-text tve_editable tcb-styled-list-icon-text tcb-no-delete tcb-no-save dynamic-group-kbulwoj9\" data-css=\"tve-u-17648a39457\">Randomly chooses all k centroids (two in the example above).<\/span><\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div 
class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbulxl9a\" data-css=\"tve-u-17648a39458\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode thrv-content-box tve-elem-default-pad dynamic-group-kbulxc3q\" data-css=\"tve-u-17648a39459\">\n<div class=\"tve-cb\">\n<h4 class=\"\">K-Means++<\/h4>\n<div class=\"thrv_wrapper thrv-styled_list dynamic-group-kbulx7a0\" data-icon-code=\"icon-times-solid\" data-css=\"tve-u-17648a3945c\">\n<ul class=\"tcb-styled-list\">\n<li class=\"thrv-styled-list-item dynamic-group-kbulwyg8\" data-css=\"tve-u-17648a3945d\"><span class=\"thrv-advanced-inline-text tve_editable tcb-styled-list-icon-text tcb-no-delete tcb-no-save dynamic-group-kbulwoj9\" data-css=\"tve-u-17648a3945f\">Takes more time to implement<\/span><\/li>\n<li class=\"thrv-styled-list-item dynamic-group-kbulwyg8\" data-css=\"tve-u-17648a39460\"><span class=\"thrv-advanced-inline-text tve_editable tcb-styled-list-icon-text tcb-no-delete tcb-no-save dynamic-group-kbulwoj9\" data-css=\"tve-u-17648a39462\">Chooses one centroid at random and other on the basis of the square of the distance from the first one.<\/span><\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2 class=\"\" id=\"t-1607534314812\">Methods to identify \u201cK\u201d in K means clustering<\/h2>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17648a77176\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/9-Methods-to-Identify-K-in-K-means-clustering.png?resize=626%2C376&amp;ssl=1\" class=\"tve_image wp-image-7789\" alt=\"Methods to Identify K in K-means clustering\" data-id=\"7789\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Methods to Identify K in K-means clustering\" loading=\"lazy\" data-width=\"626\" 
data-height=\"376\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7789\" alt=\"Methods to Identify K in K-means clustering\" data-id=\"7789\" width=\"626\" data-init-width=\"750\" height=\"376\" data-init-height=\"450\" title=\"Methods to Identify K in K-means clustering\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/9-Methods-to-Identify-K-in-K-means-clustering.png?resize=626%2C376&amp;ssl=1\" data-width=\"626\" data-height=\"376\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element tve-froala fr-box fr-basic\">\n<blockquote class=\"\"><p><strong>How to identify the best \u201cK\u201d?<\/strong><\/p><\/blockquote>\n<p dir=\"ltr\">There are several methods to find an optimal number of clusters for KMeans clustering. Two popular methods are:<\/p>\n<ol class=\"\">\n<li>Elbow Method<\/li>\n<li>Silhouette Method<\/li>\n<\/ol>\n<h3 id=\"t-1607534314814\" class=\"\">Elbow Method<\/h3>\n<p dir=\"ltr\">It computes the sum of squared distances from the data points to their centroids for a range of K values and picks the K at the \u201celbow\u201d of the curve, the point beyond which adding more clusters reduces this sum only slightly.<\/p>\n<h3 id=\"t-1607534314815\" class=\"\">Silhouette Method<\/h3>\n<p dir=\"ltr\">The silhouette value measures how similar a point is to its own cluster (cohesion) compared to other clusters (separation).<\/p>\n<p dir=\"ltr\">Here we will look at the <strong>Elbow method<\/strong>. The details are covered in the problem statement below.<\/p>\n<h2 id=\"t-1607534314813\" class=\"\">K-means Clustering Implementation in Python<\/h2>\n<p dir=\"ltr\">The problem is to cluster people on the basis of their spending scores and income. In this problem, you will first understand the dataset.<\/p>\n<p dir=\"ltr\">You will also learn how the elbow method determines the right number of clusters. 
At the end, we will walk through the Python implementation of K-Means clustering and plot the clusters.<\/p>\n<p>You can download the dataset from <a href=\"https:\/\/www.kaggle.com\/shwetabh123\/mall-customers\" class=\"tve-froala\">here<\/a>.<\/p>\n<\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p dir=\"ltr\">Before jumping into this, we need to understand what exactly <strong>WCSS<\/strong> measures.\u00a0<\/p>\n<p dir=\"ltr\">WCSS stands for the within-cluster sum of squares, which is just a fancy name for the sum of squared distances from each data point to the centroid of its cluster.<\/p>\n<p dir=\"ltr\">In the implementation, K starts at 1 cluster and goes up to 10. Keep in mind that WCSS always decreases as K grows (it reaches zero when every point is its own cluster), so we want it to be as small as possible without adding clusters that bring little further improvement.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17648ad6ec5\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/10-Elbow-method.png?resize=626%2C469&amp;ssl=1\" class=\"tve_image wp-image-7797\" alt=\"Elbow method\" data-id=\"7797\" width=\"626\" data-init-width=\"640\" height=\"469\" data-init-height=\"480\" title=\"Elbow method\" loading=\"lazy\" data-width=\"626\" data-height=\"469\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7797\" alt=\"Elbow method\" data-id=\"7797\" width=\"626\" data-init-width=\"640\" height=\"469\" data-init-height=\"480\" title=\"Elbow method\" loading=\"lazy\" src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/10-Elbow-method.png?resize=626%2C469&amp;ssl=1\" data-width=\"626\" data-height=\"469\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<p>Here we can see that WCSS drops steeply up to 5 clusters 
and declines only gradually after that. This bend in the curve indicates that the optimal number of clusters is 5.<\/p>\n<\/div>\n<div class=\"thrv_wrapper tve_image_caption\" data-css=\"tve-u-17648b0b5d9\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/11-Clustering-output.png?resize=626%2C469&amp;ssl=1\" class=\"tve_image wp-image-7800\" alt=\"K-means clustering output\" data-id=\"7800\" width=\"626\" data-init-width=\"640\" height=\"469\" data-init-height=\"480\" title=\"K-means clustering output\" loading=\"lazy\" data-width=\"626\" data-height=\"469\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7800\" alt=\"K-means clustering output\" data-id=\"7800\" width=\"626\" data-init-width=\"640\" height=\"469\" data-init-height=\"480\" title=\"K-means clustering output\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/11-Clustering-output.png?resize=626%2C469&amp;ssl=1\" data-width=\"626\" data-height=\"469\" data-recalc-dims=\"1\"><\/span><\/div>\n<div class=\"thrv_wrapper thrv_text_element\">\n<h2 id=\"t-1607534314816\" class=\"\">Conclusion<\/h2>\n<p dir=\"ltr\">In this article, we gave a brief overview of k-means clustering. 
We also explained how clustering differs from classification and how we can evaluate clusters.<\/p>\n<p dir=\"ltr\">This gives the complete flow of how the K-means algorithm works.\u00a0<\/p>\n<p dir=\"ltr\">We also looked at the random initialisation trap and how we can use k-means++ to pull ourselves out of it.<\/p>\n<p dir=\"ltr\">Lastly, we looked at a clustering-based problem statement, which involved choosing the right number of clusters and visualising the result.<\/p>\n<\/div>\n<h4 class=\"\">Recommended Machine Learning Courses<\/h4>\n<div class=\"thrv_wrapper thrv-page-section thrv-lp-block\" data-inherit-lp-settings=\"1\" data-css=\"tve-u-1764884141b\" data-keep-css_id=\"1\">\n<div class=\"tve-page-section-in tve_empty_dropzone  \" data-css=\"tve-u-17481b960b8\">\n<div class=\"thrv_wrapper thrv-columns dynamic-group-kbt3q0q7\" data-css=\"tve-u-17481b95e2b\">\n<div class=\"tcb-flex-row v-2 tcb--cols--3 tcb-medium-no-wrap tcb-mobile-wrap m-edit\" data-css=\"tve-u-1764884141c\">\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbt3pyfd\" data-css=\"tve-u-17481b95e2d\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode thrv-content-box tve-elem-default-pad dynamic-group-kbt3pwhk\" data-css=\"tve-u-17648841433\">\n<div class=\"tve-cb\">\n<div class=\"thrv_wrapper tve_image_caption dynamic-group-kbt3pu4z\" data-css=\"tve-u-17648841436\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/clustering_unsupervised_learning.jpg?resize=176%2C176&amp;ssl=1\" class=\"tve_image wp-image-7846\" alt=\"clustering unsupervised learning\" data-id=\"7846\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"clustering unsupervised learning\" loading=\"lazy\" data-width=\"176\" data-height=\"176\" 
data-css=\"tve-u-17648841437\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7846\" alt=\"clustering unsupervised learning\" data-id=\"7846\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"clustering unsupervised learning\" loading=\"lazy\" src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/clustering_unsupervised_learning.jpg?resize=176%2C176&amp;ssl=1\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-17648841437\" data-recalc-dims=\"1\"><span class=\"tve-image-overlay\"><\/span><\/span><\/div>\n<h4 class=\"\" data-css=\"tve-u-1764884141e\">Cluster Analysis With Python<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbt3pyfd\" data-css=\"tve-u-17481b95e2d\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode thrv-content-box tve-elem-default-pad dynamic-group-kbt3pwhk\" data-css=\"tve-u-17648841434\">\n<div class=\"tve-cb\">\n<div class=\"thrv_wrapper tve_image_caption dynamic-group-kbt3pu4z\" data-css=\"tve-u-17648841442\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/unsupervised_learning.jpg?resize=176%2C176&amp;ssl=1\" class=\"tve_image wp-image-7848\" alt=\"unsupervised learning\" data-id=\"7848\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"unsupervised learning\" loading=\"lazy\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-17648841443\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-7848\" alt=\"unsupervised learning\" data-id=\"7848\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"unsupervised learning\" loading=\"lazy\" 
src=\"https:\/\/i1.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/12\/unsupervised_learning.jpg?resize=176%2C176&amp;ssl=1\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-17648841443\" data-recalc-dims=\"1\"><span class=\"tve-image-overlay\"><\/span><\/span><\/div>\n<h4 class=\"\" data-css=\"tve-u-17648841425\">Unsupervised Learning Algorithms<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"tcb-flex-col\">\n<div class=\"tcb-col dynamic-group-kbt3pyfd\" data-css=\"tve-u-17481b95e2d\">\n<div class=\"thrv_wrapper thrv_contentbox_shortcode thrv-content-box tve-elem-default-pad dynamic-group-kbt3pwhk\" data-css=\"tve-u-17648841435\">\n<div class=\"tve-cb\">\n<div class=\"thrv_wrapper tve_image_caption dynamic-group-kbt3pu4z\" data-css=\"tve-u-17648841444\"><span class=\"tve_image_frame\"><img src=\"https:\/\/i2.wp.com\/dataaspirant.com\/wp-content\/plugins\/lazy-load\/images\/1x1.trans.gif?ssl=1\" data-lazy-src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/machine-learning-1.jpg?resize=176%2C176&amp;ssl=1\" class=\"tve_image wp-image-4302\" alt=\"Machine learning\" data-id=\"4302\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"machine learning\" loading=\"lazy\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-17648841445\" data-recalc-dims=\"1\"><img class=\"tve_image wp-image-4302\" alt=\"Machine learning\" data-id=\"4302\" width=\"176\" data-init-width=\"150\" height=\"176\" data-init-height=\"150\" title=\"machine learning\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/dataaspirant.com\/wp-content\/uploads\/2020\/08\/machine-learning-1.jpg?resize=176%2C176&amp;ssl=1\" data-width=\"176\" data-height=\"176\" data-css=\"tve-u-17648841445\" data-recalc-dims=\"1\"><span class=\"tve-image-overlay\"><\/span><\/span><\/div>\n<h4 class=\"\" data-css=\"tve-u-1764884142c\">A to Z Machine Learning with 
Python<\/h4>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/dataaspirant.com\/k-means-clustering-algorithm\/<\/p>\n","protected":false},"author":0,"featured_media":8016,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8015"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8015"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8015\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8016"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=8015"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8015"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=8015"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}