{"id":1336,"date":"2020-09-11T14:36:27","date_gmt":"2020-09-11T14:36:27","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/09\/11\/feature-engineering-for-numerical-data\/"},"modified":"2020-09-11T14:36:27","modified_gmt":"2020-09-11T14:36:27","slug":"feature-engineering-for-numerical-data","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/09\/11\/feature-engineering-for-numerical-data\/","title":{"rendered":"Feature Engineering for Numerical Data"},"content":{"rendered":"<div id=\"post-\">\n<p><b>By <a href=\"https:\/\/www.linkedin.com\/in\/kurtispykes\/\" target=\"_blank\" rel=\"noopener noreferrer\">Kurtis Pykes<\/a>, Artificial Intelligence Writer<\/b>.<\/p>\n<p>Numeric data is almost a blessing. Why almost? Well, because it is already in a format that is ingestible by Machine Learning models. However, if we translate it into human-relatable terms, just because a PhD level textbook is written in English \u2014 I speak, read and write in English \u2014 does not mean that I am capable of understanding the textbook well enough to derive useful insights. What would make the textbook useful to me is if it epitomizes the most important information in a manner that considers the assumptions of my mental model, such as \u201cMaths is a myth\u201d (which, by the way, is no longer my view since I am really starting to enjoying it). In the same way, a good feature should represent salient aspects of the data, as well as taking the shape of the assumptions that are made by the Machine Learning model.<\/p>\n<p><img class=\"aligncenter size-full wp-image-46874\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/data-engineering1.jpg\" alt=\"Data Engineering\" width=\"90%\"><\/p>\n<p>Feature engineering is the process of extracting features from raw data and transforming them into formats that can be ingested by a machine learning model. 
Transformations are often required to ease the difficulty of modelling and boost the results of our models. Therefore, techniques to engineer numeric data types are fundamental tools for Data Scientists (and Machine Learning Engineers alike) to add to their arsenal.<\/p>\n<blockquote>\n<p data-selectable-paragraph=\"\">\u201cdata is like the\u00a0<em>crude oil<\/em>\u00a0of machine learning, which means it has to be refined into\u00a0<em>features\u00a0<\/em>\u2014 predictor variables \u2014 to be useful for training a model.\u201d\u00a0\u2014 <a href=\"https:\/\/medium.com\/u\/e2f299e30cb9?source=post_page-----e20167ec18----------------------\" target=\"_blank\" rel=\"noopener noreferrer\">Will Koehrsen<\/a><\/p>\n<\/blockquote>\n<p data-selectable-paragraph=\"\">As we strive for mastery, it is important to note that it is never enough to know why a mechanism works and what it can do. Mastery means knowing how something is done, having an intuition for the underlying principles, and having the neural connections that make drawing on the correct tool a seamless procedure when faced with a challenge. 
That will not come from reading this article alone, but from deliberate practice; this article opens the door to that practice by providing the intuition behind the techniques so that you may understand how and when to apply them.<\/p>\n<p data-selectable-paragraph=\"\">\u201cThe features in your data will directly influence the predictive models you use and the results you can achieve.\u201d \u2014\u00a0<a href=\"https:\/\/medium.com\/u\/f374d0159316?source=post_page-----e20167ec18----------------------\" target=\"_blank\" rel=\"noopener noreferrer\">Jason Brownlee<\/a><\/p>\n<p data-selectable-paragraph=\"\"><em>Note: You can find the code used for this article on my <a href=\"https:\/\/github.com\/kurtispykes\/demo\/tree\/master\" target=\"_blank\" rel=\"noopener noreferrer\">Github page<\/a>.<\/em><\/p>\n<p data-selectable-paragraph=\"\">There may be occasions where data is collected on a feature that accumulates, thereby having an infinite upper boundary. An example of this type of continuous data is a tracking system that monitors the number of visits that all of my blog posts receive on a daily basis. This type of data easily attracts outliers since some unpredictable event could affect the total traffic that my articles are accumulating. For instance, one day, people may decide they want to be able to do data analysis, so my article on\u00a0<a href=\"https:\/\/towardsdatascience.com\/effective-data-visualization-ef30ae560961\">Effective data visualization<\/a>\u00a0may spike for that day. In other words, when data can be collected quickly and in large amounts, it is likely to contain some extreme values that will need engineering.<\/p>\n<p data-selectable-paragraph=\"\">Some methods to handle this are:<\/p>\n<p>\u00a0<\/p>\n<h3>Quantization<\/h3>\n<p>\u00a0<\/p>\n<p data-selectable-paragraph=\"\">This method contains the scale of the data by grouping the values into bins. 
Therefore, quantization maps a continuous value to a discrete value, which can be thought of conceptually as an ordered sequence of bins. To implement this, we must consider the width of the bins we create, and the solutions fall into two categories: fixed-width bins or adaptive bins.<\/p>\n<blockquote>\n<p data-selectable-paragraph=\"\"><em>Note: This is particularly useful for linear models. In tree-based models, this is not useful because tree-based models make their own splits.<\/em><\/p>\n<\/blockquote>\n<p data-selectable-paragraph=\"\">In the fixed-width scenario, the bin boundaries are automatically generated or custom-designed to segment the data into discrete bins \u2014 they can also be linearly scaled or exponentially scaled. A popular example is separating the ages of people into partitions by decade intervals, such that bin 1 contains ages 0\u20139, bin 2 contains 10\u201319, etc.<\/p>\n<p data-selectable-paragraph=\"\">Note that if the values span several orders of magnitude, then a better method may be to group the values into powers of a constant, such as powers of 10: 0\u20139, 10\u201399, 100\u2013999, 1000\u20139999. Notice that the bin widths grow exponentially; hence, in the case of 1000\u20139999, the bin width is O(10000), whereas for 0\u20139 it is O(10). 
To map from the count to the bin of the data, simply take the log of the count.<\/p>\n<div>\n<pre>import numpy as np\r\n\r\n# 15 random integers from the \"discrete uniform\" distribution\r\nages = np.random.randint(0, 100, 15)\r\n\r\n# evenly spaced bins\r\nages_binned = np.floor_divide(ages, 10)\r\n\r\nprint(f\"Ages: {ages} \\nAges Binned: {ages_binned} \\n\")\r\n&gt;&gt;&gt; Ages: [97 56 43 73 89 68 67 15 18 36  4 97 72 20 35]\r\nAges Binned: [9 5 4 7 8 6 6 1 1 3 0 9 7 2 3]\r\n\r\n# numbers spanning several orders of magnitude\r\nviews = [300, 5936, 2, 350, 10000, 743, 2854, 9113, 25, 20000, 160, 683, 7245, 224]\r\n\r\n# map count -&gt; exponential-width bins\r\nviews_exponential_bins = np.floor(np.log10(views))\r\n\r\nprint(f\"Views: {views} \\nViews Binned: {views_exponential_bins}\")\r\n&gt;&gt;&gt; Views: [300, 5936, 2, 350, 10000, 743, 2854, 9113, 25, 20000, 160, 683, 7245, 224]\r\nViews Binned: [2. 3. 0. 2. 4. 2. 3. 3. 1. 4. 2. 2. 3. 2.]\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p data-selectable-paragraph=\"\">Adaptive bins are a better fit when there are large gaps within the counts; if there are large margins between the count values, some of the fixed-width bins would sit empty.<\/p>\n<p data-selectable-paragraph=\"\">To do adaptive binning, we can make use of the quantiles of the data \u2014 the values that divide the data into equal portions, like the median.<\/p>\n<div>\n<pre>import pandas as pd\r\n\r\n# map the counts to quantiles (adaptive binning)\r\nviews_adaptive_bin = pd.qcut(views, 5, labels=False)\r\n\r\nprint(f\"Adaptive bins: {views_adaptive_bin}\")\r\n&gt;&gt;&gt; Adaptive bins: [1 3 0 1 4 2 3 4 0 4 0 2 3 1]\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<h3>Power Transformations<\/h3>\n<p>\u00a0<\/p>\n<p data-selectable-paragraph=\"\">We have already seen an example of this: the log transformation is part of a family of variance stabilizing transformations known as power transformations. 
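<\/p>
<p data-selectable-paragraph=\"\"><em>As a quick sanity check of why the log helps, the sketch below (synthetic data, not from the article) draws a heavily right-skewed sample from a log-normal distribution and measures its skewness with scipy before and after a log transform.<\/em><\/p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# synthetic, heavily right-skewed sample: log-normal distribution
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# the log transform recovers the underlying symmetric, normal shape
x_log = np.log(x)

print(f"Skewness before: {stats.skew(x):.2f}")
print(f"Skewness after: {stats.skew(x_log):.2f}")
```

<p data-selectable-paragraph=\"\">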
Wikipedia describes power transformations as a\u00a0<em>\u201ctechnique used to stabilize variance, make the data more normal distribution-like, improve the validity of measures of association such as the Pearson correlation between variables and for other data stabilization procedures.\u201d<\/em><\/p>\n<p data-selectable-paragraph=\"\">Why would we want to transform our data to fit the Normal Distribution? Great question! You may want to use a parametric model \u2014 a model that makes assumptions about the data \u2014 rather than a non-parametric model. When the data is normally distributed, parametric models are powerful. However, in some cases, the data we have may need a helping hand to bring out the beautiful bell-shaped curve of the normal distribution. For instance, the data may be skewed, so we apply a power transformation to help the feature look more Gaussian.<\/p>\n<p data-selectable-paragraph=\"\">The code below leverages data science frameworks such as pandas, scipy, and numpy to demonstrate power transformations and visualize them using the Plotly.py framework for interactive plots. 
The dataset used is the\u00a0<a href=\"https:\/\/www.kaggle.com\/c\/house-prices-advanced-regression-techniques\/data\" target=\"_blank\" rel=\"noopener noreferrer\"><em>House Prices: Advanced regression techniques<\/em><\/a>\u00a0from Kaggle, which you can easily download (<a href=\"https:\/\/www.kaggle.com\/c\/house-prices-advanced-regression-techniques\/data\" target=\"_blank\" rel=\"noopener noreferrer\">click here to access the data<\/a>).<\/p>\n<div>\n<pre>import numpy as np\r\nimport pandas as pd\r\nfrom scipy import stats\r\nimport plotly.graph_objects as go\r\nfrom plotly.subplots import make_subplots\r\n\r\ndf = pd.read_csv(\"..\/data\/raw\/train.csv\")\r\n\r\n# applying various transformations\r\nx_log = np.log(df[\"GrLivArea\"].copy()) # log\r\nx_square_root = np.sqrt(df[\"GrLivArea\"].copy()) # square root\r\nx_boxcox, _ = stats.boxcox(df[\"GrLivArea\"].copy()) # boxcox\r\nx = df[\"GrLivArea\"].copy() # original data\r\n\r\n# creating the figures\r\nfig = make_subplots(rows=2, cols=2,\r\n                    horizontal_spacing=0.125,\r\n                    vertical_spacing=0.125,\r\n                    subplot_titles=(\"Original Data\",\r\n                                    \"Log Transformation\",\r\n                                    \"Square root transformation\",\r\n                                    \"Boxcox Transformation\")\r\n                    )\r\n\r\n# drawing the plots\r\nfig.add_traces([\r\n                go.Histogram(x=x,\r\n                             hoverinfo=\"x\",\r\n                             showlegend=False),\r\n                go.Histogram(x=x_log,\r\n                             hoverinfo=\"x\",\r\n                             showlegend=False),\r\n                go.Histogram(x=x_square_root,\r\n                             hoverinfo=\"x\",\r\n                             showlegend=False),\r\n                go.Histogram(x=x_boxcox,\r\n                             hoverinfo=\"x\",\r\n                             
showlegend=False),\r\n               ],\r\n               rows=[1, 1, 2, 2],\r\n               cols=[1, 2, 1, 2]\r\n)\r\n\r\nfig.update_layout(\r\n    title=dict(\r\n               text=\"GrLivArea with various Power Transforms\",\r\n               font=dict(\r\n                         family=\"Arial\",\r\n                         size=20)),\r\n    showlegend=False,\r\n    width=800,\r\n    height=500)\r\n\r\nfig.show() # display figure\r\n\r\n<\/pre>\n<\/div>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*HeCloiMiQxrEG-HZuKnggg.png\" width=\"90%\"><\/p>\n<p><em>Figure 1: Visualizing the original data and various power transformations.<\/em><\/p>\n<p data-selectable-paragraph=\"\"><em>Note: The Box-Cox transformation only works when the data is strictly positive.<\/em><\/p>\n<blockquote>\n<p data-selectable-paragraph=\"\">\u201cWhich of these is best? You cannot know beforehand. You must try them and evaluate the results achieved on your algorithm and performance measures.\u201d\u00a0\u2014 <a href=\"https:\/\/medium.com\/u\/f374d0159316?source=post_page-----e20167ec18----------------------\">Jason Brownlee<\/a><\/p>\n<\/blockquote>\n<p>\u00a0<\/p>\n<h3>Feature Scaling<\/h3>\n<p>\u00a0<\/p>\n<p data-selectable-paragraph=\"\">As the name implies, feature scaling (also referred to as feature normalization) is concerned with changing the scale of features. When the features of a dataset differ greatly in scale, a model that is sensitive to the scale of the input features (e.g., linear regression, logistic regression, neural networks) will be affected, so ensuring features are within a similar scale is imperative. In contrast, tree-based models (e.g., Decision Trees, Random Forests, Gradient Boosting) do not care about scale.<\/p>\n<p data-selectable-paragraph=\"\">Common ways to scale features include min-max scaling, standardization, and L\u00b2 normalization. 
The following is a brief introduction to each, along with an implementation in Python.<\/p>\n<p data-selectable-paragraph=\"\"><strong>Min-Max Scaling<\/strong>\u00a0&#8211; The feature is scaled to a fixed range (usually 0\u20131), which shrinks its standard deviation; note, however, that because the minimum and maximum define the range, min-max scaling is itself sensitive to outliers. In the formula, x is the individual value of the instance (i.e., person 1, feature 2), and max(x) and min(x) are the maximum and minimum values of the feature \u2014 see Figure 2. For more on this, see the\u00a0<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.MinMaxScaler.html\" target=\"_blank\" rel=\"noopener noreferrer\">sklearn documentation.<\/a><\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*hrlR_IqI13XjrjlcslbDSw.png\" width=\"90%\"><\/p>\n<p><em>Figure 2: Formula for Min-max scaling.<\/em><\/p>\n<p><strong>Standardization\u00a0<\/strong>&#8211; The feature values are rescaled so that they have the properties of a standard normal distribution, with a mean of 0 and a standard deviation of 1. To do this, we subtract the mean of the feature \u2014 taken over all the instances \u2014 from the feature instance value, then divide by the standard deviation \u2014 see Figure 3. Refer to the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.StandardScaler.html\">sklearn documentation<\/a>\u00a0for standardization.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*H47v7SPAuEbXkowfI_QSOA.png\" width=\"90%\"><\/p>\n<p><em>Figure 3: Formula for standardization.<\/em><\/p>\n<p><strong>L\u00b2 Normalization<\/strong>\u00a0&#8211; This technique divides the original feature value by the L\u00b2 norm (also known as the Euclidean norm) \u2014 the second equation in Figure 4. The L\u00b2 norm is the square root of the sum of squares of the values in the feature set across all instances. 
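<\/p>
<p data-selectable-paragraph=\"\"><em>The formulas in Figures 2\u20134 can be checked numerically. The sketch below applies each formula by hand to a tiny made-up matrix (hypothetical values, purely for illustration) and confirms the results against the corresponding sklearn scalers; note that sklearn\u2019s Normalizer applies the L\u00b2 norm per sample (row) rather than per feature.<\/em><\/p>

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, Normalizer

# tiny two-feature matrix (hypothetical values, for illustration only)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# min-max scaling: (x - min) / (max - min), computed per feature (column)
manual_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
assert np.allclose(manual_minmax, MinMaxScaler().fit_transform(X))

# standardization: (x - mean) / standard deviation, per feature (column)
manual_std = (X - X.mean(axis=0)) / X.std(axis=0)
assert np.allclose(manual_std, StandardScaler().fit_transform(X))

# L2 normalization: sklearn's Normalizer divides each sample (row) by its L2 norm
manual_l2 = X / np.linalg.norm(X, axis=1, keepdims=True)
assert np.allclose(manual_l2, Normalizer().fit_transform(X))
```

<p data-selectable-paragraph=\"\">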
Refer to the sklearn\u00a0<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer\" target=\"_blank\" rel=\"noopener noreferrer\">documentation<\/a>\u00a0for the L\u00b2 Norm (note that there is also the option to do L\u00b9 normalization by setting the\u00a0norm\u00a0parameter to\u00a0&#8220;l1&#8221;, and that sklearn&#8217;s Normalizer rescales each sample \u2014 each row \u2014 to unit norm rather than rescaling each feature).<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*fRRWqzBlDcTJnGMALqkYWQ.png\" width=\"90%\"><\/p>\n<p><em>Figure 4: Formula for L\u00b2 Normalization.<\/em><\/p>\n<blockquote>\n<p data-selectable-paragraph=\"\"><em>Visualization of the effects of feature scaling will give a better picture of what is going on. For this, I am using the wine dataset that can be imported from sklearn datasets.<\/em><\/p>\n<\/blockquote>\n<div>\n<pre>import pandas as pd\r\nfrom sklearn.datasets import load_wine\r\nfrom sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer\r\nimport plotly.graph_objects as go\r\n\r\nwine_json = load_wine() # load in dataset\r\n\r\ndf = pd.DataFrame(data=wine_json[\"data\"], columns=wine_json[\"feature_names\"]) # create pandas dataframe\r\n\r\ndf[\"Target\"] = wine_json[\"target\"] # create new column with the target labels\r\n\r\n# standardization\r\nstd_scaler = StandardScaler().fit(df[[\"alcohol\", \"malic_acid\"]])\r\ndf_std = std_scaler.transform(df[[\"alcohol\", \"malic_acid\"]])\r\n\r\n# minmax scaling\r\nminmax_scaler = MinMaxScaler().fit(df[[\"alcohol\", \"malic_acid\"]])\r\ndf_minmax = minmax_scaler.transform(df[[\"alcohol\", \"malic_acid\"]])\r\n\r\n# l2 normalization\r\nl2norm = Normalizer().fit(df[[\"alcohol\", \"malic_acid\"]])\r\ndf_l2norm = l2norm.transform(df[[\"alcohol\", \"malic_acid\"]])\r\n\r\n# creating traces\r\ntrace1 = go.Scatter(x= df_std[:, 0],\r\n                    y= df_std[:, 1],\r\n                    mode= \"markers\",\r\n                    name= \"Standardized 
Scale\")\r\n\r\ntrace2 = go.Scatter(x= df_minmax[:, 0],\r\n                    y= df_minmax[:, 1],\r\n                    mode= \"markers\",\r\n                    name= \"MinMax Scale\")\r\n\r\ntrace3 = go.Scatter(x= df_l2norm[:, 0],\r\n                    y= df_l2norm[:, 1],\r\n                    mode= \"markers\",\r\n                    name= \"L2 Norm Scale\")\r\n\r\ntrace4 = go.Scatter(x= df[\"alcohol\"],\r\n                    y= df[\"malic_acid\"],\r\n                    mode= \"markers\",\r\n                    name= \"Original Scale\")\r\n\r\nlayout = go.Layout(\r\n         title= \"Effects of Feature scaling\",\r\n         xaxis=dict(title= \"Alcohol\"),\r\n         yaxis=dict(title= \"Malic Acid\")\r\n         )\r\n\r\ndata = [trace1, trace2, trace3, trace4]\r\nfig = go.Figure(data=data, layout=layout)\r\nfig.show()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*kWQgQtwiANDHijvXsk-s8w.png\" width=\"90%\"><\/p>\n<p><em>Figure 5: The plots for the original feature and various scaling implementations.<\/em><\/p>\n<p>\u00a0<\/p>\n<h3>Feature Interactions<\/h3>\n<p>\u00a0<\/p>\n<p data-selectable-paragraph=\"\">We can create the logical AND function by using the product of pairwise interactions between features. 
In tree-based models, these interactions occur implicitly, but in models that assume independence of the features, we can explicitly declare interactions between features to improve the output of the model.<\/p>\n<p data-selectable-paragraph=\"\">Think of a simple linear model that uses a linear combination of the input features to predict the output y:<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*LpGyRwI4s8Kla5n7bUPWGA.png\" width=\"90%\"><\/p>\n<p><em>Figure 6: Formula for a linear model.<\/em><\/p>\n<p>We can extend the linear model to capture the interactions that occur between features.<\/p>\n<p><img class=\"aligncenter size-large\" src=\"https:\/\/miro.medium.com\/max\/875\/1*ZO7RIvVdMp6EXCLoESBJDQ.png\" width=\"90%\"><\/p>\n<p><em>Figure 7: Extending the linear model.<\/em><\/p>\n<p data-selectable-paragraph=\"\"><em>Note: Pairwise interaction features are expensive to use; the scoring and training of a linear model with pairwise interactions goes from O(n) to O(n\u00b2), where n is the number of original features. 
However, you could perform feature extraction to overcome this problem (feature extraction is beyond the scope of this article, but it is something I will discuss in a future article).<\/em><\/p>\n<p data-selectable-paragraph=\"\">Let\u2019s code this in Python. I am going to leverage the scikit-learn\u00a0<em>PolynomialFeatures\u00a0<\/em>class, and you can read more about it in the\u00a0<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.PolynomialFeatures.html\" target=\"_blank\" rel=\"noopener noreferrer\">documentation<\/a>:<\/p>\n<div>\n<pre>import numpy as np\r\nfrom sklearn.preprocessing import PolynomialFeatures\r\n\r\n# creating dummy dataset\r\nX = np.arange(10).reshape(5, 2)\r\nX.shape\r\n&gt;&gt;&gt; (5, 2)\r\n\r\n# interactions between features only (degree 2)\r\ninteractions = PolynomialFeatures(interaction_only=True)\r\nX_interactions = interactions.fit_transform(X)\r\nX_interactions.shape\r\n&gt;&gt;&gt; (5, 4)\r\n\r\n# polynomial features up to degree 2\r\npolynomial = PolynomialFeatures(2)\r\nX_poly = polynomial.fit_transform(X)\r\nX_poly.shape\r\n&gt;&gt;&gt; (5, 6)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p data-selectable-paragraph=\"\"><em>This article was heavily inspired by the book\u00a0<a href=\"https:\/\/www.amazon.co.uk\/Feature-Engineering-Machine-Learning-Principles-ebook\/dp\/B07BNX4MWC\" target=\"_blank\" rel=\"noopener noreferrer\">Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists<\/a>, which I&#8217;d definitely recommend reading. Though it was published in 2018, it is still very informative and clearly explained, even for those without a mathematical background.<\/em><\/p>\n<p>\u00a0<\/p>\n<h3>Conclusion<\/h3>\n<p>\u00a0<\/p>\n<p data-selectable-paragraph=\"\">There we have it. In this article, we discussed techniques to deal with numerical features, such as quantization, power transformations, feature scaling, and interaction features (which can be applied to various data types). 
This is by no means the be-all and end-all of feature engineering, and there is always much more to learn. Feature engineering is an art that takes practice, so now that you have the intuition, you are ready to begin practicing.<\/p>\n<p><a href=\"https:\/\/towardsdatascience.com\/feature-engineering-for-numerical-data-e20167ec18\" target=\"_blank\" rel=\"noopener noreferrer\">Original<\/a>. Reposted with permission.<\/p>\n<p>\u00a0<\/p>\n<p><strong>Bio:<\/strong>\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/kurtispykes\/\" target=\"_blank\" rel=\"noopener noreferrer\">Kurtis Pykes<\/a>\u00a0is a Machine Learning Engineer Intern at Codehouse. He is passionate about harnessing the power of machine learning and data science to help people become more productive and effective.<\/p>\n<p><b>Related:<\/b><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2020\/09\/feature-engineering-numerical-data.html<\/p>\n","protected":false},"author":0,"featured_media":1337,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1336"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=1336"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1336\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/1337"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=1336"}],"wp:term":[{"taxonomy":"category","embeddable":true,"h
ref":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=1336"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=1336"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}