{"id":8265,"date":"2021-05-10T20:05:35","date_gmt":"2021-05-10T20:05:35","guid":{"rendered":"https:\/\/wealthrevelation.com\/data-science\/2021\/05\/10\/essential-linear-algebra-for-data-science-and-machine-learning\/"},"modified":"2021-05-10T20:05:35","modified_gmt":"2021-05-10T20:05:35","slug":"essential-linear-algebra-for-data-science-and-machine-learning","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2021\/05\/10\/essential-linear-algebra-for-data-science-and-machine-learning\/","title":{"rendered":"Essential Linear Algebra for Data Science and Machine Learning"},"content":{"rendered":"<div id=\"post-\">\n   <!-- post_author Benjamin Obi Tayo -->  <\/p>\n<p><img class=\"aligncenter size-full wp-image-126938\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig1-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"90%\"><\/p>\n<p><em>Image by Benjamin O. Tayo.<\/em><\/p>\n<p>Linear Algebra is a branch of mathematics that is extremely useful in data science and machine learning. Linear algebra is\u00a0the most important math skill\u00a0in machine learning. Most machine learning models can be expressed in matrix form. A dataset itself is often represented as a matrix. Linear algebra is used in data preprocessing, data transformation, and model evaluation. Here are the topics you need to be familiar with:<\/p>\n<ul>\n<li>Vectors<\/li>\n<li>Matrices<\/li>\n<li>Transpose of a matrix<\/li>\n<li>Inverse of a matrix<\/li>\n<li>Determinant of a matrix<\/li>\n<li>Trace of a matrix<\/li>\n<li>Dot product<\/li>\n<li>Eigenvalues<\/li>\n<li>Eigenvectors<\/li>\n<\/ul>\n<p>In this article, we illustrate the application of linear algebra in data science and machine learning using the tech stocks dataset, which can be found <a href=\"https:\/\/github.com\/bot13956\/datasets\/blob\/master\/tech-stocks-04-2021.csv\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<p>\u00a0<\/p>\n<h3>1. 
Linear Algebra for Data Preprocessing<\/h3>\n<p>\u00a0<\/p>\n<p><strong>\u00a0<\/strong>We begin by illustrating how linear algebra is used in data preprocessing.<\/p>\n<p><strong>1.1 Import necessary libraries for linear algebra<\/strong><\/p>\n<div>\n<pre>import numpy as np\r\nimport pandas as pd\r\nimport matplotlib.pyplot as plt\r\nimport seaborn as sns\r\n\r\n<\/pre>\n<\/div>\n<p><em>\u00a0<\/em><\/p>\n<p><strong>1.2 Read dataset and display features<\/strong><\/p>\n<div>\n<pre>data = pd.read_csv(\"tech-stocks-04-2021.csv\")\r\ndata.head()\r\n\r\n<\/pre>\n<\/div>\n<p><img class=\"aligncenter size-full wp-image-126940\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig2-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"90%\"><\/p>\n<p><em>\u00a0<strong>Table <\/strong><strong>1<\/strong>. Stock prices for selected tech stocks for the first 16 days in April 2021.<\/em><\/p>\n<div>\n<pre>print(data.shape)\r\noutput = (11,5) \r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><em>\u00a0<\/em>The <strong><em>data.shape<\/em><\/strong> attribute gives the size of our dataset. In this case, the dataset has 5 features (date, AAPL, TSLA, GOOGL, and AMZN), and each feature has 11 observations. <em>Date<\/em> refers to the trading days in April 2021 (up to April 16). 
AAPL, TSLA, GOOGL, and AMZN are the closing stock prices for Apple, Tesla, Google, and Amazon, respectively.<\/p>\n<p><strong>1.3 Data visualization<\/strong><\/p>\n<p>To perform data visualization, we define <strong><em>column matrices<\/em><\/strong> for the features to be visualized:<\/p>\n<div>\n<pre>x = data['date']\r\ny = data['TSLA']\r\nplt.plot(x,y)\r\nplt.xticks(np.array([0,4,9]), ['Apr 1','Apr 8','Apr 15'])\r\nplt.title('Tesla stock price (in dollars) for April 2021',size=14)\r\nplt.show()\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-full wp-image-126941\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig3-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"90%\"><\/p>\n<p><em><strong>Figure <\/strong><strong>1<\/strong>. Tesla stock price for the first 16 days in April 2021.<\/em><\/p>\n<p>\u00a0<\/p>\n<h3>2. Covariance Matrix<\/h3>\n<p>\u00a0<\/p>\n<p>The <strong><em>covariance matrix<\/em><\/strong> is one of the most important matrices in data science and machine learning. It provides information about co-movement (correlation) between features. Suppose we have a features matrix with\u00a0<em>4<\/em>\u00a0features and\u00a0<em>n\u00a0<\/em>observations as shown in\u00a0<strong>Table 2<\/strong>:<\/p>\n<p><img class=\"aligncenter size-full wp-image-126942\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig4-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"90%\"><\/p>\n<p><em><strong>Table 2<\/strong>. 
Features matrix with 4 variables and n observations.<\/em><\/p>\n<p>To visualize the correlations between the features, we can generate a scatter pairplot:<\/p>\n<div>\n<pre>cols=data.columns[1:5]\r\nprint(cols)\r\noutput = Index(['AAPL', 'TSLA', 'GOOGL', 'AMZN'], dtype='object')\r\nsns.pairplot(data[cols], height=3.0)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-full wp-image-126943\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig5-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"90%\"><\/p>\n<p><em><strong>Figure 2<\/strong>. Scatter pairplot for selected tech stocks.<\/em><\/p>\n<p>\u00a0<\/p>\n<p>To quantify the degree of correlation between features (multicollinearity), we can compute the covariance matrix using this equation:<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-126944\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig6-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"350\" height=\"77\"><\/p>\n<p>where \u03bc<sub>j<\/sub> and \u03c3<sub>j<\/sub> are the mean and standard deviation of feature X<sub>j<\/sub>, respectively. 
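This formula can be checked numerically with a small sketch. The data below are synthetic (hypothetical values, since the check depends only on the definition above); `np.cov` with the 1/n normalization serves as the reference:

```python
import numpy as np

# Synthetic data: n = 100 observations of 4 features (hypothetical values)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# Standardize each feature: subtract its mean, divide by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance of standardized features = (1/n) * X_std^T X_std,
# i.e. a matrix of scaled dot products between feature columns
cov_manual = (X_std.T @ X_std) / X_std.shape[0]

# Compare with NumPy's built-in covariance (bias=True uses the 1/n normalization)
cov_np = np.cov(X_std.T, bias=True)
print(np.allclose(cov_manual, cov_np))  # True
```

Note that the diagonal entries equal 1, since each standardized feature has unit variance.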
This equation indicates that when features are standardized, each entry of the covariance matrix is simply a scaled <strong><em>dot product<\/em><\/strong> between two feature columns.<\/p>\n<p>In matrix form, the covariance matrix can be expressed as a 4 x 4 real and symmetric matrix:<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-126945\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig7-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"350\" height=\"172\"><\/p>\n<p>This matrix can be diagonalized by performing a <strong><em>unitary transformation<\/em><\/strong>, also referred to as a Principal Component Analysis (PCA) transformation, to obtain the following:<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-126946\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig8-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"350\" height=\"168\"><\/p>\n<p>Since the <strong><em>trace of a matrix<\/em><\/strong> remains invariant under a unitary transformation, we observe that the sum of the eigenvalues of the diagonal matrix is equal to the total variance contained in features X<sub>1<\/sub>, X<sub>2<\/sub>, X<sub>3<\/sub>, and X<sub>4<\/sub>.<\/p>\n<p><strong>2.1 Computing the covariance matrix for tech stocks<\/strong><\/p>\n<div>\n<pre>from sklearn.preprocessing import StandardScaler\r\nstdsc = StandardScaler()\r\nX_std = stdsc.fit_transform(data[cols].values)\r\ncov_mat = np.cov(X_std.T, bias=True)\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>Note that this uses the <strong><em>transpose<\/em><\/strong> of the standardized matrix.<\/p>\n<p><strong>2.2 Visualization of covariance matrix<\/strong><\/p>\n<div>\n<pre>plt.figure(figsize=(8,8))\r\nsns.set(font_scale=1.2)\r\nhm = sns.heatmap(cov_mat,\r\n                 cbar=True,\r\n                 annot=True,\r\n                 square=True,\r\n                 fmt='.2f',\r\n                 annot_kws={'size': 
12},\r\n                 yticklabels=cols,\r\n                 xticklabels=cols)\r\nplt.title('Covariance matrix showing correlation coefficients')\r\nplt.tight_layout()\r\nplt.show()\r\n\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><img class=\"aligncenter size-full wp-image-126947\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig9-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"90%\"><\/p>\n<p><em><strong>Figure 3<\/strong>. Covariance matrix plot for selected tech stocks.<\/em><\/p>\n<p>We observe from Figure 3 that TSLA correlates only weakly with AAPL, GOOGL, and AMZN, while AAPL, GOOGL, and AMZN correlate strongly with one another.<\/p>\n<p><strong>2.3 Compute eigenvalues of the covariance matrix<\/strong><\/p>\n<div>\n<pre>np.linalg.eigvals(cov_mat)\r\noutput = array([3.41582227, 0.4527295 , 0.02045092, 0.11099732])\r\nnp.sum(np.linalg.eigvals(cov_mat))\r\noutput = 4.000000000000006\r\nnp.trace(cov_mat)\r\noutput = 4.000000000000001 \r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>We observe that the trace of the covariance matrix is equal to the sum of the eigenvalues, as expected.<\/p>\n<p><strong>2.4 Compute the cumulative variance<\/strong><\/p>\n<p>As noted above, the sum of the eigenvalues equals the total variance contained in features X<sub>1<\/sub>, X<sub>2<\/sub>, X<sub>3<\/sub>, and X<sub>4<\/sub>. 
Hence, we can define the following quantities:<\/p>\n<p><strong>\u00a0<img loading=\"lazy\" class=\"aligncenter size-full wp-image-126948\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig10-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"350\" height=\"155\"><\/strong><\/p>\n<p>Notice that when <em>p<\/em> = 4, the cumulative variance becomes equal to 1 as expected.<\/p>\n<div>\n<pre>eigen = np.linalg.eigvals(cov_mat)\r\ncum_var = eigen\/np.sum(eigen)\r\nprint(cum_var)\r\noutput = [0.85395557 0.11318237 0.00511273 0.02774933]\r\n\r\nprint(np.sum(cum_var))\r\noutput = 1.0\r\n\r\n<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p><em>\u00a0<\/em>We observe from the variance ratios (<strong><em>cum_var<\/em><\/strong>) that about 85% of the variance is contained in the first eigenvalue and 11% in the second. This means that when PCA is implemented, only the first two principal components need to be retained, as these 2 components contribute about 97% of the total variance, essentially reducing the dimensionality of the feature space from 4 to 2.<\/p>\n<p>\u00a0<\/p>\n<h3>3. Linear Regression Matrix<\/h3>\n<p>\u00a0<\/p>\n<p>Suppose we have a dataset that has 4 predictor features and\u00a0<em>n<\/em>\u00a0observations, as shown below.<\/p>\n<p><img class=\"aligncenter size-full wp-image-126949\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig11-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"90%\"><\/p>\n<p><em><strong>Table 3<\/strong>. Features matrix with 4 variables and n observations. Column 5 is the target variable (y).<\/em><\/p>\n<p>We would like to build a multiple regression model for predicting the\u00a0<em>y<\/em>\u00a0values (column 5). 
Our model can thus be expressed in the form<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-126950\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig12-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"317\" height=\"155\"><\/p>\n<p>In matrix form, this equation can be written as<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-126951\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig13-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"179\" height=\"78\"><\/p>\n<p>where\u00a0<strong>X<\/strong>\u00a0is the (n x 4) features matrix,\u00a0<strong>w<\/strong>\u00a0is the (4 x 1) matrix representing the regression coefficients to be determined, and\u00a0<strong>y<\/strong>\u00a0is the (n x 1) matrix containing the n observations of the target variable y.<\/p>\n<p>Note that\u00a0<strong>X<\/strong>\u00a0is a rectangular matrix, so we can\u2019t solve the equation above by taking the inverse of\u00a0<strong>X<\/strong>.<\/p>\n<p>To obtain a square matrix, we multiply both sides of our equation by the\u00a0<strong><em>transpose<\/em><\/strong><em>\u00a0<\/em>of\u00a0<strong>X<\/strong>, that is<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-126952\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig14-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"284\" height=\"71\"><\/p>\n<p>This equation can also be expressed as<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-126954\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig16-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"228\" height=\"71\"><\/p>\n<p>where<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-126953\" 
src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig15-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"184\" height=\"56\"><\/p>\n<p>is the (4\u00d74) regression matrix. Clearly, we observe that\u00a0<strong>R<\/strong>\u00a0is a real and symmetric matrix. Note that in linear algebra, the transpose of the product of two matrices obeys the following relationship<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-126955\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig17-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"289\" height=\"73\"><\/p>\n<p>Now that we\u2019ve reduced our regression problem and expressed it in terms of the (4\u00d74) real, symmetric, and invertible regression matrix\u00a0<strong>R<\/strong>, it is straightforward to show that the exact solution of the regression equation is then<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-126956\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig18-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"299\" height=\"84\"><\/p>\n<p>Examples of regression analysis for predicting continuous and discrete variables are given in the following:<\/p>\n<p><a href=\"https:\/\/pub.towardsai.net\/linear-regression-basics-for-absolute-beginners-68ed9ff980ae\" target=\"_blank\" rel=\"noopener\">Linear Regression Basics for Absolute Beginners<\/a><\/p>\n<p><a href=\"https:\/\/github.com\/bot13956\/perceptron_classifier\" target=\"_blank\" rel=\"noopener\">Building a Perceptron Classifier Using the Least Squares Method<\/a><\/p>\n<p>\u00a0<\/p>\n<h3>4. Linear Discriminant Analysis Matrix<\/h3>\n<p>\u00a0<\/p>\n<p>Another example of a real and symmetric matrix in data science is the Linear Discriminant Analysis (LDA) matrix. 
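The closed-form least-squares solution from the previous section can be verified numerically with a short sketch. The data and coefficients below are synthetic (illustrative names and values); `np.linalg.solve` is used rather than forming the inverse of <strong>R</strong> explicitly, which is the numerically safer choice:

```python
import numpy as np

# Synthetic regression problem: n = 50 observations, 4 predictor features
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 4))
w_true = np.array([1.5, -2.0, 0.5, 3.0])  # hypothetical true coefficients
y = X @ w_true                            # noise-free target, for an exact check

# Regression matrix R = X^T X: a (4 x 4) real and symmetric matrix
R = X.T @ X
print(np.allclose(R, R.T))  # True: R is symmetric

# Exact solution w = R^{-1} X^T y, computed without an explicit inverse
w = np.linalg.solve(R, X.T @ y)
print(np.allclose(w, w_true))  # True: the coefficients are recovered
```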
This matrix can be expressed in the form:<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-126957\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/Fig19-essential-linear-algebra-data-science-machine-learning.jpg\" alt=\"\" width=\"223\" height=\"68\"><\/p>\n<p>where\u00a0<strong>S<sub>W<\/sub><\/strong>\u00a0is the within-class scatter matrix, and\u00a0<strong>S<sub>B\u00a0<\/sub><\/strong>is the between-class scatter matrix. Since both matrices\u00a0<strong>S<sub>W<\/sub>\u00a0<\/strong>and<strong>\u00a0S<sub>B<\/sub>\u00a0<\/strong>are real and symmetric, it follows that\u00a0<strong>L<\/strong>\u00a0is also real and symmetric. The diagonalization of\u00a0<strong>L<\/strong>\u00a0produces a feature subspace that optimizes class separability and reduces dimensionality. Because the scatter matrices are computed from class labels, LDA is a supervised algorithm, while PCA is not.<\/p>\n<p>For more details about the implementation of LDA, please see the following references:<\/p>\n<p><a href=\"https:\/\/medium.com\/towards-artificial-intelligence\/machine-learning-dimensionality-reduction-via-linear-discriminant-analysis-cc96b49d2757\" target=\"_blank\" rel=\"noopener\">Machine Learning: Dimensionality Reduction via Linear Discriminant Analysis<\/a><\/p>\n<p><a href=\"https:\/\/github.com\/bot13956\/linear-discriminant-analysis-iris-dataset\" target=\"_blank\" rel=\"noopener\">GitHub repository for LDA implementation using Iris dataset<\/a><\/p>\n<p><a href=\"https:\/\/github.com\/rasbt\/python-machine-learning-book-3rd-edition\" target=\"_blank\" rel=\"noopener\">Python Machine Learning by Sebastian Raschka, 3rd Edition (Chapter 5)<\/a><\/p>\n<p>\u00a0<\/p>\n<h3>Summary<\/h3>\n<p>\u00a0<\/p>\n<p>In summary, we\u2019ve discussed several applications of linear algebra in data science and machine learning. 
Using the tech stocks dataset, we illustrated important concepts such as the size of a matrix, column matrices, square matrices, covariance matrix, transpose of a matrix, eigenvalues, dot products, etc. Linear algebra is an essential tool in data science and machine learning. Thus, beginners interested in data science must familiarize themselves with essential concepts in linear algebra.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2021\/05\/essential-linear-algebra-data-science-machine-learning.html<\/p>\n","protected":false},"author":0,"featured_media":8266,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8265"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8265"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8265\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8266"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=8265"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8265"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=8265"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}