{"id":8500,"date":"2022-03-21T17:06:38","date_gmt":"2022-03-21T17:06:38","guid":{"rendered":"https:\/\/wealthrevelation.com\/data-science\/2022\/03\/21\/air-quality-forecasting-python-project\/"},"modified":"2022-03-21T17:06:38","modified_gmt":"2022-03-21T17:06:38","slug":"air-quality-forecasting-python-project","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2022\/03\/21\/air-quality-forecasting-python-project\/","title":{"rendered":"Air Quality Forecasting Python Project"},"content":{"rendered":"<div>\n<p><em>You will find the full Python code and all visuals for this article <a href=\"https:\/\/gitlab.com\/shivanipadaya\/Air-Quality-Forecasting\/-\/tree\/main\" target=\"_blank\" rel=\"noopener\">here in this GitLab repository<\/a>. The repository contains a series of analyses, transforms and forecasting models frequently used when dealing with time series. Its aim is to showcase how to model a time series from scratch, using a real use-case dataset.<br \/><\/em><\/p>\n<p><span id=\"page3R_mcid2\" class=\"markedContent\"><span dir=\"ltr\" role=\"presentation\">This project forecasts yearly <strong>carbon dioxide<\/strong> (<strong>CO2<\/strong>) emission levels. 
Most organizations have to comply with government norms on CO2 emissions and pay charges accordingly, so this project forecasts CO2 levels so that organizations can follow the norms and pay in advance based on the forecasted values.<\/span> <span dir=\"ltr\" role=\"presentation\">In any data science project the main component is data; for this project the data was provided by the company, and this is where time series concepts come into the picture. The dataset contains 215 entries and two columns, Year and CO2 emissions. It is a univariate time series, as there is only one dependent variable, CO2, which depends on time. 
The dataset covers CO2 levels from the year 1800 to the year 2014.<\/span><\/span><\/p>\n<p><em>The dataset used: the dataset contains yearly CO2 emission levels from 1800 to 2014, sampled once per year. The series is non-stationary, so we have to use the differenced time series for forecasting.<\/em><\/p>\n<p><span id=\"page3R_mcid3\" class=\"markedContent\"><span dir=\"ltr\" role=\"presentation\">After getting the data, the next step is to analyze the time series. This is done using <strong>Python<\/strong>. The data was stored in an Excel file, so first we need to read that file. This is done with <strong>Pandas<\/strong>, a Python library, which loads it into a Pandas <strong>DataFrame<\/strong>. After that, preprocessing such as changing the data type of the time column from object to DateTime is performed for coding purposes. A time series contains four main components: level, trend, seasonality and noise. 
To study these components, we need to decompose the time series so that we can better understand it and choose the forecasting model accordingly, because each component behaves differently in the model. Decomposition also lets us identify whether the <strong>time series<\/strong> is multiplicative or additive.<\/span><\/span><\/p>\n<div id=\"attachment_5986\" class=\"wp-caption alignnone\"><img aria-describedby=\"caption-attachment-5986\" loading=\"lazy\" class=\"wp-image-5986 size-full\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2022\/03\/co2-emissions.png\" alt=\"\" width=\"835\" height=\"386\"><\/p>\n<p id=\"caption-attachment-5986\" class=\"wp-caption-text\">CO2 emissions \u2013 plotted via Python pandas \/ matplotlib<\/p>\n<\/div>\n<p><span id=\"page3R_mcid3\" class=\"markedContent\"><span dir=\"ltr\" role=\"presentation\">Decomposing the time series with the Python statsmodels library, we get the trend, seasonality and residual components separately. In a multiplicative time series the components multiply together; in an additive time series they are added together.<\/span><\/span><span id=\"page3R_mcid4\" 
class=\"markedContent\"> <span dir=\"ltr\" role=\"presentation\">To take a deeper look at the trend component, a moving average with a window of 10 steps was applied, which shows a nonlinear upward trend; a <strong>linear regression model<\/strong> fitted to check the trend also shows an upward trend. As for seasonality, there was a combination of multiple patterns over the period, which is common in real-world time series data. Capturing the white noise is difficult in this type of data. The time series starts in 1800, when CO2 values were below 1 because of the lack of human activity, so levels were decreasing. Over time the number of industries and the amount of human activity increased rapidly, which caused CO2 levels to rise rapidly. The highest CO2 emission level in the time series was 18.7 in 1979. 
It was challenging to decide whether to treat the values below 0.5 as white noise, because 30% of the CO2 values were below 1. In the real world, given the current scenario, a CO2 emission level of exactly 0 is nearly impossible, but levels as low as 0.0005 are still possible. So, considering each data point as valuable information, we decided not to remove those entries.<\/span><\/span><span id=\"page3R_mcid5\" class=\"markedContent\"><br role=\"presentation\"><\/span><\/p>\n<p><span id=\"page3R_mcid5\" class=\"markedContent\"><span dir=\"ltr\" role=\"presentation\">The next step is to create a lag plot to see the correlation between the current year's CO2 level and the previous year's CO2 level. The plot was linear, which shows high correlation, so we can say that current and previous CO2 levels have a strong relationship. The randomness of the data was measured by plotting an autocorrelation graph. 
The autocorrelation graph shows smooth curves, which indicates that the time series is non-stationary, so the next step is to make it stationary. In a non-stationary time series, summary statistics such as mean and variance change over time. <\/span><\/span><\/p>\n<p><span id=\"page3R_mcid5\" class=\"markedContent\"><span dir=\"ltr\" role=\"presentation\">To make <\/span><\/span><span id=\"page16R_mcid0\" class=\"markedContent\"><span dir=\"ltr\" role=\"presentation\">the time series stationary we have to remove trend and seasonality from it. Before that, we use the Dickey-Fuller test to confirm that the time series is non-stationary. The test was run in Python, and it gives a p-value as output. 
Here the null <strong>hypothesis<\/strong> is that the data is non-stationary, while the alternative hypothesis is that the data is stationary. In this case the significance level is 0.05 and the <strong>p-value<\/strong> given by the Dickey-Fuller test is greater than 0.05, so we fail to reject the null hypothesis and treat the time series as non-stationary. Differencing is one of the techniques for making a time series stationary. On this time series, first-order differencing was applied to make it stationary. 
In first-order differencing we subtract the previous value from the current value for every data point. Different transformations such as log, square root and reciprocal were also applied in an attempt to make the time series stationary. Smoothing techniques such as the simple moving average, exponentially weighted moving average, simple exponential smoothing and double exponential smoothing can be applied to remove the variation between time stamps and reveal smooth curves.<\/span><br role=\"presentation\"><\/span><\/p>\n<p><span id=\"page16R_mcid0\" class=\"markedContent\"><span dir=\"ltr\" role=\"presentation\">Smoothing techniques are also used to observe the trend in a time series as well as to predict future values. 
However, the other models performed better than the smoothing techniques.<\/span><\/span><span id=\"page16R_mcid1\" class=\"markedContent\"> <span dir=\"ltr\" role=\"presentation\">The first 200 entries were used to train the model and the remaining entries to test its performance. The performance of the different models was measured by Root Mean Squared Error (<strong>RMSE<\/strong>) and <strong>Mean Absolute Error<\/strong> (<strong>MAE<\/strong>), since predicting future CO2 emissions is essentially a regression problem. RMSE is the square root of the average squared difference between the actual values and the values predicted by the model on the test data. Here the RMSE values were calculated using the Python sklearn library. There are two approaches to model building: one is data-driven and the other is model-based. 
Models from both approaches were applied to find the best-fitting model. The ARIMA model gives the best results for this kind of dataset, as the model was trained on the differenced time series. The ARIMA model predicts a given time series based on its own past values. It can be used for any non-seasonal series of numbers that exhibits patterns and is not a series of random events. <strong>ARIMA<\/strong> takes three parameters: the AR order, the MA order and the order of differencing.<\/span> <span dir=\"ltr\" role=\"presentation\">Hyperparameter tuning finds the best parameters for the model by trying different sets of parameters. 
The autocorrelation and partial autocorrelation plots can also be used to choose the AR and MA parameters. The partial autocorrelation function (PACF) shows the partial correlation of a stationary time series with its own lagged values, so we can use the PACF to decide the value of the AR parameter; the autocorrelation function (ACF) shows how data points in a time series are related to earlier ones, so we can use the ACF to decide the value of the MA parameter.<\/span><\/span><\/p>\n<div id=\"attachment_5985\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-5985\" loading=\"lazy\" class=\"size-full wp-image-5985\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2022\/03\/co2-difference-arima-prediction.png\" alt=\"\" width=\"493\" height=\"285\"><\/p>\n<p id=\"caption-attachment-5985\" class=\"wp-caption-text\">Yearly difference of CO2 emissions \u2013 ARIMA Prediction<\/p>\n<\/div>\n<p><span id=\"page16R_mcid2\" class=\"markedContent\"><span dir=\"ltr\" role=\"presentation\">Apart from ARIMA, a few other models were trained: AR, ARMA, simple linear regression, the quadratic method, Holt-Winters exponential smoothing, <strong>Ridge<\/strong> and <strong>Lasso regression<\/strong>, LightGBM and XGBoost, a <strong>recurrent neural network<\/strong> <strong>(RNN)<\/strong><\/span> <span dir=\"ltr\" role=\"presentation\">\u2013<\/span> <strong><span dir=\"ltr\" 
role=\"presentation\">Long Short-Term<\/span> <\/strong><span dir=\"ltr\" role=\"presentation\"><strong>Memory<\/strong> (<strong>LSTM<\/strong>) and Fbprophet. I would like to mention my experience with LSTM here because it is another model that gives results as good as ARIMA. The reason for not choosing LSTM as the final model is its complexity: ARIMA gives appropriate results, is simple to understand and requires fewer dependencies. Using LSTM requires a lot of data preprocessing and additional dependencies. Because the dataset was small we could train the model on a CPU; otherwise a GPU would be required to train the LSTM model. We faced one more challenge in the deployment part. 
The challenge is to get the data back into its original form: because the model was trained on the differenced time series, it predicts future values in differenced form.<\/span> <span dir=\"ltr\" role=\"presentation\">After a lot of research on the internet and a deeper understanding of the underlying mathematical concepts, we finally <span id=\"page18R_mcid0\" class=\"markedContent\">found the solution. To invert first-order differencing, we add each differenced value to the previous value of the original series; for the forecasts, this means starting from the last observed value of the time series and adding the predicted differences to it. <\/span><span id=\"page18R_mcid1\" class=\"markedContent\">To create the user interface we used Streamlit, a commonly used Python library. The pickled ARIMA model is used to predict future values based on user input. The forecasting limit is the year 2050. The project was deployed on Google Cloud Platform. The flow is as follows: the user enters the start year and the end year of the forecast, and the prediction is made for that range. 
Given these inputs, the pickled model produces the future CO2 emissions in differenced form; the values are then converted back to the original scale and displayed on the user interface, together with an interactive line graph.<\/span><br \/><\/span><\/span><\/p>\n<p><em>You will find the full Python code and all visuals for this article <a href=\"https:\/\/gitlab.com\/shivanipadaya\/Air-Quality-Forecasting\/-\/tree\/main\" target=\"_blank\" rel=\"noopener\">here in this GitLab repository<\/a>.<\/em><\/p>\n<div id=\"author-bio-box\">\n<h3><a href=\"https:\/\/data-science-blog.com\/en\/blog\/author\/shivanipadaya\/\" title=\"All posts by Shivani Padaya\" rel=\"author\">Shivani Padaya<\/a><\/h3>\n<div class=\"bio-gravatar\"><img loading=\"lazy\" data-del=\"avatar\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2022\/03\/shivani_padaya-80x80.jpg\" class=\"avatar pp-user-avatar avatar-70 photo \" height=\"70\" width=\"70\"><\/div>\n<p class=\"bio-description\">I&#8217;m an IT graduate and data science enthusiast who loves to execute data-driven solutions and find hidden insights in data. I enjoy analyzing time series data. 
I like to read and write data science blogs.<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/data-science-blog.com\/en\/blog\/2022\/03\/20\/air-quality-forecasting-python-project\/<\/p>\n","protected":false},"author":0,"featured_media":8501,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8500"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8500"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8500\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8501"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=8500"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8500"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=8500"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}