{"id":8367,"date":"2021-07-20T01:32:46","date_gmt":"2021-07-20T01:32:46","guid":{"rendered":"https:\/\/wealthrevelation.com\/data-science\/2021\/07\/20\/coffee-shop-location-predictor\/"},"modified":"2021-07-20T01:32:46","modified_gmt":"2021-07-20T01:32:46","slug":"coffee-shop-location-predictor","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2021\/07\/20\/coffee-shop-location-predictor\/","title":{"rendered":"Coffee Shop Location Predictor"},"content":{"rendered":"<div>\n<p>In this article, we will highlight the main steps involved in predicting the best location for a coffee shop in <a href=\"https:\/\/vancouver.ca\/\">Vancouver<\/a>. We want the coffee shop to be near a transit station and to have no Starbucks near it. As an added feature, we will make sure that the crime concentration in the area is low. The entire program is implemented in Python. 
So let\u2019s walk through the steps.<\/p>\n<ul>\n<li>Get crime history for the last two years<\/li>\n<li>Get locations of all transit stations and Starbucks in Vancouver<\/li>\n<li>Find all the transit stations that do not have any Starbucks near them<\/li>\n<li>Get all the data regarding crimes near the filtered transit stations<\/li>\n<li>Create a grid of all possible coordinates around each filtered transit station<\/li>\n<li>Check crime around each created coordinate and display the top 5 locations<\/li>\n<\/ul>\n<p>The first two steps involve collecting data from the internet, both manually and automatically.<\/p>\n<p>We can get crime history for the past 14 years in Vancouver from <a href=\"https:\/\/www.kaggle.com\/wosaku\/crime-in-vancouver\">Kaggle<\/a>. The data comes as a raw crime.csv file, so we have to process it and filter out the records we do not need. We then write the processed data to the crime_processed.csv file.<\/p>\n<p>Note: There are 530,653 crime records in this file.<\/p>\n<p>In this program, we will use only the type and coordinates of each crime. 
There are many crime types, but we have classified them into three major categories, namely:<\/p>\n<p>Theft (<span>red<\/span>), Break and Enter (<span>orange<\/span>) and Mischief (<span>green<\/span>)<\/p>\n<p>All of these crimes can be plotted on a graph, as displayed below.<\/p>\n<p><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-1.png\"><img loading=\"lazy\" class=\"aligncenter size-full wp-image-5693\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-1.png\" alt=\"\" width=\"1215\" height=\"684\"><\/a><\/p>\n<p>This looks very congested, so let\u2019s look at a close-up image for future reference.<\/p>\n<p><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-2.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5692\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-2.png\" alt=\"\" width=\"527\" height=\"263\"><\/a><\/p>\n<p>We can get the coordinates of all transit stations in Vancouver from <a href=\"https:\/\/opendata.vancouver.ca\/explore\/dataset\/rapid-transit-stations\/information\/\">here<\/a>. This dataset has the coordinates of all rapid transit stations on the three transit lines in Vancouver. 
There are 23 of them in total, and we can use them for further processing.<\/p>\n<p><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-3.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5691 size-large\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-3-1030x506.png\" alt=\"\" width=\"1030\" height=\"506\"><\/a><\/p>\n<p>The Starbucks data is available <a href=\"https:\/\/www.starbucks.ca\/store-locator?map=49.281601,-123.110406,12z&amp;place=vancouver,%20bc\">here<\/a>; we can scrape it easily and get the locations of all the Starbucks stores in Vancouver. We only need the Starbucks stores that are near transit stations, so we\u2019ll filter out the rest. There are a total of 24 Starbucks stores in Vancouver, and 10 of them are near transit stations.<\/p>\n<p><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-4.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5690 size-large\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-4-1030x707.png\" alt=\"\" width=\"1030\" height=\"707\"><\/a><\/p>\n<p>Note: Other than the coordinates of transit stations and Starbucks stores, we also need the coordinates and type of each crime.<\/p>\n<p>Now that we have all the required data, we can move to the next step. We need to find the transit stations that have no Starbucks near them. For that, we can define an area of a particular radius around each transit station and then check whether any Starbucks location falls within that area.<\/p>\n<p>If no Starbucks is within a particular transit station\u2019s area, we append that station to a list. At the end, we have a list of all transit stations with no Starbucks near them. 
There are a total of 6 transit stations with no Starbucks near them.<\/p>\n<p><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-5.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5689 size-large\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-5-1030x584.png\" alt=\"\" width=\"1030\" height=\"584\"><\/a><\/p>\n<p>Now let\u2019s filter the crime records down to what we are interested in: the crime near transit stations. For that, we mark out an area of a specific radius around each station and look at the crimes inside it. This leaves more than 110,000 crime records.<\/p>\n<p><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-6.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5688 size-large\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-6-1030x525.png\" alt=\"\" width=\"1030\" height=\"525\"><\/a><\/p>\n<p>We now have all the transit stations that don\u2019t have any Starbucks near them, and also the crime near all transit stations. So, let\u2019s combine this information and get the crime near the located transit stations. 
These are about 44,000 crime records.<\/p>\n<p><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-7.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5687 size-large\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-7-1030x560.png\" alt=\"\" width=\"1030\" height=\"560\"><\/a><\/p>\n<p>This may seem correct at first glance, but the points overlap because there are so many of them, so we create separate lists of crimes based on their types.<\/p>\n<h2>Theft<\/h2>\n<p><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-8.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5686 size-large\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-8-1030x607.png\" alt=\"\" width=\"1030\" height=\"607\"><\/a><\/p>\n<h2>Break and Enter<\/h2>\n<p><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-9.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5685 size-large\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-9-1030x585.png\" alt=\"\" width=\"1030\" height=\"585\"><\/a><\/p>\n<h2>Mischief<\/h2>\n<p><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-10.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5684 size-large\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-10-1030x555.png\" alt=\"\" width=\"1030\" height=\"555\"><\/a><\/p>\n<p>Now that we finally have all the prerequisites, let\u2019s get to the main task at hand: predicting the best coordinates for the coffee shop.<\/p>\n<p>There may be 
many approaches to solving this problem, but the one I used in this program is to create a grid of all possible locations (coordinates) within a 1 km radius of each located transit station.<\/p>\n<p>Initially, I generated one coordinate for every meter, which resulted in 1,000,000 coordinates per square km. This is a huge number, and for the 6 located transit stations it becomes 6 million. That may not seem like much at first glance, because computers can handle that much data in a few seconds.<\/p>\n<p>But for location prediction we need to compare each coordinate with the crime coordinates: the algorithm has to check ~7,000 Thefts, ~19,000 Break Ins, and ~17,000 Mischiefs around each generated coordinate. Computing this would require an estimated 432.4 billion comparisons. This sort of execution takes many hours on a normal computer (sometimes days).<\/p>\n<p>The solution is to create one coordinate for each 10 m area, which results in about 10,000 coordinates per square km. For the above-mentioned number of crimes, the estimated number of comparisons is then several billion. That significantly reduces the time, but it is still not small.<\/p>\n<p>To control this, we can remove duplicate crime coordinates and those that are too close to each other (~1 m). Doing so, we are left with just 816 Thefts, 2,654 Break Ins, and 8,234 Mischiefs to check around each generated coordinate.<br \/>The precision will not be affected much, but the time and computational resources required will be reduced a lot.<\/p>\n<p>\u00a0<\/p>\n<h2>Checking Crime near Generated coordinates<\/h2>\n<p>Now that we have all the locations, we will process them and check each coordinate against some constraints. 
These are, in order:<\/p>\n<ol>\n<li>Filter out coordinates that have a Theft within 1 km<br \/>We get 122,000 coordinates with no Thefts (in the image below, 1,000 coordinates are merged into 1)<a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-11.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5683 size-medium\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-11-300x219.png\" alt=\"\" width=\"300\" height=\"219\"><\/a><\/li>\n<li>Filter out coordinates that have a Break In within 200 m<br \/>We get 8,000 coordinates with no Thefts or Break Ins (in the image below, 1,000 coordinates are merged into 1)<br \/><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-12.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5682 size-medium\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-12-300x170.png\" alt=\"\" width=\"300\" height=\"170\"><\/a><\/li>\n<li>Filter out coordinates that have Mischief within 200 m<br \/>We get 6,000 coordinates with no Thefts, Break Ins, or Mischief (in the image below, 1,000 coordinates are merged into 1)<br \/><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-13.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5681 size-medium\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-13-300x180.png\" alt=\"\" width=\"300\" height=\"180\"><\/a><br \/>Now that we have 6 coordinates of the best locations that have passed all the constraints, we will order them by their distance from the nearest transit station. The nearest is at the top of the list as the best possible location, then the second nearest, and so on. 
The generated list is:\n<ol>\n<li>-123.0419406741792, 49.24824259252004<\/li>\n<li>-123.05887151659479, 49.24327221040713<\/li>\n<li>-123.05287151659476, 49.24327221040713<\/li>\n<li>-123.04994067417924, 49.239242592520064<\/li>\n<li>-123.0419406741792, 49.239242592520064<\/li>\n<li>-123.0409406741792, 49.239242592520064<\/li>\n<\/ol>\n<p><a href=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-14.png\"><img loading=\"lazy\" class=\"aligncenter wp-image-5680 size-medium\" src=\"https:\/\/data-science-blog.com\/en\/wp-content\/uploads\/sites\/4\/2021\/07\/location-predictor-vancouver-image-14-300x153.png\" alt=\"\" width=\"300\" height=\"153\"><\/a><\/p>\n<\/li>\n<\/ol>\n<p>MindTrades Consulting Services, a leading marketing agency, provides in-depth analysis and insights for the global IT sector, including leading data integration brands such as Diyotta. From Cloud Migration, Big Data, Digital Transformation, Agile Delivery, and Cyber Security to Analytics, MindTrades provides published breakthrough ideas and prompt content delivery. For more information, refer to <a href=\"https:\/\/mindtrades.com\/\">mindtrades.com<\/a>.<\/p>\n<p><a href=\"https:\/\/github.com\/Mindtrades-Consulting\/Coffee-Shop-Location-Predictor\">https:\/\/github.com\/Mindtrades-Consulting\/Coffee-Shop-Location-Predictor<\/a><\/p>\n<p>\u00a0<\/p>\n<div id=\"author-bio-box\">\n<h3><a href=\"https:\/\/data-science-blog.com\/en\/blog\/author\/mrinalini\/\" title=\"All posts by Mrinalini Sunder\" rel=\"author\">Mrinalini Sunder<\/a><\/h3>\n<div class=\"bio-gravatar\"><img alt=\"\" src=\"https:\/\/secure.gravatar.com\/avatar\/d7990630af545aebca83b207f5504beb?s=70&amp;d=mm&amp;r=g\" class=\"avatar avatar-70 photo\" height=\"70\" width=\"70\" loading=\"lazy\"><\/div>\n<p class=\"bio-description\">I&#8217;m Mrinalini, a content marketing manager with MindTrades, a digital transformation company based in the US. 
I enjoy writing about data science and have been published in several magazines!<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/data-science-blog.com\/en\/blog\/2021\/07\/12\/coffee-shop-location-predictor\/<\/p>\n","protected":false},"author":0,"featured_media":8368,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8367"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=8367"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/8367\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/8368"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=8367"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=8367"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=8367"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}