{"id":3248,"date":"2020-10-09T16:50:06","date_gmt":"2020-10-09T16:50:06","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/10\/09\/web-scraping-product-details-from-sunglass-hut-and-woot\/"},"modified":"2020-10-09T16:50:06","modified_gmt":"2020-10-09T16:50:06","slug":"web-scraping-product-details-from-sunglass-hut-and-woot","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/10\/09\/web-scraping-product-details-from-sunglass-hut-and-woot\/","title":{"rendered":"Web Scraping Product Details from Sunglass Hut and Woot!"},"content":{"rendered":"<div>\n<figure class=\"wp-block-gallery columns-2 is-cropped\"><\/figure>\n<figure class=\"wp-block-image size-large\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/image-10-1-20-at-1615-531232-lB2XpswR-300x151.jpeg 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/image-10-1-20-at-1615-531232-lB2XpswR-600x303.jpeg 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/image-10-1-20-at-1615-531232-lB2XpswR-768x388.jpeg 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/image-10-1-20-at-1615-531232-lB2XpswR-1024x517.jpeg 1024w\" loading=\"lazy\" width=\"1024\" height=\"517\" alt=\"\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/image-10-1-20-at-1615-531232-lB2XpswR-1024x517.jpeg\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68023 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"1024\" height=\"517\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/image-10-1-20-at-1615-531232-lB2XpswR-1024x517.jpeg\" alt=\"\" class=\"wp-image-68023\"><\/figure>\n<p>Sunglasses product details were scraped from the Sunglass Hut and 
Woot! websites in order to perform an exploratory data analysis (EDA) and to compare the deals on Woot! with the retail prices on Sunglass Hut. The word cloud above was produced from the descriptions of the sunglasses on Sunglass Hut. The code used to scrape and analyze this data is available on <a href=\"https:\/\/github.com\/pstarvaggi\/sunglasses.git\">GitHub<\/a>.<\/p>\n<h2>Web scraping<\/h2>\n<p>The Sunglass Hut website uses Ajax to load more sunglasses onto each page when you click a button at the bottom of the screen. For this reason, Selenium was needed to interact with this dynamic website. The main brand page was visited, and the &#8220;load more&#8221; button was clicked programmatically until all sunglasses were visible on the page. The URL of each pair was then scraped and saved to a CSV file. This list of URLs served as the start URLs for a Scrapy spider that visited each page and collected the URLs of all color variants of the same pair. This step was necessary because each color of a pair can have different product details. For example, some colors may have polarized lenses, while others may not. 
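<\/p>
<p>A minimal sketch of the load-more step is shown below. It assumes the page exposes a single button that disappears once every product has loaded; the function name, the pause length, and the CSS selector in the comments are illustrative assumptions rather than details of the original scraper.<\/p>
```python
import time
from typing import Any, Callable, Optional


def click_load_more(find_button: Callable[[], Optional[Any]],
                    pause: float = 2.0,
                    max_clicks: int = 500) -> int:
    # Click the load-more button until it is gone; return the number of clicks.
    # find_button returns an object with a click() method, or None once the
    # button is no longer present.
    clicks = 0
    while max_clicks > clicks:  # safety cap in case the button never disappears
        button = find_button()
        if button is None:
            break  # every product tile should now be on the page
        button.click()
        clicks += 1
        time.sleep(pause)  # give the Ajax request time to append the next batch
    return clicks


# Hypothetical Selenium wiring (the selector is an assumption, not the real markup):
# from selenium import webdriver
# from selenium.webdriver.common.by import By
# driver = webdriver.Chrome()
# driver.get(BRAND_PAGE_URL)
# find = lambda: next((b for b in driver.find_elements(By.CSS_SELECTOR, 'button.load-more')
#                      if b.is_displayed()), None)
# click_load_more(find)
```
<p>Passing the lookup in as a function keeps the loop independent of Selenium, so it can be tested with a stand-in button; an explicit wait on the product count would be more robust than a fixed pause.<\/p>
<p>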
Then, each individual pair&#8217;s URL was visited and its product details were scraped.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-111722-945186-EwwBTjri-300x169.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-111722-945186-EwwBTjri-600x338.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-111722-945186-EwwBTjri-768x432.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-111722-945186-EwwBTjri-1024x576.png 1024w\" loading=\"lazy\" width=\"1024\" height=\"576\" alt=\"\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-111722-945186-EwwBTjri-1024x576.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68024 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"1024\" height=\"576\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-111722-945186-EwwBTjri-1024x576.png\" alt=\"\" class=\"wp-image-68024\"><\/figure>\n<p>For each pair, the following details were scraped: brand, description, name, price, whether it is on sale and by how much, whether the lenses are polarized, frame color, frame material, lens color, lens material, lens technology, shape, product page URL, and the face shape it is recommended for. 
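<\/p>
<p>One way to hold these fields is a small record type with one row per pair and color. The sketch below is an assumption about structure, not the original schema: the field names, the choice to store the pre-sale price in <code>original_price<\/code>, and the price-string format handled by <code>parse_price<\/code> are all illustrative.<\/p>
```python
from dataclasses import dataclass
from typing import Optional


def parse_price(text: str) -> float:
    # Convert a scraped price string such as '$1,250.00' to a float.
    return float(text.replace('$', '').replace(',', '').strip())


@dataclass
class SunglassesRecord:
    brand: str
    name: str
    description: str
    price: float                     # current selling price
    original_price: Optional[float]  # pre-sale price, or None if not on sale
    polarized: bool
    frame_color: str
    frame_material: str
    lens_color: str
    lens_material: str
    lens_technology: str
    shape: str
    face_shape: str                  # face shape the pair is recommended for
    url: str

    @property
    def on_sale(self) -> bool:
        return self.original_price is not None

    @property
    def discount_pct(self) -> float:
        # Percent off the original price, rounded to one decimal; 0.0 if not on sale.
        if self.original_price is None:
            return 0.0
        return round(100 * (1 - self.price / self.original_price), 1)
```
<p>Keeping each color as its own row matters here because, as noted above, attributes such as polarization can differ between colors of the same model.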
<\/p>\n<h2>What are the most expensive brands?<\/h2>\n<p>We break down the median price per pair for each brand.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112340-913104-4vp9hv41-300x177.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112340-913104-4vp9hv41-600x353.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112340-913104-4vp9hv41-768x452.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112340-913104-4vp9hv41-1024x603.png 1024w\" loading=\"lazy\" alt=\"\" width=\"758\" height=\"447\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112340-913104-4vp9hv41-1024x603.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68026 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112340-913104-4vp9hv41-1024x603.png\" alt=\"\" class=\"wp-image-68026\" width=\"758\" height=\"447\"><\/figure>\n<p>How do the price distributions of the two most expensive brands compare?<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112519-310159-4dkfFosa-300x165.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112519-310159-4dkfFosa-600x331.png 600w, 
https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112519-310159-4dkfFosa-768x423.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112519-310159-4dkfFosa-1024x565.png 1024w\" loading=\"lazy\" alt=\"\" width=\"731\" height=\"404\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112519-310159-4dkfFosa-1024x565.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68027 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112519-310159-4dkfFosa-1024x565.png\" alt=\"\" class=\"wp-image-68027\" width=\"731\" height=\"404\"><\/figure>\n<p>We see that, although Fendi has a higher median price, Bulgari has a few pairs that are extremely expensive. 
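<\/p>
<p>A per-brand summary like the one in these plots can be computed with a simple group-by. The median is used rather than the mean because a handful of very expensive pairs would pull a mean upward; that is how one brand (here Fendi) can lead on median price while another (Bulgari) holds the extreme outliers. The sketch below uses only the standard library and assumes the scraped data has been reduced to <code>(brand, price)<\/code> tuples; the function name is hypothetical.<\/p>
```python
from collections import defaultdict
from statistics import median


def brand_summary(rows):
    # rows: iterable of (brand, price) tuples, one per pair and color.
    prices = defaultdict(list)
    for brand, price in rows:
        prices[brand].append(price)
    summary = {
        brand: {'count': len(p), 'median': median(p), 'max': max(p)}
        for brand, p in prices.items()
    }
    # Order brands from the highest median price to the lowest.
    return dict(sorted(summary.items(), key=lambda kv: kv[1]['median'], reverse=True))
```
<p>The same summary also gives the number of models per brand and each brand&#8217;s most expensive pair.<\/p>
<p>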
In fact, the most expensive pair on the entire Sunglass Hut website is from Bulgari.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112729-211489-uB09vlQV-300x117.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112729-211489-uB09vlQV-600x234.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112729-211489-uB09vlQV-768x300.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112729-211489-uB09vlQV-1024x400.png 1024w\" loading=\"lazy\" width=\"1024\" height=\"400\" alt=\"\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112729-211489-uB09vlQV-1024x400.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68028 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"1024\" height=\"400\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-112729-211489-uB09vlQV-1024x400.png\" alt=\"\" class=\"wp-image-68028\"><\/figure>\n<p>It is also worth noting that Bulgari offers many more models than Fendi, as the next graph shows.<\/p>\n<h2>Let&#8217;s choose a few brands<\/h2>\n<p>For the purpose of exploratory data analysis, let&#8217;s pick a few brands to analyze. The following graph shows the number of pairs available on Sunglass Hut from each brand. Ray-Ban has by far the most, followed by Oakley, Vogue and Prada. 
We will also include Gucci because it is a popular brand, and we will include the Prada Linea Rossa sunglasses along with the Prada sunglasses.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113238-957314-TVXfY747-300x173.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113238-957314-TVXfY747-600x346.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113238-957314-TVXfY747-768x442.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113238-957314-TVXfY747-1024x590.png 1024w\" loading=\"lazy\" alt=\"\" width=\"747\" height=\"430\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113238-957314-TVXfY747-1024x590.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68029 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113238-957314-TVXfY747-1024x590.png\" alt=\"\" class=\"wp-image-68029\" width=\"747\" height=\"430\"><\/figure>\n<figure class=\"wp-block-image size-large is-resized\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113319-055945-teP7MJNJ-300x130.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113319-055945-teP7MJNJ-600x260.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113319-055945-teP7MJNJ-768x333.png 768w, 
https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113319-055945-teP7MJNJ-1024x444.png 1024w\" loading=\"lazy\" alt=\"\" width=\"736\" height=\"318\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113319-055945-teP7MJNJ-1024x444.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68030 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113319-055945-teP7MJNJ-1024x444.png\" alt=\"\" class=\"wp-image-68030\" width=\"736\" height=\"318\"><\/figure>\n<p>We see that, among these brands, Gucci seems to be the most expensive overall, followed by Prada. Ray-Ban and Oakley seem similarly priced, while Vogue is the cheapest among these brands.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113510-458095-gOIdpXyw-300x150.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113510-458095-gOIdpXyw-600x300.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113510-458095-gOIdpXyw-768x385.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113510-458095-gOIdpXyw-1024x513.png 1024w\" loading=\"lazy\" alt=\"\" width=\"749\" height=\"375\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113510-458095-gOIdpXyw-1024x513.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68031 lazyload\" 
src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-113510-458095-gOIdpXyw-1024x513.png\" alt=\"\" class=\"wp-image-68031\" width=\"749\" height=\"375\"><figcaption>Price by brand for the brands we selected.<\/figcaption><\/figure>\n<h2>Lens polarization<\/h2>\n<p>How does lens polarization affect the price of sunglasses? One would expect polarized lenses to command a higher price. Do the numbers bear this out? The graph below shows that, for most brands, polarized sunglasses tend to be more expensive than non-polarized ones. The notable exception is Gucci, where the median price of polarized sunglasses is lower than that of non-polarized sunglasses. This is because many of Gucci&#8217;s most high-end sunglasses are not polarized. The price distribution of Gucci&#8217;s polarized sunglasses is also much more strongly peaked near its median. 
In other words, Gucci&#8217;s non-polarized sunglasses range from relatively cheap pairs to very expensive ones.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115636-131309-xBGqQllC-300x149.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115636-131309-xBGqQllC-600x297.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115636-131309-xBGqQllC-768x380.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115636-131309-xBGqQllC-1024x507.png 1024w\" loading=\"lazy\" alt=\"\" width=\"777\" height=\"385\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115636-131309-xBGqQllC-1024x507.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68033 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115636-131309-xBGqQllC-1024x507.png\" alt=\"\" class=\"wp-image-68033\" width=\"777\" height=\"385\"><\/figure>\n<p>Below are the price distributions, across all brands, of sunglasses with polarized lenses (red, right) and with non-polarized lenses (blue, left). 
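<\/p>
<p>The two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical cumulative distribution functions. In practice <code>scipy.stats.ks_2samp<\/code> computes both the statistic and the p-value; the dependency-free sketch below is only meant to show what the test measures. It uses the asymptotic Kolmogorov distribution with a common small-sample correction, so its p-values are approximate, and it is not necessarily how the original analysis was run.<\/p>
```python
import math
from bisect import bisect_right


def ks_2sample(a, b):
    # Return (D, approximate p-value) for two non-empty samples.
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # The empirical CDFs only jump at observed values, so those are the only
    # points where the largest gap can occur.
    for x in a + b:
        d = max(d, abs(bisect_right(a, x) / n - bisect_right(b, x) / m))
    # Asymptotic p-value: Q(lam) = 2 * sum((-1)**(j - 1) * exp(-2 * j**2 * lam**2)).
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    total, prev, sign = 0.0, 0.0, 1.0
    for j in range(1, 101):
        term = sign * 2.0 * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if 0.001 * prev >= abs(term) or 1e-8 * total >= abs(term):
            return d, min(max(total, 0.0), 1.0)
        sign, prev = -sign, abs(term)
    return d, 1.0  # series did not converge: lam is tiny, so p is close to 1
```
<p>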
The difference between these distributions is highly significant according to a two-sample Kolmogorov\u2013Smirnov test (p-value below 1e-14).<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115721-499441-iXUB9QUV-300x152.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115721-499441-iXUB9QUV-600x304.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115721-499441-iXUB9QUV-768x389.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115721-499441-iXUB9QUV-1024x519.png 1024w\" loading=\"lazy\" alt=\"\" width=\"794\" height=\"400\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115721-499441-iXUB9QUV-1024x519.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68034 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-115721-499441-iXUB9QUV-1024x519.png\" alt=\"\" class=\"wp-image-68034\" width=\"794\" height=\"400\"><\/figure>\n<h2>Further EDA<\/h2>\n<p>Further EDA can be done on this data set. 
For example, the following graphs show price by frame color and price by the face shape each pair is recommended for.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-120902-112401-Cms0oiay-300x150.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-120902-112401-Cms0oiay-600x299.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-120902-112401-Cms0oiay-768x383.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-120902-112401-Cms0oiay-1024x511.png 1024w\" loading=\"lazy\" alt=\"\" width=\"786\" height=\"391\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-120902-112401-Cms0oiay-1024x511.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68035 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-120902-112401-Cms0oiay-1024x511.png\" alt=\"\" class=\"wp-image-68035\" width=\"786\" height=\"391\"><\/figure>\n<figure class=\"wp-block-image size-large is-resized\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121007-033492-WRCNtYD2-300x186.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121007-033492-WRCNtYD2-600x372.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121007-033492-WRCNtYD2-768x476.png 768w, 
https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121007-033492-WRCNtYD2-1024x635.png 1024w\" loading=\"lazy\" alt=\"\" width=\"675\" height=\"418\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121007-033492-WRCNtYD2-1024x635.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68036 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121007-033492-WRCNtYD2-1024x635.png\" alt=\"\" class=\"wp-image-68036\" width=\"675\" height=\"418\"><\/figure>\n<h2>Woot!<\/h2>\n<figure class=\"wp-block-image size-large\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121108-406325-E1mVYVe8-254x300.png 254w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121108-406325-E1mVYVe8-600x708.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121108-406325-E1mVYVe8-768x906.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121108-406325-E1mVYVe8-868x1024.png 868w\" loading=\"lazy\" width=\"868\" height=\"1024\" alt=\"\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121108-406325-E1mVYVe8-868x1024.png\" data-sizes=\"(max-width: 868px) 100vw, 868px\" class=\"wp-image-68037 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"868\" height=\"1024\" 
src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121108-406325-E1mVYVe8-868x1024.png\" alt=\"\" class=\"wp-image-68037\"><\/figure>\n<p>A Scrapy spider was also written to scrape the product details of the sunglasses listed on Woot!, and this data set was joined with the Sunglass Hut data set wherever possible matches could be found. Many pair names contain a model label made of letters and digits, which can be matched across both sites. Further exploration can then determine whether the deals on Woot! are as good as they seem.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121626-994307-CQJ7eC2Q-300x128.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121626-994307-CQJ7eC2Q-600x256.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121626-994307-CQJ7eC2Q-768x327.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121626-994307-CQJ7eC2Q-1024x436.png 1024w\" loading=\"lazy\" alt=\"\" width=\"782\" height=\"332\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121626-994307-CQJ7eC2Q-1024x436.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68038 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121626-994307-CQJ7eC2Q-1024x436.png\" alt=\"\" class=\"wp-image-68038\" width=\"782\" height=\"332\"><\/figure>\n<p>The first pair is an easy match and a good deal.<\/p>\n<figure 
class=\"wp-block-gallery columns-1 is-cropped\">\n<ul class=\"blocks-gallery-grid\">\n<li class=\"blocks-gallery-item\">\n<figure><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121723-833708-zjSaRwVz-300x138.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121723-833708-zjSaRwVz-600x276.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121723-833708-zjSaRwVz-768x353.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121723-833708-zjSaRwVz.png 948w\" loading=\"lazy\" width=\"948\" height=\"436\" alt=\"\" data-id=\"68039\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=68039\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121723-833708-zjSaRwVz.png\" data-sizes=\"(max-width: 948px) 100vw, 948px\" class=\"wp-image-68039 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"948\" height=\"436\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121723-833708-zjSaRwVz.png\" alt=\"\" data-id=\"68039\" data-link=\"https:\/\/nycdatascience.com\/blog\/?attachment_id=68039\" class=\"wp-image-68039\"><\/figure>\n<\/li>\n<\/ul>\n<\/figure>\n<figure class=\"wp-block-image size-large\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121847-476509-lGBoLj4E-300x110.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121847-476509-lGBoLj4E-600x219.png 600w, 
https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121847-476509-lGBoLj4E-768x280.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121847-476509-lGBoLj4E-1024x374.png 1024w\" loading=\"lazy\" width=\"1024\" height=\"374\" alt=\"\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121847-476509-lGBoLj4E-1024x374.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68041 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"1024\" height=\"374\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-121847-476509-lGBoLj4E-1024x374.png\" alt=\"\" class=\"wp-image-68041\"><\/figure>\n<p>Items 0 through 6 in the table above all correspond to Ray-Ban Wayfarer sunglasses in different colors. The exact colors listed on Woot! could not be found on Sunglass Hut. 
See the images below:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-srcset=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-122037-265762-olBgFTXN-300x111.png 300w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-122037-265762-olBgFTXN-600x222.png 600w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-122037-265762-olBgFTXN-768x285.png 768w, https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-122037-265762-olBgFTXN-1024x380.png 1024w\" loading=\"lazy\" width=\"1024\" height=\"380\" alt=\"\" data-src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-122037-265762-olBgFTXN-1024x380.png\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" class=\"wp-image-68043 lazyload\" src=\"image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\"><img loading=\"lazy\" width=\"1024\" height=\"380\" src=\"https:\/\/nycdsa-blog-files.s3.us-east-2.amazonaws.com\/2020\/10\/patrick-starvaggi\/screen-shot-2020-10-09-at-122037-265762-olBgFTXN-1024x380.png\" alt=\"\" class=\"wp-image-68043\"><\/figure>\n<h2>Best use of this data<\/h2>\n<p>This data is rich enough to filter on several features to find sunglasses you like, or to match pairs with Woot! listings to find deals. This kind of brand analysis may be useful if you are opening a shop or thinking of manufacturing sunglasses. There is also a market for reselling sunglasses on sites like Poshmark and TheRealReal. This data could be used to find deals on Woot! that could be resold on Poshmark or TheRealReal, although that would require a further analysis of sales on those websites. 
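<\/p>
<p>The model-label matching described in the Woot! section can be sketched with a regular expression. The pattern below (one to four capital letters followed by three to five digits) is an assumption about what these labels look like, and the function names are hypothetical. As the Wayfarer example shows, a matching label identifies the model but not necessarily the color, so matches still need a manual check.<\/p>
```python
import re
from collections import defaultdict

# Assumed label shape: 1-4 capital letters immediately followed by 3-5 digits.
MODEL_LABEL = re.compile('[A-Z]{1,4}[0-9]{3,5}')


def model_labels(name):
    # Candidate model labels found in a product name.
    return set(MODEL_LABEL.findall(name.upper()))


def match_deals(woot_items, hut_items):
    # Items are (name, price) tuples. Returns one tuple per candidate match:
    # (label, woot_name, hut_name, woot_price, hut_price, percent_saved).
    index = defaultdict(list)
    for name, price in hut_items:
        for label in model_labels(name):
            index[label].append((name, price))
    matches = []
    for w_name, w_price in woot_items:
        for label in sorted(model_labels(w_name)):
            for h_name, h_price in index.get(label, []):
                saved = round(100 * (1 - w_price / h_price), 1)
                matches.append((label, w_name, h_name, w_price, h_price, saved))
    return matches
```
<p>A positive <code>percent_saved<\/code> means the Woot! price is below the Sunglass Hut price for that candidate match.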
<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/nycdatascience.com\/blog\/student-works\/web-scraping-product-details-from-sunglass-hut-and-woot\/<\/p>\n","protected":false},"author":0,"featured_media":3249,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/3248"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=3248"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/3248\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/3249"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=3248"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=3248"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=3248"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}