{"id":1572,"date":"2020-09-15T12:25:47","date_gmt":"2020-09-15T12:25:47","guid":{"rendered":"https:\/\/data-science.gotoauthority.com\/2020\/09\/15\/visualization-of-covid-19-new-cases-over-time-in-python\/"},"modified":"2020-09-15T12:25:47","modified_gmt":"2020-09-15T12:25:47","slug":"visualization-of-covid-19-new-cases-over-time-in-python","status":"publish","type":"post","link":"https:\/\/wealthrevelation.com\/data-science\/2020\/09\/15\/visualization-of-covid-19-new-cases-over-time-in-python\/","title":{"rendered":"Visualization Of COVID-19 New Cases Over Time In Python"},"content":{"rendered":"<div id=\"post-\">\n<p><b>By <a href=\"https:\/\/www.linkedin.com\/in\/jasonbowlingoh\/\" target=\"_blank\" rel=\"noopener noreferrer\">Jason Bowling<\/a>, Manager, Network Communications at University of Akron<\/b><\/p>\n<div>\n<a href=\"https:\/\/i.ibb.co\/p3B8VZR\/bowling-covid-full.png\" rel=\"noopener noreferrer\" target=\"_blank\"><img src=\"https:\/\/i.ibb.co\/KjLXzr8\/bowling-covid-smaller.jpg\" alt=\"Figure\" width=\"100%\"><\/a><br \/><span><\/p>\n<p>Heat map of new COVID-19 cases per 100K of population, by day<\/p>\n<p><\/span>\n<\/div>\n<p>\u00a0<\/p>\n<p>This heat map shows the progression of the COVID-19 pandemic in the United States over time. The map is read from left to right (earliest to latest date) and is color-coded to show the relative numbers of new cases by state, adjusted for population.<\/p>\n<p>This visualization was inspired by a similar heat map that I saw on a discussion forum thread. I could never locate the source, as it was only a pasted image with no link. The original version was also crafted to make a political point, separating states by predominant party affiliation, which I was less interested in. 
I was fascinated by how concisely it showed the progression of the pandemic, so I decided to create a similar visualization myself that I could update regularly.<\/p>\n<p>Source code is hosted on my\u00a0<a href=\"https:\/\/github.com\/JasonRBowling\/covid19NewCasesPer100KHeatmap\" rel=\"noopener noreferrer\" target=\"_blank\">GitHub repo<\/a>. If you are just interested in seeing updated versions of this heat map, I publish them weekly on my\u00a0<a href=\"https:\/\/twitter.com\/JRBowling\" rel=\"noopener noreferrer\" target=\"_blank\">Twitter feed<\/a>. Be careful when comparing heat maps from different weeks: the color scale is fitted to the data in each image, so it may shift as new data is included. Comparisons are only valid within a given heat map.<\/p>\n<p>The script relies on pandas, NumPy, Matplotlib, and seaborn.<\/p>\n<p>The data comes from the\u00a0<a href=\"https:\/\/github.com\/nytimes\/covid-19-data\" rel=\"noopener noreferrer\" target=\"_blank\">New York Times COVID-19 GitHub repo<\/a>. A simple launcher script clones the latest copy of the repository, copies the required file, and then launches the Python script to create the heat map. Only one file is really needed, so it could certainly be tightened up, but this works.<\/p>\n<div>\n<pre>#!\/bin\/sh\r\n# stop at the first failing command instead of plotting stale or missing data\r\nset -e\r\n\r\necho \"Clearing old data...\"\r\nrm -rf covid-19-data\/\r\nrm -f us-states.csv\r\necho \"Getting new data...\"\r\n# a shallow clone is enough, since only the latest snapshot is needed\r\ngit clone --depth 1 https:\/\/github.com\/nytimes\/covid-19-data\r\necho \"Done.\"\r\n\r\ncp covid-19-data\/us-states.csv .\r\necho \"Starting...\"\r\n\r\npython3 heatmap-newcases.py\r\necho \"Done.\"<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>The script first loads a CSV file containing the state populations into a dictionary, which is used to scale daily new case results. 
The new cases for each day are computed from the running (cumulative) totals in the NY Times data by subtracting the previous day's total, and then\u00a0<a href=\"https:\/\/www.robertniles.com\/stats\/percap.shtml\" rel=\"noopener noreferrer\" target=\"_blank\">scaled to new cases per 100,000 people<\/a>\u00a0in the population.<\/p>\n<p>We could display the heat map at that point, but if we do, states with very high numbers of cases per 100,000 people will swamp the detail of the states with lower numbers of cases. Applying a\u00a0<a href=\"http:\/\/onbiostatistics.blogspot.com\/2012\/05\/logx1-data-transformation.html#:~:text=A%3A%20log(x%2B1,in%20which%20x%20was%20measured.\" rel=\"noopener noreferrer\" target=\"_blank\">log(x+1)<\/a>\u00a0transform (base 10 in the script below) compresses that wide range and improves contrast and readability significantly. The +1 keeps days with zero new cases defined, since log(0) is undefined, and maps them to 0.<\/p>\n<p>Finally, seaborn and Matplotlib are used to generate the heat map and save it to an image file.<\/p>\n<p>That\u2019s it! Feel free to use this as a framework for your own visualization. You can customize it to zero in on areas of interest.<\/p>\n<p>Full source code is below. 
Thanks for reading, and I hope you found it useful.<\/p>\n<div>\n<pre>import numpy as np\r\nimport seaborn as sns\r\nimport matplotlib.pyplot as plt\r\nimport pandas as pd\r\nimport csv\r\nimport datetime\r\n\r\n# map each state name to the remaining columns of its row (population first)\r\nwith open('StatePopulations.csv', newline='') as f:\r\n    statePopulations = {row[0]: row[1:] for row in csv.reader(f)}\r\n\r\nfilename = \"us-states.csv\"\r\nfullTable = pd.read_csv(filename)\r\nfullTable = fullTable.drop(columns=['fips', 'deaths'])\r\n\r\n# generate a list of the dates and states in the table\r\ndates = fullTable['date'].unique().tolist()\r\nstates = fullTable['state'].unique().tolist()\r\n\r\n# one row per date; each state's column is merged in below\r\nresult = pd.DataFrame({'date': dates})\r\n\r\n# keep the 50 states and DC; filtering (rather than list.remove())\r\n# avoids a ValueError if a territory is missing from the data\r\nterritories = {'Northern Mariana Islands', 'Puerto Rico', 'Virgin Islands', 'Guam'}\r\nstates = sorted(s for s in states if s not in territories)\r\n\r\nfor state in states:\r\n    population = int(statePopulations[state][0])\r\n    print(state + \": \" + str(population))\r\n    # copy() so that adding a column does not write into a view of fullTable\r\n    stateData = fullTable[fullTable.state.eq(state)].copy()\r\n\r\n    # daily new cases = change in the cumulative total; the first reported\r\n    # day has no previous row, so its total counts as new cases.\r\n    # Data corrections can make the change negative, so clip at 0.\r\n    stateData[state] = stateData['cases'].diff().fillna(stateData['cases']).clip(lower=0)\r\n\r\n    # scale to new cases per 100,000 people\r\n    stateData[state] = stateData[state].div(population).mul(100000.0)\r\n\r\n    result = pd.merge(result, stateData[['date', state]], how='left', on='date')\r\n\r\n# dates before a state's first reported case have no row: treat them as 0\r\nresult = result.fillna(0)\r\n\r\n# log10(x + 1): zero stays zero, and the wide range is compressed\r\nresult[states] = np.log10(result[states] + 1.0)\r\n#result[states] = np.sqrt(result[states])\r\n\r\nresult['date'] = 
pd.to_datetime(result['date'])\r\nresult = result[result['date'] &gt;= '2020-02-15'].copy()\r\nresult['date'] = result['date'].dt.strftime('%Y-%m-%d')\r\n\r\nresult.set_index('date', inplace=True)\r\nresult.to_csv(\"result.csv\")\r\n# states become rows and dates become columns\r\nresult = result.transpose()\r\n\r\nplt.figure(figsize=(16, 10))\r\ng = sns.heatmap(result, cmap=\"coolwarm\", linewidths=0.05, linecolor='lightgrey')\r\nplt.xlabel('')\r\nplt.ylabel('')\r\n\r\nplt.title(\"Daily New Covid-19 Cases Per 100k Of Population\", fontsize=20)\r\n\r\n# the parentheses let the expression continue onto the next line\r\nupdateText = (\"Updated \" + str(datetime.date.today()) +\r\n    \". Scaled with Log(x+1) for improved contrast due to wide range of values. Data source: NY Times GitHub. Visualization by @JRBowling\")\r\n\r\nplt.suptitle(updateText, fontsize=8)\r\n\r\n# one tick centered on each row, labeled with its state name\r\nplt.yticks(np.arange(len(states)) + 0.5, states)\r\n\r\nplt.yticks(fontsize=8)\r\nplt.xticks(fontsize=8)\r\ng.set_xticklabels(g.get_xticklabels(), rotation=90)\r\ng.set_yticklabels(g.get_yticklabels(), rotation=0)\r\n# bbox_inches='tight' keeps the rotated date labels inside the saved image\r\nplt.savefig(\"covidNewCasesper100K.png\", bbox_inches='tight')<\/pre>\n<\/div>\n<p>\u00a0<\/p>\n<p>\u00a0<br \/><b>Bio: <a href=\"https:\/\/www.linkedin.com\/in\/jasonbowlingoh\/\" target=\"_blank\" rel=\"noopener noreferrer\">Jason Bowling<\/a><\/b> is Manager of Network Communications at University of Akron. Jason is a proven technology professional with a focus on network administration, security and medical device design. He has outstanding troubleshooting skills, excellent written communication, and established project management experience. You can find <a href=\"https:\/\/medium.com\/@kb8rnu\" rel=\"noopener noreferrer\" target=\"_blank\">more of his writing on Medium<\/a>.<\/p>\n<p><a href=\"https:\/\/towardsdatascience.com\/visualization-of-covid-19-new-cases-over-time-in-python-8c6ac4620c88\" target=\"_blank\" rel=\"noopener noreferrer\">Original<\/a>. 
Reposted with permission.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>https:\/\/www.kdnuggets.com\/2020\/09\/visualization-covid-19-new-cases-over-time-python.html<\/p>\n","protected":false},"author":0,"featured_media":1573,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1572"}],"collection":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/comments?post=1572"}],"version-history":[{"count":0,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/posts\/1572\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media\/1573"}],"wp:attachment":[{"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/media?parent=1572"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/categories?post=1572"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wealthrevelation.com\/data-science\/wp-json\/wp\/v2\/tags?post=1572"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}