[June 11th, 2020: edited for grammar/typos, and to generally replace the phrase “last two weeks” with “previous two weeks”]

The COVID-19 data visualizations that I’ve seen tend to be top-10 based or search-box based, inviting the user to focus on a small, exclusive area, or on State or national statistics. Neither conveys an image of whether COVID-19 spread is winding down across the country, or whether there may still be new infections and hot spots.

If we look only at large cities, or scale our view to the number of COVID-19 cases nation-wide or per-State, the effects on less population-dense areas of the country get obscured. If COVID-19 cases are on the rise anywhere at all, even in a place with low population density, then it only takes spreaders traveling to more population-dense areas for cases to rise again. It’s the “ember” theory: as long as there is an un-managed spark out there somewhere, the risk of reigniting the COVID-19 fire is present.

There are alternate views being expressed: that forced isolation and lock-downs are the product of either an excess of caution or of ulterior goals to confiscate freedoms. For better or worse, those theories are about to be tested. Arguments about predictions can’t begin to approach the weight of the evidence that is due within a couple of weeks after thousands of Americans went out and gathered in close proximity, in great numbers. We need not predict anything, because the experiment has already been conducted. We only need to observe results.

The Data

In March 2020, I was handed a reference to COVID-19 data being curated by the New York Times, and I thought “I’ll get back to that.”

The New York Times data for the US and territories is organized by county-equivalents, along the same lines as the US Bureau Of The Census divides things, with some adjustments. For example, the counties containing the 5 boroughs of New York City are listed together as “New York City”; pieces of counties around Kansas City, MO, are divided between “in KC” and “not in KC”; Alameda County, CA includes a cruise ship; Guam includes an aircraft carrier; etc. I’ll just refer to county-equivalents as “counties”.

Of course, the 3,138 county case counts considered here for each day of data have diverse origins, the numbers may drift as testing becomes more prevalent, etc. Whether or not they precisely reflect every case in a given county can’t be known, but I think it’s probable that when COVID-19 comes to town, people notice, and some sort of metric emerges.

Is The Disease Spreading, Or Receding?

The following 2 graphs show daily counts from counties across the US, categorized by their average cases-per-day over the previous two weeks. The first graph shows the 695 counties across the US whose population is greater than their State’s average county population.
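
The “Large” versus “small” split described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated (a county is “Large” when its population is more than its State’s average county population), not the spreadsheet formula that actually drives the graphs. The function name `split_large_small`, the input layout, and the State and county names and populations in the example are all made up for the sketch.

```python
from collections import defaultdict

def split_large_small(populations):
    """Split counties into (large, small) sets.

    populations: dict mapping (state, county) -> population.
    A county is "large" when its population is greater than the
    average county population of its own State.
    """
    by_state = defaultdict(list)
    for (state, _county), pop in populations.items():
        by_state[state].append(pop)

    # Average county population per State
    state_mean = {s: sum(p) / len(p) for s, p in by_state.items()}

    large = {key for key, pop in populations.items() if pop > state_mean[key[0]]}
    small = set(populations) - large
    return large, small

# Hypothetical numbers, for illustration only
example = {
    ("State A", "County 1"): 900_000,
    ("State A", "County 2"): 50_000,
    ("State A", "County 3"): 10_000,
    ("State B", "County 4"): 100_000,
    ("State B", "County 5"): 2_000,
}
large, small = split_large_small(example)
```

Because an average is pulled upward by a few very populous counties, this rule tends to put far more counties in the “small” group than in the “Large” group.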

The graphs below are somewhat complex, and I arrived at these visualizations over about a week of thinking about it, so I’ll do my best to catch you up. Consider the first, “Large” county graph:

  • The vertical axis shows number of counties
  • Each color band shows the number of counties with a given average new-cases-per-day count.
  • The horizontal axis is time, in days, from the first COVID-19 reports in the data set
  • The categories are stacked, so the total height is always 695, the total number of “Large” counties being counted.
  • The Blue Band at the bottom is number of counties with no reported data.
  • The Red Band, 2nd from the bottom, is number of counties with reports, which don’t have 2 weeks of data yet.
  • The Gold Band, 3rd from the bottom, is number of counties with no new cases for 2 weeks
  • All bands above the gold band are numbers of counties with new COVID-19 cases, showing average cases-per-day for the previous 14 days, grouped by 20s
      • For example
        • the green band represents the number of counties which averaged more-than-zero to 20 new cases per day, over the previous 14 days.
        • the orange band shows 20-40 cases per day over the previous 14 days
        • etc.
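
The banding rules in the list above can be sketched roughly as follows. This is my reading of the rules, not the actual spreadsheet logic: the names `band_for` and `band_counts`, the input layout (a list of daily new-case counts per county, plus the day index of its first report), and the sample counties are all assumptions made for illustration. I also assume that a band’s upper edge belongs to that band, so an average of exactly 20 lands in the first band; the text doesn’t say which way the edges fall.

```python
import math
from collections import Counter

WINDOW = 14      # days in the trailing average
BAND_WIDTH = 20  # cases-per-day covered by each band

def band_for(new_cases, first_report, day):
    """Return the band label for one county on one day.

    new_cases: list of daily new-case counts, indexed by day number
    first_report: day index of the county's first report, or None
    """
    if first_report is None or day < first_report:
        return "no data"              # Blue Band
    if day - first_report + 1 < WINDOW:
        return "under 2 weeks"        # Red Band
    window = new_cases[day - WINDOW + 1 : day + 1]
    avg = sum(window) / WINDOW
    if avg <= 0:
        return "no new cases"         # Gold Band
    # Assumed edges: band 1 is (0, 20], band 2 is (20, 40], and so on
    k = math.ceil(avg / BAND_WIDTH)
    return f"{(k - 1) * BAND_WIDTH}-{k * BAND_WIDTH}"

def band_counts(counties, day):
    """Count how many counties fall into each band on a given day."""
    return Counter(band_for(c["new_cases"], c["first_report"], day) for c in counties)

# Hypothetical counties, 20 days of data each
counties = [
    {"first_report": None, "new_cases": [0] * 20},            # never reported
    {"first_report": 15,   "new_cases": [0] * 15 + [3] * 5},  # recent first report
    {"first_report": 0,    "new_cases": [1] + [0] * 19},      # quiet for 14 days
    {"first_report": 0,    "new_cases": [5] * 20},            # averages 5 per day
    {"first_report": 0,    "new_cases": [30] * 20},           # averages 30 per day
]
counts = band_counts(counties, day=19)
```

Every county lands in exactly one band on each day, which is why the stacked bands always add up to the same total height.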

Clicking the graphs below will open a new tab with a full-sized, interactive version with a legend.

The above “large county” graph shows several important things, as it reaches June 6th.

  • First, only a negligible number of large US counties are not reported in this data set.
  • Second, the data has matured to the point that there are almost no large counties which have never had COVID-19. The Red Band shows counties which only started reporting any cases within the last 14 days. The Red Band represents disease spread to previously uninfected counties.
  • Third, the Gold Band, a count of counties with no new cases for two weeks, has not grown since mid-April. The Gold Band represents recovery. When it displaces all of the bands above it, the pandemic has ended in the United States and its territories. Of course, the risk of outside cases will still exist for some time.

Below is a similar graph with “small” US county-equivalents.

The picture for “small” counties is different. Small counties are almost 4 times as numerous as large counties. Several troubling indications are apparent as it reaches June 6.

  • The Red Band (new disease spread) continues to have area, meaning that COVID-19 has been steadily spreading to previously-uninfected counties within the previous two weeks.
  • The colored bands above occupy a growing proportion of the height, meaning that the number of counties with new cases each day has been growing.
  • The Gold Band (recovery) has been shrinking for more than a week. I confirmed this in the source data: on June 4th, there were 372 small counties with no new cases over 14 days, down from the high count of 397 on May 28th.

All of the graphs in this post should reflect new data when I update the Google Sheet that drives them, but next I’d like to make a truly live graph, and then do all of the same with global numbers from Our World In Data.

Content – Copyright 2020 – D. A. Whinery 
