Introduction

This R markdown document describes a portion of the data analysis for a reporting project examining the effects of climate-change driven temperature increases on the health of people who live in cities. The project was done in partnership with the University of Maryland Philip Merrill College of Journalism, Capital News Service, the Howard Center for Investigative Journalism, NPR, Wide Angle Youth Media and WMAR. It also moved on the Associated Press wire.

For each sentence in the story “Heat & Inequality: In Baltimore, the burden of rising temperatures isn’t shared” based on Howard Center data analysis, this document provides the original fact, the code and code output that support that fact, and an explanation where necessary.

Here are links to stories in the series published by participating organizations:

CNSMaryland

NPR

WMAR

Associated Press

Line-By-Line Fact-Check

Fact: 11-Day Heat Wave [cq]

“So as a dangerous 11-day heat wave tormented the city in July, the hottest month ever recorded on the planet, fewer and fewer residents were going outside.”

Explanation [cq]

Over an 11-day period in July 2019 – July 12 through July 22 – each day during that stretch had a maximum temperature of at least 90 degrees F and maximum heat index values of at least 92 degrees F. This was measured at a National Weather Service monitoring station located in the Inner Harbor.

Supporting code and output [cq]

 dmh %>%
  filter(month == 7,
         year == 2019,
         day >= 12,
         day <= 22) %>%
  group_by(`date`) %>%
  summarise(min_temp = min(avg_hourly_temperature_dmh),
            max_temp = max(avg_hourly_temperature_dmh),
            mean_temp = mean(avg_hourly_temperature_dmh),
            min_heat_index = min(avg_hourly_heat_index_dmh),
            max_heat_index = max(avg_hourly_heat_index_dmh),
            mean_heat_index = mean(avg_hourly_heat_index_dmh)
  ) %>%
  select(date, max_temp, max_heat_index)

Fact: Temperature inside and outside Tammy Jackson’s home [cq]

“Can’t even put your head out the door,” said Tammy Jackson, 48, on a day when the temperature outside hit 100 degrees Fahrenheit and 92 degrees in her home. “This is too much. Oh Lord, this is too much.”

Explanation [cq]

Tammy Jackson, a resident of Baltimore’s McElderry Park neighborhood, had a temperature and humidity sensor installed in her house by our reporters. In her home, the highest temperature reading on Saturday, July 20, the day referenced in the story, was 92 degrees (91.6 degrees F).

For outdoor temperature, in most of our analysis, we used hourly snapshot readings taken at a National Weather Service monitoring station in the Inner Harbor. As the table below shows, the highest hourly reading in that data set on July 20 was 99 degrees. But the NWS official daily summary for that monitoring station – which incorporates all readings taken, not just hourly snapshots – indicates it did hit 100 degrees at some point on July 20. The data can be viewed here.

Supporting code and output [cq]

# Max temperature in Jackson's House on July 20

tammy_day_hourly_averages %>%
  mutate(date = date(date_hour)) %>%
  filter(date == "2019-07-20") %>%
  group_by(date) %>%
  summarise(max_indoor_temperature = max(mean_indoor_temperature))
# Max temperature outside on July 20
dmh %>%
 filter(month == 7,
         year == 2019,
        day == 20) %>%
  group_by(`date`) %>%
  summarise(max_temp = max(avg_hourly_temperature_dmh))

Fact: Outdoor temperature in Baltimore on August 1, 2019 at 11:20 a.m. [cq]

“The graphic below shows temperatures in the 500 block of North Milton Street in McElderry Park at 11:20 a.m. on a summer morning in August.”

Explanation [cq]

The date and time referenced in the graphic was 11:20 a.m. on August 1, 2019. To get the outdoor temperature we used the NWS hourly snapshots at the Inner Harbor monitoring station. In the 11 a.m. hour, it was 87.1 degrees.

Supporting code and output [cq]

dmh %>%
 filter(month == 8,
         year == 2019,
         day == 1,
         hour == 11) %>%
  select(date, avg_hourly_temperature_dmh)

Fact: Average annual temperature increase in U.S., Baltimore [cq]

“Average annual temperatures in Baltimore have gone up more than 3 degrees over the last century, nearly twice as much as the rest of the country.”

Explanation [cq]

To compute this average annual temperature change over the last century, we pulled historical average annual temperatures between 1895 and 2018, and the amount each annual average departed from the overall mean temperature during that period. The data for Baltimore and the U.S. came from the National Climactic Data Center. We used linear regression to examine the trend and produce a line of best fit through plotted points. We took the slope of the line of best fit and multiplied it by the number of years to arrive at the temperature increase over that period for Baltimore (3.4 degrees F), the U.S. (1.9 degrees F) and the difference between the two. Baltimore increased by 3.4 degrees F, 80 percent (nearly twice as much) as the U.S. (1.9 percent). A note: we got the idea to use this method from the excellent Washington Post interactive “2°C: BEYOND THE LIMIT”, which arrived at the same figures as we did for Baltimore and the U.S.

Supporting code and output [cq]

# Create linear model showing trend of change in annual average temperature change 1895-2018 for the US
us_lm <- lm(anomaly ~ date, data = us_change_temp)

# Get the slope of the line of best fit from the linear model
us_slope <- tidy(us_lm) %>%
  filter(term == "date") %>%
  mutate(location = "us") %>%
  select(location, estimate) %>%
  rename(slope = estimate)

# Get the number of yearly observations from original data
us_years_count <- us_change_temp %>%
  # group_by(date) %>%
  summarise(years_count= n()) %>%
  mutate(location = "us") %>%
  select(location, years_count)

# Calculate degrees change
us_historical_change <- us_years_count %>%
  left_join(us_slope, by = "location") %>%
  mutate(change_degrees = years_count * slope)

# Create linear model showing trend of change in annual average temperature change 1895-2018 for Baltimore
baltimore_lm <- lm(anomaly ~ date, data = baltimore_change_temp)

# Get the slope of the line of best fit from the linear model
baltimore_slope <- tidy(baltimore_lm) %>%
  filter(term == "date") %>%
  mutate(location = "baltimore") %>%
  select(location, estimate) %>%
  rename(slope = estimate)

# Get the number of yearly observations from original data
baltimore_years_count <- baltimore_change_temp %>%
  # group_by(date) %>%
  summarise(years_count= n()) %>%
  mutate(location = "baltimore") %>%
  select(location, years_count)

# Calculate degrees change
baltimore_historical_change <- baltimore_years_count %>%
  left_join(baltimore_slope, by = "location") %>%
  mutate(change_degrees = years_count * slope)

# Bind together US and Baltimore
us_baltimore <- bind_rows(baltimore_historical_change, us_historical_change)

# Show change degrees and diference between two values as percentage
difference <- us_baltimore %>%
  select(location, change_degrees) %>%
  tidyr::spread(location, change_degrees) %>%
  mutate(difference = (baltimore-us)/us)

difference

Fact: Projected increase in hot days in Baltimore [cq]

“And the planet’s warming has gained momentum, say researchers who estimate the number of very hot days in Baltimore could increase six-fold by the middle of the century.”

Explanation [cq]

The Union of Concerned Scientists [Killer Heat in the United States] study released this summer included detailed tables projecting the number of days of 100+ heat index days for most U.S. cities, and projected how many additional days of 100+ heat index days there would be by mid-century. They found there were six days of 100+ heat index days per year historically in Baltimore, and by mid-century there would be 37 days, a 6x increase. Note: our own analysis of historical Baltimore heat index data at a NWS Inner Harbor monitoring station found that the 6 days per year figure was likely a conservative estimate.

Supporting code and output [cq]

heat_index_projections %>%
  select(state, city, historical_100_plus, midcentury_no_action_100_plus) %>%
  filter(state == "MD", city == "Baltimore") %>%
  mutate(increase_x = midcentury_no_action_100_plus/historical_100_plus)

Fact: Heat index on first floor of Tammy Jackson’s home [cq]

“Photo caption: The heat index on the first floor of Tammy Jackson’s McElderry Park home registered 93 degrees, 9 degrees hotter than it was outside, at 10 p.m. Sunday, July 21. Jackson has several grandchildren with asthma.”

Explanation [cq]

Tammy Jackson, a resident of Baltimore’s McElderry Park neighborhood, had a temperature and humidity sensor installed in her house by our reporters. In her home, at 10 p.m. on Sunday, July 21, the heat index was 93 degrees F, 9 degrees hotter than the outside heat index of 84 degrees.

Supporting code and output [cq]

tammy_day_hourly_averages %>%
  mutate(date = date(date_hour)) %>%
  mutate(hour = hour(date_hour)) %>%
  filter(date == "2019-07-21") %>%
  filter(hour == 22) %>%
  select(date, mean_indoor_heat_index, mean_outdoor_heat_index, indoor_heat_index_difference)

Fact: 8 Degree F difference between hottest and coolest neighborhoods [cq]

“Researchers at Portland State University in Oregon and the Science Museum of Virginia have mapped these areas, called urban heat islands, and data shows that temperatures here and in surrounding neighborhoods can run 8 degrees F hotter than in communities that have more trees and less pavement.”

Explanation [cq]

The researchers took detailed measurements of Baltimore’s urban heat island on August 29, 2018, and found differences across the city. Using the mean afternoon temperature for each neighborhood (excluding the representation of Leakin Park in our data – we found the coolest neighborhood was Dickeyville (91 degrees F) and the hottest was McElderry Park (99.4 degrees F), a difference of 8.4 degrees F.

Supporting code and output [cq]

nsa_tree_temp %>%
  select(nsa_name, temp_mean_aft) %>%
  filter(nsa_name != "gwynns falls/leakin park") %>%
  filter((temp_mean_aft == min(temp_mean_aft)) | (temp_mean_aft == max(temp_mean_aft))) %>%
  arrange(desc(temp_mean_aft)) %>%
  tidyr::spread(nsa_name, temp_mean_aft) %>%
  mutate(difference_coolest_hottest = `mcelderry park`-`dickeyville`)

Fact: Heat and poverty, life expectancy, crime, unemployment [cq]

“Graphic Caption: People who live in the hottest parts of the city are more likely to be poor, to live shorter lives, and to experience higher rates of violent crime and unemployment.”

Explanation [cq]

In Baltimore’s “community statistical areas”, we examined the relationship between heat (mean afternoon temperature in our urban heat island data) and poverty, life expectancy, unemployment rates and violent crime by computing the correlation coefficient (r) for each metric. An r of 1 would indicate a perfect positive linear relationship and an r of -1 would indicate a perfect negative linear relationship and 0 indicating no relationship. There were moderate correlations with poverty (r =.4), violent crime (r=.58), life expectancy (r=-.41), and the unemployment rate (r=.32).

A table with the values used in the graphic is also displayed below. The same table is used in the scatterplot graphics that appear lower in the story.

Supporting code and output [cq]

# Correlation coefficients
csa_tree_temp_demographics %>%
  select_if(is.numeric) %>%
  as.matrix() %>%
  correlate() %>%
  focus(matches("temp_")) %>%
  mutate(variable=rowname) %>%
  select(variable, temp_mean_aft) %>%
  filter(str_detect(variable, "households_living|violent|life|unemployment"))
# Graphic table
csa_tree_temp_demographics %>%
  select(csa2010, temp_mean_aft, matches("households_living|violent|life|unemployment"))

Fact: McElderry Park is the hottest neighborhood in Baltimore [cq]

“McElderry Park, which despite its lyrical name offers little green space, is one of these: the hottest neighborhood in Baltimore, a city whose climate has long been classified as humid subtropical.”

Explanation [cq]

Using the mean afternoon temperature in the urban heat island study, McElderry Park was the city’s hottest neighborhood, with a mean afternoon temperature of 99.4 degrees F.

Supporting code and output [cq]

nsa_tree_temp %>%
  select(nsa_name, temp_mean_aft) %>%
  arrange(desc(temp_mean_aft))

Fact: Heat and chronic illness [cq]

“Residents in the hottest areas have higher rates of chronic illnesses affected by heat, including asthma and COPD.”

Explanation [cq]

We used condition prevalence rates from inpatient hospital admissions and emergency room visits, which are only available by ZIP code, and the median afternoon temperature of Baltimore ZIP codes, as calculated in the urban heat island data to examine relationships. The correlation coefficient for people diagnosed with asthma during hospital visits (r=.56) and emergency room visits (r=.49), and people diagnosed with COPD during hospital visits (r=.67) and emergency room visits (r=.57), indicated a moderate to strong positive relationship between heat and asthma and copd rates. This is not causal. In hotter areas, rates of asthma and copd are higher, and vice versa.

Supporting code and output [cq]

ip_full_zip_correlation_matrix %>%
    filter(str_detect(rowname, "asthma|copd")) %>%
    select(rowname, temp_median_aft)
op_er_full_zip_correlation_matrix %>%
    filter(str_detect(rowname, "asthma|copd")) %>%
    select(rowname, temp_median_aft)

Fact: Frequency of EMS Calls Increase with Temperature [cq]

“In hot weather, emergency medical calls for some chronic conditions increase. The rate of emergency medical calls for cardiac arrest and congestive heart failure, for example, nearly double when the heat index hits 103 degrees.”

Explanation [cq]

By merging a dataset of EMS calls with a dataset of hourly Baltimore heat index values, we were able to determine the heat index at the time of each EMS call during summer 2018. We looked at select conditions affected by heat, and compared how call rates changed when it was very hot – over 103 heat index, a level defined by NWS as "dangerous – and when the heat index was under 80. The second and third columns below reflect the number of hours that passed between calls (on average) when the heat index was below 80 degrees or above 103 degrees.

For example, when the heat index was above 103, there was a call for cardiac arrest every 2.94 hours. When the heat index was under 80, the calls happened less frequently, every 5.3 hours. That meant there were 80 percent more calls per day in very hot weather for cardiac arrest. There were 70 percent more calls for congestive heart failure. There were 345x more calls for heat exhaustion in hot weather and 19x more calls for dehydration.

Supporting code and output [cq]

# Select conditions
conditions <- c("Heat Exhaustion/Heat Stroke", "Dehydration","Respiratory Distress", "COPD (Emphysema/Chronic Bronchitis)", "End Stage Renal Disease", "Diabetic Hyperglycemia", "Diabetic Hypoglycemia", "Cardiac Arrest", "CHF (Congestive Heart Failure)")

# Calculate the total number of hours over the course of Summer 2018 that the heat index fell into each heat index level, as defined by the national weather service: not unsafe (under 80), caution (80-89), extreme caution (90-102), danger (103-124).   

heat_index_count_per_nws_five_scale_bucket <- dmh_ems %>%
  select(heat_index_nws_five_scale_bucket) %>%
  group_by(heat_index_nws_five_scale_bucket) %>%
  summarise(heat_index_count_per_nws_five_scale_bucket=n()) %>%
  arrange(heat_index_nws_five_scale_bucket)

# For each target condition, calculate the number of hours between calls at each temperature level.  This metric allows us to account for the fact that simply counting calls in each bucket would be flawed, because it wouldn't adjust for the rarity of very hot temperatures.

EMS_all %>%
  filter(primary_impression_group %in% conditions) %>%
  group_by(primary_impression_group, adjusted_heat_index_nws_five_scale_bucket) %>%
  summarise(condition_calls_count_per_bucket=n()) %>%
  inner_join(heat_index_count_per_nws_five_scale_bucket, by = c("adjusted_heat_index_nws_five_scale_bucket" = "heat_index_nws_five_scale_bucket")) %>%
  mutate(hours_per_call = heat_index_count_per_nws_five_scale_bucket/condition_calls_count_per_bucket) %>%
  select(primary_impression_group, adjusted_heat_index_nws_five_scale_bucket, hours_per_call) %>%
  tidyr::spread(adjusted_heat_index_nws_five_scale_bucket, hours_per_call) %>%
  select(primary_impression_group, `not_unsafe_under_80`,`danger_103_124`) %>%
  mutate(`calls_per_day_under_80` = 24/`not_unsafe_under_80`) %>%
  mutate(`calls_per_day_over_103` = 24/`danger_103_124`) %>%
  mutate(difference_day_percent = ((`calls_per_day_over_103`-`calls_per_day_under_80`)/`calls_per_day_under_80`))

Fact: Hot neighborhoods have lower incomes [cq]

“The city’s hottest areas are poorer, which means the residents don’t have the resources to move out.”

Explanation [cq]

In Baltimore’s “community statistical areas”, we examined the relationship between heat (mean afternoon temperature in our urban heat island data) and poverty. There were moderate correlations between poverty and heat (r =.4).

Supporting code and output [cq]

# Correlation coefficients
csa_tree_temp_demographics %>%
  select_if(is.numeric) %>%
  as.matrix() %>%
  correlate() %>%
  focus(matches("temp_")) %>%
  mutate(variable=rowname) %>%
  select(variable, temp_mean_aft) %>%
  filter(str_detect(variable, "family_households"))

Fact: More affluent communities have more trees [cq]

“The streets have fewer trees than those in more affluent communities.”

Explanation [cq]

In Baltimore “community statistical areas”, there is a moderate negative correlation between an area’s tree canopy cover and the area’s poverty rate, with a correlation coefficient (r) of -.34. Generally, the poorer the area, the fewer trees it will have, and vice versa.

Supporting code and output [cq]

 csa_tree_temp_demographics %>%
  select_if(is.numeric) %>%
  as.matrix() %>%
  correlate() %>%
  focus(matches("15_lid_mean")) %>%
  mutate(variable=rowname) %>%
  select(variable, `15_lid_mean`) %>%
  filter(variable=="percent_of_family_households_living_below_the_poverty_line")

Fact: Crime rates higher in poorer neighborhoods [cq]

“Crime rates are higher, so many people won’t put an air-conditioning unit in a first-floor window for fear of break-ins.”

Explanation [cq]

In Baltimore “community statistical areas”, there is a moderate positive correlation between an area’s violent crime rate and the area’s poverty rate, with a correlation coefficient (r) of .43, where 1 would reference a perfect positive correlation. Generally, the poorer the area, the more violent crime, and vice versa.

Supporting code and output [cq]

 csa_tree_temp_demographics %>%
  select_if(is.numeric) %>%
  as.matrix() %>%
  correlate() %>%
  focus(matches("percent_of_family_households_living_below_the_poverty_line")) %>%
  mutate(variable=rowname) %>%
  select(variable, percent_of_family_households_living_below_the_poverty_line) %>%
  filter(str_detect(variable, "violent")) %>%
  rename(poverty_rate = percent_of_family_households_living_below_the_poverty_line)

Fact: Highest temperature, humidity sensor values [cq]

“Reporters from the University of Maryland’s Howard Center for Investigative Journalism and Capital News Service placed sensors that record heat and humidity inside several homes in McElderry Park and nearby neighborhoods. Those sensors recorded temperatures that reached as high as 97 degrees and heat index values of 119 degrees.”

Explanation [cq]

In the homes referenced in this story where we placed sensors, the highest heat index reading we captured was 119 degrees, and the highest temperature reading was 96.8 degrees.

Supporting code and output [cq]

# bind together four sensor data sets
all_sensors_day_minute_averages <- bind_rows(michael_day_minute_averages, tammy_day_minute_averages, stephanie_day_minute_averages, audrey_day_minute_averages)

# return the highest temperature
all_sensors_day_minute_averages %>%
  filter(mean_indoor_temperature == max(mean_indoor_temperature)) %>%
  select(date_hour_minute, mean_indoor_temperature)
# return the highest heat index
all_sensors_day_minute_averages %>%
  filter(mean_indoor_heat_index == max(mean_indoor_heat_index)) %>%
  select(date_hour_minute, mean_indoor_heat_index)

Fact: Hotter inside than outside [cq]

"In some homes, those readings showed that it was hotter inside than outside. At 4 p.m. Saturday, July 20, the temperature in Baltimore hit 99 degrees F, but the humidity made it feel like 108 degrees F. Here is what the combination of heat and humidity felt like inside three East Baltimore rowhouses at 4 p.m. that day:

  • Michael Thomas and Alberta Wilkerson, second floor sensor. Temperature: 96. Relative humidity: 52%. What it felt like: 109.
  • Audrey DeWitt, first floor sensor. Temperature: 87. Relative humidity: 52%. What it felt like: 91.
  • Stephanie Pingley, second floor sensor. Temperature: 95. Relative humidity: 53%. What it felt like: 107."

Explanation [cq]

The outdoor heat, humidity and heat index were calculated using data from the Inner Harbor NWS monitoring station. Sensors we placed in their homes were used to calculate the indoor temperatures, heat index and humidity.

Supporting code and output [cq]

# temperature and heat index at 4 p.m. Saturday, July 20, 2019
dmh %>%
 filter(month == 7,
         year == 2019,
         day == 20,
         hour == 16) %>%
  select(date, hour, avg_hourly_temperature_dmh, avg_hourly_heat_index_dmh)
# Michael and Alberta
michael_day_hourly_averages %>%
  mutate(date = date(date_hour)) %>%
  mutate(hour = hour(date_hour)) %>%
  filter(date == "2019-07-20") %>%
  filter(hour == 16) %>%
  select(date, hour, mean_indoor_temperature, mean_indoor_relative_humidity, mean_indoor_heat_index)
# Audrey
audrey_day_hourly_averages %>%
  mutate(date = date(date_hour)) %>%
  mutate(hour = hour(date_hour)) %>%
  filter(date == "2019-07-20") %>%
  filter(hour == 16) %>%
  select(date, hour, mean_indoor_temperature, mean_indoor_relative_humidity, mean_indoor_heat_index)
# Stephanie
stephanie_day_hourly_averages %>%
  mutate(date = date(date_hour)) %>%
  mutate(hour = hour(date_hour)) %>%
  filter(date == "2019-07-20") %>%
  filter(hour == 16) %>%
  select(date, hour, mean_indoor_temperature, mean_indoor_relative_humidity, mean_indoor_heat_index)

Fact: Tammy Jackson’s house on July 21 [cq]

“On the first floor of Jackson’s two-story rowhouse, the heat index registered 93 degrees at 10 p.m. on Sunday, July 21. Outside, the heat index was 9 degrees lower, 84 degrees.”

Explanation [cq]

The following code shows the values presented in the sentence above.

Supporting code and output [cq]

tammy_day_hourly_averages %>%
  mutate(date = date(date_hour)) %>%
  mutate(hour = hour(date_hour)) %>%
  filter(date == "2019-07-21") %>%
  filter(hour == 22) %>%
  select(date_hour, mean_indoor_heat_index, mean_outdoor_heat_index, indoor_heat_index_difference)

Fact: Hotter inside Stephanie Pingley’s house than outside [cq]

“A sensor inside a bedroom showed that the heat index during the heat wave was consistently higher inside than outside Pingley’s house.”

Explanation [cq]

Between July 16 and July 23, the heat index inside of Pingley’s house was higher than the outside for more hours (108) than hours when the opposite was true (60).

Supporting code and output [cq]

stephanie_day_hourly_averages %>%
  filter(date_hour >= "2019-07-16",
         date_hour < "2019-07-23") %>%
  mutate(difference = case_when(
      indoor_heat_index_difference > 0 ~ "hours hotter inside",
      indoor_heat_index_difference <= 0 ~ "hours hotter outside"
  )) %>%
  group_by(difference) %>%
  summarise(count=n())

Fact: Hottest seven-day stretch of the summer [cq]

“The seven-day stretch between July 16 and July 22 were the hottest consecutive days of the summer.”

Explanation [cq]

The average temperature between July 16 and July 22 was 94.2 degrees, hotter than any period of the summer so far.

Supporting code and output [cq]

dmh %>%
  filter(year == 2019) %>%
  filter(date >= "2019-06-21",
         date <= "2019-09-21")