This R markdown document describes a portion of the data analysis for a reporting project examining the effects of climate-change driven temperature increases on the health of people who live in cities. The project was done in partnership with the University of Maryland Philip Merrill College of Journalism, Capital News Service, the Howard Center for Investigative Journalism, NPR, Wide Angle Youth Media and WMAR. It also moved on the Associated Press wire.
For each sentence in the stories “As Rising Heat Bakes U.S. Cities, The Poor Often Feel It Most” and “How High Heat Can Impact Mental Health” based on Howard Center data analysis, this document provides the original fact, the code and code output that support that fact, and an explanation where necessary.
Here are links to stories in the series published by participating organizations:
CNSMaryland
NPR
WMAR
Associated Press
#######################
#### Load Packages ####
#######################
# For debugging rm(list=ls())
library(tidyverse) # For general data science goodness
library(corrr) # For correlation goodness
library(lubridate) # For working with that datetime
library(spelling) # spellcheck
# Turn off scientific notation in RStudio (prevents coersion to character type)
options(scipen = 999)
#########################
#### Store Variables ####
#########################
#### Common path to output data folder ####
path_to_data <- "../../data/output-data/"
###################
#### Load Data ####
###################
### Outdoor temperature data
# Inner Harbor temperature data
folder <- "baltimore_weather_stations/"
dmh <- read_csv(paste0(path_to_data, folder, "dmh.csv"))
### Urban heat island, tree canopy, demographics data
folder <- "tree_temp_demographics/"
# Neighborhood geography
nsa_tree_temp <- read_csv(paste0(path_to_data, folder, "nsa_tree_temp.csv"))
# Community statistical area geography
csa_tree_temp_demographics <- read_csv(paste0(path_to_data, folder, "csa_tree_temp_demographics.csv"))
### Hospital Data
folder <- "hospital_data/"
# Inpatient admissions data
ip_full_zip_medicaid_correlation_matrix <- read_csv(paste0(path_to_data, folder, "ip/ip_full_zip_medicaid_correlation_matrix.csv"))
# Emergency room admissions data
op_er_full_zip_medicaid_correlation_matrix <- read_csv(paste0(path_to_data, folder, "op_er/op_er_full_zip_medicaid_correlation_matrix.csv"))
### EMS Data
folder <- "ems/"
dmh_ems <- read_csv(paste0(path_to_data, folder, "dmh_ems.csv"))
EMS_all <- read_csv(paste0(path_to_data, folder, "EMS_all.csv"))
Using the urban heat island measurements taken in August 2018 showing block-by-block temperature variations, we calculated median afternoon temperatures for each of the city’s 278 neighborhoods. At 95.4 degrees, Franklin Square was warmer than 67 percent (about two-thirds) of Baltimore’s neighborhoods. The city’s coolest neighborhood – “Gwynns Falls/Leakin Park” – was 89.3 degrees F, 6.1 degrees less than Franklin Square.
# Rank, neighborhoods cooler
nsa_tree_temp %>%
select(nsa_name, temp_median_aft) %>%
mutate(rank = dense_rank(temp_median_aft), pct_nhoods_cooler = (rank/max(rank))*100) %>%
filter(nsa_name == "franklin square")
# Difference between Franklin Square and coolest
nsa_tree_temp %>%
select(nsa_name, temp_median_aft) %>%
filter((nsa_name == "franklin square") | (temp_median_aft == min(temp_median_aft))) %>%
spread(nsa_name, temp_median_aft) %>%
mutate(difference = `franklin square` - `gwynns falls/leakin park`)
Franklin Square is entirely contained by a “community statistical area” known as “Southwest Baltimore”. The poverty rate here is 36 percent, or more than one-third. There are 55 CSAs in Baltimore. Southwest Baltimore has the 51st highest poverty rate, making it poorer than more than 90 percent of CSAs.
csa_tree_temp_demographics %>%
select(`csa2010`, percent_of_family_households_living_below_the_poverty_line) %>%
mutate(rank = dense_rank(percent_of_family_households_living_below_the_poverty_line), pct_nhoods_wealthier = (rank/max(rank))) %>%
filter(`csa2010` == "southwest baltimore")
In Baltimore’s “community statistical areas”, we examined the relationship between heat (median afternoon temperature in our urban heat island data) and poverty. An r of 1 would indicate a perfect positive linear relationship, an r of -1 would indicate a perfect negative linear relationship, and 0 would indicate no relationship. There were moderate positive correlations between heat poverty (r =.38). In general, the hotter the neighborhood, the higher the poverty rate, and vice versa.
csa_tree_temp_demographics %>%
select_if(is.numeric) %>%
as.matrix() %>%
correlate() %>%
focus(matches("temp_")) %>%
mutate(variable=rowname) %>%
select(variable, temp_median_aft) %>%
filter(variable=="percent_of_family_households_living_below_the_poverty_line")
Using median afternoon temperature in a 2018 study by researchers analyzing the city’s urban heat island, there was a 9.98 degree difference between the city’s hottest neighborhood (McElderry Park) and coolest neighborhood (Gwynns Falls/Leakin Park).
# Hottest and coolest neighborhoods
nsa_tree_temp %>%
select(nsa_name, temp_median_aft) %>%
filter((temp_median_aft == min(temp_median_aft)) | (temp_median_aft == max(temp_median_aft)))
# Difference between hottest and coolest
nsa_tree_temp %>%
select(nsa_name, temp_median_aft) %>%
filter((temp_median_aft == min(temp_median_aft)) | (temp_median_aft == max(temp_median_aft))) %>%
spread(nsa_name, temp_median_aft) %>%
mutate(difference = `mcelderry park` - `gwynns falls/leakin park`)
By merging a dataset of EMS calls with a dataset of hourly Baltimore heat index values, we were able to determine the heat index at the time of each EMS call during summer 2018. We looked at select conditions affected by heat, and compared how call rates changed when it was very hot – over 103 heat index, a level defined by NWS as "dangerous – and when the heat index was under 80. The second and third columns below reflect the number of hours that passed between calls (on average) when the heat index was below 80 degrees or above 103 degrees.
As expected, calls for heat stroke increased dramatically. When it was under 80 degrees, there was a call for heat stroke every 480 hours, or about once every three weeks. When it was 103 degrees or higher, there was a call for heat stroke every 1.4 hours.
# Select conditions
conditions <- c("Heat Exhaustion/Heat Stroke")
# Calculate the total number of hours over the course of Summer 2018 that the heat index fell into each heat index level, as defined by the national weather service: not unsafe (under 80), caution (80-89), extreme caution (90-102), danger (103-124).
heat_index_count_per_nws_five_scale_bucket <- dmh_ems %>%
select(heat_index_nws_five_scale_bucket) %>%
group_by(heat_index_nws_five_scale_bucket) %>%
summarise(heat_index_count_per_nws_five_scale_bucket=n()) %>%
arrange(heat_index_nws_five_scale_bucket)
# For each target condition, calculate the number of hours between calls at each temperature level. This metric allows us to account for the fact that simply counting calls in each bucket would be flawed, because it wouldn't adjust for the rarity of very hot temperatures.
EMS_all %>%
filter(primary_impression_group %in% conditions) %>%
group_by(primary_impression_group, adjusted_heat_index_nws_five_scale_bucket) %>%
summarise(condition_calls_count_per_bucket=n()) %>%
inner_join(heat_index_count_per_nws_five_scale_bucket, by = c("adjusted_heat_index_nws_five_scale_bucket" = "heat_index_nws_five_scale_bucket")) %>%
mutate(hours_per_call = heat_index_count_per_nws_five_scale_bucket/condition_calls_count_per_bucket) %>%
select(primary_impression_group, adjusted_heat_index_nws_five_scale_bucket, hours_per_call) %>%
tidyr::spread(adjusted_heat_index_nws_five_scale_bucket, hours_per_call) %>%
select(primary_impression_group, `not_unsafe_under_80`,`danger_103_124`)
By merging a dataset of EMS calls with a dataset of hourly Baltimore heat index values, we were able to determine the heat index at the time of each EMS call during summer 2018. We looked at select conditions affected by heat, and compared how call rates changed when it was very hot – over 103 heat index, a level defined by NWS as “dangerous” – and when the heat index was under 80. The third and fourth columns below reflect the number of hours that passed between calls (on average) when the heat index was below 80 degrees or above 103 degrees. The table below reflects the percentage difference (second column) in call rates between the two temperature groupings.
# Select conditions
conditions <- c("Dehydration","Respiratory Distress", "COPD (Emphysema/Chronic Bronchitis)", "Cardiac Arrest", "CHF (Congestive Heart Failure)", "Behavioral/Psychiatric Disorder", "Dehydration", "Chest Pain - STEMI","Hypertension","Substance/Drug Abuse","Withdrawal/Overdose Drugs","Withdrawal/Overdose ETOH")
# Calculate the total number of hours over the course of Summer 2018 that the heat index fell into each heat index level, as defined by the national weather service: not unsafe (under 80), caution (80-89), extreme caution (90-102), danger (103-124).
heat_index_count_per_nws_five_scale_bucket <- dmh_ems %>%
select(heat_index_nws_five_scale_bucket) %>%
group_by(heat_index_nws_five_scale_bucket) %>%
summarise(heat_index_count_per_nws_five_scale_bucket=n()) %>%
arrange(heat_index_nws_five_scale_bucket)
# For each target condition, calculate the number of hours between calls at each temperature level. This metric allows us to account for the fact that simply counting calls in each bucket would be flawed, because it wouldn't adjust for the rarity of very hot temperatures.
EMS_all %>%
filter(primary_impression_group %in% conditions) %>%
group_by(primary_impression_group, adjusted_heat_index_nws_five_scale_bucket) %>%
summarise(condition_calls_count_per_bucket=n()) %>%
inner_join(heat_index_count_per_nws_five_scale_bucket, by = c("adjusted_heat_index_nws_five_scale_bucket" = "heat_index_nws_five_scale_bucket")) %>%
mutate(hours_per_call = heat_index_count_per_nws_five_scale_bucket/condition_calls_count_per_bucket) %>%
select(primary_impression_group, adjusted_heat_index_nws_five_scale_bucket, hours_per_call) %>%
tidyr::spread(adjusted_heat_index_nws_five_scale_bucket, hours_per_call) %>%
select(primary_impression_group, `not_unsafe_under_80`,`danger_103_124`) %>%
mutate(`calls_per_day_under_80` = 24/`not_unsafe_under_80`) %>%
mutate(`calls_per_day_over_103` = 24/`danger_103_124`) %>%
mutate(difference_day_percent = ((`calls_per_day_over_103`-`calls_per_day_under_80`)/`calls_per_day_under_80`)) %>%
select(primary_impression_group, difference_day_percent, everything())
We examined rates of chronic condition prevalence for low-income people in different parts of Baltimore by examining in-patient hospital admissions for people with Medicaid. We discovered that low-income people in different parts of the city had different prevalence rates for chronic conditions affected by heat – asthma, COPD, heart disease and diabetes. And, we found, those differences varied in line with temperature differences in the area in which they lived.
There were moderate to strong positive relationships (copd, r=.75; asthma, r=.52; heart_disease, r=.71; diabetes, r=.5) between a ZIP code’s prevalence rate for chronic medical conditions as diagnosed in inpatient hospital visits and a ZIP code’s median afternoon temperature as measured by urban heat island researchers in August 2018. There were moderate to strong positive relationships (copd, r=.51; asthma, r=.38; heart_disease, r=.44; diabetes, r=.44) between a ZIP code’s prevalence rate for chronic medical conditions as diagnosed in emergency room visits and a ZIP code’s median afternoon temperature as measured by urban heat island researchers in August 2018. That is to say: the higher the neighborhood temperature, the higher the disease rate among the poorest inhabitants, and vice versa. This is not a causal relationship we are describing.
# Inpatient medicaid admissions
ip_full_zip_medicaid_correlation_matrix %>%
filter(str_detect(rowname, "asthma|copd|heart_disease|diabetes")) %>%
select(rowname, temp_median_aft)
# ER medicaid visits
op_er_full_zip_medicaid_correlation_matrix %>%
filter(str_detect(rowname, "asthma|copd|heart_disease|diabetes")) %>%
select(rowname, temp_median_aft)
In 2007, 14.2 percent of the neighborhood was covered by tree canopy in the summer. By 2015, that had increased to 16.1 percent, a 1.9 percentage point increase. In 2015, two-thirds of Baltimore’s 278 neighborhoods had more tree canopy than Franklin Square’s 16 percent coverage.
# change
nsa_tree_temp %>%
select(nsa_name, `avg_canopy_%_07` = `07_lid_mean`, `avg_canopy_15` = `15_lid_mean`, `%_point_change` = lid_change_percent_point) %>%
filter(nsa_name == "franklin square")
# rank
nsa_tree_temp %>%
select(nsa_name, `avg_canopy_15` = `15_lid_mean`) %>%
arrange(nsa_name) %>%
mutate(rank = dense_rank(desc(`avg_canopy_15`)), pct_nhoods_w_more_tree_canopy = (rank/max(rank))) %>%
filter(nsa_name == "franklin square")
By merging a dataset of EMS calls with a dataset of hourly Baltimore heat index values, we were able to determine the heat index at the time of each EMS call during summer 2018. We looked at select conditions affected by heat, and compared how call rates changed when it was very hot – over 103 heat index, a level defined by NWS as “dangerous” – and when the heat index was under 80. The third and fourth columns below reflect the number of hours that passed between calls (on average) when the heat index was below 80 degrees or above 103 degrees. The table below reflects the percentage difference (second column) in call rates between the two temperature groupings.
In Summer 2018, when the heat index was under 80 degrees, there was a medical call for a behavioral and psychiatric disorder every 1.77 hours (1 hour, 46 minutes). When the heat index hit 103 degrees, the rate of calls increased dramatically – to one call every 1.29 hours (1 hour, 17 minutes). That’s an increase in the rate of 29 minutes, or an increase of 37 percent.
# Select conditions
conditions <- c("Behavioral/Psychiatric Disorder")
# Calculate the total number of hours over the course of Summer 2018 that the heat index fell into each heat index level, as defined by the national weather service: not unsafe (under 80), caution (80-89), extreme caution (90-102), danger (103-124).
heat_index_count_per_nws_five_scale_bucket <- dmh_ems %>%
select(heat_index_nws_five_scale_bucket) %>%
group_by(heat_index_nws_five_scale_bucket) %>%
summarise(heat_index_count_per_nws_five_scale_bucket=n()) %>%
arrange(heat_index_nws_five_scale_bucket)
# For each target condition, calculate the number of hours between calls at each temperature level. This metric allows us to account for the fact that simply counting calls in each bucket would be flawed, because it wouldn't adjust for the rarity of very hot temperatures.
EMS_all %>%
filter(primary_impression_group %in% conditions) %>%
group_by(primary_impression_group, adjusted_heat_index_nws_five_scale_bucket) %>%
summarise(condition_calls_count_per_bucket=n()) %>%
inner_join(heat_index_count_per_nws_five_scale_bucket, by = c("adjusted_heat_index_nws_five_scale_bucket" = "heat_index_nws_five_scale_bucket")) %>%
mutate(hours_per_call = heat_index_count_per_nws_five_scale_bucket/condition_calls_count_per_bucket) %>%
select(primary_impression_group, adjusted_heat_index_nws_five_scale_bucket, hours_per_call) %>%
spread(adjusted_heat_index_nws_five_scale_bucket, hours_per_call) %>%
select(primary_impression_group, `not_unsafe_under_80`,`danger_103_124`) %>%
mutate(percent_change = (`not_unsafe_under_80`-`danger_103_124`)/`danger_103_124`) %>%
mutate(`calls_per_day_under_80` = 24/`not_unsafe_under_80`) %>%
mutate(`calls_per_day_over_103` = 24/`danger_103_124`) %>%
mutate(difference_day_percent = ((`calls_per_day_over_103`-`calls_per_day_under_80`)/`calls_per_day_under_80`))
By merging a dataset of EMS calls with a dataset of hourly Baltimore heat index values, we were able to determine the heat index at the time of each EMS call during summer 2018. We looked at select conditions affected by heat, and compared how call rates changed when it was very hot – over 103 heat index, a level defined by NWS as “dangerous” – and when the heat index was under 80. The third and fourth columns below reflect the number of hours that passed between calls (on average) when the heat index was below 80 degrees or above 103 degrees. The table below reflects the percentage difference in call rates between the two temperature groupings.
In Summer 2018, when the heat index was under 80 degrees, there was a medical call for substance abuse every 2.96 hours (nearly 3 hours). When the heat index hit 103 degrees, the rate of calls increased dramatically – to one call every 1.36 hours. That’s an increase in the rate of about an hour and a half, or an increase of 118 percent. A 100 percent increase is a doubling, so it’s fair to say more than doubled.
# Select conditions
conditions <- c("Substance/Drug Abuse")
# Calculate the total number of hours over the course of Summer 2018 that the heat index fell into each heat index level, as defined by the national weather service: not unsafe (under 80), caution (80-89), extreme caution (90-102), danger (103-124).
heat_index_count_per_nws_five_scale_bucket <- dmh_ems %>%
select(heat_index_nws_five_scale_bucket) %>%
group_by(heat_index_nws_five_scale_bucket) %>%
summarise(heat_index_count_per_nws_five_scale_bucket=n()) %>%
arrange(heat_index_nws_five_scale_bucket)
# For each target condition, calculate the number of hours between calls at each temperature level. This metric allows us to account for the fact that simply counting calls in each bucket would be flawed, because it wouldn't adjust for the rarity of very hot temperatures.
EMS_all %>%
filter(primary_impression_group %in% conditions) %>%
group_by(primary_impression_group, adjusted_heat_index_nws_five_scale_bucket) %>%
summarise(condition_calls_count_per_bucket=n()) %>%
inner_join(heat_index_count_per_nws_five_scale_bucket, by = c("adjusted_heat_index_nws_five_scale_bucket" = "heat_index_nws_five_scale_bucket")) %>%
mutate(hours_per_call = heat_index_count_per_nws_five_scale_bucket/condition_calls_count_per_bucket) %>%
select(primary_impression_group, adjusted_heat_index_nws_five_scale_bucket, hours_per_call) %>%
spread(adjusted_heat_index_nws_five_scale_bucket, hours_per_call) %>%
select(primary_impression_group, `not_unsafe_under_80`,`danger_103_124`) %>%
mutate(percent_change = (`not_unsafe_under_80`-`danger_103_124`)/`danger_103_124`) %>%
mutate(`calls_per_day_under_80` = 24/`not_unsafe_under_80`) %>%
mutate(`calls_per_day_over_103` = 24/`danger_103_124`) %>%
mutate(difference_day_percent = ((`calls_per_day_over_103`-`calls_per_day_under_80`)/`calls_per_day_under_80`))
-30-