This R Markdown document describes a portion of the data analysis for a reporting project examining the effects of climate change-driven temperature increases on the health of people who live in cities. The project was done in partnership with the University of Maryland Philip Merrill College of Journalism, Capital News Service, the Howard Center for Investigative Journalism, NPR, Wide Angle Youth Media and WMAR. It also moved on the Associated Press wire.
For each sentence in the story “The Role of Trees: No trees, no shade, no relief as climate heats up” based on Howard Center data analysis, this document provides the original fact, the code and code output that support that fact, and an explanation where necessary.
Here are links to stories in the series published by participating organizations:
CNSMaryland
NPR
WMAR
Associated Press
Note that in some cases, to protect the privacy of nearby residents, we did not include addresses of particular street trees in this memo.
#######################
#### Load Packages ####
#######################
library(tidyverse) # For general data science goodness
library(DescTools) # For %like% operator
library(corrr) # For correlation goodness
library(spelling) # For spell check
# Turn off scientific notation in RStudio (prevents coercion to character type)
options(scipen = 999)
#########################
#### Store Variables ####
#########################
# Common path to output data folder
path_to_data <- "../../data/output-data/"
###################
#### Load Data ####
###################
## Outdoor temperature data
# Inner Harbor temperature data
folder <- "baltimore_weather_stations/"
dmh <- read_csv(paste0(path_to_data, folder, "dmh.csv"))
## Urban heat island, tree canopy, demographics data
folder <- "tree_temp_demographics/"
# Neighborhood geography
nsa_tree_temp <- read_csv(paste0(path_to_data, folder, "nsa_tree_temp.csv"))
# Community statistical area geography
csa_tree_temp_demographics <- read_csv(paste0(path_to_data, folder, "csa_tree_temp_demographics.csv"))
# Blocks geography
blocks_tree_temp_demographics <- read_csv(paste0(path_to_data, folder, "blocks_tree_temp_demographics.csv"))
## Redlining tree canopy calculations
folder <- "redlining_trees/"
redlining_tree <- read_csv(paste0(path_to_data, folder, "redlining_tree.csv"))
## Street trees
folder <- "street_trees/"
# Street trees categorized by neighborhood
street_trees_nsa_categorized <- read_csv(paste0(path_to_data, folder, "street_trees_nsa_categorized.csv"))
# Street trees summarized by neighborhood
street_trees_nsa_summarized <- read_csv(paste0(path_to_data, folder, "street_trees_nsa_summarized.csv"))
“He needs a lot of water too, working in the summer heat here at the edge of the Broadway East neighborhood, on one of the city’s hottest — and poorest — blocks.”
The scene described in the story took place on a block in Broadway East, on North Milton Avenue between Oliver Street and East Federal Street, which is in the U.S. census “block” with the ID 245100803011000. With a mean afternoon temperature of 98.3 degrees in an August 2018 urban heat island study showing block-by-block variations in temperature, this was the 236th hottest block in the city, out of 13,598 blocks. To get data on poverty within a reasonable margin of error, we have to go to a larger level of geography. This block is located inside the Clifton-Berea “community statistical area.” In this CSA, one of 55 in the city, 28 percent of households are below the poverty line, which is the 10th highest poverty rate in the city.
# Block of interest ranked by heat
blocks_tree_temp_demographics %>%
select(geoid10, temp_mean_aft) %>%
mutate(rank = rank(-temp_mean_aft)) %>%
filter(geoid10 == "245100803011000")
# Total number of city blocks
blocks_tree_temp_demographics %>%
summarise(count=n())
# CSA ranked by poverty
csa_tree_temp_demographics %>%
mutate(rank = rank(-percent_of_family_households_living_below_the_poverty_line)) %>%
filter(csa2010 == "clifton-berea") %>%
select(matches("percent_of_family_households_living_below_the_poverty_line|csa2010|rank"))
# Total number of CSAs
csa_tree_temp_demographics %>%
summarise(count=n())
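The rankings above rely on `rank()` applied to a negated column, so that the largest value receives rank 1. The sketch below illustrates that idiom on a small made-up vector (the `toy_temps` values and block names are hypothetical, not project data) and shows how a rank can be expressed as a share of all blocks, using the figures quoted in the text.

```r
# Hypothetical afternoon temperatures for four made-up blocks (not project data)
toy_temps <- c(block_a = 95.1, block_b = 98.3, block_c = 97.0, block_d = 98.3)

# Negating the values makes the hottest block rank first.
# By default, tied values share the average of the ranks they span.
rank_avg <- rank(-toy_temps)

# ties.method = "min" gives tied values the lowest rank they span instead.
rank_min <- rank(-toy_temps, ties.method = "min")

print(rank_avg)
print(rank_min)

# Share of blocks ranked at or above a given position, using the figures
# from the text: rank 236 out of 13,598 blocks.
share_at_or_above <- 236 / 13598
print(round(100 * share_at_or_above, 1))
```

The default tie handling can return fractional ranks such as 1.5; `ties.method = "min"` is one alternative when whole-number ranks are wanted.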
“The city’s poorest areas tend to have less tree canopy than wealthier areas, a pattern that is especially pronounced on the concrete-dense east side, in neighborhoods like Broadway East.”
There is a moderate negative correlation between the poverty rate of a “community statistical area” and the amount of tree cover it had in 2015 (r = -.34). In other words, places with a high poverty rate tend to have less tree canopy, and vice versa. Broadway East illustrates this. Most of the neighborhood is divided between two CSAs: Greenmount East and Clifton-Berea. Greenmount East is 14th (of 55) for poverty in the city, and has less tree canopy than 40 (of 55) areas. Clifton-Berea is 10th for poverty and has less tree canopy than 48 areas.
# Build correlation matrix between poverty and tree canopy
csa_tree_temp_demographics %>%
select(perc_below_poverty = percent_of_family_households_living_below_the_poverty_line,
avg_canopy_2015 = `15_lid_mean`) %>%
as.matrix() %>%
correlate() %>%
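# Copy the variable-name column (named `rowname` in the corrr version used here; newer corrr releases name it `term`)
mutate(variable=rowname) %>%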
select(variable, perc_below_poverty) %>%
filter(variable == "avg_canopy_2015")
# Rank of tree cover and poverty rate for Clifton-Berea and Greenmount East (which holds most of Broadway East)
csa_tree_temp_demographics %>%
mutate(poverty_rank = rank(-percent_of_family_households_living_below_the_poverty_line),
canopy_rank = rank(-`15_lid_mean`)) %>%
filter(str_detect(csa2010,"clifton-berea|greenmount")) %>%
select(matches("percent_of_family_households_living_below_the_poverty_line|csa2010|rank"))
# Total CSAs
csa_tree_temp_demographics %>%
summarise(count=n())
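The r value reported above comes from `correlate()`, which, like base R's `cor()`, computes Pearson's correlation by default. As a minimal sketch of what that number measures, the example below computes Pearson's r from its definition and compares it with `cor()`. The `toy_poverty` and `toy_canopy` values are hypothetical and chosen only for illustration; they are not project data.

```r
# Hypothetical poverty rates and canopy shares for five made-up areas (not project data)
toy_poverty <- c(5, 10, 20, 30, 40)
toy_canopy  <- c(60, 35, 30, 20, 25)

# Pearson's r from its definition: the covariance divided by the product of
# the two standard deviations
r_manual <- cov(toy_poverty, toy_canopy) / (sd(toy_poverty) * sd(toy_canopy))

# The same value from base R's cor(), which uses Pearson's method by default
r_builtin <- cor(toy_poverty, toy_canopy)

print(c(manual = r_manual, builtin = r_builtin))
```

A negative value indicates that, in the toy data, canopy tends to fall as poverty rises, which is the direction of the relationship described in the text.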
The graphic generated below appears in the story, with the following headline and subhead: “In Baltimore, poorer areas have less tree canopy. Areas with more people living below the poverty line generally have less tree cover.”
The headline and subhead are based on the analysis in the previous section.
# Select CSAs to label
target_csas <- c("greenmount east", "clifton-berea", "greater roland park/poplar hill")
# Poverty to canopy GRAPH
csa_tree_temp_demographics %>%
# Start ggplot and set x and y for entire plot
ggplot(aes(
x = percent_of_family_households_living_below_the_poverty_line/100,
y = `15_lid_mean`
)) +
# This section for the basic scatterplot
geom_point(aes(color = `15_lid_mean`),
size=4) +
# This section for circling all sample neighborhood points
geom_point(data = csa_tree_temp_demographics %>%
filter(csa2010 %in% target_csas),
aes(color = `15_lid_mean`),
size=6, shape = 1) +
# This section shows the trend line
geom_smooth(se = FALSE, # Removes gray banding
method = glm,
color = "black") +
# This section for labeling the CSAs selected in target_csas
ggrepel::geom_label_repel(data = csa_tree_temp_demographics %>%
filter(csa2010 %in% target_csas) %>%
mutate(csa2010 = case_when(
csa2010 == "greenmount east" ~ "Greenmount East \n(includes part of Broadway East)",
csa2010 == "clifton-berea" ~ "Clifton-Berea \n(includes part of Broadway East)",
csa2010 == "greater roland park/poplar hill" ~ "Greater Roland Park/Poplar Hill",
TRUE ~ csa2010)),
aes(label = csa2010),
label.size =.25,
min.segment.length = .1,
segment.alpha = .5,
alpha = .85,
nudge_x = .05,
nudge_y = .06) +
# Colors and label formatting follow
#coord_flip() +
scale_colour_gradient(low = "#E0FEA9", high = "#144A11") +
labs(title = "",
subtitle = "",
x = "Percent of households living below the poverty line",
y = "Percent of land covered by trees") +
scale_x_continuous(labels = scales::percent_format(accuracy = 1.0),
breaks = seq(0, 1, .1)) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1.0),
breaks = seq(0, 1, .1)) +
theme_bw() +
theme(legend.position = "none",
plot.title = element_blank(),
plot.subtitle = element_blank(),
axis.title=element_text(size=16,face="bold"),
axis.text=element_text(size=16)
)
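The trend line in the plot above is drawn with `geom_smooth(method = glm)`. Assuming no `family` argument is passed on (none appears in the call), `glm()` falls back to its default gaussian family with an identity link, so the line should coincide with an ordinary least-squares fit. The sketch below checks that equivalence on made-up numbers; the `toy` data frame is hypothetical and not project data.

```r
# Hypothetical data for illustration only (not project data)
toy <- data.frame(
  poverty = c(0.05, 0.10, 0.20, 0.30, 0.40),
  canopy  = c(0.60, 0.35, 0.30, 0.20, 0.25)
)

# glm() with its default gaussian family and identity link ...
fit_glm <- glm(canopy ~ poverty, data = toy)

# ... estimates the same intercept and slope as ordinary least squares via lm()
fit_lm <- lm(canopy ~ poverty, data = toy)

print(coef(fit_glm))
print(coef(fit_lm))
```

Under that assumption, swapping `method = glm` for `method = lm` in the plot would be expected to draw the same straight line.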