This R Markdown document describes a portion of the data analysis for a reporting project examining the effects of climate-change-driven temperature increases on the health of people who live in cities. The project was done in partnership with the University of Maryland Philip Merrill College of Journalism, Capital News Service, the Howard Center for Investigative Journalism, NPR, Wide Angle Youth Media and WMAR. It also moved on the Associated Press wire.

For each sentence in the story “The Role of Trees: No trees, no shade, no relief as climate heats up” based on Howard Center data analysis, this document provides the original fact, the code and code output that support that fact, and an explanation where necessary.

Here are links to stories in the series published by participating organizations:




Associated Press

Line-by-Line Fact Check [Role of Trees]

Fact: A certain block in Broadway East is one of the city’s hottest [cq]

“He needs a lot of water too, working in the summer heat here at the edge of the Broadway East neighborhood, on one of the city’s hottest — and poorest — blocks.”

Explanation [cq]

The scene described in the story took place on a block in Broadway East, on North Milton Ave between Oliver Street and East Federal Street, which falls in the U.S. census “block” with the ID 245100803011000. With a mean afternoon temperature of 98.3 degrees in an August 2018 urban heat island study showing block-by-block variations in temperature, this was the 236th-hottest block in the city, out of 13,598 blocks. To get data on poverty within a reasonable margin of error, we have to go to a larger level of geography. This block is located inside the Clifton-Berea “community statistical area” (CSA). In this CSA, one of 55 in the city, 28 percent of households are below the poverty line, which is the 10th-highest poverty rate in the city.
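
The ranking step behind “236th-hottest of 13,598” can be illustrated on toy data. This is a minimal sketch, not the project's code: the four block IDs and temperatures below are hypothetical, and only base R's `rank()` and `nrow()` are used. Negating the temperatures makes `rank()` give the hottest block a rank of 1.

```r
# Toy data: hypothetical block IDs and afternoon temperatures (not from the study)
toy_blocks <- data.frame(
  geoid10 = c("A", "B", "C", "D"),
  temp_mean_aft = c(95.1, 98.3, 97.0, 99.2)
)

# rank() ranks ascending, so negate the values to rank from hottest to coolest
toy_blocks$rank <- rank(-toy_blocks$temp_mean_aft)

# Look up one block's rank and the total number of blocks it is ranked against
block_rank <- toy_blocks$rank[toy_blocks$geoid10 == "B"]
n_blocks <- nrow(toy_blocks)

print(block_rank)  # 2
print(n_blocks)    # 4
```

One thing to keep in mind: `rank()` averages tied values by default, so blocks with identical temperatures would share a fractional rank.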

Supporting code and output [cq]

# Block of interest ranked by heat
blocks_tree_temp_demographics %>%
  select(geoid10, temp_mean_aft) %>%
  mutate(rank = rank(-temp_mean_aft)) %>%
  filter(geoid10 == "245100803011000") 
# Total number of city blocks
blocks_tree_temp_demographics %>%
  nrow()
# CSA ranked by poverty
csa_tree_temp_demographics %>%
  mutate(rank = rank(-percent_of_family_households_living_below_the_poverty_line)) %>%
  filter(csa2010 == "clifton-berea")
# Total number of CSAs
csa_tree_temp_demographics %>%
  nrow()

Fact: Poorest areas have less tree canopy [cq]

“The city’s poorest areas tend to have less tree canopy than wealthier areas, a pattern that is especially pronounced on the concrete-dense east side, in neighborhoods like Broadway East.”

Explanation [cq]

There is a moderate negative correlation between the poverty rate of a “community statistical area” (CSA) and the amount of tree cover it had in 2015 (r = -0.34). In other words, places with a high poverty rate tend to have less tree cover, and vice versa. Broadway East illustrates this. Most of the neighborhood is divided between two CSAs: Greenmount East and Clifton-Berea. Greenmount East ranks 14th (of 55) for poverty in the city and has less tree canopy than 40 (of 55) CSAs. Clifton-Berea ranks 10th for poverty and has less tree canopy than 48 CSAs.
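
For readers unfamiliar with r, here is a minimal sketch of how a correlation coefficient is computed, assuming the Pearson coefficient (which we understand to be the default for `correlate()` in the corrr package). The five poverty and canopy values are hypothetical, chosen only so the relationship is clearly negative. Base R's `cor()` is compared against the textbook definition.

```r
# Toy data: hypothetical poverty rates (percent) and canopy shares for five areas
poverty <- c(5, 10, 20, 30, 40)
canopy  <- c(0.50, 0.35, 0.30, 0.15, 0.20)

# Pearson's r using base R
r <- cor(poverty, canopy, method = "pearson")

# The same quantity from its definition: the summed products of deviations
# from the mean, scaled by the spread of each variable
dx <- poverty - mean(poverty)
dy <- canopy - mean(canopy)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

print(r)
print(isTRUE(all.equal(r, r_manual)))  # TRUE
```

A value near -1 means canopy falls almost in lockstep as poverty rises. A value near 0 means no linear relationship. The -0.34 reported above sits between the two, which is why it is described as moderate.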

Supporting code and output [cq]

# Build correlation matrix between poverty and tree canopy
csa_tree_temp_demographics %>%
  select(perc_below_poverty = percent_of_family_households_living_below_the_poverty_line,
         avg_canopy_2015 = `15_lid_mean`) %>%
  as.matrix() %>%
  correlate() %>%
  mutate(variable=rowname) %>%
  select(variable, perc_below_poverty) %>%
  filter(variable == "avg_canopy_2015")
# Rank of tree cover and poverty rate for Clifton-Berea and Greenmount East (which holds most of Broadway East)
csa_tree_temp_demographics %>%
  mutate(poverty_rank = rank(-percent_of_family_households_living_below_the_poverty_line),
         canopy_rank = rank(-`15_lid_mean`)) %>%
  filter(str_detect(csa2010,"clifton-berea|greenmount"))
# Total CSAs
csa_tree_temp_demographics %>%
  nrow()

Fact: Poverty to canopy graphic [cq]

The graphic generated below appears in the story, with the following headline and subhead: “In Baltimore, poorer areas have less tree canopy. Areas with more people living below the poverty line generally have less tree cover.”

Explanation [cq]

The head and subhead are based on the analysis in the previous heading.

Supporting code and output [cq]

# Select CSAs to label
target_csas <- c("greenmount east", "clifton-berea", "greater roland park/poplar hill")

# Poverty to canopy GRAPH
csa_tree_temp_demographics %>%
  # Start ggplot and set x and y for entire plot
  ggplot(aes(
    x = percent_of_family_households_living_below_the_poverty_line/100, 
    y = `15_lid_mean`
    )) +
  # This section for the basic scatterplot
  geom_point(aes(color = `15_lid_mean`),
             size=4) +
  # This section for circling all sample neighborhood points
  geom_point(data = csa_tree_temp_demographics %>%
               filter(csa2010 %in% target_csas),
             aes(color = `15_lid_mean`),
             size=6, shape = 1) +
  # This section shows the trend line
  geom_smooth(se = FALSE, # Removes gray banding
              method = glm, 
              color = "black") +
  # This section for labeling the CSAs listed in target_csas
  ggrepel::geom_label_repel(data = csa_tree_temp_demographics %>%
                              filter(csa2010 %in% target_csas) %>%
                              mutate(csa2010 = case_when(
                                csa2010 == "greenmount east" ~ "Greenmount East \n(includes part of Broadway East)", 
                                csa2010 == "clifton-berea" ~ "Clifton-Berea \n(includes part of Broadway East)",
                                csa2010 == "greater roland park/poplar hill" ~ "Greater Roland Park/Poplar Hill",
                                TRUE ~ csa2010)),
            aes(label = csa2010),
            label.size =.25,
            min.segment.length = .1,
            segment.alpha = .5,
            alpha = .85,
            nudge_x = .05,
            nudge_y = .06) +
  # Colors and label formatting follow
  #coord_flip() +
  scale_colour_gradient(low = "#E0FEA9", high = "#144A11") +
  labs(title = "",
       subtitle = "",
       x = "Percent of households living below the poverty line",
       y = "Percent of land covered by trees") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1.0),
                     breaks = seq(0, 1, .1)) + 
  scale_y_continuous(labels = scales::percent_format(accuracy = 1.0),
                     breaks = seq(0, 1, .1)) + 
  theme_bw() +
  theme(legend.position = "none",
        plot.title = element_blank(),
        plot.subtitle = element_blank(),