Introduction

This data notebook contains the analysis that generated facts in the story “As Walmart sales soared, workers got scant COVID-19 protection from OSHA” from the series “Essential and Exposed”. For each sentence in the story generated by original data analysis, we have provided the corresponding code and results.

Load Libraries, Settings and Global Functions

# For general data science
library(tidyverse)

# For data cleaning
library(janitor)

# For loading Excel files
library(readxl)

# For working with datetime
library(lubridate)

# For U.S. Census Bureau data
library(tigris)

# For pretty tables
library(kableExtra)
library(knitr)

## Function to filter for Walmart facilities
# We created a function to identify all Walmart-owned facilities across various datasets. Those facilities include retail stores and distribution centers for either Walmart or Sam's Club.

filter_walmart = function(dataframe, company_name_column){
   walmart_pattern = "walmart|wal mart|wal-mart|sam club|sams club|sam's club"

   walmart_related = dataframe %>%
    # Make all establishment names lowercase.
    mutate(name = tolower(!! sym(company_name_column))) %>%
    # Create a new column that detects whether establishment names include any variations on Walmart or Sam's Club.
    mutate(is_walmart = str_detect(name, walmart_pattern )) %>%
    # Filter for establishments that are part of the Walmart company.
    filter(is_walmart == TRUE)

   return(walmart_related)
}

## Function for formatted table output

output_formatted_table = function(table, text){
  
  table %>% 
    kable(caption = text) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), font_size = 14, fixed_thead = T) %>%
    scroll_box(width = "100%", height = "200px")
}

# Function to download inspection and violation data
## This will download latest version of inspection data.  For our story, we used data through 2021-05-17.
## The code to download inspection and violation data was modified from [code released by the Center for Public Integrity](https://github.com/PublicI/osha-fatality-inspections). 

download_osha_data <- function(folder, data_directory){
  file_date <- str_remove_all(Sys.Date(), "-") # Set file date
  data_path <- folder
  data_dir <- data_directory
  data_url <- "https://enfxfr.dol.gov/data_catalog/OSHA"
  data_file <- paste0(data_directory, file_date, ".csv.zip")
  if (!file.exists(file.path(data_path, data_dir))) {
    dir.create(data_path)
    download.file(paste(data_url, data_file, sep = "/"),
                  file.path(data_path, data_file))
    unzip(file.path(data_path, data_file),
          exdir = file.path(data_path, data_dir)) }
    
    if(data_directory == "osha_inspection_"){
      
      return(
        list.files(
          path = file.path(data_path, data_directory),
          pattern = "*.csv",
          full.names = TRUE
        ) %>%
          map_dfr(
            read_csv,
            col_types = cols(
              .default = "c",
              open_date = col_date(),
              case_mod_date = col_date(),
              close_conf_date = col_date(),
              close_case_date = col_date()
            )
          )
      )
      
    }
    if(data_directory == "osha_violation_"){
      return(
        list.files(
          path = file.path(data_path, data_directory),
          pattern = "*.csv",
          full.names = TRUE
        ) %>%
          map_dfr(
            read_csv,
            col_types = cols(
              .default = "c",
              issuance_date = col_date(),
              abate_date = col_date(),
              load_dt = col_date(),
              fta_final_order_date = col_date(),
              fta_contest_date = col_date(),
              final_order_date = col_date(),
              contest_date = col_date(), 
              fta_issuance_date = col_date(),
              current_penalty = col_number(), 
              fta_penalty = col_number(),
              initial_penalty = col_number()
            )
          )
      )
      
    }
}

Load and Clean Data

Federal OSHA

COVID-19 Complaints

We examined closed COVID-19 complaints published by the federal Occupational Safety and Health Administration. The data includes all valid complaints received by OSHA since the start of the pandemic. Federal and state OSHA offices submit their data to the federal database. However, certain data may be unique to the state’s operating system or may not be transferred to federal OSHA. This means that, while state OSHA complaints are accessible via the federal OSHA website, totals may not always be comprehensive.

Federal OSHA’s database only reveals companies’ names when the complaints associated with them are closed. Therefore, when analyzing complaints for Walmart on a national level, analysis is based on closed complaints. Source data is available at OSHA.gov’s Covid-19 page.

OSHA updates the complaints data weekly. We used complaints data spanning various date ranges to match the date ranges of other datasets used in our analysis. Variable names involving “OIS,” which stands for OSHA Information System, indicate the federal database.

# Function to clean closed complaints
clean_ois_closed_complaints = function(database){
    result_database = database %>% 
    # Take out the first two rows because they aren't row headings.
    remove_empty() %>%
    slice(-1) %>%
    # Bring row headings to the first line of the data set.
    row_to_names(1) %>%
    # Clean column names
    clean_names() %>%
    # rid is the osha office number
    #first two numbers are OSHA office region, if the 3rd digit is 5 it is a state_plan office, else its a federal plan office
    mutate(osha_office_region = substr(rid, 1,2), office_type_number = substr(rid, 3,3), office_type = ifelse(office_type_number == "5", "state_plan_office", "federal_plan_office") ) %>% 
    #make a column to identify if an inspection occurred
    mutate(inspection_occurred = ifelse(is.na(insp_number) == FALSE,'yes', 'no')) %>% 
    #convert date
    mutate(upa_receipt_date = excel_numeric_to_date(as.numeric(upa_receipt_date)))
    return(result_database)
}

# Read in closed complaints for 2021-03-07 --------------------

ois_closed_complaints_20210307 = read_excel("../etl/federal_osha_complaints/input_data/Closed_Federal_State_Plan_COVID-19_Complaints_Through_0307_2021.xlsx") %>% 
  clean_ois_closed_complaints()

# Read in closed complaints for 2021-03-21 --------------------

ois_closed_complaints_20210321 = read_excel("../etl/federal_osha_complaints/input_data/Closed_Federal_State_Plan_Valid_COVID-19_Complaints_Through_0321_2021.xlsx") %>% 
  clean_ois_closed_complaints()

# Read in closed federal complaints database for 2021-03-28 --------------------
ois_closed_complaints_20210328 = read_excel("../etl/federal_osha_complaints/input_data/Closed_Federal_State_Plan_Valid_COVID-19_Complaints_Through_0328_2021.xlsx") %>% 
  clean_ois_closed_complaints()

# Federal citations data --------------------
federal_covid_violations_20210325 = read_excel("data/federal_pipeline/COVID Case Tracker - Cases with Issued Violations as of 2021-03-25.xlsx") %>% 
  clean_names() %>%
  # Create a match_nr from the inspection number to match with data downloaded from OSHA
  mutate(match_nr =  str_sub(inspection_number, 2,7)) %>%
  mutate(estab_name = tolower(establishment_name))

Inspections and Violations Published by Federal OSHA

The dataset we used consists of inspection case details for OSHA inspections conducted annually. According to the OSHA website, the data “includes information regarding the impetus for conducting the inspection, and details on citations and penalty assessments resulting from violations of OSHA standards.”

The data can be viewed on on OSHA’s enforcement data page. The code to download inspection and violation data was modified from code released by the Center for Public Integrity.

# Download inspections and violations data
## These lines are commented so as not to execute when the markdown file is knitted to produce html. 
#df.inspections = download_osha_data("data/osha_inspections_violations", "osha_inspection_", file_date)
#df.violations = download_osha_data("data/osha_inspections_violations", "osha_violation_", file_date)

# This are the above datasets that were downloaded as of 2021-05-07. In order to fit on Git, the data was filtered from 2020-01-01 to 2021-05-07
df.inspections =  read_csv("inspections_20200101_20210507.csv")
df.violations =  read_csv("violations_20200101_20210507.csv")

## The citation sheet that OSHA provided has inspection numbers, while the enforcement data has activity numbers. The activity number is a longer number with the inspection number contained within it. In order to match them we followed this method:
## 1) Filter enforcement data for 2020 to present.
## 2) Create a unique identifier to match inspection numbers to activity numbers. Example:
## activity number = 34[452565]4
## inspection number = 1[452565]

# Use OSHA's data key to label the data
df.inspections_cleaned = df.inspections %>%
  
  # Turn activity number into a number that we can match with the inspection number provided in federal COVID-19 citation list
  mutate(match_nr = str_sub(activity_nr, 3,8)) %>%
  # Break apart the RID to interpret: first two numbers are OSHA office region, if the 3rd digit is 5 it is a state_plan office, else its a federal plan office
  mutate(osha_office_region = substr(reporting_id, 1,2),
         office_type_number = substr(reporting_id, 3,3),
         office_type = ifelse(office_type_number == "5", "state_plan_office", "federal_plan_office") ) %>%
  mutate(estab_name = tolower(estab_name)) %>%
  # Add year 
  mutate(year = year(open_date))

df.violations_cleaned = df.violations %>%
  # Create a number from activity number that we can match with inspection number
  mutate(match_nr = str_sub(activity_nr, 3,8)) %>% 
  # Create a year column from issuance date
  mutate(year = year(issuance_date))

# Transform inspection number into a match number to match with inspections database
federal_covid_violations_20210325 = federal_covid_violations_20210325 %>%
  mutate(match_nr =  str_sub(inspection_number, 2,7)) %>%
  mutate(estab_name = tolower(establishment_name))

Oregon OSHA

We obtained data on COVID-19 complaints and inspections from Oregon state OSHA through a public records request. This data allowed us to see which inspections and violations in the state were related to COVID-19 — a detail that could not be determined from federal data. The state data also showed complaint descriptions, which were not included for Oregon in the federal database.

# Read in Oregon state OSHA data received through public records request
or_public_20210311 = read_excel("sources/state_foias/OR/Public Complaint Workbook -- March 22 2021 based on March 11 data.xlsx", sheet = "covid") %>% 
  clean_names()

or_inspections_20210316 <- read_excel("sources/state_foias/OR/Public copy - Insp and vios - COVID - updated 03162021.xlsx", sheet = "Inspections", skip = 2) %>% 
  clean_names() %>% 
  mutate(initial_penalty = as.numeric(initial_penalty))

Utah OSHA

We obtained data on COVID-19 complaints from Utah state OSHA through a public records request. The data covers March 2020 to March 2021.

# Utah OSHA state data received through a public records request -------------
ut_osha_complaints_202003_to_202103 = read_excel("sources/state_foias/UT/UOSH COVID-19 Complaints March 2020 - March 2021.xlsx", skip = 1) %>% 
  clean_names()

COVID-19 in Workplaces

We obtained health department data tracking COVID-19 in workplaces in 10 states. States rarely disclose company-level outbreak data publicly.

In three states, health departments publish company-level COVID-19 data online.

  • The Colorado Department of Health and Public Safety publishes COVID-19 outbreak data on its website. We last updated this data on March 24, 2021.
  • The New Mexico Environment Department publishes data on COVID-19 rapid responses on its website. We last updated this data on March 21, 2021.
  • The Oregon Health Authority publishes COVID-19 outbreak data on its website. We last updated this data on March 24, 2021. We aggregated the outbreaks from two types of pdfs. One set, “Weekly Report,” was published before December 2020. The second set, “Weekly Outbreak Reports,” was published starting December 2020. We filtered for workplace outbreaks.

We attempted to obtain company-level data from every remaining state through a public records request to the state health department. We were only able to obtain it in seven: Arkansas, Kansas, Maine, Rhode Island, Utah, Vermont and Washington. In most other instances, states cited exemptions in state public records laws in refusing to disclose the data or did not respond to repeated requests.

## Load company level data 

# AR --------------------
ar_occurrences <- read_csv("../etl/state_health_department_outbreaks/input_data/arkansas/worker_illness_cleaned_master.csv") %>%
  mutate(date=as_date(date))


# CO --------------------
# Read in active outbreaks
co_outbreaks_active_20210324 = read_excel("../etl/state_health_department_outbreaks/input_data/colorado/covid-19_ob_weekly_report_03_24_2021.xlsx",
                                 sheet = "Active", col_types = c("text",
                                                                 "text", "date", "text", "text", "text",
                                                                 "text", "date", "numeric", "numeric",
                                                                 "numeric", "numeric", "numeric",
                                                                 "numeric", "numeric", "numeric")) %>% clean_names()

# Read in resolved outbreaks
co_outbreaks_resolved_20210324 = read_excel("../etl/state_health_department_outbreaks/input_data/colorado/covid-19_ob_weekly_report_03_24_2021.xlsx",
                                 sheet = "Resolved", col_types = c("text",
                                                                   "text", "date", "text", "text", "text",
                                                                   "text", "date", "numeric", "numeric",
                                                                   "numeric", "numeric", "numeric",
                                                                   "numeric", "numeric", "numeric")) %>% clean_names()

# Combine active and resolved outbreaks
co_outbreaks_20210324 = bind_rows(co_outbreaks_active_20210324, co_outbreaks_resolved_20210324)

# Read in list of unique Walmart locations in Colorado OSHA complaints
co_walmart_complaints_addresses <- read_excel("sources/co_walmart_complaints_outbreaks_addresses.xlsx") %>%
  mutate(address = str_squish(address))


# KS --------------------
ks_outbreaks_20200922_to_20210210 = read_excel("../etl/state_health_department_outbreaks/input_data/kansas/kansas_structured_20200922-20210210.xlsx",
                          col_types = c("text", "text", "text",
                                        "text", "numeric", "date", "text"))%>% clean_names()


# ME --------------------
me_outbreaks_2021026 = read_excel("../etl/state_health_department_outbreaks/input_data/maine/Maine_outbreaks_structured_2021-02-26.xlsx",
                          col_types = c("text", "date", "date",
                                        "text", "text")) %>% clean_names()


# NM --------------------
nm_covid_occurrences_20210321 = read_excel("../etl/state_health_department_outbreaks/input_data/new_mexico/new_mexico_rapid_responses_all_20210321.xlsx",
    col_types = c("date", "text", "text",
        "text", "text", "text", "text", "numeric")) %>% clean_names()


# OR --------------------
or_outbreaks_20210324 = read_csv("../etl/state_health_department_outbreaks/input_data/oregon/or_outbreaks_combined_cleaned_20200414_20210324.csv") %>% mutate(begin_date = ifelse(is.na(investigation_start_date), as.Date(first_reported), as.Date(investigation_start_date))) %>%
  mutate(begin_date = as_date(begin_date))


# RI --------------------
ri_outbreaks_20210225 = read_excel("../etl/state_health_department_outbreaks/input_data/rhode_island/Rhode Island Occupational Outbreaks Data 02252021.xlsx",
    skip = 2) %>% clean_names()


# UT --------------------
ut_outbreaks_20210319 = read_csv("../etl/state_health_department_outbreaks/input_data/utah/utah_outbreaks_2021_03_19.csv",
                        col_types = cols(`Outbreak Date` = col_date(format = "%m/%d/%Y"))) %>% clean_names()

ut_outbreaks_addresses = read_excel("sources/ut_outbreaks_addresses.xlsx") %>% clean_names()



# VT --------------------
vt_outbreaks_20210211 = read_csv("../etl/state_health_department_outbreaks/input_data/vermont/vermont_2021-02-11.csv",
                        col_types = cols(data_resolved = col_date(format = "%m/%d/%Y"), data_outbreak_reported = col_date(format = "%m/%d/%Y")))


# WA --------------------
# Read in data extracted from pdf.  Part 1 includes first set of columns needed for the analysis. 
washington_walmart_sheet_1 <- read_csv("sources/washington/tabula-covid_outbreaks_washington_Part1.csv") %>%
  clean_names() %>%
  filter(case_id != "CASE_ID") %>%
  filter(case_id != "46 Vulnerable Ind") %>%
  filter_walmart("case_name") %>% 
  mutate(case_name_copy = tolower(name)) %>%
  mutate(case_name_copy = str_replace_all(case_name_copy,"walla walla","walla-walla")) %>%
  separate(case_name_copy,sep=" ", into=c("year","county","covid","d","e","f","g","h")) %>% 
  mutate(d = "walmart") %>%
  mutate(store_name = paste(d,e,f,g,sep=" ")) %>%
  mutate(store_name = str_remove_all(store_name,"NA|mcdonald's|‐ |‐|#| night shift|night|\\(|\\)|pharmacy, |spokane| mart| walmart")) %>%
  mutate(store_name = str_replace_all(store_name,"  C"," C")) %>%
  mutate(store_name = str_replace_all(store_name,"  "," ")) %>%
  mutate(store_name = tolower(str_trim(store_name))) %>%
  mutate(store_name = str_replace(store_name,"n colton walmart","walmart colton st")) %>%
  mutate(store_name = case_when(
    str_detect(store_name,"colton") ~ "walmart colton st",
    case_id == 101922141 ~ "walmart richland 2",
    TRUE ~ store_name)) %>%
  select(-(covid:h)) %>%
  mutate(unique_store_name_id = "x") %>%
  mutate(unique_store_name_id_notes = "x") %>%
  select(case_id, case_name,county,store_name,unique_store_name_id,unique_store_name_id_notes,year,everything(),-status,-contains("cluster"),-name,-is_walmart) %>%
  arrange(county,store_name)

# Read in data extracted from pdf.  Part 5 includes next set of columns needed for the analysis. 
washington_walmart_sheet_5 <- read_csv("sources/washington/tabula-covid_outbreaks_washington_Part5.csv") %>%
  clean_names() %>%
  filter(case_id != "CASE_ID")

# Join Sheets 1 and 2, filter for only needed columns
washington_walmart_join <- washington_walmart_sheet_1 %>%
  left_join(washington_walmart_sheet_5,by="case_id") %>%
  distinct() %>%
  select(case_id:investigation_status,contains("mployees_confirmed"),contains("mployees_probable")) 

# We wrote out the results of the joined data set. Through research, we manually assigned the official Walmart store number to each store referenced in the outbreak reports from Washington state.
washington_walmart_unique_identifiers <- read_csv("sources/washington/walmart_unique_identifiers.csv") %>%
  select(case_id,final_unique_store_id) %>%
  mutate(unique_store_name_option_a = final_unique_store_id) %>%
  separate(unique_store_name_option_a, into=c("unique_store_name_option_a","unique_store_name_option_b"), sep=" OR | or ") %>%
  mutate(unique_store_name_option_b = case_when(
    is.na(unique_store_name_option_b) ~ unique_store_name_option_a,
    TRUE ~ unique_store_name_option_b
  )) %>%
  mutate(across(everything(),str_trim)) 

# We joined the Walmart unique identifiers list with the the Washington Walmart outbreaks dataframe
washington_walmarts <- washington_walmart_join %>% 
  inner_join(washington_walmart_unique_identifiers, by="case_id") %>% 
  select(case_id, unique_store_name_option_a, unique_store_name_option_b, unique_store_options = final_unique_store_id, covid19_employees_confirmed = covid19_e_mployees_confirmed, covid19_employees_probable = covid19_e_mployees_probable) %>% 
  mutate(covid19_employees_confirmed = as.numeric(replace_na(covid19_employees_confirmed, 0))) %>% 
  mutate(covid19_employees_probable = as.numeric(replace_na(covid19_employees_probable, 0))) %>% 
  mutate(total_cases = covid19_employees_confirmed + covid19_employees_probable)

State Worker Population

We obtained data on state-level, seasonally-adjusted worker population from the Bureau of Labor Statistics. We used data from December 2020.

# Load 2020 employment data
worker_data<- read_excel("data/weighted_workers/employment_stats.xlsx") %>%
  clean_names() %>%
  # Total employment numbers are truncated (column is listed in units of 1,000). Create a new column to show the full number
  mutate(total_employment = total_employment_thousands * 1000)

region_state_key <- read_excel("data/weighted_workers/region_state_key.xlsx",
    col_types = c("text", "text"))

# Load key that describes type of workplaces covered, based on https://www.osha.gov/stateplans
federal_state_plan_key <- read_excel("data/weighted_workers/federal_versus_state_key.xlsx",
    col_types = c("text", "text", "text"))

state_list<- fips_codes%>%
  distinct(state, state_name)

# Join list of state abbreviations to dataframe containing worker population in each state
worker_data = left_join(worker_data, state_list, by = c("state" ="state_name" ))

Line-By-Line Fact-Check

FACT: At least 18 employees got sick [CQ | Full & AP]

At least 18 employees got sick [at Aurora Walmart, store number: #1492].

Explanation

We filtered the Colorado health department COVID-19 outbreak data for the Aurora Walmart, store number: #1492. The “total_staff_cases” field shows 18 employees fell ill. 

Supporting code and output

co_aurora_walmart_outbreaks_20210324 = filter_walmart(co_outbreaks_20210324, "setting_name") %>% 
  mutate(setting_name = tolower(setting_name)) %>%
  # Get rid of :month that appears
  mutate(setting_name_clean = sub(":.*", "", setting_name)) %>%  
  # Get store number
  mutate(store_number = str_extract(setting_name_clean, "#\\S+") ) %>% 
  # Filter for store number
  filter(store_number == "#1492") %>%
  select(setting_name, date_reported_to_public_health,total_staff_cases)
  
output_formatted_table(co_aurora_walmart_outbreaks_20210324, "Aurora Walmart Outbreak, 18 employees")
Aurora Walmart Outbreak, 18 employees
setting_name date_reported_to_public_health total_staff_cases
walmart #1492 2020-04-23 18

FACT: Across 10 states, one quarter of Walmart-owned facilities had COVID-19 cases [CQ | Full & AP]

“The Walmart where Kunz worked was one of at least 151 Walmart facilities in 10 states with available data where multiple COVID-19 illnesses were recorded, a reporting consortium led by the Howard Center for Investigative Journalism at the University of Maryland, found.”

Full: “That’s one quarter of the company’s stores and distribution centers in those states.”

AP: “On average, one quarter of the company’s stores and distribution centers in those states were affected.”

Explanation

To produce this finding, we identified each unique Walmart facility where state health department data showed multiple COVID-19 cases among employees (in 10 states with available data). We compared that to the list and count of Walmart facilities in each state. For any locations where the exact identity of a store was ambiguous, we chose the grouping method that led to the lowest number of unique stores affected by COVID-19, and included that minimum number in our overall total.

Supporting code and output

# Filter out the walmart companies from each outbreak dataset
ar_occurrences_walmart = filter_walmart(ar_occurrences, "company_name")
co_outbreaks_20210324_walmart = filter_walmart(co_outbreaks_20210324, "setting_name")
ks_outbreaks_20200922_to_20210210_walmart = filter_walmart(ks_outbreaks_20200922_to_20210210, "facility")
me_outbreaks_2021026_walmart = filter_walmart(me_outbreaks_2021026, "name")
nm_covid_occurrences_20210321_walmart = filter_walmart(nm_covid_occurrences_20210321, "name")
or_outbreaks_20210324_walmart = filter_walmart(or_outbreaks_20210324, "facility_name")
ri_outbreaks_20210225_walmart = filter_walmart(ri_outbreaks_20210225, "employer")
ut_outbreaks_20210319_walmart = filter_walmart(ut_outbreaks_20210319, "company_name")
vt_outbreaks_20210211_walmart = filter_walmart(vt_outbreaks_20210211, "estab_name") 

## Get number of unique Walmarts from each outbreak dataset

# AR --------------------
# Group by company name and city to get unique number of Walmarts
ar_occurrences_walmart_unique = ar_occurrences_walmart %>%
  group_by(company_name, city) %>%
  summarise(num = n())

num_ar_occurrences_walmart_unique = nrow(ar_occurrences_walmart_unique)

# Colorado --------------------
# Clean up Walmart outbreak names
# Data contains total staff cases
co_outbreaks_20210324_walmart_clean = co_outbreaks_20210324_walmart %>%
  mutate(setting_name = tolower(setting_name)) %>%
  # Get rid of :month that appears
  mutate(setting_name_clean = sub(":.*", "", setting_name)) %>%  
  # Get store number
  mutate(store_number = str_extract(setting_name_clean, "#\\S+") ) %>% 
  # Adding store number for Walmart Pharmacy in Delta
  mutate(store_number = case_when(
    setting_name_clean == "walmart pharmacy" & colorado_county_exposure_location == "Delta" ~ "#5458",
    TRUE ~ store_number))

# Get unique Walmart outbreaks
co_outbreaks_20210324_walmart_unique = co_outbreaks_20210324_walmart_clean %>%
  group_by(store_number,colorado_county_exposure_location ) %>%
  summarise(num = n())

# If the distribution centers are two different distribution centers, than the actual count is one higher
num_co_outbreaks_20210324_walmart_unique = nrow(co_outbreaks_20210324_walmart_unique)


# KS --------------------
# https://www.coronavirus.kdheks.gov/DocumentCenter/View/1489/Policy-FAQs
# Kansas releases data "locations associated with cases." Data does not include specific information about employees.
# Only one case that shows up in Kansas outbreak data
ks_outbreaks_20200922_to_20210210_walmart_unique = ks_outbreaks_20200922_to_20210210_walmart %>%
  group_by(facility, city_or_state) %>%
  summarise(num = n())

num_ks_outbreaks_20200922_to_20210210_walmart_unique = nrow(ks_outbreaks_20200922_to_20210210_walmart_unique)


# ME --------------------
# Maine releases "COVID-19 outbreak investigations associated with worksites in Maine"
# Does not include specific informationa about employees
me_outbreaks_2021026_walmart_unique = me_outbreaks_2021026_walmart %>% 
  group_by(name) %>% summarise(num = n())

num_me_outbreaks_2021026_unique_walmart = nrow(me_outbreaks_2021026_walmart_unique)


# NM --------------------
# "represents positive cases in the workplace"
# https://www.env.nm.gov/rapid-response-data/
# Made sure addresses did not have misspellings, and then grouped on addresses to get unique Walmart locations
nm_covid_occurrences_20210321_walmart_unique = nm_covid_occurrences_20210321_walmart %>%
  mutate(address_clean = case_when(
                      # This is a fuel center, part of a Sam's Club that's already in the dataset.
                     address == "4400 N MAIN ST # A" ~ "4400 N MAIN ST",
                     # This is an auto care center, part of a Walmart that's already in the dataset.
                     address == "460 NEW MEXICO HIGHWAY 528" ~ "460 HIGHWAY 528",
                     # The Walmart in "Eunice" NM, is actually the one in Hobbs.
                     address == "3800 LOVINGTON HWY" ~ "3800 N LOVINGTON HWY",
                     TRUE ~ address))%>%
  group_by(address_clean, county ) %>%
  summarise(num = n())

num_nm_covid_occurrences_20210321_walmart_unique = nrow(nm_covid_occurrences_20210321_walmart_unique)

# OR --------------------
# "OHA is publishing data on cases in workplace outbreaks"
# https://www.oregon.gov/oha/covid19/Documents/DataReports/COVID-19-Weekly-Outbreak-Report-2021-1-21-FINAL.pdf
or_outbreaks_20210324_walmart_unique = or_outbreaks_20210324_walmart %>%
  # Get rid of repeating white spaces 
  mutate(address = str_squish(address)) %>%
  group_by(address) %>%
  summarise(num = n())

num_or_outbreaks_20210324_walmart_unique = nrow(or_outbreaks_20210324_walmart_unique)

# RI --------------------
# Defined as "workplace outbreaks"
# The data provided by the Rhode Island Department of Health was aggregated by company. For Walmart it indicated that there were multiple locations so we assumed at least two locations. There are nine Walmart facilities in the state.
num_ri_outbreaks_20210225_walmart_unique = nrow(ri_outbreaks_20210225_walmart)+1


# UT --------------------
# Defined as "Per CSTE, an outbreak is defined as two* or more laboratory-confirmed COVID-19 cases among workers at a facility with onset of illness within a 14-day period"
ut_outbreaks_20210319_walmart_cleaned = ut_outbreaks_20210319_walmart %>%
  # Clean names to make it the same
  mutate(company_name_clean = case_when(
    # Solve whitespace problem
    company_name == "Walmart 900E" ~ "Walmart 900 E",
    TRUE ~ company_name
  )) %>%
  # Join to addresses from Walmart's website
  left_join(ut_outbreaks_addresses, by = c("company_name_clean" = "name"))

ut_outbreaks_20210319_walmart_unique = ut_outbreaks_20210319_walmart_cleaned %>%
  group_by(company_name_clean, address) %>%
  summarise(num = n()) %>% 
  # There is no identifier for the west valley city (in total there are three WVC locations and the other two in our outbreak data are identified)
  filter(company_name_clean != "Walmart WVC")


num_ut_outbreaks_20210319_walmart_unique = nrow(ut_outbreaks_20210319_walmart_unique)


# VT --------------------
# Definition doesn't matter, no walmarts
num_vt_outbreaks_20210211_walmart_unique = nrow(vt_outbreaks_20210211_walmart)


# WA --------------------
# Includes field with staff cases
# Do we want to filter out NA or 0 cases
num_wa_outbreaks_walmart_unique <- washington_walmarts %>% 
  distinct(unique_store_name_option_a) %>% 
  nrow()

## Calculations --------------------

# Get all unique walmart numbers from outbreak data 
unique_walmarts_with_covid_cases = sum(num_co_outbreaks_20210324_walmart_unique,
                                       num_me_outbreaks_2021026_unique_walmart, 
                                       num_nm_covid_occurrences_20210321_walmart_unique, 
                                       num_or_outbreaks_20210324_walmart_unique, 
                                       num_ks_outbreaks_20200922_to_20210210_walmart_unique,
                                       num_ar_occurrences_walmart_unique, 
                                       num_ri_outbreaks_20210225_walmart_unique, 
                                       num_ut_outbreaks_20210319_walmart_unique, 
                                       num_vt_outbreaks_20210211_walmart_unique, 
                                       num_wa_outbreaks_walmart_unique)

# Get number of Walmart owned stores and distribution centers in each state as of 2021-05-03
# https://corporate.walmart.com/our-story/locations/united-states?multi=false


ar_walmarts = 141
co_walmarts = 108
ks_walmarts = 84
me_walmarts = 26
nm_walmarts = 54
or_walmarts = 46
ri_walmarts = 9
ut_walmarts = 62
vt_walmarts = 6
wa_walmarts = 68

total_walmarts = sum(ar_walmarts,
                     co_walmarts,
                     ks_walmarts,
                     me_walmarts,
                     nm_walmarts,
                     or_walmarts,
                     ri_walmarts,
                     ut_walmarts,
                     vt_walmarts,
                     wa_walmarts
                     )

# States where we received company level reports of COVID-19 in the workplace
ls_company_level_outbreak_data = list(ri_outbreaks_20210225,vt_outbreaks_20210211, ut_outbreaks_20210319, or_outbreaks_20210324, me_outbreaks_2021026, ks_outbreaks_20200922_to_20210210, co_outbreaks_20210324, nm_covid_occurrences_20210321, ar_occurrences, washington_walmarts)

average_percent_walmarts_affected_by_covid = mean(c((num_ar_occurrences_walmart_unique/ar_walmarts),
     (num_ks_outbreaks_20200922_to_20210210_walmart_unique/ks_walmarts),
     (num_co_outbreaks_20210324_walmart_unique/co_walmarts), 
     (num_me_outbreaks_2021026_unique_walmart/me_walmarts), 
     (num_nm_covid_occurrences_20210321_walmart_unique/nm_walmarts), 
     (num_or_outbreaks_20210324_walmart_unique/or_walmarts), 
     (num_ri_outbreaks_20210225_walmart_unique/ri_walmarts), 
     (num_ut_outbreaks_20210319_walmart_unique/ut_walmarts), 
     (num_vt_outbreaks_20210211_walmart_unique/vt_walmarts),
     (num_wa_outbreaks_walmart_unique/wa_walmarts)))*100

table_average_percent_walmarts_affected_by_covid = tibble(avg_pct_walmarts_with_covid_cases = round(average_percent_walmarts_affected_by_covid,2))

# Results --------------------
pct_walmarts_with_covid_cases = tibble( number_of_states = length(ls_company_level_outbreak_data), 'unique_walmarts_with_covid_cases' = unique_walmarts_with_covid_cases, "total_walmarts" = total_walmarts) %>% 
  mutate(pct_walmarts_with_covid_cases = (unique_walmarts_with_covid_cases / total_walmarts)*100)

output_formatted_table(pct_walmarts_with_covid_cases, "Across 10 states, one quarter of 151 Walmart-owned facilities had COVID-19 cases")
Across 10 states, one quarter of 151 Walmart-owned facilities had COVID-19 cases
number_of_states unique_walmarts_with_covid_cases total_walmarts pct_walmarts_with_covid_cases
10 151 604 25
output_formatted_table(table_average_percent_walmarts_affected_by_covid, "On average, one quarter of the company's stores and distribution centers in those states were affected")
On average, one quarter of the company’s stores and distribution centers in those states were affected
avg_pct_walmarts_with_covid_cases
25.17

FACT: 3% inspected, 12.5% with violations [CQ | Full & AP]

“When workers try to sound the alarm to OSHA by submitting COVID-related complaints, only a fraction lead to inspections, and even fewer result in a citation. As of late March, 3% of closed COVID-19 complaints to federal OSHA offices deemed valid by the agency resulted in an inspection, 12.5% of which led to citations with an average penalty of $13,000. OSHA has reduced over a third of penalties. For Walmart, slightly fewer complaints resulted in inspections — 2.6%. No inspections led to a citation.”

Explanation

To produce these findings, we started the list of closed valid COVID-19 complaints released by OSHA as of March 21. To find complaints that led to inspections, we filtered the data set for only those complaints with a corresponding inspection number. To discover inspections that led to citations, we used a list OSHA provided to us. To obtain current penalty information, we pulled data from OSHA’s enforcement data site. A data visualization embedded in the story also makes use of these findings.

Supporting Code and Output

## Pipeline for all valid closed federal complaints

# Get all inspections from 2020 to present
df.inspections_2020topresent = df.inspections_cleaned %>% filter(year >= 2020)

# Get all violations from 2020 to present
df.violations_2020topresent = df.violations_cleaned %>% filter(year >= 2020)


# Get unique match_nr's from the federal violations sheet
unique_inspection_numbers_federal_violations = unique(federal_covid_violations_20210325$match_nr)

# Filter the inspections for all of the inspections listed in the federal covid database. This will give us information on what triggered the inspection
detailed_inspections_with_covid_violation = df.inspections_2020topresent %>% filter(match_nr %in% unique_inspection_numbers_federal_violations)

# Get a list of the unique activity numbers so we can match up to violations database
unique_activity_numbers_inspection_with_covid_violations = unique(detailed_inspections_with_covid_violation$activity_nr)

# Get all violation info for the inspection numbers
detailed_violations_covid = df.violations_2020topresent %>% filter(activity_nr %in% unique_activity_numbers_inspection_with_covid_violations)


# Grouped all the violations together to get their current penalty and initial penalty resulting from the inspection activity number
summary_detailed_violations_covid = detailed_violations_covid %>%
  group_by(activity_nr) %>%
  summarise(total_initial = sum(initial_penalty), total_current = sum(current_penalty))

# Join the initial and current money info for the inspections back to the detailed_inspections_with_covid_violation
final_detailed_inspections_with_covid_violations = left_join(detailed_inspections_with_covid_violation, summary_detailed_violations_covid, by = "activity_nr") %>%
  # Select columns that we need
  select(activity_nr, reporting_id, estab_name, site_address, open_date, case_mod_date, close_conf_date, close_case_date, match_nr, office_type,  year, total_initial, total_current)

# Combine the final detailed sheet (this includes the initial penalty, current penalty, what triggered the inspection)
complete_federal_covid_violations_20210325 = left_join(federal_covid_violations_20210325,final_detailed_inspections_with_covid_violations, by = "match_nr") %>% 
  # Checking to make sure that initial penalties from the downloaded data match the excel sheet provided by osha
  mutate(check_initial_match = ifelse(total_initial_penalty == total_initial, "yes", "no")) %>%
  # Calculate the change from initial penalty to current penalty
  mutate(penalty_change = total_initial - total_current)

## Prepare dataframe of federal closed complaints --------------------

# Creates a dataframe of closed federal complaints as of March 21, 2021
federal_closed_ois_complaints_20210321  = ois_closed_complaints_20210321 %>%
  # Choose federal plan offices
  filter(office_type == "federal_plan_office") 

# --------------------

valid_closed_covid_complaints_to_inspections = federal_closed_ois_complaints_20210321 %>% 
  group_by(inspection_occurred) %>% 
  summarise(number_complaints = n()) %>% 
  pivot_wider(names_from = inspection_occurred, values_from = number_complaints) %>% 
  rename("complaint_with_inspection" ="yes", "complaint_with_no_inspection" = "no" ) %>% 
  mutate(total_complaints = complaint_with_inspection + complaint_with_no_inspection) %>% 
  mutate(pct_complaints_to_inspections = round(complaint_with_inspection / (total_complaints) * 100,2)) 

# --------------------

# Create a dataframe of unique inspection numbers. 
federal_closed_ois_complaints_20210321_insp_numbers = federal_closed_ois_complaints_20210321  %>%
  filter(inspection_occurred == "yes") %>% 
  select(insp_number) %>%
  separate_rows(insp_number) %>%
  unique() %>% 
  tibble() %>% 
  mutate(insp_number = as.numeric(insp_number))

# Only keep the inspection numbers from the citation sheet that OSHA sent, that appear in the federal closed citation data
federal_closed_complaints_inspection_resulting_in_violation = complete_federal_covid_violations_20210325 %>% 
  filter(inspection_number %in% federal_closed_ois_complaints_20210321_insp_numbers$insp_number)

# Number of inspections related to complaints that resulted in a citation
num_closed_federal_complaints_inspections_cited = nrow(federal_closed_complaints_inspection_resulting_in_violation)

average_initial_penalty = tibble( average_initial_penalty = mean(federal_closed_complaints_inspection_resulting_in_violation$total_initial_penalty))


# --------------------

inspection_to_citation_percent = federal_closed_ois_complaints_20210321 %>% 
  group_by(inspection_occurred) %>% 
  summarise(number_complaints_with_inspections = n()) %>% 
  filter(inspection_occurred == "yes") %>% 
  add_column("resulting_citations" = num_closed_federal_complaints_inspections_cited) %>% 
  mutate(percent_inspections_resulting_in_violations = round((resulting_citations/number_complaints_with_inspections)*100,2)) %>% 
  select(-inspection_occurred)

# --------------------

# Get percent of penalties that have citations have been reduced
reduced_penalties = federal_closed_complaints_inspection_resulting_in_violation %>% 
  mutate(penalty_reduced = case_when(
    penalty_change > 0 ~ "reduced_penalty",
    TRUE ~ "no_change"
  )) %>% 
  group_by(penalty_reduced) %>% 
  summarise(num = n()) %>%
  pivot_wider(names_from = penalty_reduced, values_from = num) %>% 
  mutate(total = reduced_penalty + no_change) %>% 
  mutate(pct_penalties_reduced = round(reduced_penalty / (total) *100,2)) 

# Results ---------------------
output_formatted_table(valid_closed_covid_complaints_to_inspections, "3% of closed COVID-19 complaints to federal OSHA offices deemed valid by the agency resulted in an inspection")
3% of closed COVID-19 complaints to federal OSHA offices deemed valid by the agency resulted in an inspection
complaint_with_no_inspection complaint_with_inspection total_complaints pct_complaints_to_inspections
12339 376 12715 2.96
output_formatted_table(inspection_to_citation_percent,"12.5% of which led to citations" )
12.5% of which led to citations
number_complaints_with_inspections resulting_citations percent_inspections_resulting_in_violations
376 47 12.5
output_formatted_table(average_initial_penalty, "with an average penalty of $13,000")
with an average penalty of $13,000
average_initial_penalty
13065.79
output_formatted_table(reduced_penalties, "OSHA has reduced over a third of penalties")
OSHA has reduced over a third of penalties
no_change reduced_penalty total pct_penalties_reduced
26 21 47 44.68
## Pipeline for all valid closed federal complaints regarding Walmart

# Calculate percent of Walmart complaints that had a related inspection
valid_closed_walmart_covid_complaints_to_inspections = federal_closed_ois_complaints_20210321 %>% 
  filter_walmart("estab_name") %>% 
  group_by(inspection_occurred) %>% 
  summarise(number_complaints = n()) %>% 
  pivot_wider(names_from = inspection_occurred, values_from = number_complaints) %>% 
  rename("complaint_resulted_in_inspection" ="yes", "no_inspection_occurred" = "no" ) %>% 
  mutate(pct_complaints_to_inspections = round(complaint_resulted_in_inspection / (complaint_resulted_in_inspection + no_inspection_occurred) * 100,2)) 

# ---------------------

# Create a dataframe of inspection numbers related to closed federal Walmart complaints
federal_closed_ois_walmart_complaints_20210321_insp_numbers = federal_closed_ois_complaints_20210321  %>%
  filter_walmart("estab_name") %>% 
  filter(inspection_occurred == "yes") %>% 
  select(insp_number) %>%
  separate_rows(insp_number) %>%
  unique() %>% 
  tibble() %>% 
  mutate(insp_number = as.numeric(insp_number))

# Filter for inspection numbers resulting from Walmart COVID-19 complaints in the violation database
federal_closed_walmart_complaints_inspected_cited = complete_federal_covid_violations_20210325 %>% 
  # Filter the inspection numbers from the violation sheet for the inspection numbers tied with closed walmart complaints
  filter(inspection_number %in% federal_closed_ois_walmart_complaints_20210321_insp_numbers$insp_number)

num_federal_closed_walmart_complaints_inspections_cited = nrow(federal_closed_walmart_complaints_inspected_cited)

# --------------------

# Calculate percentage of Walmart complaints whose inspections resulted in citation
walmart_inspection_to_citation_percent = federal_closed_ois_complaints_20210321 %>% 
  filter_walmart("estab_name") %>% 
  group_by(inspection_occurred) %>% 
  summarise(number_complaints_with_inspections = n()) %>% 
  filter(inspection_occurred == "yes") %>% 
  add_column("resulting_citations" = num_federal_closed_walmart_complaints_inspections_cited) %>% 
  mutate(percent_inspections_resulting_in_violations = round((resulting_citations/number_complaints_with_inspections)*100,2)) %>% 
  select(-inspection_occurred)

# Results --------------------
output_formatted_table(valid_closed_walmart_covid_complaints_to_inspections, "For Walmart, slightly fewer complaints resulted in inspections — 2.6%.")
For Walmart, slightly fewer complaints resulted in inspections — 2.6%.
no_inspection_occurred complaint_resulted_in_inspection pct_complaints_to_inspections
113 3 2.59
output_formatted_table(walmart_inspection_to_citation_percent, "No inspections led to a citation.")
No inspections led to a citation.
number_complaints_with_inspections resulting_citations percent_inspections_resulting_in_violations
3 0 0

FACT: More than two dozen Grants Pass Walmart OSHA complaints [CQ | Full & AP]

“In Grants Pass, Oregon, Walmart workers and customers filed more than two dozen complaints about unsafe conditions in 2020 with the state worker-safety agency. The store allowed customers to enter without masks and allowed customers to remove masks once inside, a complaint from September read. ‘Management tells us to say nothing and to not even bother … management with it,’ the complaint said.”

Explanation

Using complaint data provided by Oregon OSHA, we filtered for complaints about the Grants Pass Walmart on Terry Lane for September 2020. The data contains one record per “hazard” per complaint. To get unique complaints, we filtered for rows that have an item number of 1.

Supporting code and output

num_grants_pass_walmart_2020_complaints = filter_walmart(or_public_20210311, "employer_name")%>%
# Lowercase all the words in the address column to better filter it
  mutate(site_location = tolower(site_location))%>%
# Look for Grants Pass, Terry Lane location complaints
  filter(str_detect(site_location, "terry"))%>%
# Find complaints in 2020
  filter(intake_date >= as_date("2020-01-01") & intake_date <= as_date("2020-12-31")) %>% 
# The item number indicates a given hazard within a single complaint. 
# To get unique complaints, filter for just item counts that are 1
  filter(item_count == 1) %>% 
  nrow() 

# Create dataframe
grants_pass_walmart_2020_complaints = tibble("num_grants_pass_walmart_2020_complaints" = num_grants_pass_walmart_2020_complaints)
  
# Results --------------------
output_formatted_table(grants_pass_walmart_2020_complaints, "More than 2 dozen COVID-19 complaints at the Terry Lane Walmart in Grants Pass")
More than 2 dozen COVID-19 complaints at the Terry Lane Walmart in Grants Pass
num_grants_pass_walmart_2020_complaints
25
september_complaint_check<-filter_walmart(or_public_20210311, "employer_name")%>%
# Lowercase all the words in the address column to better filter it
  mutate(site_location = tolower(site_location))%>%
# Look for Grants Pass, Terry Lane location complaints
  filter(str_detect(site_location, "terry"))%>%
# Find complaints in September
  filter(intake_date > as_date("2020-09-01") & intake_date < as_date("2020-09-30")) %>% 
# Filter for specific complaint
  filter(description == "Still allowing customers to enter without masks, and allowing customers who enter to remove masks once inside. Management tells us to say nothing and to not even bother them (management with it).") 

# Results --------------------
output_formatted_table(september_complaint_check, "Quoted Complaint from the Terry Lane Walmart in Grants Pass, September 2020")
Quoted Complaint from the Terry Lane Walmart in Grants Pass, September 2020
complaint_number intake_date complaint_year code_ois_naics_vv business_type_desc mbusiness employer_name mname employer_dba site_location site_city site_state site_zip county_name item_count description hazard_location confidential anonymous complainant_type name is_walmart
NA 2020-09-22 0 452910 supercenter Grocery Store Wal-Mart Associates Inc Walmart NA 135 ne terry ln Grants Pass OR 97526 JOSEPHINE 1 Still allowing customers to enter without masks, and allowing customers who enter to remove masks once inside. Management tells us to say nothing and to not even bother them (management with it). NA Not confidential Anonymous NA wal-mart associates inc TRUE

FACT: 18 people between December and March [CQ | Full & AP]

“Between December and March, at least 18 people were infected in an outbreak linked to that Walmart. In January, while the outbreak was still active, another OSHA complaint alleged Walmart ‘may not be informing employees when in contact with a positive case for COVID-19 in the workplace.’”

Explanation

Using the complaint data provided by Oregon OSHA, we filtered for complaints about the Grants Pass Walmart on Terry Lane for January 2021.

Supporting code and output

# Create an object that feeds in Oregon Walmart outbreak information
terry_lane_fact_check <-or_outbreaks_20210324_walmart %>%
  # Lowercase all the addresses in the "address" column
  mutate(address = tolower(address))%>%
  # Return only rows that contain Terry Lane in them
  filter(str_detect(address, "terry")) %>%
  select(-X1)

# Results --------------------
output_formatted_table(terry_lane_fact_check, "At least 18 COVID-19 cases")
At least 18 COVID-19 cases
facility_name county first_reported total_cases total_deaths resolved_status category address investigation_start_date most_recent_onset students staff_volunteer report_name report_dates facility_name_squished address_squished begin_date name is_walmart
Walmart Josephine NA 18 NA active workplace 135 ne terry ln, grants pass, or 97526 2020-12-27 2021-02-14 NA NA COVID-19-Weekly-Outbreak-Report-2021-2-24-FINAL.csv 2021-2-24 walmart 135neterryln,grantspass,or97526 2020-12-27 walmart TRUE
# You can see, an outbreak in Grants Pass, Terry Ln. Walmart; 18 total related cases, investigation start date in December. Even though the last report date says February, our data goes through March 24th.

# Create object to check January complaints, and feed in Walmart complaints from Oregon OSHA complaints by using Walmart filter
january_complaint_check<- filter_walmart(or_public_20210311, "employer_name")%>%
  # Lowercase all the words in the address column to better filter it
  mutate(site_location = tolower(site_location))%>%
  # Look for Grants Pass, Terry lane location complaints
  filter(str_detect(site_location, "terry"))%>%
  # Find complaints in January
  filter(intake_date > as_date("2020-12-31") & intake_date < as_date("2021-02-01"))

# Results --------------------
output_formatted_table(january_complaint_check, "Quoted Complaint from the Terry Lane Walmart in Grants Pass, January 2021")
Quoted Complaint from the Terry Lane Walmart in Grants Pass, January 2021
complaint_number intake_date complaint_year code_ois_naics_vv business_type_desc mbusiness employer_name mname employer_dba site_location site_city site_state site_zip county_name item_count description hazard_location confidential anonymous complainant_type name is_walmart
209449896 2021-01-13 2021 452112 store Grocery Store Wal-Mart Associates Inc Walmart Wal-Mart Associates Inc 135 ne terry ln Grants Pass OR 97526 JOSEPHINE 1 Employer may not be informing employees when in contact with a positive case for COVID 19 in the workplace. Employer may not be effectively performing contract tracing within the establishment to make notice to employees who may have be in contact with a positive COVID 19 case. Employer may not be taking the steps to effectively sanitize work stations between employees. 135 NE Terry Ln Grants Pass, OR 97526; Bakery department, asset protection office, and pick up Confidential Not anonymous Withheld wal-mart associates inc TRUE

FACT: 98% of workplaces with outbreaks had no COVID-19 complaints [CQ | Full & AP]

"In Colorado, approximately 98% of all workplaces with reported COVID-19 outbreaks did not appear on OSHA’s list as of March 2021.

Explanation

We used a fuzzy matching library to match company names in Colorado outbreak data to company names in closed valid OSHA complaint data for Colorado. We then manually verified matches and non-matches to create a list of companies that had outbreaks that also had closed valid OSHA complaints and companies that did not. We also created a list of unique company locations in the outbreak data using OpenRefine. The fuzzy matching was executed in a separate python script file.One caveat: closed Colorado OSHA complaints from March 7, 2021 were compared to Colorado outbreak data from March 24, 2021. Those file versions were the two most recent available when we did our matching analysis.

Supporting code and output

# Read in Colorado unique locations that had outbreaks
co_unique_locations_outbreaks<- read_csv("data/state_foia/co/co_dedup_after_openrefine_FINAL.csv") %>% clean_names()
  
# Clean column names
co_unique_locations_outbreaks = co_unique_locations_outbreaks %>% 
  # Count locations
  group_by(setting_name, colorado_county_exposure_location)%>%
  summarise(locations_with_an_outbreak = n())

# Sum total number of unique locations with outbreaks 
unique_colorado_outbreak_locations = sum(co_unique_locations_outbreaks$locations_with_an_outbreak)

# Read in complaints matching data
colorado_outbreak_complaints_matches<-read_csv("data/state_foia/co/hand_verified_colorado_complaints_outbreaks_final.csv")

colorado_outbreak_complaints_matches = colorado_outbreak_complaints_matches %>% 
  clean_names()%>%
  # Filter to return rows where humans identified a match between a complaint and an outbreak
  filter(human_is_it_a_match_y_n_m == "y")%>%
  # Count the number of matches
  group_by(human_is_it_a_match_y_n_m)%>%
  summarise(number_of_matches = n())

# Add a column containing the total number of unique locations with outbreaks
colorado_outbreak_complaints_matches<- colorado_outbreak_complaints_matches %>% 
  add_column(num_unique_outbreak_locations= unique_colorado_outbreak_locations)%>%
  select(number_of_matches, num_unique_outbreak_locations)%>%
  # Get percentage of outbreaks with no matching complaint
  mutate(percent_outbreaks_with_no_matches = round(100 - ((number_of_matches/num_unique_outbreak_locations)*100),2))

# Results --------------------
output_formatted_table(colorado_outbreak_complaints_matches, "98% of workplaces with outbreaks had no COVID-19 complaints")
98% of workplaces with outbreaks had no COVID-19 complaints
number_of_matches num_unique_outbreak_locations percent_outbreaks_with_no_matches
68 3682 98.15

Fact: One-third of Colorado Walmart locations had outbreaks, no complaints [CQ | Full]

“One-third of Walmart facilities in the state had recorded outbreaks, but none of those 37 facilities appeared on OSHA’s list as of March 2021 – including two stores where a total of three employees died.”

Explanation

We filtered Colorado outbreak data for all unique Walmart facilities, and compared that number to total Walmart facilities in the state. The total number of Walmart-owned facilities in Colorado was extracted from Walmart’s corporate site. There are two Walmart distribution centers and 106 retail units. We also filtered closed valid OSHA complaints for Colorado Walmart facilities and checked for the existence of those facilities in Colorado outbreak data.

Supporting code and output

# Clean up Walmart outbreak names
co_outbreaks_20210324_walmart_clean = co_outbreaks_20210324_walmart %>%
  mutate(setting_name = tolower(setting_name)) %>%
  # Get rid of :month that appears
  mutate(setting_name_clean = sub(":.*", "", setting_name)) 
  

co_outbreaks_20210324_walmart_unique = co_outbreaks_20210324_walmart_clean %>%
  # Get store number
  mutate(store_number = str_extract(setting_name_clean, "#\\S+") ) %>%
  # Add store number to Walmart pharmacy in Delta Walmart
  mutate(store_number = case_when(
    # Add store number to Walmart pharmacy in Delta Walmart
    setting_name_clean == "walmart pharmacy" & colorado_county_exposure_location == "Delta" ~ "#5458",
    TRUE ~ store_number
  )) %>% 
  group_by(setting_name_clean, store_number) %>% 
  summarise(num = n())

# Numbers --------------------
number_unique_colorado_walmart_outbreaks = nrow(co_outbreaks_20210324_walmart_unique)
number_colorado_walmarts = 108

pct_co_walmart_outbreaks = tibble('number_unique_colorado_walmart_outbreaks' = number_unique_colorado_walmart_outbreaks, "number_colorado_walmarts" = number_colorado_walmarts ) %>% 
  mutate(pct_co_walmart_outbreaks = round((number_unique_colorado_walmart_outbreaks / number_colorado_walmarts)*100,2))

# Result --------------------
output_formatted_table(pct_co_walmart_outbreaks, "One-third of Colorado Walmart locations with outbreaks" )
One-third of Colorado Walmart locations with outbreaks
number_unique_colorado_walmart_outbreaks number_colorado_walmarts pct_co_walmart_outbreaks
37 108 34.26
co_ois_complaints_20210328_walmart = ois_closed_complaints_20210328 %>%
  filter(site_state == "CO") %>%
  filter_walmart("estab_name") %>%
  mutate(site_address_1 = str_squish(site_address_1)) %>%
  # Left join to store numbers that we got from looking up address
  left_join(co_walmart_complaints_addresses, by = c("site_address_1" = "address"))

# Answer how many of the Walmarts that had outbreaks have had a closed complaint
co_outbreaks_with_complaints = semi_join(co_outbreaks_20210324_walmart_unique, co_ois_complaints_20210328_walmart, by = "store_number")

num_co_outbreaks_with_complaints = nrow(co_outbreaks_with_complaints)

number_walmarts_with_outbreaks_that_had_osha_complaints = tibble("number_walmarts_with_outbreaks_that_had_osha_complaints" = num_co_outbreaks_with_complaints)

# Result --------------------
output_formatted_table(number_walmarts_with_outbreaks_that_had_osha_complaints, "Colorado Walmarts with outbreaks had no COVID-19 complaints" )
Colorado Walmarts with outbreaks had no COVID-19 complaints
number_walmarts_with_outbreaks_that_had_osha_complaints
0
# Show that the stores with deaths did not have an OSHA complaint
co_outbreaks_20210324_walmart_deaths = co_outbreaks_20210324_walmart %>% 
  filter(total_staff_deaths > 0) %>% 
  mutate(store_number = str_extract(name, "#\\S+") ) %>%
  select(setting_name,colorado_county_exposure_location,date_outbreak_was_considered_closed,total_staff_deaths)


total_co_walmart_deaths = sum(co_outbreaks_20210324_walmart_deaths$total_staff_deaths)

co_walmart_deaths_that_had_a_complaint = co_ois_complaints_20210328_walmart %>% filter(store_number %in%  co_outbreaks_20210324_walmart_deaths$store_number)

num_co_walmart_stores_with_deaths_that_had_a_complaint = tibble(num_co_walmart_stores_with_deaths_that_had_a_complaint = nrow(co_walmart_deaths_that_had_a_complaint), total_co_walmart_deaths = total_co_walmart_deaths )

# Result --------------------
output_formatted_table(co_outbreaks_20210324_walmart_deaths, "Two Walmarts where a total of three employees died")
Two Walmarts where a total of three employees died
setting_name colorado_county_exposure_location date_outbreak_was_considered_closed total_staff_deaths
Walmart #1434 El Paso NA 1
Walmart #1492 Arapahoe 2020-06-09 2
output_formatted_table(num_co_walmart_stores_with_deaths_that_had_a_complaint, "The two Walmarts with deaths also had no COVID-19 complaints")
The two Walmarts with deaths also had no COVID-19 complaints
num_co_walmart_stores_with_deaths_that_had_a_complaint total_co_walmart_deaths
0 3

FACT: Complaint rate five times higher in state-plan states [CQ | Full & AP]

“In states where state agencies oversee most workplaces, the rate of complaints was five times higher than in states where the federal government exclusively oversees workplace safety.”

Explanation

We compared the number of valid closed complaints in state plan states to the number of valid closed complaints in states covered by federal OSHA. We did not include partial plan states in our analysis. We used total employment figures from the Bureau of Labor Statistics to weight the comparison.

Supporting code and output

complaints_by_plan = ois_closed_complaints_20210328 %>%
  # Join dataframe with a table that describes the OSHA plan type for each state
  left_join(federal_state_plan_key, by = c("site_state" = "state")) %>%
  # Exclude partial plan states
  filter(coverage != "state-local") %>%
  # Group and count by state key.
  group_by(site_state, plan_type) %>%
  count() %>%
  inner_join(worker_data, by = c("site_state" = "state.y")) %>%
  #filtering out PR because it is not a state
  filter(site_state != "PR") %>%
  group_by(plan_type) %>%
  summarise(total_complaints = sum(n), total_employees = sum(total_employment)) %>%
  mutate(complaints_per_10000_workers = (total_complaints/total_employees)*10000)

complaints_by_plan_output = complaints_by_plan %>%
  mutate(complaints_per_10000_workers = round(complaints_per_10000_workers,2))
  
complaints_by_plan_result = complaints_by_plan %>% 
  # Select necessary columns
  select(plan_type, complaints_per_10000_workers) %>% 
  # Pivot wider
  pivot_wider(names_from = plan_type, values_from = complaints_per_10000_workers) %>% 
  # Rename column names
  rename(federal_complaint_per_10K_workers = federal_plan, state_complaint_per_10K_workers = state_plan) %>% 
  # Calculate how many times higher rate of complaints in states where state agencies oversee most workplaces versus where the federal government exclusively oversees workplace safety
  mutate(ratio = state_complaint_per_10K_workers/federal_complaint_per_10K_workers) %>%
  mutate(across(everything(),function(x) round(x,2))) %>%
  mutate(ratio = paste0("1:",round(ratio,0)))

# Result --------------------
output_formatted_table(complaints_by_plan_output, "Complaint rates")
Complaint rates
plan_type total_complaints total_employees complaints_per_10000_workers
federal_plan 8520 63512500 1.34
state_plan 37447 58588200 6.39
output_formatted_table(complaints_by_plan_result, "Calculated ratio")
Calculated ratio
federal_complaint_per_10K_workers state_complaint_per_10K_workers ratio
1.34 6.39 1:5

FACT: Inspections from COVID-19 complaints in Oregon Walmarts [CQ | Full & AP]

“In Oregon, which is among the states with the most COVID-19 complaints for Walmart, only one complaint led to an inspection, and not until March 22, 2021 – more than a year after the pandemic started.”

Explanation

We compared total closed Walmart COVID-19 complaints that appeared in federal OSHA database for each state. We filtered the list of inspections that Oregon’s state OSHA provided for Walmart facilities. In the state inspection list, one Walmart appears. We confirmed with Oregon’s state OSHA that this inspection was opened on Dec. 5, 2020 and was not in response to a complaint, so we excluded it from this analysis. In an email shortly before our publication deadline, an agency spokesperson said that on March 22, 2021 the agency opened an investigation into Walmart in response to a COVID-19 complaint. It does not appear in our data, which is current through March 16, 2021.

Supporting code and output

walmart_complaints_by_state = ois_closed_complaints_20210328 %>%
  filter_walmart("estab_name") %>%
  group_by(site_state) %>%
  count() %>% 
  arrange(desc(n)) %>%
  rename(number_complaints = n)

# Result --------------------
# Ranking of walmart complaints by state using OIS data
output_formatted_table(head(walmart_complaints_by_state), "States with the most closed valid Walmart COVID-19 Complaints")
States with the most closed valid Walmart COVID-19 Complaints
site_state number_complaints
OR 104
CA 72
IN 71
NC 42
PR 25
NV 23
# Look at Oregon inspection data complaints from OR OSHA --------------------
# Create a new object, filter Oregon inspections to return only Walmart inspections, only one Walmart that appears, not marked as related to a COVID-19 complaints
or_inspections_walmart_state_data <- filter_walmart(or_inspections_20210316, "legal_name")

# Result --------------------
output_formatted_table(or_inspections_walmart_state_data,"Walmart inspection from a Fatality/Catastrophe")
Walmart inspection from a Fatality/Catastrophe
inspection_no federal_fiscal_year inspection_open_date type category scope inspected_naics description employer_no location_no legal_name loc_city loc_county close_conference_date citation_issued initial_penalty citation_green_card_date inspection_related_to_a_covid_complaint inspection_marked_as_covid_related inspection_marked_as_covid_related_remote covid_food_processing_emphasis_inspection name is_walmart
317728094 2021 2020-12-05 Fatality/Catastrophe Health Partial inspection 452112 DISCOUNT DEPARTMENT STORES 2019842 29 Wal-Mart Associates Inc Happy Valley CLACKAMAS . NA NA . NA X NA NA wal-mart associates inc TRUE

FACT: 10 Oregon Walmarts with outbreaks [CQ | Full & AP]

“As of March 24, at least 10 Walmart locations in the state were linked to outbreaks with over five cases, including a Hermiston distribution center linked to 124 recorded cases.”

Explanation

We determined the number of unique Walmart locations that appeared in Oregon’s outbreak data. According to the Oregon Health Authority, “OHA is only reporting workplace outbreaks with five or more cases and only for workplaces with at least 30 workers … Case counts include all persons linked to the outbreak, which may include household members and other close contacts.”

Supporting code and output

# Get Walmart-related outbreaks in OR:
or_walmart_outbreaks_unique <- or_outbreaks_20210324_walmart %>%
  # Get rid of extra whitespaces
  mutate(address = str_squish(address)) %>% 
  # Filter to one outbreak report per Walmart.  
  distinct(address, .keep_all=TRUE) %>%
  select(-X1)

# Count of stores
num_or_outbreaks_20210324_walmart_unique = or_walmart_outbreaks_unique %>%
  summarize(unique_walmart_facilities_w_outbreaks = n())

# Result --------------------
output_formatted_table(num_or_outbreaks_20210324_walmart_unique, "Count of unique Walmart COVID-19 outbreak locations in Oregon")
Count of unique Walmart COVID-19 outbreak locations in Oregon
unique_walmart_facilities_w_outbreaks
10
# Result --------------------
output_formatted_table(or_walmart_outbreaks_unique, "List of unique Walmart COVID-19 outbreak locations")
List of unique Walmart COVID-19 outbreak locations
facility_name county first_reported total_cases total_deaths resolved_status category address investigation_start_date most_recent_onset students staff_volunteer report_name report_dates facility_name_squished address_squished begin_date name is_walmart
Walmart Clackamas NA 8 NA resolved workplace 10000 SW 82nd Ave, Happy Valley, OR 97086 2020-12-01 2020-12-12 NA NA COVID-19-Weekly-Outbreak-Report-2021-1-13-FINAL.csv 2021-1-13 walmart 10000sw82ndave,happyvalley,or97086 2020-12-01 walmart TRUE
Walmart Linn NA 8 NA active workplace 1330 Goldfish Farm Rd SE, Albany, OR 97322 2020-11-26 2020-12-01 NA NA COVID-19-Weekly-Outbreak-Report-2020-12-16-FINAL.csv 2020-12-16 walmart 1330goldfishfarmrdse,albany,or97322 2020-11-26 walmart TRUE
Walmart Umatilla NA 58 NA active workplace 1350 N 1st St, Hermiston, OR 97838 2020-07-09 2020-09-23 NA NA COVID-19-Weekly-Report-2020-09-30-FINAL.xlsx 2020-09-3 walmart 1350n1stst,hermiston,or97838 2020-07-09 walmart TRUE
Walmart Josephine NA 18 NA active workplace 135 NE Terry Ln, Grants Pass, OR 97526 2020-12-27 2021-02-14 NA NA COVID-19-Weekly-Outbreak-Report-2021-2-24-FINAL.csv 2021-2-24 walmart 135neterryln,grantspass,or97526 2020-12-27 walmart TRUE
Walmart Malheur NA 10 NA active workplace 1775 E Idaho Ave, Ontario, OR 97914 2020-07-01 2020-08-08 NA NA COVID-19-Weekly-Report-2020-08-19-FINAL.xlsx 2020-08-19 walmart 1775eidahoave,ontario,or97914 2020-07-01 walmart TRUE
Walmart Supercenter Marion NA 5 NA active workplace 2035 Lancaster Dr NE, Salem, OR 97305 2021-01-11 2021-01-02 NA NA COVID-19-Weekly-Outbreak-Report-2021-1-27-FINAL.csv 2021-1-27 walmartsupercenter 2035lancasterdrne,salem,or97305 2021-01-11 walmart supercenter TRUE
Walmart Coos NA 11 NA resolved workplace 2051 Newmark Ave, Coos Bay, OR 97420 2020-10-16 2020-10-18 NA NA COVID-19-Weekly-Outbreak-Report-2020-12-16-FINAL.csv 2020-12-16 walmart 2051newmarkave,coosbay,or97420 2020-10-16 walmart TRUE
Walmart Distribution Center Umatilla NA 124 NA active workplace 2650 US-395, Hermiston, OR 97838 2020-06-30 2021-03-07 NA NA COVID-19-Weekly-Outbreak-Report-2021-3-17-FINAL.csv 2021-3-17 walmartdistributioncenter 2650us-395,hermiston,or97838 2020-06-30 walmart distribution center TRUE
Walmart Supercenter Lane NA 13 NA active workplace 2659 Olympic St, Springfield, OR 97477 2021-02-18 2021-02-23 NA NA COVID-19-Weekly-Outbreak-Report-2021-3-10-FINAL.csv 2021-3-1 walmartsupercenter 2659olympicst,springfield,or97477 2021-02-18 walmart supercenter TRUE
Walmart Multnomah NA 19 NA active workplace 4200 SE 82nd Ave, Portland, OR 97266 2020-11-28 2020-12-11 NA NA COVID-19-Weekly-Outbreak-Report-2020-12-23-FINAL.csv 2020-12-23 walmart 4200se82ndave,portland,or97266 2020-11-28 walmart TRUE
# Point out the Hermiston outbreak, with 124 cases:
hermiston_outbreak <- or_walmart_outbreaks_unique %>% filter(facility_name == 'Walmart Distribution Center')

# Result --------------------
output_formatted_table(hermiston_outbreak, "Outbreak information on Walmart distribution center in Hermiston")
Outbreak information on Walmart distribution center in Hermiston
facility_name county first_reported total_cases total_deaths resolved_status category address investigation_start_date most_recent_onset students staff_volunteer report_name report_dates facility_name_squished address_squished begin_date name is_walmart
Walmart Distribution Center Umatilla NA 124 NA active workplace 2650 US-395, Hermiston, OR 97838 2020-06-30 2021-03-07 NA NA COVID-19-Weekly-Outbreak-Report-2021-3-17-FINAL.csv 2021-3-17 walmartdistributioncenter 2650us-395,hermiston,or97838 2020-06-30 walmart distribution center TRUE

FACT: Nearly every one of New Mexico Walmart workplaces had reported cases of COVID-19 [CQ | Full & AP]

“In New Mexico, nearly every one of Walmart’s 54 workplaces has reported cases of employees testing positive for coronavirus, according to data from the state’s Environment Department.”

Explanation

We filtered New Mexico workplace-level COVID-19 data for Walmart facilities to obtain the number of unique Walmart locations. We then compared that to the overall number of Walmart facilities in the state. The total number of Walmart-owned facilities in New Mexico was extracted from Walmart’s corporate site. There are 53 retail units and one Walmart distribution center. New Mexico defines a “rapid response” COVID-19 event as “one or more employees test[ing] positive for COVID-19.”

Supporting code and output

## NM -------------------------

nm_covid_occurrences_20210321_walmart_unique = nm_covid_occurrences_20210321_walmart %>%
  mutate(address_clean = case_when(
                      # This is a fuel center, part of a Sam's Club that's already in the dataset.
                     address == "4400 N MAIN ST # A" ~ "4400 N MAIN ST",
                     # This is an auto care center, part of a Walmart that's already in the dataset.
                     address == "460 NEW MEXICO HIGHWAY 528" ~ "460 HIGHWAY 528",
                     # The Walmart in "Eunice" NM, is actually the one in Hobbs.
                     address == "3800 LOVINGTON HWY" ~ "3800 N LOVINGTON HWY",
                     TRUE ~ address))%>%
  group_by(address_clean, county ) %>%
  summarise(num = n())



# Numbers --------------------
# 52 Walmart workplaces remain:
num_nm_covid_occurrences_20210321_walmart_unique = nrow(nm_covid_occurrences_20210321_walmart_unique)
# Number of walmarts in New Mexico
number_nm_walmarts = 54

# Result ---------------------
pct_nm_walmarts_with_covid = tibble("number_unique_nm_walmarts_with_covid" = num_nm_covid_occurrences_20210321_walmart_unique, "number_newmexico_walmarts" = number_nm_walmarts) %>% mutate("pct_nm_walmarts_with_covid" = round((number_unique_nm_walmarts_with_covid/number_newmexico_walmarts)*100,2))

output_formatted_table(pct_nm_walmarts_with_covid, "Unique New Mexico Walmart facilities with COVID-19 cases")
Unique New Mexico Walmart facilities with COVID-19 cases
number_unique_nm_walmarts_with_covid number_newmexico_walmarts pct_nm_walmarts_with_covid
52 54 96.3

FACT: At least 387 complaints after attorneys general letter [CQ | Full]

“Attorneys general in 11 states and the District of Columbia expressed concern in a letter to Walmart CEO Doug McMillon, citing reports of symptomatic employees remaining”undiagnosed due to the lack of testing capacity" and sick workers being pressured to work by store managers. There were at least 85 complaints about Walmart or Walmart-owned Sam’s Club stores and distribution centers in those states after the letter was sent, and at least 387 across the country, according to the federal OSHA COVID-19 complaints database."

Explanation

We filtered the federal database of closed OSHA complaints related to COVID-19 for complaints at Walmart facilities after June 2, 2020, the date of the letter from the attorneys general. We further filtered the data to identify Walmart complaints in the 11 states (and Washington, D.C.) represented by the attorneys general who signed the letter.

Supporting code and output

# The letter was sent June 2, 2020. Walmart complaints after June 2, 2020
walmart_complaints_after_ag_letter <- ois_closed_complaints_20210328 %>%
    filter_walmart("estab_name") %>% 
    filter(upa_receipt_date >"2020-06-02")

# Walmart complaints from states where attorneys general sent letter to Walmart
walmart_complaints_after_ag_letter_specificStates = walmart_complaints_after_ag_letter %>% 
  filter(site_state %in% c("CA", "CT", "DE", "DC", "IL", "MA", "MI", "MN", "NM", "PA", "WA", "MD"))

# Result ---------------------
num_walmart_complaints_after_ag_letter = tibble("complaints_from_states_where_AG_sent_letter" = nrow(walmart_complaints_after_ag_letter_specificStates),"complaints_from_across_country" = nrow(walmart_complaints_after_ag_letter))

output_formatted_table(num_walmart_complaints_after_ag_letter, "Walmart Complaints after Attorneys General Letter")
Walmart Complaints after Attorneys General Letter
complaints_from_states_where_AG_sent_letter complaints_from_across_country
85 387

FACT: About 20% of Utah OSHA COVID-19 complaints were referred to local health departments [CQ | Full]

“From 2020 to 2021, Utah’s state OSHA referred about 20% of people complaining about COVID-19 hazards to local health departments.”

Explanation

We calculated the percentage of Utah OSHA complaints whose outcome was “Referred Complainant to Health Dept - Outside UOSH Jurisdiction.”

Supporting code and output

utah_osha_complaints_summary = ut_osha_complaints_202003_to_202103 %>%
  # Group complaints by outcome type.
  group_by(outcome) %>%
  # Calculate number of complaints with each outcome type.
  summarise(num_referred = n()) %>%
  mutate(total_complaints = sum(num_referred)) %>% 
  # What percent of the total was each outcome type?
  mutate(percent = round(100* num_referred / total_complaints,2)) %>% 
  filter(outcome == "Referred Complainant to Health Dept - Outside UOSH Jurisdiction")

# Result ---------------------
output_formatted_table(utah_osha_complaints_summary, "Percent of Utah OSHA COVID-19 complaints referred to health departments")
Percent of Utah OSHA COVID-19 complaints referred to health departments
outcome num_referred total_complaints percent
Referred Complainant to Health Dept - Outside UOSH Jurisdiction 323 1539 20.99

-30-