This data notebook contains the analysis that generated facts in the story “As Walmart sales soared, workers got scant COVID-19 protection from OSHA” from the series “Essential and Exposed”. For each sentence in the story generated by original data analysis, we have provided the corresponding code and results.
# For general data science
library(tidyverse)
# For data cleaning
library(janitor)
# For loading Excel files
library(readxl)
# For working with datetime
library(lubridate)
# For U.S. Census Bureau data
library(tigris)
# For pretty tables
library(kableExtra)
library(knitr)
## Function to filter for Walmart facilities
# We created a function to identify all Walmart-owned facilities across various datasets. Those facilities include retail stores and distribution centers for either Walmart or Sam's Club.
filter_walmart = function(dataframe, company_name_column){
walmart_pattern = "walmart|wal mart|wal-mart|sam club|sams club|sam's club"
walmart_related = dataframe %>%
# Make all establishment names lowercase.
mutate(name = tolower(!! sym(company_name_column))) %>%
# Create a new column that detects whether establishment names include any variations on Walmart or Sam's Club.
mutate(is_walmart = str_detect(name, walmart_pattern )) %>%
# Filter for establishments that are part of the Walmart company.
filter(is_walmart == TRUE)
return(walmart_related)
}
## Function for formatted table output
output_formatted_table = function(table, text){
table %>%
kable(caption = text) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), font_size = 14, fixed_thead = T) %>%
scroll_box(width = "100%", height = "200px")
}
# Function to download inspection and violation data
## This will download latest version of inspection data. For our story, we used data through 2021-05-17.
## The code to download inspection and violation data was modified from [code released by the Center for Public Integrity](https://github.com/PublicI/osha-fatality-inspections).
download_osha_data <- function(folder, data_directory){
file_date <- str_remove_all(Sys.Date(), "-") # Set file date
data_path <- folder
data_dir <- data_directory
data_url <- "https://enfxfr.dol.gov/data_catalog/OSHA"
data_file <- paste0(data_directory, file_date, ".csv.zip")
if (!file.exists(file.path(data_path, data_dir))) {
dir.create(data_path)
download.file(paste(data_url, data_file, sep = "/"),
file.path(data_path, data_file))
unzip(file.path(data_path, data_file),
exdir = file.path(data_path, data_dir)) }
if(data_directory == "osha_inspection_"){
return(
list.files(
path = file.path(data_path, data_directory),
pattern = "*.csv",
full.names = TRUE
) %>%
map_dfr(
read_csv,
col_types = cols(
.default = "c",
open_date = col_date(),
case_mod_date = col_date(),
close_conf_date = col_date(),
close_case_date = col_date()
)
)
)
}
if(data_directory == "osha_violation_"){
return(
list.files(
path = file.path(data_path, data_directory),
pattern = "*.csv",
full.names = TRUE
) %>%
map_dfr(
read_csv,
col_types = cols(
.default = "c",
issuance_date = col_date(),
abate_date = col_date(),
load_dt = col_date(),
fta_final_order_date = col_date(),
fta_contest_date = col_date(),
final_order_date = col_date(),
contest_date = col_date(),
fta_issuance_date = col_date(),
current_penalty = col_number(),
fta_penalty = col_number(),
initial_penalty = col_number()
)
)
)
}
}
We examined closed COVID-19 complaints published by the federal Occupational Safety and Health Administration. The data includes all valid complaints received by OSHA since the start of the pandemic. Federal and state OSHA offices submit their data to the federal database. However, certain data may be unique to the state’s operating system or may not be transferred to federal OSHA. This means that, while state OSHA complaints are accessible via the federal OSHA website, totals may not always be comprehensive.
Federal OSHA’s database only reveals companies’ names when the complaints associated with them are closed. Therefore, when analyzing complaints for Walmart on a national level, analysis is based on closed complaints. Source data is available at OSHA.gov’s Covid-19 page.
OSHA updates the complaints data weekly. We used complaints data spanning various date ranges to match the date ranges of other datasets used in our analysis. Variable names involving “OIS,” which stands for OSHA Information System, indicate the federal database.
# Function to clean closed complaints
clean_ois_closed_complaints = function(database){
result_database = database %>%
# Take out the first two rows because they aren't row headings.
remove_empty() %>%
slice(-1) %>%
# Bring row headings to the first line of the data set.
row_to_names(1) %>%
# Clean column names
clean_names() %>%
# rid is the osha office number
#first two numbers are OSHA office region, if the 3rd digit is 5 it is a state_plan office, else its a federal plan office
mutate(osha_office_region = substr(rid, 1,2), office_type_number = substr(rid, 3,3), office_type = ifelse(office_type_number == "5", "state_plan_office", "federal_plan_office") ) %>%
#make a column to identify if an inspection occurred
mutate(inspection_occurred = ifelse(is.na(insp_number) == FALSE,'yes', 'no')) %>%
#convert date
mutate(upa_receipt_date = excel_numeric_to_date(as.numeric(upa_receipt_date)))
return(result_database)
}
# Read in closed complaints for 2021-03-07 --------------------
ois_closed_complaints_20210307 = read_excel("../etl/federal_osha_complaints/input_data/Closed_Federal_State_Plan_COVID-19_Complaints_Through_0307_2021.xlsx") %>%
clean_ois_closed_complaints()
# Read in closed complaints for 2021-03-21 --------------------
ois_closed_complaints_20210321 = read_excel("../etl/federal_osha_complaints/input_data/Closed_Federal_State_Plan_Valid_COVID-19_Complaints_Through_0321_2021.xlsx") %>%
clean_ois_closed_complaints()
# Read in closed federal complaints database for 2021-03-28 --------------------
ois_closed_complaints_20210328 = read_excel("../etl/federal_osha_complaints/input_data/Closed_Federal_State_Plan_Valid_COVID-19_Complaints_Through_0328_2021.xlsx") %>%
clean_ois_closed_complaints()
# Federal citations data --------------------
federal_covid_violations_20210325 = read_excel("data/federal_pipeline/COVID Case Tracker - Cases with Issued Violations as of 2021-03-25.xlsx") %>%
clean_names() %>%
# Create a match_nr from the inspection number to match with data downloaded from OSHA
mutate(match_nr = str_sub(inspection_number, 2,7)) %>%
mutate(estab_name = tolower(establishment_name))
The dataset we used consists of inspection case details for OSHA inspections conducted annually. According to the OSHA website, the data “includes information regarding the impetus for conducting the inspection, and details on citations and penalty assessments resulting from violations of OSHA standards.”
The data can be viewed on on OSHA’s enforcement data page. The code to download inspection and violation data was modified from code released by the Center for Public Integrity.
# Download inspections and violations data
## These lines are commented so as not to execute when the markdown file is knitted to produce html.
#df.inspections = download_osha_data("data/osha_inspections_violations", "osha_inspection_", file_date)
#df.violations = download_osha_data("data/osha_inspections_violations", "osha_violation_", file_date)
# This are the above datasets that were downloaded as of 2021-05-07. In order to fit on Git, the data was filtered from 2020-01-01 to 2021-05-07
df.inspections = read_csv("inspections_20200101_20210507.csv")
df.violations = read_csv("violations_20200101_20210507.csv")
## The citation sheet that OSHA provided has inspection numbers, while the enforcement data has activity numbers. The activity number is a longer number with the inspection number contained within it. In order to match them we followed this method:
## 1) Filter enforcement data for 2020 to present.
## 2) Create a unique identifier to match inspection numbers to activity numbers. Example:
## activity number = 34[452565]4
## inspection number = 1[452565]
# Use OSHA's data key to label the data
df.inspections_cleaned = df.inspections %>%
# Turn activity number into a number that we can match with the inspection number provided in federal COVID-19 citation list
mutate(match_nr = str_sub(activity_nr, 3,8)) %>%
# Break apart the RID to interpret: first two numbers are OSHA office region, if the 3rd digit is 5 it is a state_plan office, else its a federal plan office
mutate(osha_office_region = substr(reporting_id, 1,2),
office_type_number = substr(reporting_id, 3,3),
office_type = ifelse(office_type_number == "5", "state_plan_office", "federal_plan_office") ) %>%
mutate(estab_name = tolower(estab_name)) %>%
# Add year
mutate(year = year(open_date))
df.violations_cleaned = df.violations %>%
# Create a number from activity number that we can match with inspection number
mutate(match_nr = str_sub(activity_nr, 3,8)) %>%
# Create a year column from issuance date
mutate(year = year(issuance_date))
# Transform inspection number into a match number to match with inspections database
federal_covid_violations_20210325 = federal_covid_violations_20210325 %>%
mutate(match_nr = str_sub(inspection_number, 2,7)) %>%
mutate(estab_name = tolower(establishment_name))
We obtained data on COVID-19 complaints and inspections from Oregon state OSHA through a public records request. This data allowed us to see which inspections and violations in the state were related to COVID-19 — a detail that could not be determined from federal data. The state data also showed complaint descriptions, which were not included for Oregon in the federal database.
# Read in Oregon state OSHA data received through public records request
or_public_20210311 = read_excel("sources/state_foias/OR/Public Complaint Workbook -- March 22 2021 based on March 11 data.xlsx", sheet = "covid") %>%
clean_names()
or_inspections_20210316 <- read_excel("sources/state_foias/OR/Public copy - Insp and vios - COVID - updated 03162021.xlsx", sheet = "Inspections", skip = 2) %>%
clean_names() %>%
mutate(initial_penalty = as.numeric(initial_penalty))
We obtained data on COVID-19 complaints from Utah state OSHA through a public records request. The data covers March 2020 to March 2021.
# Utah OSHA state data received through a public records request -------------
ut_osha_complaints_202003_to_202103 = read_excel("sources/state_foias/UT/UOSH COVID-19 Complaints March 2020 - March 2021.xlsx", skip = 1) %>%
clean_names()
We obtained health department data tracking COVID-19 in workplaces in 10 states. States rarely disclose company-level outbreak data publicly.
In three states, health departments publish company-level COVID-19 data online.
We attempted to obtain company-level data from every remaining state through a public records request to the state health department. We were only able to obtain it in seven: Arkansas, Kansas, Maine, Rhode Island, Utah, Vermont and Washington. In most other instances, states cited exemptions in state public records laws in refusing to disclose the data or did not respond to repeated requests.
## Load company level data
# AR --------------------
ar_occurrences <- read_csv("../etl/state_health_department_outbreaks/input_data/arkansas/worker_illness_cleaned_master.csv") %>%
mutate(date=as_date(date))
# CO --------------------
# Read in active outbreaks
co_outbreaks_active_20210324 = read_excel("../etl/state_health_department_outbreaks/input_data/colorado/covid-19_ob_weekly_report_03_24_2021.xlsx",
sheet = "Active", col_types = c("text",
"text", "date", "text", "text", "text",
"text", "date", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric")) %>% clean_names()
# Read in resolved outbreaks
co_outbreaks_resolved_20210324 = read_excel("../etl/state_health_department_outbreaks/input_data/colorado/covid-19_ob_weekly_report_03_24_2021.xlsx",
sheet = "Resolved", col_types = c("text",
"text", "date", "text", "text", "text",
"text", "date", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric")) %>% clean_names()
# Combine active and resolved outbreaks
co_outbreaks_20210324 = bind_rows(co_outbreaks_active_20210324, co_outbreaks_resolved_20210324)
# Read in list of unique Walmart locations in Colorado OSHA complaints
co_walmart_complaints_addresses <- read_excel("sources/co_walmart_complaints_outbreaks_addresses.xlsx") %>%
mutate(address = str_squish(address))
# KS --------------------
ks_outbreaks_20200922_to_20210210 = read_excel("../etl/state_health_department_outbreaks/input_data/kansas/kansas_structured_20200922-20210210.xlsx",
col_types = c("text", "text", "text",
"text", "numeric", "date", "text"))%>% clean_names()
# ME --------------------
me_outbreaks_2021026 = read_excel("../etl/state_health_department_outbreaks/input_data/maine/Maine_outbreaks_structured_2021-02-26.xlsx",
col_types = c("text", "date", "date",
"text", "text")) %>% clean_names()
# NM --------------------
nm_covid_occurrences_20210321 = read_excel("../etl/state_health_department_outbreaks/input_data/new_mexico/new_mexico_rapid_responses_all_20210321.xlsx",
col_types = c("date", "text", "text",
"text", "text", "text", "text", "numeric")) %>% clean_names()
# OR --------------------
or_outbreaks_20210324 = read_csv("../etl/state_health_department_outbreaks/input_data/oregon/or_outbreaks_combined_cleaned_20200414_20210324.csv") %>% mutate(begin_date = ifelse(is.na(investigation_start_date), as.Date(first_reported), as.Date(investigation_start_date))) %>%
mutate(begin_date = as_date(begin_date))
# RI --------------------
ri_outbreaks_20210225 = read_excel("../etl/state_health_department_outbreaks/input_data/rhode_island/Rhode Island Occupational Outbreaks Data 02252021.xlsx",
skip = 2) %>% clean_names()
# UT --------------------
ut_outbreaks_20210319 = read_csv("../etl/state_health_department_outbreaks/input_data/utah/utah_outbreaks_2021_03_19.csv",
col_types = cols(`Outbreak Date` = col_date(format = "%m/%d/%Y"))) %>% clean_names()
ut_outbreaks_addresses = read_excel("sources/ut_outbreaks_addresses.xlsx") %>% clean_names()
# VT --------------------
vt_outbreaks_20210211 = read_csv("../etl/state_health_department_outbreaks/input_data/vermont/vermont_2021-02-11.csv",
col_types = cols(data_resolved = col_date(format = "%m/%d/%Y"), data_outbreak_reported = col_date(format = "%m/%d/%Y")))
# WA --------------------
# Read in data extracted from pdf. Part 1 includes first set of columns needed for the analysis.
washington_walmart_sheet_1 <- read_csv("sources/washington/tabula-covid_outbreaks_washington_Part1.csv") %>%
clean_names() %>%
filter(case_id != "CASE_ID") %>%
filter(case_id != "46 Vulnerable Ind") %>%
filter_walmart("case_name") %>%
mutate(case_name_copy = tolower(name)) %>%
mutate(case_name_copy = str_replace_all(case_name_copy,"walla walla","walla-walla")) %>%
separate(case_name_copy,sep=" ", into=c("year","county","covid","d","e","f","g","h")) %>%
mutate(d = "walmart") %>%
mutate(store_name = paste(d,e,f,g,sep=" ")) %>%
mutate(store_name = str_remove_all(store_name,"NA|mcdonald's|‐ |‐|#| night shift|night|\\(|\\)|pharmacy, |spokane| mart| walmart")) %>%
mutate(store_name = str_replace_all(store_name," C"," C")) %>%
mutate(store_name = str_replace_all(store_name," "," ")) %>%
mutate(store_name = tolower(str_trim(store_name))) %>%
mutate(store_name = str_replace(store_name,"n colton walmart","walmart colton st")) %>%
mutate(store_name = case_when(
str_detect(store_name,"colton") ~ "walmart colton st",
case_id == 101922141 ~ "walmart richland 2",
TRUE ~ store_name)) %>%
select(-(covid:h)) %>%
mutate(unique_store_name_id = "x") %>%
mutate(unique_store_name_id_notes = "x") %>%
select(case_id, case_name,county,store_name,unique_store_name_id,unique_store_name_id_notes,year,everything(),-status,-contains("cluster"),-name,-is_walmart) %>%
arrange(county,store_name)
# Read in data extracted from pdf. Part 5 includes next set of columns needed for the analysis.
washington_walmart_sheet_5 <- read_csv("sources/washington/tabula-covid_outbreaks_washington_Part5.csv") %>%
clean_names() %>%
filter(case_id != "CASE_ID")
# Join Sheets 1 and 2, filter for only needed columns
washington_walmart_join <- washington_walmart_sheet_1 %>%
left_join(washington_walmart_sheet_5,by="case_id") %>%
distinct() %>%
select(case_id:investigation_status,contains("mployees_confirmed"),contains("mployees_probable"))
# We wrote out the results of the joined data set. Through research, we manually assigned the official Walmart store number to each store referenced in the outbreak reports from Washington state.
washington_walmart_unique_identifiers <- read_csv("sources/washington/walmart_unique_identifiers.csv") %>%
select(case_id,final_unique_store_id) %>%
mutate(unique_store_name_option_a = final_unique_store_id) %>%
separate(unique_store_name_option_a, into=c("unique_store_name_option_a","unique_store_name_option_b"), sep=" OR | or ") %>%
mutate(unique_store_name_option_b = case_when(
is.na(unique_store_name_option_b) ~ unique_store_name_option_a,
TRUE ~ unique_store_name_option_b
)) %>%
mutate(across(everything(),str_trim))
# We joined the Walmart unique identifiers list with the the Washington Walmart outbreaks dataframe
washington_walmarts <- washington_walmart_join %>%
inner_join(washington_walmart_unique_identifiers, by="case_id") %>%
select(case_id, unique_store_name_option_a, unique_store_name_option_b, unique_store_options = final_unique_store_id, covid19_employees_confirmed = covid19_e_mployees_confirmed, covid19_employees_probable = covid19_e_mployees_probable) %>%
mutate(covid19_employees_confirmed = as.numeric(replace_na(covid19_employees_confirmed, 0))) %>%
mutate(covid19_employees_probable = as.numeric(replace_na(covid19_employees_probable, 0))) %>%
mutate(total_cases = covid19_employees_confirmed + covid19_employees_probable)
We obtained data on state-level, seasonally-adjusted worker population from the Bureau of Labor Statistics. We used data from December 2020.
# Load 2020 employment data
worker_data<- read_excel("data/weighted_workers/employment_stats.xlsx") %>%
clean_names() %>%
# Total employment numbers are truncated (column is listed in units of 1,000). Create a new column to show the full number
mutate(total_employment = total_employment_thousands * 1000)
region_state_key <- read_excel("data/weighted_workers/region_state_key.xlsx",
col_types = c("text", "text"))
# Load key that describes type of workplaces covered, based on https://www.osha.gov/stateplans
federal_state_plan_key <- read_excel("data/weighted_workers/federal_versus_state_key.xlsx",
col_types = c("text", "text", "text"))
state_list<- fips_codes%>%
distinct(state, state_name)
# Join list of state abbreviations to dataframe containing worker population in each state
worker_data = left_join(worker_data, state_list, by = c("state" ="state_name" ))
At least 18 employees got sick [at Aurora Walmart, store number: #1492].
We filtered the Colorado health department COVID-19 outbreak data for the Aurora Walmart, store number: #1492. The “total_staff_cases” field shows 18 employees fell ill.
co_aurora_walmart_outbreaks_20210324 = filter_walmart(co_outbreaks_20210324, "setting_name") %>%
mutate(setting_name = tolower(setting_name)) %>%
# Get rid of :month that appears
mutate(setting_name_clean = sub(":.*", "", setting_name)) %>%
# Get store number
mutate(store_number = str_extract(setting_name_clean, "#\\S+") ) %>%
# Filter for store number
filter(store_number == "#1492") %>%
select(setting_name, date_reported_to_public_health,total_staff_cases)
output_formatted_table(co_aurora_walmart_outbreaks_20210324, "Aurora Walmart Outbreak, 18 employees")
setting_name | date_reported_to_public_health | total_staff_cases |
---|---|---|
walmart #1492 | 2020-04-23 | 18 |
“The Walmart where Kunz worked was one of at least 151 Walmart facilities in 10 states with available data where multiple COVID-19 illnesses were recorded, a reporting consortium led by the Howard Center for Investigative Journalism at the University of Maryland, found.”
Full: “That’s one quarter of the company’s stores and distribution centers in those states.”
AP: “On average, one quarter of the company’s stores and distribution centers in those states were affected.”
To produce this finding, we identified each unique Walmart facility where state health department data showed multiple COVID-19 cases among employees (in 10 states with available data). We compared that to the list and count of Walmart facilities in each state. For any locations where the exact identity of a store was ambiguous, we chose the grouping method that led to the lowest number of unique stores affected by COVID-19, and included that minimum number in our overall total.
# Filter out the walmart companies from each outbreak dataset
ar_occurrences_walmart = filter_walmart(ar_occurrences, "company_name")
co_outbreaks_20210324_walmart = filter_walmart(co_outbreaks_20210324, "setting_name")
ks_outbreaks_20200922_to_20210210_walmart = filter_walmart(ks_outbreaks_20200922_to_20210210, "facility")
me_outbreaks_2021026_walmart = filter_walmart(me_outbreaks_2021026, "name")
nm_covid_occurrences_20210321_walmart = filter_walmart(nm_covid_occurrences_20210321, "name")
or_outbreaks_20210324_walmart = filter_walmart(or_outbreaks_20210324, "facility_name")
ri_outbreaks_20210225_walmart = filter_walmart(ri_outbreaks_20210225, "employer")
ut_outbreaks_20210319_walmart = filter_walmart(ut_outbreaks_20210319, "company_name")
vt_outbreaks_20210211_walmart = filter_walmart(vt_outbreaks_20210211, "estab_name")
## Get number of unique Walmarts from each outbreak dataset
# AR --------------------
# Group by company name and city to get unique number of Walmarts
ar_occurrences_walmart_unique = ar_occurrences_walmart %>%
group_by(company_name, city) %>%
summarise(num = n())
num_ar_occurrences_walmart_unique = nrow(ar_occurrences_walmart_unique)
# Colorado --------------------
# Clean up Walmart outbreak names
# Data contains total staff cases
co_outbreaks_20210324_walmart_clean = co_outbreaks_20210324_walmart %>%
mutate(setting_name = tolower(setting_name)) %>%
# Get rid of :month that appears
mutate(setting_name_clean = sub(":.*", "", setting_name)) %>%
# Get store number
mutate(store_number = str_extract(setting_name_clean, "#\\S+") ) %>%
# Adding store number for Walmart Pharmacy in Delta
mutate(store_number = case_when(
setting_name_clean == "walmart pharmacy" & colorado_county_exposure_location == "Delta" ~ "#5458",
TRUE ~ store_number))
# Get unique Walmart outbreaks
co_outbreaks_20210324_walmart_unique = co_outbreaks_20210324_walmart_clean %>%
group_by(store_number,colorado_county_exposure_location ) %>%
summarise(num = n())
# If the distribution centers are two different distribution centers, than the actual count is one higher
num_co_outbreaks_20210324_walmart_unique = nrow(co_outbreaks_20210324_walmart_unique)
# KS --------------------
# https://www.coronavirus.kdheks.gov/DocumentCenter/View/1489/Policy-FAQs
# Kansas releases data "locations associated with cases." Data does not include specific information about employees.
# Only one case that shows up in Kansas outbreak data
ks_outbreaks_20200922_to_20210210_walmart_unique = ks_outbreaks_20200922_to_20210210_walmart %>%
group_by(facility, city_or_state) %>%
summarise(num = n())
num_ks_outbreaks_20200922_to_20210210_walmart_unique = nrow(ks_outbreaks_20200922_to_20210210_walmart_unique)
# ME --------------------
# Maine releases "COVID-19 outbreak investigations associated with worksites in Maine"
# Does not include specific informationa about employees
me_outbreaks_2021026_walmart_unique = me_outbreaks_2021026_walmart %>%
group_by(name) %>% summarise(num = n())
num_me_outbreaks_2021026_unique_walmart = nrow(me_outbreaks_2021026_walmart_unique)
# NM --------------------
# "represents positive cases in the workplace"
# https://www.env.nm.gov/rapid-response-data/
# Made sure addresses did not have misspellings, and then grouped on addresses to get unique Walmart locations
nm_covid_occurrences_20210321_walmart_unique = nm_covid_occurrences_20210321_walmart %>%
mutate(address_clean = case_when(
# This is a fuel center, part of a Sam's Club that's already in the dataset.
address == "4400 N MAIN ST # A" ~ "4400 N MAIN ST",
# This is an auto care center, part of a Walmart that's already in the dataset.
address == "460 NEW MEXICO HIGHWAY 528" ~ "460 HIGHWAY 528",
# The Walmart in "Eunice" NM, is actually the one in Hobbs.
address == "3800 LOVINGTON HWY" ~ "3800 N LOVINGTON HWY",
TRUE ~ address))%>%
group_by(address_clean, county ) %>%
summarise(num = n())
num_nm_covid_occurrences_20210321_walmart_unique = nrow(nm_covid_occurrences_20210321_walmart_unique)
# OR --------------------
# "OHA is publishing data on cases in workplace outbreaks"
# https://www.oregon.gov/oha/covid19/Documents/DataReports/COVID-19-Weekly-Outbreak-Report-2021-1-21-FINAL.pdf
or_outbreaks_20210324_walmart_unique = or_outbreaks_20210324_walmart %>%
# Get rid of repeating white spaces
mutate(address = str_squish(address)) %>%
group_by(address) %>%
summarise(num = n())
num_or_outbreaks_20210324_walmart_unique = nrow(or_outbreaks_20210324_walmart_unique)
# RI --------------------
# Defined as "workplace outbreaks"
# The data provided by the Rhode Island Department of Health was aggregated by company. For Walmart it indicated that there were multiple locations so we assumed at least two locations. There are nine Walmart facilities in the state.
num_ri_outbreaks_20210225_walmart_unique = nrow(ri_outbreaks_20210225_walmart)+1
# UT --------------------
# Defined as "Per CSTE, an outbreak is defined as two* or more laboratory-confirmed COVID-19 cases among workers at a facility with onset of illness within a 14-day period"
ut_outbreaks_20210319_walmart_cleaned = ut_outbreaks_20210319_walmart %>%
# Clean names to make it the same
mutate(company_name_clean = case_when(
# Solve whitespace problem
company_name == "Walmart 900E" ~ "Walmart 900 E",
TRUE ~ company_name
)) %>%
# Join to addresses from Walmart's website
left_join(ut_outbreaks_addresses, by = c("company_name_clean" = "name"))
ut_outbreaks_20210319_walmart_unique = ut_outbreaks_20210319_walmart_cleaned %>%
group_by(company_name_clean, address) %>%
summarise(num = n()) %>%
# There is no identifier for the west valley city (in total there are three WVC locations and the other two in our outbreak data are identified)
filter(company_name_clean != "Walmart WVC")
num_ut_outbreaks_20210319_walmart_unique = nrow(ut_outbreaks_20210319_walmart_unique)
# VT --------------------
# Definition doesn't matter, no walmarts
num_vt_outbreaks_20210211_walmart_unique = nrow(vt_outbreaks_20210211_walmart)
# WA --------------------
# Includes field with staff cases
# Do we want to filter out NA or 0 cases
num_wa_outbreaks_walmart_unique <- washington_walmarts %>%
distinct(unique_store_name_option_a) %>%
nrow()
## Calculations --------------------
# Get all unique walmart numbers from outbreak data
unique_walmarts_with_covid_cases = sum(num_co_outbreaks_20210324_walmart_unique,
num_me_outbreaks_2021026_unique_walmart,
num_nm_covid_occurrences_20210321_walmart_unique,
num_or_outbreaks_20210324_walmart_unique,
num_ks_outbreaks_20200922_to_20210210_walmart_unique,
num_ar_occurrences_walmart_unique,
num_ri_outbreaks_20210225_walmart_unique,
num_ut_outbreaks_20210319_walmart_unique,
num_vt_outbreaks_20210211_walmart_unique,
num_wa_outbreaks_walmart_unique)
# Get number of Walmart owned stores and distribution centers in each state as of 2021-05-03
# https://corporate.walmart.com/our-story/locations/united-states?multi=false
ar_walmarts = 141
co_walmarts = 108
ks_walmarts = 84
me_walmarts = 26
nm_walmarts = 54
or_walmarts = 46
ri_walmarts = 9
ut_walmarts = 62
vt_walmarts = 6
wa_walmarts = 68
total_walmarts = sum(ar_walmarts,
co_walmarts,
ks_walmarts,
me_walmarts,
nm_walmarts,
or_walmarts,
ri_walmarts,
ut_walmarts,
vt_walmarts,
wa_walmarts
)
# States where we received company level reports of COVID-19 in the workplace
ls_company_level_outbreak_data = list(ri_outbreaks_20210225,vt_outbreaks_20210211, ut_outbreaks_20210319, or_outbreaks_20210324, me_outbreaks_2021026, ks_outbreaks_20200922_to_20210210, co_outbreaks_20210324, nm_covid_occurrences_20210321, ar_occurrences, washington_walmarts)
average_percent_walmarts_affected_by_covid = mean(c((num_ar_occurrences_walmart_unique/ar_walmarts),
(num_ks_outbreaks_20200922_to_20210210_walmart_unique/ks_walmarts),
(num_co_outbreaks_20210324_walmart_unique/co_walmarts),
(num_me_outbreaks_2021026_unique_walmart/me_walmarts),
(num_nm_covid_occurrences_20210321_walmart_unique/nm_walmarts),
(num_or_outbreaks_20210324_walmart_unique/or_walmarts),
(num_ri_outbreaks_20210225_walmart_unique/ri_walmarts),
(num_ut_outbreaks_20210319_walmart_unique/ut_walmarts),
(num_vt_outbreaks_20210211_walmart_unique/vt_walmarts),
(num_wa_outbreaks_walmart_unique/wa_walmarts)))*100
table_average_percent_walmarts_affected_by_covid = tibble(avg_pct_walmarts_with_covid_cases = round(average_percent_walmarts_affected_by_covid,2))
# Results --------------------
pct_walmarts_with_covid_cases = tibble( number_of_states = length(ls_company_level_outbreak_data), 'unique_walmarts_with_covid_cases' = unique_walmarts_with_covid_cases, "total_walmarts" = total_walmarts) %>%
mutate(pct_walmarts_with_covid_cases = (unique_walmarts_with_covid_cases / total_walmarts)*100)
output_formatted_table(pct_walmarts_with_covid_cases, "Across 10 states, one quarter of 151 Walmart-owned facilities had COVID-19 cases")
number_of_states | unique_walmarts_with_covid_cases | total_walmarts | pct_walmarts_with_covid_cases |
---|---|---|---|
10 | 151 | 604 | 25 |
output_formatted_table(table_average_percent_walmarts_affected_by_covid, "On average, one quarter of the company's stores and distribution centers in those states were affected")
avg_pct_walmarts_with_covid_cases |
---|
25.17 |
“When workers try to sound the alarm to OSHA by submitting COVID-related complaints, only a fraction lead to inspections, and even fewer result in a citation. As of late March, 3% of closed COVID-19 complaints to federal OSHA offices deemed valid by the agency resulted in an inspection, 12.5% of which led to citations with an average penalty of $13,000. OSHA has reduced over a third of penalties. For Walmart, slightly fewer complaints resulted in inspections — 2.6%. No inspections led to a citation.”
To produce these findings, we started the list of closed valid COVID-19 complaints released by OSHA as of March 21. To find complaints that led to inspections, we filtered the data set for only those complaints with a corresponding inspection number. To discover inspections that led to citations, we used a list OSHA provided to us. To obtain current penalty information, we pulled data from OSHA’s enforcement data site. A data visualization embedded in the story also makes use of these findings.
## Pipeline for all valid closed federal complaints
# Get all inspections from 2020 to present
df.inspections_2020topresent = df.inspections_cleaned %>% filter(year >= 2020)
# Get all violations from 2020 to present
df.violations_2020topresent = df.violations_cleaned %>% filter(year >= 2020)
# Get unique match_nr's from the federal violations sheet
unique_inspection_numbers_federal_violations = unique(federal_covid_violations_20210325$match_nr)
# Filter the inspections for all of the inspections listed in the federal covid database. This will give us information on what triggered the inspection
detailed_inspections_with_covid_violation = df.inspections_2020topresent %>% filter(match_nr %in% unique_inspection_numbers_federal_violations)
# Get a list of the unique activity numbers so we can match up to violations database
unique_activity_numbers_inspection_with_covid_violations = unique(detailed_inspections_with_covid_violation$activity_nr)
# Get all violation info for the inspection numbers
detailed_violations_covid = df.violations_2020topresent %>% filter(activity_nr %in% unique_activity_numbers_inspection_with_covid_violations)
# Grouped all the violations together to get their current penalty and initial penalty resulting from the inspection activity number
summary_detailed_violations_covid = detailed_violations_covid %>%
group_by(activity_nr) %>%
summarise(total_initial = sum(initial_penalty), total_current = sum(current_penalty))
# Join the initial and current money info for the inspections back to the detailed_inspections_with_covid_violation
final_detailed_inspections_with_covid_violations = left_join(detailed_inspections_with_covid_violation, summary_detailed_violations_covid, by = "activity_nr") %>%
# Select columns that we need
select(activity_nr, reporting_id, estab_name, site_address, open_date, case_mod_date, close_conf_date, close_case_date, match_nr, office_type, year, total_initial, total_current)
# Combine the final detailed sheet (this includes the initial penalty, current penalty, what triggered the inspection)
complete_federal_covid_violations_20210325 = left_join(federal_covid_violations_20210325,final_detailed_inspections_with_covid_violations, by = "match_nr") %>%
# Checking to make sure that initial penalties from the downloaded data match the excel sheet provided by osha
mutate(check_initial_match = ifelse(total_initial_penalty == total_initial, "yes", "no")) %>%
# Calculate the change from initial penalty to current penalty
mutate(penalty_change = total_initial - total_current)
## Prepare dataframe of federal closed complaints --------------------
# Creates a dataframe of closed federal complaints as of March 21, 2021
federal_closed_ois_complaints_20210321 = ois_closed_complaints_20210321 %>%
# Choose federal plan offices
filter(office_type == "federal_plan_office")
# --------------------
valid_closed_covid_complaints_to_inspections = federal_closed_ois_complaints_20210321 %>%
group_by(inspection_occurred) %>%
summarise(number_complaints = n()) %>%
pivot_wider(names_from = inspection_occurred, values_from = number_complaints) %>%
rename("complaint_with_inspection" ="yes", "complaint_with_no_inspection" = "no" ) %>%
mutate(total_complaints = complaint_with_inspection + complaint_with_no_inspection) %>%
mutate(pct_complaints_to_inspections = round(complaint_with_inspection / (total_complaints) * 100,2))
# --------------------
# Create a dataframe of unique inspection numbers.
federal_closed_ois_complaints_20210321_insp_numbers = federal_closed_ois_complaints_20210321 %>%
filter(inspection_occurred == "yes") %>%
select(insp_number) %>%
separate_rows(insp_number) %>%
unique() %>%
tibble() %>%
mutate(insp_number = as.numeric(insp_number))
# Only keep the inspection numbers from the citation sheet that OSHA sent, that appear in the federal closed citation data
federal_closed_complaints_inspection_resulting_in_violation = complete_federal_covid_violations_20210325 %>%
filter(inspection_number %in% federal_closed_ois_complaints_20210321_insp_numbers$insp_number)
# Number of inspections related to complaints that resulted in a citation
num_closed_federal_complaints_inspections_cited = nrow(federal_closed_complaints_inspection_resulting_in_violation)
average_initial_penalty = tibble( average_initial_penalty = mean(federal_closed_complaints_inspection_resulting_in_violation$total_initial_penalty))
# --------------------
inspection_to_citation_percent = federal_closed_ois_complaints_20210321 %>%
group_by(inspection_occurred) %>%
summarise(number_complaints_with_inspections = n()) %>%
filter(inspection_occurred == "yes") %>%
add_column("resulting_citations" = num_closed_federal_complaints_inspections_cited) %>%
mutate(percent_inspections_resulting_in_violations = round((resulting_citations/number_complaints_with_inspections)*100,2)) %>%
select(-inspection_occurred)
# --------------------
# Get percent of penalties that have citations have been reduced
reduced_penalties = federal_closed_complaints_inspection_resulting_in_violation %>%
mutate(penalty_reduced = case_when(
penalty_change > 0 ~ "reduced_penalty",
TRUE ~ "no_change"
)) %>%
group_by(penalty_reduced) %>%
summarise(num = n()) %>%
pivot_wider(names_from = penalty_reduced, values_from = num) %>%
mutate(total = reduced_penalty + no_change) %>%
mutate(pct_penalties_reduced = round(reduced_penalty / (total) *100,2))
# Results ---------------------
output_formatted_table(valid_closed_covid_complaints_to_inspections, "3% of closed COVID-19 complaints to federal OSHA offices deemed valid by the agency resulted in an inspection")
complaint_with_no_inspection | complaint_with_inspection | total_complaints | pct_complaints_to_inspections |
---|---|---|---|
12339 | 376 | 12715 | 2.96 |
output_formatted_table(inspection_to_citation_percent,"12.5% of which led to citations" )
number_complaints_with_inspections | resulting_citations | percent_inspections_resulting_in_violations |
---|---|---|
376 | 47 | 12.5 |
output_formatted_table(average_initial_penalty, "with an average penalty of $13,000")
average_initial_penalty |
---|
13065.79 |
output_formatted_table(reduced_penalties, "OSHA has reduced over a third of penalties")
no_change | reduced_penalty | total | pct_penalties_reduced |
---|---|---|---|
26 | 21 | 47 | 44.68 |
## Pipeline for all valid closed federal complaints regarding Walmart
# Calculate percent of Walmart complaints that had a related inspection
valid_closed_walmart_covid_complaints_to_inspections = federal_closed_ois_complaints_20210321 %>%
filter_walmart("estab_name") %>%
group_by(inspection_occurred) %>%
summarise(number_complaints = n()) %>%
pivot_wider(names_from = inspection_occurred, values_from = number_complaints) %>%
rename("complaint_resulted_in_inspection" ="yes", "no_inspection_occurred" = "no" ) %>%
mutate(pct_complaints_to_inspections = round(complaint_resulted_in_inspection / (complaint_resulted_in_inspection + no_inspection_occurred) * 100,2))
# ---------------------
# Create a dataframe of inspection numbers related to closed federal Walmart complaints
federal_closed_ois_walmart_complaints_20210321_insp_numbers = federal_closed_ois_complaints_20210321 %>%
filter_walmart("estab_name") %>%
filter(inspection_occurred == "yes") %>%
select(insp_number) %>%
separate_rows(insp_number) %>%
unique() %>%
tibble() %>%
mutate(insp_number = as.numeric(insp_number))
# Filter for inspection numbers resulting from Walmart COVID-19 complaints in the violation database
federal_closed_walmart_complaints_inspected_cited = complete_federal_covid_violations_20210325 %>%
# Filter the inspection numbers from the violation sheet for the inspection numbers tied with closed walmart complaints
filter(inspection_number %in% federal_closed_ois_walmart_complaints_20210321_insp_numbers$insp_number)
num_federal_closed_walmart_complaints_inspections_cited = nrow(federal_closed_walmart_complaints_inspected_cited)
# --------------------
# Calculate percentage of Walmart complaints whose inspections resulted in citation
walmart_inspection_to_citation_percent = federal_closed_ois_complaints_20210321 %>%
filter_walmart("estab_name") %>%
group_by(inspection_occurred) %>%
summarise(number_complaints_with_inspections = n()) %>%
filter(inspection_occurred == "yes") %>%
add_column("resulting_citations" = num_federal_closed_walmart_complaints_inspections_cited) %>%
mutate(percent_inspections_resulting_in_violations = round((resulting_citations/number_complaints_with_inspections)*100,2)) %>%
select(-inspection_occurred)
# Results --------------------
output_formatted_table(valid_closed_walmart_covid_complaints_to_inspections, "For Walmart, slightly fewer complaints resulted in inspections — 2.6%.")
no_inspection_occurred | complaint_resulted_in_inspection | pct_complaints_to_inspections |
---|---|---|
113 | 3 | 2.59 |
output_formatted_table(walmart_inspection_to_citation_percent, "No inspections led to a citation.")
number_complaints_with_inspections | resulting_citations | percent_inspections_resulting_in_violations |
---|---|---|
3 | 0 | 0 |
“In Grants Pass, Oregon, Walmart workers and customers filed more than two dozen complaints about unsafe conditions in 2020 with the state worker-safety agency. The store allowed customers to enter without masks and allowed customers to remove masks once inside, a complaint from September read. ‘Management tells us to say nothing and to not even bother … management with it,’ the complaint said.”
Using complaint data provided by Oregon OSHA, we filtered for complaints about the Grants Pass Walmart on Terry Lane for September 2020. The data contains one record per “hazard” per complaint. To get unique complaints, we filtered for rows that have an item number of 1.
num_grants_pass_walmart_2020_complaints = filter_walmart(or_public_20210311, "employer_name")%>%
# Lowercase all the words in the address column to better filter it
mutate(site_location = tolower(site_location))%>%
# Look for Grants Pass, Terry Lane location complaints
filter(str_detect(site_location, "terry"))%>%
# Find complaints in 2020
filter(intake_date >= as_date("2020-01-01") & intake_date <= as_date("2020-12-31")) %>%
# The item number indicates a given hazard within a single complaint.
# To get unique complaints, filter for just item counts that are 1
filter(item_count == 1) %>%
nrow()
# Create dataframe
grants_pass_walmart_2020_complaints = tibble("num_grants_pass_walmart_2020_complaints" = num_grants_pass_walmart_2020_complaints)
# Results --------------------
output_formatted_table(grants_pass_walmart_2020_complaints, "More than 2 dozen COVID-19 complaints at the Terry Lane Walmart in Grants Pass")
num_grants_pass_walmart_2020_complaints |
---|
25 |
september_complaint_check<-filter_walmart(or_public_20210311, "employer_name")%>%
# Lowercase all the words in the address column to better filter it
mutate(site_location = tolower(site_location))%>%
# Look for Grants Pass, Terry Lane location complaints
filter(str_detect(site_location, "terry"))%>%
# Find complaints in September
filter(intake_date > as_date("2020-09-01") & intake_date < as_date("2020-09-30")) %>%
# Filter for specific complaint
filter(description == "Still allowing customers to enter without masks, and allowing customers who enter to remove masks once inside. Management tells us to say nothing and to not even bother them (management with it).")
# Results --------------------
output_formatted_table(september_complaint_check, "Quoted Complaint from the Terry Lane Walmart in Grants Pass, September 2020")
complaint_number | intake_date | complaint_year | code_ois_naics_vv | business_type_desc | mbusiness | employer_name | mname | employer_dba | site_location | site_city | site_state | site_zip | county_name | item_count | description | hazard_location | confidential | anonymous | complainant_type | name | is_walmart |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NA | 2020-09-22 | 0 | 452910 | supercenter | Grocery Store | Wal-Mart Associates Inc | Walmart | NA | 135 ne terry ln | Grants Pass | OR | 97526 | JOSEPHINE | 1 | Still allowing customers to enter without masks, and allowing customers who enter to remove masks once inside. Management tells us to say nothing and to not even bother them (management with it). | NA | Not confidential | Anonymous | NA | wal-mart associates inc | TRUE |
“Between December and March, at least 18 people were infected in an outbreak linked to that Walmart. In January, while the outbreak was still active, another OSHA complaint alleged Walmart ‘may not be informing employees when in contact with a positive case for COVID-19 in the workplace.’”
Using the complaint data provided by Oregon OSHA, we filtered for complaints about the Grants Pass Walmart on Terry Lane for January 2021.
# Create an object that feeds in Oregon Walmart outbreak information
terry_lane_fact_check <-or_outbreaks_20210324_walmart %>%
# Lowercase all the addresses in the "address" column
mutate(address = tolower(address))%>%
# Return only rows that contain Terry Lane in them
filter(str_detect(address, "terry")) %>%
select(-X1)
# Results --------------------
output_formatted_table(terry_lane_fact_check, "At least 18 COVID-19 cases")
facility_name | county | first_reported | total_cases | total_deaths | resolved_status | category | address | investigation_start_date | most_recent_onset | students | staff_volunteer | report_name | report_dates | facility_name_squished | address_squished | begin_date | name | is_walmart |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Walmart | Josephine | NA | 18 | NA | active | workplace | 135 ne terry ln, grants pass, or 97526 | 2020-12-27 | 2021-02-14 | NA | NA | COVID-19-Weekly-Outbreak-Report-2021-2-24-FINAL.csv | 2021-2-24 | walmart | 135neterryln,grantspass,or97526 | 2020-12-27 | walmart | TRUE |
# You can see, an outbreak in Grants Pass, Terry Ln. Walmart; 18 total related cases, investigation start date in December. Even though the last report date says February, our data goes through March 24th.
# Create object to check January complaints, and feed in Walmart complaints from Oregon OSHA complaints by using Walmart filter
january_complaint_check<- filter_walmart(or_public_20210311, "employer_name")%>%
# Lowercase all the words in the address column to better filter it
mutate(site_location = tolower(site_location))%>%
# Look for Grants Pass, Terry lane location complaints
filter(str_detect(site_location, "terry"))%>%
# Find complaints in January
filter(intake_date > as_date("2020-12-31") & intake_date < as_date("2021-02-01"))
# Results --------------------
output_formatted_table(january_complaint_check, "Quoted Complaint from the Terry Lane Walmart in Grants Pass, January 2021")
complaint_number | intake_date | complaint_year | code_ois_naics_vv | business_type_desc | mbusiness | employer_name | mname | employer_dba | site_location | site_city | site_state | site_zip | county_name | item_count | description | hazard_location | confidential | anonymous | complainant_type | name | is_walmart |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
209449896 | 2021-01-13 | 2021 | 452112 | store | Grocery Store | Wal-Mart Associates Inc | Walmart | Wal-Mart Associates Inc | 135 ne terry ln | Grants Pass | OR | 97526 | JOSEPHINE | 1 | Employer may not be informing employees when in contact with a positive case for COVID 19 in the workplace. Employer may not be effectively performing contract tracing within the establishment to make notice to employees who may have be in contact with a positive COVID 19 case. Employer may not be taking the steps to effectively sanitize work stations between employees. | 135 NE Terry Ln Grants Pass, OR 97526; Bakery department, asset protection office, and pick up | Confidential | Not anonymous | Withheld | wal-mart associates inc | TRUE |
"In Colorado, approximately 98% of all workplaces with reported COVID-19 outbreaks did not appear on OSHA’s list as of March 2021.
We used a fuzzy matching library to match company names in Colorado outbreak data to company names in closed valid OSHA complaint data for Colorado. We then manually verified matches and non-matches to create a list of companies that had outbreaks that also had closed valid OSHA complaints and companies that did not. We also created a list of unique company locations in the outbreak data using OpenRefine. The fuzzy matching was executed in a separate python script file.One caveat: closed Colorado OSHA complaints from March 7, 2021 were compared to Colorado outbreak data from March 24, 2021. Those file versions were the two most recent available when we did our matching analysis.
# Read in Colorado unique locations that had outbreaks
co_unique_locations_outbreaks<- read_csv("data/state_foia/co/co_dedup_after_openrefine_FINAL.csv") %>% clean_names()
# Clean column names
co_unique_locations_outbreaks = co_unique_locations_outbreaks %>%
# Count locations
group_by(setting_name, colorado_county_exposure_location)%>%
summarise(locations_with_an_outbreak = n())
# Sum total number of unique locations with outbreaks
unique_colorado_outbreak_locations = sum(co_unique_locations_outbreaks$locations_with_an_outbreak)
# Read in complaints matching data
colorado_outbreak_complaints_matches<-read_csv("data/state_foia/co/hand_verified_colorado_complaints_outbreaks_final.csv")
colorado_outbreak_complaints_matches = colorado_outbreak_complaints_matches %>%
clean_names()%>%
# Filter to return rows where humans identified a match between a complaint and an outbreak
filter(human_is_it_a_match_y_n_m == "y")%>%
# Count the number of matches
group_by(human_is_it_a_match_y_n_m)%>%
summarise(number_of_matches = n())
# Add a column containing the total number of unique locations with outbreaks
colorado_outbreak_complaints_matches<- colorado_outbreak_complaints_matches %>%
add_column(num_unique_outbreak_locations= unique_colorado_outbreak_locations)%>%
select(number_of_matches, num_unique_outbreak_locations)%>%
# Get percentage of outbreaks with no matching complaint
mutate(percent_outbreaks_with_no_matches = round(100 - ((number_of_matches/num_unique_outbreak_locations)*100),2))
# Results --------------------
output_formatted_table(colorado_outbreak_complaints_matches, "98% of workplaces with outbreaks had no COVID-19 complaints")
number_of_matches | num_unique_outbreak_locations | percent_outbreaks_with_no_matches |
---|---|---|
68 | 3682 | 98.15 |
“One-third of Walmart facilities in the state had recorded outbreaks, but none of those 37 facilities appeared on OSHA’s list as of March 2021 – including two stores where a total of three employees died.”
We filtered Colorado outbreak data for all unique Walmart facilities, and compared that number to total Walmart facilities in the state. The total number of Walmart-owned facilities in Colorado was extracted from Walmart’s corporate site. There are two Walmart distribution centers and 106 retail units. We also filtered closed valid OSHA complaints for Colorado Walmart facilities and checked for the existence of those facilities in Colorado outbreak data.
# Clean up Walmart outbreak names
co_outbreaks_20210324_walmart_clean = co_outbreaks_20210324_walmart %>%
mutate(setting_name = tolower(setting_name)) %>%
# Get rid of :month that appears
mutate(setting_name_clean = sub(":.*", "", setting_name))
co_outbreaks_20210324_walmart_unique = co_outbreaks_20210324_walmart_clean %>%
# Get store number
mutate(store_number = str_extract(setting_name_clean, "#\\S+") ) %>%
# Add store number to Walmart pharmacy in Delta Walmart
mutate(store_number = case_when(
# Add store number to Walmart pharmacy in Delta Walmart
setting_name_clean == "walmart pharmacy" & colorado_county_exposure_location == "Delta" ~ "#5458",
TRUE ~ store_number
)) %>%
group_by(setting_name_clean, store_number) %>%
summarise(num = n())
# Numbers --------------------
number_unique_colorado_walmart_outbreaks = nrow(co_outbreaks_20210324_walmart_unique)
number_colorado_walmarts = 108
pct_co_walmart_outbreaks = tibble('number_unique_colorado_walmart_outbreaks' = number_unique_colorado_walmart_outbreaks, "number_colorado_walmarts" = number_colorado_walmarts ) %>%
mutate(pct_co_walmart_outbreaks = round((number_unique_colorado_walmart_outbreaks / number_colorado_walmarts)*100,2))
# Result --------------------
output_formatted_table(pct_co_walmart_outbreaks, "One-third of Colorado Walmart locations with outbreaks" )
number_unique_colorado_walmart_outbreaks | number_colorado_walmarts | pct_co_walmart_outbreaks |
---|---|---|
37 | 108 | 34.26 |
co_ois_complaints_20210328_walmart = ois_closed_complaints_20210328 %>%
filter(site_state == "CO") %>%
filter_walmart("estab_name") %>%
mutate(site_address_1 = str_squish(site_address_1)) %>%
# Left join to store numbers that we got from looking up address
left_join(co_walmart_complaints_addresses, by = c("site_address_1" = "address"))
# Answer how many of the Walmarts that had outbreaks have had a closed complaint
co_outbreaks_with_complaints = semi_join(co_outbreaks_20210324_walmart_unique, co_ois_complaints_20210328_walmart, by = "store_number")
num_co_outbreaks_with_complaints = nrow(co_outbreaks_with_complaints)
number_walmarts_with_outbreaks_that_had_osha_complaints = tibble("number_walmarts_with_outbreaks_that_had_osha_complaints" = num_co_outbreaks_with_complaints)
# Result --------------------
output_formatted_table(number_walmarts_with_outbreaks_that_had_osha_complaints, "Colorado Walmarts with outbreaks had no COVID-19 complaints" )
number_walmarts_with_outbreaks_that_had_osha_complaints |
---|
0 |
# Show that the stores with deaths did not have an OSHA complaint
co_outbreaks_20210324_walmart_deaths = co_outbreaks_20210324_walmart %>%
filter(total_staff_deaths > 0) %>%
mutate(store_number = str_extract(name, "#\\S+") ) %>%
select(setting_name,colorado_county_exposure_location,date_outbreak_was_considered_closed,total_staff_deaths)
total_co_walmart_deaths = sum(co_outbreaks_20210324_walmart_deaths$total_staff_deaths)
co_walmart_deaths_that_had_a_complaint = co_ois_complaints_20210328_walmart %>% filter(store_number %in% co_outbreaks_20210324_walmart_deaths$store_number)
num_co_walmart_stores_with_deaths_that_had_a_complaint = tibble(num_co_walmart_stores_with_deaths_that_had_a_complaint = nrow(co_walmart_deaths_that_had_a_complaint), total_co_walmart_deaths = total_co_walmart_deaths )
# Result --------------------
output_formatted_table(co_outbreaks_20210324_walmart_deaths, "Two Walmarts where a total of three employees died")
setting_name | colorado_county_exposure_location | date_outbreak_was_considered_closed | total_staff_deaths |
---|---|---|---|
Walmart #1434 | El Paso | NA | 1 |
Walmart #1492 | Arapahoe | 2020-06-09 | 2 |
output_formatted_table(num_co_walmart_stores_with_deaths_that_had_a_complaint, "The two Walmarts with deaths also had no COVID-19 complaints")
num_co_walmart_stores_with_deaths_that_had_a_complaint | total_co_walmart_deaths |
---|---|
0 | 3 |
“In states where state agencies oversee most workplaces, the rate of complaints was five times higher than in states where the federal government exclusively oversees workplace safety.”
We compared the number of valid closed complaints in state plan states to the number of valid closed complaints in states covered by federal OSHA. We did not include partial plan states in our analysis. We used total employment figures from the Bureau of Labor Statistics to weight the comparison.
complaints_by_plan = ois_closed_complaints_20210328 %>%
# Join dataframe with a table that describes the OSHA plan type for each state
left_join(federal_state_plan_key, by = c("site_state" = "state")) %>%
# Exclude partial plan states
filter(coverage != "state-local") %>%
# Group and count by state key.
group_by(site_state, plan_type) %>%
count() %>%
inner_join(worker_data, by = c("site_state" = "state.y")) %>%
#filtering out PR because it is not a state
filter(site_state != "PR") %>%
group_by(plan_type) %>%
summarise(total_complaints = sum(n), total_employees = sum(total_employment)) %>%
mutate(complaints_per_10000_workers = (total_complaints/total_employees)*10000)
complaints_by_plan_output = complaints_by_plan %>%
mutate(complaints_per_10000_workers = round(complaints_per_10000_workers,2))
complaints_by_plan_result = complaints_by_plan %>%
# Select necessary columns
select(plan_type, complaints_per_10000_workers) %>%
# Pivot wider
pivot_wider(names_from = plan_type, values_from = complaints_per_10000_workers) %>%
# Rename column names
rename(federal_complaint_per_10K_workers = federal_plan, state_complaint_per_10K_workers = state_plan) %>%
# Calculate how many times higher rate of complaints in states where state agencies oversee most workplaces versus where the federal government exclusively oversees workplace safety
mutate(ratio = state_complaint_per_10K_workers/federal_complaint_per_10K_workers) %>%
mutate(across(everything(),function(x) round(x,2))) %>%
mutate(ratio = paste0("1:",round(ratio,0)))
# Result --------------------
output_formatted_table(complaints_by_plan_output, "Complaint rates")
plan_type | total_complaints | total_employees | complaints_per_10000_workers |
---|---|---|---|
federal_plan | 8520 | 63512500 | 1.34 |
state_plan | 37447 | 58588200 | 6.39 |
output_formatted_table(complaints_by_plan_result, "Calculated ratio")
federal_complaint_per_10K_workers | state_complaint_per_10K_workers | ratio |
---|---|---|
1.34 | 6.39 | 1:5 |
“In Oregon, which is among the states with the most COVID-19 complaints for Walmart, only one complaint led to an inspection, and not until March 22, 2021 – more than a year after the pandemic started.”
We compared total closed Walmart COVID-19 complaints that appeared in federal OSHA database for each state. We filtered the list of inspections that Oregon’s state OSHA provided for Walmart facilities. In the state inspection list, one Walmart appears. We confirmed with Oregon’s state OSHA that this inspection was opened on Dec. 5, 2020 and was not in response to a complaint, so we excluded it from this analysis. In an email shortly before our publication deadline, an agency spokesperson said that on March 22, 2021 the agency opened an investigation into Walmart in response to a COVID-19 complaint. It does not appear in our data, which is current through March 16, 2021.
walmart_complaints_by_state = ois_closed_complaints_20210328 %>%
filter_walmart("estab_name") %>%
group_by(site_state) %>%
count() %>%
arrange(desc(n)) %>%
rename(number_complaints = n)
# Result --------------------
# Ranking of walmart complaints by state using OIS data
output_formatted_table(head(walmart_complaints_by_state), "States with the most closed valid Walmart COVID-19 Complaints")
site_state | number_complaints |
---|---|
OR | 104 |
CA | 72 |
IN | 71 |
NC | 42 |
PR | 25 |
NV | 23 |
# Look at Oregon inspection data complaints from OR OSHA --------------------
# Create a new object, filter Oregon inspections to return only Walmart inspections, only one Walmart that appears, not marked as related to a COVID-19 complaints
or_inspections_walmart_state_data <- filter_walmart(or_inspections_20210316, "legal_name")
# Result --------------------
output_formatted_table(or_inspections_walmart_state_data,"Walmart inspection from a Fatality/Catastrophe")
inspection_no | federal_fiscal_year | inspection_open_date | type | category | scope | inspected_naics | description | employer_no | location_no | legal_name | loc_city | loc_county | close_conference_date | citation_issued | initial_penalty | citation_green_card_date | inspection_related_to_a_covid_complaint | inspection_marked_as_covid_related | inspection_marked_as_covid_related_remote | covid_food_processing_emphasis_inspection | name | is_walmart |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
317728094 | 2021 | 2020-12-05 | Fatality/Catastrophe | Health | Partial inspection | 452112 | DISCOUNT DEPARTMENT STORES | 2019842 | 29 | Wal-Mart Associates Inc | Happy Valley | CLACKAMAS | . | NA | NA | . | NA | X | NA | NA | wal-mart associates inc | TRUE |
“As of March 24, at least 10 Walmart locations in the state were linked to outbreaks with over five cases, including a Hermiston distribution center linked to 124 recorded cases.”
We determined the number of unique Walmart locations that appeared in Oregon’s outbreak data. According to the Oregon Health Authority, “OHA is only reporting workplace outbreaks with five or more cases and only for workplaces with at least 30 workers … Case counts include all persons linked to the outbreak, which may include household members and other close contacts.”
# Get Walmart-related outbreaks in OR:
or_walmart_outbreaks_unique <- or_outbreaks_20210324_walmart %>%
# Get rid of extra whitespaces
mutate(address = str_squish(address)) %>%
# Filter to one outbreak report per Walmart.
distinct(address, .keep_all=TRUE) %>%
select(-X1)
# Count of stores
num_or_outbreaks_20210324_walmart_unique = or_walmart_outbreaks_unique %>%
summarize(unique_walmart_facilities_w_outbreaks = n())
# Result --------------------
output_formatted_table(num_or_outbreaks_20210324_walmart_unique, "Count of unique Walmart COVID-19 outbreak locations in Oregon")
unique_walmart_facilities_w_outbreaks |
---|
10 |
# Result --------------------
output_formatted_table(or_walmart_outbreaks_unique, "List of unique Walmart COVID-19 outbreak locations")
facility_name | county | first_reported | total_cases | total_deaths | resolved_status | category | address | investigation_start_date | most_recent_onset | students | staff_volunteer | report_name | report_dates | facility_name_squished | address_squished | begin_date | name | is_walmart |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Walmart | Clackamas | NA | 8 | NA | resolved | workplace | 10000 SW 82nd Ave, Happy Valley, OR 97086 | 2020-12-01 | 2020-12-12 | NA | NA | COVID-19-Weekly-Outbreak-Report-2021-1-13-FINAL.csv | 2021-1-13 | walmart | 10000sw82ndave,happyvalley,or97086 | 2020-12-01 | walmart | TRUE |
Walmart | Linn | NA | 8 | NA | active | workplace | 1330 Goldfish Farm Rd SE, Albany, OR 97322 | 2020-11-26 | 2020-12-01 | NA | NA | COVID-19-Weekly-Outbreak-Report-2020-12-16-FINAL.csv | 2020-12-16 | walmart | 1330goldfishfarmrdse,albany,or97322 | 2020-11-26 | walmart | TRUE |
Walmart | Umatilla | NA | 58 | NA | active | workplace | 1350 N 1st St, Hermiston, OR 97838 | 2020-07-09 | 2020-09-23 | NA | NA | COVID-19-Weekly-Report-2020-09-30-FINAL.xlsx | 2020-09-3 | walmart | 1350n1stst,hermiston,or97838 | 2020-07-09 | walmart | TRUE |
Walmart | Josephine | NA | 18 | NA | active | workplace | 135 NE Terry Ln, Grants Pass, OR 97526 | 2020-12-27 | 2021-02-14 | NA | NA | COVID-19-Weekly-Outbreak-Report-2021-2-24-FINAL.csv | 2021-2-24 | walmart | 135neterryln,grantspass,or97526 | 2020-12-27 | walmart | TRUE |
Walmart | Malheur | NA | 10 | NA | active | workplace | 1775 E Idaho Ave, Ontario, OR 97914 | 2020-07-01 | 2020-08-08 | NA | NA | COVID-19-Weekly-Report-2020-08-19-FINAL.xlsx | 2020-08-19 | walmart | 1775eidahoave,ontario,or97914 | 2020-07-01 | walmart | TRUE |
Walmart Supercenter | Marion | NA | 5 | NA | active | workplace | 2035 Lancaster Dr NE, Salem, OR 97305 | 2021-01-11 | 2021-01-02 | NA | NA | COVID-19-Weekly-Outbreak-Report-2021-1-27-FINAL.csv | 2021-1-27 | walmartsupercenter | 2035lancasterdrne,salem,or97305 | 2021-01-11 | walmart supercenter | TRUE |
Walmart | Coos | NA | 11 | NA | resolved | workplace | 2051 Newmark Ave, Coos Bay, OR 97420 | 2020-10-16 | 2020-10-18 | NA | NA | COVID-19-Weekly-Outbreak-Report-2020-12-16-FINAL.csv | 2020-12-16 | walmart | 2051newmarkave,coosbay,or97420 | 2020-10-16 | walmart | TRUE |
Walmart Distribution Center | Umatilla | NA | 124 | NA | active | workplace | 2650 US-395, Hermiston, OR 97838 | 2020-06-30 | 2021-03-07 | NA | NA | COVID-19-Weekly-Outbreak-Report-2021-3-17-FINAL.csv | 2021-3-17 | walmartdistributioncenter | 2650us-395,hermiston,or97838 | 2020-06-30 | walmart distribution center | TRUE |
Walmart Supercenter | Lane | NA | 13 | NA | active | workplace | 2659 Olympic St, Springfield, OR 97477 | 2021-02-18 | 2021-02-23 | NA | NA | COVID-19-Weekly-Outbreak-Report-2021-3-10-FINAL.csv | 2021-3-1 | walmartsupercenter | 2659olympicst,springfield,or97477 | 2021-02-18 | walmart supercenter | TRUE |
Walmart | Multnomah | NA | 19 | NA | active | workplace | 4200 SE 82nd Ave, Portland, OR 97266 | 2020-11-28 | 2020-12-11 | NA | NA | COVID-19-Weekly-Outbreak-Report-2020-12-23-FINAL.csv | 2020-12-23 | walmart | 4200se82ndave,portland,or97266 | 2020-11-28 | walmart | TRUE |
# Point out the Hermiston outbreak, with 124 cases:
hermiston_outbreak <- or_walmart_outbreaks_unique %>% filter(facility_name == 'Walmart Distribution Center')
# Result --------------------
output_formatted_table(hermiston_outbreak, "Outbreak information on Walmart distribution center in Hermiston")
facility_name | county | first_reported | total_cases | total_deaths | resolved_status | category | address | investigation_start_date | most_recent_onset | students | staff_volunteer | report_name | report_dates | facility_name_squished | address_squished | begin_date | name | is_walmart |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Walmart Distribution Center | Umatilla | NA | 124 | NA | active | workplace | 2650 US-395, Hermiston, OR 97838 | 2020-06-30 | 2021-03-07 | NA | NA | COVID-19-Weekly-Outbreak-Report-2021-3-17-FINAL.csv | 2021-3-17 | walmartdistributioncenter | 2650us-395,hermiston,or97838 | 2020-06-30 | walmart distribution center | TRUE |
“In New Mexico, nearly every one of Walmart’s 54 workplaces has reported cases of employees testing positive for coronavirus, according to data from the state’s Environment Department.”
We filtered New Mexico workplace-level COVID-19 data for Walmart facilities to obtain the number of unique Walmart locations. We then compared that to the overall number of Walmart facilities in the state. The total number of Walmart-owned facilities in New Mexico was extracted from Walmart’s corporate site. There are 53 retail units and one Walmart distribution center. New Mexico defines a “rapid response” COVID-19 event as “one or more employees test[ing] positive for COVID-19.”
## NM -------------------------
nm_covid_occurrences_20210321_walmart_unique = nm_covid_occurrences_20210321_walmart %>%
mutate(address_clean = case_when(
# This is a fuel center, part of a Sam's Club that's already in the dataset.
address == "4400 N MAIN ST # A" ~ "4400 N MAIN ST",
# This is an auto care center, part of a Walmart that's already in the dataset.
address == "460 NEW MEXICO HIGHWAY 528" ~ "460 HIGHWAY 528",
# The Walmart in "Eunice" NM, is actually the one in Hobbs.
address == "3800 LOVINGTON HWY" ~ "3800 N LOVINGTON HWY",
TRUE ~ address))%>%
group_by(address_clean, county ) %>%
summarise(num = n())
# Numbers --------------------
# 52 Walmart workplaces remain:
num_nm_covid_occurrences_20210321_walmart_unique = nrow(nm_covid_occurrences_20210321_walmart_unique)
# Number of walmarts in New Mexico
number_nm_walmarts = 54
# Result ---------------------
pct_nm_walmarts_with_covid = tibble("number_unique_nm_walmarts_with_covid" = num_nm_covid_occurrences_20210321_walmart_unique, "number_newmexico_walmarts" = number_nm_walmarts) %>% mutate("pct_nm_walmarts_with_covid" = round((number_unique_nm_walmarts_with_covid/number_newmexico_walmarts)*100,2))
output_formatted_table(pct_nm_walmarts_with_covid, "Unique New Mexico Walmart facilities with COVID-19 cases")
number_unique_nm_walmarts_with_covid | number_newmexico_walmarts | pct_nm_walmarts_with_covid |
---|---|---|
52 | 54 | 96.3 |
“Attorneys general in 11 states and the District of Columbia expressed concern in a letter to Walmart CEO Doug McMillon, citing reports of symptomatic employees remaining”undiagnosed due to the lack of testing capacity" and sick workers being pressured to work by store managers. There were at least 85 complaints about Walmart or Walmart-owned Sam’s Club stores and distribution centers in those states after the letter was sent, and at least 387 across the country, according to the federal OSHA COVID-19 complaints database."
We filtered the federal database of closed OSHA complaints related to COVID-19 for complaints at Walmart facilities after June 2, 2020, the date of the letter from the attorneys general. We further filtered the data to identify Walmart complaints in the 11 states (and Washington, D.C.) represented by the attorneys general who signed the letter.
# The letter was sent June 2, 2020. Walmart complaints after June 2, 2020
walmart_complaints_after_ag_letter <- ois_closed_complaints_20210328 %>%
filter_walmart("estab_name") %>%
filter(upa_receipt_date >"2020-06-02")
# Walmart complaints from states where attorneys general sent letter to Walmart
walmart_complaints_after_ag_letter_specificStates = walmart_complaints_after_ag_letter %>%
filter(site_state %in% c("CA", "CT", "DE", "DC", "IL", "MA", "MI", "MN", "NM", "PA", "WA", "MD"))
# Result ---------------------
num_walmart_complaints_after_ag_letter = tibble("complaints_from_states_where_AG_sent_letter" = nrow(walmart_complaints_after_ag_letter_specificStates),"complaints_from_across_country" = nrow(walmart_complaints_after_ag_letter))
output_formatted_table(num_walmart_complaints_after_ag_letter, "Walmart Complaints after Attorneys General Letter")
complaints_from_states_where_AG_sent_letter | complaints_from_across_country |
---|---|
85 | 387 |
“From 2020 to 2021, Utah’s state OSHA referred about 20% of people complaining about COVID-19 hazards to local health departments.”
We calculated the percentage of Utah OSHA complaints whose outcome was “Referred Complainant to Health Dept - Outside UOSH Jurisdiction.”
utah_osha_complaints_summary = ut_osha_complaints_202003_to_202103 %>%
# Group complaints by outcome type.
group_by(outcome) %>%
# Calculate number of complaints with each outcome type.
summarise(num_referred = n()) %>%
mutate(total_complaints = sum(num_referred)) %>%
# What percent of the total was each outcome type?
mutate(percent = round(100* num_referred / total_complaints,2)) %>%
filter(outcome == "Referred Complainant to Health Dept - Outside UOSH Jurisdiction")
# Result ---------------------
output_formatted_table(utah_osha_complaints_summary, "Percent of Utah OSHA COVID-19 complaints referred to health departments")
outcome | num_referred | total_complaints | percent |
---|---|---|---|
Referred Complainant to Health Dept - Outside UOSH Jurisdiction | 323 | 1539 | 20.99 |
-30-