Abstract
BACKGROUND: Optimizing the public health response to reduce the burden of COVID-19 necessitates characterizing population-level heterogeneity of risks for the disease. However, heterogeneity in SARS-CoV-2 testing may introduce biased estimates depending on analytic design. We aimed to explore the potential for collider bias in a large study of disease determinants, and evaluate individual, environmental and social determinants associated with SARS-CoV-2 testing and diagnosis among residents of Ontario, Canada.
METHODS: We explored the potential for collider bias and characterized individual, environmental and social determinants of being tested and testing positive for SARS-CoV-2 infection using cross-sectional analyses among 14.7 million community-dwelling people in Ontario, Canada. Among those with a diagnosis, we used separate analytic designs to compare predictors of people testing positive versus negative; symptomatic people testing positive versus testing negative; and people testing positive versus people not testing positive (i.e., testing negative or not being tested). Our analyses included tests conducted between Mar. 1 and June 20, 2020.
RESULTS: Of 14 695 579 people, we found that 758 691 were tested for SARS-CoV-2, of whom 25 030 (3.3%) had a positive test result. The further the odds of testing from the null, the more variability we generally observed in the odds of diagnosis across analytic design, particularly among individual factors. We found that there was less variability in testing by social determinants across analytic designs. Residing in areas with the highest household density (adjusted odds ratio [OR] 1.86, 95% confidence interval [CI] 1.75–1.98), highest proportion of essential workers (adjusted OR 1.58, 95% CI 1.48–1.69), lowest educational attainment (adjusted OR 1.33, 95% CI 1.26–1.41) and highest proportion of recent immigrants (adjusted OR 1.10, 95% CI 1.05–1.15) was consistently related to increased odds of SARS-CoV-2 diagnosis regardless of analytic design.
INTERPRETATION: Where testing is limited, our results suggest that risk factors may be better estimated using population comparators rather than test-negative comparators. Optimizing COVID-19 responses necessitates investment in and sufficient coverage of structural interventions tailored to heterogeneity in social determinants of risk, including household crowding, occupation and structural racism.
The spread of SARS-CoV-2, the virus causing COVID-19, has resulted in a pandemic with heterogeneity in exposure and risk of transmission.1–4
Heterogeneity in social determinants of COVID-19 may exist at the individual and community (e.g., by housing density5–7) levels. In addition, social determinants of health, including barriers to health care, occupation, structural racism and xenophobia, have been implicated in COVID-19 risk.8,9 Environmental determinants such as ambient air pollution may also play a role, as evidence indicates that higher ambient air pollution increases risk for infection with other respiratory viruses10,11 and the development of severe COVID-19.12,13 Environmental factors are linked with structural racism (e.g., in the context of low-quality housing).12,14
Using observational data to identify risk factors for COVID-19 relies on SARS-CoV-2 testing, a service that is not equally distributed.15 Differential testing introduces the potential for selection biases,16,17 including collider bias.17 Collider bias may be introduced into epidemiologic studies of COVID-19 risk factors if the factors under investigation are related both to developing an infection and to the likelihood of being tested.17–19 For example, data suggest that people with diabetes are more likely to develop severe COVID-19 if infected with SARS-CoV-2.20,21 Thus, if infected, people with diabetes may be more likely to be tested, and consequently, diabetes may appear to be associated with a diagnosis of COVID-19 in studies of those tested for SARS-CoV-2, even if diabetes is not a risk factor for infection.17 The opposite may occur with underlying respiratory diseases (e.g., asthma) that have symptoms similar to those caused by SARS-CoV-2, leading to the appearance of potentially “protective” associations with COVID-19.22
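The asthma example above can be made concrete with a small worked calculation. The sketch below is illustrative only: every input (population sizes, infection risk, testing probabilities) is a hypothetical value chosen for the example and not an estimate from this study, and tests are assumed to be perfectly accurate. Infection risk is set to be identical with and without asthma, but uninfected people with asthma are assumed to be tested more often because of similar symptoms.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio from the four cells of a 2x2 table."""
    return (exposed_cases / exposed_controls) / (unexposed_cases / unexposed_controls)


def expected_counts(n, infection_risk, p_test_infected, p_test_uninfected):
    """Expected numbers testing positive, testing negative and never tested."""
    infected = n * infection_risk
    uninfected = n - infected
    positive = infected * p_test_infected      # infected and tested
    negative = uninfected * p_test_uninfected  # uninfected and tested
    untested = n - positive - negative
    return positive, negative, untested


# Hypothetical inputs: same 2% infection risk in both groups (no true effect),
# same testing probability if infected, but uninfected people with asthma
# are tested more often (10% v. 4%) because their symptoms resemble COVID-19.
asthma = expected_counts(100_000, 0.02, 0.5, 0.10)
no_asthma = expected_counts(900_000, 0.02, 0.5, 0.04)

# Test-negative comparison: positives v. negatives among tested people only.
or_test_negative = odds_ratio(asthma[0], asthma[1], no_asthma[0], no_asthma[1])

# Population comparison: positives v. everyone else (negative or untested).
or_population = odds_ratio(
    asthma[0], asthma[1] + asthma[2],
    no_asthma[0], no_asthma[1] + no_asthma[2],
)

print(round(or_test_negative, 2), round(or_population, 2))
```

Under these assumed inputs, the comparison restricted to tested people yields an odds ratio of about 0.4, a spurious "protective" association, whereas the comparison against the whole population returns an odds ratio of about 1, matching the absence of any true effect. If testing among infected people also differed by condition, as in the diabetes example, the population comparison would not be free of bias either.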
Our objectives were to explore the potential for collider bias in a large study of COVID-19 determinants and examine individual, environmental and social determinants associated with testing and diagnosis among 14.7 million people in Ontario, Canada.17
Methods
Study population, setting and design
We conducted an observational study using data from population-based laboratory and health administrative databases in Ontario. Ontario’s health system provides universal access to hospital and physician services23 and laboratory testing.24 We used data from people who were tested between Mar. 1 and June 20, 2020, to identify determinants associated with testing, and then used 3 analytic designs to identify determinants associated with a positive result for SARS-CoV-2 testing.
Data sources, linkages and inclusion criteria
We identified testing status using data from the Ontario Laboratories Information System (OLIS) and linked this information to relevant health-related data sets containing demographic, health care use and area-level information. These data sets were linked using unique encoded identifiers and analyzed at ICES.25
The OLIS captured about 88% of all laboratory-identified SARS-CoV-2 reported by the province during the study period (calculated by cases identified in OLIS divided by the number of cases reported by Ontario’s COVID-19 dashboard in the same time frame). The OLIS records included specimen collection date, results and a text field for symptoms that was completed by health care providers at the time of sampling. We obtained individual- and area-level demographic and environmental information from the Registered Persons Database; the Canadian Institute for Health Information’s Discharge Abstract Database, Same Day Surgery Database and National Ambulatory Care Reporting System; the Ontario Health Insurance Program; the Ontario Mental Health Reporting System; the Ontario Population Health and Environment Cohort; and the 2016 Canada Census26 (Appendix 1, Supplemental Table 1, available at www.cmaj.ca/lookup/doi/10.1503/cmaj.202608/tab-related-content).
For people with more than 1 test in OLIS, we used the first positive or indeterminate test, or the first negative test if all tests during the study period were negative. We included people who were not tested during the study period unless they were recorded as having died before, or having been born after, Mar. 1, 2020. To assess determinants of testing and diagnosis, we included people who underwent polymerase chain reaction testing for SARS-CoV-2 infection and were not residing in a long-term care facility as of Mar. 1, 2020.
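The rule for choosing one test per person can be sketched as follows. This is a minimal illustration, not the study's code: the record layout (a tuple of collection date and result) and the result labels are hypothetical.

```python
from datetime import date


def index_test(tests):
    """Pick one (collection_date, result) record for a person.

    Rule described in the text: the first positive or indeterminate test,
    otherwise the first negative test. Returns None if there are no tests.
    """
    ordered = sorted(tests, key=lambda record: record[0])
    for record in ordered:
        if record[1] in ("positive", "indeterminate"):
            return record
    return ordered[0] if ordered else None


# Hypothetical test history for one person within the study period.
history = [
    (date(2020, 4, 10), "negative"),
    (date(2020, 5, 2), "positive"),
    (date(2020, 3, 20), "negative"),
    (date(2020, 6, 1), "positive"),
]
print(index_test(history))
```

Sorting by collection date first means the earliest qualifying record is returned regardless of the order in which the records arrive.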
Selection and definition of potential determinants of positive results for SARS-CoV-2 testing
We included sex, age group, underlying health conditions and previous use of health care as individual-level determinants. We selected underlying health conditions as those identified in both the peer- and non-peer-reviewed literature as being associated with COVID-19 severity2,27–30 or with symptoms similar to those of COVID-19, because severity and symptoms may lead to differential testing and thus, collider bias;31–36 or conditions that increase the need for personal care support (e.g., dementia), thereby reflecting an intersection with occupational risks among essential care providers.37,38
We hypothesized that health care use would increase access to testing and signal a marker for comorbidities; we measured health care use by the number of hospital admissions in the past 3 years, number of outpatient physician visits in the past year and influenza vaccination in the 2019–2020 season. We also included the Johns Hopkins ACG System39 Aggregated Diagnostic Groups (ADGs)40 as a composite measure of comorbidities.
Environmental determinants included fine particulate matter (PM2.5), from satellite-derived estimates,41 and nitrogen dioxide (NO2), from a land-use regression model,42 at the postal code level.
We conceptualized social determinants as area-based variables that might signal contact rates in communities (household density; apartment building density; and uncoupled status, e.g., not married);43,44 contact rates at work (“essential workers”);16,45 socioeconomic barriers to health care access or housing (household income and educational attainment);46,47 and factors related to race and ethnicity (visible minority status and recent immigration).8,9 We derived these variables from the 2016 Canada Census at the level of dissemination areas (DAs), the smallest geographic unit at which Census data are collected.48 We ranked dissemination areas at the city level (for median per-person income equivalent) or at the provincial level (for all other social determinants) and then categorized them into quintiles. For apartment building density and recent immigration status, the high frequency of zeros permitted the creation of only 3 categories (i.e., the lower 3 quintiles combined, and the fourth and fifth quintiles).
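The ranking and categorization of dissemination areas can be illustrated with a short sketch. It is a simplified, hypothetical version: the area identifiers and values are invented, ties are broken by input order (the source does not state how ties were handled), and the city-level grouping used for income is omitted.

```python
def assign_quintiles(values):
    """Map each area to a quintile (1 = lowest, 5 = highest) by rank.

    `values` maps a hypothetical area identifier to its census measure.
    Ties keep their input order, which is an assumption of this sketch.
    """
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {area: (rank * 5) // n + 1 for rank, area in enumerate(ordered)}


def collapse_to_three(quintile):
    """Combine the lower 3 quintiles; keep the fourth and fifth separate."""
    return 1 if quintile <= 3 else quintile - 2


# Hypothetical household-density values for 10 dissemination areas.
density = {f"DA{i}": value for i, value in enumerate(
    [2.1, 2.4, 2.6, 2.7, 2.9, 3.0, 3.2, 3.5, 3.9, 4.4])}
quintiles = assign_quintiles(density)
three_level = {area: collapse_to_three(q) for area, q in quintiles.items()}
print(quintiles)
print(three_level)
```

The 3-level version mirrors the categories described for apartment building density and recent immigration, where many areas share a value of zero and the lower quintiles cannot be separated.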
Statistical analysis
We defined the testing outcome as receipt of at least 1 SARS-CoV-2 test during the study period. The comparator group comprised Ontario residents who had no record of testing during the study period. We evaluated determinants of testing in unadjusted, age- and sex-adjusted, and fully adjusted logistic regression models that included all determinants. The fully adjusted model also included a fixed-effect covariate for public health region. Public health regions are geographic areas in which public health measures were differentially applied,49 and in which there may be variability in measured and unmeasured social determinants.50
To address the potential for collider bias, we compared the odds of a positive test result for SARS-CoV-2 derived from unadjusted, age- and sex-adjusted, and fully adjusted logistic regression models (including all determinants and public health regions) using 3 study designs. The “pseudo-test-negative” design compared people who tested positive to people who tested negative; the “true test-negative” design was restricted to tested people who were recorded as having symptomatic illness;51 and the “case–control” design compared all people with a positive test result with all people without a positive test result (i.e., people with a negative test result or who were not tested).
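The 3 designs differ only in who is eligible and who serves as the comparator. The sketch below shows that selection logic on a toy data set; the field names, labels and records are hypothetical, and the handling of indeterminate results is not covered because the text does not describe it.

```python
def split_by_design(people, design):
    """Return (cases, comparators) for one of the 3 analytic designs.

    Each person is a dict with hypothetical fields:
      result      -- "positive", "negative", or None if never tested
      symptomatic -- True, False, or None if symptom information is missing
    """
    if design == "pseudo_test_negative":
        pool = [p for p in people if p["result"] is not None]
    elif design == "true_test_negative":
        pool = [p for p in people
                if p["result"] is not None and p["symptomatic"] is True]
    elif design == "case_control":
        pool = list(people)
    else:
        raise ValueError(f"unknown design: {design}")
    cases = [p for p in pool if p["result"] == "positive"]
    comparators = [p for p in pool if p["result"] != "positive"]
    return cases, comparators


people = [
    {"id": 1, "result": "positive", "symptomatic": True},
    {"id": 2, "result": "negative", "symptomatic": True},
    {"id": 3, "result": "negative", "symptomatic": None},   # symptoms missing
    {"id": 4, "result": "positive", "symptomatic": False},
    {"id": 5, "result": None, "symptomatic": None},         # never tested
]
for design in ("pseudo_test_negative", "true_test_negative", "case_control"):
    cases, comparators = split_by_design(people, design)
    print(design, len(cases), len(comparators))
```

Only the case–control comparator group includes the untested person, which is why that design does not condition on having been tested.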
To identify the determinants of being tested for SARS-CoV-2 and being positive for the virus, we focused on the results from fully adjusted logistic regression models from the pseudo-test-negative and case–control designs; the results using the true test-negative design are provided in Appendix 1. We interpreted each set of determinants as independent analyses based on directed acyclic graphs (Appendix 1, Supplemental Figure 1). We believed that the case–control design had the least potential for collider bias.
We conducted the statistical analysis using SAS version 9.4. To assess for collinearity, we evaluated tolerances and variance inflation factors.
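For readers unfamiliar with these diagnostics, the relation between the 2 quantities can be sketched briefly. For a given covariate, tolerance is 1 minus the R-squared obtained by regressing that covariate on the others, and the variance inflation factor is the reciprocal of the tolerance. The example below uses 2 invented covariates, for which that R-squared is simply the squared Pearson correlation; it is an illustration of the definitions, not the SAS procedure used in the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tolerance(r_squared):
    """Share of a covariate's variance not explained by the other covariates."""
    return 1.0 - r_squared


def variance_inflation_factor(r_squared):
    """Reciprocal of the tolerance."""
    return 1.0 / tolerance(r_squared)


# Hypothetical values for two area-level covariates.
covariate_a = [1, 2, 3, 4, 5]
covariate_b = [2, 1, 4, 3, 5]

r = pearson_r(covariate_a, covariate_b)
print(round(r, 3), round(tolerance(r ** 2), 3),
      round(variance_inflation_factor(r ** 2), 3))
```

Because the 2 measures are reciprocals, a variance inflation factor below 5 corresponds to a tolerance above 0.2.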
Ethical approval
The use of data in this project was authorized under Section 45 of Ontario’s Personal Health Information Protection Act, which does not require review by a Research Ethics Board.
Results
Of 758 691 people tested during the study period, 25 030 (3.3%) tested positive for SARS-CoV-2 (Figure 1). Only 11.8% of those tested had a symptom recorded by the provider, 13.6% were considered asymptomatic, and 74.6% were missing symptom information. Descriptive characteristics of our study population are reported in Table 1 and Appendix 1, Supplemental Table 2.
Determinants of testing for SARS-CoV-2
In the fully adjusted analysis, we found that the odds of being tested increased with age (Table 2 and Appendix 1, Supplemental Table 3). Males had lower odds of testing than females. We also found that nearly every underlying health condition and most measures of previous use of health care were associated with increased odds of testing. In contrast, higher ambient air pollution was associated with reduced odds of testing. There was little variability in the odds of testing by most area-based social determinants of health. However, areas with higher visible minority populations had lower odds of testing, whereas areas with higher household income and greater percentages of uncoupled people had higher odds of testing. The estimates of the odds of being tested for most social determinants of health appeared to be progressively attenuated from unadjusted to age- and sex-adjusted, to fully adjusted regression models. Notably, the direction of the association between testing and income quintile reversed after adjustment (Figure 2 and Appendix 1, Supplemental Table 3).
Variability in determinants of a positive result for SARS-CoV-2 testing across analytic designs
Our comparison of results using the different analytic designs showed important differences in individual-level determinants and fewer differences in social determinants (Table 2 and Appendix 1, Supplemental Tables 4–6). Variables that were associated with testing tended to show different relations with SARS-CoV-2 positivity across study designs. For example, the adjusted odds ratio for being tested among adults aged 85 years or older, compared with those younger than 5 years, was 5.60 (95% confidence interval [CI] 5.47–5.73), and the adjusted odds ratio for being positive for SARS-CoV-2 infection was 1.76 (95% CI 1.51–2.06) with the pseudo-test-negative design and 7.26 (95% CI 6.23–8.46) with the case–control design (Table 2). Some health conditions associated with higher odds of testing, such as chronic respiratory conditions and indicators of prior health care use, appeared protective against being positive using the pseudo-test-negative design, but showed no association or increased odds of being positive using the case–control design. Our results from the true test-negative design were largely similar to results from the pseudo-test-negative design, albeit with wider CIs, with the exceptions that odds of being positive were higher for people of older age using the true test-negative design compared with the pseudo-test-negative design, and lower for higher quintiles of essential workers in the true test-negative design compared with the pseudo-test-negative design (Appendix 1, Supplemental Tables 4 and 5).
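The reported intervals can be read on the log scale, where a Wald-type confidence interval is symmetric around the log of the estimate. Assuming the intervals are Wald intervals (an assumption; the interval method is not stated in the text), the sketch below recovers the approximate standard error of each log odds ratio for adults aged 85 years or older and checks that each point estimate sits near the geometric midpoint of its interval.

```python
from math import log, sqrt

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def log_scale_summary(lower, upper):
    """Geometric midpoint and log-scale standard error implied by a 95% CI."""
    midpoint = sqrt(lower * upper)
    standard_error = (log(upper) - log(lower)) / (2 * Z_95)
    return midpoint, standard_error


# Estimates quoted in the text: (label, odds ratio, lower, upper).
reported = [
    ("tested", 5.60, 5.47, 5.73),
    ("positive, pseudo-test-negative", 1.76, 1.51, 2.06),
    ("positive, case-control", 7.26, 6.23, 8.46),
]
summaries = {}
for label, estimate, lower, upper in reported:
    midpoint, standard_error = log_scale_summary(lower, upper)
    summaries[label] = (estimate, midpoint, standard_error)
    print(f"{label}: reported {estimate}, midpoint {midpoint:.2f}, "
          f"SE(log OR) {standard_error:.3f}")
```

The implied standard error is much smaller for testing than for positivity, consistent with the far larger number of people tested than people diagnosed.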
Determinants of a positive test result for SARS-CoV-2 using the case–control design
Using the case–control design, we found that older age, certain comorbidities (i.e., hypertension, diabetes, congestive heart failure, dementia, chronic kidney disease and ischemic stroke/transient ischemic attack) and increased previous use of health care were associated with increased odds of a positive SARS-CoV-2 test result. Other comorbidities (i.e., asthma, cancer, ischemic heart disease and substance abuse) and receipt of influenza vaccine in the 2019–2020 season were associated with reduced odds of a positive test result (Table 2 and Appendix 1, Supplemental Table 6).
The 2 highest categories of PM2.5 exposure were associated with increased odds of being positive, whereas no categories of exposure to NO2 were associated with increased odds of positivity.
We also found that higher household density, increased apartment building density, greater percentages of uncoupled people and greater percentages of essential workers were associated with greater odds of a positive SARS-CoV-2 test. Lower educational attainment was related to increased odds, but there was no statistically consistent relation with household income. We also determined that being in the highest quintile of neighbourhoods with visible minorities and greater percentages of recent immigrants were associated with greater odds of a positive SARS-CoV-2 test result. Associations were attenuated after adjustment for all social determinants except for household density and essential work (Figure 3 and Appendix 1, Supplemental Table 6).
Our evaluation of collinearity diagnostics found that all tolerances were below 1 and all variance inflation factors were below 5 (Appendix 1, Supplemental Table 7).
Interpretation
We found that our 3 analytic designs identified different individual determinants of positive test results for SARS-CoV-2, likely because of collider bias. Using the case–control analysis, which we considered the least biased, we identified particular individual, environmental, and social determinants of health as key determinants for testing positive for SARS-CoV-2.
Using the true test-negative and pseudo-test-negative designs, we found a high potential for erroneously identifying some individual determinants, such as underlying health conditions, as protective against testing positive for SARS-CoV-2, although they were associated with higher rates of being tested. These health conditions are associated with COVID-19 severity2 and may have been prone to collider bias, where the direction of effect measures changes based on model choice. Similar results were found with health care use variables. Thus, assessment of determinants of positive SARS-CoV-2 test results requires careful interpretation informed by evaluation of the reasons for testing.17 In the context of low overall levels of testing, the case–control design appears to have mitigated some potential sources of collider bias, with the assumption that those not tested are similar to those who tested negative.16,17
We found that some underlying health conditions remained associated with diagnoses using the case–control design, reflecting either unmeasured confounding or possible biological susceptibility to infection if exposed.10,11,20,53,54 For example, dementia and frailty remained independently associated with diagnosis, which may have been due to unmeasured confounding such as higher rates of contacts with caregivers or residence in other types of congregate settings such as retirement homes. Thus, underlying health conditions like dementia and frailty represent targets for prevention with strategies tailored to reduce exposures among people characterized by these individual determinants.
During the study period, the testing criteria for SARS-CoV-2 in Ontario shifted from a focus on symptomatic returning travellers, to people who were severely symptomatic and those with occupational exposure, and then to additional testing of people who were asymptomatic.31–34,36 These changes may have created differences among people who were tested and had symptoms compared with all people who were tested. In our study, the restriction of the test-negative design to people with symptoms did not yield substantially different results than the test-negative design that included symptomatic and asymptomatic people for most determinants, but this may have been due, in part, to the high proportion of people who were missing symptom information (74.6%).
The independent association between high PM2.5 and diagnosis may reflect unmeasured social determinants of health.55,56 However, studies have also implicated environmental pollution as having a biological relation to the risk and severity of COVID-19.10–12
We identified increased odds of a positive test result for SARS-CoV-2 associated with household density, apartment building percentage, uncoupled status, essential work, educational attainment and recent immigration, consistent with findings from other settings.50,57,58 Household size has been shown to be a consistent risk factor across a broad range of settings.59,60 These higher infection rates are likely due to prolonged and physically closer in-person contacts occurring more frequently within the household.60 Essential services and occupations have also been associated with higher exposure risk,61 either because such jobs cannot be done feasibly with proper protections or because protective policies and materials are not issued, leaving workers at high risk.62,63
We found that higher percentages of recent immigrants in an area were associated with a positive test result for SARS-CoV-2, even after adjustment, although the percentage of visible minorities was not. Both variables might represent residual measures of structural racism, potentiating increased risk of SARS-CoV-2 exposure and COVID-19 severity,64–66 including hospital admission and death related to COVID-19.9,16,28,58 We found the association between visible minority status and diagnosis was attenuated after adjustment for individual, environmental and other social determinants of health. These findings likely reflect what is already known about race and ethnicity as social constructs and social determinants of health.67 Finally, the fact that there was little association between most social determinants and the odds of testing suggests that testing resources may not be adequately prioritized to people at greatest risk.68
Our findings suggest a need to increase and redirect resources that specifically address social determinants such as household density47,69 (e.g., voluntary isolation centres70 and wrap-around services71), occupational risk62,66 (e.g., paid sick leave,72 workplace testing73 and improved ventilation62) and other mediators of structural racism68,74,75 (e.g., community-led outreach testing76). Our findings also suggest prioritizing COVID-19 vaccination strategies that reach communities and workplaces having the highest rates of cases. Although the Chief Public Health Officer of Canada has suggested an equity lens to the public health response to COVID-19,45 much of the response on COVID-19 equity and outreach to marginalized communities to date has been accomplished through smaller independent groups, including volunteer organizations.77–79
Limitations
Our determination of positive test results for SARS-CoV-2 was restricted to laboratory-confirmed cases and to the 88% of total provincial diagnoses that were available via OLIS. We assumed that determinants remained constant across the study period, whereas surveillance data suggest shifts in how infections propagate between social networks.80 Future analysis should evaluate changes in the direction and size of determinants over the course of the outbreak. Our models also adjusted for public health region, within which many social determinants cluster,50 and we cannot infer from our results how social determinants of diagnosis may vary among and within these geographic regions. We measured social determinants at the area level and these determinants were not available at the individual level; however, by describing individuals’ neighbourhoods, our analysis reflected the role of structural and environmental determinants for people living in them. We may have over-adjusted in the fully adjusted models in our analysis because of the large number of covariates. However, the directions of effect estimates generally remained the same after full adjustment, and the sample size of our analyses provided adequate statistical power. Finally, some relevant determinants, such as obesity,22,80 were not available for our study.81
Conclusion
We found that demographic and health-related risks for positive test results for SARS-CoV-2, which generally have been the targets of response strategies against COVID-19 to date, appeared subject to collider bias. However, we observed consistent relations between testing positive for SARS-CoV-2 and key social determinants of health, including essential worker status, number of people living in a household and educational attainment. Effective responses to COVID-19 require that the social determinants associated with access to testing and SARS-CoV-2 transmission risks be characterized and addressed using risk-tailored, community-based interventions.
Acknowledgements
The authors thank IQVIA Solutions Canada for use of their Drug Information File, as well as Owen Langman for technical assistance. Finally, the authors are grateful to the 14.7 million Ontario residents without whom this research would be impossible.
Footnotes
Competing interests: Adrienne Chan is a member of the board of Partners in Health Canada. Mackenzie Hamilton is currently serving an internship at AstraZeneca Canada in support of health research initiatives in lupus and severe asthma. No other competing interests were declared.
This article has been peer reviewed.
Contributors: Jeffrey Kwong, Sharmistha Mishra and Stefan Baral designed the study. Andrew Calzavara conducted all data analyses (data set and variable creation and statistical modelling). Jeffrey Kwong, Sharmistha Mishra, Stefan Baral, Rafal Kustra and Andrew Calzavara designed the analysis plans and conducted variable selection, with input from Hong Chen and Adrienne Chan on variable selection and definitions. Mackenzie Hamilton, Mohamed Djebli, Laura Rosella and Tristan Watson contributed to analytic plans related to collider bias. Branson Chen contributed to data analyses and data preparation for the symptomatic data set. Maria Sundaram, Jeffrey Kwong, Stefan Baral and Sharmistha Mishra wrote the manuscript. All of the authors interpreted the data, critically reviewed and edited the manuscript, gave final approval of the version to be published and agreed to be accountable for all aspects of the work. Sharmistha Mishra and Stefan Baral are co–senior authors.
Funding: This study was funded by the St. Michael’s Hospital Research Innovation Council COVID-19 Research Grant. Stefan Baral, Jeffrey Kwong, Sharmistha Mishra and Maria Sundaram received a research operating grant (VR5-172683) from the Canadian Institutes of Health Research. Sharmistha Mishra is supported by a Tier 2 Canada Research Chair in Mathematical Modeling and Program Science. Jeffrey Kwong is supported by a Clinician-Scientist Award from the University of Toronto Department of Family and Community Medicine.
Data sharing: The data set from this study is held securely in coded form at ICES. While legal data-sharing agreements between ICES and data providers (e.g., health care organizations and government) prohibit ICES from making the data set publicly available, access may be granted to those who meet prespecified criteria for confidential access, available at https://www.ices.on.ca/DAS (email: das{at}ices.on.ca). The full dataset creation plan and underlying analytic code are available from the authors upon request, understanding that the computer programs may rely upon coding templates or macros that are unique to ICES and are therefore either inaccessible or may require modification.
Disclaimers: The opinions, results, and conclusions reported in this paper are those of the authors and are independent from the funding sources. Parts of this material are based on data and/or information compiled and provided by the Canadian Institute for Health Information (CIHI) and by Cancer Care Ontario (CCO). However, the analyses, conclusions, opinions, and statements expressed herein are those of the authors, and not necessarily those of CIHI or CCO. No endorsement by ICES, the Ministries of Health and Long-Term Care (MOHLTC), CIHI or CCO is intended or should be inferred.
This study was supported by ICES, which is funded by an annual grant from the Ontario MOHLTC. The study sponsors did not participate in the design and conduct of the study; collection, management, analysis and interpretation of the data; preparation, review or approval of the manuscript; or the decision to submit the manuscript for publication.
Accepted April 6, 2021.
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY-NC-ND 4.0) licence, which permits use, distribution and reproduction in any medium, provided that the original publication is properly cited, the use is noncommercial (i.e., research or educational use), and no modifications or adaptations are made. See: https://creativecommons.org/licenses/by-nc-nd/4.0/