Linear and logistic regression models: when to use and how to interpret them?

Horacio Matias Castro, Juliana Carvalho Ferreira

doi:10.36416/1806-3756/e20220439

Open Access

Peer-Reviewed
Continuing Education: Scientific Methodology

PRACTICAL SCENARIO

A secondary analysis(1) of "Integrating Palliative and Critical Care," a cluster randomized trial, was conducted to explore differences in the receipt of elements of palliative care between patients who died in the ICU with interstitial lung disease (ILD) or COPD and those who died of cancer. The authors used two types of multiple regression analysis: linear regression to estimate the effect of COPD and ILD, compared with cancer, on length of ICU stay; and logistic regression to evaluate the effects of COPD and ILD on the presence or absence of elements of palliative care. All regression models were adjusted for potential confounders of the association between patient diagnosis and palliative care outcomes (age, sex, minority status, and education level, among others).

INTRODUCTION

Linear and logistic regression are statistical methods widely used in medical research to assess associations between variables. They estimate whether an independent variable (also called a predictor, exposure, or risk factor) is associated with a dependent variable (the outcome) and quantify the direction and strength of that association.(2)

The association between two variables is evaluated with simple regression analysis. In many clinical scenarios, however, more than one independent variable may be associated with the outcome, and confounding variables may need to be controlled for. When two or more independent variables are included, multiple regression analysis is used. It estimates the independent effect of each variable on the outcome while adjusting for the effects of the other variables included in the same model.
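
What "adjusting for the other variables" does can be made concrete with a small sketch. It assumes invented data (the variables age, exposed, and y are hypothetical, and the outcome is deliberately noise-free so the answer is easy to check) and fits ordinary least squares by solving the normal equations in plain Python. Statistical packages use numerically more robust methods, so this is for illustration only.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(predictors, y):
    """Ordinary least squares: returns [intercept, b1, b2, ...] from (X'X) b = X'y."""
    X = [[1.0] + list(row) for row in predictors]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: exposure is more common among older patients (confounding by age)
age = [40, 50, 60, 70, 45, 55, 65, 75]
exposed = [0, 0, 0, 1, 0, 1, 1, 1]
# Outcome built with a true exposure effect of 1.0 and an age effect of 0.5 per year
y = [2 + 1.0 * e + 0.5 * a for e, a in zip(exposed, age)]

crude = ols([[e] for e in exposed], y)                     # simple regression
adjusted = ols([[e, a] for e, a in zip(exposed, age)], y)  # multiple regression
print(f"crude exposure coefficient:    {crude[1]:.2f}")     # 9.75
print(f"adjusted exposure coefficient: {adjusted[1]:.2f}")  # 1.00
```

The crude coefficient (9.75) mixes the exposure effect with the age difference between the groups; adding age to the model recovers the effect of 1.0 that was built into the data. With real, noisy data the adjusted estimate only approximates the true effect, and only for confounders that are measured and included in the model.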

WHEN TO USE LINEAR OR LOGISTIC REGRESSION?

The nature of the outcome variable determines which type of regression analysis to use. Linear regression is used for continuous outcome variables (e.g., days of hospitalization or FEV1), and logistic regression is used for categorical outcome variables, most commonly binary ones such as death (yes/no). Independent variables can be continuous, categorical, or a mix of both.
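
Why a binary outcome calls for logistic regression can be shown with a minimal sketch. It assumes made-up counts and a single binary predictor, and fits the model by Newton-Raphson maximum likelihood in plain Python, which is one standard fitting approach; fit_logistic is a hypothetical helper, not a library function.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by maximum likelihood using Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # information matrix X'WX
        for xi, yi in zip(x, y):
            # Predicted probability; the logistic function keeps it within (0, 1)
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1]
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical binary outcome: 10/40 events among exposed, 20/40 among unexposed
x = [1] * 40 + [0] * 40
y = [1] * 10 + [0] * 30 + [1] * 20 + [0] * 20
b0, b1 = fit_logistic(x, y)
odds_ratio_2x2 = (10 / 30) / (20 / 20)
print(f"b1 = {b1:.3f}; exp(b1) = {math.exp(b1):.3f}; 2x2 odds ratio = {odds_ratio_2x2:.3f}")
# b1 = -1.099; exp(b1) = 0.333; 2x2 odds ratio = 0.333
```

Fitted probabilities pass through the logistic function, so they always stay between 0 and 1, which a straight line fitted to a 0/1 outcome does not guarantee. With a single binary predictor, exp(b1) equals the odds ratio computed directly from the 2×2 table, which is why logistic regression coefficients are usually reported as odds ratios.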

In our example, the authors wanted to know whether baseline disease (cancer, COPD, or ILD; the independent variable) was associated with two different outcomes. One outcome was continuous (length of ICU stay), and the other was categorical (presence or absence of elements of palliative care). Therefore, two models were built: a linear regression model to examine the association between baseline disease (chronic pulmonary disease or cancer) and length of ICU stay, and a logistic regression model to examine the association between baseline disease and receipt of elements of palliative care.

HOW TO INTERPRET RESULTS OF REGRESSION ANALYSIS?

Regression models are fitted with statistical software packages, and the output includes several parameters that can be complex to interpret. Clinicians who are learning the basics of regression models should focus on the key parameters presented in Chart 1.
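
Chart 1 is not reproduced in this text. As a supplementary sketch in standard textbook notation (not taken from Chart 1), the two models can be written as follows, where X_1, ..., X_k are the independent variables and ε is the random error:

```latex
% Linear regression (continuous outcome Y)
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon

% Logistic regression (binary outcome, p = \Pr(Y = 1))
\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k

% Odds ratio for predictor X_j, holding the other predictors fixed
\mathrm{OR}_j = e^{\beta_j}
```

In the linear model, β_j is the average change in the outcome, in its own units, per one-unit change in X_j with the other variables held constant; for a categorical predictor such as baseline disease, X_j is a 0/1 indicator and β_j is the adjusted difference in means relative to the reference category. In the logistic model, β_j is a change in log odds, so e^{β_j} is the odds ratio. A 95% CI that excludes 0 for β (or 1 for an OR) generally corresponds to p < 0.05.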

In our example, baseline disease (COPD, ILD, or cancer, the reference category) is the independent variable, and length of ICU stay and receipt of elements of palliative care are the outcomes of interest. The regression models also included other independent variables considered potential confounders, such as age, sex, and minority status. In the linear regression model, the length of ICU stay was longer for patients with ILD than for those with cancer (β = 2.75; 95% CI, 0.52-4.98; p = 0.016), which means that, on average and after adjustment for the other variables in the model, ILD was associated with an ICU stay 2.75 days longer than that of cancer patients. In the logistic regression model, the authors found that patients with ILD, compared with cancer patients, were less likely to have any documentation of pain assessment in the last 24 h of life (OR = 0.43; 95% CI, 0.19-0.97; p = 0.042), which means that ILD was associated with 57% lower odds of documented pain assessment (1 − 0.43 = 0.57), that is, the odds were reduced by more than half.
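
The arithmetic behind these interpretations can be checked with a short sketch. It uses only the estimates reported above; the reference-group probabilities in the last loop are hypothetical values chosen to show that an odds ratio does not halve a probability in the same way it halves the odds, and probability_after_or is an illustrative helper, not a library function.

```python
import math

# Estimates reported in the example (ILD vs. cancer, adjusted models)
beta_days, beta_ci = 2.75, (0.52, 4.98)  # linear model, units = days
odds_ratio, or_ci = 0.43, (0.19, 0.97)   # logistic model

# Linear model: beta is an adjusted difference in mean outcome, in days.
# A 95% CI that excludes 0 is consistent with p < 0.05.
beta_ci_excludes_0 = not (beta_ci[0] <= 0 <= beta_ci[1])
print(f"ICU stay: {beta_days:+.2f} days; CI excludes 0: {beta_ci_excludes_0}")

# Logistic model: the fitted coefficient is ln(OR), and the OR multiplies the odds.
# A 95% CI that excludes 1 is consistent with p < 0.05.
log_or = math.log(odds_ratio)
pct_change_in_odds = (odds_ratio - 1) * 100
or_ci_excludes_1 = not (or_ci[0] <= 1 <= or_ci[1])
print(f"ln(OR) = {log_or:.3f}; change in odds = {pct_change_in_odds:.0f}%; "
      f"CI excludes 1: {or_ci_excludes_1}")


def probability_after_or(p_reference, odds_ratio):
    """Convert a reference-group probability into the probability implied by an OR."""
    odds = p_reference / (1 - p_reference) * odds_ratio
    return odds / (1 + odds)


# Hypothetical reference-group probabilities (NOT from the study): the same OR
# roughly halves a 20% probability but reduces an 80% probability much less.
for p_ref in (0.20, 0.80):
    print(f"reference {p_ref:.0%} -> ILD {probability_after_or(p_ref, odds_ratio):.1%}")
```

The last loop prints about 9.7% and 63.2%, which is why an OR should be read as a change in odds rather than as a change in probability, especially when the outcome is common.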

KEY POINTS

Linear and logistic regression are important statistical methods for testing relationships between variables and quantifying the direction and strength of those associations.

Linear regression is used with continuous outcomes, and logistic regression is used with categorical outcomes.

These procedures require expertise in regression model building and typically the assistance of a biostatistician.

REFERENCES

1. Brown CE, Engelberg RA, Nielsen EL, Curtis JR. Palliative Care for Patients Dying in the Intensive Care Unit with Chronic Lung Disease Compared with Metastatic Cancer. Ann Am Thorac Soc. 2016;13(5):684-689. https://doi.org/10.1513/AnnalsATS.201510-667OC
2. Bzovsky S, Phillips MR, Guymer RH, Wykoff CC, Thabane L, Bhandari M, et al. The clinician’s guide to interpreting a regression analysis. Eye (Lond). 2022;36(9):1715-1717. https://doi.org/10.1038/s41433-022-01949-z
