Replications for introduction to g-methods: marginal structural models (part 2)

2024-05-31T00:00:00-07:00

This tutorial series aims to replicate g-methods explained in this paper by Naimi, A. I., Cole, S. R., and Kennedy, E. H (2017)¹ using R. Originally, the paper used SAS to demonstrate g-methods.

In this tutorial, we will focus on replicating the results using marginal structural models to estimate the average causal effects of always taking treatment and compared to never-taking treatment. From our knowledge of the data-generating process, we know this average causal effect to be \(50\). In our tutorial, we will pay more attention to computation rather than proofs to perform replciations.

Reminder: Our Settings

Replications for introduction to g-methods: g-formula (part 1)

2024-05-22T00:00:00-07:00

This tutorial series aims to replicate g-methods explained in this paper by Naimi, A. I., Cole, S. R., and Kennedy, E. H (2017)¹ using R. Originally, the paper used SAS to demonstrate g-methods.

In this tutorial, we will focus on replicating the results using g-formula to estimate the average causal effects of always taking treatment compared to never taking treatment. From our knowledge of the data-generating process, we know this average causal effect to be \(50\). In our tutorial, we will pay more attention to computation rather than proofs to perform replications.

Reminder: Our Settings

Replications for introduction to g-methods: standard linear regression (part 0)

2024-05-21T00:00:00-07:00

This tutorial series aim to replicate g-methods explained in this paper by Naimi, A. I., Cole, S. R., and Kennedy, E. H (2017)¹ using R. Originally, the paper used SAS to demonstrate g-methods.

In this tutoiral, we will focus on replicating the data generation and estimates from the standard linear regression model to understand the crude differences between treated and control groups.

The empirical setting is to treat HIV with a therapy regimen (\(A\)) in two time periods (\(t = 0, t = 1\)). Additionally, we measure the time-varying confounder, HIV viral load (\(Z\)) at times \(t = 0\) and \(t = 1\). Note that this time-varying confounder is measured prior to the treatment administered at each time period. Also, we assume \(Z\) at time 0 is 1 (high, bad health condition) for all subjects. Our outcome is the CD4 count (cells/mm\(^3\)) observed at \(t = 2\).

Thus, we have:

Under the identifying assumptions described in the paper, we will estimate the average casual effect of always taking treatment (\(a_0 = 1, a_1 = 1\)), compared to never taking treatment (\(a_0 = 0, a_1 = 0\)) in both time periods. For notation, we are using subscripts to indicate time periods.

# Read packages 
library(tidyverse)
library(ascii)

We will begin with the standard linear regression to peak the crude differences between always-treated and never-treated in their outcome.

# Replication

# Data in Table 1 
# A0: Treatment at time 0 (for HIV)
# Z1: HIV viral load measurement prior to time 1 (Higher: Worse health condition)
# A1: Treatment at time 1
# Y: Mean of CD4 counts 
# N: Number of subjects 


a0 <- c(0, 0, 0, 0, 1, 1, 1, 1)
z1 <- c(0, 0, 1, 1, 0, 0, 1, 1)
a1 <- c(0, 1, 0, 1, 0, 1, 0, 1)
y <- c(87.288, 112.107, 119.654, 144.842, 105.282, 130.184, 137.720, 162.832)
n <- c(209271, 93779, 60657, 136293, 134781, 60789, 93903, 210527)

data <- data.frame(a0, z1, a1, y, n)

Replicates for Table 1

a0	z1	a1	y	n
0	0	0	87.288	209271
0	0	1	112.107	93779
0	1	0	119.654	60657
0	1	1	144.842	136293
1	0	0	105.282	134781
1	0	1	130.184	60789
1	1	0	137.720	93903
1	1	1	162.832	210527

Now, we fit the data with standard linear regression models to replicate Table 2.

Standard Linear Regressions

# Section 1
# Standard linear regression models in Table 2 
# Not considering standard error estimation 

# Crude model 
lm_1 <- lm(y ~ a0 + a1 + a0*a1, data = data, weights = n)

# Z-Adjusted model 
lm_2 <- lm(y ~ a0 + a1 + a0*a1 + z1, data = data, weights = n)

# A0 only model 
lm_3 <- lm(y ~ a0, data = data, weights = n)

# Z1-Adjusted A0 Only Model
lm_4 <- lm(y ~ a0 + z1, data = data, weights = n)

# A1 Only Model
lm_5 <- lm(y ~ a1, data = data, weights = n)

# Z1-Adjusted A1 Only Model
lm_6 <- lm(y ~ a1 + z1, data = data, weights = n)

We now check the crude difference in mean CD4 count for the always treated compared to the never treated. In the first model, \(\hat{\beta} = 60.96\) as reported in the paper. In model two, \(\hat{\beta} = 43.01\), which is slightly larger than as it is reported. This estimate is the \(Z_1\)-adjusted difference in mean CD4 count for the same contrast. The results from other four models are reported in Table below.

In the next post, we will move onto fitting g-methods to the data in Table 1, beginning with g-formula.

table2 <- data.frame(model = c("m1", "m2", "m3", "m4", "m5", "m6"), 
                     beta1 = c(lm_1$coefficients["a0"] + lm_1$coefficients["a1"] + lm_1$coefficients["a0:a1"], 
                               lm_2$coefficients["a0"] + lm_2$coefficients["a1"] + lm_2$coefficients["a0:a1"],
                               lm_3$coefficients["a0"],
                               lm_4$coefficients["a0"],
                               lm_5$coefficients["a1"],
                               lm_6$coefficients["a1"])) %>% 
  mutate(beta1 = round(beta1, 2))

Replicates for Table 2

model	beta1
m1	60.96
m2	43.01
m3	27.08
m4	18.03
m5	38.91
m6	25.01