Local dependence LCA in R

Author

DL Oberski

Local dependence in LCA

The standard latent class “cluster” model specifies that the indicators \(Y_1, Y_2, \ldots, Y_p\) should be conditionally independent, given the latent variable \(X\), say, \[ P(Y_1, Y_2, \ldots, Y_p | X) = \prod_{j = 1}^p P(Y_j | X). \] The product on the right-hand side indicates this conditional independence.

For example, suppose four diagnostic tests \(Y_j\), for \(j = 1, \ldots, p = 4\), have been used to test for a disease \(X \in \{0, 1\}\). Then for all people within the “healthy” class (\(X=0\), say), the four diagnostic tests should be like four independent biased coin flips, and the same should apply in the “diseased” class. This would be violated, for instance, when on diagnostic test depends on the results of another (e.g. the results of one are used to determine another), or when two tests use the same biological technique, thus giving similar errors. The same concern can be found in the SEM literature under the term “error correlation”.

Local dependence is a violation of the assumptions. Instead of the model given above, if, say, indicators \(Y_1\) and \(Y_2\) are locally dependent, we cannot use the simple form given on the right-hand side in that equation, and we are forced to write \[ P(Y_1, Y_2, \ldots, Y_p | X) = P(Y_1, Y_2 | X) \prod_{j = 3}^p P(Y_j | X), \] so we have to have a separate model for the conditional “cross-table” \(P(Y_1, Y_2 | X)\). Ignoring this can have consequences for the solution and your subsequent conclusions. In the first, locally independent, equation above, the sensitivity (\(P(Y_j = 1 | X = 1)\)) and specificity (\(P(Y_j = 0 | X = 0)\)) of two indicators will look high to the model if the indicators are strongly dependent. But when this dependence was due to an error correlation, the sensitivity and specificity will be overestimated. In the extreme case, imagine including the same completely random coin flip under two different names, with two excellent indicators of the latent variable. The latent class model will then output the exact opposite of the truth: the coin flips will look reliable, while the excellent indicators will look worthless, as you can verify for yourself below.

options(warn = 1)
library(poLCA)
set.seed(202302)
n <- 2*50L # Ensure sample size is even
sensitivity <- 0.8
specificity <- 0.9

true_x <- rep(1:2, each = n/2) # Create true classes
probs_indicators <- c(1 - specificity, sensitivity)[true_x]

# The following is a single completely useless noise variable
# (adding 1 is necessary because poLCA expects Y ∈ {1,2})
garbage_coinflip <- rbinom(n, 1, prob = 0.5) + 1

# Here are two excellent indicators of true_x, both with Se=0.8, Sp=0.9
great_indicator1 <- rbinom(n, 1, prob = probs_indicators) + 1
great_indicator2 <- rbinom(n, 1, prob = probs_indicators) + 1

# The data contain two completely worthless indicators, which 
#   are completely dependent (in fact they are identical)
# and two excellent indicators, which do have some small amount of error
made_up_data <- data.frame(noise1 = garbage_coinflip, 
                noise2 = garbage_coinflip, 
                good1 = great_indicator1, 
                good2 = great_indicator2)

# Run the latent class model using poLCA
fit_polca <- poLCA(cbind(noise1, noise2, good1, good2) ~ 1, nclass = 2, data = made_up_data)
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$noise1
          Pr(1) Pr(2)
class 1:      1     0
class 2:      0     1

$noise2
          Pr(1) Pr(2)
class 1:      1     0
class 2:      0     1

$good1
           Pr(1)  Pr(2)
class 1:  0.5000 0.5000
class 2:  0.6042 0.3958

$good2
           Pr(1)  Pr(2)
class 1:  0.5192 0.4808
class 2:  0.5417 0.4583

Estimated class population shares 
 0.52 0.48 
 
Predicted class memberships (by modal posterior prob.) 
 0.52 0.48 
 
========================================================= 
Fit for 2 latent classes: 
========================================================= 
number of observations: 100 
number of estimated parameters: 9 
residual degrees of freedom: 6 
maximum log-likelihood: -206.6095 
 
AIC(2): 431.2189
BIC(2): 454.6655
G^2(2): 44.2938 (Likelihood ratio/deviance statistic) 
X^2(2): 40.92018 (Chi-square goodness of fit) 
 
library(flexmix)
Loading required package: lattice
fit_fm_ld_direct <- 
  flexmix(~1, data = made_up_data-1, 
          k = 2, 
          model = list(
            FLXMCmvbinary(noise1 ~ 1),
            FLXMRglmfix(formula = cbind(noise2, 1-noise2) ~ 1, 
                        nested = list(k = 2, formula = ~ noise1),
                        family = "binomial"), 
            FLXMCmvbinary(good1 ~ 1),
            FLXMCmvbinary(good2 ~ 1)))

summary(fit_fm_ld_direct)

Call:
flexmix(formula = ~1, data = made_up_data - 1, k = 2, model = list(FLXMCmvbinary(noise1 ~ 
    1), FLXMRglmfix(formula = cbind(noise2, 1 - noise2) ~ 1, 
    nested = list(k = 2, formula = ~noise1), family = "binomial"), 
    FLXMCmvbinary(good1 ~ 1), FLXMCmvbinary(good2 ~ 1)))

       prior size post>0 ratio
Comp.1  0.55   55    100  0.55
Comp.2  0.45   45    100  0.45

'log Lik.' -184.6463 (df=10)
AIC: 389.2926   BIC: 415.3443 
parameters(fit_fm_ld_direct)
[[1]]
Comp.1.center Comp.2.center 
    0.5273048     0.4222115 

[[2]]
                    Comp.1    Comp.2
coef.noise1       53.13213  53.13213
coef.(Intercept) -26.56607 -26.56607

[[3]]
Comp.1.center Comp.2.center 
  0.003545498   0.995397964 

[[4]]
Comp.1.center Comp.2.center 
    0.1794210     0.8249772 

As you can see, according to LCA, the locally dependent random noise items are perfect, while the very good indicators are almost worthless – exactly opposite to the truth. If you were so inclined, you could play around with the following elements to see how results might change:

  • sensitivity and specificity values (currently \(\text{Se}=0.8\), \(\text{Sp}=0.9\));
  • number of “good” indicators (currently two);
  • number of dependent noise items (currently two);
  • the amount of local dependence (currently perfect dependence).

Including an additional class can model this dependence. You can verify this by changing nclass = 2 to nclass = 3 above to see how results change. This may be a satisfactory solution there is no strong substantive reason to prefer a specific number of classes. For example, when mixture modeling is used as a density estimation technique, or when the analysis is exploratory. But you may not want to increase the number of classes when the latent classes are intended to have some predefined meaning, as is the case for our diagnostic testing example. In that case, we would like the two classes to signify “no disease” and “disease”, while possibly accounting for any local dependence.

Example data

Data from Uebersax (2009), https://www.john-uebersax.com/stat/condep.htm. These are data on four diagnostic tests for human HIV virus (Table 1) reported by Alvord et al. (1988).

library(tidyverse)

# https://www.john-uebersax.com/stat/condep.htm
uebersax <- read_table("../Examples/uebersax.tab")

uebersax_01 <- uebersax %>% 
  mutate(across(A:D, `-`, 1)) %>% 
  mutate(Freq = as.integer(Freq))

# Convenience function to convert frequency data to full data 
#.  by replicating rows. This is necessary because poLCA does
#.  not handle frequency data directly.
frequencies_to_fulldata <- function(df_freq) {
  vnames <- names(df_freq)[-which(names(df_freq) == "Freq")]
  
  fulldata <- apply(df_freq, 1, function(row) {
    if(row['Freq'] != 0)
      t(replicate(row['Freq'], row[vnames], simplify = TRUE))
    }) %>% Reduce(rbind, .) %>% as.data.frame
  
  names(fulldata) <- vnames

  fulldata
}

# A dataset in which each row is replicated Freq number of 
#   times. Doing as.data.frame(table(uebersax_fulldata)) should
#.  give back the orginal, uebersax again
uebersax_fulldata <- frequencies_to_fulldata(uebersax)
Test Description
A Radioimmunoassay of antigen ag121
B Radioimmunoassay of HIV p24
C Radioimmunoassay of HIV gp120
D Enzyme-linked immunosorbent assay
knitr::kable(uebersax_01)
A B C D Freq
0 0 0 0 170
0 0 0 1 15
0 0 1 0 0
0 0 1 1 0
0 1 0 0 6
0 1 0 1 0
0 1 1 0 0
0 1 1 1 0
1 0 0 0 4
1 0 0 1 17
1 0 1 0 0
1 0 1 1 83
1 1 0 0 1
1 1 0 1 4
1 1 1 0 0
1 1 1 1 128

Fit the model using poLCA.

flowchart TD
  X((Disease)) --> A
  X --> B
  X --> C
  X --> D
f_ueber <- cbind(A, B, C, D) ~ 1
fit_ueber_polca <- poLCA(f_ueber, data = uebersax_fulldata)
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$A
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.9703 0.0297

$B
           Pr(1)  Pr(2)
class 1:  0.4290 0.5710
class 2:  0.9644 0.0356

$C
           Pr(1)  Pr(2)
class 1:  0.0871 0.9129
class 2:  1.0000 0.0000

$D
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.9195 0.0805

Estimated class population shares 
 0.5401 0.4599 
 
Predicted class memberships (by modal posterior prob.) 
 0.5421 0.4579 
 
========================================================= 
Fit for 2 latent classes: 
========================================================= 
number of observations: 428 
number of estimated parameters: 9 
residual degrees of freedom: 6 
maximum log-likelihood: -629.8827 
 
AIC(2): 1277.765
BIC(2): 1314.297
G^2(2): 16.22724 (Likelihood ratio/deviance statistic) 
X^2(2): 17.1146 (Chi-square goodness of fit) 
 

Detecting local dependence

Bivariate residuals

We first load a few convenience functions that work with poLCA objects from the poLCA.extras package. You may need to install this using remotes::install_github("daob/poLCA.extras").

# remotes::install_github("daob/poLCA.extras")

library(poLCA.extras)
bvr_ueber <- bvr(fit_ueber_polca) 
bvr_ueber |> round(4) 
       A      B      C
B 0.0243              
C 0.0026 4.0734       
D 0.0308 0.0528 0.0094

Indicators B and C appear to exhibit local dependence. That is the same conclusion reached by Uebersax. However, the BVRs are only approximately chi-square distributed, so they cannot be directly referred to a chi-square distribution. Instead we use a parametric bootstrap. The resulting bootstrapped \(p\)-values are:

pvals_boot <- bootstrap_bvr_pvals(f_ueber, 
                                  data = uebersax_fulldata,
                                  fit_polca = fit_ueber_polca,
                                  nclass = 2, nrep = 3)

pvals_boot
      A     B     C
B 0.610            
C 0.520 0.000      
D 0.265 0.560 0.240

So, the parametric bootstrap of our BVR values confirms there appears to be a local dependence between indicators B and C.

Now let’s look at the bivariate residuals for our extreme earlier example, with perfect error dependence of two noise variables.

bvr(fit_polca)
         noise1   noise2    good1
noise2  0.00200                  
good1   0.00000  0.00000         
good2   0.00000  0.00000 40.44858
# The bootstrap is not really necessary, and gives the 
#   expected result. If you wished to confirm this, you could 
#   uncomment the code below:
#bootstrap_bvr_pvals(cbind(noise1, noise2, good1, good2) ~ 1, 
#                                  data = made_up_data,
#                                  fit_polca = fit_polca,
#                                  nclass = 2, nrep = 5)

Note that the BVR detects local dependence, but the wrong pair of variables is singled out! This is because the latent class variable in the above, extreme, solution has taken over the role of the error dependence, while the “error dependence” is actually the substantively interesting disease status. While this is probably an extreme situation that is not particularly plausible in many applications. Still, it serves as an important warning that error dependencies found might not correspond to the “true” dependencies.

Modeling local dependence

Now that we know:

  • There is strong local dependence, and
  • The local dependence can make a large difference to the results;

what can we do about it? As mentioned above, the simplest solution is always to increase the number of classes. But when this is not desired there are some ways of keeping the number of classes the same, but allowing for dependence within those classes.

library(flexmix)

Joint item method

flowchart TD
  X((Disease)) --> A
  X --> BC
  X --> D

First we reproduce the independence model, this time using flexmix.

fit_fm <- flexmix(cbind(A, B, C, D) ~ 1, k = 2, 
                  weights = ~Freq,
                  data = uebersax_01, 
                  model = FLXMCmvbinary())

round(parameters(fit_fm), 4)
         Comp.1 Comp.2
center.A 1.0000 0.0297
center.B 0.5710 0.0356
center.C 0.9128 0.0000
center.D 1.0000 0.0805
BIC(fit_fm)
[1] 1314.298

Next, we model the indicators B and C jointly, using a multinomial model. The same could be achieved by combining the two items into a single indicator, but here we have done that using the interaction() function within the model formula.

fit_fm_ld <- flexmix(cbind(A, B, C, D) ~ 1, k = 2, 
                  weights = ~Freq,
                  data = uebersax_01, 
                  model = list(
                    FLXMCmvbinary(A ~ 1),
                    FLXMRmultinom(I(interaction(B, C)) ~ .),
                    FLXMCmvbinary(D ~ 1))
)

parameters(fit_fm_ld)
[[1]]
Comp.1.center Comp.2.center 
   0.02762408    1.00000000 

[[2]]
          Comp.1    Comp.2
coef1  -3.295787 -1.426241
coef2 -13.344558  1.610088
coef3 -13.344558  2.043280

[[3]]
Comp.1.center Comp.2.center 
   0.07853393    0.99999989 
BIC(fit_fm_ld)
[1] 1313.246

Item conditional probabilities can still be calculated by summing over the joint crosstable.

est <- fitted(fit_fm_ld)

indices <- with(uebersax_01, 
     str_split(levels(interaction(B, C)), "\\.", 
               simplify = TRUE) |>
       apply(2, as.numeric))

cbind(
  B1 = tapply(est$Comp.1[1, 2:5], indices[, 1], sum),
  B2 = tapply(est$Comp.2[1, 2:5], indices[, 1], sum),
  C1 = tapply(est$Comp.1[1, 2:5], indices[, 2], sum),
  C2 = tapply(est$Comp.2[1, 2:5], indices[, 2], sum)
) |> round(4)
      B1     B2 C1     C2
0 0.9643 0.4301  1 0.0888
1 0.0357 0.5699  0 0.9112

Direct effect method

With local dependence as a direct effect (Hagenaars), equal across classes

flowchart TD
  X((Disease)) --> A
  X --> B
  X --> C
  X --> D
  C --> B
fit_fm_ld_direct <- 
  flexmix(~1, data = uebersax_01, k = 2, 
          weights = ~Freq,
          model = list(
            FLXMCmvbinary(A ~ 1),
            FLXMRglmfix(formula = cbind(B, 1-B) ~ 1, 
                        nested = list(k = 2, formula = ~ C),
                        family = "binomial"), 
            FLXMCmvbinary(C ~ 1),
            FLXMCmvbinary(D ~ 1)))

summary(fit_fm_ld_direct)

Call:
flexmix(formula = ~1, data = uebersax_01, k = 2, model = list(FLXMCmvbinary(A ~ 
    1), FLXMRglmfix(formula = cbind(B, 1 - B) ~ 1, nested = list(k = 2, 
    formula = ~C), family = "binomial"), FLXMCmvbinary(C ~ 1), 
    FLXMCmvbinary(D ~ 1)), weights = ~Freq)

       prior size post>0 ratio
Comp.1 0.459  196    217 0.903
Comp.2 0.541  232    232 1.000

'log Lik.' -623.2972 (df=10)
AIC: 1266.594   BIC: 1307.186 
parameters(fit_fm_ld_direct) |> lapply(round, 4)
[[1]]
Comp.1.center Comp.2.center 
       0.0276        1.0000 

[[2]]
                  Comp.1  Comp.2
coef.C            1.8594  1.8594
coef.(Intercept) -3.2958 -1.4263

[[3]]
Comp.1.center Comp.2.center 
       0.0000        0.9112 

[[4]]
Comp.1.center Comp.2.center 
       0.0785        1.0000 

We can again calculate a conditional probability in each class for B, this time by summing over values of C. Because the model is conditional on C, this time we weight by the marginal of C, i.e. \(P(B | X) = \sum_C P(B | X, C) P(C)\).

P_B_given_XC <- predict(fit_fm_ld_direct, 
                        newdata = data.frame(C=0:1))

P_C <- xtabs(Freq~C, data = uebersax_01) |> prop.table()

c(Comp.1.avg=sum(P_B_given_XC[[1]][, 2] * P_C), 
  Comp.2.avg=sum(P_B_given_XC[[2]][, 2] * P_C))
Comp.1.avg Comp.2.avg 
 0.1128125  0.3972642 

Loglinear formulation

We can use the excellent cvam package to create a loglinear latent class model. This allows for full flexibility, including the addition of local dependence parameters that do not require an arbitrary choice of “dependent” observed variable.

flowchart TD
  X((Disease)) --> A
  X --> B
  X --> C
  X --> D
  C <--> B

This is probably the best solution, if you can make it work. It only tends to work for smaller examples since cvam has trouble with larger complete data cross-tables.

It works great for the small uebersax example.

library(cvam)

options(contrasts = c("contr.sum", "contr.linear"))

df_freq <- uebersax %>% mutate_at(1:4, as.factor)
df_freq$X <- latentFactor(NROW(df_freq), 2)

formula <- 
  paste(names(df_freq)[!names(df_freq) %in% c("Freq", "X")],
        collapse = " + ") %>% 
  sprintf("~ X * (%s)", .) %>%
  as.formula()

system.time(
  fit_cvam <- cvam(formula, data = df_freq,
                   freq = Freq,
                   control = list(startValJitter = 0.1)
))
Note: Estimate at or near boundary
Estimated variances may be unreliable
   user  system elapsed 
  0.029   0.001   0.031 
summary(fit_cvam)
~ X * (A + B + C + D)

Prior:
      Flattening frequency = 0
Total nuggets + flattening = 0
              Ridge factor = 0
          Intensity factor = 1

Sample size:
              total N in supplied data = 428
N from supplied data used in model fit = 428
           prior effective sample size =   0

Degrees of freedom:
    patterns of coarsened data = 16
  cells in complete-data table = 32
cells without latent variables = 16
         structural zero cells =  0
   parameters in Poisson model = 10
                            df =  6

Starting values:
default, center
jitter SD = 0.100000

EM algorithm:
Converged at iteration 17
Gradient length = 0.000002

  Final logP = 1535.422
Final loglik = 1535.422

Estimates from EM, with Hessian-based SEs
                coef       SE zstat   pval
(Intercept) -12.8057 604.0550 -0.02 0.9831
X1           -3.3879 604.0550 -0.01 0.9955
A1           -4.0673 326.4914 -0.01 0.9901
B1            0.7533   0.1018  7.40 0.0000
C1            4.3829 370.1195  0.01 0.9906
D1           -4.2866 348.2777 -0.01 0.9902
X1:A1        -5.8097 326.4914 -0.02 0.9858
X1:B1        -0.8964   0.1019 -8.80 0.0000
X1:C1        -5.5574 370.1195 -0.02 0.9880
X1:D1        -5.5040 348.2777 -0.02 0.9874
what <- paste0("~", LETTERS[1:4], "|X") |> sapply(as.formula)
est_indep <- cvamEstimate(c(~X, what), fit_cvam)
est_indep
Estimates and SE's from EM, linearized
~ X
  X   prob     SE prob.lower prob.upper
1 1 0.5401 0.0242     0.4924     0.5870
2 2 0.4599 0.0242     0.4130     0.5076
~ A | X
  X A   prob     SE prob.lower prob.upper
1 1 1 0.0000 0.0000     0.0000     1.0000
2 1 2 1.0000 0.0000     0.0000     1.0000
3 2 1 0.9703 0.0131     0.9304     0.9876
4 2 2 0.0297 0.0131     0.0124     0.0696
~ B | X
  X B   prob     SE prob.lower prob.upper
1 1 1 0.4290 0.0327     0.3665     0.4938
2 1 2 0.5710 0.0327     0.5062     0.6335
3 2 1 0.9644 0.0132     0.9272     0.9829
4 2 2 0.0356 0.0132     0.0171     0.0728
~ C | X
  X C   prob    SE prob.lower prob.upper
1 1 1 0.0871 0.019     0.0564     0.1323
2 1 2 0.9129 0.019     0.8677     0.9436
3 2 1 1.0000 0.000     0.0000     1.0000
4 2 2 0.0000 0.000     0.0000     1.0000
~ D | X
  X D   prob   SE prob.lower prob.upper
1 1 1 0.0000 0.00     0.0000     1.0000
2 1 2 1.0000 0.00     0.0000     1.0000
3 2 1 0.9195 0.02     0.8706     0.9509
4 2 2 0.0805 0.02     0.0491     0.1294

(Note that the Hessian-based se’s cannot be trusted. We can look at the estimated se’s in the marginal probability tables above, or use cvam’s MCMC option to obtain standard errors on the loglinear parmeters.)

Now we update the model with the local dependence \(B \times C\), held equal across classes.

formula_ld <- update(formula, ~ . + B:C)

system.time(
  fit_cvam_ld <- 
    cvam(formula_ld, data = df_freq, freq = Freq,
         control = list(startValJitter = 0.05)
))
Note: Estimate at or near boundary
Estimated variances may be unreliable
   user  system elapsed 
  0.014   0.000   0.016 
summary(fit_cvam_ld)
~ X + A + B + C + D + X:A + X:B + X:C + X:D + B:C

Prior:
      Flattening frequency = 0
Total nuggets + flattening = 0
              Ridge factor = 0
          Intensity factor = 1

Sample size:
              total N in supplied data = 428
N from supplied data used in model fit = 428
           prior effective sample size =   0

Degrees of freedom:
    patterns of coarsened data = 16
  cells in complete-data table = 32
cells without latent variables = 16
         structural zero cells =  0
   parameters in Poisson model = 11
                            df =  5

Starting values:
default, center
jitter SD = 0.050000

EM algorithm:
Converged at iteration 15
Gradient length = 0.000003

  Final logP = 1542.008
Final loglik = 1542.008

Estimates from EM, with Hessian-based SEs
                coef       SE zstat   pval
(Intercept) -12.6997 590.7805 -0.02 0.9828
X1            3.6793 590.7805  0.01 0.9950
A1           -4.0579 339.0420 -0.01 0.9905
B1            0.7157   0.1025  6.98 0.0000
C1            4.1125 345.8069  0.01 0.9905
D1           -4.3085 338.3633 -0.01 0.9898
X1:A1         5.8384 339.0420  0.02 0.9863
X1:B1         0.4674   0.1702  2.75 0.0060
X1:C1         5.3824 345.8069  0.02 0.9876
X1:D1         5.5397 338.3633  0.02 0.9869
B1:C1         0.4649   0.1444  3.22 0.0013
est_locdep <- cvamEstimate(c(~X, what), fit_cvam_ld)
est_locdep
Estimates and SE's from EM, linearized
~ X
  X   prob     SE prob.lower prob.upper
1 1 0.4589 0.0241     0.4121     0.5065
2 2 0.5411 0.0241     0.4935     0.5879
~ A | X
  X A   prob     SE prob.lower prob.upper
1 1 1 0.9724 0.0122     0.9354     0.9885
2 1 2 0.0276 0.0122     0.0115     0.0646
3 2 1 0.0000 0.0000     0.0000     1.0000
4 2 2 1.0000 0.0000     0.0000     1.0000
~ B | X
  X B   prob     SE prob.lower prob.upper
1 1 1 0.9643 0.0133     0.9270     0.9829
2 1 2 0.0357 0.0133     0.0171     0.0730
3 2 1 0.4301 0.0326     0.3677     0.4947
4 2 2 0.5699 0.0326     0.5053     0.6323
~ C | X
  X C   prob     SE prob.lower prob.upper
1 1 1 1.0000 0.0000     0.0000     1.0000
2 1 2 0.0000 0.0000     0.0000     1.0000
3 2 1 0.0888 0.0189     0.0581     0.1335
4 2 2 0.9112 0.0189     0.8665     0.9419
~ D | X
  X D   prob     SE prob.lower prob.upper
1 1 1 0.9215 0.0195     0.8738     0.9521
2 1 2 0.0785 0.0195     0.0479     0.1262
3 2 1 0.0000 0.0000     0.0000     1.0000
4 2 2 1.0000 0.0000     0.0000     1.0000
anova(fit_cvam, fit_cvam_ld)
Model 1: ~ X * (A + B + C + D)
Model 2: ~ X + A + B + C + D + X:A + X:B + X:C + X:D + B:C
  resid.df -2*loglik df change
1        6   -3070.8          
2        5   -3084.0  1 13.171

The local dependence model fits better, but the estimated (conditional) probabilities do not change much.