Political activism in Greece
Political activism in ESS
Data
Read the data from the European Social Survey, round 4 (Greece).
For each of the participation questions below, 1=“Yes” and 2=“No”; gndr and agea are background variables.
- contplt - Contacted politician or government official last 12 months
- wrkprty - Worked in political party or action group last 12 months
- wrkorg - Worked in another organisation or association last 12 months
- badge - Worn or displayed campaign badge/sticker last 12 months
- sgnptit - Signed petition last 12 months
- pbldmn - Taken part in lawful public demonstration last 12 months
- bctprd - Boycotted certain products last 12 months
- gndr - Gender
- agea - Age of respondent, calculated
ess_greece <- read_csv("https://daob.nl/files/lca/ess_greece.csv.gz")
ess_greece |> rmarkdown::paged_table()
Sadly, poLCA has no way of dealing with missing values other than “listwise deletion” (na.omit). For later comparability of models with different sets of variables, we create a single dataset without missings.
ess_greece <- na.omit(ess_greece)
Show the data as pattern frequencies.
table(ess_greece) %>%
  as.data.frame() %>%
  filter(Freq != 0) %>%
  rmarkdown::paged_table()
Try out models with different numbers of classes
Create a convenience function that will fit the K-class model to the political participation data.
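A minimal sketch of such a function, assuming an intercept-only model formula named form_activism (the same name used for the bootstrapped BVRs below) and illustrative values for nrep and maxiter:

library(poLCA)

# Assumed reconstruction: the seven activism indicators, no covariates
form_activism <- cbind(contplt, wrkprty, wrkorg, badge,
                       sgnptit, pbldmn, bctprd) ~ 1

# Fit a latent class model with k classes to the activism items
fitLCA <- function(k) {
  poLCA(form_activism, data = ess_greece, nclass = k,
        nrep = 10, verbose = FALSE, maxiter = 3e3)
}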
Apply the function to successively increasing numbers of classes, K = 1, 2, 3, …, 6. (Note: this can take a while!)
MK <- lapply(1:6, fitLCA)
Compare model fit
We can compare the fitted models on measures such as AIC and BIC.
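A minimal sketch of such a comparison, collecting the log-likelihood, AIC, and BIC stored in each poLCA fit:

# Model comparison table for the 1- to 6-class solutions
data.frame(classes = 1:6,
           log_lik = sapply(MK, "[[", "llik"),
           AIC     = sapply(MK, "[[", "aic"),
           BIC     = sapply(MK, "[[", "bic"))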
Select a model
We select the four-class model.
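A sketch of extracting it, assuming the K-class solution is stored as the K-th element of MK:

# Four-class solution from the list of fitted models
fit <- MK[[4]]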
Here is the default plot given by poLCA.
plot(fit)
In this case the default plot is still somewhat readable, but in practice it is not the best as data visualizations go. A simple line plot does a better job (in my personal & completely subjective opinion!) and allows you to display confidence intervals to boot. We use tidy from the broom package to extract the results and ggplot to plot.
tidy(fit) %>% # from `broom` package
  filter(outcome == 2) %>%
  mutate(class = as.factor(class)) %>%
  ggplot(aes(variable, estimate, group = class, color = class)) +
  geom_point() + geom_line() +
  geom_errorbar(aes(ymin = estimate - 2*std.error,
                    ymax = estimate + 2*std.error), width = 0.2) +
  theme_bw() + scale_color_brewer(palette = "Set2")
Local fit
library(poLCA.extras)

# Bivariate residuals (BVRs) for each pair of indicators
bvr(fit)
          contplt   wrkprty    wrkorg     badge   sgnptit    pbldmn
wrkprty  3.636971
wrkorg   5.427425  5.554144
badge    9.174424 11.568911  7.702777
sgnptit  3.027574  4.637158  4.743767  5.002371
pbldmn   2.748862  5.232943  6.068687  5.147027  3.224378
bctprd   3.042264  4.717105  5.596894  5.953936  2.472958  3.292210
bootstrap_bvr_pvals(form_activism, fit, ess_greece, R = 200)
        contplt wrkprty wrkorg badge sgnptit pbldmn
wrkprty       1
wrkorg        1       1
badge         1       1      1
sgnptit       1       1      1     1
pbldmn        1       1      1     1       1
bctprd        1       1      1     1       1      1
Classification quality
Create a data frame with the posterior class memberships and the predicted class as the assigned classification (predclass is the “modal assignment”). We use the four-class model as the selected model.
posteriors <- data.frame(post = fit$posterior,
                         predclass = fit$predclass)

classification_table <- posteriors %>%
  group_by(predclass) %>%
  summarize(across(starts_with("post."), ~ sum(.x)))

classification_table <- classification_table[, -1] |> as.matrix()

# Adopt the notation X = true latent class, W = assigned class
colnames(classification_table) <- paste0("X=", 1:4)
rownames(classification_table) <- paste0("W=", 1:4)

classification_table %>% round(1)
X=1 X=2 X=3 X=4
W=1 19.8 0.0 1.0 0.2
W=2 0.0 1822.0 34.8 11.1
W=3 1.1 7.5 87.4 3.0
W=4 1.4 4.0 8.6 60.1
With column proportions:
classification_table |>
  prop.table(2) |>
  round(3)
X=1 X=2 X=3 X=4
W=1 0.887 0.000 0.008 0.003
W=2 0.000 0.994 0.264 0.149
W=3 0.051 0.004 0.663 0.040
W=4 0.062 0.002 0.065 0.808
Entropy \(R^2\) and classification errors
Calculate classification errors from classification table:
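A sketch of this calculation: the total posterior mass that falls off the diagonal of the W-by-X table, as a proportion of the sample size.

# Classification error: posterior mass assigned to the "wrong" class
class_error <- 1 - sum(diag(classification_table)) / sum(classification_table)
class_error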
Entropy \(R^2\):
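A sketch of the entropy-based \(R^2\), comparing the total entropy of the posteriors with its maximum value \(N \log K\) (the value under uniform posteriors):

# Entropy R^2 = 1 - EN(posteriors) / (N * log(K))
post <- fit$posterior
EN <- -sum(post * log(post), na.rm = TRUE)  # treat 0 * log(0) as 0
R2_entropy <- 1 - EN / (nrow(post) * log(ncol(post)))
R2_entropy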
Model with covariates
Now we fit the four-class model, but include covariates that predict the class membership. Class membership is predicted by gender and a quadratic age effect.
We also use the results from the model without covariates as starting values for the solution.
This is where the analyzed data would have differed had we not already deleted all cases with at least one missing value using na.omit above: with covariates in the model, poLCA would otherwise listwise-delete on those variables as well. In practice this may lead to trouble, especially when there are many variables.
form_activism <- cbind(contplt, wrkprty, wrkorg,
                       badge, sgnptit, pbldmn, bctprd) ~
  gndr + agea + I(agea^2)

ess_greece_poly <- ess_greece %>%
  mutate(agea = scale(agea))

fit_covariates <-
  poLCA(form_activism,
        data = ess_greece_poly, nclass = 4,
        probs.start = fit$probs,
        verbose = FALSE, nrep = 50, maxiter = 3e3)
Warning in sqrt(diag(VCE.beta)): NaNs produced
Warning in sqrt(diag(VCE.mix)): NaNs produced
The results now include multinomial regression coefficients in a model predicting class membership.
fit_covariates
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$contplt
Pr(1) Pr(2)
class 1: 0.0521 0.9479
class 2: 0.5629 0.4371
class 3: 0.0857 0.9143
class 4: 0.7123 0.2877
$wrkprty
Pr(1) Pr(2)
class 1: 0.0011 0.9989
class 2: 0.5189 0.4811
class 3: 0.0046 0.9954
class 4: 0.5283 0.4717
$wrkorg
Pr(1) Pr(2)
class 1: 0.0041 0.9959
class 2: 0.6616 0.3384
class 3: 0.0537 0.9463
class 4: 0.2031 0.7969
$badge
Pr(1) Pr(2)
class 1: 0.0000 1.0000
class 2: 0.4844 0.5156
class 3: 0.0000 1.0000
class 4: 0.3946 0.6054
$sgnptit
Pr(1) Pr(2)
class 1: 0.0009 0.9991
class 2: 0.8837 0.1163
class 3: 0.1589 0.8411
class 4: 0.0000 1.0000
$pbldmn
Pr(1) Pr(2)
class 1: 0.0000 1.0000
class 2: 0.6053 0.3947
class 3: 0.2215 0.7785
class 4: 0.2538 0.7462
$bctprd
Pr(1) Pr(2)
class 1: 0.0686 0.9314
class 2: 0.7581 0.2419
class 3: 0.4770 0.5230
class 4: 0.2418 0.7582
Estimated class population shares
0.7934 0.0288 0.1345 0.0433
Predicted class memberships (by modal posterior prob.)
0.8589 0.0262 0.0776 0.0373
=========================================================
Fit for 4 latent classes:
=========================================================
2 / 1
Coefficient Std. error t value Pr(>|t|)
(Intercept) -0.66285 0.02401 -27.605 0.000
gndr -0.69438 0.22021 -3.153 0.002
agea -0.04689 0.01964 -2.387 0.019
I(agea^2) 0.00020 0.00029 0.701 0.485
=========================================================
3 / 1
Coefficient Std. error t value Pr(>|t|)
(Intercept) 0.40334 0.02398 16.816 0.000
gndr -0.41884 0.20307 -2.063 0.042
agea -0.04345 0.01840 -2.361 0.020
I(agea^2) 0.00016 0.00025 0.624 0.534
=========================================================
4 / 1
Coefficient Std. error t value Pr(>|t|)
(Intercept) -0.17721 0.04648 -3.813 0.000
gndr -0.84495 0.24730 -3.417 0.001
agea -0.06649 0.01986 -3.349 0.001
I(agea^2) 0.00066 0.00024 2.730 0.008
=========================================================
number of observations: 2062
number of estimated parameters: 40
residual degrees of freedom: 87
maximum log-likelihood: -2784.601
AIC(4): 5649.201
BIC(4): 5874.459
X^2(4): 148.8383 (Chi-square goodness of fit)
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND
ALERT: estimation algorithm automatically restarted with new initial values
The solution may have changed now that covariates are included.
tidy(fit_covariates) %>%
  filter(outcome == 2) %>%
  mutate(class = as.factor(class)) %>%
  ggplot(aes(variable, estimate, group = class, color = class)) +
  geom_point() + geom_line() +
  geom_errorbar(aes(ymin = estimate - 2*std.error,
                    ymax = estimate + 2*std.error), width = 0.2) +
  theme_bw() + scale_color_brewer(palette = "Set2")
Effects plot
We can easily plot the results of the multinomial model using the effects library.
library(effects)

effect_age <- predictorEffects(fit_covariates, ~ agea * gndr)
plot(effect_age, lines = list(multiline = TRUE))
Three-step method using hand-rolled BCH weights
M <- prop.table(classification_table, 2)
Mi <- solve(M)  # Weight matrix
K <- ncol(M)    # Number of classes

# The data used by poLCA (may differ from raw due to missings):
dat_used_by_fit <- ess_greece

# Assigned class membership (by default uses modal assignment):
dat_used_by_fit$W <- fit$predclass
dat_used_by_fit$id_obs <- seq(NROW(dat_used_by_fit))

# Create expanded dataset with K rows for each observation:
ess_greece_expanded <- replicate(K, dat_used_by_fit, simplify = FALSE) |>
  reduce(rbind)

# Each of the replicated rows will correspond to one possible
#   value of the true latent class:
ess_greece_expanded$X <- rep(1:K, each = NROW(dat_used_by_fit))

# Now we assign the BCH weights based on the
#   inverse misclassification matrix Mi:
ess_greece_expanded <- ess_greece_expanded %>%
  mutate(w_bch = Mi[cbind(W, X)])

# Show a few rows of our constructed data:
ess_greece_expanded %>% arrange(id_obs) %>% head()
# A tibble: 6 × 13
contplt wrkprty wrkorg badge sgnptit pbldmn bctprd gndr agea W id_obs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 2 2 2 2 2 2 1 2 37 2 1
2 2 2 2 2 2 2 1 2 37 2 1
3 2 2 2 2 2 2 1 2 37 2 1
4 2 2 2 2 2 2 1 2 37 2 1
5 2 2 2 2 2 2 1 2 26 2 2
6 2 2 2 2 2 2 1 2 26 2 2
# … with 2 more variables: X <int>, w_bch <dbl>
The only library I was able to find that is both reliable and also does multinomial regression allowing negative weights is glmnet.
library(glmnet)

# We use glmnet with our BCH weights, w_bch, pretending X
#   is observed. Set alpha and lambda to 0 so that no penalty
#   is applied.
fit_glmnet <- with(
  ess_greece_expanded %>% mutate(age_sq = agea^2),
  glmnet(cbind(gndr, agea, age_sq), X, weights = w_bch,
         family = "multinomial", alpha = 0, lambda = 0))
# The estimated coefficients
fit_glmnet |>
  coef() |>
  reduce(cbind) |>
  round(4)
4 x 4 sparse Matrix of class "dgCMatrix"
s0 s0 s0 s0
-0.8129 2.2496 -1.1247 -0.3119
gndr 0.0634 0.2132 -0.0632 -0.2134
agea -0.0081 -0.0256 0.0566 -0.0228
age_sq 0.0001 0.0003 -0.0008 0.0004
# The effects library does not work with glmnet so we have to roll
#   our own (again)
ef <- ess_greece %>%
  summarize(agea = sort(unique(agea)),
            age_sq = agea^2,
            gndr = 1.5) %>%
  bind_cols(predict(fit_glmnet, newx = as.matrix(.),
                    type = "response")[,,1])

names(ef)[4:7] <- paste0("X", 1:4)

# Show the effects plot
ef %>%
  pivot_longer(X1:X4, names_to = "Class", values_to = "Prob") %>%
  ggplot(aes(agea, Prob, colour = Class, group = Class)) +
  geom_line() + theme_bw() +
  scale_color_brewer(palette = "Set2")
It gives results similar to those obtained with the one-step method.
SessionInfo
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur ... 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_4.1-4 Matrix_1.5-1 effects_4.2-2
[4] carData_3.0-5 poLCA.extras_0.1.0 poLCA_1.6.0.1
[7] MASS_7.3-58.1 scatterplot3d_0.3-42 haven_2.5.1
[10] broom_1.0.1 forcats_0.5.2 stringr_1.5.0
[13] dplyr_1.0.9 purrr_0.3.4 readr_2.1.2
[16] tidyr_1.2.0 tibble_3.1.8 ggplot2_3.3.6
[19] tidyverse_1.3.2
loaded via a namespace (and not attached):
[1] nlme_3.1-160 fs_1.6.1 lubridate_1.8.0
[4] bit64_4.0.5 insight_0.19.0 RColorBrewer_1.1-3
[7] httr_1.4.5 tools_4.2.2 backports_1.4.1
[10] utf8_1.2.3 R6_2.5.1 DBI_1.1.3
[13] colorspace_2.0-3 nnet_7.3-18 withr_2.5.0
[16] tidyselect_1.1.2 bit_4.0.5 curl_5.0.0
[19] compiler_4.2.2 cli_3.6.0 rvest_1.0.3
[22] xml2_1.3.3 labeling_0.4.2 scales_1.2.1
[25] digest_0.6.31 minqa_1.2.4 rmarkdown_2.16
[28] pkgconfig_2.0.3 htmltools_0.5.4 lme4_1.1-30
[31] dbplyr_2.2.1 fastmap_1.1.0 htmlwidgets_1.6.1
[34] rlang_1.0.6 readxl_1.4.1 rstudioapi_0.14
[37] shape_1.4.6 farver_2.1.1 generics_0.1.3
[40] jsonlite_1.8.4 vroom_1.5.7 googlesheets4_1.0.1
[43] magrittr_2.0.3 Rcpp_1.0.10 munsell_0.5.0
[46] fansi_1.0.4 lifecycle_1.0.3 stringi_1.7.12
[49] yaml_2.3.7 grid_4.2.2 parallel_4.2.2
[52] crayon_1.5.1 lattice_0.20-45 splines_4.2.2
[55] hms_1.1.2 knitr_1.40 pillar_1.8.1
[58] boot_1.3-28 codetools_0.2-18 reprex_2.0.2
[61] glue_1.6.2 evaluate_0.16 mitools_2.4
[64] modelr_0.1.9 foreach_1.5.2 vctrs_0.5.2
[67] nloptr_2.0.3 tzdb_0.3.0 cellranger_1.1.0
[70] gtable_0.3.0 assertthat_0.2.1 xfun_0.32
[73] survey_4.1-1 survival_3.4-0 googledrive_2.0.0
[76] gargle_1.2.0 iterators_1.0.14 ellipsis_0.3.2