Compare revisions

Roberto Viviani · Roberto Viviani · cc76468c · cc76468c · cc76468c
--- a/Ai_personality_index_FWF/Application_FWF_Feb2024.Rmd
+++ b/Ai_personality_index_FWF/Application_FWF_Feb2024.Rmd
@@ -29,7 +29,7 @@ getDataPartition <- function(p) {
 }

 #create datasets and normalize outcome for interpretability
-set.seed(61543)
+set.seed(61543)  # TO DO: seed keras
 inTrain <- getDataPartition(0.75)

 scales_train <- scales[inTrain,]
@@ -240,3 +240,95 @@ The first box-plot compares the coefficient obtained from the regression of the
 Note, however, that part of this variation may derive from the resampling scheme itself. Because we divided the data into a training and test set, the synthetic regression in the test set gives coefficients whose variability is increased by the small size of the test sample, while the regression on the real coefficients is conducted on the much larger whole sample. In the second boxplot, regression coefficients on the real scores were computed on the bootstrap resampled data. One can see that also the coefficients of regression on real scores show considerable variation when assessed in the smaller test sets. In the final box-plot, we display the variation of the synthetic and real regressions side-by-side. One can see that the variation of the coefficients follow a similar pattern in the synthetic and real regressions. The large variation in the occupation coefficients can be attributed to the small size of the association, and is present in both the synthetic and in the observed models. 

 The table shows the summary data of the absolute error between the synthetic coefficients and those from the regression on the observed personality scores on the whole dataset (corresponding to the data of the first box-plot). The prediction of personality functioning in association with traumas, which was our main clinical outcome, was correct up to about 8%.
+
+
+## Synthetic personality functioning from the NEO subscales
+
+It has been recently noted that criterion A of the AMPD is associated with neuroticism (McCabe, Oltmanns, & Widiger 2021, Kerber et al. 2024). Here, we are predicting personality functioning from the ICD-11. However, these two indices of personality functioning are closely related. We now compose a synthetic personality functioning score from the five subscales of the NEO-FFI using a linear model and the same setup as in the previous section.
+
+```{r, echo = FALSE}
+set.seed(61543)  #TO DO: figure out how to seed keras
+num_epochs <- 90
+
+# A function to partition the data and fit the model.
+getPDSICDfrom5 <- function() {
+  inTrain <- getDataPartition(0.75)
+  all_data <- as.matrix(select(scales, NEOFFI01:NEOFFI60))
+  train_data <- all_data[inTrain,]
+  
+  train_target <- scale(scales$PDSICD11[inTrain])
+  test_data <- all_data[-inTrain,]
+  
+  result <- lm(PDSICD11.z ~ NEOneuro + NEOextra + NEOvertr + NEOgew + NEOoff,
+                data = scales[inTrain,])
+  PDSICD11synth <- predict(result, scales[-inTrain,])
+  list(PDSICD11synth, inTrain)
+}
+
+idx.adj.r.squared <- c()
+mdcoefs <- c()
+coefs <- c()
+aes <- c()
+for (i in 1 : 60) {
+  #compute synthetic PDS score
+  c(PDSICD11synth, inTrain) %<-% getPDSICDfrom5()
+  PDSICD11.zz <- scale(scales[-inTrain,"PDSICD11"])
+  fit <- lm(scale(PDSICD11synth) ~ PDSICD11.zz)
+  idx.adj.r.squared <- c(idx.adj.r.squared, summary(fit)$adj.r.squared)
+  
+  #use synthetic score in fit
+  fit <- lm(scale(PDSICD11synth) ~ log(ACEall+1) + Age.z + Sex + Occupation, data = scales[-inTrain,])
+  mdcoefs <- rbind(mdcoefs, coef(fit))
+
+  #fit with real PDS score in same test subset of data
+  fit <- lm(PDSICD11.zz ~ log(ACEall+1) + Age.z + Sex + Occupation, data = scales[-inTrain,])
+  coefs <- rbind(coefs, coef(fit))
+
+  #compute abs error between synth and real coefs
+  aes <- rbind(aes, abs(mdcoefs - coefs))
+  colnames(aes) <- colnames(coefs)
+  cat(" ", i, " ")
+}
+
+```
+```{r}
+mddata <- mdcoefs |> as.data.frame() |> select(-`(Intercept)`) |> 
+  pivot_longer(cols = everything(), names_to = "predictor", values_to = "coefficient")
+mddata$type <- "asynthetic"
+tdata <- coefs |> as.data.frame() |> select(-`(Intercept)`) |> 
+  pivot_longer(cols = everything(), names_to = "predictor", values_to = "coefficient")
+tdata$type = "resampled"
+
+fit <- lm(PDSICD11.z ~ log(ACEall+1) + Age.z + Sex + Occupation, data = scales)
+trcoef <- coef(fit)
+trdata <- trcoef |> t() |> as.data.frame() |> select(-`(Intercept)`) |>
+    pivot_longer(cols = everything(), names_to = "predictor", values_to = "coefficient")
+
+#synthetic data plot
+ggplot(mddata, aes(x = predictor, y = coefficient)) + 
+  geom_boxplot(aes(group = predictor)) +
+  geom_jitter(width = 0.05, alpha = 0.3) + 
+  geom_point(data = trdata, color = "red", size = 3) + 
+  theme_minimal()
+#real data plot (resampled)
+ggplot(tdata, aes(x = predictor, y = coefficient)) + 
+  geom_boxplot(aes(group = predictor), color = "darkred", alpha = 0.5) + 
+  geom_jitter(width = 0.05, color = "darkred", alpha = 0.3) + 
+  geom_point(data = trdata, color = "red", size = 3) + theme_minimal()
+
+#aes's
+aedata <- aes |> as.data.frame() |> select(-`(Intercept)`) |> 
+  pivot_longer(cols = everything(), names_to = "predictor", values_to = "coefficient")
+ggplot(aedata, aes(x = predictor, y = coefficient)) + 
+  geom_boxplot(aes(group = predictor), color = "darkblue", alpha = 0.5) + 
+  geom_point(color = "darkblue", alpha = 0.3) + 
+  theme_minimal()
+print(aedata |> group_by(predictor) |> summarize(avg = mean(coefficient), se = sd(coefficient) / sqrt(nrow(aedata))))
+
+#comparison plot
+ggplot(rbind(mddata, tdata), aes(x = predictor, y = coefficient, color = type)) + geom_boxplot(aes(group = paste(predictor, type), color = type)) + 
+  theme_minimal() + scale_color_viridis_d(end = 0.5)
+
+```
+
+As one can see, the accuracy of the coefficients obtained from this synthetic index was always inferior to that of the feedforward neural network. Relative to the linear model, the artificial network improved accuracy of coefficients from over 15% (traumas) to 60% (sex). Age gave accuracy values in-between. The accuracy of occupation varied, but it was lower than the other predictors to start with.
\ No newline at end of file
--- a/Ai_personality_index_FWF/Database.txt
+++ b/Ai_personality_index_FWF/Database.txt
--- a/Ai_personality_index_FWF/SocioDemographics.txt
+++ b/Ai_personality_index_FWF/SocioDemographics.txt
No results found