Skip to content
Snippets Groups Projects

Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source

Select target project
No results found

Target

Select target project
  • c7201025/ai_personality_index
1 result
Show changes
Commits on Source (2)
......@@ -29,7 +29,7 @@ getDataPartition <- function(p) {
}
#create datasets and normalize outcome for interpretability
set.seed(61543)
set.seed(61543) # TO DO: seed keras
inTrain <- getDataPartition(0.75)
scales_train <- scales[inTrain,]
......@@ -240,3 +240,95 @@ The first box-plot compares the coefficient obtained from the regression of the
Note, however, that part of this variation may derive from the resampling scheme itself. Because we divided the data into a training and test set, the synthetic regression in the test set gives coefficients whose variability is increased by the small size of the test sample, while the regression on the real coefficients is conducted on the much larger whole sample. In the second boxplot, regression coefficients on the real scores were computed on the bootstrap resampled data. One can see that also the coefficients of regression on real scores show considerable variation when assessed in the smaller test sets. In the final box-plot, we display the variation of the synthetic and real regressions side-by-side. One can see that the variation of the coefficients follow a similar pattern in the synthetic and real regressions. The large variation in the occupation coefficients can be attributed to the small size of the association, and is present in both the synthetic and in the observed models.
The table shows the summary data of the absolute error between the synthetic coefficients and those from the regression on the observed personality scores on the whole dataset (corresponding to the data of the first box-plot). The prediction of personality functioning in association with traumas, which was our main clinical outcome, was correct up to about 8%.
## Synthetic personality functioning from the NEO subscales
It has been recently noted that criterion A of the AMPD is associated with neuroticism (McCabe, Oltmanns, & Widiger 2021, Kerber et al. 2024). Here, we are predicting personality functioning from the ICD-11. However, these two indices of personality functioning are closely related. We now compose a synthetic personality functioning score from the five subscales of the NEO-FFI using a linear model and the same setup as in the previous section.
```{r, echo = FALSE}
set.seed(61543) #TO DO: figure out how to seed keras
num_epochs <- 90
# A function to partition the data and fit the model.
getPDSICDfrom5 <- function() {
inTrain <- getDataPartition(0.75)
all_data <- as.matrix(select(scales, NEOFFI01:NEOFFI60))
train_data <- all_data[inTrain,]
train_target <- scale(scales$PDSICD11[inTrain])
test_data <- all_data[-inTrain,]
result <- lm(PDSICD11.z ~ NEOneuro + NEOextra + NEOvertr + NEOgew + NEOoff,
data = scales[inTrain,])
PDSICD11synth <- predict(result, scales[-inTrain,])
list(PDSICD11synth, inTrain)
}
idx.adj.r.squared <- c()
mdcoefs <- c()
coefs <- c()
aes <- c()
for (i in 1 : 60) {
#compute synthetic PDS score
c(PDSICD11synth, inTrain) %<-% getPDSICDfrom5()
PDSICD11.zz <- scale(scales[-inTrain,"PDSICD11"])
fit <- lm(scale(PDSICD11synth) ~ PDSICD11.zz)
idx.adj.r.squared <- c(idx.adj.r.squared, summary(fit)$adj.r.squared)
#use synthetic score in fit
fit <- lm(scale(PDSICD11synth) ~ log(ACEall+1) + Age.z + Sex + Occupation, data = scales[-inTrain,])
mdcoefs <- rbind(mdcoefs, coef(fit))
#fit with real PDS score in same test subset of data
fit <- lm(PDSICD11.zz ~ log(ACEall+1) + Age.z + Sex + Occupation, data = scales[-inTrain,])
coefs <- rbind(coefs, coef(fit))
#compute abs error between synth and real coefs
aes <- rbind(aes, abs(mdcoefs - coefs))
colnames(aes) <- colnames(coefs)
cat(" ", i, " ")
}
```
```{r}
mddata <- mdcoefs |> as.data.frame() |> select(-`(Intercept)`) |>
pivot_longer(cols = everything(), names_to = "predictor", values_to = "coefficient")
mddata$type <- "asynthetic"
tdata <- coefs |> as.data.frame() |> select(-`(Intercept)`) |>
pivot_longer(cols = everything(), names_to = "predictor", values_to = "coefficient")
tdata$type = "resampled"
fit <- lm(PDSICD11.z ~ log(ACEall+1) + Age.z + Sex + Occupation, data = scales)
trcoef <- coef(fit)
trdata <- trcoef |> t() |> as.data.frame() |> select(-`(Intercept)`) |>
pivot_longer(cols = everything(), names_to = "predictor", values_to = "coefficient")
#synthetic data plot
ggplot(mddata, aes(x = predictor, y = coefficient)) +
geom_boxplot(aes(group = predictor)) +
geom_jitter(width = 0.05, alpha = 0.3) +
geom_point(data = trdata, color = "red", size = 3) +
theme_minimal()
#real data plot (resampled)
ggplot(tdata, aes(x = predictor, y = coefficient)) +
geom_boxplot(aes(group = predictor), color = "darkred", alpha = 0.5) +
geom_jitter(width = 0.05, color = "darkred", alpha = 0.3) +
geom_point(data = trdata, color = "red", size = 3) + theme_minimal()
#aes's
aedata <- aes |> as.data.frame() |> select(-`(Intercept)`) |>
pivot_longer(cols = everything(), names_to = "predictor", values_to = "coefficient")
ggplot(aedata, aes(x = predictor, y = coefficient)) +
geom_boxplot(aes(group = predictor), color = "darkblue", alpha = 0.5) +
geom_point(color = "darkblue", alpha = 0.3) +
theme_minimal()
print(aedata |> group_by(predictor) |> summarize(avg = mean(coefficient), se = sd(coefficient) / sqrt(nrow(aedata))))
#comparison plot
ggplot(rbind(mddata, tdata), aes(x = predictor, y = coefficient, color = type)) + geom_boxplot(aes(group = paste(predictor, type), color = type)) +
theme_minimal() + scale_color_viridis_d(end = 0.5)
```
As one can see, the accuracy of the coefficients obtained from this synthetic index was always inferior to that of the feedforward neural network. Relative to the linear model, the artificial network improved accuracy of coefficients from over 15% (traumas) to 60% (sex). Age gave accuracy values in-between. The accuracy of occupation varied, but it was lower than the other predictors to start with.
\ No newline at end of file
This diff is collapsed.
This diff is collapsed.