The rapidly growing number of people with depression increases the burden of clinical diagnosis. Because depressed patients exhibit abnormal speech characteristics, automatic audio-based depression recognition has the potential to become a complementary diagnostic method. However, recognition performance varies widely across speech acquisition tasks and classifiers, which makes results difficult to compare, and performance must improve further before clinical application. This work extracted high-level statistical acoustic features (prosodic, voice-quality, and spectral features) from 23 depressed patients and 29 healthy subjects under spontaneous pronunciation tasks (interview and picture description) and mechanical pronunciation tasks (story reading and word reading). Principal component analysis (PCA) was then applied to reduce the feature dimensionality, and a multilayer perceptron (MLP) was used to build the classification model, which was compared with traditional classifiers (logistic regression, support vector machine, decision tree, and naive Bayes). The results showed that spontaneous pronunciation induced more significantly discriminative acoustic features and accordingly achieved better recognition performance. PCA retained 90% of the useful information using 50% of the features. Furthermore, the MLP achieved the best performance, with an accuracy of 0.875 and an average F1 score of 0.855 under the picture description task. This study provides support for task design and classifier building in audio-based depression recognition, which could assist in mass screening for depression.
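For readers unfamiliar with the two reported metrics, the following minimal sketch shows how accuracy and an average F1 score can be computed for a two-class problem. It assumes that "average F1" means the unweighted (macro) mean of the per-class F1 scores, which the abstract does not state explicitly. The labels are hypothetical toy values (1 = depressed, 0 = healthy), not data from the study, and the function names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """F1 score when `positive` is treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Convention: return 0.0 when the class never occurs in either list.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed meaning of 'average F1')."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy labels, not taken from the study: 1 = depressed, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

Averaging per-class F1 with equal weights keeps the smaller class from being swamped by the larger one, which matters when the two groups are not the same size, as with 23 patients and 29 healthy subjects.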