Depression recognition using a proposed speech chain model fusing speech production and perception features

Abstract

Audio-based depression recognition is a useful auxiliary tool for early screening, but many existing methods focus mainly on speech perception features and overlook vocal-tract changes. This work proposes a machine speech chain model for depression recognition (MSCDR), which captures text-independent depressive speech representations along the chain from speech production to speech perception. Linear predictive coding (LPC) coefficients and Mel-frequency cepstral coefficients (MFCC) are extracted to characterize speech production and perception, respectively, and deep sequential modeling is used to capture intra- and inter-segment depressive features. Experiments on two public datasets yield accuracies of 0.77 and 0.86, indicating that speech production and perception features are complementary for depression analysis.
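To make the production-side feature concrete, the following is a minimal sketch of linear predictive coding on a single frame, using the autocorrelation method with the Levinson-Durbin recursion. It is an illustration only, not the authors' implementation: the function names `autocorrelation` and `lpc`, the model order, and the synthetic test frame are assumptions, and windowing, pre-emphasis, and the MFCC branch are omitted.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err), where a[0] == 1.0 and the linear predictor is
    x[n] ~ -sum(a[k] * x[n - k] for k in 1..order); err is the final
    prediction-error energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            # Silent or perfectly predictable frame: stop early.
            break
        # Reflection coefficient for the current order.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Hypothetical test frame: a decaying exponential, x[n] = 0.9 * x[n - 1].
frame = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc(frame, order=2)
```

For this synthetic frame the first predictor coefficient comes out close to -0.9 and the second close to zero, which matches the one-pole signal that generated it. On real speech, the coefficients of each frame would instead summarize the vocal-tract filter shape for that frame.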

