This work proposes a framework for affective body expression recognition that fuses temporal and spatial features of human body movement. The framework combines a body expression energy model, multiscale representation learning based on symmetric positive definite (SPD) matrices, and attention-based temporal-spatial fusion to capture interpretable movement cues for affect recognition. Evaluations on multiple datasets show robust performance, with classification accuracy exceeding 90% on four datasets. The results indicate that combining temporal dynamics with the spatial structure of body expressions can improve both recognition accuracy and interpretability in affective computing.