This project develops explainable models for recognizing affective states from full-body movement. It combines temporal-spatial feature fusion, multi-scale spatiotemporal encoding, and language-model-based interpretation to improve recognition performance and to make the model's reasoning human-readable.
