PainDiffusion: Learning To Express Pain

Ritsumeikan University

Abstract

Accurate pain expression synthesis is essential for improving clinical training and human-robot interaction. Current Robotic Patient Simulators (RPSs) lack realistic pain facial expressions, limiting their effectiveness in medical training. In this work, we introduce PainDiffusion, a generative model that synthesizes naturalistic facial pain expressions. Unlike traditional heuristic or autoregressive methods, PainDiffusion operates in a continuous latent space, ensuring smoother and more natural facial motion while supporting indefinite-length generation via diffusion forcing. Our approach incorporates intrinsic characteristics such as pain expressiveness and emotion, allowing for personalized and controllable pain expression synthesis. We train and evaluate our model using the BioVid HeatPain Database. Additionally, we integrate PainDiffusion into a robotic system to assess its applicability in real-time rehabilitation exercises. Qualitative studies with clinicians reveal that PainDiffusion produces realistic pain expressions, with a 31.2% ± 4.8% preference rate against ground-truth recordings. Our results suggest that PainDiffusion can serve as a viable alternative to real patients in clinical training and simulation, bridging the gap between synthetic and naturalistic pain expression.

Methodology

Our methodology consists of three key components:

  1. Continuous Latent Space: We utilize a diffusion-based approach that operates in a continuous space, enabling smoother and more natural facial motion synthesis.
  2. Diffusion Forcing: A technique that enables indefinite-length generation while maintaining temporal coherence (a minimal sketch follows this list).
  3. Controllable Generation: Integration of pain expressiveness and emotion parameters for personalized expression synthesis.
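
To make the diffusion-forcing component concrete, the sketch below shows one simple way such a rollout can be organized in PyTorch: a sliding window of latent face frames in which already-generated frames are kept clean while each newly appended frame is denoised from pure noise, conditioned on the pain-stimulus signal. ToyDenoiser, the tensor shapes, and the update rule are illustrative assumptions, not the released PainDiffusion code.

# Minimal sketch of a diffusion-forcing-style rollout (hypothetical shapes
# and denoiser; not the released PainDiffusion implementation).
import torch
import torch.nn as nn

LATENT_DIM = 64   # assumed size of one latent face frame
WINDOW = 16       # frames denoised jointly in the sliding window
STEPS = 8         # denoising steps per new frame (toy value)


class ToyDenoiser(nn.Module):
    """Predicts the clean latent for each frame, given noisy latents,
    per-frame noise levels, and a per-frame stimulus value."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + 2, 256), nn.SiLU(),
            nn.Linear(256, LATENT_DIM),
        )

    def forward(self, x, noise_level, stimulus):
        # x: (B, T, D); noise_level, stimulus: (B, T)
        cond = torch.stack([noise_level, stimulus], dim=-1)
        return self.net(torch.cat([x, cond], dim=-1))


@torch.no_grad()
def rollout(model, stimulus, window=WINDOW, steps=STEPS):
    """Generate one latent frame per stimulus sample, for arbitrarily long signals.

    Each new frame enters the sliding window as pure noise and is denoised
    while conditioning on the already-denoised frames kept in the window.
    """
    device = stimulus.device
    frames = []                                            # finished (clean) frames
    buf = torch.randn(1, 0, LATENT_DIM, device=device)     # sliding window of latents

    for t in range(stimulus.shape[1]):
        # Append a fresh, fully-noised frame for time step t and trim the window.
        buf = torch.cat([buf, torch.randn(1, 1, LATENT_DIM, device=device)], dim=1)
        buf = buf[:, -window:]
        stim_win = stimulus[:, max(0, t + 1 - window): t + 1]

        # Frames already in the window are treated as clean (noise level 0);
        # only the newest frame carries a nonzero, decreasing noise level.
        for k in reversed(range(1, steps + 1)):
            levels = torch.zeros(1, buf.shape[1], device=device)
            levels[:, -1] = k / steps
            x0_hat = model(buf, levels, stim_win)
            # Simple interpolation toward the predicted clean frame
            # (stand-in for a proper DDIM/ancestral update rule).
            buf[:, -1] = buf[:, -1] + (x0_hat[:, -1] - buf[:, -1]) / k

        frames.append(buf[:, -1].clone())

    return torch.stack(frames, dim=1)                      # (1, T, D)


if __name__ == "__main__":
    model = ToyDenoiser()
    stimulus = torch.rand(1, 64)      # 64 steps of a heat-pain signal in [0, 1]
    latents = rollout(model, stimulus)
    print(latents.shape)              # torch.Size([1, 64, 64])

Because the denoiser only ever sees a fixed-size window with per-frame noise levels, the same loop can keep producing frames for as long as the stimulus signal runs, which is the property the full-sequence diffusion baseline lacks.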

Output Comparison

Video panels: FSQ-VAE Autoregressive baseline, PainDiffusion w/ Full-seq Diffusion, PainDiffusion w/ Diffusion Forcing, Ground Truth, and Stimuli Signal.

This section compares the performance of four approaches:

  • FSQ-VAE Autoregressive baseline: Traditional sequential generation approach
  • PainDiffusion w/ Full-seq Diffusion: Our base diffusion model without forcing
  • PainDiffusion w/ Diffusion Forcing: Our complete model with continuous generation capability
  • Ground Truth: Actual human expressions from the dataset

The videos showcase randomly sampled examples from our validation set, demonstrating the qualitative improvements achieved by our method.


Controllability Experiment

This interactive demo showcases our model's ability to generate controlled facial expressions based on three key parameters:

  • Stimuli level (1-4): Represents the intensity of the pain stimulus
  • Expressiveness index (5-11): Controls how strongly the pain is expressed
  • Emotion index: Modulates the emotional component of the expression

Use the controls below to explore different combinations and observe how they affect the generated expressions.
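
For reference, the short Python sketch below bundles the same three controls into one conditioning record and checks the ranges documented above. The class name, field names, and the idea of handing it to a conditioned sampler are illustrative assumptions, not the released PainDiffusion interface.

# Hypothetical scripting counterpart to the interactive controls above.
# Names and the notion of a conditioned sampler are assumptions for illustration.
from dataclasses import dataclass
from itertools import product


@dataclass
class PainCondition:
    """Bundle of the three conditioning controls exposed by the demo."""
    stimuli_level: int      # 1-4, intensity of the heat-pain stimulus
    expressiveness: int     # 5-11, how strongly pain shows on the face
    emotion: int            # index of the emotional modulation

    def __post_init__(self):
        if not 1 <= self.stimuli_level <= 4:
            raise ValueError("stimuli level is defined on 1-4")
        if not 5 <= self.expressiveness <= 11:
            raise ValueError("expressiveness index is defined on 5-11")


# Sweep a few slider combinations, mirroring what the demo controls expose.
for level, expr in product(range(1, 5), (5, 8, 11)):
    cond = PainCondition(stimuli_level=level, expressiveness=expr, emotion=0)
    # cond would be passed to the (hypothetical) conditioned sampler here.
    print(cond)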

Interactive demo: Stimuli Signal and Predicted Facial Expression panels, with controls for Stimuli level, Expressiveness index, and Emotion index.

BibTeX

@article{dam2024paindiffusion,
  title={PainDiffusion: Can robot express pain?},
  author={Quang Tien Dam and Tri Tung Nguyen Nguyen and Dinh Tuan Tran and Joo-Ho Lee},
  year={2024},
  url={https://arxiv.org/abs/2409.11635},
}