PainDiffusion: Learning To Express Pain

Ritsumeikan University

Abstract

Accurate pain expression synthesis is essential for improving clinical training and human-robot interaction. Current Robotic Patient Simulators (RPSs) lack realistic pain facial expressions, limiting their effectiveness in medical training. In this work, we introduce PainDiffusion, a generative model that synthesizes naturalistic facial pain expressions. Unlike traditional heuristic or autoregressive methods, PainDiffusion operates in a continuous latent space, ensuring smoother and more natural facial motion while supporting indefinite-length generation via diffusion forcing. Our approach incorporates intrinsic characteristics such as pain expressiveness and emotion, allowing for personalized and controllable pain expression synthesis. We train and evaluate our model using the BioVid HeatPain Database. Additionally, we integrate PainDiffusion into a robotic system to assess its applicability in real-time rehabilitation exercises. Qualitative studies with clinicians reveal that PainDiffusion produces realistic pain expressions, with a 31.2% ± 4.8% preference rate against ground-truth recordings. Our results suggest that PainDiffusion can serve as a viable alternative to real patients in clinical training and simulation, bridging the gap between synthetic and naturalistic pain expression.

Methodology

Our methodology consists of three key components:

  1. Continuous Latent Space: We utilize a diffusion-based approach that operates in a continuous space, enabling smoother and more natural facial motion synthesis.
  2. Diffusion Forcing: A technique that enables indefinite-length generation while maintaining temporal coherence (a minimal sketch follows this list).
  3. Controllable Generation: Integration of pain expressiveness and emotion parameters for personalized expression synthesis.
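
To make the diffusion-forcing component concrete, the sketch below shows one simple way such a rollout can be organized in PyTorch: a sliding window of latent face frames in which already-generated frames are kept clean while each newly appended frame is denoised from pure noise, conditioned on the pain-stimulus signal. ToyDenoiser, the tensor shapes, and the update rule are illustrative assumptions, not the released PainDiffusion code.

# Minimal sketch of a diffusion-forcing-style rollout (hypothetical shapes
# and denoiser; not the released PainDiffusion implementation).
import torch
import torch.nn as nn

LATENT_DIM = 64   # assumed size of one latent face frame
WINDOW = 16       # frames denoised jointly in the sliding window
STEPS = 8         # denoising steps per new frame (toy value)


class ToyDenoiser(nn.Module):
    """Predicts the clean latent for each frame, given noisy latents,
    per-frame noise levels, and a per-frame stimulus value."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + 2, 256), nn.SiLU(),
            nn.Linear(256, LATENT_DIM),
        )

    def forward(self, x, noise_level, stimulus):
        # x: (B, T, D); noise_level, stimulus: (B, T)
        cond = torch.stack([noise_level, stimulus], dim=-1)
        return self.net(torch.cat([x, cond], dim=-1))


@torch.no_grad()
def rollout(model, stimulus, window=WINDOW, steps=STEPS):
    """Generate one latent frame per stimulus sample, for arbitrarily long signals.

    Each new frame enters the sliding window as pure noise and is denoised
    while conditioning on the already-denoised frames kept in the window.
    """
    device = stimulus.device
    frames = []                                            # finished (clean) frames
    buf = torch.randn(1, 0, LATENT_DIM, device=device)     # sliding window of latents

    for t in range(stimulus.shape[1]):
        # Append a fresh, fully-noised frame for time step t and trim the window.
        buf = torch.cat([buf, torch.randn(1, 1, LATENT_DIM, device=device)], dim=1)
        buf = buf[:, -window:]
        stim_win = stimulus[:, max(0, t + 1 - window): t + 1]

        # Frames already in the window are treated as clean (noise level 0);
        # only the newest frame carries a nonzero, decreasing noise level.
        for k in reversed(range(1, steps + 1)):
            levels = torch.zeros(1, buf.shape[1], device=device)
            levels[:, -1] = k / steps
            x0_hat = model(buf, levels, stim_win)
            # Simple interpolation toward the predicted clean frame
            # (stand-in for a proper DDIM/ancestral update rule).
            buf[:, -1] = buf[:, -1] + (x0_hat[:, -1] - buf[:, -1]) / k

        frames.append(buf[:, -1].clone())

    return torch.stack(frames, dim=1)                      # (1, T, D)


if __name__ == "__main__":
    model = ToyDenoiser()
    stimulus = torch.rand(1, 64)      # 64 steps of a heat-pain signal in [0, 1]
    latents = rollout(model, stimulus)
    print(latents.shape)              # torch.Size([1, 64, 64])

Because the denoiser only ever sees a fixed-size window with per-frame noise levels, the same loop can keep producing frames for as long as the stimulus signal runs, which is the property the full-sequence diffusion baseline lacks.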

Output Comparison

Video panels: FSQ-VAE Autoregressive baseline, PainDiffusion w/ Full-seq Diffusion, PainDiffusion w/ Diffusion Forcing, Ground Truth, and Stimuli Signal.

This section compares the performance of four approaches:

  • FSQ-VAE Autoregressive baseline: Traditional sequential generation approach
  • PainDiffusion w/ Full-seq Diffusion: Our base diffusion model without forcing
  • PainDiffusion w/ Diffusion Forcing: Our complete model with continuous generation capability
  • Ground Truth: Actual human expressions from the dataset

The videos showcase randomly sampled examples from our validation set, demonstrating the qualitative improvements achieved by our method.


Controllability Experiment

This interactive demo showcases our model's ability to generate controlled facial expressions based on three key parameters:

  • Stimuli level (1-4): Represents the intensity of the pain stimulus
  • Expressiveness index (5-11): Controls how strongly the pain is expressed
  • Emotion index: Modulates the emotional component of the expression

Use the controls below to explore different combinations and observe how they affect the generated expressions.
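
For reference, the short Python sketch below bundles the same three controls into one conditioning record and checks the ranges documented above. The class name, field names, and the idea of handing it to a conditioned sampler are illustrative assumptions, not the released PainDiffusion interface.

# Hypothetical scripting counterpart to the interactive controls above.
# Names and the notion of a conditioned sampler are assumptions for illustration.
from dataclasses import dataclass
from itertools import product


@dataclass
class PainCondition:
    """Bundle of the three conditioning controls exposed by the demo."""
    stimuli_level: int      # 1-4, intensity of the heat-pain stimulus
    expressiveness: int     # 5-11, how strongly pain shows on the face
    emotion: int            # index of the emotional modulation

    def __post_init__(self):
        if not 1 <= self.stimuli_level <= 4:
            raise ValueError("stimuli level is defined on 1-4")
        if not 5 <= self.expressiveness <= 11:
            raise ValueError("expressiveness index is defined on 5-11")


# Sweep a few slider combinations, mirroring what the demo controls expose.
for level, expr in product(range(1, 5), (5, 8, 11)):
    cond = PainCondition(stimuli_level=level, expressiveness=expr, emotion=0)
    # cond would be passed to the (hypothetical) conditioned sampler here.
    print(cond)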

Interactive demo: Stimuli Signal and Predicted Facial Expression panels, with controls for Stimuli level, Expressiveness index, and Emotion index.

BibTeX

@article{dam2024paindiffusion,
  title={PainDiffusion: Can robot express pain?},
  author={Quang Tien Dam and Tri Tung Nguyen Nguyen and Dinh Tuan Tran and Joo-Ho Lee},
  year={2024},
  url={https://arxiv.org/abs/2409.11635},
}