[Paper Note] TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis

The goal of TEDi is to address the challenges of long-term motion generation. Prior approaches either stitch together independently generated diffusion clips, which introduces inconsistencies at the seams, or they incur significant latency before producing any motion.

TEDi introduces a motion buffer: a partially denoised motion sequence in which the noise level increases along the time axis, so the frames at the beginning are nearly clean while the frames at the end are close to pure noise.
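As a toy illustration (my own sketch, not the paper's code; the function name and the linear noise schedule are assumptions), the buffer can be pictured as a clean sequence in which frame `i` has been noised to its own diffusion step `t_i`, ramping up along the time axis:

```python
import numpy as np

def make_motion_buffer(clean_motion, num_steps, rng=None):
    """Toy sketch of a TEDi-style motion buffer (names are my own).

    clean_motion: (F, D) array, F frames with D features each.
    Frame i is noised to timestep t_i, ramping from nearly clean
    at the start of the buffer to pure noise at the end.
    """
    rng = np.random.default_rng() if rng is None else rng
    F, D = clean_motion.shape
    timesteps = np.linspace(1, num_steps, F)     # per-frame noise level
    alpha_bar = 1.0 - timesteps / num_steps      # toy linear schedule, not the paper's
    noise = rng.standard_normal((F, D))
    buffer = (np.sqrt(alpha_bar)[:, None] * clean_motion
              + np.sqrt(1.0 - alpha_bar)[:, None] * noise)
    return buffer, timesteps
```

At the first frame `alpha_bar` is close to 1 (almost clean); at the last frame it is 0 (pure noise), matching the buffer picture above.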

Motion Representation

The model represents motion using the following components for each frame:

Method

During training, TEDi corrupts clean motion sequences with temporally varying noise: each frame is assigned its own noise level (diffusion timestep).
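A hedged sketch of this training-time corruption, assuming per-frame timesteps drawn as a ramp with a random cyclic offset so every (frame, timestep) pairing is seen; the paper's exact assignment scheme may differ:

```python
import numpy as np

def sample_frame_timesteps(num_frames, num_steps, rng=None):
    """Assign each frame its own diffusion timestep (assumed scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    ramp = np.linspace(1, num_steps, num_frames).astype(int)
    offset = rng.integers(num_steps)
    return (ramp + offset - 1) % num_steps + 1   # values in [1, num_steps]

def noise_sequence(clean, timesteps, num_steps, rng=None):
    """Forward-noise each frame to its own timestep (toy linear schedule)."""
    rng = np.random.default_rng() if rng is None else rng
    alpha_bar = 1.0 - timesteps / num_steps      # toy schedule, not the paper's
    eps = rng.standard_normal(clean.shape)
    noisy = (np.sqrt(alpha_bar)[:, None] * clean
             + np.sqrt(1.0 - alpha_bar)[:, None] * eps)
    return noisy, eps
```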

Direct Motion Prediction

Unlike standard diffusion models that predict the added noise, TEDi predicts the clean motion itself. Because the model's output lives in motion space, regularization and physical losses can be applied to it directly.
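A minimal sketch of what this enables (my own code; the velocity-smoothness term is a hypothetical stand-in for the paper's regularization and physical losses):

```python
import numpy as np

def motion_losses(pred_motion, clean_motion, reg_weight=0.1):
    """Losses in motion space for a motion-predicting model (sketch).

    pred_motion, clean_motion: (F, D) arrays of F frames, D features.
    """
    # Reconstruction: the network targets the clean motion directly.
    recon = np.mean((pred_motion - clean_motion) ** 2)
    # Example regularizer on the *prediction*: penalize jerky frame-to-frame
    # velocity (a stand-in for the paper's physical/regularization terms).
    vel = np.diff(pred_motion, axis=0)
    smooth = np.mean(vel ** 2)
    return recon + reg_weight * smooth
```

Such terms would be awkward with noise prediction, since the noise estimate would first have to be converted back into a motion estimate.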

Loss Functions

Inference Process

At the start of inference, the motion buffer is initialized with a sequence whose noise variance increases over time. Each denoising step cleans every frame a little further; once the first frame is fully clean, it is popped from the buffer and a fresh pure-noise frame is appended at the end, so generation can continue indefinitely.
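A hedged sketch of this autoregressive loop, with a dummy `denoise_step` standing in for the trained network (function names and signatures are my own):

```python
import numpy as np

def generate(denoise_step, buffer, timesteps, num_out, rng=None):
    """Autoregressive generation from a motion buffer (sketch).

    denoise_step(buffer, timesteps) -> slightly cleaner buffer.
    buffer: (F, D) array; timesteps: (F,) per-frame noise levels,
    increasing along the buffer. Returns num_out clean frames.
    """
    rng = np.random.default_rng() if rng is None else rng
    F, D = buffer.shape
    out = []
    for _ in range(num_out):
        buffer = denoise_step(buffer, timesteps)  # every frame gets one step cleaner
        out.append(buffer[0].copy())              # first frame is now fully denoised
        buffer = np.roll(buffer, -1, axis=0)      # shift the buffer left...
        buffer[-1] = rng.standard_normal(D)       # ...append fresh noise at the end
    return np.stack(out)
```

Note that `timesteps` stays fixed: each buffer *position* keeps its noise level, while frames march through the positions from noisy to clean.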

Key Capabilities

Controllable Synthesis

TEDi supports various motion tasks, including:

Results and Comparisons

The model is trained for 500k iterations on sequences of 500 frames at 30 fps. Compared to existing baselines, it excels at generating very long sequences.
