[Paper Note] PARC: Physics-based Augmentation with Reinforcement Learning for Character Controllers

The goal of this paper is to enable characters to traverse complex terrain flexibly while mitigating artifacts such as incorrect contacts or motion discontinuities.

Current methods usually rely heavily on motion capture data, which must be large and diverse to be effective. PARC addresses this with a cyclic workflow: a kinematic motion generator proposes motions on new terrains, a physics-based tracking controller corrects them in simulation, and the corrected motions are added back to the training data for the next iteration (sketched below).
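As a rough illustration, here is a minimal Python sketch of that cycle. All component names are placeholders passed in as callables; this is my own reading of the workflow, not the paper's actual API.

```python
def parc_augmentation_loop(dataset, new_terrains, num_iterations,
                           train_generator, generate_on_terrain,
                           train_tracking_controller, correct_with_tracking):
    """Hypothetical sketch of PARC's cyclic augmentation loop.

    The four callables stand in for the paper's components (motion generator
    training, generation, tracking-controller training, physics correction).
    """
    for _ in range(num_iterations):
        # 1. Train the kinematic motion generator on the current dataset.
        generator = train_generator(dataset)
        # 2. Generate candidate motions on new terrains.
        candidates = [generate_on_terrain(generator, t) for t in new_terrains]
        # 3. Correct the candidates with the RL tracking controller
        #    inside a physics simulator.
        controller = train_tracking_controller(candidates)
        corrected = [correct_with_tracking(controller, m) for m in candidates]
        # 4. Add the physically corrected motions back to the dataset and repeat.
        dataset = list(dataset) + corrected
    return dataset
```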

Motion Generator

The motion generator uses a diffusion model with a Transformer encoder architecture. It predicts a motion sequence $X = \{x_1, x_2, \dots, x_T\}$, where each frame $x_t$ includes quantities such as the root position and orientation, joint rotations and positions, and per-body contact labels.
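As a concrete (and hypothetical) illustration of this representation, a frame could be stored roughly as follows; the exact fields and shapes are my assumption, not the paper's specification.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MotionFrame:
    """Hypothetical per-frame layout for x_t; field names and shapes are
    illustrative assumptions, not the paper's exact representation."""
    root_position: np.ndarray    # (3,)   global root translation
    root_rotation: np.ndarray    # (4,)   root orientation, e.g. a quaternion
    joint_rotations: np.ndarray  # (J, 4) per-joint rotations
    joint_positions: np.ndarray  # (J, 3) per-joint positions
    contact_labels: np.ndarray   # (B,)   binary contact flag per body part

# A motion sequence X = {x_1, ..., x_T} is then simply a list of T frames.
```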

Conditions
The model is conditioned on several inputs to guide generation; the ones discussed below are the surrounding terrain and the previous motion frames.

Training Data and Loss

The training objective centers on a reconstruction loss over the generated motion frames.

To prevent the model from overfitting to the previous frames and ignoring the terrain, there is a 10% chance during training of using random terrain and disabling the reconstruction loss, as sketched below.
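A minimal sketch of how that training-time trick could look; the 10% probability is from the note above, while the function and variable names are my own illustration.

```python
import random

P_RANDOM_TERRAIN = 0.10  # probability quoted in the note

def prepare_training_sample(motion, terrain, sample_random_terrain):
    """Hypothetical data-preparation step for one training sample.

    With probability P_RANDOM_TERRAIN, swap in random terrain and flag the
    sample so the reconstruction loss is skipped, forcing the model to rely
    on the terrain rather than only on the previous frames.
    """
    if random.random() < P_RANDOM_TERRAIN:
        terrain = sample_random_terrain()
        use_reconstruction_loss = False
    else:
        use_reconstruction_loss = True
    return motion, terrain, use_reconstruction_loss
```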

Additionally, inspired by classifier-free guidance, the model interpolates between predictions with and without previous frame conditions during inference. This balances motion continuity with terrain compliance.
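One plausible form of this interpolation, written in classifier-free-guidance style with a guidance weight $w$ (my notation, not necessarily the paper's), is $\hat{x} = \hat{x}_{\varnothing} + w\,(\hat{x}_{\text{prev}} - \hat{x}_{\varnothing})$, where $\hat{x}_{\text{prev}}$ is the prediction conditioned on the previous frames and $\hat{x}_{\varnothing}$ the prediction without them; smaller $w$ leans toward terrain compliance, larger $w$ toward motion continuity.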

Tracking Controller

The tracking controller uses Reinforcement Learning (RL) to execute the generated motions inside a physics simulator, which naturally corrects the physically implausible aspects of the motion.
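A rough sketch of what executing (and thereby correcting) one generated motion in the simulator could look like; the simulator and policy interfaces here are assumptions made for illustration only.

```python
def track_motion_in_sim(sim, policy, reference_motion):
    """Hypothetical tracking rollout: replay a generated (kinematic) motion
    through a physics simulator with an RL policy and record the physically
    corrected result. The `sim` and `policy` interfaces are illustrative."""
    sim.reset_to(reference_motion[0])           # start from the first frame
    corrected_frames = []
    for target_frame in reference_motion:
        obs = sim.observe(target_frame)         # simulated state + reference target
        action = policy(obs)                    # action from the tracking policy
        sim.step(action)                        # advance the physics simulation
        corrected_frames.append(sim.get_character_frame())
    return corrected_frames
```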

Agent Configuration

Reward Design
The reward function encourages the agent to match the reference motion’s joint positions, velocities, poses, and root orientation.

In addition, the agent is rewarded for matching the contact labels of the reference motion, which prevents it from using incorrect body parts as support points. While there is no explicit reward for balance, forcing the agent to match the reference poses encourages it to find physically stable ways to achieve them, effectively correcting the originally generated motion. A sketch of such a reward follows below.
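A minimal reward sketch in that spirit; the individual terms follow the description above (joint positions, velocities, poses, root orientation, contact labels), but the weights, scales, and distance measures are my own illustration, not the paper's formulation.

```python
import numpy as np

def tracking_reward(sim_frame, ref_frame, weights=(0.3, 0.1, 0.3, 0.1, 0.2)):
    """Hypothetical tracking reward: exponentiated errors against the reference
    frame plus a contact-label matching term. Weights and error measures are
    illustrative, not the paper's values."""
    w_pos, w_vel, w_pose, w_root, w_contact = weights

    pos_err  = np.sum((sim_frame["joint_positions"]  - ref_frame["joint_positions"]) ** 2)
    vel_err  = np.sum((sim_frame["joint_velocities"] - ref_frame["joint_velocities"]) ** 2)
    pose_err = np.sum((sim_frame["joint_rotations"]  - ref_frame["joint_rotations"]) ** 2)
    root_err = np.sum((sim_frame["root_rotation"]    - ref_frame["root_rotation"]) ** 2)

    # Fraction of body parts whose binary contact label matches the reference,
    # discouraging the character from supporting itself with the wrong body parts.
    contact_match = np.mean(sim_frame["contact_labels"] == ref_frame["contact_labels"])

    return (w_pos * np.exp(-pos_err) + w_vel * np.exp(-vel_err)
            + w_pose * np.exp(-pose_err) + w_root * np.exp(-root_err)
            + w_contact * contact_match)
```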

The policy is implemented as a 3-layer MLP and trained using Proximal Policy Optimization (PPO).
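For concreteness, reading "3-layer" as three hidden layers, such a policy might look like the following PyTorch sketch; layer sizes, activations, and the Gaussian action head are assumptions, and the PPO training loop itself is omitted.

```python
import torch
import torch.nn as nn

class TrackingPolicy(nn.Module):
    """Hypothetical MLP policy with three hidden layers; sizes and activations
    are illustrative, not the paper's exact architecture."""
    def __init__(self, obs_dim, action_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),  # mean of the action distribution
        )
        # Learned log standard deviation for the Gaussian policy used by PPO.
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, obs):
        mean = self.net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())
```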
