Towards Embodied AI with MuscleMimic: Unlocking full-body musculoskeletal motor learning at scale

An open-source framework for scalable motion imitation learning with physiologically realistic, muscle-actuated humanoids.

Human motor control emerges from hundreds of muscles coordinating in real time, yet most simulated humanoids bypass this complexity entirely, relying on torque-driven joints that ignore the underlying neuromotor dynamics. While musculoskeletal (MSK) models built from cadaver and MRI data have brought us closer to biological realism, they’ve been held back by computational cost (training takes days to weeks on CPUs) and limited to static validation or single-task evaluations. Complex full-body models capable of diverse, dynamic movement have remained largely unexplored and only partially validated.

MuscleMimic is an open-source framework that changes this. By combining full-body muscle-actuated humanoids with massively parallel GPU simulation via MuJoCo Warp, we achieve order-of-magnitude training speedups, enabling a single generalist policy to learn thousands of human motions under full muscular control. The framework provides two validated musculoskeletal embodiments, a retargeting pipeline that maps SMPL-format motion capture onto our models, and pretrained checkpoints that can be fine-tuned for subject-specific biomechanical analysis. Such capabilities are essential for realizing neuromechanical computational models that bridge brain, body, and behavior.

A preprint of this work will be released soon.

Code, checkpoints, and retargeted dataset: github.com/amathislab/musclemimic

Try out musculoskeletal models: pip install musclemimic_models

What It Looks Like

Walking & Running
Backwards Walking
Walking & Turning
Dancing
Lifting Box
Waving
Drinking Water
Jumping Jack

Imitation Learning Results

We evaluate the generalist policy on the KINESIS motion dataset using early termination as the primary quality signal: an episode terminates if the mean site deviation across 17 mimic sites relative to the root (pelvis) exceeds 0.3 m, or if the pelvis deviates from the reference by more than 0.5 m in world coordinates. We use relative rather than absolute position error because muscle activation dynamics introduce temporal delays that prevent the musculoskeletal model from perfectly tracking reference velocities.
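The termination rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the MuscleMimic implementation: it assumes that "mean site deviation" means the mean Euclidean distance between simulated and reference site positions after each set is expressed relative to its own pelvis. The function names are hypothetical; only the two thresholds come from the text.

```python
import math

# Thresholds from the text; everything else here is an illustrative assumption.
SITE_DEV_THRESHOLD_M = 0.3    # mean pelvis-relative site deviation (m)
PELVIS_DEV_THRESHOLD_M = 0.5  # pelvis deviation in world coordinates (m)


def relative_to(points, origin):
    """Express 3-D points relative to an origin (here: the pelvis)."""
    return [tuple(p - o for p, o in zip(pt, origin)) for pt in points]


def should_terminate(sim_sites, ref_sites, sim_pelvis, ref_pelvis):
    """Hypothetical early-termination check.

    sim_sites / ref_sites: world positions (x, y, z) of the mimic sites
    (17 in the text); sim_pelvis / ref_pelvis: world position of the root.
    """
    rel_sim = relative_to(sim_sites, sim_pelvis)
    rel_ref = relative_to(ref_sites, ref_pelvis)
    # A body that merely lags behind the reference keeps a small
    # pelvis-relative error, which is why relative positions are used here.
    mean_site_dev = sum(math.dist(a, b) for a, b in zip(rel_sim, rel_ref)) / len(rel_sim)
    pelvis_dev = math.dist(sim_pelvis, ref_pelvis)
    return mean_site_dev > SITE_DEV_THRESHOLD_M or pelvis_dev > PELVIS_DEV_THRESHOLD_M
```

For example, a whole-body offset of 0.4 m leaves the relative site error at zero and stays under the 0.5 m pelvis limit, so the episode continues; the same offset at 0.6 m terminates it.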

Metric | GMR-Fit (Train) | GMR-Fit (Test)
--- | --- | ---
Early termination rate | 0.108 | 0.129
Joint position error | 0.129 | 0.130
Joint velocity error | 0.542 | 0.545
Root position error | 0.079 | 0.138
Root yaw error | 0.048 | 0.047
Relative site position error | 0.027 | 0.027
Absolute site position error | 0.144 | 0.146
Mean episode length | 534.1 | 528.1
Mean episode return | 575.9 | 569.1

Validation metrics on the KINESIS training (972 motions) and test (108 motions) datasets.

The Models

MuscleMimic introduces two complementary musculoskeletal embodiments for motion learning: one centered on upper-body manipulation and one covering full-body locomotion and manipulation.

Model | Type | Joints | Muscles | DoFs | Focus
--- | --- | --- | --- | --- | ---
BimanualMuscle | Fixed-base | 76 (36*) | 126 (64*) | 54 (14*) | Upper-body manipulation
MyoFullBody | Free-root | 123 (83*) | 416 (354*) | 72 (32*) | Locomotion and manipulation

* denotes configurations with finger muscles disabled for faster convergence. Joints denote articulated connections; DoFs correspond to independently controllable joint coordinates.

Both models are built upon established MyoSuite components, incorporating the MyoArm, MyoLegs, and MyoBack models. BimanualMuscle provides a fixed-root upper-body configuration with 76 joints and 126 Hill-type muscle actuators for bimanual manipulation, with collision detection between the thorax and both arms. MyoFullBody extends this to a complete 123-joint system with 416 muscles spanning the full kinematic chain from pelvis to fingertips, supporting comprehensive collision detection for contact-rich locomotion and manipulation. During model development, each muscle-tendon moment arm was cross-validated to ensure continuity. In total, around 150 asymmetries and moment-arm discontinuities ("muscle jumps") present in the original MyoArm and MyoLegs models were fixed.

BimanualMuscle model
Visualization of the BimanualMuscle model, viewed from (A) front, (B) back, and (C) side.
MyoFullBody model
Visualization of the MyoFullBody model, viewed from (A) front, (B) back, and (C) side.

Validation Against Human Data

To verify the biomechanical fidelity of our models during dynamic motion, we conduct population-based evaluations of walking and running against human experimental data on joint kinematics, kinetics, and EMG, as suggested in prior work.

Walking

We evaluate on five AMASS walking sequences and compare against two experimental datasets (treadmill walking and level walking, both at 1.2 m/s, with nine participants each). Simulated kinematics achieve mean correlations of 0.92 and 0.94 for treadmill and level walking, respectively, and 0.71 for joint dynamics. The lower-limb joints exhibit stereotyped gait patterns consistent with the experimental literature: hip flexion at initial contact progressing to extension before toe-off, knee flexion during early stance for impact absorption, and rapid ankle plantarflexion for propulsion at toe-off.

Walking gait analysis
Representative joint kinematics of the left lower limb (hip, knee, ankle, and foot) over a full walking gait cycle, comparing human experimental data and MyoFullBody-generated motion. Human walking data were collected on a treadmill at 1.2 m/s (orange) and level walking with a mean velocity of 1.2 m/s (purple). Simulated results are evaluated on five AMASS walking sequences, aligned by ground reaction force onset and truncated to one gait cycle.
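The alignment and comparison described in the caption can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code used here: the onset threshold (20 N), linear resampling to 101 points (0 to 100% of the cycle), and Pearson's r averaged over joints are choices made for the example, and all function names are hypothetical.

```python
import math


def gait_cycle(signal, grf, threshold=20.0, n_points=101):
    """Cut one gait cycle between the first two GRF onsets and resample it.

    An onset is a frame where the vertical GRF rises above `threshold`
    (assumed 20 N). The cycle is linearly resampled to `n_points` samples,
    i.e. 0-100 % of the gait cycle for n_points = 101."""
    onsets = [i for i in range(1, len(grf)) if grf[i - 1] < threshold <= grf[i]]
    if len(onsets) < 2:
        raise ValueError("need two consecutive foot strikes to define a gait cycle")
    cycle = signal[onsets[0]:onsets[1] + 1]
    resampled = []
    for k in range(n_points):
        t = k * (len(cycle) - 1) / (n_points - 1)
        i = min(int(t), len(cycle) - 2)
        frac = t - i
        resampled.append(cycle[i] * (1.0 - frac) + cycle[i + 1] * frac)
    return resampled


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_joint_correlation(sim, human):
    """Average Pearson r over joints; both dicts map joint name -> resampled curve."""
    return sum(pearson(sim[j], human[j]) for j in human) / len(human)
```

Resampling every cycle to a fixed number of points lets simulated and human curves be compared sample by sample even when cadence differs.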

Running

After fine-tuning the 10-billion-step checkpoint for an additional 50 million steps on 10 running motions, we compare against treadmill-running data at 1.75 m/s from Wang et al. Hip, knee, and ankle flexion over one gait cycle achieve a mean correlation of 0.79, and the kinematic patterns reflect the higher energy demands of running, notably stronger plantarflexion at push-off and more pronounced knee flexion during swing.

Running gait analysis
Representative joint kinematics of the left lower limb (hip, knee, ankle, and foot) over a full running gait cycle, comparing human experimental data and MyoFullBody-generated motion. Human running data were collected on a treadmill at 1.8 m/s. Simulated results are aligned by ground reaction force onset and truncated to one gait cycle.

EMG

We compare synthetic muscle activations against EMG recordings from two human walking datasets. For three representative leg muscles (Vastus Medialis, Gastrocnemius Lateralis, Soleus Medialis), the synthetic activations capture the main patterns of the human EMG signals, with correlations comparable to those obtained by static optimization. Results are shown alongside inter-subject variability, which represents an upper bound for model-human alignment.

EMG comparison — Boo et al. EMG summary — Wang et al.
(Left) Gait analysis with human data from Boo et al. Individual muscle activation patterns and summary metrics. (Right) Summary metrics with human data from Wang et al.
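One way to estimate an inter-subject ceiling is a leave-one-out correlation: each subject's EMG envelope is correlated with the mean of the remaining subjects, while the model's activation is correlated with the mean of all subjects. The sketch below assumes this procedure; the text does not specify how the inter-subject bound was computed, and every name here is hypothetical. Curves are assumed to be already normalized and resampled to a common gait-cycle length.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_curve(curves):
    """Pointwise mean of equal-length curves."""
    return [sum(vals) / len(curves) for vals in zip(*curves)]


def inter_subject_ceiling(subject_emg):
    """Leave-one-out r: each subject vs. the mean of all other subjects, averaged."""
    scores = []
    for i, curve in enumerate(subject_emg):
        others = [c for j, c in enumerate(subject_emg) if j != i]
        scores.append(pearson(curve, mean_curve(others)))
    return sum(scores) / len(scores)


def model_vs_human(activation, subject_emg):
    """r between a simulated activation curve and the across-subject mean EMG."""
    return pearson(activation, mean_curve(subject_emg))
```

Because Pearson's r ignores amplitude scaling, this comparison focuses on timing and shape of the activation pattern rather than absolute EMG magnitude.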

How It Works

Musculoskeletal Model

Both embodiments use MuJoCo's Hill-type muscle actuators with inelastic tendons, in which control signals pass through a first-order nonlinear activation dynamics model that distinguishes between activation and deactivation phases. We introduce tunable parameters, including muscle activation time constants and the maximum active force per muscle, which can be adjusted independently for the upper and lower limbs to accommodate highly dynamic motions. We observed that smaller activation time constants produce faster muscle responses but result in stiffer activations and noticeable jitter in the motion output. In contrast, larger time constants lead to smoother and more stable control that is better suited to impulsive behaviors such as jumping, although they deviate further from biologically realistic activation dynamics. The contact geometries consist of capsules and ellipsoids across all body segments, with self-collision explicitly enabled. Both models are carefully fine-tuned to ensure bilateral symmetry in joint constraints, muscle moment arms, and force–length relationships.
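MuJoCo documents its muscle activation dynamics as $\dot a = (u - a)/\tau(u, a)$, with $\tau = \tau_{\text{act}}(0.5 + 1.5a)$ when $u > a$ and $\tau = \tau_{\text{deact}}/(0.5 + 1.5a)$ otherwise, and uses default time constants of 0.01 s and 0.04 s. The forward-Euler sketch below illustrates how these time constants trade response speed against smoothness. It is not the MuscleMimic code; the 2 ms step and the function names are assumptions.

```python
def time_constant(u, a, tau_act=0.01, tau_deact=0.04):
    """MuJoCo-style activation-dependent time constant (seconds)."""
    if u - a > 0:
        return tau_act * (0.5 + 1.5 * a)
    return tau_deact / (0.5 + 1.5 * a)


def step_activation(a, u, dt=0.002, tau_act=0.01, tau_deact=0.04):
    """One forward-Euler step of da/dt = (u - a) / tau(u, a); u and a live in [0, 1]."""
    u = min(max(u, 0.0), 1.0)
    a_next = a + dt * (u - a) / time_constant(u, a, tau_act, tau_deact)
    return min(max(a_next, 0.0), 1.0)


def simulate(excitations, a0=0.0, **kwargs):
    """Roll the activation forward for a sequence of excitations u_t."""
    a, trace = a0, []
    for u in excitations:
        a = step_activation(a, u, **kwargs)
        trace.append(a)
    return trace
```

With the 2 ms step and default constants, one step closes 40% of the gap when activating from rest but only 10% when deactivating from full activation, which is the activation/deactivation asymmetry mentioned above. Raising both constants (for example `tau_act=0.05, tau_deact=0.1`) slows and smooths the response.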

Motion Retargeting

Retargeting pipeline
Motion retargeting pipeline. SMPL-format motion capture is first pre-processed via shape fitting and motion scaling, then passed through one of two inverse kinematics branches (MuJoCo Mocap Bodies or Mink-based GMR with equality constraints), and finally post-processed to fix floating and ground penetration artifacts.

We provide two retargeting pipelines that map SMPL-format motion capture data onto the musculoskeletal models. Mocap-Body uses MuJoCo mocap (kinematic) bodies in a three-stage pipeline: SMPL shape fitting, inverse kinematics, and post-processing to remove artifacts such as floating and ground penetration. GMR-Fit builds on the GMR robotics retargeting framework and adds our SMPL-fitting stage, enforcing joint constraints and dependencies to produce physiologically realistic trajectories.
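The floating and ground-penetration post-processing step could look like the sketch below. This is a hypothetical illustration, not the pipeline's actual code: it assumes per-frame root heights and the height of the lowest body geometry are available, applies one global vertical shift based on a low percentile of the lowest-point heights (so flight phases such as jumps are preserved), and then lifts any frame that still penetrates the ground.

```python
def fix_vertical_artifacts(root_z, lowest_z, ground_z=0.0, percentile=0.05):
    """Hypothetical floating / penetration fix for a retargeted clip.

    root_z[t]:   root (pelvis) height at frame t
    lowest_z[t]: height of the lowest body geometry at frame t
    Returns corrected (root_z, lowest_z) lists.
    """
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be in [0, 1]")
    # 1) Global shift: a low percentile (not the minimum) of the lowest-point
    #    height defines "standing on the ground", so a few outlier frames do
    #    not dominate. Removes a constant float or sink, keeps flight phases.
    ordered = sorted(lowest_z)
    offset = ordered[int(percentile * (len(ordered) - 1))] - ground_z
    fixed_root, fixed_low = [], []
    for rz, lz in zip(root_z, lowest_z):
        rz, lz = rz - offset, lz - offset
        # 2) Per-frame lift for any residual penetration.
        if lz < ground_z:
            rz += ground_z - lz
            lz = ground_z
        fixed_root.append(rz)
        fixed_low.append(lz)
    return fixed_root, fixed_low
```

Using a single global offset rather than pinning every frame to the ground is what keeps airborne frames airborne; a per-frame clamp alone would flatten jumps.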

Metric | Mocap-Body | GMR-Fit
--- | --- | ---
Joint limit violation (%) | 12.26 | 0.27
Ground penetration (%) | 0.55 | 0.24
Max penetration (m) | 0.002 | 0.001
Tendon jump rate (%) | 30.14 | 3.20
RMSE (m) | 0.039 | 0.025
Time per frame (s) | 0.076 | 0.251

GMR-Fit achieves dramatically better joint-limit satisfaction (0.27% vs 12.26% violation) and lower tendon jump rates (3.20% vs 30.14%), while Mocap-Body retains a ~3x speed advantage.

Policy

The policy is an MLP with residual connections that outputs $\pi(a_t \mid s_t)$, a distribution over muscle excitation. The observation $s_t$ includes proprioceptive signals, tendon states, motion targets, and crucially, the previous policy output $a_{t-1}$, making it autoregressive in nature. Both actor and critic use SiLU activations, LayerNorm, and orthogonal initialization. The output is a diagonal Gaussian with learnable state-independent standard deviation.
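The architecture described above (residual MLP, SiLU, LayerNorm, orthogonal initialization, diagonal Gaussian head with a state-independent standard deviation) can be sketched in dependency-free Python. This is a hypothetical illustration, not the released JAX model: the pre-norm block ordering, the initialization gains ($\sqrt{2}$ for hidden layers, 0.01 for the output), the initial log standard deviation, and the omission of LayerNorm's learnable scale and shift are all assumptions.

```python
import math
import random


def orthogonal(n_out, n_in, gain=1.0, rng=random):
    """Orthogonal init via Gram-Schmidt on a Gaussian matrix.

    Rows are orthonormal if n_out <= n_in, otherwise columns are."""
    if n_out > n_in:
        return [list(col) for col in zip(*orthogonal(n_in, n_out, gain, rng))]
    rows = []
    for _ in range(n_out):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
        for r in rows:  # remove components along previously accepted rows
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return [[gain * a for a in r] for r in rows]


def silu(x):
    """x * sigmoid(x), computed branch-wise to avoid overflow in exp."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def layer_norm(v, eps=1e-5):
    """LayerNorm without the learnable scale/shift (omitted for brevity)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def linear(W, b, v):
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


class ResidualMLPPolicy:
    """Hypothetical residual MLP actor with a diagonal Gaussian head."""

    def __init__(self, obs_dim, act_dim, hidden=64, n_blocks=2, init_log_std=-0.5, seed=0):
        rng = random.Random(seed)
        g = math.sqrt(2.0)
        self.W_in, self.b_in = orthogonal(hidden, obs_dim, g, rng), [0.0] * hidden
        self.blocks = [(orthogonal(hidden, hidden, g, rng), [0.0] * hidden) for _ in range(n_blocks)]
        # Small output gain keeps the initial mean excitation close to the bias.
        self.W_out, self.b_out = orthogonal(act_dim, hidden, 0.01, rng), [0.0] * act_dim
        # State-independent log standard deviation: one learnable value per muscle.
        self.log_std = [init_log_std] * act_dim

    def mean(self, obs):
        h = linear(self.W_in, self.b_in, obs)
        for W, b in self.blocks:  # pre-norm residual block: h + (W @ SiLU(LN(h)) + b)
            h = [hi + yi for hi, yi in zip(h, linear(W, b, [silu(x) for x in layer_norm(h)]))]
        return linear(self.W_out, self.b_out, [silu(x) for x in layer_norm(h)])

    def sample(self, obs, rng=random):
        return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(self.mean(obs), self.log_std)]

    def log_prob(self, obs, action):
        return sum(
            -0.5 * ((a - m) / math.exp(s)) ** 2 - s - 0.5 * math.log(2.0 * math.pi)
            for a, m, s in zip(action, self.mean(obs), self.log_std)
        )
```

Keeping the standard deviation independent of the state means exploration noise is a per-muscle parameter rather than a network output, which is a common choice for on-policy methods like the one described below.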

Policy observation structure
Policy observation structure. The state is decomposed into proprioceptive signals (root height and velocity, joint positions and velocities), tendon states, touch info, mimic site relative positions, and motion phase. A history of 3 stacked states is concatenated with the current goal and future goals at regular lookahead intervals. Each goal is defined by root position and velocity deltas and target mimic site relative positions.
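The observation layout in the caption can be illustrated with a small builder that keeps a 3-frame history and appends the current goal plus future goals. This is a sketch under assumptions: the lookahead offsets (0, 5, 10 frames), the per-frame reference format, clamping at the end of the clip, and placing $a_{t-1}$ inside every history frame are all hypothetical choices, not details given in the text.

```python
from collections import deque

HISTORY = 3                    # from the caption: 3 stacked states
LOOKAHEAD_FRAMES = (0, 5, 10)  # hypothetical: current goal + two future goals


def goal_features(ref, t, sim_root_pos, sim_root_vel):
    """Goal = root position/velocity deltas + target pelvis-relative site positions.

    `ref` is a hypothetical list of per-frame dicts; indices past the end of
    the clip are clamped to the last frame (an assumption)."""
    frame = ref[min(t, len(ref) - 1)]
    dpos = [r - s for r, s in zip(frame["root_pos"], sim_root_pos)]
    dvel = [r - s for r, s in zip(frame["root_vel"], sim_root_vel)]
    sites = [c for site in frame["site_rel_pos"] for c in site]
    return dpos + dvel + sites


class ObservationBuilder:
    def __init__(self, state_dim, action_dim, history=HISTORY):
        frame_dim = state_dim + action_dim
        # Zero-padded history so the observation size is fixed from t = 0.
        self.frames = deque([[0.0] * frame_dim for _ in range(history)], maxlen=history)

    def build(self, state, prev_action, ref, t, sim_root_pos, sim_root_vel):
        # Each history frame holds the proprioceptive/tendon/touch state plus a_{t-1}.
        self.frames.append(list(state) + list(prev_action))
        stacked = [x for frame in self.frames for x in frame]
        goals = [x for k in LOOKAHEAD_FRAMES
                 for x in goal_features(ref, t + k, sim_root_pos, sim_root_vel)]
        return stacked + goals
```

Feeding $a_{t-1}$ back in is what makes the policy autoregressive: with delayed muscle activation, the previous excitation is informative about the force the muscles are about to produce.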

Reward

The reward at each timestep is $r_t = \max(0,\; r_t^{\text{imit}} + P_t)$, combining an imitation term with a penalty. The imitation reward is a weighted sum of six exponential-kernel terms, all computed relative to the pelvis rather than in the world frame.

The penalty $P_t = \max(-1,\; -\sum_p \lambda_p C_p)$ regularizes action-bound violations, action rate, and muscle activation energy.
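The two formulas above can be written out directly. The sketch below follows $r_t = \max(0, r_t^{\text{imit}} + P_t)$ and $P_t = \max(-1, -\sum_p \lambda_p C_p)$ exactly, but the kernel form $w_i \exp(-k_i e_i)$, the definitions of the three cost terms, and all weights are assumptions made for illustration; the text does not list which six errors enter the imitation term.

```python
import math


def imitation_reward(errors, weights, scales):
    """r_imit = sum_i w_i * exp(-k_i * e_i) over the six tracking errors.

    Which errors enter, and the weights w_i / scales k_i, are assumptions here."""
    return sum(w * math.exp(-k * e) for e, w, k in zip(errors, weights, scales))


def penalty_costs(action, prev_action, activation, low=0.0, high=1.0):
    """Hypothetical cost terms C_p: action-bound violation, action rate, activation energy."""
    bound = sum(max(0.0, a - high) + max(0.0, low - a) for a in action)
    rate = sum((a - b) ** 2 for a, b in zip(action, prev_action))
    energy = sum(x * x for x in activation) / len(activation)
    return [bound, rate, energy]


def penalty(costs, lambdas):
    # P_t = max(-1, -sum_p lambda_p * C_p): the penalty never exceeds 1 in magnitude.
    return max(-1.0, -sum(l * c for l, c in zip(lambdas, costs)))


def step_reward(errors, weights, scales, costs, lambdas):
    # r_t = max(0, r_imit + P_t): the total reward is clipped at zero.
    return max(0.0, imitation_reward(errors, weights, scales) + penalty(costs, lambdas))
```

The two clips serve different purposes: bounding the penalty at $-1$ stops regularization from overwhelming the imitation signal, and clipping the total at zero keeps every step's reward non-negative, so early termination never looks preferable to continuing.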

Training at Scale

MuscleMimic is implemented as a JAX-based framework extending LocoMuJoCo with native MuJoCo Warp support for GPU-accelerated simulation. We train across 8,192 parallel environments for 4.9 billion timesteps. For large-scale training, we use the Muon optimizer for linear layers and Adam for biases and normalization parameters, which yields significantly faster and more stable convergence than AdamW. For training on diverse motion datasets, we use the KINESIS dataset (a curated subset of AMASS) and progressively scale to more dynamic motions, including Embody3D.
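The optimizer split can be illustrated with a simplified Muon-style step: 2-D weight matrices receive a momentum update that is orthogonalized (replaced by the nearest semi-orthogonal matrix), while 1-D parameters go to Adam. This sketch is not the training code and not Muon's reference implementation: it uses the classic cubic Newton-Schulz iteration instead of Muon's tuned quintic iteration, plain (not Nesterov) momentum, no shape-dependent scaling, and illustrative hyperparameters.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def orthogonalize(G, steps=20):
    """Approximate the orthogonal polar factor of G (U V^T for G = U S V^T).

    Uses the classic cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.
    Muon itself uses a tuned quintic iteration with only a few steps; the
    cubic form is used here because its convergence is easy to verify."""
    norm = math.sqrt(sum(x * x for row in G for x in row))
    if norm == 0.0:
        return [[0.0] * len(row) for row in G]
    X = [[x / norm for x in row] for row in G]  # Frobenius scaling => singular values <= 1
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X


def optimizer_for(shape):
    """Route 2-D weight matrices to Muon; biases and LayerNorm params (1-D) to Adam."""
    return "muon" if len(shape) == 2 else "adam"


def muon_update(W, G, momentum, lr=0.02, beta=0.95):
    """One simplified Muon step: momentum accumulation, then an orthogonalized update.

    Hyperparameters are illustrative; Nesterov momentum and the shape-dependent
    update scaling used in practice are omitted."""
    momentum = [[beta * m + g for m, g in zip(rm, rg)] for rm, rg in zip(momentum, G)]
    O = orthogonalize(momentum)
    W = [[w - lr * o for w, o in zip(rw, ro)] for rw, ro in zip(W, O)]
    return W, momentum
```

Orthogonalizing the update equalizes its singular values, so no single direction in a weight matrix dominates a step; this is the property usually credited for Muon's faster convergence on linear layers.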

Single-epoch updates work best. With massively parallel GPU simulation, we can collect fresh data cheaply, so single-epoch updates ($E = 1$) achieve superior asymptotic performance while avoiding pathologies from aggressive sample reuse: expert collapse in Soft MoE routing and severe distribution shift with KL divergence spikes orders of magnitude above the stable baseline.

Effect of gradient epochs on training
Effect of gradient epochs ($E$) on training stability. We compare $E=1$ (truly on-policy), $E=3$, and $E=10$ (aggressive sample reuse). (A) Early training (first 30M steps): higher $E$ accelerates initial learning. (B) Full training trajectory: $E=1$ achieves superior asymptotic performance. (C) KL divergence (log scale): $E>1$ exhibits catastrophic distribution shift with spikes exceeding $10^{10}$, whereas $E=1$ remains stable below $10^{-1}$.
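KL spikes like those in panel C are typically detected by estimating the divergence between the data-collecting policy and the current policy from per-sample log-probabilities. The sketch below uses the non-negative estimator $(r - 1) - \log r$ with $r = \pi_{\text{new}}/\pi_{\text{old}}$; whether MuscleMimic uses this particular estimator is an assumption. With $E > 1$, the same rollout is reused for several epochs, so the current policy can drift far from the one that collected the data, and this estimate grows accordingly.

```python
import math


def approx_kl(old_log_probs, new_log_probs):
    """Estimate KL(pi_old || pi_new) from actions sampled under pi_old.

    Uses the non-negative estimator (r - 1) - log r with r = pi_new / pi_old,
    computed from per-sample log-probabilities."""
    total = 0.0
    for lp_old, lp_new in zip(old_log_probs, new_log_probs):
        log_r = lp_new - lp_old
        total += math.expm1(log_r) - log_r  # expm1 keeps precision for small log_r
    return total / len(old_log_probs)
```

For small changes the estimate behaves like $\frac{1}{2}(\Delta \log \pi)^2$, so a stable on-policy run stays near zero while large log-ratio swings blow up exponentially.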

Larger batch sizes improve stability. Larger batch sizes yield higher asymptotic rewards, lower KL divergence, and smoother convergence. Since larger batches require fewer gradient updates per environment step, training is also faster in wall-clock time.

Effect of batch size on training
Effect of minibatch size on training dynamics. We compare minibatch sizes of $32$, $64$, and $128$. (A) Performance: larger batch sizes achieve higher asymptotic rewards. (B) Exploration stability: smaller batches cause the policy standard deviation to overshoot. (C) Policy update magnitude (log scale): larger batches yield lower KL divergence.
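The claim that larger batches need fewer gradient updates per environment step is simple arithmetic on a PPO-style schedule. The sketch below treats minibatch size as the number of samples per gradient step (an assumption; the unit is not stated above) and uses a hypothetical rollout length of 32 steps in its example; function names are also hypothetical.

```python
import math
import random


def minibatch_schedule(num_samples, minibatch_size, epochs, seed=0):
    """Yield index minibatches: every epoch reshuffles the rollout and splits it.

    Each yielded minibatch corresponds to one gradient update, and each sample
    is visited exactly `epochs` times (once when E = 1)."""
    rng = random.Random(seed)
    idx = list(range(num_samples))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, num_samples, minibatch_size):
            yield idx[start:start + minibatch_size]  # slicing copies, so later shuffles are safe


def updates_per_rollout(num_envs, rollout_len, minibatch_size, epochs):
    """Gradient updates per rollout of num_envs * rollout_len samples."""
    return epochs * math.ceil(num_envs * rollout_len / minibatch_size)
```

With 8,192 environments and the assumed 32-step rollout, doubling the minibatch from 64 to 128 samples halves the updates per rollout from 4,096 to 2,048, while $E = 3$ triples them.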

Training throughput scales with GPU capability: newer GPUs such as the NVIDIA H200 provide significant speedups in both simulation and gradient computation.

Model Validation

Both models were extensively validated for muscle symmetry and biomechanical accuracy. Each muscle-tendon moment arm was cross-validated against its target joint to ensure smooth, continuous profiles, with wrapping geometries manually corrected whenever discontinuities were found. Most refinement concentrated around the shoulder joint, where multiple equality constraints are enforced.
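A moment-arm continuity check of the kind described above can be sketched with finite differences of tendon length over a joint sweep. In MuJoCo, the length samples could come from setting the joint coordinate in `data.qpos`, calling `mujoco.mj_forward`, and reading `data.ten_length`; here a plain Python function stands in for that lookup. The sign convention $r(q) = -dL/dq$, the step threshold, and the function names are assumptions.

```python
def moment_arm_profile(tendon_length, angles):
    """Moment arm r(q) = -dL/dq by finite differences of tendon length L(q).

    `tendon_length` maps a joint angle (rad) to a length (m). Central
    differences inside the range, one-sided at the ends. The sign convention
    (-dL/dq) is an assumption; tools differ."""
    L = [tendon_length(q) for q in angles]
    r = []
    for i in range(len(angles)):
        lo, hi = max(i - 1, 0), min(i + 1, len(angles) - 1)
        r.append(-(L[hi] - L[lo]) / (angles[hi] - angles[lo]))
    return r


def find_jumps(profile, max_step):
    """Indices where the moment arm changes by more than `max_step` (m) between
    neighbouring samples, i.e. candidate wrapping discontinuities."""
    return [i for i in range(1, len(profile)) if abs(profile[i] - profile[i - 1]) > max_step]
```

A tendon whose length kinks (for example when it slips off a wrapping surface) shows up as a pair of flagged indices around the kink, which is where the wrapping geometry would need manual correction.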

Muscle symmetry — BFLH across Knee Flexion Muscle symmetry — BIClong across Elbow Flexion
Validation of symmetry between left and right muscle-tendon groups of MyoFullBody.
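A left/right symmetry check like the one illustrated can be reduced to comparing paired moment-arm profiles sampled at mirrored joint angles. This is a hypothetical sketch: the tolerance, the optional sign flip for mirrored joint axes, and the function names are assumptions, and the profiles are assumed to have been computed beforehand.

```python
def max_asymmetry(left, right, mirror_sign=1.0):
    """Largest pointwise |left - mirror_sign * right| between two moment-arm
    profiles sampled at the same (mirrored) joint angles. Use mirror_sign=-1.0
    if the right-side joint axis is defined with the opposite sign."""
    return max(abs(l - mirror_sign * r) for l, r in zip(left, right))


def asymmetric_muscles(pairs, tol):
    """pairs: muscle name -> (left_profile, right_profile). Returns sorted names
    whose left/right profiles differ by more than `tol` (metres) anywhere."""
    return sorted(name for name, (l, r) in pairs.items() if max_asymmetry(l, r) > tol)
```

Using the maximum pointwise difference rather than an average makes a localized asymmetry (for example near a joint limit) as visible as a global offset.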

Moment arms were further validated against experimental measurements from cadaver and MRI studies. Despite inter-individual variability in reported values, the simulated profiles remain within experimentally observed ranges.

BRD moment arm validation Biceps femoris moment arm validation
Lat dorsi moment arm validation Rectus femoris moment arm validation
Validation comparing MyoFullBody muscle moment arms against experimental data from prior studies for selected shoulder, elbow, and lower-limb muscles. Despite inter-individual variability, our model's profiles remain within the reported experimental ranges.

Limitations

While our framework demonstrates promising alignment with experimental data, musculoskeletal models remain approximations of biological reality. For example, the Hill-type model in MuJoCo simplifies complex phenomena such as history-dependent force production, heterogeneous fiber recruitment, and tendon elasticity. These assumptions can influence dynamic outcomes and may limit the faithful reproduction of highly explosive or high-impact motions (e.g., martial arts or rapid vertical jumping). Moreover, the current SMPL-based retargeting pipeline assumes generic morphology and matched gender, leaving open questions about how subject-specific anthropometrics affect retargeting accuracy and policy learning. Simulation results at the current stage should therefore be interpreted as model-based predictions and validated against experimental data before being used in clinical applications.

By open-sourcing this framework, we invite the community to iterate on these models: refining muscle parameters, improving joint definitions, and validating against diverse experimental datasets. We also encourage researchers to explore future applications in rehabilitation and human–robot interaction, including training on pathological gait patterns and integration with assistive devices such as exoskeletons.

Acknowledgements

We thank members of the Mathis Group for feedback on the project, and Vittorio Caggiano, James Heald, and Balint K. Hodossy for helpful discussions.

Citation

Please cite this work as

@article{musclemimic2026,
  title   = {Towards Embodied AI with MuscleMimic: Unlocking full-body musculoskeletal motor learning at scale},
  author  = {Chengkun Li and Cheryl Wang and Bianca Ziliotto and Merkourios Simos and Guillaume Durandau and Alexander Mathis},
  year    = {2026},
}