New Gradient-Based Planner GRASP Unlocks Long-Horizon World Model Planning

Breaking: GRASP Solves Long-Horizon Planning Fragility in World Models

Researchers have developed a new gradient-based planner called GRASP that dramatically improves long-horizon planning in learned world models. The method overcomes key fragility issues that have limited AI agents' ability to plan over many future steps using predictive models.

Source: bair.berkeley.edu

The Innovation

GRASP introduces three core innovations: lifting the trajectory into virtual states for parallel optimization across time, adding stochasticity directly to state iterates for better exploration, and reshaping gradients to avoid brittle signals through high-dimensional vision models. These changes make gradient-based planning much more robust for extended planning horizons.

"World models are becoming general-purpose simulators, but using them for planning has been a stress test," said Mike Rabbat, co-author of the study. "GRASP makes it practical by addressing the ill-conditioned optimization and local minima that plague long-horizon planning."

Background: The Promise and Pain of World Models

World models are learned predictive models that take current state and actions and forecast future observations. They excel at generating long sequences in high-dimensional visual spaces and generalize across tasks better than older task-specific models.
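As a rough illustration of the interface described above, the sketch below stands in a tiny linear function for a learned world model and rolls it forward over a sequence of actions. Everything here is an assumption made for illustration: the names `make_toy_world_model` and `rollout`, the linear dynamics, and the dimensions are hypothetical and are not taken from the GRASP paper, where the model would be a learned network over visual observations or latents.

```python
import numpy as np

def make_toy_world_model(state_dim=4, action_dim=2, seed=0):
    """Return a hypothetical linear 'world model' f(s, a) = A s + B a.

    A real world model would be a learned neural network; this linear
    stand-in only illustrates the (state, action) -> next state interface.
    """
    rng = np.random.default_rng(seed)
    A = 0.9 * np.eye(state_dim) + 0.05 * rng.standard_normal((state_dim, state_dim))
    B = rng.standard_normal((state_dim, action_dim))

    def step(state, action):
        return A @ state + B @ action

    return step

def rollout(step, initial_state, actions):
    """Apply the model autoregressively: each prediction feeds the next step."""
    states = [initial_state]
    for action in actions:
        states.append(step(states[-1], action))
    return np.stack(states)

step = make_toy_world_model()
s0 = np.zeros(4)
actions = np.ones((10, 2))          # 10 future actions
trajectory = rollout(step, s0, actions)
print(trajectory.shape)             # (11, 4): the initial state plus 10 predictions
```

Because each prediction depends on the previous one, any planner that differentiates through this loop has to propagate gradients through every intermediate step.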

However, planning with these models over long horizons remains fragile. The optimization becomes ill-conditioned as the horizon grows, tasks that require non-greedy behavior (moving away from the goal before approaching it) create bad local minima, and high-dimensional latent spaces introduce subtle failures.
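One way to see the ill-conditioning is with a toy linear model, where the sensitivity of the final state to the first action can be written in closed form as A^(T-1) B for horizon T. The matrices below are hypothetical and chosen only so that one direction slowly expands and the other contracts; the sketch is not drawn from the paper and says nothing about how real learned models behave quantitatively.

```python
import numpy as np

# Hypothetical toy dynamics s_{t+1} = A s_t + B a_t (an assumption for illustration).
A = np.diag([1.1, 0.5])   # one slowly expanding and one contracting direction
B = np.eye(2)

def first_action_jacobian(horizon):
    """Jacobian of the final state s_T with respect to the first action a_0."""
    return np.linalg.matrix_power(A, horizon - 1) @ B

for horizon in (1, 5, 20, 50):
    J = first_action_jacobian(horizon)
    # The condition number is the ratio of largest to smallest singular value.
    print(horizon, np.linalg.cond(J))
```

In this toy case the condition number grows geometrically with the horizon, so a single gradient step size cannot suit every direction of the action space at once.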

"Having a powerful predictive model is not the same as being able to use it effectively for control," explained Aditi Krishnapriyan, another co-author. "Long horizons are the real stress test for gradient-based planning."

What This Means for AI and Robotics

With GRASP, AI agents can now plan over many future steps using learned world models without the brittleness that previously limited deployment. This has direct applications in robotics, game AI, and autonomous systems where long-term decision-making is critical.

The method parallelizes optimization across time steps and adds stochasticity to escape local minima. Together, these two upgrades make planning more reliable and scalable. Early tests show robust performance on tasks requiring dozens of future actions.

Key Technical Contributions

Virtual state lifting: Enables parallel gradient computation across the entire planning horizon.
Stochastic state iterates: Introduces controlled noise for better exploration during optimization.
Reshaped gradients: Gives actions a cleaner gradient signal while avoiding brittle gradients through high-dimensional vision models.
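The first two ideas in the list above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `lifted_plan`, the linear dynamics, the quadratic penalty, the step size, and the annealed Gaussian noise are all hypothetical choices. Gradient reshaping is left out entirely, because the toy model has no vision encoder to route gradients around.

```python
import numpy as np

def lifted_plan(A, B, s0, goal, horizon, iters=2000, lr=0.05,
                noise_scale=0.01, goal_weight=1.0, seed=0):
    """Plan by optimizing virtual states and actions jointly (toy sketch).

    Minimizes sum_t ||s_{t+1} - (A s_t + B a_t)||^2 + w * ||s_T - goal||^2
    over virtual states s_1..s_T and actions a_0..a_{T-1}.
    """
    rng = np.random.default_rng(seed)
    state_dim, action_dim = B.shape
    states = np.zeros((horizon, state_dim))    # virtual states s_1..s_T
    actions = np.zeros((horizon, action_dim))  # actions a_0..a_{T-1}
    for k in range(iters):
        prev = np.vstack([s0, states[:-1]])    # s_0..s_{T-1}
        # Dynamics residuals for all time steps at once: no sequential rollout.
        residual = states - (prev @ A.T + actions @ B.T)
        grad_actions = -2.0 * residual @ B
        grad_states = 2.0 * residual                       # s_{t+1} as a target
        grad_states[:-1] += -2.0 * residual[1:] @ A        # s_{t+1} as a model input
        grad_states[-1] += 2.0 * goal_weight * (states[-1] - goal)
        actions -= lr * grad_actions
        states -= lr * grad_states
        # Noise is added to the state iterates only, annealed toward zero.
        anneal = 1.0 - k / iters
        states += noise_scale * anneal * rng.standard_normal(states.shape)
    return states, actions

# Hypothetical toy problem: 2-D state, fully actuated.
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.eye(2)
s0 = np.zeros(2)
goal = np.array([1.0, -1.0])
virtual_states, plan = lifted_plan(A, B, s0, goal, horizon=10)

# Check the plan by executing the actions through the true dynamics.
s = s0
for a in plan:
    s = A @ s + B @ a
print(np.linalg.norm(s - goal))
```

Because every residual depends only on neighboring variables, the gradient for all time steps is computed in one vectorized pass instead of by backpropagating through a long rollout. The final loop matters because the virtual states are only guesses until the dynamics residuals shrink, so the plan is checked by running the actions through the model itself.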

The full paper, co-authored with Yann LeCun and Amir Bar, is available online. The researchers emphasize that GRASP works with existing world models without architectural changes.

More broadly, this work moves world models closer to practical use in complex decision-making. AI systems that can plan further ahead with greater confidence are an essential step toward general intelligence.