Recursive AI Models Beat Scaling: HRM and TRM Explained
Early research treated recurrent neural networks (RNNs) as a path toward artificial general intelligence, but training them required back-propagation through time (BPTT). Exploding or vanishing gradients made deep recursion unstable, especially when solving an input demanded many recurrent steps. Transformers sidestepped BPTT by processing all time steps in parallel under causal masks, achieving “one-shot” efficiency. That parallelism, however, eliminates the latent reasoning RNNs performed across time and forces the model to retain the entire context for every decode step.
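To make the instability concrete, here is a toy PyTorch sketch (an illustration of the general BPTT problem, not code from the video): the gradient reaching the first step is a product of one Jacobian per recurrent step, so it decays geometrically as the recursion deepens.

```python
import torch

# Toy illustration of vanishing gradients under BPTT (not from the video).
# Each recurrent step multiplies the backward signal by another Jacobian,
# so the gradient at step 0 shrinks geometrically with the depth T.
torch.manual_seed(0)
dim = 64
W = torch.randn(dim, dim) * (0.3 / dim**0.5)  # spectral norm well below 1

for T in (1, 5, 20, 50):
    h0 = torch.randn(dim, requires_grad=True)
    h = h0
    for _ in range(T):
        h = torch.tanh(h @ W)  # one recurrent step
    h.sum().backward()
    print(f"T={T:2d}  gradient norm at step 0: {h0.grad.norm().item():.3e}")
```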
Reasoning Limitations in Large Language Models
Standard feed-forward transformers struggle with incompressible tasks such as sorting, Sudoku, or maze navigation. These problems demand more explicit comparisons than a single forward pass can encode, and the models lack an external memory tape for storing intermediate results. Chain-of-thought prompting and tool use are workarounds: they surface human-derived algorithms from the training corpus but do not let the model discover new procedures from first principles. Reasoning in a discrete token space is also less expressive than computation in a continuous latent space.
Hierarchical and Tiny Recursive Architectures
Hierarchical Reasoning Models (HRM) introduce three recursion levels: a low-level network, a high-level network, and an outer refinement loop. With only 27 million parameters, HRM reached state-of-the-art performance on the ARC Prize benchmark. Tiny Recursive Models (TRM) collapse the low- and high-level networks into a single weight-shared module, cut the number of transformer layers, and shrink the parameter count to 7 million, improving ARC accuracy to 87% (up from HRM's 70%). Both architectures train with truncated BPTT (t = 1) and fixed-point iteration, sidestepping the gradient noise that accumulates when back-propagating through long sequences. TRM also treats the hidden “carry” memory like a mini-batch, constructing mini-batches across the latent space rather than across separate inputs.
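A minimal sketch of the TRM recipe, assuming illustrative sizes and update rules (the real model operates on token grids; none of the names below come from the released code): one weight-shared module first refines a latent state z for several inner steps, then uses z to update the current answer embedding y.

```python
import torch
import torch.nn as nn

class TinyRecursiveModel(nn.Module):
    """Sketch of a TRM-style weight-shared recursive core (illustrative only)."""

    def __init__(self, dim: int = 128, n_latent_steps: int = 6):
        super().__init__()
        # One shared network replaces HRM's separate low- and high-level modules.
        self.net = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.n_latent_steps = n_latent_steps

    def forward(self, x, y, z):
        # Low-level recursion: refine the latent state z given input x and answer y.
        for _ in range(self.n_latent_steps):
            z = z + self.net(torch.cat([x, y, z], dim=-1))
        # High-level step: use the refined latent state to improve the answer.
        y = y + self.net(torch.cat([x, y, z], dim=-1))
        return y, z
```

The outer refinement loop described in the next section simply calls this module again and again on its own output.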
Mechanisms Behind Recursive Reasoning
The outer refinement loop repeatedly applies the same weights to the input, updating a latent state Z and local variables ZL until the solution stabilizes. Fixed‑point iteration, often implemented as a Deep Equilibrium (DEQ) model, runs the network 16 times so residuals approach zero, effectively solving the task as a convergence problem. Truncated BPTT limits back‑propagation to a single recursive step, preventing gradient noise and memory bloat while still allowing the model to learn deep iterative behavior. By maintaining a continuous high‑dimensional hidden state, these models use the latent space as a reusable “tape” for computation, contrasting with token‑by‑token decoding in conventional LLMs.
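A hedged sketch of how the outer loop and the t = 1 truncation fit together (the step count of 16 follows the description above; function and variable names are assumptions): every outer iteration except the last runs without gradient tracking, so back-propagation sees exactly one recursive step no matter how deep the iteration went.

```python
import torch

def refine(model, x, y, z, n_outer: int = 16):
    """Fixed-point-style outer loop with one-step truncated BPTT (sketch)."""
    with torch.no_grad():
        # Drive (y, z) toward a fixed point; no graph is built, so memory
        # stays constant regardless of how many iterations we run.
        for _ in range(n_outer - 1):
            y, z = model(x, y, z)
    # Back-propagate through the final step only (truncated BPTT, t = 1).
    y, z = model(x, y.detach(), z.detach())
    return y, z
```

Because the gradient flows through a single application of the weights, training cost is decoupled from reasoning depth: the no-grad iterations supply the convergence, and the final step supplies the learning signal.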
Future Directions
Combining large‑scale LLM embedding spaces with recursive reasoning modules could leverage the broad knowledge of massive transformers while retaining the algorithmic efficiency of HRM and TRM. Such hybrids may achieve strong performance on reasoning benchmarks without the prohibitive parameter counts of ever‑larger language models.
Takeaways
- Recursive inference can improve reasoning performance without increasing model size, addressing tasks that single‑pass transformers cannot solve.
- Standard LLMs lack external memory and latent compression, making incompressible problems like sorting and Sudoku difficult for a one‑shot pass.
- HRM achieves state‑of‑the‑art ARC results with 27 M parameters by using three recursion levels and an outer refinement loop.
- TRM simplifies HRM through weight sharing and deep recursion, reaching 87 % ARC accuracy with only 7 M parameters.
- Future systems may embed large‑scale LLM knowledge into recursive modules, merging broad language understanding with efficient iterative reasoning.
Frequently Asked Questions
Why are recursive models considered more efficient than scaling up language models?
Recursive models reuse the same weights across multiple passes, allowing them to solve incompressible tasks with far fewer parameters. By iteratively refining a latent state, they avoid storing the entire context for each token, which reduces memory use, and their one-step truncated BPTT sidesteps the gradient explosion that destabilized classic RNN training.
How does the outer refinement loop work in HRM and TRM?
The outer refinement loop repeatedly applies the model to the input, updating a continuous latent state (Z) and local variables (ZL) until convergence. Fixed‑point iteration runs the network multiple times, treating the hidden state as a mini‑batch and enabling the system to solve tasks that require many sequential steps.