Real-Time Reasoning Agents in Evolving Environments

¹Tsinghua University   ²Shanghai Jiao Tong University
³Georgia Institute of Technology   ⁴Stanford University
*Co-leading authors
Figure 1: Overview of the AgileThinker architecture.

We create three real-time games, Freeway, Snake, and Overcooked, to study the challenge of real-time reasoning. In these games, agents must respond to dynamic environments both correctly and promptly to achieve high rewards. Experiments show that under cognitive load and time pressure, AgileThinker (ours), which engages two LLMs to combine System 1 and System 2 reasoning, greatly outperforms agents that engage only one LLM. Scores are normalized to [0, 1] for each game and then averaged.

Abstract

Agents in the real world need to make not only logical but also timely judgments, which demands continuous awareness of a dynamic environment where hazards emerge, opportunities arise, and other agents act, all while the agent's own reasoning is still unfolding. Despite significant advances in the reasoning capabilities of language models, existing approaches fail to account for this dynamic nature. We introduce real-time reasoning as a new problem formulation for bringing reasoning capabilities to agents operating in evolving environments and build a Real-Time Reasoning Gym to demonstrate it. We study two paradigms for deploying reasoning language models in agents: System 1 Agent, which employs reasoning models with bounded computation for rapid responses, and System 2 Agent, which allows extended computation for complex problems. Our experiments reveal that even state-of-the-art language models struggle to make logical and timely judgments under either paradigm. To address this limitation, we propose AgileThinker, a parallel architecture that simultaneously engages both reasoning systems. This approach demonstrates superior performance as task difficulty and time pressure increase, managing the trade-off between reasoning depth and response latency. Our work establishes real-time reasoning as a critical frontier for developing practical reasoning agents and provides a foundation for future research in temporally constrained artificial intelligence systems, highlighting a path toward real-time capable language agents.

Using the realtimegym Python Package

Installation

Install the Real-Time Reasoning Gym package from source by cloning the repository and running an editable pip install:

git clone git@github.com:wenyl22/RealtimeGym.git
cd RealtimeGym
pip install -e .

Quick Start

Get started with a simple example:

import realtimegym

# Create environment
env, seed, renderer = realtimegym.make('Freeway-v0')
obs, done = env.reset()

Available Environments

Freeway

Navigate through dynamic traffic with real-time decision making.

realtimegym.make('Freeway-v0')

Snake

Strategic planning for food collection while avoiding obstacles.

realtimegym.make('Snake-v0')

Overcooked

Cooperative cooking with coordination and task prioritization.

realtimegym.make('Overcooked-v0')

Agent Implementations

Reactive Agent

Fast, intuitive System 1

Always react quickly with bounded compute; no planning thread.


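# Pseudocode sketch of the control flow.
class ReactiveAgent:
  def think(self, timeout):
    # Start a fresh reactive thread on the latest observation, with an empty plan as context.
    start_reactive_thread(current_observation, "")
    run_reactive_thread(internal_budget)
    # If the thread is still reasoning when the budget runs out, force an answer.
    if reactive_thread_is_alive():
      s1_budget_forcing()
    action = get_reactive_thread_response()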
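
To make the budget-forcing step concrete, here is a minimal, self-contained sketch. It is not the package's implementation: `toy_reasoner`, `react_with_budget`, the `ACTION:` marker, and the action names are all hypothetical stand-ins, and the "forced" answer is emulated by reusing the most recent action mentioned in the truncated trace rather than by prompting a real model.

```python
from typing import Iterator, List, Tuple

ACTIONS = ("UP", "DOWN", "STAY")  # hypothetical action names for illustration


def toy_reasoner(observation: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming reasoning model (one token per item)."""
    trace = "gap in lane 2 so UP looks safe but car nears so STAY is safer ACTION:STAY"
    yield from trace.split()


def react_with_budget(observation: str, budget: int, default: str = "STAY") -> Tuple[str, bool]:
    """Read at most `budget` tokens; return (action, was_forced)."""
    trace: List[str] = []
    for i, token in enumerate(toy_reasoner(observation)):
        if i >= budget:
            break  # budget exhausted before the model committed to an answer
        if token.startswith("ACTION:"):
            return token.split(":", 1)[1], False  # finished within budget
        trace.append(token)
    # Forced answer (assumed strategy): latest action seen in the partial trace.
    for token in reversed(trace):
        if token in ACTIONS:
            return token, True
    return default, True


print(react_with_budget("obs", budget=16))
print(react_with_budget("obs", budget=8))
print(react_with_budget("obs", budget=3, default="DOWN"))
```

With a generous budget the toy model reaches its own conclusion; with a tight one the agent still answers on time, but from an earlier and possibly outdated stage of its reasoning.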

Planning Agent

Slow, deliberate System 2

Plan first within the full timeout, then execute the first action.


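# Pseudocode sketch of the control flow.
class PlanningAgent:
  def think(self, timeout):
    # Start a new planning thread only if the previous one has finished.
    if not planning_thread_is_alive():
      start_planning_thread(current_observation)
    run_planning_thread(timeout)
    # Adopt the new plan once the thread finishes; otherwise keep the old one.
    if not planning_thread_is_alive():
      plan = get_planning_thread_response()
    # Execute the next planned action and keep the rest for later steps.
    action = plan[0]; plan = plan[1:]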
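
The following toy simulation illustrates one consequence of this control flow: when a plan costs more than one step's budget, the agent acts on plans computed from older observations. Everything here is an assumption made for illustration (`ToyPlanningAgent`, the fixed `plan_cost`, the hard-coded plan, and the default action used while no plan exists); it only mirrors the thread-restart logic of the pseudocode above.

```python
from typing import List, Optional, Tuple


class ToyPlanningAgent:
    """Toy System 2 agent: one plan may take several think() calls to finish."""

    def __init__(self, plan_cost: int, default_action: str = "STAY") -> None:
        self.plan_cost = plan_cost            # tokens needed per plan (assumed constant)
        self.default_action = default_action  # assumed fallback while no plan exists
        self.plan: List[str] = []
        self.plan_origin: Optional[int] = None    # step the current plan was based on
        self.thread_origin: Optional[int] = None  # step the running thread started at
        self.tokens_spent = 0
        self.current_step = 0

    def observe(self, step: int) -> None:
        self.current_step = step

    def think(self, timeout: int) -> None:
        if self.thread_origin is None:  # no live thread: start one on this observation
            self.thread_origin = self.current_step
            self.tokens_spent = 0
        self.tokens_spent += timeout
        if self.tokens_spent >= self.plan_cost:  # thread finished: adopt its plan
            self.plan = ["UP", "UP", "STAY"]     # hard-coded toy plan
            self.plan_origin = self.thread_origin
            self.thread_origin = None

    def act(self) -> Tuple[str, Optional[int]]:
        """Return (action, staleness in steps); staleness is None for the fallback."""
        if not self.plan:
            return self.default_action, None
        return self.plan.pop(0), self.current_step - self.plan_origin


def run_episode(agent: ToyPlanningAgent, steps: int, timeout: int):
    trace = []
    for step in range(steps):
        agent.observe(step)
        agent.think(timeout)
        trace.append(agent.act())
    return trace


print(run_episode(ToyPlanningAgent(plan_cost=10), steps=6, timeout=4))
```

In this run the first two steps fall back to the default action, and every planned action that follows is based on an observation at least two steps old.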

AgileThinker

Parallel: System 1 + System 2

Plan in parallel with a fast reactive thread; use budget-aware forcing.


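# Pseudocode sketch of the control flow.
class AgileThinker:
  def think(self, timeout):
    if not planning_thread_is_alive():
      start_planning_thread(current_observation)
    # Reserve internal_budget for the reactive thread; planning gets the rest.
    run_planning_thread(timeout - internal_budget)
    # Read the planning output so far (it may be partial) and pass it on as context.
    plan = get_planning_thread_response()
    start_reactive_thread(current_observation, plan)
    run_reactive_thread(internal_budget)
    # Force an answer if the reactive thread has not finished within its budget.
    if reactive_thread_is_alive(): s1_budget_forcing()
    action = get_reactive_thread_response()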
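
The budget split in this pseudocode can be sketched with a small deterministic simulation. This is a toy under stated assumptions, not the authors' implementation: `ToyPlanner`, `reactive_decide`, and `agile_episode` are hypothetical names, the planner "streams" one planned action per fixed number of tokens, plans are indexed by step, and planner restarts are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ToyPlanner:
    """Hypothetical System 2 stand-in that reveals its plan gradually."""
    full_plan: List[str]
    tokens_per_action: int  # assumed cost of producing one planned action
    tokens_spent: int = 0

    def run(self, tokens: int) -> None:
        self.tokens_spent += max(tokens, 0)

    def partial_plan(self) -> List[str]:
        # Only the prefix the planner has had enough tokens to produce.
        return self.full_plan[: self.tokens_spent // self.tokens_per_action]


def reactive_decide(blocked: Set[str], hint: List[str], step: int, default: str = "STAY") -> str:
    """Hypothetical System 1 stand-in: follow the hint unless it is unsafe right now."""
    if step < len(hint) and hint[step] not in blocked:
        return hint[step]
    return default


def agile_episode(blocked_per_step: List[Set[str]], planner: ToyPlanner,
                  timeout: int, internal_budget: int) -> List[str]:
    actions = []
    for step, blocked in enumerate(blocked_per_step):
        planner.run(timeout - internal_budget)  # planning gets the remaining budget
        hint = planner.partial_plan()           # possibly incomplete plan
        actions.append(reactive_decide(blocked, hint, step))  # fresh observation wins
    return actions


hazards = [set(), {"UP"}, set(), set()]
plan = ["UP", "UP", "DOWN", "UP"]
print(agile_episode(hazards, ToyPlanner(plan, tokens_per_action=6), timeout=10, internal_budget=2))
print(agile_episode(hazards, ToyPlanner(plan, tokens_per_action=6), timeout=5, internal_budget=2))
```

In the first run the reactive step follows the plan but overrides it at step 1, where the planned move is blocked in the current observation. In the second run the planner falls behind under the tighter timeout, and the agent still returns a safe default on every step.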

Complete Example

import realtimegym
from realtimegym.agents.agile import AgileThinker
from realtimegym.prompts import freeway as prompt

env, seed, _ = realtimegym.make("Freeway-v0")
obs, done = env.reset()

log_file = "freeway_v0_agile.csv"
agent = AgileThinker(prompt, log_file, 'token')

while not done:
    agent.observe(obs)    # Fast observation
    agent.think(timeout=4096)    # Bounded thinking (token or seconds)
    action = agent.act()
    obs, done, reward, reset = env.step(action)

BibTeX

@article{wen2024realtime,
  title={Real-Time Reasoning Agents in Evolving Environments},
  author={Wen, Yule and Ye, Yixin and Zhang, Yanzhe and Yang, Diyi and Zhu, Hao},
  journal={International Conference on Learning Representations},
  year={2025},
  url={https://bleaves.github.io/real-time-reasoning/}
}