While large language models (LLMs) have shown impressive capabilities, their abilities to reason (e.g., via chain-of-thought prompting) and to act (e.g., via action plan generation) have largely been studied separately. This paper proposes ReAct, a new paradigm in which LLMs generate both reasoning traces and task-specific actions in an interleaved fashion.
This approach creates a powerful synergy:
- Reasoning helps the model to create, track, and update action plans, as well as handle exceptions.
- Acting lets the model query external sources, such as a Wikipedia API, to gather additional information and ground its reasoning in retrieved facts (a minimal loop is sketched after this list).
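
To make the interleaving concrete, here is a minimal sketch of a ReAct-style loop in Python. It is illustrative, not the paper's implementation: `llm` is a hypothetical completion callable that accepts a `stop` list, the `Search[...]`/`Finish[...]` action names are modeled on the paper's prompt format, and the lookup goes through the public MediaWiki API.

```python
import re
import requests

WIKI_API = "https://en.wikipedia.org/w/api.php"

def wikipedia_search(query: str) -> str:
    """Return the opening extract of the top Wikipedia result for `query`."""
    # Step 1: find the best-matching page title.
    hits = requests.get(WIKI_API, params={
        "action": "query", "list": "search", "srsearch": query,
        "srlimit": 1, "format": "json",
    }).json()["query"]["search"]
    if not hits:
        return "No results found."
    # Step 2: fetch the plain-text introduction of that page.
    pages = requests.get(WIKI_API, params={
        "action": "query", "prop": "extracts", "exintro": 1,
        "explaintext": 1, "titles": hits[0]["title"], "format": "json",
    }).json()["query"]["pages"]
    return next(iter(pages.values())).get("extract", "")[:500]

def react_loop(question: str, llm, max_steps: int = 6) -> str:
    """Alternate Thought -> Action -> Observation until the model finishes."""
    prompt = (
        "Answer the question by interleaving Thought, Action, and Observation.\n"
        "Actions: Search[query] queries Wikipedia; Finish[answer] ends the task.\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        # `llm` is a hypothetical stand-in for any completion model. Stopping
        # at "\nObservation" keeps the model from inventing its own evidence.
        step = llm(prompt, stop=["\nObservation"])
        prompt += step
        if done := re.search(r"Finish\[(.*?)\]", step):
            return done.group(1)  # the model committed to an answer
        if act := re.search(r"Search\[(.*?)\]", step):
            # Ground the next reasoning step in what was actually retrieved.
            prompt += f"\nObservation: {wikipedia_search(act.group(1))}\n"
    return "No answer within the step budget."
```

The key design point is that each observation is appended back into the prompt, so every subsequent thought is conditioned on real retrieved evidence rather than on the model's unverified memory.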
ReAct was evaluated on a diverse set of language and decision-making tasks. In question answering (HotpotQA) and fact verification (FEVER), interacting with a simple Wikipedia API mitigated the hallucination and error propagation common in chain-of-thought prompting. On interactive decision-making benchmarks such as ALFWorld and WebShop, ReAct outperformed imitation and reinforcement learning methods while being prompted with only one or two in-context examples. The resulting task-solving trajectories are also more human-like, interpretable, and trustworthy.
