When beam_size=1, ReasoningAgent behaves similarly to Chain-of-Thought (CoT) or O1-style reasoning, where only a single reasoning path is explored. This is useful for:
When using ReasoningAgent, you may find that the LLM API costs are quite high.
On the other hand, the thought tree is a good training dataset for SFT, DPO, and PPO.
After asking the ReasoningAgent a question, you only need to call the to_dict function to convert the thought tree into a dictionary that can be saved as a file.
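The export step can be sketched roughly as follows. This is a minimal, self-contained illustration and not the library's own code: ToyNode is a hypothetical stand-in for a thought-tree node, and its fields (content, value, children) are assumptions chosen for the example. Only the idea of calling a to_dict method and writing the result as JSON comes from the text above.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a thought-tree node (not the library's class)."""

    content: str
    value: float = 0.0
    children: list = field(default_factory=list)

    def to_dict(self) -> dict:
        # Recursively convert this node and its descendants to plain dicts.
        return {
            "content": self.content,
            "value": self.value,
            "children": [child.to_dict() for child in self.children],
        }


# Build a tiny tree: one question with two candidate reasoning steps.
root = ToyNode(
    "What is 2 + 3?",
    children=[
        ToyNode("Add the numbers: 5", value=0.9),
        ToyNode("Multiply the numbers: 6", value=0.1),
    ],
)

# Export the tree and write it to a JSON file in a temporary directory.
tree_path = os.path.join(tempfile.mkdtemp(), "thought_tree.json")
with open(tree_path, "w", encoding="utf-8") as f:
    json.dump(root.to_dict(), f, indent=2)

# Read the file back to confirm the structure survives the round trip.
with open(tree_path, encoding="utf-8") as f:
    loaded = json.load(f)
```

JSON keeps the saved tree human-readable and easy to post-process into training examples, at the cost of only supporting plain data types.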
You can also use pickle directly to save the thought tree.
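A minimal sketch of the pickle route, under stated assumptions: the tree here is built from types.SimpleNamespace objects as a hypothetical stand-in for the agent's real node objects, and the field names are illustrative only. The pickle calls themselves are from the Python standard library.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

# Hypothetical stand-in for a thought tree: nested objects with attributes.
root = SimpleNamespace(
    content="What is 2 + 3?",
    value=0.0,
    children=[
        SimpleNamespace(content="Add the numbers: 5", value=0.9, children=[]),
        SimpleNamespace(content="Multiply the numbers: 6", value=0.1, children=[]),
    ],
)

# Serialize the whole object graph to a binary file.
pickle_path = os.path.join(tempfile.mkdtemp(), "thought_tree.pkl")
with open(pickle_path, "wb") as f:
    pickle.dump(root, f)

# Load it back; the result is a new object with the same structure.
with open(pickle_path, "rb") as f:
    restored = pickle.load(f)
```

Unlike a JSON export, pickle restores full Python objects, but the file is only readable from Python, the classes involved must be importable at load time, and pickle files from untrusted sources should never be loaded.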