Regarding LLM dialogue agents, what characteristic defines their action space?

Answer

The entire vocabulary, often tens of thousands of tokens.

When reinforcement learning is applied to large language models (LLMs) for dialogue, each action is the choice of the next token, so the action space is discrete and spans the entire vocabulary, which typically contains tens of thousands of tokens. Because a response is a sequence of such choices, the number of possible responses grows exponentially with its length. This makes exploration expensive: simple strategies such as the epsilon-greedy exploration used in small Q-learning environments become infeasible, since they can try only a vanishing fraction of the available actions. Modern alignment techniques such as RLHF (reinforcement learning from human feedback) work largely because the base LLM is already highly capable after pre-training, so the RL fine-tuning step only has to steer this immense action space toward human-aligned objectives rather than teach language generation from scratch.
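
To make the scale concrete, here is a minimal Python sketch, not taken from any particular RLHF implementation. It treats one decoding step as sampling an action (a token ID) from a softmax policy over the vocabulary, and it counts how many distinct responses of a given length exist. The vocabulary size of 50,000, the random logits standing in for a model's output, and the helper names `log10_num_sequences`, `softmax`, and `sample_action` are all assumptions made for illustration.

```python
import math
import random


def log10_num_sequences(vocab_size: int, num_tokens: int) -> float:
    """Return log10 of the number of distinct token sequences of this length."""
    # There are vocab_size ** num_tokens sequences; work in logs to keep numbers small.
    return num_tokens * math.log10(vocab_size)


def softmax(logits):
    """Convert raw scores into a probability distribution over actions."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Sample one action (a token ID) from the softmax policy."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


VOCAB_SIZE = 50_000  # assumed vocabulary size, for illustration only
rng = random.Random(0)

# Hypothetical stand-in for the logits a language model would produce at one step.
logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
action = sample_action(logits, rng)

# A 20-token reply already has roughly 10^94 possible token sequences.
exponent = log10_num_sequences(VOCAB_SIZE, 20)
print(f"sampled token id: {action}")
print(f"possible 20-token responses: about 10^{exponent:.0f}")
```

Under these assumptions, a single step has 50,000 possible actions, and even a short reply has far more possible token sequences than any exploration scheme could enumerate. That is why the pre-trained policy's distribution, rather than uniform random exploration, has to do most of the work of proposing plausible actions.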
