What critical flaw caused rule-based dialog state tracking to suffer when the SLU provided multiple hypotheses?
It was brittle and ignored all but the top SLU result.
Rule-based dialog state tracking relied on hand-written update functions of the form $F(s, \tilde{u}_0) = s'$, where $s$ is the current dialog state, $\tilde{u}_0$ is the single best interpretation from the Spoken Language Understanding (SLU) module, and $s'$ is the updated state. The inherent limitation was that such rules could not represent the uncertainty present in real-world speech. If the SLU module returned an N-best list containing several plausible interpretations, the rule-based system typically discarded all but the highest-ranked hypothesis. As a result, an error in the top hypothesis from the ASR or SLU module passed straight into the dialog state, because the tracker had no way to reason over the alternatives.
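The top-1 behaviour described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `rule_based_update` and `track_top1`, the slot-value dictionary representation of the state, and the confidence scores are hypothetical and do not come from any particular system.

```python
from typing import Dict, List, Tuple

# Hypothetical representations, chosen only for this sketch.
State = Dict[str, str]                      # slot -> value
Hypothesis = Tuple[Dict[str, str], float]   # (slot-value pairs, SLU confidence)


def rule_based_update(state: State, top_hypothesis: Dict[str, str]) -> State:
    """Hand-written rule F(s, u_0) = s': overwrite slots with the top hypothesis."""
    new_state = dict(state)          # copy so the previous state is not mutated
    new_state.update(top_hypothesis)
    return new_state


def track_top1(state: State, nbest: List[Hypothesis]) -> Tuple[State, float]:
    """Apply the rule to the best hypothesis only.

    Also returns the total confidence of the hypotheses that were discarded,
    purely to make the ignored information visible.
    """
    if not nbest:
        return dict(state), 0.0
    ranked = sorted(nbest, key=lambda h: h[1], reverse=True)
    top_slots, _ = ranked[0]
    ignored_mass = sum(score for _, score in ranked[1:])
    return rule_based_update(state, top_slots), ignored_mass


# Made-up N-best list: the user asked for Thai food, but the top
# hypothesis misrecognises it as Italian.
nbest = [
    ({"food": "italian"}, 0.45),
    ({"food": "thai"}, 0.40),
    ({"food": "thai", "area": "north"}, 0.15),
]
new_state, ignored = track_top1({"area": "centre"}, nbest)
print(new_state)
print(round(ignored, 2))
```

In this made-up example the two hypotheses containing `thai` together carry about 0.55 of the confidence, more than the top-ranked `italian` reading at 0.45, yet the tracker commits to `italian` because the update rule only ever sees one hypothesis.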