Discriminative models fundamentally changed the objective by directly modeling which probability?
The posterior probability of the state given the observations and features, $P(s' \mid \mathbf{f}')$.
Discriminative modeling introduced a major paradigm shift by changing what the model was tasked to compute. Unlike generative models, which modeled *how* an observation was produced from a state (the likelihood), discriminative methods focused directly on predicting the target state. They modeled the posterior probability, specifically $b'(s') = P(s' \mid \text{features}')$, meaning the model directly calculated the probability of being in a new state ($s'$) given the observed features ($\text{features}'$) from the current dialogue turn. This shift allowed the model parameters to be learned automatically to maximize prediction accuracy based on observed data.
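As a rough illustration of modeling $P(s' \mid \text{features}')$ directly, the sketch below fits a small multinomial logistic regression (softmax) classifier by gradient ascent on the conditional log-likelihood. This is only one possible discriminative model, not necessarily the one used in any particular system. The two candidate states, the three-element feature vector (a bias term plus two hypothetical recognizer confidence scores), the toy training data, and the function names `belief` and `train` are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into a probability distribution."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def belief(weights, features):
    """Return b'(s') = P(s' | features') for every candidate state s'."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(scores)


def train(examples, n_states, n_features, lr=0.5, epochs=200):
    """Learn weights by stochastic gradient ascent on the log posterior."""
    weights = [[0.0] * n_features for _ in range(n_states)]
    for _ in range(epochs):
        for features, true_state in examples:
            probs = belief(weights, features)
            for s in range(n_states):
                # Gradient of log P(true_state | features) w.r.t. weights[s]
                error = (1.0 if s == true_state else 0.0) - probs[s]
                for j in range(n_features):
                    weights[s][j] += lr * error * features[j]
    return weights


# Hypothetical toy data: each item is (features', observed state s').
# features' = [bias, confidence for state 0, confidence for state 1]
EXAMPLES = [
    ([1.0, 0.9, 0.1], 0),
    ([1.0, 0.8, 0.2], 0),
    ([1.0, 0.2, 0.7], 1),
    ([1.0, 0.1, 0.9], 1),
]

weights = train(EXAMPLES, n_states=2, n_features=3)
b = belief(weights, [1.0, 0.85, 0.1])  # posterior over states for a new turn
print(b)
```

Note that nothing in this sketch models how the features were generated from a state. The weights are adjusted only so that the predicted posterior matches the states observed in the training data, which is the sense in which the parameters are learned to maximize prediction accuracy.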