Multiple Choice

In the RL trajectory example from Machine Learning Yearning, which component serves as the approximate maximization algorithm?

Related