A single agent runs greedy hill-climbing. It tries one change, checks the metric, keeps or reverts, and repeats. This works well for the first few hours. Then it stalls.
The problem is parameter interaction. Batch size, learning rate, and model depth affect each other. Halving the batch size might only help if you also raise the learning rate. A sequential agent tests these one at a time, so it never sees the interaction. It tries the batch change alone, sees no gain, reverts, and moves on. The combined change that would have worked never gets tested.
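The failure mode can be sketched in a few lines. The `score` function below is a hypothetical toy metric, invented for illustration, in which halving the batch size and raising the learning rate only pay off together; the greedy loop tests one change at a time and therefore stalls at the baseline.

```python
def score(config):
    """Toy metric (hypothetical) with a parameter interaction:
    the combined change wins, but each change alone looks worse."""
    batch, lr = config["batch"], config["lr"]
    if batch == 64 and lr == 3e-3:
        return 0.90   # both changes together: the real optimum
    if batch == 128 and lr == 1e-3:
        return 0.80   # baseline
    return 0.75       # either change alone reads as a regression

def greedy_hill_climb(config, candidates):
    """Try each single-parameter change; keep it only if the metric improves."""
    best = score(config)
    for key, value in candidates:
        trial = {**config, key: value}
        trial_score = score(trial)
        if trial_score > best:          # no gain from the lone change: revert
            config, best = trial, trial_score
    return config, best

baseline = {"batch": 128, "lr": 1e-3}
moves = [("batch", 64), ("lr", 3e-3)]
final, metric = greedy_hill_climb(baseline, moves)
# Greedy search reverts each single change and ends at the baseline
# (metric 0.80), never testing batch=64 together with lr=3e-3 (0.90).
```

Running this, `final` equals the baseline config: the agent evaluated both individual moves, saw 0.75 each time, and reverted, exactly the stall described above.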