To investigate the sources of exploration in the modular agent, we performed an ablation in which an experienced transition was saved to a given module's memory only if (a) the action taken was nongreedy (i.e., random) or (b) the action taken was that module's preferred action. In the monolithic case, to control for fewer transitions being stored overall, 30% of nongreedy actions were randomly selected to not be stored in memory, an amount roughly similar to the number of transitions that were not saved for the modular model.
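The two storage rules described above can be sketched as simple predicates. This is a minimal illustration rather than the authors' implementation: the function names, the `was_greedy` flag, and the `drop_prob` parameter are hypothetical, and the sketch assumes each module can report a single preferred action for the current state.

```python
import random


def should_store_modular(action, was_greedy, module_preferred_action):
    """Ablation rule for one module (hypothetical helper).

    Keep the transition if the action was nongreedy (random), or if it
    matched this module's own preferred action.
    """
    return (not was_greedy) or (action == module_preferred_action)


def should_store_monolithic(was_greedy, rng, drop_prob=0.3):
    """Control rule for the monolithic agent (hypothetical helper).

    Greedy transitions are always kept; nongreedy transitions are
    dropped at random with probability drop_prob (30% in the text).
    """
    if was_greedy:
        return True
    return rng.random() >= drop_prob


if __name__ == "__main__":
    rng = random.Random(0)
    # A greedy action that disagrees with the module's preference is skipped.
    print(should_store_modular(action=2, was_greedy=True, module_preferred_action=0))
    # Estimate the share of nongreedy transitions kept by the control rule.
    kept = sum(should_store_monolithic(False, rng) for _ in range(10_000))
    print(kept / 10_000)
```

Under this reading, the only transitions withheld from a module are greedy ones on which the chosen action differs from that module's preference, which is what the 30% drop rate in the monolithic control is meant to approximate.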