publications | Johan Obando-Ceron

2024

arXiv
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL

Gada Sokar, Johan Obando-Ceron, Hugo Larochelle, Aaron Courville and Pablo Samuel Castro

In submission, 2024

Abs arXiv Bib

The use of deep neural networks in reinforcement learning (RL) often suffers from performance degradation as model size increases. While soft mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reasons behind their effectiveness remain largely unknown. In this work we provide an in-depth analysis identifying the key factors driving this performance gain. We discover the surprising result that tokenizing the encoder output, rather than the use of multiple experts, is what is behind the efficacy of SoftMoEs. Indeed, we demonstrate that even with an appropriately scaled single expert, we are able to maintain the performance gains, largely thanks to tokenization.
@article{sokar2024dontflattentokenizeunlocking, title = {Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL}, author = {Ghada Sokar and Johan Obando-Ceron and Aaron Courville and Hugo Larochelle and Pablo Samuel Castro}, journal = {arXiv preprint arXiv:2410.01930}, year = {2024}, }
arXiv
Neuroplastic Expansion in Deep Reinforcement Learning

Jiashun Liu, Johan Obando-Ceron, Aaron Courville and Ling Pan

In submission, 2024

Abs arXiv Bib

The loss of plasticity in learning agents, analogous to the solidification of neural pathways in biological brains, significantly impedes learning and adaptation in reinforcement learning due to its non-stationary nature. To address this fundamental challenge, we propose a novel approach, Neuroplastic Expansion (NE), inspired by cortical expansion in cognitive science. NE maintains learnability and adaptability throughout the entire training process by dynamically growing the network from a smaller initial size to its full dimension. Our method is designed with three key components: (1) elastic neuron generation based on potential gradients, (2) dormant neuron pruning to optimize network expressivity, and (3) neuron consolidation via experience review to strike a balance in the plasticity-stability dilemma. Extensive experiments demonstrate that NE effectively mitigates plasticity loss and outperforms state-of-the-art methods across various tasks in MuJoCo and DeepMind Control Suite environments. NE enables more adaptive learning in complex, dynamic environments, which represents a crucial step towards transitioning deep reinforcement learning from static, one-time training paradigms to more flexible, continually adapting models.
ßß
@article{liu2024neuroplasticexpansiondeepreinforcement, title = {Neuroplastic Expansion in Deep Reinforcement Learning}, author = {Jiashun Liu and Johan Obando-Ceron and Aaron Courville and Ling Pan}, journal = {arXiv preprint arXiv:2410.07994}, year = {2024}, }
RLC
Mixture of Experts in a Mixture of RL settings

Timon Willi*, Johan Obando-Ceron*, Jakob Foerster, Karolina Dziugaite and Pablo Samuel Castro

In Reinforcement Learning Conference, 2024

Abs arXiv Bib

Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's learning capacity and ability to deal with non-stationarity. In this work, we shed more light on MoEs' ability to deal with non-stationarity and investigate MoEs in DRL settings with "amplified" non-stationarity via multi-task training, providing further evidence that MoEs improve learning capacity. In contrast to previous work, our multi-task results allow us to better understand the underlying causes for the beneficial effect of MoE in DRL training, the impact of the various MoE components, and insights into how best to incorporate them in actor-critic-based DRL networks. Finally, we also confirm results from previous work.
@article{willi2024mixtureexpertsmixturerl, title = {Mixture of Experts in a Mixture of RL settings}, author = {Timon Willi and Johan Obando-Ceron and Jakob Foerster and Karolina Dziugaite and Pablo Samuel Castro}, journal = {arXiv preprint arXiv:2406.18420}, year = {2024}, }
RLC
On the consistency of hyper-parameter selection in value-based deep reinforcement learning

Johan Obando-Ceron*, João G. M. Araújo*, Aaron Courville and Pablo Samuel Castro

In Reinforcement Learning Conference, 2024

Abs arXiv Bib

Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed technique. Despite their crucial impact on performance, hyper-parameter choices are frequently overshadowed by algorithmic advancements. This paper conducts an extensive empirical study focusing on the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, including the introduction of a new score to quantify the consistency and reliability of various hyper-parameters. Our findings not only help establish which hyper-parameters are most critical to tune, but also help clarify which tunings remain consistent across different training regimes.
@article{obandoceron2024consistencyhyperparameterselectionvaluebased, title = {On the consistency of hyper-parameter selection in value-based deep reinforcement learning}, author = {Johan Obando-Ceron and João G. M. Araújo and Aaron Courville and Pablo Samuel Castro}, journal = {arXiv preprint arXiv:2406.17523}, year = {2024}, url = {https://openreview.net/forum?id=seo9V9QRZp}, }
ICML
In value-based deep reinforcement learning, a pruned network is a good network

Johan Obando-Ceron, Aaron Courville and Pablo Samuel Castro

In Internation Conference on Machine Learning, 2024

Abs arXiv Bib

Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables value-based agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks, using only a small fraction of the full network parameters.
@inproceedings{ceron2024in, title = {In value-based deep reinforcement learning, a pruned network is a good network }, author = {Johan Samir Obando Ceron and Aaron Courville and Pablo Samuel Castro}, booktitle = {Forty-first International Conference on Machine Learning}, year = {2024}, url = {https://openreview.net/forum?id=seo9V9QRZp}, }
ICML
Mixtures of Experts Unlock Parameter Scaling for Deep RL

Johan Obando-Ceron*, Ghada Sokar*, Timon Willi*, Clare Lyle, Jesse Farebrother, Jakob Nicolaus Foerster, Gintare Karolina Dziugaite, Doina Precup and Pablo Samuel Castro

In Internation Conference on Machine Learning, 2024

Abs arXiv Bib

The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
@inproceedings{ceron2024mixtures, title = {Mixtures of Experts Unlock Parameter Scaling for Deep RL}, author = {Johan Samir Obando Ceron and Ghada Sokar and Timon Willi and Clare Lyle and Jesse Farebrother and Jakob Nicolaus Foerster and Gintare Karolina Dziugaite and Doina Precup and Pablo Samuel Castro},, booktitle = {Forty-first International Conference on Machine Learning}, year = {2024}, url = {https://openreview.net/forum?id=X9VMhfFxwn}, }
CPAL
JaxPruner: A concise library for sparsity research

Lee, Joo Hyung, Park, Wonpyo, Mitchell, Nicole Elyse and 4 more authors

In Conference on Parsimony and Learning, 2024

Abs arXiv Bib

This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX based libraries. We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine and FedJAX and provide baseline experiments on popular benchmarks. Jaxpruner is hosted at github. com/google-research/jaxpruner
@inproceedings{lee2024jaxpruner, title = {JaxPruner: A concise library for sparsity research}, author = {Lee, Joo Hyung and Park, Wonpyo and Mitchell, Nicole Elyse and Pilault, Jonathan and Ceron, Johan Samir Obando and Kim, Han-Byul and Lee, Namhoon and Frantar, Elias and Long, Yun and Yazdanbakhsh, Amir and others}, booktitle = {Conference on Parsimony and Learning}, year = {2024}, }

2023

NeurIPS
Small batch deep reinforcement learning

Johan Obando-Ceron, Marc G. Bellemare, and Pablo Samuel Castro

In Neural Information Processing Systems, 2023

Abs arXiv Bib

In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon..
@inproceedings{ceron2023small, title = {Small batch deep reinforcement learning}, author = {Ceron, Johan Samir Obando and Bellemare, Marc G and Castro, Pablo Samuel}, booktitle = {Thirty-seventh Conference on Neural Information Processing Systems}, year = {2023}, }
ICML
Bigger, better, faster: Human-level atari with human-level efficiency

Max Schwarzer*, Johan Obando-Ceron*, Aaron Courville, Marc G. Bellemare, Rishabh Agarwal* and Pablo Samuel Castro*.

In International Conference on Machine Learning, 2023

Abs arXiv Bib

We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github. com/google-research/google-research/tree/master/bigger_better_faster.
@inproceedings{schwarzer2023bigger, title = {Bigger, better, faster: Human-level atari with human-level efficiency}, author = {Schwarzer, Max and Ceron, Johan Samir Obando and Courville, Aaron and Bellemare, Marc G and Agarwal, Rishabh and Castro, Pablo Samuel}, booktitle = {International Conference on Machine Learning}, pages = {30365--30380}, year = {2023}, organization = {PMLR}, }

2021

ICML
Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research

Johan Obando-Ceron and Pablo Samuel Castro.

In International Conference on Machine Learning, 2021

Abs arXiv Bib

Since the introduction of DQN, a vast majority of reinforcement learning research has focused on reinforcement learning with deep neural networks as function approximators. New methods are typically evaluated on a set of environments that have now become standard, such as Atari 2600 games. While these benchmarks help standardize evaluation, their computational cost has the unfortunate side effect of widening the gap between those with ample access to computational resources, and those without. In this work we argue that, despite the community’s emphasis on large-scale environments, the traditional small-scale environments can still yield valuable scientific insights and can help reduce the barriers to entry for underprivileged communities. To substantiate our claims, we empirically revisit the paper which introduced the Rainbow algorithm [Hessel et al., 2018] and present some new insights into the algorithms used by Rainbow.
@inproceedings{pmlr-v139-ceron21a, title = {Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research}, author = {Ceron, Johan Samir Obando and Castro, Pablo Samuel}, booktitle = {Proceedings of the 38th International Conference on Machine Learning}, pages = {1373--1383}, year = {2021}, month = {18--24 Jul}, organization = {PMLR}, }