Decision Awareness
in Reinforcement Learning

Workshop at the International Conference on Machine Learning (ICML) 2022

July 22, HALL G T2500 at the Baltimore Convention Center

@DARL_ICML · #DARL_workshop_ICML2022


Recently, there has been rising interest in the RL community in studying and leveraging the interplay among the moving parts of RL systems, for instance by designing new loss functions, end-to-end procedures, and meta-learning frameworks. Decision awareness leads not only to a deeper theoretical understanding of the limitations of traditional RL approaches, but also to high-performing algorithms with promising applicability to real-world problems.
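To give a flavor of what "decision awareness" can mean in practice, here is a toy sketch (not any single paper's exact method) of a value-weighted model loss in the spirit of value-aware model learning: instead of weighting all state dimensions equally, prediction errors are weighted by the value function's sensitivity to each dimension. The arrays and gradient values below are illustrative assumptions.

```python
import numpy as np

def mse_loss(pred_next, true_next):
    # Standard maximum-likelihood-style objective: treats every
    # state dimension as equally important.
    return np.mean((pred_next - true_next) ** 2)

def value_weighted_loss(pred_next, true_next, value_grad):
    # Decision-aware objective: errors along directions where the
    # value function changes quickly are penalized more, so model
    # capacity is spent where it matters for the decision.
    return np.mean((value_grad ** 2) * (pred_next - true_next) ** 2)

# Toy example: the value depends strongly on dim 0, barely on dim 1.
true_next = np.array([1.0, 1.0])
pred_a = np.array([1.1, 1.0])       # errs on the decision-relevant dim
pred_b = np.array([1.0, 1.1])       # errs on the irrelevant dim
value_grad = np.array([10.0, 0.1])  # hypothetical value gradient

# Plain MSE cannot distinguish the two predictions...
assert np.isclose(mse_loss(pred_a, true_next), mse_loss(pred_b, true_next))
# ...while the value-weighted loss prefers the one that is accurate
# where the policy's return is sensitive.
assert value_weighted_loss(pred_a, true_next, value_grad) > \
       value_weighted_loss(pred_b, true_next, value_grad)
```

Several of the papers below (e.g. value-aware and gradient-aware model learning) develop principled versions of this idea.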

This page contains a non-exhaustive list of papers related to decision awareness in RL. We welcome additional resource suggestions via a pull request on GitHub.

References (Alphabetical Order)

Abachi, R., Ghavamzadeh, M., & Farahmand, A. (2020). Policy-aware model learning for policy gradient methods. arXiv preprint arXiv:2003.00030.

Agarwal, R., Liang, C., Schuurmans, D., & Norouzi, M. (2019). Learning to generalize from sparse and underspecified rewards. International Conference on Machine Learning, 130–140.

Amos, B., Rodriguez, I. D. J., Sacks, J., Boots, B., & Kolter, J. Z. (2018). Differentiable MPC for end-to-end planning and control. arXiv preprint arXiv:1810.13400.

Ayoub, A., Jia, Z., Szepesvari, C., Wang, M., & Yang, L. (2020). Model-based reinforcement learning with value-targeted regression. International Conference on Machine Learning, 463–474.

Balduzzi, D., & Ghifary, M. (2015). Compatible value gradients for reinforcement learning of continuous deep policies. arXiv preprint arXiv:1509.03005.

Dabney, W., Barreto, A., Rowland, M., Dadashi, R., Quan, J., Bellemare, M. G., & Silver, D. (2021). The Value-Improvement Path: Towards Better Representations for Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(8), 7160–7168.

Donti, P., Amos, B., & Kolter, J. Z. (2017). Task-based End-to-end Model Learning in Stochastic Optimization. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in Neural Information Processing Systems (Vol. 30).

D’Oro, P., & Jaśkowski, W. (2020). How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization. Advances in Neural Information Processing Systems, 33.

D’Oro, P., Metelli, A. M., Tirinzoni, A., Papini, M., & Restelli, M. (2020). Gradient-aware model-based policy search. Proceedings of the AAAI Conference on Artificial Intelligence, 34(4), 3801–3808.

East, S., Gallieri, M., Masci, J., Koutník, J., & Cannon, M. (2020). Infinite-Horizon Differentiable Model Predictive Control. 8th International Conference on Learning Representations, ICLR 2020.

Farahmand, A. (2018). Iterative Value-Aware Model Learning. Advances in Neural Information Processing Systems (NeurIPS), 9072–9083.

Farahmand, A., Barreto, A. M. S., & Nikovski, D. N. (2017). Value-Aware Loss Function for Model-based Reinforcement Learning. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 1486–1494.

Farquhar, G., Rocktäschel, T., Igl, M., & Whiteson, S. (2018). TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning. 6th International Conference on Learning Representations, ICLR 2018.

Grimm, C., Barreto, A. M. S., Singh, S., & Silver, D. (2020). The Value Equivalence Principle for Model-Based Reinforcement Learning. Advances in Neural Information Processing Systems (NeurIPS).

Kirsch, L., van Steenkiste, S., & Schmidhuber, J. (2019). Improving generalization in meta reinforcement learning using learned objectives. arXiv preprint arXiv:1910.04098.

Lambert, N., Amos, B., Yadan, O., & Calandra, R. (2020). Objective Mismatch in Model-based Reinforcement Learning. arXiv preprint arXiv:2002.04523.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., & others. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.

Nair, S., Savarese, S., & Finn, C. (2020). Goal-aware prediction: Learning to model what matters. International Conference on Machine Learning, 7207–7219.

Nikishin, E., Abachi, R., Agarwal, R., & Bacon, P.-L. (2021). Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation. arXiv preprint arXiv:2106.03273.

Oh, J., Hessel, M., Czarnecki, W. M., Xu, Z., van Hasselt, H., Singh, S., & Silver, D. (2020). Discovering reinforcement learning algorithms. arXiv preprint arXiv:2007.08794.

Oh, J., Singh, S., & Lee, H. (2017). Value prediction network. Advances in Neural Information Processing Systems, 30.

Rajendran, J., Lewis, R., Veeriah, V., Lee, H., & Singh, S. (2020). How Should an Agent Practice? Proceedings of the AAAI Conference on Artificial Intelligence, 34(4), 5454–5461.

Rajeswaran, A., Mordatch, I., & Kumar, V. (2020). A game theoretic framework for model based reinforcement learning. International Conference on Machine Learning, 7953–7963.

Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., & others. (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839), 604–609.

Silver, D., van Hasselt, H., Hessel, M., Schaul, T., Guez, A., Harley, T., Dulac-Arnold, G., Reichert, D., Rabinowitz, N., Barreto, A., & others. (2017). The predictron: End-to-end learning and planning. International Conference on Machine Learning, 3191–3199.

Silver, D., Singh, S., Precup, D., & Sutton, R. S. (2021). Reward is enough. Artificial Intelligence, 299, 103535.

Tamar, A., Wu, Y., Thomas, G., Levine, S., & Abbeel, P. (2016). Value iteration networks. Advances in Neural Information Processing Systems, 29.

Voelcker, C. A., Liao, V., Garg, A., & Farahmand, A. (2021). Value Gradient weighted Model-Based Reinforcement Learning. International Conference on Learning Representations.

Xu, H., Li, Y., Tian, Y., Darrell, T., & Ma, T. (2018). Algorithmic framework for model-based reinforcement learning with theoretical guarantees. arXiv preprint arXiv:1807.03858.

Xu, Z., van Hasselt, H., Hessel, M., Oh, J., Singh, S., & Silver, D. (2020). Meta-gradient reinforcement learning with an objective discovered online. arXiv preprint arXiv:2007.08433.

Xu, Z., van Hasselt, H., & Silver, D. (2018). Meta-gradient reinforcement learning. arXiv preprint arXiv:1805.09801.