Deep Reinforcement Learning Papers

A list of recent papers on deep reinforcement learning.

The papers are organized under manually defined bookmarks and, within each bookmark, sorted by date so that the most recent papers appear first.

Any suggestions and pull requests are welcome.

Bookmarks

All Papers

Value

Policy

Discrete Control

Continuous Control

Text Domain

Visual Domain

Robotics

Games

Monte-Carlo Tree Search

Inverse Reinforcement Learning

Improving Exploration

Multi-Task and Transfer Learning

Multi-Agent

Hierarchical Learning

All Papers

Model-Free Episodic Control, C. Blundell et al., arXiv, 2016.

Safe and Efficient Off-Policy Reinforcement Learning, R. Munos et al., arXiv, 2016.

Deep Successor Reinforcement Learning, T. D. Kulkarni et al., arXiv, 2016.

Unifying Count-Based Exploration and Intrinsic Motivation, M. G. Bellemare et al., arXiv, 2016.

Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks, R. Houthooft et al., arXiv, 2016.

Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.

Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.

Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks, R. Krishnamurthy et al., arXiv, 2016.

Benchmarking Deep Reinforcement Learning for Continuous Control, Y. Duan et al., ICML, 2016.

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.

Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, S. Levine et al., arXiv, 2016.

Continuous Deep Q-Learning with Model-based Acceleration, S. Gu et al., ICML, 2016.

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, C. Finn et al., arXiv, 2016.

Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.

Value Iteration Networks, A. Tamar et al., arXiv, 2016.

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks, J. N. Foerster et al., arXiv, 2016.

Asynchronous Methods for Deep Reinforcement Learning, V. Mnih et al., arXiv, 2016.

Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.

Memory-based control with recurrent neural networks, N. Heess et al., NIPS Workshop, 2015.

How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. Fran?ois-Lavet et al., NIPS Workshop, 2015.

Multiagent Cooperation and Competition with Deep Reinforcement Learning, A. Tampuu et al., arXiv, 2015.

Strategic Dialogue Management via Deep Reinforcement Learning, H. Cuayáhuitl et al., NIPS Workshop, 2015.

MazeBase: A Sandbox for Learning from Games, S. Sukhbaatar et al., arXiv, 2016.

Learning Simple Algorithms from Examples, W. Zaremba et al., arXiv, 2015.

Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, E. Parisotto et al., ICLR, 2016.

Better Computer Go Player with Neural Network and Long-term Prediction, Y. Tian et al., ICLR, 2016.

Policy Distillation, A. A. Rusu et al., ICLR, 2016.

Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.

Deep Reinforcement Learning with an Action Space Defined by Natural Language, J. He et al., arXiv, 2015.

Deep Reinforcement Learning in Parameterized Action Space, M. Hausknecht et al., ICLR, 2016.

Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control, F. Zhang et al., arXiv, 2015.

Generating Text with Deep Reinforcement Learning, H. Guo, arXiv, 2015.

ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources, J. Rajendran et al., arXiv, 2015.

Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, S. Mohamed and D. J. Rezende, arXiv, 2015.

Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.

Recurrent Reinforcement Learning: A Hybrid Approach, X. Li et al., arXiv, 2015.

Continuous control with deep reinforcement learning, T. P. Lillicrap et al., ICLR, 2016.

Language Understanding for Text-based Games Using Deep Reinforcement Learning, K. Narasimhan et al., EMNLP, 2015.

Giraffe: Using Deep Reinforcement Learning to Play Chess, M. Lai, arXiv, 2015.

Action-Conditional Video Prediction using Deep Networks in Atari Games, J. Oh et al., NIPS, 2015.

Learning Continuous Control Policies by Stochastic Value Gradients, N. Heess et al., NIPS, 2015.

Learning Deep Neural Network Policies with Continuous Memory States, M. Zhang et al., arXiv, 2015.

Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.

Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences, H. Mei et al., arXiv, 2015.

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, B. C. Stadie et al., arXiv, 2015.

Maximum Entropy Deep Inverse Reinforcement Learning, M. Wulfmeier et al., arXiv, 2015.

High-Dimensional Continuous Control Using Generalized Advantage Estimation, J. Schulman et al., ICLR, 2016.

End-to-End Training of Deep Visuomotor Policies, S. Levine et al., arXiv, 2015.

DeepMPC: Learning Deep Latent Features for Model Predictive Control, I. Lenz et al., RSS, 2015.

Universal Value Function Approximators, T. Schaul et al., ICML, 2015.

Deterministic Policy Gradient Algorithms, D. Silver et al., ICML, 2014.

Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.

Trust Region Policy Optimization, J. Schulman et al., ICML, 2015.

Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, X. Guo et al., NIPS, 2014.

Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.

Value

Model-Free Episodic Control, C. Blundell et al., arXiv, 2016.

Safe and Efficient Off-Policy Reinforcement Learning, R. Munos et al., arXiv, 2016.

Deep Successor Reinforcement Learning, T. D. Kulkarni et al., arXiv, 2016.

Unifying Count-Based Exploration and Intrinsic Motivation, M. G. Bellemare et al., arXiv, 2016.

Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.

Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.

Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks, R. Krishnamurthy et al., arXiv, 2016.

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.

Continuous Deep Q-Learning with Model-based Acceleration, S. Gu et al., ICML, 2016.

Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.

Value Iteration Networks, A. Tamar et al., arXiv, 2016.

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks, J. N. Foerster et al., arXiv, 2016.

Asynchronous Methods for Deep Reinforcement Learning, V. Mnih et al., arXiv, 2016.

Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.

How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. Fran?ois-Lavet et al., NIPS Workshop, 2015.

Multiagent Cooperation and Competition with Deep Reinforcement Learning, A. Tampuu et al., arXiv, 2015.

Strategic Dialogue Management via Deep Reinforcement Learning, H. Cuayáhuitl et al., NIPS Workshop, 2015.

Learning Simple Algorithms from Examples, W. Zaremba et al., arXiv, 2015.

Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.

Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.

Deep Reinforcement Learning with an Action Space Defined by Natural Language, J. He et al., arXiv, 2015.

Deep Reinforcement Learning in Parameterized Action Space, M. Hausknecht et al., ICLR, 2016.

Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control, F. Zhang et al., arXiv, 2015.

Generating Text with Deep Reinforcement Learning, H. Guo, arXiv, 2015.

Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.

Recurrent Reinforcement Learning: A Hybrid Approach, X. Li et al., arXiv, 2015.

Continuous control with deep reinforcement learning, T. P. Lillicrap et al., ICLR, 2016.

Language Understanding for Text-based Games Using Deep Reinforcement Learning, K. Narasimhan et al., EMNLP, 2015.

Action-Conditional Video Prediction using Deep Networks in Atari Games, J. Oh et al., NIPS, 2015.

Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, B. C. Stadie et al., arXiv, 2015.

Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.

Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.

Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.
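
Most of the value-based line above descends from the DQN target of Mnih et al. (2015), and several entries (Double DQN, dueling networks, prioritized replay) modify either that target or how transitions are sampled. As a quick reference, here is a minimal sketch of the double-DQN target (van Hasselt et al., 2015) on plain NumPy arrays; the function and variable names are illustrative, not taken from any paper's code.

```python
# Minimal double-DQN target (after van Hasselt et al., 2015),
# computed on NumPy arrays standing in for network outputs.
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """q_online_next, q_target_next: [batch, n_actions] Q-values at the
    next state from the online and target networks, respectively."""
    # The online network selects the greedy next action ...
    greedy = np.argmax(q_online_next, axis=1)
    # ... while the target network evaluates it; decoupling selection
    # from evaluation is what reduces Q-learning's overestimation bias.
    next_values = q_target_next[np.arange(len(greedy)), greedy]
    return rewards + gamma * (1.0 - dones) * next_values

# Toy batch of two transitions (the second one terminal):
print(double_dqn_targets(np.array([[1.0, 2.0], [0.5, 0.1]]),
                         np.array([[0.8, 1.5], [0.4, 0.2]]),
                         rewards=np.array([1.0, 0.0]),
                         dones=np.array([0.0, 1.0])))
```

Setting q_online_next equal to q_target_next recovers the original DQN target of Mnih et al.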

Policy

Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks, R. Houthooft et al., arXiv, 2016.

Benchmarking Deep Reinforcement Learning for Continuous Control, Y. Duan et al., ICML, 2016.

Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, S. Levine et al., arXiv, 2016.

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, C. Finn et al., arXiv, 2016.

Asynchronous Methods for Deep Reinforcement Learning, V. Mnih et al., arXiv, 2016.

Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

Memory-based control with recurrent neural networks, N. Heess et al., NIPS Workshop, 2015.

MazeBase: A Sandbox for Learning from Games, S. Sukhbaatar et al., arXiv, 2016.

ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources, J. Rajendran et al., arXiv, 2015.

Continuous control with deep reinforcement learning, T. P. Lillicrap et al., ICLR, 2016.

Learning Continuous Control Policies by Stochastic Value Gradients, N. Heess et al., NIPS, 2015.

High-Dimensional Continuous Control Using Generalized Advantage Estimation, J. Schulman et al., ICLR, 2016.

End-to-End Training of Deep Visuomotor Policies, S. Levine et al., arXiv, 2015.

Deterministic Policy Gradient Algorithms, D. Silver et al., ICML, 2014.

Trust Region Policy Optimization, J. Schulman et al., ICML, 2015.
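
Several of the policy-gradient papers above (TRPO, A3C, and the GAE paper itself) rely on an advantage estimate built from a learned value function. Below is a minimal reading of generalized advantage estimation (Schulman et al., 2016); the code is a sketch of the published formula, not the authors' implementation.

```python
# Sketch of generalized advantage estimation (Schulman et al., 2016):
#   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
#   A_t     = sum_{l >= 0} (gamma * lam)^l * delta_{t+l}
# computed in one backward pass over a finished trajectory.
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """rewards: [T]; values: [T + 1] (a bootstrap value is appended)."""
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

print(gae(np.array([1.0, 1.0, 1.0]), np.array([0.5, 0.5, 0.5, 0.0])))
```

lam = 1 recovers the Monte-Carlo advantage (return minus baseline); lam = 0 gives the one-step TD error, trading variance for bias.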

Discrete Control

Model-Free Episodic Control, C. Blundell et al., arXiv, 2016.

Safe and Efficient Off-Policy Reinforcement Learning, R. Munos et al., arXiv, 2016.

Deep Successor Reinforcement Learning, T. D. Kulkarni et al., arXiv, 2016.

Unifying Count-Based Exploration and Intrinsic Motivation, M. G. Bellemare et al., arXiv, 2016.

Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.

Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.

Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks, R. Krishnamurthy et al., arXiv, 2016.

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.

Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.

Value Iteration Networks, A. Tamar et al., arXiv, 2016.

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks, J. N. Foerster et al., arXiv, 2016.

Asynchronous Methods for Deep Reinforcement Learning, V. Mnih et al., arXiv, 2016.

Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.

How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. Fran?ois-Lavet et al., NIPS Workshop, 2015.

Multiagent Cooperation and Competition with Deep Reinforcement Learning, A. Tampuu et al., arXiv, 2015.

Strategic Dialogue Management via Deep Reinforcement Learning, H. Cuayáhuitl et al., NIPS Workshop, 2015.

Learning Simple Algorithms from Examples, W. Zaremba et al., arXiv, 2015.

Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.

Better Computer Go Player with Neural Network and Long-term Prediction, Y. Tian et al., ICLR, 2016.

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, E. Parisotto et al., ICLR, 2016.

Policy Distillation, A. A. Rusu et al., ICLR, 2016.

Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.

Deep Reinforcement Learning with an Action Space Defined by Natural Language, J. He et al., arXiv, 2015.

Deep Reinforcement Learning in Parameterized Action Space, M. Hausknecht et al., ICLR, 2016.

Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control, F. Zhang et al., arXiv, 2015.

Generating Text with Deep Reinforcement Learning, H. Guo, arXiv, 2015.

ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources, J. Rajendran et al., arXiv, 2015.

Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, S. Mohamed and D. J. Rezende, arXiv, 2015.

Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.

Recurrent Reinforcement Learning: A Hybrid Approach, X. Li et al., arXiv, 2015.

Language Understanding for Text-based Games Using Deep Reinforcement Learning, K. Narasimhan et al., EMNLP, 2015.

Giraffe: Using Deep Reinforcement Learning to Play Chess, M. Lai, arXiv, 2015.

Action-Conditional Video Prediction using Deep Networks in Atari Games, J. Oh et al., NIPS, 2015.

Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.

Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences, H. Mei et al., arXiv, 2015.

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, B. C. Stadie et al., arXiv, 2015.

Universal Value Function Approximators, T. Schaul et al., ICML, 2015.

Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.

Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, X. Guo et al., NIPS, 2014.

Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.
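
Some of the entries above change the network head rather than the learning rule. For example, the dueling architecture (Wang et al., 2015) splits the Q-network into a state-value stream V(s) and an advantage stream A(s, a), recombined with the mean advantage subtracted for identifiability. A minimal sketch of that aggregation step, with illustrative names:

```python
# Dueling-network aggregation (after Wang et al., 2015):
#   Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
import numpy as np

def dueling_q(value, advantages):
    """value: [batch, 1] state values; advantages: [batch, n_actions]."""
    return value + advantages - advantages.mean(axis=1, keepdims=True)

print(dueling_q(np.array([[1.0]]), np.array([[0.5, -0.5, 0.0]])))
```

Subtracting the mean pins down the otherwise unidentifiable split between V and A, since any constant could be shifted between the two streams.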

Continuous Control

Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks, R. Houthooft et al., arXiv, 2016.

Benchmarking Deep Reinforcement Learning for Continuous Control, Y. Duan et al., ICML, 2016.

Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, S. Levine et al., arXiv, 2016.

Continuous Deep Q-Learning with Model-based Acceleration, S. Gu et al., ICML, 2016.

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, C. Finn et al., arXiv, 2016.

Asynchronous Methods for Deep Reinforcement Learning, V. Mnih et al., arXiv, 2016.

Memory-based control with recurrent neural networks, N. Heess et al., NIPS Workshop, 2015.

Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, S. Mohamed and D. J. Rezende, arXiv, 2015.

Continuous control with deep reinforcement learning, T. P. Lillicrap et al., ICLR, 2016.

Learning Continuous Control Policies by Stochastic Value Gradients, N. Heess et al., NIPS, 2015.

Learning Deep Neural Network Policies with Continuous Memory States, M. Zhang et al., arXiv, 2015.

High-Dimensional Continuous Control Using Generalized Advantage Estimation, J. Schulman et al., ICLR, 2016.

End-to-End Training of Deep Visuomotor Policies, S. Levine et al., arXiv, 2015.

DeepMPC: Learning Deep Latent Features for Model Predictive Control, I. Lenz et al., RSS, 2015.

Deterministic Policy Gradient Algorithms, D. Silver et al., ICML, 2014.

Trust Region Policy Optimization, J. Schulman et al., ICML, 2015.
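
Much of this thread builds on the deterministic policy gradient of Silver et al. (2014), which DDPG (Lillicrap et al., 2016) combines with deep networks, replay memory, and slowly tracking target networks. For reference, the gradient both papers estimate is:

```latex
\nabla_\theta J(\mu_\theta)
  = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[
      \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q^{\mu}(s, a)\big|_{a = \mu_\theta(s)}
    \right]
```

That is, the critic's action gradient is chained through the deterministic actor. DDPG stabilizes the resulting updates with soft target updates, theta' <- tau * theta + (1 - tau) * theta' for small tau, so the targets drift slowly behind the online networks.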

Text Domain

Strategic Dialogue Management via Deep Reinforcement Learning, H. Cuayáhuitl et al., NIPS Workshop, 2015.

MazeBase: A Sandbox for Learning from Games, S. Sukhbaatar et al., arXiv, 2016.

Deep Reinforcement Learning with an Action Space Defined by Natural Language, J. He et al., arXiv, 2015.

Generating Text with Deep Reinforcement Learning, H. Guo, arXiv, 2015.

Language Understanding for Text-based Games Using Deep Reinforcement Learning, K. Narasimhan et al., EMNLP, 2015.

Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences, H. Mei et al., arXiv, 2015.

Visual Domain

Model-Free Episodic Control, C. Blundell et al., arXiv, 2016.

Deep Successor Reinforcement Learning, T. D. Kulkarni et al., arXiv, 2016.

Unifying Count-Based Exploration and Intrinsic Motivation, M. G. Bellemare et al., arXiv, 2016.

Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.

Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.

Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks, R. Krishnamurthy et al., arXiv, 2016.

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.

Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, S. Levine et al., arXiv, 2016.

Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.

Value Iteration Networks, A. Tamar et al., arXiv, 2016.

Asynchronous Methods for Deep Reinforcement Learning, V. Mnih et al., arXiv, 2016.

Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.

Memory-based control with recurrent neural networks, N. Heess et al., NIPS Workshop, 2015.

How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. Fran?ois-Lavet et al., NIPS Workshop, 2015.

Multiagent Cooperation and Competition with Deep Reinforcement Learning, A. Tampuu et al., arXiv, 2015.

Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, E. Parisotto et al., ICLR, 2016.

Better Computer Go Player with Neural Network and Long-term Prediction, Y. Tian et al., ICLR, 2016.

Policy Distillation, A. A. Rusu et al., ICLR, 2016.

Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.

Deep Reinforcement Learning in Parameterized Action Space, M. Hausknecht et al., ICLR, 2016.

Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control, F. Zhang et al., arXiv, 2015.

Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, S. Mohamed and D. J. Rezende, arXiv, 2015.

Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.

Continuous control with deep reinforcement learning, T. P. Lillicrap et al., ICLR, 2016.

Giraffe: Using Deep Reinforcement Learning to Play Chess, M. Lai, arXiv, 2015.

Action-Conditional Video Prediction using Deep Networks in Atari Games, J. Oh et al., NIPS, 2015.

Learning Continuous Control Policies by Stochastic Value Gradients, N. Heess et al., NIPS, 2015.

Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, B. C. Stadie et al., arXiv, 2015.

High-Dimensional Continuous Control Using Generalized Advantage Estimation, J. Schulman et al., ICLR, 2016.

End-to-End Training of Deep Visuomotor Policies, S. Levine et al., arXiv, 2015.

Universal Value Function Approximators, T. Schaul et al., ICML, 2015.

Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.

Trust Region Policy Optimization, J. Schulman et al., ICML, 2015.

Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, X. Guo et al., NIPS, 2014.

Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.

Robotics

Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks, R. Houthooft et al., arXiv, 2016.

Benchmarking Deep Reinforcement Learning for Continuous Control, Y. Duan et al., ICML, 2016.

Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, S. Levine et al., arXiv, 2016.

Continuous Deep Q-Learning with Model-based Acceleration, S. Gu et al., ICML, 2016.

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, C. Finn et al., arXiv, 2016.

Asynchronous Methods for Deep Reinforcement Learning, V. Mnih et al., arXiv, 2016.

Memory-based control with recurrent neural networks, N. Heess et al., NIPS Workshop, 2015.

Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control, F. Zhang et al., arXiv, 2015.

Learning Continuous Control Policies by Stochastic Value Gradients, N. Heess et al., NIPS, 2015.

Learning Deep Neural Network Policies with Continuous Memory States, M. Zhang et al., arXiv, 2015.

High-Dimensional Continuous Control Using Generalized Advantage Estimation, J. Schulman et al., ICLR, 2016.

End-to-End Training of Deep Visuomotor Policies, S. Levine et al., arXiv, 2015.

DeepMPC: Learning Deep Latent Features for Model Predictive Control, I. Lenz et al., RSS, 2015.

Trust Region Policy Optimization, J. Schulman et al., ICML, 2015.

Games

Model-Free Episodic Control, C. Blundell et al., arXiv, 2016.

Safe and Efficient Off-Policy Reinforcement Learning, R. Munos et al., arXiv, 2016.

Deep Successor Reinforcement Learning, T. D. Kulkarni et al., arXiv, 2016.

Unifying Count-Based Exploration and Intrinsic Motivation, M. G. Bellemare et al., arXiv, 2016.

Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.

Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.

Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks, R. Krishnamurthy et al., arXiv, 2016.

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.

Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks, J. N. Foerster et al., arXiv, 2016.

Asynchronous Methods for Deep Reinforcement Learning, V. Mnih et al., arXiv, 2016.

Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.

How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. Fran?ois-Lavet et al., NIPS Workshop, 2015.

Multiagent Cooperation and Competition with Deep Reinforcement Learning, A. Tampuu et al., arXiv, 2015.

MazeBase: A Sandbox for Learning from Games, S. Sukhbaatar et al., arXiv, 2016.

Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.

Better Computer Go Player with Neural Network and Long-term Prediction, Y. Tian et al., ICLR, 2016.

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, E. Parisotto et al., ICLR, 2016.

Policy Distillation, A. A. Rusu et al., ICLR, 2016.

Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.

Deep Reinforcement Learning with an Action Space Defined by Natural Language, J. He et al., arXiv, 2015.

Deep Reinforcement Learning in Parameterized Action Space, M. Hausknecht et al., ICLR, 2016.

Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, S. Mohamed and D. J. Rezende, arXiv, 2015.

Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.

Continuous control with deep reinforcement learning, T. P. Lillicrap et al., ICLR, 2016.

Language Understanding for Text-based Games Using Deep Reinforcement Learning, K. Narasimhan et al., EMNLP, 2015.

Giraffe: Using Deep Reinforcement Learning to Play Chess, M. Lai, arXiv, 2015.

Action-Conditional Video Prediction using Deep Networks in Atari Games, J. Oh et al., NIPS, 2015.

Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, B. C. Stadie et al., arXiv, 2015.

Universal Value Function Approximators, T. Schaul et al., ICML, 2015.

Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.

Trust Region Policy Optimization, J. Schulman et al., ICML, 2015.

Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, X. Guo et al., NIPS, 2014.

Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.

Monte-Carlo Tree Search

Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

Better Computer Go Player with Neural Network and Long-term Prediction, Y. Tian et al., ICLR, 2016.

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, X. Guo et al., NIPS, 2014.
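
A rough sketch of how these systems couple learned networks to search: AlphaGo's tree policy (Silver et al., 2016) descends the tree by maximizing Q(s, a) + u(s, a), where the bonus u(s, a) is proportional to a policy-network prior and decays with the visit count. The toy scorer below captures that selection rule; all names are illustrative, and the constant c_puct is a hypothetical knob, not a value from the paper.

```python
# Toy PUCT-style selection in the spirit of AlphaGo (Silver et al., 2016):
# pick argmax_a Q(s, a) + u(s, a), with u proportional to the prior
# and inversely related to the child's visit count.
import math

def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.0):
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + u

def select_child(children, parent_visits):
    """children: list of (q_value, prior, visit_count) tuples."""
    scores = [puct_score(q, p, parent_visits, n) for q, p, n in children]
    return scores.index(max(scores))

# A well-visited strong move vs. a rarely tried move with a high prior:
print(select_child([(0.6, 0.2, 10), (0.1, 0.7, 1)], parent_visits=11))
```

Early in search the prior term dominates, so the network guides expansion; as visits accumulate, the estimated values Q take over.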

Inverse Reinforcement Learning

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, C. Finn et al., arXiv, 2016.

Maximum Entropy Deep Inverse Reinforcement Learning, M. Wulfmeier et al., arXiv, 2015.
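
Both entries build on the maximum-entropy IRL framework of Ziebart et al. (2008): the demonstration log-likelihood gradient with respect to the reward parameters reduces to the gap between expert and expected state visitations, which Wulfmeier et al. backpropagate through a deep reward network. Schematically (our notation, not the papers'):

```latex
\frac{\partial \mathcal{L}_D}{\partial \theta}
  = \left( \mu_D - \mathbb{E}[\mu] \right)^{\top}
    \frac{\partial r_\theta}{\partial \theta}
```

where \mu_D are the demonstrations' state-visitation frequencies and \mathbb{E}[\mu] the expected visitations under the current reward r_\theta. Guided Cost Learning replaces the exact expectation with samples from a policy optimized under the current cost.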

Multi-Task and Transfer Learning

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, E. Parisotto et al., ICLR, 2016.

Policy Distillation, A. A. Rusu et al., ICLR, 2016.

ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources, J. Rajendran et al., arXiv, 2015.

Universal Value Function Approximators, T. Schaul et al., ICML, 2015.
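
A mechanism shared by Policy Distillation and Actor-Mimic is training a single student network to match (softened) teacher action distributions with a KL or cross-entropy loss. A minimal NumPy sketch with illustrative names; the temperature value here is a hypothetical choice, though Policy Distillation reports that sharpening the teacher's outputs with a low softmax temperature helps when distilling Q-values.

```python
# Sketch of a distillation loss in the spirit of Policy Distillation
# (Rusu et al., 2016): KL between the teacher's temperature-softened
# action distribution and the student's.
import numpy as np

def softmax(logits, tau=1.0):
    z = np.exp((logits - logits.max()) / tau)  # numerically stable softmax
    return z / z.sum()

def distillation_kl(teacher_logits, student_logits, tau=0.1):
    p = softmax(teacher_logits, tau)  # sharpened teacher targets
    q = softmax(student_logits)       # student action distribution
    return float(np.sum(p * (np.log(p) - np.log(q))))

print(distillation_kl(np.array([3.0, 1.0, 0.2]),
                      np.array([2.5, 1.2, 0.3])))
```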

Improving Exploration

Unifying Count-Based Exploration and Intrinsic Motivation, M. G. Bellemare et al., arXiv, 2016.

Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks, R. Houthooft et al., arXiv, 2016.

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.

Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.

Action-Conditional Video Prediction using Deep Networks in Atari Games, J. Oh et al., NIPS, 2015.

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, B. C. Stadie et al., arXiv, 2015.
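
The common thread here is adding a novelty bonus to the environment reward. The sketch below is the simplest tabular version, beta / sqrt(N(s)); the pseudo-count work above (Bellemare et al., 2016) replaces the table with a learned density model so the same idea applies to high-dimensional states. Names and the beta value are illustrative.

```python
# Tabular count-based exploration bonus: r_total = r_env + beta / sqrt(N(s)).
# A toy stand-in for the density-model pseudo-counts of Bellemare et al. (2016).
from collections import defaultdict
import math

class CountBonus:
    def __init__(self, beta=0.05):
        self.counts = defaultdict(int)
        self.beta = beta

    def __call__(self, state_key):
        self.counts[state_key] += 1  # update the visit count N(s)
        return self.beta / math.sqrt(self.counts[state_key])

bonus = CountBonus()
print(bonus("s42"), bonus("s42"))  # the bonus decays as a state is revisited
```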

Multi-Agent

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks, J. N. Foerster et al., arXiv, 2016.

Multiagent Cooperation and Competition with Deep Reinforcement Learning, A. Tampuu et al., arXiv, 2015.

Hierarchical Learning

Deep Successor Reinforcement Learning, T. D. Kulkarni et al., arXiv, 2016.

Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks, R. Krishnamurthy et al., arXiv, 2016.

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.
