This project is a fork of eleurent/phd-bibliography: references on Optimal Control, Reinforcement Learning and Motion Planning.
- **ExpectiMinimax**: Optimal strategy in games with chance nodes, Melkó E., Nagy B. (2007).
- **Sparse sampling**: A sparse sampling algorithm for near-optimal planning in large Markov decision processes, Kearns M. et al. (2002).
- **MCTS**: Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Coulom R. (2006).
- **UCT**: Bandit based Monte-Carlo Planning, Kocsis L., Szepesvári C. (2006).
- **OPD**: Optimistic Planning for Deterministic Systems, Hren J., Munos R. (2008).
- **OLOP**: Open Loop Optimistic Planning, Bubeck S., Munos R. (2010).
- **LGP**: Logic-Geometric Programming: An Optimization-Based Approach to Combined Task and Motion Planning, Toussaint M. (2015). 🎞️
- **AlphaGo**: Mastering the game of Go with deep neural networks and tree search, Silver D. et al. (2016).
- **AlphaGo Zero**: Mastering the game of Go without human knowledge, Silver D. et al. (2017).
- **AlphaZero**: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver D. et al. (2017).
- **TrailBlazer**: Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, Grill J. B., Valko M., Munos R. (2017).
- **MCTSnets**: Learning to search with MCTSnets, Guez A. et al. (2018).
- **ADI**: Solving the Rubik's Cube Without Human Knowledge, McAleer S. et al. (2018).
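The UCT entry above applies the UCB1 bandit rule at every node of a Monte-Carlo search tree: descend by repeatedly picking the child that maximizes its average return plus an exploration bonus. A minimal sketch of that selection rule, assuming each child is summarized as a (total return, visit count) pair (the function name and representation are illustrative, not from the paper):

```python
import math

def uct_select(children, c=1.4):
    """Pick the index of the child maximizing the UCT score
    (Kocsis & Szepesvari, 2006): empirical mean return plus an
    exploration bonus that shrinks with the child's visit count."""
    n_parent = sum(n for _, n in children)  # parent visits = sum of child visits

    def score(total, n):
        if n == 0:
            return float("inf")  # always try unvisited children first
        return total / n + c * math.sqrt(math.log(n_parent) / n)

    return max(range(len(children)), key=lambda i: score(*children[i]))
```

The constant `c` trades off exploration against exploitation; the paper's analysis uses √2 for returns in [0, 1], but it is usually tuned in practice.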
## Uncertain Dynamical Systems

- Minimax analysis of stochastic problems, Shapiro A., Kleywegt A. (2002).
- **Robust DP**: Robust Dynamic Programming, Iyengar G. (2005).
- Robust Planning and Optimization, Laumanns M. (2011). (lecture notes)
- Robust Markov Decision Processes, Wiesemann W., Kuhn D., Rustem B. (2012).
- **Coarse-Id**: On the Sample Complexity of the Linear Quadratic Regulator, Dean S., Mania H., Matni N., Recht B., Tu S. (2017).
- **Tube-MPPI**: Robust Sampling Based Model Predictive Control with Sparse Objective Information, Williams G. et al. (2018). 🎞️
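The Robust DP entry above replaces the exact Bellman backup with a worst case over an ambiguity set of transition kernels. A toy sketch, assuming (as a simplification of Iyengar's general setting) that the ambiguity set for each state-action pair is a finite list of candidate next-state distributions:

```python
def robust_value_iteration(models, rewards, gamma=0.9, iters=200):
    """Worst-case (robust) value iteration in the spirit of Iyengar (2005).
    models[s][a] is a list of candidate distributions over next states;
    the backup maximizes over actions and minimizes over candidates.
    The finite-list ambiguity set is an illustrative assumption."""
    n = len(rewards)
    v = [0.0] * n
    for _ in range(iters):
        # the comprehension reads the old v while building the new one
        v = [max(rewards[s] + gamma * min(sum(p[s2] * v[s2] for s2 in range(n))
                                          for p in models[s][a])
                 for a in range(len(models[s])))
             for s in range(n)]
    return v
```

With a single candidate kernel per state-action pair this reduces to standard value iteration; adding pessimistic candidates can only lower the computed values.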
- **UCB1/UCB2**: Finite-time Analysis of the Multiarmed Bandit Problem, Auer P., Cesa-Bianchi N., Fischer P. (2002).
- **kl-UCB**: The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, Garivier A., Cappé O. (2011).
- **KL-UCB**: Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, Cappé O. et al. (2013).
- **LUCB**: PAC Subset Selection in Stochastic Multi-armed Bandits, Kalyanakrishnan S. et al. (2012).
- **Track-and-Stop**: Optimal Best Arm Identification with Fixed Confidence, Garivier A., Kaufmann E. (2016).
- **M-LUCB/M-Racing**: Maximin Action Identification: A New Bandit Framework for Games, Garivier A., Kaufmann E., Koolen W. (2016).
- **LUCB-micro**: Structured Best Arm Identification with Fixed Confidence, Huang R. et al. (2017).
- Bayesian Optimization in AlphaGo, Chen Y. et al. (2018).
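The UCB1 rule of Auer et al. (2002), the backbone of several entries above, plays each arm once and then always pulls the arm with the highest empirical mean plus the confidence bonus √(2 ln t / nᵢ). A minimal sketch; the `pull` callback and the assumption that rewards lie in [0, 1] are conventions of this illustration:

```python
import math

def ucb1(pull, n_arms, horizon):
    """Minimal UCB1 (Auer et al., 2002).  pull(i) is assumed to return a
    reward in [0, 1] for arm i; returns how often each arm was played."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization round: play each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts
```

Because the bonus shrinks as an arm is played, suboptimal arms are revisited only logarithmically often, which yields the finite-time regret bound of the paper.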
- **Dyna**: Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, Sutton R. (1990).
- **UCRL2**: Near-optimal Regret Bounds for Reinforcement Learning, Jaksch T. et al. (2010).
- **PILCO**: PILCO: A Model-Based and Data-Efficient Approach to Policy Search, Deisenroth M., Rasmussen C. (2011). (talk)
- **DBN**: Probabilistic MDP-behavior planning for cars, Brechtel S. et al. (2011).
- **GPS**: End-to-End Training of Deep Visuomotor Policies, Levine S. et al. (2015). 🎞️
- **DeepMPC**: DeepMPC: Learning Deep Latent Features for Model Predictive Control, Lenz I. et al. (2015). 🎞️
- **SVG**: Learning Continuous Control Policies by Stochastic Value Gradients, Heess N. et al. (2015). 🎞️
- Optimal control with learned local models: Application to dexterous manipulation, Kumar V. et al. (2016). 🎞️
- **BPTT**: Long-term Planning by Short-term Prediction, Shalev-Shwartz S. et al. (2016). 🎞️ 1 | 2
- Deep visual foresight for planning robot motion, Finn C., Levine S. (2016). 🎞️
- **VIN**: Value Iteration Networks, Tamar A. et al. (2016). 🎞️
- **VPN**: Value Prediction Network, Oh J. et al. (2017).
- An LSTM Network for Highway Trajectory Prediction, Altché F., de La Fortelle A. (2017).
- **DistGBP**: Model-Based Planning with Discrete and Continuous Actions, Henaff M. et al. (2017). 🎞️ 1 | 2
- Prediction and Control with Temporal Segment Models, Mishra N. et al. (2017).
- **Predictron**: The Predictron: End-To-End Learning and Planning, Silver D. et al. (2017). 🎞️
- **MPPI**: Information Theoretic MPC for Model-Based Reinforcement Learning, Williams G. et al. (2017). 🎞️
- Learning Real-World Robot Policies by Dreaming, Piergiovanni A. et al. (2018).
- Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems, Albrecht S., Stone P. (2017).
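The Dyna entry above interleaves real experience with imagined transitions replayed from a learned model. A tabular Dyna-Q sketch; the `step(s, a) -> (next_state, reward, done)` interface is an assumed convention of this illustration, not from the paper:

```python
import random

def dyna_q(step, start, n_actions, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Tabular Dyna-Q in the spirit of Sutton (1990): after each real
    transition, replay planning_steps imagined transitions drawn from a
    learned deterministic model.  Returns the Q-table as a dict."""
    rng = random.Random(seed)
    q, model = {}, {}

    def backup(s, a, s2, r, done):
        target = r + (0.0 if done else
                      gamma * max(q.get((s2, b), 0.0) for b in range(n_actions)))
        q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))

    for _ in range(episodes):
        s, done = start, False
        while not done:
            # epsilon-greedy action selection on the current Q-table
            a = (rng.randrange(n_actions) if rng.random() < eps
                 else max(range(n_actions), key=lambda b: q.get((s, b), 0.0)))
            s2, r, done = step(s, a)
            backup(s, a, s2, r, done)
            model[(s, a)] = (s2, r, done)          # learn the model
            for _ in range(planning_steps):        # plan with imagined replay
                (ps, pa), (ps2, pr, pdone) = rng.choice(list(model.items()))
                backup(ps, pa, ps2, pr, pdone)
            s = s2
    return q
```

The planning loop is what distinguishes Dyna-Q from plain Q-learning: with `planning_steps=0` the two coincide, while larger values squeeze more value updates out of each real environment interaction.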
- **MILP**: Time-optimal coordination of mobile robots along specified paths, Altché F. et al. (2016). 🎞️
- **MIQP**: An Algorithm for Supervised Driving of Cooperative Semi-Autonomous Vehicles, Altché F. et al. (2017). 🎞️
- **SA-CADRL**: Socially Aware Motion Planning with Deep Reinforcement Learning, Chen Y. et al. (2017). 🎞️
- Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment, Galceran E. et al. (2017).
- Online decision-making for scalable autonomous systems, Wray K. et al. (2017).
- **MAgent**: MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence, Zheng L. et al. (2017). 🎞️
- Cooperative Motion Planning for Non-Holonomic Agents with Value Iteration Networks, Rehder E. et al. (2017).
- **COMA**: Counterfactual Multi-Agent Policy Gradients, Foerster J. et al. (2017).
- **FTW**: Human-level performance in first-person multiplayer games with population-based deep reinforcement learning, Jaderberg M. et al. (2018). 🎞️
## Applications to Autonomous Driving

- **DeepDriving**: DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving, Chen C. et al. (2015). 🎞️
- On the Sample Complexity of End-to-end Training vs. Semantic Abstraction Training, Shalev-Shwartz S. et al. (2016).
- Learning sparse representations in reinforcement learning with sparse coding, Le L., Kumaraswamy M., White M. (2017).
- World Models, Ha D., Schmidhuber J. (2018). 🎞️
- Learning to Drive in a Day, Kendall A. et al. (2018). 🎞️
- **MERLIN**: Unsupervised Predictive Memory in a Goal-Directed Agent, Wayne G. et al. (2018). 🎞️ 1 | 2 | 3 | 4 | 5 | 6
- Variational End-to-End Navigation and Localization, Amini A. et al. (2018). 🎞️
- Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks, Lee M. et al. (2018). 🎞️

## Learning from Demonstrations

- **QMDP-RCNN**: Reinforcement Learning via Recurrent Convolutional Neural Networks, Shankar T. et al. (2016). (talk)
- **DQfD**: Learning from Demonstrations for Real World Reinforcement Learning, Hester T. et al. (2017). 🎞️
- Find Your Own Way: Weakly-Supervised Segmentation of Path Proposals for Urban Autonomy, Barnes D., Maddern W., Posner I. (2016). 🎞️
- **GAIL**: Generative Adversarial Imitation Learning, Ho J., Ermon S. (2016).
- From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots, Pfeiffer M. et al. (2017). 🎞️
- **Branched**: End-to-end Driving via Conditional Imitation Learning, Codevilla F. et al. (2017). 🎞️ | talk
- **UPN**: Universal Planning Networks, Srinivas A. et al. (2018). 🎞️
- **DeepMimic**: DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, Peng X. B. et al. (2018). 🎞️
- **R2P2**: Deep Imitative Models for Flexible Inference, Planning, and Control, Rhinehart N. et al. (2018). 🎞️
## Inverse Reinforcement Learning

- **Projection**: Apprenticeship learning via inverse reinforcement learning, Abbeel P., Ng A. (2004).
- **MMP**: Maximum margin planning, Ratliff N. et al. (2006).
- **BIRL**: Bayesian inverse reinforcement learning, Ramachandran D., Amir E. (2007).
- **MEIRL**: Maximum Entropy Inverse Reinforcement Learning, Ziebart B. et al. (2008).
- **LEARCH**: Learning to search: Functional gradient techniques for imitation learning, Ratliff N., Silver D., Bagnell J. A. (2009).
- **CIOC**: Continuous Inverse Optimal Control with Locally Optimal Examples, Levine S., Koltun V. (2012). 🎞️
- **MEDIRL**: Maximum Entropy Deep Inverse Reinforcement Learning, Wulfmeier M. (2015).
- **GCL**: Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, Finn C. et al. (2016). 🎞️
- **RIRL**: Repeated Inverse Reinforcement Learning, Amin K. et al. (2017).
- Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning, Piot B. et al. (2017).

### Applications to Autonomous Driving

- Apprenticeship Learning for Motion Planning, with Application to Parking Lot Navigation, Abbeel P. et al. (2008).
- Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior, Ziebart B. et al. (2008).
- Planning-based Prediction for Pedestrians, Ziebart B. et al. (2009). 🎞️
- Learning for autonomous navigation, Bagnell A. et al. (2010).
- Learning Autonomous Driving Styles and Maneuvers from Expert Demonstration, Silver D. et al. (2012).
- Learning Driving Styles for Autonomous Vehicles from Demonstration, Kuderer M. et al. (2015).
- Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks, Sharifzadeh S. et al. (2016).
- Watch This: Scalable Cost-Function Learning for Path Planning in Urban Environments, Wulfmeier M. (2016). 🎞️
- Planning for Autonomous Cars that Leverage Effects on Human Actions, Sadigh D. et al. (2016).
- A Learning-Based Framework for Handling Dilemmas in Urban Automated Driving, Lee S., Seo S. (2017).
- **Dijkstra**: A Note on Two Problems in Connexion with Graphs, Dijkstra E. W. (1959).
- **A\***: A Formal Basis for the Heuristic Determination of Minimum Cost Paths, Hart P. et al. (1968).
- Planning Long Dynamically-Feasible Maneuvers For Autonomous Vehicles, Likhachev M., Ferguson D. (2008).
- Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenet Frame, Werling M., Kammel S. (2010). 🎞️
- 3D perception and planning for self-driving and cooperative automobiles, Stiller C., Ziegler J. (2012).
- Motion Planning under Uncertainty for On-Road Autonomous Driving, Xu W. et al. (2014).
- Monte Carlo Tree Search for Simulated Car Racing, Fischer J. et al. (2015). 🎞️
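A* performs best-first search guided by a heuristic that never overestimates the remaining cost. A grid-world sketch, assuming a 4-connected grid of 0 = free and 1 = blocked cells with unit step costs (so Manhattan distance is an admissible heuristic):

```python
import heapq

def astar(grid, start, goal):
    """Textbook A* (Hart et al., 1968) on a 4-connected grid.
    Returns the length of a shortest path, or None if unreachable."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    best_g = {start: 0}
    while open_heap:
        f, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale heap entry, a cheaper path was found later
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

Setting the heuristic to zero turns this into Dijkstra's algorithm from the preceding entry; a stronger admissible heuristic only prunes the search, never changes the answer.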
## Architecture and applications