TY - GEN
T1 - Blending autonomous exploration and apprenticeship learning
AU - Walsh, Thomas J.
AU - Hewlett, Daniel
AU - Morrison, Clayton T.
PY - 2011
Y1 - 2011
N2 - We present theoretical and empirical results for a framework that combines the benefits of apprenticeship and autonomous reinforcement learning. Our approach modifies an existing apprenticeship learning framework that relies on teacher demonstrations and does not necessarily explore the environment. The first change is replacing previously used Mistake Bound model learners with a recently proposed framework that melds the KWIK and Mistake Bound supervised learning protocols. The second change is introducing a communication of expected utility from the student to the teacher. The resulting system only uses teacher traces when the agent needs to learn concepts it cannot efficiently learn on its own.
AB - We present theoretical and empirical results for a framework that combines the benefits of apprenticeship and autonomous reinforcement learning. Our approach modifies an existing apprenticeship learning framework that relies on teacher demonstrations and does not necessarily explore the environment. The first change is replacing previously used Mistake Bound model learners with a recently proposed framework that melds the KWIK and Mistake Bound supervised learning protocols. The second change is introducing a communication of expected utility from the student to the teacher. The resulting system only uses teacher traces when the agent needs to learn concepts it cannot efficiently learn on its own.
UR - http://www.scopus.com/inward/record.url?scp=84860644652&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84860644652&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84860644652
SN - 9781618395993
T3 - Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
BT - Advances in Neural Information Processing Systems 24
T2 - 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
Y2 - 12 December 2011 through 14 December 2011
ER -