Reinforcement Learning

Offline RL Without Off-Policy Evaluation, NIPS’21
2023. 7. 21.

Paper link: https://proceedings.neurips.cc/paper_files/paper/2021/file/274a10ffa06e434f2a94df765cac6bf4-Paper.pdf

Abstract

Most prior approaches to offline reinforcement learning (RL) have taken an iterative actor-critic approach involving off-policy evaluation. In this paper we show that simply doing one step of constrained/regularized policy improvement using an on-policy Q estimate of the behavior ..
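To make the "one step" idea in the abstract concrete, here is a minimal tabular sketch. It is not the paper's implementation. It assumes a small discrete problem, a logged dataset of `(s, a, r, s_next, a_next, done)` tuples, and an exponentially weighted improvement step of the form pi(a|s) ∝ beta(a|s) · exp(Q(s,a)/tau), which is only one possible choice of constrained/regularized update. The function name `one_step_policy` and the parameters `tau`, `sweeps`, and `lr` are hypothetical names chosen for illustration.

```python
import math


def one_step_policy(dataset, n_states, n_actions,
                    gamma=0.9, tau=1.0, sweeps=200, lr=0.1):
    """Tabular sketch: fit beta, evaluate Q^beta on-policy, improve once.

    dataset: list of (s, a, r, s_next, a_next, done) tuples logged by the
    behavior policy. All names here are illustrative assumptions.
    """
    # 1. Estimate the behavior policy beta(a|s) by counting logged actions.
    counts = [[0] * n_actions for _ in range(n_states)]
    for s, a, *_ in dataset:
        counts[s][a] += 1
    beta = []
    for row in counts:
        total = sum(row)
        # Fall back to uniform for states that never appear in the data.
        beta.append([c / total if total else 1.0 / n_actions for c in row])

    # 2. On-policy (SARSA-style) evaluation of Q^beta. The bootstrap target
    #    uses the logged next action, so no off-policy evaluation is needed.
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s, a, r, s2, a2, done in dataset:
            target = r if done else r + gamma * q[s2][a2]
            q[s][a] += lr * (target - q[s][a])

    # 3. A single regularized improvement step (assumed form):
    #    pi(a|s) proportional to beta(a|s) * exp(Q(s,a) / tau).
    pi = []
    for s in range(n_states):
        q_max = max(q[s])  # subtract the max for numerical stability
        weights = [beta[s][a] * math.exp((q[s][a] - q_max) / tau)
                   for a in range(n_actions)]
        z = sum(weights)
        pi.append([w / z for w in weights])
    return beta, q, pi
```

Calling `one_step_policy(data, n_states, n_actions)` on logged transitions returns the estimated behavior policy, its Q table, and the improved policy. A smaller `tau` moves the result further from the behavior policy, while a larger `tau` keeps it closer. The key design point is that the Q estimate is never updated again after the policy changes.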