https://arxiv.org/abs/2303.05479
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets followed by fast online fine-tuning with limited interaction. However, existing offline RL methods tend to behave poorly during fine-tuning.
Soft Actor-Critic tends to overestimate its own value function, while CQL tends to underestimate it. Cal-QL's approach is to build a baseline (the value of a reference policy) and use it for calibration: the conservative Q-values are lower-bounded by that baseline, so they remain pessimistic without collapsing arbitrarily low.
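The calibration idea can be sketched in a few lines. This is a minimal NumPy sketch, not the paper's implementation: `q_pi` stands for Q-values at policy actions, `q_data` for Q-values at dataset actions, and `v_ref` for the reference-policy value estimate (e.g., Monte Carlo returns of the behavior policy); all three names are hypothetical. The CQL-style push-down term is clipped from below by `v_ref`, which is the Cal-QL modification.

```python
import numpy as np

def calql_regularizer(q_pi, q_data, v_ref, alpha=1.0):
    """Sketch of the Cal-QL conservative regularizer.

    Plain CQL pushes down Q-values at policy actions (q_pi) and pushes
    up Q-values at dataset actions (q_data). Cal-QL replaces q_pi with
    max(q_pi, v_ref), so the push-down never drives the estimate below
    the reference-policy value -- keeping it calibrated.
    """
    calibrated = np.maximum(q_pi, v_ref)  # lower-bound by the baseline
    return alpha * (calibrated.mean() - q_data.mean())
```

In the full method this term is added to the usual TD loss; once the estimates are calibrated, the online fine-tuning phase does not suffer the initial performance dip that plain CQL shows.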