
Reinforcement Learning

Cal-QL (Calibrated Q-Learning)

https://arxiv.org/abs/2303.05479

 

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets followed by fast online fine-tuning with limited interaction. However, existing offline RL methods tend to behave poorly during fine-tuning.


Soft Actor-Critic (SAC) tends to overestimate its own value estimates,

while CQL is overly conservative and underestimates them.

Cal-QL's approach: construct a baseline (the value of a reference policy) and use it to calibrate the learned Q-function, so conservatism does not push Q-values below that baseline.
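The calibration idea above can be sketched in a few lines. This is a minimal, simplified illustration (not the authors' implementation): in the CQL regularizer, the push-down term on out-of-distribution actions is lower-bounded by the reference value `v_ref` (standing in for the reference policy's value $V^\mu(s)$, an assumption for this sketch), so the Q-function is never forced below the baseline.

```python
import numpy as np

def calql_conservative_term(q_ood, q_data, v_ref):
    """Cal-QL-style calibrated version of the CQL conservative penalty.

    q_ood:  Q-values at out-of-distribution (policy) actions, shape (N,)
    q_data: Q-values at dataset actions, shape (N,)
    v_ref:  reference-policy value estimates V^mu(s), shape (N,)
    """
    # Plain CQL would push q_ood down without limit; Cal-QL clips the
    # push-down target at the reference value, preventing the
    # over-conservatism that hurts online fine-tuning.
    calibrated_q_ood = np.maximum(q_ood, v_ref)
    # Push down (calibrated) OOD Q-values, push up Q-values on data actions.
    return calibrated_q_ood.mean() - q_data.mean()

# Example: an OOD Q-value of -5 is replaced by the baseline 0 before
# being penalized, so the penalty stops driving it further down.
loss = calql_conservative_term(
    q_ood=np.array([-5.0, 1.0]),
    q_data=np.array([1.0, 1.0]),
    v_ref=np.array([0.0, 0.0]),
)
```

In full training this term would be added (with a weight) to the standard Bellman TD loss; only the `np.maximum` clipping distinguishes it from vanilla CQL.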