YSDA RL course: https://github.com/yandexdataschool/Practical_RL
Multi-armed bandit: https://docs.google.com/presentation/d/1LGC4nuzd4_MjNsmZsER7jffFmnOar7lpZF2X-Y46uNE/edit?usp=sharing
RL for Sequence Models / Self-Critical Sequence Training: https://docs.google.com/presentation/d/1ubeRDDjOYo7LoFnXoNGDxCAMC5kszSQlRXEaWwBoMl8/edit?usp=sharing
PyTorch implementation of SCST for image captioning: https://github.com/ruotianluo/self-critical.pytorch
Review of the CodeRL paper:
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning