YSDA RL course: https://github.com/yandexdataschool/Practical_RL

Multi-armed bandit: https://docs.google.com/presentation/d/1LGC4nuzd4_MjNsmZsER7jffFmnOar7lpZF2X-Y46uNE/edit?usp=sharing

RL for Sequence Models / Self-Critical Sequence Training: https://docs.google.com/presentation/d/1ubeRDDjOYo7LoFnXoNGDxCAMC5kszSQlRXEaWwBoMl8/edit?usp=sharing

PyTorch implementation of SCST for image captioning: https://github.com/ruotianluo/self-critical.pytorch

Review of the CodeRL paper:

CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

Tasks