TikTok interview question

Describe the GRPO loss and other RL algorithms.
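
For reference while answering, here is a minimal sketch of the GRPO (Group Relative Policy Optimization) loss in plain Python. GRPO keeps PPO's clipped probability-ratio surrogate but drops the learned value function: for each prompt it samples a group of responses and uses the group's normalized rewards as advantages, then adds a KL penalty toward a reference policy. The function names, the inputs as nested lists of per-token log-probabilities, the use of the population standard deviation, and the default values of `clip_eps` and `beta` are assumptions made for illustration, not a particular library's API.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_loss(logp_new, logp_old, logp_ref, rewards, clip_eps=0.2, beta=0.04):
    """GRPO loss for one prompt with G sampled responses.

    logp_new, logp_old, logp_ref: G lists of per-token log-probs under the
    current, sampling (old), and reference policies. rewards: G scalars.
    """
    advantages = group_relative_advantages(rewards)
    per_response = []
    for lp_new, lp_old, lp_ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
        token_terms = []
        for n, o, r in zip(lp_new, lp_old, lp_ref):
            # PPO-style clipped surrogate on the probability ratio
            ratio = math.exp(n - o)
            clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
            surrogate = min(ratio * adv, clipped * adv)
            # Non-negative KL estimator: pi_ref/pi - log(pi_ref/pi) - 1
            log_ref_ratio = r - n
            kl = math.exp(log_ref_ratio) - log_ref_ratio - 1.0
            token_terms.append(surrogate - beta * kl)
        # Average over tokens of one response, then over the group
        per_response.append(mean(token_terms))
    # The objective is maximized, so the loss is its negative
    return -mean(per_response)
```

Compared with PPO, the only structural change in this sketch is where the advantage comes from: PPO would compute it from a critic (for example with GAE), while here every token of a response shares that response's group-normalized reward.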