Show HN: RULER – Easily apply RL to any agent

Using GRPO to Beat o1, o3-mini and R1 at “Temporal Clue”

Using reinforcement learning and $4.80 of GPU time to find the best HN post

OpenPipe Mixture of Agents: Outperform GPT-4 at 1/25th the Cost

Mistral 7B Fine-Tune Optimized

Is AI the next crypto? Insights from HN comments