Training a Rust 1.5B Coder LM with Reinforcement Learning (GRPO)