Even (very) noisy LLM evaluators are useful for improving AI agents

Fine-tuned small LLMs can beat large ones with programmatic data curation

Reverse Engineering Cursor's LLM Client