Terminal-Bench: a benchmark for AI agents in terminal environments