Experimentation Matters: Why Nuenki isn't using pairwise evaluations

Evaluating the difficulty of a sentence in mere microseconds

Which LLMs are best at low-latency translation?