Many SWE-bench-Passing PRs would not be merged

We are changing our developer productivity experiment design

Measuring AI Ability to Complete Long Tasks

Measuring the impact of AI on experienced open-source developer productivity

Measuring the Impact of AI on Experienced Open-Source Developer Productivity [pdf]