Why SWE-bench Verified no longer measures frontier coding capabilities

(openai.com)

7 points | by tedsanders a day ago

No comments yet.