DeepSWE: A contamination-free benchmark for long-horizon coding agents

(deepswe.datacurve.ai)

62 points | by ammar_x 3 days ago

22 comments