Artificial Analysis Coding Agent Benchmarks

Artificial Analysis' Coding Agent Benchmarks is a good addition to their benchmark suite. It shows that the harness has a significant influence on results (e.g. Opus 4.7 medium in different harnesses below). It also shows the work Cursor have done on theirs!

I wish there were a bit more variety, however:

Additional harnesses, e.g. Pi
Additional models, e.g. MiMo, Grok

But it's a great start!

LinkedIn post