AI coding agent benchmark — SWE-Lancer & Exercism TypeScript tasks

| Agent | Model | Provider | Tier | Solved | Total Time | Date | Details |
|---|---|---|---|---|---|---|---|
| GitHub Copilot CLI | claude-opus-4.6 | anthropic | C | 2/5 | 1.1h | 2026-04-07 | Details |
| GitHub Copilot CLI | claude-sonnet-4.6 | anthropic | A | 4/5 | 1.1h | 2026-04-07 | Details |
| Claude Code | claude-opus-4-6 | anthropic | D | 1/5 | 31.2m | 2026-04-05 | Details |
| Gemini CLI | gemini-2.5-flash | google | C | 2/5 | 18.8m | 2026-04-05 | Details |
| Codex CLI | gpt-5.4 | openai | B | 3/5 | 26.2m | 2026-04-05 | Details |
| Codex CLI | gpt-5.4-mini | openai | F | 0/5 | 28.1m | 2026-04-05 | Details |
| Gemini CLI | gemini-3.1-pro-preview | google | B | 3/5 | 1.2h | 2026-04-05 | Details |
| Gemini CLI | gemini-3-flash-preview | google | D | 1/5 | 53.4m | 2026-04-05 | Details |
| Claude Code | claude-haiku-4-5 | anthropic | D | 1/5 | 34.4m | 2026-04-04 | Details |
| Claude Code | claude-sonnet-4-6 | anthropic | B | 3/5 | 48.1m | 2026-04-04 | Details |
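
The Tier column appears to be a direct function of the Solved count (4/5 → A, 3/5 → B, 2/5 → C, 1/5 → D, 0/5 → F). A minimal TypeScript sketch of that mapping, assuming tiers are assigned purely by tasks solved; `tierFor` is a hypothetical helper, and the tier for a 5/5 run does not appear in this run so it is left undefined:

```typescript
type Tier = "A" | "B" | "C" | "D" | "F";

// Tiers observed in the leaderboard map one-to-one onto solved counts:
// 4/5 → A, 3/5 → B, 2/5 → C, 1/5 → D, 0/5 → F.
function tierFor(solved: number): Tier {
  const byCount: Tier[] = ["F", "D", "C", "B", "A"]; // index = tasks solved
  const tier = byCount[solved];
  if (tier === undefined) {
    // 5/5 (or out-of-range) is not covered by the observed data.
    throw new RangeError(`no observed tier for ${solved}/5 solved`);
  }
  return tier;
}

console.log(tierFor(3)); // → "B" (e.g. Codex CLI gpt-5.4, 3/5)
```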

Per-task pass/fail results, by task ID:

| Agent / Model | 14958 | 15815_1 | 15193 | 14268 | 20079 |
|---|---|---|---|---|---|
| GitHub Copilot CLI claude-sonnet-4.6 | Pass | Pass | Fail | Pass | Pass |
| Codex CLI gpt-5.4 | Pass | Pass | Fail | Fail | Pass |
| Claude Code claude-sonnet-4-6 | Pass | Fail | Fail | Pass | Pass |
| Gemini CLI gemini-3.1-pro-preview | Pass | Pass | Fail | Pass | Fail |
| Gemini CLI gemini-2.5-flash | Pass | Pass | Fail | Fail | Fail |
| GitHub Copilot CLI claude-opus-4.6 | Fail | Fail | Fail | Pass | Pass |
| Claude Code claude-opus-4-6 | Fail | Fail | Fail | Pass | Fail |
| Claude Code claude-haiku-4-5 | Pass | Fail | Fail | Fail | Fail |
| Gemini CLI gemini-3-flash-preview | Fail | Fail | Fail | Pass | Fail |
| Codex CLI gpt-5.4-mini | Fail | Fail | Fail | Fail | Fail |