ts-bench

AI coding agent benchmark — SWE-Lancer & Exercism TypeScript tasks

Last updated: 2026-04-07
| Agent | Model | Provider | Tier | Solved | Total Time | Date |
|---|---|---|---|---|---|---|
| GitHub Copilot CLI | claude-opus-4.6 | anthropic | C | 2/5 | 1.1h | 2026-04-07 |
| GitHub Copilot CLI | claude-sonnet-4.6 | anthropic | A | 4/5 | 1.1h | 2026-04-07 |
| Claude Code | claude-opus-4-6 | anthropic | D | 1/5 | 31.2m | 2026-04-05 |
| Gemini CLI | gemini-2.5-flash | google | C | 2/5 | 18.8m | 2026-04-05 |
| Codex CLI | gpt-5.4 | openai | B | 3/5 | 26.2m | 2026-04-05 |
| Codex CLI | gpt-5.4-mini | openai | F | 0/5 | 28.1m | 2026-04-05 |
| Gemini CLI | gemini-3.1-pro-preview | google | B | 3/5 | 1.2h | 2026-04-05 |
| Gemini CLI | gemini-3-flash-preview | google | D | 1/5 | 53.4m | 2026-04-05 |
| Claude Code | claude-haiku-4-5 | openai | D | 1/5 | 34.4m | 2026-04-04 |
| Claude Code | claude-sonnet-4-6 | openai | B | 3/5 | 48.1m | 2026-04-04 |

| Agent | Model | 14958 | 15815_1 | 15193 | 14268 | 20079 |
|---|---|---|---|---|---|---|
| GitHub Copilot CLI | claude-sonnet-4.6 | Pass | Pass | Fail | Pass | Pass |
| Codex CLI | gpt-5.4 | Pass | Pass | Fail | Fail | Pass |
| Claude Code | claude-sonnet-4-6 | Pass | Fail | Fail | Pass | Pass |
| Gemini CLI | gemini-3.1-pro-preview | Pass | Pass | Fail | Pass | Fail |
| Gemini CLI | gemini-2.5-flash | Pass | Pass | Fail | Fail | Fail |
| GitHub Copilot CLI | claude-opus-4.6 | Fail | Fail | Fail | Pass | Pass |
| Claude Code | claude-opus-4-6 | Fail | Fail | Fail | Pass | Fail |
| Claude Code | claude-haiku-4-5 | Pass | Fail | Fail | Fail | Fail |
| Gemini CLI | gemini-3-flash-preview | Fail | Fail | Fail | Pass | Fail |
| Codex CLI | gpt-5.4-mini | Fail | Fail | Fail | Fail | Fail |