mgechev/revive — ProgramBench

🔥 ~6x faster, stricter, configurable, extensible, and beautiful drop-in replacement for golint

5,486 go medium

Generated Behavioral Tests: 727

Best Score: 48.3%

Results by Model

Score: percentage of hidden behavioral tests passed. Cost: total API cost in USD for this task instance. Calls: number of LLM API calls for this task instance.

#		Model	Score	Cost	Calls
1		GPT 5.5 (high) OpenAI	48.3%	$3.70	56
2		GPT 5.5 OpenAI	47.3%	$1.24	22
3		Claude Opus 4.7 (xhigh) Anthropic	46.4%	$2.67	70
4		Claude Opus 4.6 Anthropic	46.4%	$10.31	254
5		GPT 5.5 (xhigh) OpenAI	45.9%	$6.53	64
6		Claude Opus 4.7 Anthropic	43.1%	$2.52	62
7		Claude Sonnet 4.6 Anthropic	34.1%	$17.26	476
8		Claude Haiku 4.5 Anthropic	25.9%	$0.55	149
9		Gemini 3 Flash Google	25.0%	$0.24	61
10		Gemini 3.1 Pro Google	24.8%	$1.12	75
11		GPT 5.4 OpenAI	21.3%	$0.38	15
12		GPT 5.4 mini OpenAI	6.2%	$0.02	8
13		GPT 5 mini OpenAI	4.8%	$0.01	8
