universal-ctags/ctags — ProgramBench

A maintained ctags implementation

Generated behavioral tests: 2,258

Best score: 13.3%

Results by Model

Score: percentage of hidden behavioral tests passed. Cost: total API cost in USD for this task instance. Calls: number of LLM API calls for this task instance.

#	Model	Provider	Score	Cost	Calls
1	Claude Sonnet 4.6	Anthropic	13.3%	$58.28	659
2	GPT 5.5	OpenAI	9.3%	$0.92	13
3	Claude Opus 4.7 (xhigh)	Anthropic	8.2%	$14.35	173
4	GPT 5.5 (high)	OpenAI	4.4%	$3.65	31
5	GPT 5.4	OpenAI	3.6%	$0.20	8
6	Gemini 3 Flash	Google	1.7%	$0.27	58
7	Gemini 3.1 Pro	Google	1.6%	$1.44	118
8	Claude Haiku 4.5	Anthropic	1.0%	$0.71	110
9	GPT 5 mini	OpenAI	0.9%	$0.03	16
10	Claude Opus 4.7	Anthropic	0.5%	$0.29	22
11	GPT 5.4 mini	OpenAI	0.4%	$0.03	16
12	GPT 5.5 (xhigh)	OpenAI	0.0%	$3.75	34
13	Claude Opus 4.6	Anthropic	0.0%	$2.49	226

Click row to see model details