segmentio/chamber — ProgramBench


segmentio/chamber

CLI for managing secrets

2,588 go medium

Generated behavioral tests: 1,748

Best score: 88.0%

Results by Model

Score is the percentage of hidden behavioral tests passed. Cost is the total API cost in USD for this task instance. Calls is the number of LLM API calls for this task instance.

#	Model	Provider	Score	Cost	Calls
1	GPT 5.5 (xhigh)	OpenAI	88.0%	$15.51	99
2	Claude Opus 4.6	Anthropic	82.0%	$11.99	232
3	Claude Opus 4.7 (xhigh)	Anthropic	79.3%	$7.80	140
4	GPT 5.5 (high)	OpenAI	79.0%	$3.86	46
5	GPT 5.5	OpenAI	71.3%	$1.20	18
6	Claude Sonnet 4.6	Anthropic	62.1%	$14.17	359
7	GPT 5.4	OpenAI	56.2%	$0.24	9
8	Claude Haiku 4.5	Anthropic	43.0%	$0.78	117
9	Gemini 3 Flash	Google	40.3%	$0.21	77
10	GPT 5 mini	OpenAI	20.9%	$0.04	24
11	Gemini 3.1 Pro	Google	19.9%	$1.34	66
12	GPT 5.4 mini	OpenAI	14.5%	$0.03	9
13	Claude Opus 4.7	Anthropic	9.4%	$3.95	92
