sirwart/ripsecrets — ProgramBench

sirwart/ripsecrets

A command-line tool to prevent committing secret keys into your source code

Generated Behavioral Tests: 611

Best Score: 91.2%

Results by Model

#		Model	Score (% of hidden behavioral tests passed)	Cost (total API cost in USD for this task instance)	Calls (number of LLM API calls for this task instance)
1		GPT 5.5 (high) OpenAI	91.2%	$4.05	52
2		GPT 5.5 OpenAI	88.7%	$0.74	16
3		GPT 5.5 (xhigh) OpenAI	87.7%	$6.58	77
4		Claude Opus 4.7 (xhigh) Anthropic	77.4%	$15.13	210
5		Claude Opus 4.7 Anthropic	72.8%	$7.44	156
6		Claude Opus 4.6 Anthropic	63.8%	$16.13	268
7		Gemini 3 Flash Google	59.9%	$0.39	129
8		GPT 5.4 OpenAI	46.3%	$0.24	12
9		GPT 5 mini OpenAI	42.9%	$0.01	16
10		Gemini 3.1 Pro Google	19.6%	$2.14	115
11		Claude Haiku 4.5 Anthropic	18.0%	$1.51	186
12		Claude Sonnet 4.6 Anthropic	7.4%	$65.81	845
13		GPT 5.4 mini OpenAI	0.0%	$0.04	10
