cheat/cheat — ProgramBench


cheat allows you to create and view interactive cheatsheets on the command-line. It was designed to help remind *nix system administrators of options for commands that they use frequently, but not frequently enough to remember.

13,278 go

Generated Behavioral Tests: 297

Best Score: 71.7%

Results by Model

Score is the percentage of hidden behavioral tests passed. Cost is the total API cost in USD for this task instance. Calls is the number of LLM API calls for this task instance.

#	Model	Provider	Score	Cost	Calls
1	GPT 5.5 (xhigh)	OpenAI	71.7%	$9.93	81
2	GPT 5.5 (high)	OpenAI	66.7%	$4.69	60
3	Claude Opus 4.6	Anthropic	59.9%	$5.80	198
4	GPT 5.5	OpenAI	57.6%	$0.95	14
5	Claude Sonnet 4.6	Anthropic	50.5%	$14.36	373
6	Claude Opus 4.7	Anthropic	49.2%	$3.11	86
7	GPT 5.4	OpenAI	45.1%	$0.40	10
8	Gemini 3.1 Pro	Google	22.9%	$1.20	70
9	Claude Haiku 4.5	Anthropic	22.6%	$0.91	159
10	Claude Opus 4.7 (xhigh)	Anthropic	14.1%	$16.68	242
11	GPT 5.4 mini	OpenAI	9.4%	$0.04	7
12	GPT 5 mini	OpenAI	5.7%	$0.03	14
13	Gemini 3 Flash	Google	3.0%	$0.33	110
