lz4/lz4 — ProgramBench


Extremely Fast Compression algorithm

11,781 c medium

Generated Behavioral Tests: 1,496

Best Score: 87.9%

Results by Model

#		Model	Score (% of hidden behavioral tests passed)	Cost (total API cost in USD)	Calls (number of LLM API calls)
1		GPT 5.5 (xhigh) OpenAI	87.9%	$8.10	84
2		GPT 5.5 (high) OpenAI	82.7%	$2.92	42
3		Claude Sonnet 4.6 Anthropic	82.7%	$25.44	501
4		Claude Opus 4.7 (xhigh) Anthropic	80.6%	$4.56	100
5		Claude Opus 4.7 Anthropic	80.0%	$5.56	155
6		Claude Opus 4.6 Anthropic	78.7%	$9.86	216
7		GPT 5.5 OpenAI	74.5%	$0.88	15
8		GPT 5.4 OpenAI	46.6%	$0.24	10
9		Gemini 3.1 Pro Google	36.4%	$0.84	57
10		Claude Haiku 4.5 Anthropic	22.1%	$0.84	97
11		GPT 5 mini OpenAI	14.4%	$0.01	9
12		GPT 5.4 mini OpenAI	9.6%	$0.04	10
13		Gemini 3 Flash Google	5.2%	$0.20	46
