quinn-rs/quinn — ProgramBench


Async-friendly QUIC implementation in Rust

Generated Behavioral Tests: 522

Best Score: 84.5%

Results by Model

#		Model	Score (% of hidden behavioral tests passed)	Cost (total API cost in USD for this task instance)	Calls (number of LLM API calls for this task instance)
1		GPT 5.5 (xhigh) OpenAI	84.5%	$3.83	40
2		GPT 5.5 (high) OpenAI	73.0%	$3.09	34
3		Claude Opus 4.7 (xhigh) Anthropic	63.4%	$5.00	117
4		GPT 5.5 OpenAI	62.3%	$1.31	18
5		Claude Opus 4.6 Anthropic	61.7%	$7.44	147
6		Claude Sonnet 4.6 Anthropic	57.3%	$16.65	375
7		Gemini 3.1 Pro Google	56.5%	$1.99	88
8		Claude Opus 4.7 Anthropic	53.8%	$2.27	54
9		GPT 5.4 OpenAI	46.9%	$0.31	10
10		Gemini 3 Flash Google	26.2%	$0.32	88
11		Claude Haiku 4.5 Anthropic	26.2%	$0.51	92
12		GPT 5.4 mini OpenAI	20.9%	$0.04	9
13		GPT 5 mini OpenAI	19.3%	$0.02	9
