abishekvashok/cmatrix — ProgramBench

Terminal based "The Matrix" like implementation

Generated Behavioral Tests: 507

Best Score: 100.0%

Results by Model

Score: percentage of hidden behavioral tests passed. Cost: total API cost in USD for this task instance. Calls: number of LLM API calls for this task instance.

#	Model	Provider	Score	Cost	Calls
1	GPT 5.5 (xhigh)	OpenAI	100.0%	$4.84	40
2	GPT 5.5 (high)	OpenAI	100.0%	$3.17	34
3	GPT 5.5	OpenAI	99.2%	$1.04	17
4	Claude Opus 4.6	Anthropic	97.0%	$5.99	139
5	Claude Opus 4.7 (xhigh)	Anthropic	96.3%	$10.74	178
6	Claude Sonnet 4.6	Anthropic	95.9%	$13.49	292
7	Claude Opus 4.7	Anthropic	94.9%	$4.44	130
8	Gemini 3.1 Pro	Google	93.9%	$2.94	84
9	GPT 5.4	OpenAI	91.7%	$0.37	8
10	Claude Haiku 4.5	Anthropic	85.6%	$0.44	72
11	GPT 5.4 mini	OpenAI	79.7%	$0.03	28
12	GPT 5 mini	OpenAI	72.2%	$0.01	7
13	Gemini 3 Flash	Google	65.5%	$0.17	51
