gromacs/gromacs — ProgramBench


Public/backup repository of the GROMACS molecular simulation toolkit. Please do not mine the metadata blindly; we use https://gitlab.com/gromacs/gromacs for code review and issue tracking.

901 cpp

Generated Behavioral Tests: 1,245

Best Score: 9.3%

Results by Model

#	Model	Score (% of hidden behavioral tests passed)	Cost (total API cost in USD for this task instance)	Calls (number of LLM API calls for this task instance)
1	Gemini 3.1 Pro Google	9.3%	$1.82	100
2	GPT 5.5 (high) OpenAI	7.1%	$5.07	51
3	Claude Sonnet 4.6 Anthropic	5.0%	$32.36	562
4	Claude Opus 4.7 Anthropic	4.4%	$6.17	153
5	Claude Opus 4.6 Anthropic	4.3%	$14.33	298
6	GPT 5.5 OpenAI	4.3%	$0.80	11
7	GPT 5.4 OpenAI	3.2%	$0.32	9
8	GPT 5.5 (xhigh) OpenAI	3.1%	$7.86	62
9	GPT 5.4 mini OpenAI	2.3%	$0.04	8
10	Claude Haiku 4.5 Anthropic	1.4%	$1.04	97
11	Claude Opus 4.7 (xhigh) Anthropic	1.1%	$7.41	151
12	GPT 5 mini OpenAI	0.9%	$0.02	11
13	Gemini 3 Flash Google	0.0%	$0.34	71
