yoav-lavi/melody — ProgramBench

Melody is a language that compiles to regular expressions and aims to be more readable and maintainable

4,748 rs medium

Generated Behavioral Tests: 1,205

Best Score: 87.4%

Results by Model

Score: percentage of hidden behavioral tests passed. Cost: total API cost in USD for this task instance. Calls: number of LLM API calls for this task instance.

#		Model	Score	Cost	Calls
1		Claude Opus 4.7 (xhigh) Anthropic	87.4%	$17.22	215
2		GPT 5.5 (xhigh) OpenAI	84.9%	$4.22	56
3		Claude Opus 4.7 Anthropic	78.9%	$5.37	135
4		Claude Opus 4.6 Anthropic	74.4%	$12.05	325
5		GPT 5.5 (high) OpenAI	71.5%	$3.79	54
6		GPT 5.5 OpenAI	62.3%	$1.25	21
7		Gemini 3 Flash Google	62.3%	$0.28	138
8		GPT 5.4 OpenAI	34.2%	$0.45	19
9		Claude Haiku 4.5 Anthropic	31.4%	$0.98	131
10		GPT 5.4 mini OpenAI	6.4%	$0.03	9
11		Gemini 3.1 Pro Google	6.1%	$2.10	203
12		Claude Sonnet 4.6 Anthropic	3.4%	$16.67	502
13		GPT 5 mini OpenAI	3.4%	$0.04	25
