rhysd/kiro-editor — ProgramBench


A small terminal UTF-8 text editor written in Rust 📝🦀

Generated Behavioral Tests: 595

Best Score: 95.3%

Results by Model

Score: percentage of hidden behavioral tests passed. Cost: total API cost in USD for this task instance. Calls: number of LLM API calls for this task instance.

#	Model	Provider	Score	Cost	Calls
1	GPT 5.5 (xhigh)	OpenAI	95.3%	$9.89	82
2	Claude Opus 4.6	Anthropic	93.3%	$15.07	329
3	GPT 5.5 (high)	OpenAI	90.1%	$3.54	36
4	Claude Sonnet 4.6	Anthropic	86.1%	$33.62	628
5	Claude Opus 4.7 (xhigh)	Anthropic	82.2%	$2.60	89
6	GPT 5.5	OpenAI	80.5%	$1.09	19
7	Gemini 3 Flash	Google	69.7%	$0.35	64
8	Gemini 3.1 Pro	Google	65.9%	$0.84	45
9	Claude Opus 4.7	Anthropic	40.7%	$0.71	46
10	Claude Haiku 4.5	Anthropic	36.1%	$0.87	115
11	GPT 5.4	OpenAI	34.3%	$0.13	9
12	GPT 5.4 mini	OpenAI	26.1%	$0.04	8
13	GPT 5 mini	OpenAI	13.6%	$0.01	15