facebookresearch/fastText — ProgramBench

Library for fast text representation and classification.

Generated Behavioral Tests: 312

Best Score: 80.4%

Results by Model

Score: percentage of hidden behavioral tests passed. Cost: total API cost in USD for this task instance. Calls: number of LLM API calls for this task instance.

#		Model	Score	Cost	Calls
1		GPT 5.5 (xhigh) OpenAI	80.4%	$8.25	80
2		Claude Opus 4.6 Anthropic	75.6%	$15.58	242
3		Claude Sonnet 4.6 Anthropic	68.6%	$13.78	294
4		GPT 5.5 (high) OpenAI	62.8%	$2.49	27
5		GPT 5.5 OpenAI	61.5%	$1.02	14
6		Claude Opus 4.7 Anthropic	38.8%	$1.59	57
7		GPT 5.4 OpenAI	37.5%	$0.24	9
8		Claude Opus 4.7 (xhigh) Anthropic	26.9%	$2.35	60
9		Gemini 3 Flash Google	15.4%	$0.22	47
10		Claude Haiku 4.5 Anthropic	14.7%	$1.02	106
11		GPT 5 mini OpenAI	8.3%	$0.01	8
12		Gemini 3.1 Pro Google	5.4%	$1.54	63
13		GPT 5.4 mini OpenAI	0.3%	$0.02	11
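
Because the score is a percentage of the 312 generated behavioral tests, each row can be turned back into an approximate count of passed tests, and from there into a rough cost per passed test. The sketch below assumes the score is simply passed / total rounded to one decimal place, which the page does not state explicitly; the function names `estimated_passed` and `cost_per_passed_test` are hypothetical helpers, not part of ProgramBench.

```python
TOTAL_TESTS = 312  # "Generated Behavioral Tests" figure reported on the page


def estimated_passed(score_percent: float, total: int = TOTAL_TESTS) -> int:
    """Nearest whole number of tests consistent with a rounded percentage score.

    Assumption: score = passed / total * 100, rounded to one decimal place.
    """
    return round(score_percent / 100 * total)


def cost_per_passed_test(cost_usd: float, score_percent: float,
                         total: int = TOTAL_TESTS) -> float:
    """API cost in USD divided by the estimated number of passed tests."""
    passed = estimated_passed(score_percent, total)
    return cost_usd / passed if passed else float("inf")


if __name__ == "__main__":
    # (model, score %, cost USD) copied from a few rows of the table above
    rows = [
        ("GPT 5.5 (xhigh)", 80.4, 8.25),
        ("Claude Opus 4.6", 75.6, 15.58),
        ("GPT 5.4", 37.5, 0.24),
    ]
    for model, score, cost in rows:
        passed = estimated_passed(score)
        per_test = cost_per_passed_test(cost, score)
        print(f"{model}: ~{passed}/{TOTAL_TESTS} tests, ${per_test:.4f} per passed test")
```

Under that assumption the top score of 80.4% corresponds to about 251 of 312 tests, and 0.3% corresponds to a single passed test. The counts are estimates reconstructed from rounded percentages, so treat them as indicative only.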
