knwler – Benchmark Report

Generated: 2026-04-03 08:35
Document: HumanRights.pdf
Runs: 15

Summary

Pipelines Evaluated: anthropic, gemini, ollama, openai
Best KYS: 0.751 (gemini-3.1-flash-lite-preview / gemini-3.1-flash-lite-preview)
Most Knowledge: 204 (entities + relations)
Fastest Pipeline: 18.2s (end-to-end wall time)
Avg Knowledge Rate: 0.96 (nodes+edges / second)
Total Entities (all): 767 (across all pipelines)

Leaderboard

Sorted by Knowledge Yield Score (KYS) — higher is better

#	Backend	Discovery Model	Extraction Model	KYS ↓	Entities	Relations	Graph Size	Knowledge Rate	Total Time	Quality Score	Speed Score
1	gemini	gemini-3.1-flash-lite-preview	gemini-3.1-flash-lite-preview	0.751	62	53	115	6.33 /s	18.2s	0.564	1.000
2	anthropic	claude-sonnet-4-6	claude-haiku-4-5-20251001	0.395	87	73	160	1.75 /s	91.3s	0.784	0.199
3	openai	gpt-4o	gpt-4o-mini	0.332	47	52	99	1.24 /s	79.9s	0.485	0.228
4	ollama	gemma4:e2b	gemma4:e2b	0.245	34	55	89	0.67 /s	132.4s	0.436	0.137
5	ollama	mistral:latest	mistral:latest	0.244	54	126	180	0.67 /s	268.2s	0.882	0.068
6	ollama	gemma3:12b	gemma3:4b	0.222	62	45	107	0.55 /s	194.0s	0.524	0.094
7	ollama	gemma4:latest	gemma4:latest	0.217	85	72	157	0.53 /s	297.4s	0.770	0.061
8	ollama	qwen3.5:9b	qwen3.5:9b	0.213	99	105	204	0.51 /s	400.7s	1.000	0.045
9	ollama	qwen2.5:14b	qwen2.5:3b	0.191	37	30	67	0.41 /s	163.3s	0.328	0.111
10	ollama	llama3.2:latest	llama3.2:latest	0.188	29	36	65	0.40 /s	163.4s	0.319	0.111
11	ollama	qwen3:8b	qwen3:8b	0.183	58	66	124	0.38 /s	329.4s	0.608	0.055
12	ollama	llama3.1:8b	llama3.1:8b	0.170	38	46	84	0.33 /s	258.5s	0.412	0.070
13	gemini	gemini-3-flash-preview	gemini-3-flash-preview	0.163	13	10	23	0.30 /s	76.7s	0.113	0.237
14	ollama	glm-4.7-flash:latest	glm-4.7-flash:latest	0.156	62	66	128	0.27 /s	466.1s	0.627	0.039
15	gemini	gemini-3.1-pro-preview	gemini-3.1-pro-preview	0.000	0	0	0	0.00 /s	204.3s	0.000	0.089
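
The Summary figures can be re-derived from the leaderboard rows. The sketch below is only a cross-check, not knwler's own code: the `ROWS` layout and the `summarize` helper are hypothetical names introduced here, and it assumes the average knowledge rate is a plain mean over the 15 runs, computed from the rounded values shown in the table.

```python
# (discovery model, entities, relations, total time in s, knowledge rate in elements/s),
# copied from the leaderboard above.
ROWS = [
    ("gemini-3.1-flash-lite-preview", 62, 53, 18.2, 6.33),
    ("claude-sonnet-4-6", 87, 73, 91.3, 1.75),
    ("gpt-4o", 47, 52, 79.9, 1.24),
    ("gemma4:e2b", 34, 55, 132.4, 0.67),
    ("mistral:latest", 54, 126, 268.2, 0.67),
    ("gemma3:12b", 62, 45, 194.0, 0.55),
    ("gemma4:latest", 85, 72, 297.4, 0.53),
    ("qwen3.5:9b", 99, 105, 400.7, 0.51),
    ("qwen2.5:14b", 37, 30, 163.3, 0.41),
    ("llama3.2:latest", 29, 36, 163.4, 0.40),
    ("qwen3:8b", 58, 66, 329.4, 0.38),
    ("llama3.1:8b", 38, 46, 258.5, 0.33),
    ("gemini-3-flash-preview", 13, 10, 76.7, 0.30),
    ("glm-4.7-flash:latest", 62, 66, 466.1, 0.27),
    ("gemini-3.1-pro-preview", 0, 0, 204.3, 0.00),
]


def summarize(rows):
    """Aggregate the leaderboard into the four numeric Summary figures."""
    return {
        # Largest graph: entities + relations of the richest run.
        "most_knowledge": max(e + r for _, e, r, _, _ in rows),
        # Shortest end-to-end wall time in seconds.
        "fastest_s": min(t for _, _, _, t, _ in rows),
        # Assumed to be an unweighted mean over all runs.
        "avg_knowledge_rate": round(sum(k for *_, k in rows) / len(rows), 2),
        # Entities summed over every pipeline.
        "total_entities": sum(e for _, e, _, _, _ in rows),
    }


summary = summarize(ROWS)
print(summary)
```

Under these assumptions the result agrees with the Summary: 204 graph elements, 18.2s, an average rate of 0.96, and 767 entities in total.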

Visual Analysis

Charts (not reproduced here): Knowledge Yield Score (KYS); Pipeline Time Breakdown (seconds); Knowledge Graph Output; Knowledge Rate (graph elements / second).

Pipeline Details

Per-pipeline metrics and stage timings, in leaderboard order

#	Backend	Discovery Model	Extraction Model	KYS	Entities	Relations	Total Time	Knowledge Rate	Schema	Rephrase	Extraction
1	gemini	gemini-3.1-flash-lite-preview	gemini-3.1-flash-lite-preview	0.751	62	53	18.2s	6.33 /s	1.4s	11.7s	5.2s
2	anthropic	claude-sonnet-4-6	claude-haiku-4-5-20251001	0.395	87	73	91.3s	1.75 /s	7.6s	23.6s	60.1s
3	openai	gpt-4o	gpt-4o-mini	0.332	47	52	79.9s	1.24 /s	4.8s	48.3s	26.8s
4	ollama	gemma4:e2b	gemma4:e2b	0.245	34	55	132.4s	0.67 /s	8.5s	30.3s	93.6s
5	ollama	mistral:latest	mistral:latest	0.244	54	126	268.2s	0.67 /s	0.0s	0.0s	268.2s
6	ollama	gemma3:12b	gemma3:4b	0.222	62	45	194.0s	0.55 /s	22.4s	36.2s	135.3s
7	ollama	gemma4:latest	gemma4:latest	0.217	85	72	297.4s	0.53 /s	17.1s	53.7s	226.6s
8	ollama	qwen3.5:9b	qwen3.5:9b	0.213	99	105	400.7s	0.51 /s	0.0s	0.0s	400.7s
9	ollama	qwen2.5:14b	qwen2.5:3b	0.191	37	30	163.3s	0.41 /s	22.5s	36.4s	104.5s
10	ollama	llama3.2:latest	llama3.2:latest	0.188	29	36	163.4s	0.40 /s	8.1s	32.1s	123.3s
11	ollama	qwen3:8b	qwen3:8b	0.183	58	66	329.4s	0.38 /s	11.9s	61.7s	255.8s
12	ollama	llama3.1:8b	llama3.1:8b	0.170	38	46	258.5s	0.33 /s	18.5s	39.7s	200.3s
13	gemini	gemini-3-flash-preview	gemini-3-flash-preview	0.163	13	10	76.7s	0.30 /s	8.3s	47.2s	21.2s
14	ollama	glm-4.7-flash:latest	glm-4.7-flash:latest	0.156	62	66	466.1s	0.27 /s	22.0s	36.7s	407.4s
15	gemini	gemini-3.1-pro-preview	gemini-3.1-pro-preview	0.000	0	0	204.3s	0.00 /s	19.4s	145.7s	39.2s

Methodology – Knowledge Yield Score (KYS)

The Knowledge Yield Score (KYS) is a composite metric that balances output richness (how much the pipeline extracted) against efficiency (how fast it ran). It is the geometric mean of two normalized sub-scores. Like the F₁ score, which is a harmonic mean of precision and recall, it penalises runs that excel in only one dimension.

graph_size = num_entities + num_relations
total_time = schema_time + rephrase_time + extraction_time

quality_norm = graph_size / max(graph_size) # max over all runs; range [0, 1]
speed_norm = min(total_time) / total_time # min over all runs; range (0, 1], fastest run scores 1.0

KYS = √(quality_norm × speed_norm) # geometric mean

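quality_norm rewards pipelines that produce large knowledge graphs; it counts entities and relations and does not measure their correctness. speed_norm rewards pipelines that finish quickly: the fastest run scores 1.0 and slower runs are penalised in inverse proportion to their total time. Because KYS is a geometric mean, a pipeline cannot compensate for poor speed with high quality alone, or the reverse; both dimensions must be strong for a high KYS. For example, qwen3.5:9b produces the largest graph (quality score 1.000) but ranks 8th because its speed score is only 0.045.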
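
The formulas above translate almost directly into code. The following is a minimal sketch, not knwler's implementation: the function name `kys_scores` and the `(graph_size, total_time)` tuple layout are assumptions made for illustration. The three sample runs are taken from the leaderboard and include both the largest graph (204) and the fastest run (18.2s), so the normalization matches the full benchmark.

```python
import math


def kys_scores(runs):
    """Return (quality_norm, speed_norm, kys) for each (graph_size, total_time) run."""
    max_size = max(size for size, _ in runs)  # normalizer for quality
    min_time = min(t for _, t in runs)        # normalizer for speed
    scores = []
    for size, total_time in runs:
        # Guard against a benchmark in which every run produced an empty graph.
        quality_norm = size / max_size if max_size else 0.0
        speed_norm = min_time / total_time
        scores.append((quality_norm, speed_norm, math.sqrt(quality_norm * speed_norm)))
    return scores


# (graph_size, total_time in seconds) for three leaderboard rows.
runs = [
    (115, 18.2),   # gemini-3.1-flash-lite-preview
    (160, 91.3),   # claude-sonnet-4-6 / claude-haiku-4-5-20251001
    (204, 400.7),  # qwen3.5:9b
]

for quality_norm, speed_norm, kys in kys_scores(runs):
    print(f"quality={quality_norm:.3f} speed={speed_norm:.3f} KYS={kys:.3f}")
```

Rounded to three decimals, the resulting KYS values are 0.751, 0.395 and 0.213, which agree with the corresponding leaderboard rows.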

Knowledge Rate (graph_size / total_time) is a complementary raw throughput metric expressed in graph elements per second, useful for absolute comparisons independent of the normalization range.
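
Knowledge Rate can be recomputed from the per-stage timings listed under Pipeline Details. This is a small sketch under the definitions above; the helper name `knowledge_rate` and its parameter names are hypothetical, and the zero-time guard is an added assumption rather than documented behaviour.

```python
def knowledge_rate(entities, relations, schema_s, rephrase_s, extraction_s):
    """Graph elements per second of end-to-end pipeline time."""
    graph_size = entities + relations
    total_time = schema_s + rephrase_s + extraction_s
    # Assumed guard: a run with no recorded time yields a rate of 0.
    return graph_size / total_time if total_time > 0 else 0.0


# claude-sonnet-4-6 / claude-haiku-4-5-20251001: 160 elements in 91.3s
print(round(knowledge_rate(87, 73, 7.6, 23.6, 60.1), 2))    # 1.75
# gemini-3.1-pro-preview: empty graph, so the rate is zero
print(round(knowledge_rate(0, 0, 19.4, 145.7, 39.2), 2))    # 0.0
```

Recomputing from the rounded values in this report can differ from the published rate in the last digit. For the top pipeline, 115 / 18.2 gives about 6.32 against the reported 6.33, presumably because the report works from unrounded timings.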