LLM Benchmarking Results

Detailed results from our knowledge graph analysis benchmark of several LLMs. These results are part of a research project that benchmarks LLM comprehension through graph connectedness analysis. View the knowledge graph visualizations or read the full blog post for the complete methodology and analysis.

Final Model Rankings

Comparative performance of the evaluated LLMs based on knowledge graph metrics.

Figure: Final ranking of LLMs based on knowledge graph metrics

Correlation with LMSys Arena

Comparison of our knowledge graph analysis results with LMSys Arena rankings.

Ranking Methodology

The evaluation system ranks models based on several graph metrics:

Metric	Description	Weight in Ranking
Average Node Degree	Average number of connections per node	2.0 (Primary)
Node Count	Total number of concepts/entities	1.0
Edge Count	Total number of relationships	1.0
Graph Density	Ratio of actual connections to possible connections	0.8
Connected Components	Number of disconnected subgraphs (lower is better)	1.0
Largest Component Ratio	Size of largest component relative to total graph	0.8
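
All six metrics in the table can be derived from a plain node and edge list. The following is a minimal sketch, not the project's actual code. It assumes an undirected simple graph (self-loops and duplicate edges are ignored) and assumes the largest component ratio is measured in nodes. The function name `graph_metrics` and the dictionary keys are illustrative choices.

```python
from collections import deque


def graph_metrics(nodes, edges):
    """Compute the six table metrics for an undirected simple graph.

    Assumptions (not confirmed by the source): edges are undirected,
    self-loops are ignored, and duplicate edges count once.
    """
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        if a == b:
            continue  # skip self-loops
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    node_count = len(adjacency)
    # Each undirected edge appears in two adjacency sets.
    edge_count = sum(len(neighbors) for neighbors in adjacency.values()) // 2

    # Breadth-first search to collect the size of every connected component.
    seen = set()
    component_sizes = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        component_sizes.append(size)

    possible_edges = node_count * (node_count - 1) / 2
    return {
        "avg_node_degree": 2 * edge_count / node_count if node_count else 0.0,
        "node_count": node_count,
        "edge_count": edge_count,
        "graph_density": edge_count / possible_edges if possible_edges else 0.0,
        "connected_components": len(component_sizes),
        "largest_component_ratio": (
            max(component_sizes) / node_count if node_count else 0.0
        ),
    }
```

Calling `graph_metrics(["a", "b", "c"], [("a", "b")])`, for example, returns a dictionary keyed by the metric names above, which can then be fed into a scoring step.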

The overall ranking is calculated using a weighted average of normalized metrics. The evaluation gives double weight to average node degree, as this metric best captures the knowledge graph's interconnectedness and usefulness.

For each subject and model combination:

1. Each metric is normalized to a [0,1] scale.
2. Metrics are weighted according to importance.
3. A composite score is calculated.
4. Models are ranked by their average scores across all subjects.
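
The steps above can be sketched as follows. The source does not state how metrics are normalized, so this sketch assumes min-max normalization across the models within one subject, and it inverts the connected components metric because lower is better. The weights come from the table; the names `WEIGHTS`, `composite_scores`, and `rank_models` are hypothetical and not taken from the project's code.

```python
from collections import defaultdict

# Weights from the ranking table.
WEIGHTS = {
    "avg_node_degree": 2.0,
    "node_count": 1.0,
    "edge_count": 1.0,
    "graph_density": 0.8,
    "connected_components": 1.0,
    "largest_component_ratio": 0.8,
}
# Metrics where a smaller raw value should yield a higher score.
LOWER_IS_BETTER = {"connected_components"}


def composite_scores(metrics_by_model):
    """Score every model for one subject.

    metrics_by_model maps model name -> {metric name: raw value}.
    Normalization is min-max across models (an assumption).
    """
    total_weight = sum(WEIGHTS.values())
    scores = {model: 0.0 for model in metrics_by_model}
    for metric, weight in WEIGHTS.items():
        values = [metrics[metric] for metrics in metrics_by_model.values()]
        low, high = min(values), max(values)
        for model, metrics in metrics_by_model.items():
            if high == low:
                normalized = 0.0  # all models tie, so no one gains here
            else:
                normalized = (metrics[metric] - low) / (high - low)
                if metric in LOWER_IS_BETTER:
                    normalized = 1.0 - normalized
            # Weighted average: divide each weighted term by the weight sum.
            scores[model] += weight * normalized / total_weight
    return scores


def rank_models(metrics_by_subject):
    """Rank models by their mean composite score across subjects.

    metrics_by_subject maps subject -> {model name: {metric: raw value}}.
    Returns a list of (model, average score), best first.
    """
    per_model = defaultdict(list)
    for metrics_by_model in metrics_by_subject.values():
        for model, score in composite_scores(metrics_by_model).items():
            per_model[model].append(score)
    averages = {
        model: sum(scores) / len(scores) for model, scores in per_model.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Dividing by the sum of the weights keeps each composite score in [0,1], so scores stay comparable across subjects before they are averaged.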