Use Cases
Evaluate models on existing tasks and data
Evaluate standard tasks with my data
Evaluate with special processing or metrics
Create a Benchmark from Existing Datasets
Craft and use LLMs as Judges
Evaluate different modalities and data types