Use Cases

  • Evaluate models on existing tasks and data
  • Evaluate standard tasks with my data
  • Evaluate with special processing or metrics
  • Create a Benchmark from Existing Datasets
  • Craft and use LLMs as Judges
  • Evaluate different modalities and data types
© Copyright 2023, IBM Research.