πŸ“ CriteriaΒΆ

  • 📄 Adherence With Format
  • 📄 Answer Completeness
  • 📄 Answer Relevance
  • 📄 Assistant Message Answer Relevance
  • 📄 Assistant Message General Harm
  • 📄 Assistant Message Groundedness
  • 📄 Assistant Message Profanity
  • 📄 Assistant Message Social Bias
  • 📄 Assistant Message Unethical Behavior
  • 📄 Assistant Message Violence
  • 📄 Coherence
  • 📄 Conciseness
  • 📄 Consistency
  • 📄 Context Context Relevance
  • 📄 Conversational
  • 📄 Correctness Based On Ground Truth
  • 📄 Email Effectiveness
  • 📄 Email Structure
  • 📄 Empathy
  • 📄 Engagement
  • 📄 Examples And Details
  • 📄 Fluency
  • 📄 Grammar And Punctuation
  • 📄 Harmfulness
  • 📄 Information From Reference
  • 📄 Information Outside Reference
  • 📄 Insensitivity
  • 📄 Irrelevant Information
  • 📄 Manipulative Email
  • 📄 Naturalness
  • 📄 Objectivity
  • 📄 Professional Tone
  • 📄 Question Answer Quality
  • 📄 Reference Document Faithfulness
  • 📄 Relevance
  • 📄 Summarization Preference
  • 📄 Temperature In Celsius And Fahrenheit
  • 📄 Truthfulness
  • 📄 User Message General Harm
  • 📄 User Message Jailbreak
  • 📄 User Message Profanity
  • 📄 User Message Social Bias
  • 📄 User Message Unethical Behavior
  • 📄 User Message Violence

Read more about catalog usage here.

© Copyright 2023, IBM Research.