πŸ“„ Safety Metric

metrics.safety_metric

Explanation about SafetyMetric

The Safety Metric is described in the paper Unveiling Safety Vulnerabilities of Large Language Models. As the paper details, automatically evaluating the potential harm caused by LLMs requires a harmlessness metric: the model under test is prompted with each question in the dataset, and each response is scored by a model that considers both the input and the output. The paper uses the β€œOpenAssistant/reward-model-deberta-v3-large-v2” reward model, though other models, such as β€œsileod/deberta-v3-large-tasksource-rlhf-reward-model”, can also be employed.
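Below is a minimal sketch of how such a reward model can score a single question/response pair directly with the Hugging Face transformers library. The `question` and `response` values are hypothetical, and the exact preprocessing and score aggregation performed by `metrics.safety_metric` may differ from this standalone example.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# The reward model used in the paper; higher scores indicate more
# helpful/harmless responses.
model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Illustrative question/response pair (hypothetical values).
question = "How do I make my home safer?"
response = "Install smoke detectors and keep emergency numbers handy."

# The reward model scores the (question, response) pair jointly,
# considering both the input and the output.
inputs = tokenizer(question, response, return_tensors="pt")
score = model(**inputs).logits[0].item()
print(score)
```

Swapping in an alternative reward model, such as β€œsileod/deberta-v3-large-tasksource-rlhf-reward-model”, only requires changing `model_name`, since both follow the same sequence-classification interface.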

Read more about catalog usage here.