π Safety MetricΒΆ
metrics.safety_metric
Explanation about SafetyMetricΒΆ
The Safety Metric from the paper Unveiling Safety Vulnerabilities of Large Language Models.
The metric is described in the paper: Unveiling Safety Vulnerabilities of Large Language Models. As detailed in the paper, automatically evaluating the potential harm by LLMs requires a harmlessness metric. The model under test should be prompted by each question in the dataset, and the corresponding responses undergo evaluation using a metric that considers both the input and output. Our paper utilizes the βOpenAssistant/reward-model-deberta-v3-large-v2β Reward model, though other models such as βsileod/deberta-v3-large-tasksource-rlhf-reward-modelβ can also be employed.
Read more about catalog usage here.