[19/11/2024] Unitxt Embraces Rich Chat Format and Cross API Inference: Simplifying LLM Evaluation

Authors: Elron Bandel

19/11/2024

Preparing data for training and testing language models is a complex task. It involves handling diverse data formats, preprocessing steps, and different ways of verbalizing tasks. Ensuring reproducibility and compatibility across platforms adds further complexity.

Recognizing these challenges, Unitxt has always aimed to simplify data preparation. Today, we are introducing two major updates that redefine our support for LLM workflows.

Introducing Two Major Enhancements

  1. Producing Data in Chat API Format: Unitxt can now produce data in the widely adopted Chat API format (see the sketch below). This ensures compatibility with popular LLM provider APIs and avoids the need for custom per-model formatting. The format also supports multiple modalities such as text, images, and videos.

  2. A Comprehensive Array of Inference Engines: We added wrappers for local inference platforms such as Llama and Hugging Face, as well as remote APIs such as LiteLLM, OpenAI, Watsonx, and more.

    These wrappers make evaluation and inference seamless and platform-agnostic in just a few lines of code (see the sketch following the illustration below).
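
Here is a minimal sketch of the first enhancement: loading a dataset so that every instance is already a list of Chat API messages. The card, template, split, and the "source" field used for inspection are illustrative assumptions; substitute the catalog entries that match your task.

from unitxt import load_dataset

# Load a QA benchmark directly in Chat API format.
# The card, template, and split below are illustrative assumptions.
dataset = load_dataset(
    card="cards.squad",
    template="templates.qa.with_context.simple",
    format="formats.chat_api",
    split="test",
    loader_limit=10,
)

# Each instance now carries a ready-to-send list of chat messages
# (assumed here to be exposed under the "source" field).
print(dataset[0]["source"])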

# Illustration of a rich Chat API message list, ready for inference:

[
    {
        "role": "system",
        "content": "You are an assistant that helps classify images."
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What does this image depict?"
            },
            {
                "type": "image",
                "image": {
                    "mime_type": "image/jpeg",
                    "data": <ENCODED_IMAGE>
                }
            }
        ]
    }
]
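
And a minimal sketch of the second enhancement: running such chat-formatted data through one of the new inference-engine wrappers. The model name and provider are illustrative assumptions; any supported engine can be swapped in.

from unitxt import load_dataset
from unitxt.inference import CrossProviderInferenceEngine

# Prepare chat-formatted data (same illustrative card and template as above).
dataset = load_dataset(
    card="cards.squad",
    template="templates.qa.with_context.simple",
    format="formats.chat_api",
    split="test",
    loader_limit=10,
)

# A cross-provider wrapper; the model and provider are assumptions made
# for this example, so swap in whichever backend you actually use.
engine = CrossProviderInferenceEngine(
    model="llama-3-8b-instruct",
    provider="watsonx",
)

# Run inference over the chat-formatted instances.
predictions = engine.infer(dataset)
print(predictions[0])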

Expanding Opportunities for the Community

These updates unlock significant opportunities, including:

  • Full Evaluation Pipelines: Design and execute end-to-end workflows directly in the Unitxt framework, for example to evaluate the impact of different templates, in-context example selection, or answering multiple questions in a single inference call (see the sketch after this list).

  • Multi-Modality Evaluation: Evaluate models with diverse inputs, from text to images and beyond.

  • Easy Assembly of LLM Judges: Quickly set up LLMs as evaluators using Unitxt inference engines.
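
As one example of such an end-to-end pipeline, the sketch below loads a task, runs a local model, and scores the predictions in a handful of lines. The card, template, model, and generation settings are illustrative assumptions, and the exact layout of the returned scores may vary by version.

from unitxt import load_dataset, evaluate
from unitxt.inference import HFPipelineBasedInferenceEngine

# Load an evaluation task (illustrative card, template, and split).
dataset = load_dataset(
    card="cards.squad",
    template="templates.qa.with_context.simple",
    split="test",
    loader_limit=20,
)

# A local Hugging Face pipeline wrapper; the model and max_new_tokens
# are assumptions for the sake of the example.
engine = HFPipelineBasedInferenceEngine(
    model_name="google/flan-t5-base",
    max_new_tokens=32,
)

# Inference and evaluation in two calls.
predictions = engine.infer(dataset)
results = evaluate(predictions=predictions, data=dataset)

# Overall scores (the exact result structure may differ across versions).
print(results[0]["score"]["global"])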

Our Commitment to Collaboration

Although you can now run end-to-end evaluation in Unitxt, it remains a general-purpose data preparation library. We therefore stay committed to partnerships with other evaluation platforms such as HELM, LM Eval Harness, and others. Our Chat API format and inference-engine support enhance accessibility and compatibility, empowering our partners to adopt the latest standards seamlessly.

Conclusion

Unitxt is adapting to the evolving landscape of language models and their capabilities. By supporting the Chat API format and inference engines, we simplify model workflows. These updates position Unitxt as the premier platform for LLM evaluation and integration.

We invite you to explore these features and join us in advancing model capabilities.

For more information, visit the inference engines guide or browse our many code examples.