📄 Llama 3 405B Instruct Wml

engines.classification.llama_3_405b_instruct_wml

type: WMLInferenceEngineGeneration
model_name: meta-llama/llama-3-405b-instruct
max_new_tokens: 5
random_seed: 42
decoding_method: greedy
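This entry pins decoding_method to greedy and caps the output at max_new_tokens: 5, which suits classification, where the expected answer is a short label. The sketch below is a toy illustration only, not the WML decoder. Greedy decoding takes the highest-scoring next token at each step and stops at an end token or when the token budget runs out. The score table NEXT_TOKEN_SCORES and the function greedy_decode are hypothetical and stand in for a real model. Nothing is sampled, so the same input should always produce the same output. That suggests random_seed mainly matters for sampling-based decoding.

```python
# Toy greedy decoder: a hypothetical next-token score table stands in for a real model.
NEXT_TOKEN_SCORES = {
    "<start>": {"pro": 0.7, "con": 0.2, "<end>": 0.1},
    "pro": {"<end>": 0.9, "pro": 0.1},
    "con": {"<end>": 0.8, "con": 0.2},
}


def greedy_decode(start="<start>", max_new_tokens=5, end_token="<end>"):
    """Return at most max_new_tokens tokens, always taking the top-scoring one."""
    output = []
    current = start
    for _ in range(max_new_tokens):
        # Unknown contexts fall back to ending generation immediately.
        scores = NEXT_TOKEN_SCORES.get(current, {end_token: 1.0})
        # Greedy step: a deterministic argmax, no randomness involved.
        current = max(scores, key=scores.get)
        if current == end_token:
            break
        output.append(current)
    return output


print(greedy_decode())  # ['pro']
```

With a budget of 5 tokens, even a model that never emits an end token is cut off after five steps. This keeps classification outputs short.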

Explanation about WMLInferenceEngineGeneration

Generates text for textual inputs.

To include images in the input, use ‘WMLInferenceEngineChat’ instead.

Args:

concurrency_limit (int): Maximum number of requests sent to the model concurrently. Defaults to 10, which is also the maximum allowed value.
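The sketch below shows what a concurrency limit means in practice: no more than concurrency_limit requests are in flight at any moment. It is an illustration under stated assumptions, not the engine's implementation. It uses an asyncio.Semaphore to bound the requests. The functions fake_request and infer_all are hypothetical, and fake_request simulates a model call with a short sleep instead of contacting WML. The sketch rejects values outside 1 to 10 to mirror the documented maximum. How the real engine treats out-of-range values is not specified here.

```python
import asyncio

MAX_CONCURRENCY = 10  # documented maximum for concurrency_limit


async def fake_request(prompt, state):
    """Hypothetical stand-in for one model call; tracks how many run at once."""
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)  # simulate network latency
    state["active"] -= 1
    return prompt.upper()


async def infer_all(prompts, concurrency_limit=10):
    # Assumption for this sketch: out-of-range limits are rejected outright.
    if not 1 <= concurrency_limit <= MAX_CONCURRENCY:
        raise ValueError(f"concurrency_limit must be between 1 and {MAX_CONCURRENCY}")
    semaphore = asyncio.Semaphore(concurrency_limit)
    state = {"active": 0, "peak": 0}

    async def bounded(prompt):
        # At most concurrency_limit coroutines can hold the semaphore at once.
        async with semaphore:
            return await fake_request(prompt, state)

    # gather returns results in the same order as the input prompts.
    results = await asyncio.gather(*(bounded(p) for p in prompts))
    return results, state["peak"]


results, peak = asyncio.run(infer_all([f"p{i}" for i in range(25)], concurrency_limit=4))
print(len(results), peak)  # 25 4
```

All 25 prompts complete, but the peak number of simultaneous requests never exceeds the limit of 4. A lower limit therefore trades throughput for less load on the model endpoint.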

Examples:

from unitxt.api import load_dataset
from unitxt.inference import WMLInferenceEngineGeneration

# Placeholder credentials: substitute a real WML endpoint URL, project ID, and API key.
wml_credentials = {
    "url": "some_url", "project_id": "some_id", "api_key": "some_key"
}
model_name = "google/flan-t5-xxl"
wml_inference = WMLInferenceEngineGeneration(
    credentials=wml_credentials,
    model_name=model_name,
    data_classification_policy=["public"],
    top_p=0.5,
    random_seed=123,
)

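# Load the argument-topic classification card, limited to 5 instances.
dataset = load_dataset(
    dataset_query="card=cards.argument_topic,template_card_index=0,loader_limit=5"
)
# Run generation on the test split; one prediction is returned per instance.
results = wml_inference.infer(dataset["test"])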
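The catalog entry at the top of this page and the constructor call in the example use the same field names. The sketch below rests on one assumption: every field shown on this page other than type becomes a keyword argument of the class named by type. The example itself sets model_name and random_seed directly in the constructor, which is consistent with that assumption. parse_catalog_entry is a hypothetical helper, not part of unitxt, and the entry text is copied verbatim from this page.

```python
CATALOG_ENTRY = """\
type: WMLInferenceEngineGeneration
model_name: meta-llama/llama-3-405b-instruct
max_new_tokens: 5
random_seed: 42
decoding_method: greedy
"""


def parse_catalog_entry(text):
    """Split 'key: value' lines into the class name and its keyword arguments."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain later colons.
        key, _, value = line.partition(":")
        value = value.strip()
        # Integer-looking values become ints; everything else stays a string.
        fields[key.strip()] = int(value) if value.isdigit() else value
    class_name = fields.pop("type")
    return class_name, fields


class_name, kwargs = parse_catalog_entry(CATALOG_ENTRY)
print(class_name)  # WMLInferenceEngineGeneration
print(kwargs)
# Under the stated assumption, the entry corresponds roughly to:
# WMLInferenceEngineGeneration(credentials=wml_credentials, **kwargs)
```

Read this way, the catalog entry is a preconfigured version of the example's constructor call. It swaps in the Llama 3 405B model, greedy decoding, and a 5-token budget.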