unitxt.stream module

class unitxt.stream.DynamicStream(generator: Callable = __required__, gen_kwargs: Dict[str, Any] = {}, caching: bool = False, copying: bool = False)

Bases: Stream

peek()
set_copying(copying: bool)
take(n)

exception unitxt.stream.EmptyStreamError

Bases: FaultyStreamError

Raised when a stream is unexpectedly empty.

exception unitxt.stream.FaultyStreamError

Bases: Exception

Base class for all stream-related exceptions.

class unitxt.stream.GeneratorStream(generator: Callable = __required__, gen_kwargs: Dict[str, Any] = {}, caching: bool = False, copying: bool = False)

Bases: Stream

A class for handling streaming data in a customizable way.

This class provides methods for generating, caching, and manipulating streaming data.

Parameters:
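  • generator (function) – A generator function that yields the stream's instances.

  • gen_kwargs (dict, optional) – A dictionary of keyword arguments passed to the generator function.

  • caching (bool, optional) – Whether the data is cached or not. Defaults to False.

  • copying (bool, optional) – Whether the data is copied or not. Defaults to False.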
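To make the roles of generator and gen_kwargs concrete, here is a minimal sketch of a re-iterable stream built from a generator function. SketchGeneratorStream and count_up are hypothetical stand-ins written for this example, not unitxt's implementation, and the behavior shown for peek (return the first instance) and take (yield at most the first n instances) is an assumption inferred from the method names. Caching and copying are left out.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterator, Optional


class SketchGeneratorStream:
    """Hypothetical stand-in: calls `generator(**gen_kwargs)` on every pass."""

    def __init__(
        self,
        generator: Callable[..., Iterator[Dict[str, Any]]],
        gen_kwargs: Optional[Dict[str, Any]] = None,
    ):
        self.generator = generator
        self.gen_kwargs = dict(gen_kwargs or {})

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        # A fresh generator per pass is what makes the stream re-iterable.
        return self.generator(**self.gen_kwargs)

    def peek(self) -> Dict[str, Any]:
        # Assumed semantics: look at the first instance of a fresh pass.
        return next(iter(self))

    def take(self, n: int) -> Iterator[Dict[str, Any]]:
        # Assumed semantics: yield at most the first n instances.
        return islice(iter(self), n)


def count_up(start: int, stop: int) -> Iterator[Dict[str, Any]]:
    """Hypothetical generator function used as the data source."""
    for i in range(start, stop):
        yield {"id": i}


stream = SketchGeneratorStream(count_up, gen_kwargs={"start": 0, "stop": 5})
first = stream.peek()
head = list(stream.take(2))
everything = list(stream)  # still complete: peek and take each used their own pass
```

Because the sketch stores the generator function rather than a generator object, every call to peek, take, or iteration starts from the beginning of the data.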

peek()
set_copying(copying: bool)
take(n)
class unitxt.stream.ListStream(instances_list: List[Dict[str, Any]] = __required__, copying: bool = False)

Bases: Stream

peek()
set_copying(copying: bool)
take(n) → Generator
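This page says only that copying controls whether the data is copied. One plausible reading, treated here purely as an assumption, is that a copying stream hands out deep copies so that downstream mutations cannot leak back into the stored list. The stand-in SketchListStream below illustrates that idea; it is hypothetical and is not unitxt's code.

```python
from copy import deepcopy
from typing import Any, Dict, Iterator, List


class SketchListStream:
    """Hypothetical stand-in for a list-backed stream with an optional copy step."""

    def __init__(self, instances_list: List[Dict[str, Any]], copying: bool = False):
        self.instances_list = instances_list
        self.copying = copying

    def set_copying(self, copying: bool) -> None:
        self.copying = copying

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        for instance in self.instances_list:
            # Assumed semantics: copying=True isolates consumers from the stored data.
            yield deepcopy(instance) if self.copying else instance


data = [{"text": "a", "tags": []}]
stream = SketchListStream(data)

next(iter(stream))["tags"].append("shared")    # mutates the stored instance
stream.set_copying(True)
next(iter(stream))["tags"].append("isolated")  # mutates only a throwaway copy

print(data)
```

Under this reading, only the mutation made while copying was off is visible in data afterwards.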
exception unitxt.stream.MissingStreamError

Bases: FaultyStreamError

Raised when a required stream is missing.
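EmptyStreamError and MissingStreamError both derive from FaultyStreamError, so a single handler for the base class catches either one. The sketch below illustrates this with local stand-in classes that mirror the documented bases, so it runs without unitxt installed; the require_stream and describe helpers are hypothetical and not part of the library.

```python
# Local stand-ins mirroring the documented hierarchy; real code would
# import these names from unitxt.stream instead of redefining them.
class FaultyStreamError(Exception):
    """Base class for all stream-related exceptions."""


class EmptyStreamError(FaultyStreamError):
    """Raised when a stream is unexpectedly empty."""


class MissingStreamError(FaultyStreamError):
    """Raised when a required stream is missing."""


def require_stream(streams, name):
    """Hypothetical helper: return the non-empty instance list stored under `name`."""
    if name not in streams:
        raise MissingStreamError(f"stream '{name}' is missing")
    if not streams[name]:
        raise EmptyStreamError(f"stream '{name}' is empty")
    return streams[name]


def describe(streams, name):
    """Catch the base class once instead of handling each subclass separately."""
    try:
        return f"ok: {len(require_stream(streams, name))} instances"
    except FaultyStreamError as err:
        return f"{type(err).__name__}: {err}"


streams = {"train": [{"x": 1}], "test": []}
print(describe(streams, "train"))
print(describe(streams, "test"))
print(describe(streams, "validation"))
```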

class unitxt.stream.MultiStream(data=None)

Bases: dict

A class for handling multiple streams of data in a dictionary-like format.

This class extends dict; its values should be instances of the Stream class.

data

A dictionary of Stream objects.

Type:

dict

classmethod from_generators(generators: Dict[str, ReusableGenerator], caching=False, copying=False)

Creates a MultiStream from a dictionary of ReusableGenerators.

Parameters:
  • generators (Dict[str, ReusableGenerator]) – A dictionary of ReusableGenerators.

  • caching (bool, optional) – Whether the data should be cached or not. Defaults to False.

  • copying (bool, optional) – Whether the data should be copied or not. Defaults to False.

Returns:

A MultiStream object.

Return type:

MultiStream

classmethod from_iterables(iterables: Dict[str, Iterable[Dict[str, Any]]], caching=False, copying=False)

Creates a MultiStream from a dictionary of iterables.

Parameters:
  • iterables (Dict[str, Iterable]) – A dictionary of iterables.

  • caching (bool, optional) – Whether the data should be cached or not. Defaults to False.

  • copying (bool, optional) – Whether the data should be copied or not. Defaults to False.

Returns:

A MultiStream object.

Return type:

MultiStream
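As an illustration of the shape of the input and of the result, the sketch below builds a dict of named, re-readable streams from a dict of iterables. SketchMultiStream is a hypothetical stand-in written for this example, not unitxt's class; materializing each iterable into a list is a simplification chosen here, and the caching and copying options are left out.

```python
from typing import Any, Dict, Iterable


class SketchMultiStream(dict):
    """Hypothetical stand-in: a dict mapping stream names to instance lists."""

    @classmethod
    def from_iterables(
        cls, iterables: Dict[str, Iterable[Dict[str, Any]]]
    ) -> "SketchMultiStream":
        # Simplification: materialize each iterable so every stream can be re-read.
        return cls({name: list(items) for name, items in iterables.items()})


splits = SketchMultiStream.from_iterables(
    {
        "train": ({"x": i} for i in range(3)),  # one-shot generator
        "test": [{"x": 10}],                    # plain list
    }
)

for name, stream in splits.items():
    print(name, list(stream))
```

Since the result is itself a dict, the usual operations (keys, items, membership tests) work on it directly.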

get_generator(key) → Generator

Gets a generator for a specified key.

Parameters:

key (str) – The key for the generator.

Yields:

object – The next value in the stream.
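A small sketch of what "gets a generator for a specified key" can look like in plain Python. The free function get_generator below is a hypothetical stand-in operating on an ordinary dict of lists; it is not the library's method, and the lazy failure on an unknown key is a property of this sketch only.

```python
from typing import Any, Dict, Iterable, Iterator


def get_generator(streams: Dict[str, Iterable[Any]], key: str) -> Iterator[Any]:
    """Hypothetical free-function version: lazily yield the values stored under `key`."""
    yield from streams[key]


streams = {"train": [{"x": 0}, {"x": 1}]}

gen = get_generator(streams, "train")
first = next(gen)  # values are produced one at a time, on demand
rest = list(gen)   # the remainder of the same pass

# Because this is a generator function, a bad key only fails once iteration starts.
lazy = get_generator(streams, "missing")
try:
    next(lazy)
    failed = False
except KeyError:
    failed = True
```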

set_caching(caching: bool)
set_copying(copying: bool)
to_dataset(disable_cache=True, cache_dir=None, features=None) → DatasetDict
to_iterable_dataset(features=None) → IterableDatasetDict

class unitxt.stream.Stream

Bases: Dataclass

abstract peek()
abstract set_copying(copying: bool)
abstract take(n)
to_dataset(disable_cache=False, cache_dir=None, features=None)
to_iterable_dataset(features=None)

unitxt.stream.eager_failed()