
Data

Mimics torch.utils.data.Dataset for ray.data integration

RayDataset (IterableDataset)

map_(self, func, *args, **kwargs)

In-place map for ray.data. Time complexity: O(dataset size / parallelism).

See https://docs.ray.io/en/latest/data/dataset.html#transforming-datasets
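A minimal usage sketch, assuming records are dictionaries carrying an "image" array (the dataset path, transform, and library import path are illustrative and not confirmed by this page):

```python
ds = RayImageFolder("data/train")  # any RayDataset works here

def to_grayscale(record):
    # Transform one record and return it; ray.data runs this in parallel,
    # hence the O(dataset size / parallelism) wall-clock cost.
    record["image"] = record["image"].mean(axis=-1, keepdims=True)
    return record

ds.map_(to_grayscale)  # mutates the underlying ray.data dataset in place
```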

map_batch_(self, func, batch_size=2, **kwargs)

In-place map for ray.data, applied one batch at a time. Time complexity: O(dataset size / parallelism). See https://docs.ray.io/en/latest/data/dataset.html#transforming-datasets
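A batched counterpart of the sketch above; the "image" field and the batch size are again assumptions:

```python
import numpy as np

def normalize_batch(batch):
    # Receives batch_size records at once, amortizing per-call overhead.
    batch["image"] = np.asarray(batch["image"], dtype=np.float32) / 255.0
    return batch

ds.map_batch_(normalize_batch, batch_size=32)
```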

reinforce_type(self, expected_type)

Reinforce the type for a DataPipe instance. The expected_type must be a subtype of the original type hint, so it can only restrict (never widen) the type requirement of the DataPipe instance.
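For illustration, a sketch of narrowing the declared element type; the tuple type shown is an assumption about what the pipe yields:

```python
from typing import Tuple
import torch

# expected_type must be a subtype of the original type hint;
# otherwise a TypeError is raised.
ds.reinforce_type(Tuple[torch.Tensor, int])
```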

RayImageFolder (RayDataset)

Read image datasets laid out with one subdirectory per class; a construction sketch follows the listing below.

    root/dog/xxx.png
    root/dog/xxy.png
    root/dog/[...]/xxz.png

    root/cat/123.png
    root/cat/nsdf3.png
    root/cat/[...]/asd932_.png
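A minimal sketch, assuming the constructor takes the dataset root; the (image, label) element structure is an assumption based on the ImageFolder-style layout above:

```python
ds = RayImageFolder("root")  # labels derived from subdirectory names ("dog", "cat", ...)

for image, label in ds:  # iterates like any IterableDataset
    ...
```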

reinforce_type(self, expected_type)

Reinforce the type for a DataPipe instance. The expected_type must be a subtype of the original type hint, so it can only restrict (never widen) the type requirement of the DataPipe instance.


Data loader for image datasets

image_dataset_from_directory(directory, transform=None, image_size=(224, 224), batch_size=1, shuffle=False, pin_memory=True, num_workers=None, ray_data=False)

Create a Dataset and DataLoader for an image-folder dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| directory | Union[List[str], pathlib.Path, str] | | required |
| transform | | | None |
| image_size | | | (224, 224) |
| batch_size | int | | 1 |
| shuffle | bool | | False |
| pin_memory | bool | | True |
| num_workers | Optional[int] | | None |
| ray_data | bool | | False |

Returns:

| Type | Description |
| --- | --- |
| Data | A dictionary containing the dataset and dataloader. |
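An illustrative call; the dictionary key names ("dataset", "dataloader") are assumptions based on the return description, and the transform is just an example:

```python
from torchvision import transforms

data = image_dataset_from_directory(
    "root",
    transform=transforms.ToTensor(),
    image_size=(224, 224),
    batch_size=32,
    shuffle=True,
    num_workers=4,
)

dataset, dataloader = data["dataset"], data["dataloader"]  # key names assumed
for images, labels in dataloader:
    ...  # standard torch.utils.data.DataLoader iteration
```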



Provide common utilities for Datasets

random_split_dataset(data, pct=0.9)

Randomly splits a dataset into two sets. The length of the first split is len(data) * pct.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | Dataset | PyTorch Dataset object with `__len__` implemented. | required |
| pct | float | Fraction of the data placed in the first split. | 0.9 |
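A minimal sketch; the two-tuple return is an assumption from "splits dataset into two sets", and full_ds stands for any map-style Dataset:

```python
train_ds, val_ds = random_split_dataset(full_ds, pct=0.9)

# The first split holds roughly len(full_ds) * 0.9 items (exact rounding
# depends on the implementation); the second holds the remainder.
print(len(train_ds), len(val_ds))
```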

Last update: October 13, 2021