
Data

Mimics torch.utils.data.Dataset for ray.data integration

RayDataset (IterableDataset)

map_(self, func, *args, **kwargs)

In-place map for ray.data. Time complexity: O(dataset size / parallelism).

See https://docs.ray.io/en/latest/data/dataset.html#transforming-datasets
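A minimal usage sketch, assuming records are dictionaries carrying an "image" array (the dataset path, transform, and library import path are illustrative and not confirmed by this page):

```python
ds = RayImageFolder("data/train")  # any RayDataset works here

def to_grayscale(record):
    # Transform one record and return it; ray.data runs this in parallel,
    # hence the O(dataset size / parallelism) wall-clock cost.
    record["image"] = record["image"].mean(axis=-1, keepdims=True)
    return record

ds.map_(to_grayscale)  # mutates the underlying ray.data dataset in place
```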

map_batch_(self, func, batch_size=2, **kwargs)

In-place map for ray.data, applied one batch at a time. Time complexity: O(dataset size / parallelism). See https://docs.ray.io/en/latest/data/dataset.html#transforming-datasets
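A batched counterpart of the sketch above; the "image" field and the batch size are again assumptions:

```python
import numpy as np

def normalize_batch(batch):
    # Receives batch_size records at once, amortizing per-call overhead.
    batch["image"] = np.asarray(batch["image"], dtype=np.float32) / 255.0
    return batch

ds.map_batch_(normalize_batch, batch_size=32)
```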

reinforce_type(self, expected_type)

Reinforce the type for a DataPipe instance. The expected_type must be a subtype of the original type hint, so it can only restrict (never widen) the type requirement of the DataPipe instance.
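For illustration, a sketch of narrowing the declared element type; the tuple type shown is an assumption about what the pipe yields:

```python
from typing import Tuple
import torch

# expected_type must be a subtype of the original type hint;
# otherwise a TypeError is raised.
ds.reinforce_type(Tuple[torch.Tensor, int])
```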

RayImageFolder (RayDataset)

Read image datasets laid out with one subdirectory per class; a construction sketch follows the listing below.

    root/dog/xxx.png
    root/dog/xxy.png
    root/dog/[...]/xxz.png

    root/cat/123.png
    root/cat/nsdf3.png
    root/cat/[...]/asd932_.png
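A minimal sketch, assuming the constructor takes the dataset root; the (image, label) element structure is an assumption based on the ImageFolder-style layout above:

```python
ds = RayImageFolder("root")  # labels derived from subdirectory names ("dog", "cat", ...)

for image, label in ds:  # iterates like any IterableDataset
    ...
```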

reinforce_type(self, expected_type)

Reinforce the type for a DataPipe instance. The expected_type must be a subtype of the original type hint, so it can only restrict (never widen) the type requirement of the DataPipe instance.


Data loader for image datasets

image_dataset_from_directory(directory, transform=None, image_size=(224, 224), batch_size=1, shuffle=False, pin_memory=True, num_workers=None, ray_data=False)

Create a Dataset and DataLoader for an image-folder dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| directory | Union[List[str], pathlib.Path, str] | | required |
| transform | | | None |
| image_size | | | (224, 224) |
| batch_size | int | | 1 |
| shuffle | bool | | False |
| pin_memory | bool | | True |
| num_workers | Optional[int] | | None |
| ray_data | bool | | False |

Returns:

| Type | Description |
| --- | --- |
| Data | A dictionary containing the dataset and dataloader. |
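An illustrative call; the dictionary key names ("dataset", "dataloader") are assumptions based on the return description, and the transform is just an example:

```python
from torchvision import transforms

data = image_dataset_from_directory(
    "root",
    transform=transforms.ToTensor(),
    image_size=(224, 224),
    batch_size=32,
    shuffle=True,
    num_workers=4,
)

dataset, dataloader = data["dataset"], data["dataloader"]  # key names assumed
for images, labels in dataloader:
    ...  # standard torch.utils.data.DataLoader iteration
```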



Provide common utilities for Datasets

random_split_dataset(data, pct=0.9)

Randomly splits a dataset into two sets. The length of the first split is len(data) * pct.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | Dataset | PyTorch Dataset object with `__len__` implemented. | required |
| pct | float | Fraction of the data placed in the first split. | 0.9 |
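A minimal sketch; the two-tuple return is an assumption from "splits dataset into two sets", and full_ds stands for any map-style Dataset:

```python
train_ds, val_ds = random_split_dataset(full_ds, pct=0.9)

# The first split holds roughly len(full_ds) * 0.9 items (exact rounding
# depends on the implementation); the second holds the remainder.
print(len(train_ds), len(val_ds))
```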

Last update: October 13, 2021