Data
Mimics torch.utils.data.Dataset for ray.data integration
RayDataset (IterableDataset)
map_(self, func, *args, **kwargs)
In-place map for ray.data. Time complexity: O(dataset size / parallelism).
See https://docs.ray.io/en/latest/data/dataset.html#transforming-datasets
map_batch_(self, func, batch_size=2, **kwargs)
In-place batched map for ray.data. Time complexity: O(dataset size / parallelism).
See https://docs.ray.io/en/latest/data/dataset.html#transforming-datasets
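The in-place behaviour of `map_` and `map_batch_` can be illustrated without Ray. The following is a minimal sketch, not the library's implementation: `ListBackend` and `InplaceDataset` are hypothetical stand-ins, and it assumes that an in-place map replaces the wrapped dataset with its transformed version instead of returning a new object.

```python
from typing import Any, Callable, Iterator, List


class ListBackend:
    """Hypothetical stand-in for a ray.data-style dataset (not the Ray API)."""

    def __init__(self, rows: List[Any]) -> None:
        self.rows = list(rows)

    def map(self, func: Callable[[Any], Any]) -> "ListBackend":
        # Returns a new backend; the original rows are left untouched.
        return ListBackend([func(row) for row in self.rows])

    def map_batches(
        self, func: Callable[[List[Any]], List[Any]], batch_size: int
    ) -> "ListBackend":
        # Applies func to consecutive slices of at most batch_size rows.
        out: List[Any] = []
        for start in range(0, len(self.rows), batch_size):
            out.extend(func(self.rows[start:start + batch_size]))
        return ListBackend(out)


class InplaceDataset:
    """Hypothetical wrapper showing the assumed in-place semantics."""

    def __init__(self, backend: ListBackend) -> None:
        self.ds = backend

    def map_(self, func: Callable[[Any], Any]) -> None:
        # Rebind the wrapped dataset; nothing is returned to the caller.
        self.ds = self.ds.map(func)

    def map_batch_(
        self, func: Callable[[List[Any]], List[Any]], batch_size: int = 2
    ) -> None:
        self.ds = self.ds.map_batches(func, batch_size=batch_size)

    def __iter__(self) -> Iterator[Any]:
        return iter(self.ds.rows)


data = InplaceDataset(ListBackend([1, 2, 3, 4, 5]))
data.map_(lambda x: x * 10)                                # row-wise transform
data.map_batch_(lambda batch: [sum(batch)], batch_size=2)  # one sum per batch
result = list(data)
```

Because both calls mutate `data`, the second transform sees the output of the first. In this sketch the batch function may return fewer rows than it receives.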
RayImageFolder (RayDataset)
Read image datasets arranged with one subdirectory per class:
root/dog/xxx.png
root/dog/xxy.png
root/dog/[...]/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/[...]/asd932_.png
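A directory tree like the one above is commonly turned into labelled samples by treating each top-level folder name as a class. The sketch below shows that idea using only the standard library; it is an assumption about the layout's intent, not RayImageFolder's actual code. `index_image_folder` is a hypothetical helper, and the `nested` folder is an illustrative stand-in for the elided `[...]` path segment.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def index_image_folder(root: Path) -> Tuple[Dict[str, int], List[Tuple[Path, int]]]:
    """Map each class subdirectory to an index and list (file, label) pairs."""
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples: List[Tuple[Path, int]] = []
    for name in classes:
        # rglob also finds images in nested folders below the class directory.
        for path in sorted((root / name).rglob("*.png")):
            samples.append((path, class_to_idx[name]))
    return class_to_idx, samples


# Build a throwaway tree that mirrors the layout, using empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["dog/xxx.png", "dog/xxy.png", "dog/nested/xxz.png", "cat/123.png"]:
        file = root / rel
        file.parent.mkdir(parents=True, exist_ok=True)
        file.touch()
    class_to_idx, samples = index_image_folder(root)
    labels = [label for _, label in samples]
```

Sorting the class names keeps the label assignment deterministic across runs and machines.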
Data loader for image dataset
image_dataset_from_directory(directory, transform=None, image_size=(224, 224), batch_size=1, shuffle=False, pin_memory=True, num_workers=None, ray_data=False)
Create Dataset and Dataloader for image folder dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
directory | Union[List[str], pathlib.Path, str] | | required |
transform | | | None |
image_size | | | (224, 224) |
batch_size | int | | 1 |
shuffle | bool | | False |
pin_memory | bool | | True |
num_workers | Optional[int] | | None |
Returns:
Type | Description |
---|---|
Data | A dictionary containing dataset and dataloader. |
Provide some common functionalities/utilities for Datasets
random_split_dataset(data, pct=0.9)
Randomly splits dataset into two sets. Length of first split is len(data) * pct.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dataset | pytorch Dataset object with | required |
pct | | percentage of split. | 0.9 |
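The split-length rule above (first split has `len(data) * pct` items) can be sketched with the standard library. This is an illustration under stated assumptions, not the library's implementation: `random_split_sketch` and its `seed` parameter are hypothetical, and truncating with `int()` is an assumed rounding choice for when `len(data) * pct` is not a whole number.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split_sketch(
    data: Sequence[T], pct: float = 0.9, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle indices, then cut them at int(len(data) * pct)."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    first_len = int(len(data) * pct)      # assumption: truncate, do not round
    first = [data[i] for i in indices[:first_len]]
    second = [data[i] for i in indices[first_len:]]
    return first, second


train, val = random_split_sketch(list(range(10)), pct=0.9)
```

With 10 items and `pct=0.9`, the first split holds 9 items and the second holds the remaining one. Every item lands in exactly one split.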
Last update:
October 13, 2021