准备数据
准备 COCO128[1] 数据集,其是 COCO[2] train2017 前 128 个数据。按 YOLOv5 组织的目录:
nbsp;tree ~/datasets/coco128 -L 2
/home/john/datasets/coco128
├── images
│ └── train2017
│ ├── ...
│ └── 000000000650.jpg
├── labels
│ └── train2017
│ ├── ...
│ └── 000000000650.txt
├── LICENSE
└── README.txt
详见 Train Custom Data[3]。
定义 Dataset
torch.utils.data.Dataset 是一个数据集的抽象类。自定义数据集时,需继承 Dataset 并覆盖如下方法:
- __len__: len(dataset) 获取数据集大小。
- __getitem__: dataset[i] 访问第 i 个数据。
详见:
- torch.utils.data.Dataset[4]
- torchvision.datasets.vision.VisionDataset[5]
自定义实现 YOLOv5 数据集的例子:
import os
from pathlib import Path
from typing import Any, Callable, Optional, Tuple
import numpy as np
import torch
import torchvision
from PIL import Image
class YOLOv5(torchvision.datasets.vision.VisionDataset):
def __init__(
self,
root: str,
name: str,
transform: Optional[Callable] = None,
target_transform: Optional[Callable] = None,
transforms: Optional[Callable] = None,
) -> None:
super(YOLOv5, self).__init__(root, transforms, transform, target_transform)
images_dir = Path(root) / 'images' / name
labels_dir = Path(root) / 'labels' / name
self.images = [n for n in images_dir.iterdir()]
self.labels = []
for image in self.images:
base, _ = os.path.splitext(os.path.basename(image))
label = labels_dir / f'{base}.txt'
self.labels.append(label if label.exists() else None)
def __getitem__(self, idx: int) -> Tuple[Any, Any]:
img = Image.open(self.images[idx]).convert('RGB')
label_file = self.labels[idx]
if label_file is not None: # found
with open(label_file, 'r') as f:
labels = [x.split() for x in f.read().strip().splitlines()]
labels = np.array(labels, dtype=np.float32)
else: # missing
labels = np.zeros((0, 5), dtype=np.float32)
boxes = []
classes = []
for label in labels:
x, y, w, h = label[1:]
boxes.append([
(x - w/2) * img.width,
(y - h/2) * img.height,
(x + w/2) * img.width,
(y + h/2) * img.height])
classes.append(label[0])
target = {}
target["boxes"] = torch.as_tensor(boxes, dtype=torch.float32)
target["labels"] = torch.as_tensor(classes, dtype=torch.int64)
if self.transforms is not None:
img, target = self.transforms(img, target)
return img, target
def __len__(self) -> int:
return len(self.images)
以上实现,继承了 VisionDataset 子类。其 __getitem__ 返回了:
- image: PIL Image, 大小为 (H, W)
- target: dict, 含以下字段: boxes (FloatTensor[N, 4]): 真实标注框 [x1, y1, x2, y2], x 范围 [0,W], y 范围 [0,H]labels (Int64Tensor[N]): 上述标注框的类别标识
读取 Dataset
dataset = YOLOv5(Path.home() / 'datasets/coco128', 'train2017')
print(f'dataset: {len(dataset)}')
print(f'dataset[0]: {dataset[0]}')
输出:
dataset: 128
dataset[0]: (<PIL.Image.Image image mode=RGB size=640x480 at 0x7F6F9464ADF0>, {'boxes': tensor([[249.7296, 200.5402, 460.5399, 249.1901],
[448.1702, 363.7198, 471.1501, 406.2300],
...
[ 0.0000, 188.8901, 172.6400, 280.9003]]), 'labels': tensor([44, 51, 51, 51, 51, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45,
45, 50, 50, 50, 51, 51, 60, 42, 44, 45, 45, 45, 50, 51, 51, 51, 51, 51,
51, 44, 50, 50, 50, 45])})
预览:
使用 DataLoader
训练需要批量提取数据,可以使用 DataLoader :
dataset = YOLOv5(Path.home() / 'datasets/coco128', 'train2017',
transform=torchvision.transforms.Compose([
torchvision.transforms.ToTensor()
]))
dataloader = DataLoader(dataset, batch_size=64, shuffle=True,
collate_fn=lambda batch: tuple(zip(*batch)))
for batch_i, (images, targets) in enumerate(dataloader):
print(f'batch {batch_i}, images {len(images)}, targets {len(targets)}')
print(f' images[0]: shape={images[0].shape}')
print(f' targets[0]: {targets[0]}')
输出:
batch 0, images 64, targets 64
images[0]: shape=torch.Size([3, 480, 640])
targets[0]: {'boxes': tensor([[249.7296, 200.5402, 460.5399, 249.1901],
[448.1702, 363.7198, 471.1501, 406.2300],
...
[ 0.0000, 188.8901, 172.6400, 280.9003]]), 'labels': tensor([44, 51, 51, 51, 51, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45,
45, 50, 50, 50, 51, 51, 60, 42, 44, 45, 45, 45, 50, 51, 51, 51, 51, 51,
51, 44, 50, 50, 50, 45])}
batch 1, images 64, targets 64
images[0]: shape=torch.Size([3, 248, 640])
targets[0]: {'boxes': tensor([[337.9299, 167.8500, 378.6999, 191.3100],
[383.5398, 148.4501, 452.6598, 191.4701],
[467.9299, 149.9001, 540.8099, 193.2401],
[196.3898, 142.7200, 271.6896, 190.0999],
[134.3901, 154.5799, 193.9299, 189.1699],
[ 89.5299, 162.1901, 124.3798, 188.3301],
[ 1.6701, 154.9299, 56.8400, 188.3700]]), 'labels': tensor([20, 20, 20, 20, 20, 20, 20])}
源码
- datasets.py[6]
- test_datasets.py[7]
参考
- Loading data in PyTorch[8]
- Datasets & Dataloaders[9]
- Writing Custom Datasets, DataLoaders and Transforms[10]
APIs:
- torch.utils.data[11]
- torchvision.datasets[12]
GoCoding 个人实践的经验分享,可关注公众号!
脚注
[1] COCO128: https://www.kaggle.com/ultralytics/coco128
[2] COCO: https://cocodataset.org
[3] Train Custom Data: https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
[4] torch.utils.data.Dataset: https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset
[5] torchvision.datasets.vision.VisionDataset: https://github.com/pytorch/vision/blob/master/torchvision/datasets/vision.py
[6] datasets.py: https://github.com/ikuokuo/start-pytorch/tree/master/utils/datasets.py
[7] test_datasets.py: https://github.com/ikuokuo/start-pytorch/tree/master/tests/test_datasets.py
[8] Loading data in PyTorch: https://pytorch.org/tutorials/recipes/recipes/loading_data_recipe.html
[9] Datasets & Dataloaders: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
[10] Writing Custom Datasets, DataLoaders and Transforms: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
[11] torch.utils.data: https://pytorch.org/docs/stable/data.html
[12] torchvision.datasets: https://pytorch.org/vision/stable/datasets.html
本文暂时没有评论,来添加一个吧(●'◡'●)