torchvision Guide
These notes are adapted from the PyTorch Chinese documentation.
1. Introduction to torchvision
`torchvision` provides popular datasets, model architectures, and common image transformation tools.
1.1 torchvision.datasets
- `torchvision.datasets` contains the following datasets: MNIST, COCO, LSUN Classification, ImageFolder, Imagenet-12, CIFAR10 and CIFAR100, STL10
- Each dataset provides the `__getitem__` and `__len__` API
- These datasets are all subclasses of `torch.utils.data.Dataset`
1.1.1 MNIST
- `dset.MNIST(root, train=True, transform=None, target_transform=None, download=False)`
- download: whether to download the dataset from the internet
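A minimal sketch of loading MNIST (the directory './data' is just an illustrative download path):

```python
import torchvision.datasets as dset
import torchvision.transforms as transforms

# Download (if necessary) and load the MNIST training split as tensors.
mnist_train = dset.MNIST(root='./data', train=True,
                         transform=transforms.ToTensor(),
                         download=True)

print(len(mnist_train))      # __len__  -> 60000 training images
img, label = mnist_train[0]  # __getitem__
print(img.size(), label)     # torch.Size([1, 28, 28]) and the class index
```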
1.1.2 COCO
- Image captioning:
`dset.CocoCaptions(root='dir where images are', annFile='json annotation file', [transform, target_transform])`
# sample
import torchvision.datasets as dset
import torchvision.transforms as transforms

cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples:', len(cap))
img, target = cap[3]
print('Image size:', img.size())
print(target)
'''
output:
Number of samples: 82783
Image size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
 u'A plane darts across a bright blue sky behind a mountain covered in snow',
 u'A plane leaves a contrail above the snowy mountain top.',
 u'A mountain that has a plane flying overheard in the distance.',
 u'A mountain view with a plume of smoke in the background']
'''
- Detection:
`dset.CocoDetection(root='dir where images are', annFile='json annotation file', [transform, target_transform])`
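A minimal sketch of the detection variant (the paths are placeholders); unlike CocoCaptions, the target is a list of annotation dicts:

```python
import torchvision.datasets as dset
import torchvision.transforms as transforms

det = dset.CocoDetection(root='dir where images are',
                         annFile='json annotation file',
                         transform=transforms.ToTensor())

img, target = det[0]
print(img.size())    # image tensor of shape (C, H, W)
print(type(target))  # list of annotation dicts (bbox, category_id, ...)
```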
1.1.3 ImageFolder
- A generic data loader for datasets where the images are organized as follows:
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
- `dset.ImageFolder(root='root folder path', [transform, target_transform])`
- self.classes: a list holding the class names
- self.class_to_idx: the index corresponding to each class name
- self.imgs: a list of (img_path, class) tuples
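A brief sketch, assuming a directory laid out as above ('root folder path' is a placeholder):

```python
import torchvision.datasets as dset
import torchvision.transforms as transforms

folder = dset.ImageFolder(root='root folder path',
                          transform=transforms.ToTensor())

print(folder.classes)       # e.g. ['cat', 'dog']
print(folder.class_to_idx)  # e.g. {'cat': 0, 'dog': 1}
print(folder.imgs[:3])      # [(img_path, class_index), ...]

img, label = folder[0]      # image tensor and its class index
```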
1.1.4 CIFAR
dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)
dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)
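The CIFAR datasets follow the same pattern as MNIST. A minimal sketch (again assuming './data' as the download directory) of wrapping one in a `torch.utils.data.DataLoader` for batching:

```python
import torch.utils.data as data
import torchvision.datasets as dset
import torchvision.transforms as transforms

cifar_train = dset.CIFAR10(root='./data', train=True,
                           transform=transforms.ToTensor(),
                           download=True)

loader = data.DataLoader(cifar_train, batch_size=64, shuffle=True)

images, labels = next(iter(loader))
print(images.size())  # torch.Size([64, 3, 32, 32])
print(labels.size())  # torch.Size([64])
```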
1.2 torchvision.models
- The submodules of `torchvision.models` contain the following model architectures: AlexNet, VGG, ResNet, SqueezeNet, DenseNet
- Pretrained models are available, for example:
torchvision.models.alexnet(pretrained=False, **kwargs)
torchvision.models.resnet18(pretrained=False, **kwargs)
torchvision.models.resnet34(pretrained=False, **kwargs)
torchvision.models.resnet50(pretrained=False, **kwargs)
torchvision.models.resnet101(pretrained=False, **kwargs)
torchvision.models.resnet152(pretrained=False, **kwargs)
torchvision.models.vgg11(pretrained=False, **kwargs)
torchvision.models.vgg11_bn(**kwargs)
torchvision.models.vgg13(pretrained=False, **kwargs)
torchvision.models.vgg13_bn(**kwargs)
torchvision.models.vgg16(pretrained=False, **kwargs)
torchvision.models.vgg16_bn(**kwargs)
torchvision.models.vgg19(pretrained=False, **kwargs)
torchvision.models.vgg19_bn(**kwargs)
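A hedged sketch of loading a pretrained model and running it on random input (the input is fake data, only to show the expected shapes):

```python
import torch
import torchvision.models as models

# pretrained=True downloads the ImageNet weights on first use.
resnet18 = models.resnet18(pretrained=True)
resnet18.eval()

# A fake batch of one 3x224x224 image, the input size these models expect.
x = torch.randn(1, 3, 224, 224)
out = resnet18(x)
print(out.size())  # torch.Size([1, 1000]) -- one score per ImageNet class
```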
1.3 torchvision.transforms
- Transforms operate on a `PIL.Image`
- Use `torchvision.transforms.Compose(transforms)` to chain several transforms together (see the sketch after this list)
- `transforms.CenterCrop(size)`: crops the given `PIL.Image` at the center to the given `size`; `size` can be a `tuple` or an `Integer`
- `transforms.RandomCrop(size, padding=0)`: crops at a randomly chosen position; `size` can be a `tuple` or an `Integer`
- `transforms.RandomHorizontalFlip(p=0.5)`: randomly flips the image horizontally
- `transforms.RandomSizedCrop(size, interpolation=2)`: takes a random crop first, then resizes it to the given `size`
- `transforms.Pad(padding, fill=0)`: pads all borders with the given fill value; `padding` is the number of pixels to pad
- `transforms.ToTensor()`: converts a `PIL.Image` with values in `[0, 255]`, or a `numpy.ndarray` of shape `(H, W, C)`, into a `torch.FloatTensor` of shape `[C, H, W]` with values in `[0, 1.0]`
- `transforms.Normalize(mean, std)`: normalizes with the given mean and standard deviation, i.e. `Normalized_image = (image - mean) / std`
- Generic transforms: use a `lambda` as the transform via `transforms.Lambda(lambda)`
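A brief sketch of composing transforms (the mean/std values are the commonly used ImageNet statistics, shown purely as an example):

```python
import torchvision.transforms as transforms

# Typical training-time pipeline: augment, convert to tensor, then normalize.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
    transforms.Lambda(lambda t: t),  # a no-op Lambda, just to show the generic transform
])

# The composed transform is passed to a dataset, e.g.:
# dset.CIFAR10(root='./data', train=True, transform=train_transform, download=True)
```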
1.4 torchvision.utils
- `utils.make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False)`
Given a 4D mini-batch `Tensor` of shape `(B x C x H x W)`, or a list of images, makes a grid of sub-images of size `(B / nrow, nrow)`
- normalize=True: normalize the image pixel values
- range=(min, max), where min and max are numbers: min and max are used to normalize the image
- scale_each=True: normalize each image independently
- `utils.save_image(tensor, filename, nrow=8, padding=2, normalize=False, range=None, scale_each=False)`
Saves the given `Tensor` as an image file; if it is a mini-batch tensor, `make_grid` is used to arrange it into a grid of sub-images before saving.
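A brief sketch of both utilities on random data (fake images, only to illustrate the shapes involved):

```python
import torch
import torchvision.utils as utils

# A fake mini-batch of 16 RGB images of size 32x32.
batch = torch.rand(16, 3, 32, 32)

# Arrange the batch into a single grid image with 8 images per row.
grid = utils.make_grid(batch, nrow=8, padding=2)
print(grid.size())  # torch.Size([3, H, W]) of the assembled grid

# save_image builds the same kind of grid internally for mini-batch input.
utils.save_image(batch, 'batch.png', nrow=8, padding=2, normalize=True)
```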