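torchvision is a library of convenience tools for image operations that is distributed separately from PyTorch.
torchvision mainly consists of the following packages:
vision.datasets: several commonly used vision datasets that can be downloaded and loaded. The main advanced use here is reading the source code to learn how to write your own Dataset subclass.
vision.models: popular models such as AlexNet, VGG, ResNet and DenseNet, together with pretrained parameters.
vision.transforms: common image operations, such as random cropping, rotation, data type conversion, image to tensor, numpy array to tensor, and tensor to image.
vision.utils: saves tensors of shape (3 x H x W) to disk, and builds an image grid from a mini-batch of images.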
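Datasets: torchvision.datasets
The following datasets are included, each described below: MNIST, COCO (captions and detection), LSUN, CIFAR10 and CIFAR100, STL10, SVHN, ImageFolder, Imagenet-12 and PhotoTour.
All datasets expose the same API, __getitem__ and __len__, because they are all subclasses of torch.utils.data.Dataset. When you implement your own Dataset, you therefore need to implement at least these two methods.
As a result, they can all be passed to torch.utils.data.DataLoader, which can load samples in parallel using multiple worker processes (Python multiprocessing).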
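As a minimal sketch of that contract, the hypothetical SquaresDataset below (the class name and its contents are made up for illustration) implements only __getitem__ and __len__ and is then batched by a DataLoader. num_workers=0 keeps loading in the main process so the snippet runs anywhere; a value greater than 0 would hand the work to worker processes.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Number of samples; DataLoader uses it to build its index sampler.
        return self.n

    def __getitem__(self, index):
        # Return one (input, target) pair as tensors.
        x = torch.tensor([float(index)])
        y = torch.tensor([float(index * index)])
        return x, y


dataset = SquaresDataset(10)
# num_workers=0 loads in the main process; a value > 0 uses worker processes.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)
batches = [(x, y) for x, y in loader]
print(len(dataset), len(batches), batches[0][0].shape)
```

Ten samples with batch_size=4 give three batches, the last one holding the remaining two samples.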
MNIST
dset.MNIST(root, train=True, transform=None, target_transform=None, download=False)
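root: the data directory, which contains processed/training.pt and processed/test.pt
train: True uses the training set, False uses the test set.
transform: a transform applied to the input image
target_transform: a transform applied to the target (the class label)
download: whether to download the MNIST dataset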
COCO
This requires the COCO API to be installed
Captions:
dset.CocoCaptions(root="dir where images are",annFile="json annotation file", [transform, target_transform])
Example:
import torchvision.datasets as dset
import torchvision.transforms as transforms

cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3]  # load 4th sample

print("Image Size: ", img.size())
print(target)
Output:
Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']
Detection:
dset.CocoDetection(root="dir where images are",annFile="json annotation file", [transform, target_transform])
LSUN
dset.LSUN(db_path,classes='train', [transform, target_transform])
db_path = root directory for the database files
classes =
'train' - all categories, training set
'val' - all categories, validation set
'test' - all categories, test set
['bedroom_train', 'church_train', …] : a list of categories to load
CIFAR
dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)
dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)
root: root directory of the dataset, where the folder cifar-10-batches-py is located
train: True = training set, False = test set
download: True = downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, nothing is done.
STL10
dset.STL10(root,split='train', transform=None, target_transform=None, download=False)
root: root directory of the dataset, where the folder stl10_binary is located
split: 'train' = training set, 'test' = test set, 'unlabeled' = unlabeled set, 'train+unlabeled' = training + unlabeled set (missing labels are marked as -1)
download: True = downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, nothing is done.
SVHN
dset.SVHN(root,split='train', transform=None, target_transform=None, download=False)
root: root directory of the dataset, where the folder SVHN is located
split: 'train' = training set, 'test' = test set, 'extra' = extra training set
download: True = downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, nothing is done.
ImageFolder
A generic data loader where the images are arranged in this way:
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
dset.ImageFolder(root="root folder path", [transform, target_transform])
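ImageFolder has the following members:
self.classes - the list of class names
self.class_to_idx - a mapping from class name to integer label; with the layout above, "cat" --> 0 and "dog" --> 1
self.imgs - a list of (image path, class index) tuples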
Imagenet-12
This is simply implemented with an ImageFolder dataset.
The data is preprocessed as described here
PhotoTour
Learning Local Image Descriptors Data http://phototour.cs.washington.edu/patches/default.htm
import torchvision.datasets as dset
import torchvision.transforms as transforms

dataset = dset.PhotoTour(root='dir where images are',
                         name='name of the dataset to load',
                         transform=transforms.ToTensor())

print('Loaded PhotoTour: {} with {} images.'
      .format(dataset.name, len(dataset.data)))
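Models
The models subpackage contains definitions for model architectures such as AlexNet, VGG, ResNet, SqueezeNet and DenseNet.
Each architecture may come in several variants; ResNet, for example, is available with 18, 34, 50, 101 and 152 layers.
One benefit of these mature models is that their source code sits in the torchvision installation path, which you can locate with
import torchvision
print(torchvision.models.__file__)
#'d:\\Anaconda3\\lib\\site-packages\\torchvision\\models\\__init__.py'
and then study how these well-designed models are built.
You can construct a model with randomly initialized parameters:
import torchvision.models as models

resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()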
Pretrained parameters are provided for ResNet, SqueezeNet 1.0 and 1.1, and AlexNet, using the PyTorch model zoo. To load them, pass pretrained=True to the constructor:
import torchvision.models as models

resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
All pretrained models expect input normalized in the same way: mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are at least 224.
The pixel values must be in the range [0, 1] and are then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].
A related example is in the imagenet example here <https://github.com/pytorch/examples/blob/42e5b996718797e45c46a25c55b031e6768f8440/imagenet/main.py#L89-L101>
Transforms
Transforms are common image transformations. They can be chained together with transforms.Compose:
transforms.Compose
You can compose several transforms together, for example:
transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
Transforms on PIL.Image
Scale(size, interpolation=Image.BILINEAR)
Rescales the input PIL.Image to the given size. Here size refers to the length of the shorter edge.
For example, if height > width, the image is rescaled to (size * height / width, size).
- size: size of the shorter edge
- interpolation: Default: PIL.Image.BILINEAR
CenterCrop(size)
Crops the given PIL.Image at the center to the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size).
RandomCrop(size, padding=0)
Crops the given PIL.Image at a random location to have a region of the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size). If padding is non-zero, then the image is first zero-padded on each side with padding pixels.
RandomHorizontalFlip()
Randomly flips the given PIL.Image horizontally with a probability of 0.5.
RandomSizedCrop(size, interpolation=Image.BILINEAR)
Randomly crops the given PIL.Image to a random size of (0.08 to 1.0) of the original size and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio. This is popularly used to train the Inception networks.
- size: size of the smaller edge
- interpolation: Default: PIL.Image.BILINEAR
Pad(padding, fill=0)
Pads the given image on each side with padding number of pixels, and the padding pixels are filled with pixel value fill. If a 5x5 image is padded with padding=1 then it becomes 7x7.
Transforms on torch.*Tensor
Normalize(mean, std)
Given mean: (R, G, B) and std: (R, G, B), will normalize each channel
of the torch.*Tensor, i.e. channel = (channel - mean) / std
Conversion Transforms
ToTensor() - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
ToPILImage() - Converts a torch.*Tensor of range [0, 1] and shape C x H x W, or a numpy ndarray of dtype=uint8, range [0, 255] and shape H x W x C, to a PIL.Image of range [0, 255]
Generic Transforms
Lambda(lambda)
Given a Python lambda, applies it to the input img and returns it. For example:
transforms.Lambda(lambda x: x.add(10))
Convenience functions
make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False)
Given a 4D mini-batch Tensor of shape (B x C x H x W), or a list of images all of the same size, makes a grid of images
normalize=True will shift the image to the range (0, 1), by subtracting the minimum and dividing by the maximum pixel value.
if range=(min, max) where min and max are numbers, then these numbers are used to normalize the image.
scale_each=True will scale each image in the batch of images separately rather than computing the (min, max) over all images.
Example usage is given in this notebook <https://gist.github.com/anonymous/bf16430f7750c023141c562f3e9f2a91>
save_image(tensor, filename, nrow=8, padding=2, normalize=False, range=None, scale_each=False)
Saves a given Tensor into an image file.
If given a mini-batch tensor, will save the tensor as a grid of images.
All options after filename are passed through to make_grid. Refer to its documentation for more details.
This is a convenient way to write out a batch of images tiled into a single picture.
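I did not expect this post to get so many views, so I am considering updating it.
Image backend: reading and processing images requires an image library. torchvision now supports more than one image loading library, and you can switch between them.
The functions to use are:
torchvision.get_image_backend()  # get the current image backend
torchvision.set_image_backend(backend)  # change the image backend
# backend (string): name of the image backend, one of {'PIL', 'accimage'}. The accimage package uses the Intel IPP library. It is faster than PIL, but does not support as many image operations.
Since this was added later and is of little use in everyday work, it is enough to know that it exists.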