This article walks through the full workflow of training a ResNet (PyTorch) model on an Intel Arc discrete GPU; the next article will cover deploying the trained model on an AIxBoard with OpenVINO. Before reading on, please install the Intel® Arc™ discrete GPU driver on Ubuntu 22.04.
1. Setting up a development environment for training PyTorch models on an Intel® Arc™ discrete GPU
1.1 Requirements:
Training a PyTorch model on an Intel discrete GPU under Ubuntu 22.04 requires installing, in order:
- The Intel discrete GPU driver
- Intel® oneAPI Base Toolkit 2023.0
- torch 1.13.0a0 and torchvision 0.14.1a0
- intel-extension-for-pytorch
1.2 Installing the Intel discrete GPU driver
Please follow "Installing the Intel® Arc™ Discrete GPU Driver on Ubuntu 22.04" to complete the driver installation. After a successful installation, the model of the Intel discrete GPU appears in the Graphics field of the About window.
1.3 Downloading and installing the Intel® oneAPI Base Toolkit
Step 1: download the Intel® oneAPI Base Toolkit and launch the installer with the following commands:
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/19079/l_BaseKit_p_2023.0.0.25537.sh
sudo sh ./l_BaseKit_p_2023.0.0.25537.sh
1.4 Installing torch, torchvision and intel-extension-for-pytorch
Install torch, torchvision and intel-extension-for-pytorch with the following command:
python -m pip install torch==1.13.0a0 torchvision==0.14.1a0 intel_extension_for_pytorch==1.13.10+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
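Once the wheels are installed, a quick sanity check confirms that the XPU backend is importable and a GPU is actually visible. This is a hedged sketch, not part of the original article; it reports a status string instead of crashing when something is missing:

```python
# Hedged sanity-check sketch: report whether PyTorch and the Intel XPU
# extension are importable, and whether an XPU device is visible.
def xpu_status():
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401 -- registers the 'xpu' device
        if torch.xpu.is_available():
            return "xpu available"
        return "ipex installed, no xpu device"
    except (ImportError, AttributeError):
        return "environment not ready"

print(xpu_status())
```

On a correctly configured machine this should print "xpu available"; any other string points at a missing driver or package.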
At this point, the development environment for training PyTorch models with Intel Extension for PyTorch and an Arc GPU on Ubuntu is fully configured!
2. Training a PyTorch ResNet model
Step 1: activate the oneAPI environment, together with the DPC++ compiler and oneMKL environments, with the following commands:
source /opt/intel/oneapi/setvars.sh
source /opt/intel/oneapi/compiler/latest/env/vars.sh
source /opt/intel/oneapi/mkl/latest/env/vars.sh
Step 2: download and run training_on_Intel_dGPU_bf16_ipex.py. The sample code uses the Food101 dataset and pretrained resnet50 weights that ship with PyTorch/torchvision.
Core code snippet:
import torch
import torchvision
import intel_extension_for_pytorch as ipex
from tqdm import tqdm

# Load ImageNet-pretrained weights, then replace the classifier head:
# resnet50() does not accept weights= together with num_classes != 1000.
model = torchvision.models.resnet50(weights='IMAGENET1K_V2')
model.fc = torch.nn.Linear(model.fc.in_features, 101)  # Food101 has 101 classes
model = model.to('xpu')
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LR, momentum=0.9)
model.train()
# Prepare model and optimizer for BF16 training on the XPU
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)
# Training loop
for epoch in range(epochs):
    tloss, vloss = 0.0, 0.0
    top1, top5 = 0.0, 0.0
    pbar = tqdm(enumerate(train_loader), total=len(train_loader), bar_format=TQDM_BAR_FORMAT)
    for i, (data, target) in pbar:
        model.train()
        data = data.to('xpu')
        target = target.to('xpu')
        with torch.xpu.amp.autocast():
            output = model(data)
            loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        tloss = (tloss * i + loss.item()) / (i + 1)  # running mean of batch losses
        if i == len(pbar) - 1:
            pred, targets, vloss = [], [], 0
            n = len(val_loader)
            # Evaluate on the validation set
            model.eval()
            with torch.no_grad(), torch.xpu.amp.autocast():
                for d, (images, labels) in enumerate(val_loader):
                    images = images.to('xpu')
                    labels = labels.to('xpu')
                    y = model(images)
                    pred.append(y.argsort(1, descending=True)[:, :5])  # top-5 class indices
                    targets.append(labels)
                    vloss += criterion(y, labels).item()
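Two details of the loop above are easy to gloss over: tloss is an incremental (running) mean of the batch losses, and the argsort call extracts the five highest-scoring class indices for top-5 accuracy. A minimal pure-Python sketch of both, using plain lists in place of tensors:

```python
# Sketch of two pieces of the training loop, with plain Python lists.

def running_mean_update(mean, i, value):
    # mean of the first i values, updated with the (i+1)-th value;
    # mirrors tloss = (tloss * i + loss.item()) / (i + 1)
    return (mean * i + value) / (i + 1)

def top5(logits):
    # indices of the five largest scores, analogous to
    # y.argsort(1, descending=True)[:, :5] for a single row
    return sorted(range(len(logits)), key=lambda j: logits[j], reverse=True)[:5]

m = 0.0
for i, v in enumerate([1.0, 2.0, 3.0]):
    m = running_mean_update(m, i, v)
print(m)  # 2.0 -- the mean of the three values

print(top5([0.1, 0.9, 0.3, 0.7, 0.2, 0.8]))  # [1, 5, 3, 2, 4]
```

A prediction counts as top-5 correct when the true label appears anywhere in that five-index list.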
Running results:
3. Conclusion
Compared with conventional FP32-precision training, the BF16 precision supported by Intel Extension for PyTorch makes more efficient use of the discrete GPU. On a single Intel Arc A750 under Ubuntu 22.04, the author also measured, for ResNet50 and ResNet101 trained on the Food101 dataset, the maximum batch_size achievable in BF16 and FP32 formats, along with the peak GPU memory utilization during training.
| | BF16 | FP32 |
| --- | --- | --- |
| ResNet50 | max batch_size: 128 | max batch_size: 64 |
| ResNet101 | max batch_size: 96 | max batch_size: 48 |
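The roughly 2x jump in feasible batch_size is consistent with BF16 halving per-element storage: a bfloat16 element takes 2 bytes versus 4 for float32, so if activation memory dominates and scales linearly with batch size, twice the batch fits in the same budget. A back-of-envelope sketch with illustrative numbers only (these are not measurements from the A750):

```python
# Illustrative arithmetic only: how bytes-per-element bounds the batch size
# under a fixed memory budget, assuming activations dominate and scale
# linearly with batch size.
def max_batch_size(budget_bytes, act_elems_per_sample, bytes_per_elem):
    return budget_bytes // (act_elems_per_sample * bytes_per_elem)

BUDGET = 8 * 1024**3   # hypothetical 8 GiB activation budget
ELEMS = 16 * 1024**2   # hypothetical activation elements per sample

print(max_batch_size(BUDGET, ELEMS, 4))  # 128 with FP32 (4 bytes/elem)
print(max_batch_size(BUDGET, ELEMS, 2))  # 256 with BF16 (2 bytes/elem)
```

In practice the ratio falls short of exactly 2x because model weights, optimizer state, and framework overhead do not shrink with the batch.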