PyTorch
PyTorch is one of the most popular neural network frameworks, being easier to use than TensorFlow while offering more options than the high-level framework Keras. It is developed by Facebook's AI Research lab.
Tensors
Tensors are used in place of NumPy arrays in PyTorch. This allows faster processing using GPUs.
If we use torch.as_tensor or torch.tensor, the datatype is inferred from the original array and assigned to the tensor.
# convert array/list to PyTorch tensor; a NumPy array shares memory with the tensor where possible
x = torch.as_tensor(arr)
# convert array to tensor; always a copy, no link to the array
x = torch.tensor(arr)
# check datatype
x.dtype
...torch.int32
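The difference between the two conversions can be checked with a minimal sketch like the one below. It assumes a small NumPy integer array named arr (a made-up example, not data from this guide) and modifies it in place after conversion.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])  # hypothetical example array

shared = torch.as_tensor(arr)  # shares memory with arr (same dtype, on CPU)
copied = torch.tensor(arr)     # independent copy of the data

arr[0] = 99  # modify the original NumPy array in place

print(shared)  # first element reflects the change made to arr
print(copied)  # still holds the original values
```

Note that the memory sharing only applies to NumPy arrays; a plain Python list is always copied.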
If we want to convert the tensor to specific datatypes, we can refer to the table below.
Data type | dtype | CPU tensor | GPU tensor
---|---|---|---
32-bit floating point | torch.float32 or torch.float | torch.FloatTensor | torch.cuda.FloatTensor
64-bit floating point | torch.float64 or torch.double | torch.DoubleTensor | torch.cuda.DoubleTensor
16-bit floating point (half precision) | torch.float16 or torch.half | torch.HalfTensor | torch.cuda.HalfTensor
16-bit floating point (bfloat16) | torch.bfloat16 | torch.BFloat16Tensor | torch.cuda.BFloat16Tensor
8-bit integer (unsigned) | torch.uint8 | torch.ByteTensor | torch.cuda.ByteTensor
8-bit integer (signed) | torch.int8 | torch.CharTensor | torch.cuda.CharTensor
16-bit integer (signed) | torch.int16 or torch.short | torch.ShortTensor | torch.cuda.ShortTensor
32-bit integer (signed) | torch.int32 or torch.int | torch.IntTensor | torch.cuda.IntTensor
64-bit integer (signed) | torch.int64 or torch.long | torch.LongTensor | torch.cuda.LongTensor
Boolean | torch.bool | torch.BoolTensor | torch.cuda.BoolTensor
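As a rough illustration of how the dtype column can be used, the sketch below converts a tensor in three common ways: the .to() method, a shorthand method such as .double(), and the dtype argument at creation time. The variable names are only for illustration.

```python
import torch

x = torch.tensor([1, 2, 3])    # a Python list of ints is inferred as torch.int64

x_float = x.to(torch.float32)  # explicit conversion by passing a dtype
x_double = x.double()          # shorthand method for torch.float64
x_half = torch.tensor([1, 2, 3], dtype=torch.float16)  # set the dtype at creation

print(x.dtype, x_float.dtype, x_double.dtype, x_half.dtype)
```

Each conversion returns a new tensor; the original x keeps its dtype.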
GPU
PyTorch can use a GPU to accelerate processing.
We can select CUDA with an if-else clause. Sometimes specifying just cuda does not work, and we have to give the device id explicitly, e.g. cuda:0.
import torch
print('PyTorch Version:', torch.__version__)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if device.type == 'cuda':
    print('Number of GPUs:', torch.cuda.device_count())
    print('Device properties:', torch.cuda.get_device_properties(0))
    print('Device ID:', torch.cuda.current_device())
    print('Device Name:', torch.cuda.get_device_name(0))
Number of GPUs: 1
Device properties: _CudaDeviceProperties(name='Quadro P1000', major=6, minor=1, total_memory=4040MB, multi_processor_count=4)
Device ID: 0
Device Name: Quadro P1000
We can set the model to run on the GPU, ideally by passing the device variable: model.to(device).
# check if using gpu
next(model.parameters()).is_cuda
# use gpu
model.cuda()
# or
model.to(device)
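We can do the same for tensors. Unlike a model, a tensor is not moved in place: a.cuda() and a.to(device) return a new tensor, so the result has to be assigned.
a = torch.FloatTensor([1.0,2.0])
# check which device the tensor is on
a.device
# use gpu (assign the result; the original tensor is left unchanged)
a = a.cuda()
# or
a = a.to(device)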
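The behaviour of .to() on tensors can be seen in a short sketch such as the following. It falls back to the CPU when no GPU is available, and also shows the device argument of torch.tensor as an alternative to moving a tensor after creation.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.tensor([1.0, 2.0])  # created on the CPU by default
b = a.to(device)              # returns a tensor on `device`; `a` stays on the CPU
c = torch.tensor([1.0, 2.0], device=device)  # create directly on the target device

print(a.device, b.device, c.device)
```

Creating the tensor directly on the target device avoids an extra host-to-device copy.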
Data Loader
Here's a more specific example on a train-test split dataset.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=33)
X_train = torch.FloatTensor(X_train).to(device)
X_test = torch.FloatTensor(X_test).to(device)
y_train = torch.LongTensor(y_train).to(device)
y_test = torch.LongTensor(y_test).to(device)
An easier way is to use TensorDataset & DataLoader.
from torch.utils.data import TensorDataset, DataLoader
iris = TensorDataset(torch.FloatTensor(data),torch.LongTensor(labels))
iris_loader = DataLoader(iris, batch_size=105, shuffle=True)
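A DataLoader is consumed by iterating over it, which yields one batch of features and labels at a time. The sketch below uses randomly generated stand-in data (150 samples, 4 features, 3 classes) rather than the real iris dataset, and a batch size of 64 chosen purely for illustration.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
data = torch.randn(150, 4)            # hypothetical stand-in features
labels = torch.randint(0, 3, (150,))  # hypothetical stand-in class labels

dataset = TensorDataset(data, labels)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

batch_sizes = []
for batch_idx, (X_batch, y_batch) in enumerate(loader):
    # each iteration yields a (features, labels) pair for one batch
    print(batch_idx, X_batch.shape, y_batch.shape)
    batch_sizes.append(X_batch.shape[0])
```

The last batch is smaller when the dataset size is not a multiple of batch_size; passing drop_last=True to DataLoader would discard it instead.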
PyTorch also has other dataset libraries, including torchvision, torchtext and torchaudio.
Modelling
Model Class
To build the model architecture, we define it as a class that subclasses nn.Module: __init__ declares the layers, and forward applies them together with the activation functions.
We then instantiate the model, and define the loss function & optimizer.
import torch
import torch.nn as nn
import torch.nn.functional as F
class Model(nn.Module):
    def __init__(self, in_features=4, h1=8, h2=9, out_features=3):
        super().__init__() # initiate nn.Module
        self.fc1 = nn.Linear(in_features,h1) # input layer
        self.fc2 = nn.Linear(h1, h2) # hidden layer
        self.out = nn.Linear(h2, out_features) # output layer

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return x
# Instantiate model class
torch.manual_seed(32) # so that initial weight & bias are same
model = Model()
# define loss & optimizer
criterion = nn.CrossEntropyLoss()
# model parameters are the layers' parameters of the NN
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
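Before training, it can help to sanity-check the architecture by passing a dummy batch through it and counting the trainable parameters. The sketch below repeats the Model class from above so that it runs on its own; the dummy batch of 5 random samples is an assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, in_features=4, h1=8, h2=9, out_features=3):
        super().__init__()
        self.fc1 = nn.Linear(in_features, h1)
        self.fc2 = nn.Linear(h1, h2)
        self.out = nn.Linear(h2, out_features)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)

model = Model()

dummy = torch.randn(5, 4)  # hypothetical batch: 5 samples, 4 features
logits = model(dummy)      # calling the model runs forward()
print(logits.shape)        # one row of 3 class scores per sample

# weights + biases: (4*8 + 8) + (8*9 + 9) + (9*3 + 3) = 151
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```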
Train
To train the model, we iterate over the epochs, doing a feed-forward pass followed by back-propagation. In PyTorch, we need to reset the gradients to zero on each iteration with optimizer.zero_grad(), as gradients otherwise accumulate across backward passes.
epochs = 100
losses = []
for i in range(epochs):
    # feed forward
    y_pred = model(X_train)  # calling the model invokes forward()
    loss = criterion(y_pred, y_train)
    losses.append(loss.item())  # store a plain float rather than the graph-attached tensor
    print(f'epoch: {i:2} loss: {loss.item():10.8f}')
    # backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
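The gradient accumulation that makes the zeroing step necessary can be observed on a single tensor. This is a minimal sketch with a made-up two-element parameter w and the loss sum(w^2), whose gradient is 2*w; it zeroes the gradient manually with w.grad.zero_(), which is what optimizer.zero_grad() takes care of for all model parameters.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # hypothetical parameter

loss = (w ** 2).sum()
loss.backward()
first = w.grad.clone()        # gradient of sum(w^2) is 2*w

loss = (w ** 2).sum()
loss.backward()               # no zeroing: the new gradient is added to the old one
accumulated = w.grad.clone()

w.grad.zero_()                # reset the stored gradient
loss = (w ** 2).sum()
loss.backward()
after_reset = w.grad.clone()  # back to 2*w

print(first, accumulated, after_reset)
```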
After training, we evaluate the model on the held-out test set to see if it generalizes well.
with torch.no_grad(): # disable gradient tracking during evaluation
    y_val = model(X_test)
    loss = criterion(y_val, y_test)
print(loss)
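Besides the loss, classification accuracy is often easier to interpret. The sketch below shows one way to compute it from the model outputs; the logits and labels are made-up values standing in for y_val and y_test.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes, standing in for y_val
y_val = torch.tensor([[2.0, 0.1, 0.3],
                      [0.2, 1.5, 0.1],
                      [0.1, 0.2, 3.0],
                      [1.0, 0.9, 0.8]])
# Hypothetical true labels, standing in for y_test
y_test = torch.tensor([0, 1, 2, 1])

predicted = y_val.argmax(dim=1)               # class with the highest score per row
correct = (predicted == y_test).sum().item()  # number of matching predictions
accuracy = correct / len(y_test)

print(f'{correct} out of {len(y_test)} correct, accuracy = {accuracy:.2f}')
```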
Save, Load & Predict
We can either save only the learnt parameters, using state_dict, or save both the learnt parameters & the model class. PyTorch convention is to save models with either a .pt or .pth file extension.
# save learnt parameters (biases & weights) only, but not model class
torch.save(model.state_dict(), 'best_model.pt')
# load (the model class must be defined first)
model = Model()
model.load_state_dict(torch.load('best_model.pt'))
# save parameters & model class
torch.save(model, 'best_model_full.pt')
# load (recent PyTorch versions default to weights_only=True, so a fully pickled model needs weights_only=False)
model = torch.load('best_model_full.pt', weights_only=False)
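A quick way to confirm that a state_dict round trip preserves the model is to compare the outputs before and after reloading. The sketch below is only an illustration: it uses a single nn.Linear layer in place of the full model and an in-memory buffer in place of a file on disk.

```python
import io
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)  # small stand-in for a trained model

buffer = io.BytesIO()    # in-memory stand-in for a .pt file
torch.save(model.state_dict(), buffer)
buffer.seek(0)           # rewind before reading

restored = nn.Linear(4, 3)  # fresh instance with different random weights
restored.load_state_dict(torch.load(buffer))

x = torch.randn(2, 4)
same = torch.equal(model(x), restored(x))
print(same)
```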
To predict on new data, we first switch to evaluation mode with model.eval(), so that dropout is disabled and batch normalization layers use their running statistics. Then we again use torch.no_grad() and pass in the new data.
model.eval()
with torch.no_grad():
    print(model(new_data)) # model output
    print(model(new_data).argmax()) # max output
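The effect of evaluation mode can be seen on a standalone dropout layer. In this sketch, a vector of ones is passed through nn.Dropout in both modes: in training mode, elements are randomly zeroed and the rest are scaled by 1/(1-p), while in evaluation mode the input passes through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()         # training mode: dropout is active
train_out = layer(x)  # roughly half the values zeroed, the rest scaled to 2.0

layer.eval()          # evaluation mode: dropout is a no-op
eval_out = layer(x)

print((train_out == 0).sum().item(), 'elements zeroed in training mode')
print(torch.equal(eval_out, x))
```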
Inference Optimization
This PyTorch guide provides several tips to optimize trained models' inference time.