PyTorch

PyTorch is arguably the most popular neural network framework: it is easier to use than TensorFlow, and it offers more options than the high-level framework Keras. It was developed by Facebook's AI Research lab.

Tensors

PyTorch uses tensors in place of NumPy arrays. Unlike NumPy arrays, tensors can be placed on a GPU for faster processing.

If we use torch.as_tensor or torch.tensor, the datatype is inferred from the original array and assigned to the tensor.

# convert array/list to a PyTorch tensor; a NumPy array shares its memory with the tensor where possible
x = torch.as_tensor(arr)
# convert array to a tensor; always a copy, with no link to the original
x = torch.tensor(arr)
# check datatype
x.dtype
...torch.int32
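
The difference between the two conversions can be seen by modifying the source array afterwards. The following is a minimal sketch, assuming a small NumPy integer array named arr (an illustrative value, not from the original notes) that lives on the CPU:

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])      # illustrative source array

shared = torch.as_tensor(arr)  # reuses the array's memory (CPU, same dtype)
copied = torch.tensor(arr)     # always makes an independent copy

arr[0] = 99                    # modify the NumPy array in place

print(shared)  # first element is now 99, because memory is shared
print(copied)  # first element is still 1
```

If the link is not wanted, torch.tensor is the safer choice; torch.as_tensor avoids the copy when memory use matters.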

If we want to convert the tensor to specific datatypes, we can refer to the table below.

Data type	dtype	CPU tensor	GPU tensor
32-bit floating point	torch.float32 or torch.float	torch.FloatTensor	torch.cuda.FloatTensor
64-bit floating point	torch.float64 or torch.double	torch.DoubleTensor	torch.cuda.DoubleTensor
16-bit floating point (binary16)	torch.float16 or torch.half	torch.HalfTensor	torch.cuda.HalfTensor
16-bit floating point (bfloat16)	torch.bfloat16	torch.BFloat16Tensor	torch.cuda.BFloat16Tensor
8-bit integer (unsigned)	torch.uint8	torch.ByteTensor	torch.cuda.ByteTensor
8-bit integer (signed)	torch.int8	torch.CharTensor	torch.cuda.CharTensor
16-bit integer (signed)	torch.int16 or torch.short	torch.ShortTensor	torch.cuda.ShortTensor
32-bit integer (signed)	torch.int32 or torch.int	torch.IntTensor	torch.cuda.IntTensor
64-bit integer (signed)	torch.int64 or torch.long	torch.LongTensor	torch.cuda.LongTensor
Boolean	torch.bool	torch.BoolTensor	torch.cuda.BoolTensor
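
As a short illustration of how the entries in this table are used, the sketch below converts one tensor to several datatypes. The variable names and values are assumptions made for the example:

```python
import torch

x = torch.tensor([1, 2, 3])           # Python ints are inferred as torch.int64

x_float = x.to(torch.float32)         # convert by passing the dtype
x_double = x.double()                 # shorthand method for torch.float64
x_legacy = x.type(torch.FloatTensor)  # convert using the CPU tensor type name
y = torch.tensor([1, 2, 3], dtype=torch.float16)  # set the dtype at creation

print(x.dtype, x_float.dtype, x_double.dtype, x_legacy.dtype, y.dtype)
```

Each conversion returns a new tensor; x itself keeps its original datatype.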

GPU

PyTorch can use a GPU to accelerate processing.

We can select CUDA with an if-else clause. Sometimes specifying just cuda will not work, and we have to give the device id as well, e.g. cuda:0.

import torch
print('PyTorch Version:', torch.__version__)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

if device.type == 'cuda':
    print('Number of GPUs:', torch.cuda.device_count())
    print('Device properties:', torch.cuda.get_device_properties(0))
    print('Device ID:', torch.cuda.current_device())
    print('Device Name:', torch.cuda.get_device_name(0))

Number of GPUs: 1
Device properties: _CudaDeviceProperties(name='Quadro P1000', major=6, minor=1, total_memory=4040MB, multi_processor_count=4)
Device ID: 0
Device Name: Quadro P1000
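
When the device id has to be given explicitly, one possible approach is sketched below. The variable gpu_index is a hypothetical name chosen for this example, and the code falls back to the CPU if that GPU is not present:

```python
import torch

gpu_index = 0  # hypothetical choice of GPU; adjust for multi-GPU machines

# Build an explicit device string such as 'cuda:0' only if that GPU exists
if torch.cuda.is_available() and gpu_index < torch.cuda.device_count():
    device = torch.device(f'cuda:{gpu_index}')
else:
    device = torch.device('cpu')

print(device)
```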

We can set the model to run on the GPU, ideally by passing the device variable, as in model.to(device).

# check if using gpu
next(model.parameters()).is_cuda
# use gpu
model.cuda()
# or
model.to(device)

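We can do the same for tensors. Unlike model.to(device), which moves a model in place, a.cuda() and a.to(device) return a new tensor, so the result must be assigned back.

a = torch.FloatTensor([1.0,2.0])
# check which device the tensor is on
a.device
# use gpu
a = a.cuda()
# or
a = a.to(device)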
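
A tensor can also be allocated on the target device from the start, which avoids a separate transfer. This is a minimal sketch with made-up values; it runs on the CPU when no GPU is available:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.tensor([1.0, 2.0])                 # created on the CPU by default
b = a.to(device)                             # result lives on `device`
c = torch.tensor([1.0, 2.0], device=device)  # created directly on `device`

print(a.device, b.device, c.device)
```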

Data Loader

Here's a more specific example using a train-test split dataset.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=33)

X_train = torch.FloatTensor(X_train).to(device)
X_test = torch.FloatTensor(X_test).to(device)
y_train = torch.LongTensor(y_train).to(device)
y_test = torch.LongTensor(y_test).to(device)

An easier way is to use TensorDataset & DataLoader.

from torch.utils.data import TensorDataset, DataLoader

iris = TensorDataset(torch.FloatTensor(data),torch.LongTensor(labels))
iris_loader = DataLoader(iris, batch_size=105, shuffle=True)
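
Once a DataLoader exists, iterating over it yields one batch of features and labels at a time. The sketch below uses random stand-in data with assumed shapes (150 samples, 4 features, 3 classes) rather than the real dataset:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Random stand-in data; the shapes are assumptions for this example
data = torch.randn(150, 4)
labels = torch.randint(0, 3, (150,))

dataset = TensorDataset(data, labels)
loader = DataLoader(dataset, batch_size=50, shuffle=True)

# Each iteration returns a (features, labels) pair for one batch
for batch_idx, (X_batch, y_batch) in enumerate(loader):
    print(batch_idx, X_batch.shape, y_batch.shape)
```

With shuffle=True the samples are drawn in a different order on every pass over the loader.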

PyTorch also has companion dataset libraries, including torchvision, torchtext, and torchaudio.

Modelling

Model Class

To build the model architecture, we define it as a class that subclasses nn.Module: the layers are declared in __init__, and forward defines how the input flows through them, including the activation functions.

We then instantiate the model, and define the loss function & optimizer.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, in_features=4, h1=8, h2=9, out_features=3):
        super().__init__()                      # initiate nn.Module
        self.fc1 = nn.Linear(in_features,h1)    # input layer
        self.fc2 = nn.Linear(h1, h2)            # hidden layer
        self.out = nn.Linear(h2, out_features)  # output layer

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return x

# Instantiate model class
torch.manual_seed(32)   # so that the initial weights & biases are reproducible
model = Model()

# define loss & optimizer
criterion = nn.CrossEntropyLoss()
# model.parameters() yields the learnable weights & biases of every layer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
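
Before training, it can help to confirm that the architecture produces outputs of the expected shape. The sketch below repeats the same Model class so that it runs on its own, and passes in a random dummy batch (an assumption for this example):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, in_features=4, h1=8, h2=9, out_features=3):
        super().__init__()
        self.fc1 = nn.Linear(in_features, h1)
        self.fc2 = nn.Linear(h1, h2)
        self.out = nn.Linear(h2, out_features)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)

model = Model()

dummy = torch.randn(5, 4)   # dummy batch: 5 samples, 4 features
logits = model(dummy)       # one raw score per class for each sample
print(logits.shape)         # torch.Size([5, 3])

# Count learnable parameters: (4*8+8) + (8*9+9) + (9*3+3)
n_params = sum(p.numel() for p in model.parameters())
print(n_params)             # 151
```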

Train

To train the model, we iterate over a number of epochs, each consisting of a feed-forward pass followed by back-propagation. PyTorch accumulates gradients on every backward pass, so we reset them with optimizer.zero_grad() before each call to loss.backward().

epochs = 100
losses = []

for i in range(epochs):
    # feed forward
    y_pred = model(X_train)            # calling the model runs forward()
    loss = criterion(y_pred, y_train)
    losses.append(loss.item())         # store a plain float, not the graph-attached tensor
    print(f'epoch: {i:2}  loss: {loss.item():10.8f}')

    # backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
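
The gradient accumulation behind the optimizer.zero_grad() call can be seen on a single scalar. In this minimal sketch the tensor w and its values are made up, and zeroing w.grad by hand stands in for what the optimizer does for all model parameters:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(w * 3).backward()       # d(3w)/dw = 3
g1 = w.grad.item()
print(w.grad)            # tensor(3.)

(w * 3).backward()       # a second backward pass adds to the stored gradient
g2 = w.grad.item()
print(w.grad)            # tensor(6.)

w.grad.zero_()           # reset, as optimizer.zero_grad() would
(w * 3).backward()
g3 = w.grad.item()
print(w.grad)            # tensor(3.)
```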

After training, we evaluate the model on the held-out test set to see how well it generalizes.

with torch.no_grad(): # disable gradient tracking during evaluation
    y_val = model(X_test)
    loss = criterion(y_val, y_test)
print(loss)
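
The loss alone can be hard to interpret, so a classification accuracy is often computed as well. The sketch below shows one way to do this; the logits and labels are made-up stand-ins for the model output and test labels:

```python
import torch

# Made-up model outputs (logits) for 4 samples and 3 classes
y_val = torch.tensor([[2.0, 0.1, 0.3],
                      [0.2, 1.5, 0.1],
                      [0.1, 0.2, 3.0],
                      [1.0, 2.0, 0.5]])
y_test = torch.tensor([0, 1, 2, 0])   # made-up true labels

predicted = y_val.argmax(dim=1)       # class with the highest score per row
correct = (predicted == y_test).sum().item()
accuracy = correct / len(y_test)

print(predicted)   # tensor([0, 1, 2, 1])
print(accuracy)    # 0.75
```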

Save, Load & Predict

We can either save only the learnt parameters of the model, using state_dict, or save both the learnt parameters & the model class. The PyTorch convention is to save models with either a .pt or a .pth file extension.

# save learnt parameters (biases & weights) only, but not model class
torch.save(model.state_dict(), 'best_model.pt')
# load
model = Model()
model.load_state_dict(torch.load('best_model.pt'))


# save parameters & model class
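torch.save(model, 'best_model.pt')
# load; recent PyTorch releases default to weights_only=True,
# so unpickling a whole model needs weights_only=False (trusted files only)
model = torch.load('best_model.pt', weights_only=False)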
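
A quick way to confirm that saving and loading preserve the parameters is a round trip through a temporary file. This sketch uses a small nn.Linear layer as a stand-in model and a hypothetical file name, demo_model.pt:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 3)   # small stand-in model for the example

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_model.pt')   # hypothetical file name
    torch.save(model.state_dict(), path)

    restored = nn.Linear(4, 3)                  # same architecture, new weights
    restored.load_state_dict(torch.load(path, map_location='cpu'))

# Compare every saved tensor with its restored counterpart
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)   # True
```

The map_location argument lets a model saved on a GPU be loaded on a machine that only has a CPU.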

To predict on new data, we first switch to evaluation mode with model.eval(), so that dropout layers are disabled and batch normalization layers use their running statistics instead of batch statistics. Then we again use torch.no_grad() and pass in the new data.

model.eval()
with torch.no_grad():
    output = model(new_data)  # single forward pass
    print(output)             # model output (raw score per class)
    print(output.argmax())    # index of the highest score
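
The effect of evaluation mode can be checked on a single dropout layer. This is a minimal sketch with an assumed dropout probability of 0.5 and an input of ones:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()            # training mode: elements are randomly zeroed,
train_out = drop(x)     # the survivors are scaled by 1 / (1 - p) = 2
print(train_out)

drop.eval()             # evaluation mode: dropout passes input through unchanged
eval_out = drop(x)
print(eval_out)
```

Calling model.eval() on a full model switches every such layer at once, and model.train() switches them back.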

Inference Optimization

This PyTorch guide provides several tips for optimizing the inference time of trained models.