# Classification

We can model a classifier for $$C$$ classes with $$N$$ features using an $$(N+1)$$-dimensional compressed tensor: the first $$N$$ dimensions index all possible (discretized) feature values, whereas the last one has size $$C$$ and is used to compute class probabilities.
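Concretely, if $$\mathcal{T}$$ denotes that tensor, a discretized input $$(x_1, \dots, x_N)$$ selects a vector of $$C$$ scores along the last dimension, and (using the softmax introduced later on) the class probabilities are

$$P(c \mid x_1, \dots, x_N) = \mathrm{softmax}(\mathcal{T}[x_1, \dots, x_N, :])_c$$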

Here we will try a simple $$2$$-class example with $$N = 2$$ features: the Swiss roll classification problem.

[1]:

import tntorch as tn
import torch
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

N = 2  # Number of features
C = 2  # Number of classes
P = 100  # Points per class

# Class 0: a noisy spiral (r*cos(r), r*sin(r)) with radii r in [2, 10]
c0 = torch.rand(P)*8+2
c0 = c0[:, None]
c0 = torch.cat([c0*torch.cos(c0), c0*torch.sin(c0)], dim=1)
c0 += torch.randn(*c0.shape)/1.5

# Class 1: the point reflection of class 0 through the origin
c1 = -c0

plt.figure()
plt.scatter(c0[:, 0], c0[:, 1], color='royalblue')
plt.scatter(c1[:, 0], c1[:, 1], color='firebrick')
plt.gca().set_aspect('equal', 'datalim')
plt.title('Swiss roll (two classes)')
plt.show()

[2]:

# Assemble (X, y) data set
X = torch.cat([c0, c1], dim=0)
y = torch.cat([torch.zeros(len(c0)), torch.ones(len(c1))])

# Shuffle data
idx = np.random.permutation(len(X))
X = X[idx]
y = y[idx]

# Rescale features to the range [0, nticks-1]; casting to integers
# below turns them into grid indices
nticks = 128
X = (X-X.min()) / (X.max()-X.min())
X = X*(nticks-1)

# Split into 75% train / 25% test
ntrain = int(len(X)*0.75)
X_train = X[:ntrain, :].long()
y_train = y[:ntrain].long()
X_test = X[ntrain:, :].long()
y_test = y[ntrain:].long()
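
As a quick optional sanity check, we can assert that every discretized index falls inside the tensor's grid:

[ ]:

# Integer feature values must lie in [0, nticks-1], since they will be
# used to index the first N dimensions of the tensor
assert 0 <= X_train.min() and X_train.max() <= nticks - 1
assert 0 <= X_test.min() and X_test.max() <= nticks - 1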


Let’s set up the $$128 \times 128 \times 2$$ tensor that will be optimized. We will expand its feature dimensions over a basis of low-frequency cosine wavefunctions.
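In other words (a sketch of the idea; the exact normalization is whatever tntorch's DCT factors use), the Tucker factors along the first $$N$$ dimensions hold the $$6$$ lowest-frequency DCT-II basis vectors, which by default stay fixed while the TT cores are optimized:

$$U[x, i] \propto \cos\left(\frac{\pi (2x+1) i}{2 \cdot 128}\right), \qquad x = 0, \dots, 127, \quad i = 0, \dots, 5$$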

[3]:

t = tn.rand(shape=[nticks]*N + [C], ranks_tt=10, ranks_tucker=6, requires_grad=True)
t.set_factors('dct', dim=range(N))
t

[3]:

3D TT-Tucker tensor:

128 128  2
|   |   |
6   6   6
(0) (1) (2)
/ \ / \ / \
1   10  10  1


Our tensor’s last dimension has size $$2$$: for each discretized input $$(x_1, x_2)$$ it produces $$2$$ numbers, one score per class. For classification we will transform these scores into probabilities using the softmax function.
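For a vector of scores $$z \in \mathbb{R}^C$$,

$$\mathrm{softmax}(z)_c = \frac{e^{z_c}}{\sum_{c'=1}^{C} e^{z_{c'}}}$$

Subtracting a constant from all scores leaves the result unchanged, so the implementation shifts by the maximum to avoid overflow: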

[4]:

def softmax(x):
    expx = torch.exp(x - x.max())  # shift by the max to avoid overflow
    return expx / torch.sum(expx, dim=-1, keepdim=True)


To assess the quality of a matrix of predicted probabilities (rows are instances, columns are classes) we use the cross-entropy loss.
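For $$M$$ instances with labels $$y_1, \dots, y_M$$, it is the average negative log-probability assigned to each instance's true class:

$$\mathcal{L} = -\frac{1}{M} \sum_{m=1}^{M} \log(p_{m, y_m})$$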

[5]:

def cross_entropy_loss(probs, y):
    # Mean negative log-probability assigned to each instance's true class
    return torch.mean(-torch.log(probs[torch.arange(len(y)), y]))

We are now ready to fit our tensor network:

[6]:

def loss(t):
    return cross_entropy_loss(softmax(t[X_train].torch()), y_train)
tn.optimize(t, loss)

iter: 0      | loss:   0.707212 | total time:    0.0022
iter: 500    | loss:   0.056675 | total time:    1.3520
iter: 1000   | loss:   0.006464 | total time:    2.7943
iter: 1500   | loss:   0.001936 | total time:    4.1302
iter: 2000   | loss:   0.000841 | total time:    5.5035
iter: 2500   | loss:   0.000438 | total time:    6.8394
iter: 3000   | loss:   0.000254 | total time:    8.1114
iter: 3500   | loss:   0.000157 | total time:    9.4423
iter: 4000   | loss:   0.000102 | total time:   10.7657
iter: 4026   | loss:   0.000100 | total time:   10.8177 <- converged (tol=0.0001)


We now predict classes for the test instances and compute the score (fraction of correctly classified test instances):

[7]:

prediction = torch.argmax(t[X_test].torch(), dim=1)
score = torch.sum(prediction == y_test).double() / len(y_test)
print('Score:', score)

Score: tensor(0.9400)


Finally, we show the class probabilities over the whole feature space (blue means class 0, red means class 1), overlaid first with the training points and then with the test points:

[8]:

fig = plt.figure(figsize=(5, 5))
plt.title('Training set')
plt.imshow(softmax(t.torch())[..., 0].detach().numpy().T, origin='lower', cmap='RdBu')
plt.scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], color='royalblue')
plt.scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], color='firebrick')
plt.show()

fig = plt.figure(figsize=(5, 5))
plt.title('Test set')
plt.imshow(softmax(t.torch())[..., 0].detach().numpy().T, origin='lower', cmap='RdBu')
plt.scatter(X_test[y_test==0, 0], X_test[y_test==0, 1], color='royalblue')
plt.scatter(X_test[y_test==1, 0], X_test[y_test==1, 1], color='firebrick')
plt.show()