Multilayer Perceptron and Neural Network

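A perceptron is an algorithm for supervised learning of binary classifiers. It learns a weight vector $\mathbf{w}$ and predicts the class of an input $\mathbf{x}$ from the sign of $\mathbf{w}^{\top}\mathbf{x}$.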

Update rule

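$\mathbf{w}_{t+1}=\mathbf{w}_{t}+\left(1-H\left(y_{i} \mathbf{w}_{t}^{\top} \mathbf{x}_{i}\right)\right) y_{i} \mathbf{x}_{i}$

where $H$ is the Heaviside step function and $y_{i} \in \{-1,+1\}$, so the weights change only when sample $i$ is misclassified.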
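As a minimal sketch of a single application of this rule, the snippet below updates a weight vector on one sample. The helper names `heaviside` and `perceptron_step` are illustrative and not part of the notebook, labels are assumed to be -1 or +1, and taking $H(0)=0$ is a convention chosen here so that a zero margin counts as a mistake.

```python
import numpy as np

def heaviside(z):
    # H(z) = 1 for z > 0, else 0 (H(0) = 0 is an assumption made for this sketch)
    return 1.0 if z > 0 else 0.0

def perceptron_step(w, x_i, y_i):
    """Apply the update rule once; y_i is assumed to be -1 or +1."""
    margin = y_i * np.dot(w, x_i)
    return w + (1.0 - heaviside(margin)) * y_i * x_i

w = np.zeros(3)
x_i = np.array([1.0, 2.0, -1.0])  # first entry plays the role of the bias input
y_i = 1

w1 = perceptron_step(w, x_i, y_i)   # margin is 0, so the sample is treated as misclassified
w2 = perceptron_step(w1, x_i, y_i)  # margin is now positive, so the weights stay the same
print(w1, w2)
```

The second call leaves the weights untouched, which is the behaviour the factor $1-H(\cdot)$ encodes.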

In [53]:
import numpy as np
import pandas as pd
from sklearn import datasets
In [54]:
class perceptron:
    def __init__(self,lr=0.1,n_iter=200):
        self.lr = lr
        self.n_iter = n_iter
        self.theta = None    
    def fit(self,X,y):
        b = np.ones(X.shape[0])
        b = b.reshape(b.shape[0],-1)
        X = X.reshape(X.shape[0],-1)
        X = np.hstack((b,X))
        y = np.where(y==0,-1,1)
        self.theta = np.random.rand(X.shape[1])  
        for _iter in range(self.n_iter):
            for ind in range(X.shape[0]):
                y_hat = self.theta.T.dot(X[ind])
                if np.sign(y_hat) == y[ind]:
                    pass
                else:
                    self.theta = self.theta + y[ind] * X[ind]
    def predict(self,X):
        b = np.ones(X.shape[0])
        b = b.reshape(b.shape[0],-1)
        X = X.reshape(X.shape[0],-1)
        X = np.hstack((b,X))
        pred = np.sign(X.dot(self.theta))
        return np.where(pred==1,1,0)
    def accuracy(self,pred,label):
        return np.sum(pred==label)/len(label)
In [ ]:
class perceptron:
    def __init__(self, lr=0.1, n_iter=200, init_param='random'):
        # NOTE: lr is stored but not used by fit(); the update below takes a
        # fixed step of 1, matching the update rule stated above.
        self.lr = lr
        self.n_iter = n_iter
        self.init_param = init_param
        self.theta = None

    def fit(self, X, y):
        # Prepend a column of ones so that theta[0] acts as the bias term.
        b = np.ones((X.shape[0], 1))
        X = X.reshape(X.shape[0], -1)
        X = np.hstack((b, X))

        # Map labels {0, 1} to {-1, +1}, as the update rule expects.
        y = np.where(y == 0, -1, 1)

        if self.init_param == 'zero':
            self.theta = np.zeros(X.shape[1])
        elif self.init_param == 'random':
            self.theta = np.random.rand(X.shape[1])
        else:
            raise ValueError("Unknown init_param: use 'zero' or 'random'")

        for _iter in range(self.n_iter):
            for ind in range(X.shape[0]):
                y_hat = self.theta.dot(X[ind])
                # Update only on a mistake (a score of exactly 0 counts as one).
                if np.sign(y_hat) != y[ind]:
                    self.theta = self.theta + y[ind] * X[ind]

    def predict(self, X):
        # Apply the same bias-column augmentation as in fit().
        b = np.ones((X.shape[0], 1))
        X = X.reshape(X.shape[0], -1)
        X = np.hstack((b, X))
        pred = np.sign(X.dot(self.theta))
        # Map the sign back to {0, 1}; a score of exactly 0 is assigned class 0.
        return np.where(pred == 1, 1, 0)

    def accuracy(self, pred, label):
        # Fraction of predictions that match the labels.
        return np.sum(pred == label) / len(label)
In [55]:
iris = datasets.load_iris()
X = iris.data[:, :]  
y = iris.target
y = (y>0)*1
In [70]:
data = np.hstack((X,y.reshape(-1,1)))
np.random.shuffle(data)
data = pd.DataFrame(data,columns=['Feature1','Feature2','Feature3','Feature4','Target'])
data.head(6)
Out[70]:
Feature1 Feature2 Feature3 Feature4 Target
0 6.3 2.9 5.6 1.8 1.0
1 7.4 2.8 6.1 1.9 1.0
2 5.4 3.9 1.7 0.4 0.0
3 6.9 3.1 4.9 1.5 1.0
4 6.1 2.9 4.7 1.4 1.0
5 4.7 3.2 1.3 0.2 0.0
In [60]:
model = perceptron(n_iter=300,init_param='random')
print('Model parameters: ', model.theta)
Model parameters:  None
In [62]:
model.fit(X,y)
print('Model parameters: ', model.theta)
Model parameters:  [-0.68541919 -1.12122172 -3.4261289   5.38552346  2.85796252]
In [65]:
print('Training accuracy: ',model.accuracy(model.predict(X),y))
Training accuracy:  1.0

Multi-layer Perceptron


An MLP is typically represented as a composition of several functions, one per layer: $f(\boldsymbol{x})=f^{(3)}\left(f^{(2)}\left(f^{(1)}(\boldsymbol{x})\right)\right)$
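The composition can be sketched directly in NumPy. Everything in this snippet is an illustrative assumption rather than the notebook's model: the helper `make_layer`, the layer widths (4, 8, 8, 1), the ReLU activations, and the random weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(n_in, n_out, activation):
    # Each layer is an affine map followed by an element-wise activation.
    W = rng.normal(scale=0.1, size=(n_in, n_out))
    b = np.zeros(n_out)
    return lambda x: activation(x @ W + b)

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

f1 = make_layer(4, 8, relu)      # f^(1)
f2 = make_layer(8, 8, relu)      # f^(2)
f3 = make_layer(8, 1, identity)  # f^(3), linear output layer

def f(x):
    # f(x) = f3(f2(f1(x)))
    return f3(f2(f1(x)))

x = rng.normal(size=(5, 4))  # 5 samples with 4 features each
out = f(x)
print(out.shape)  # (5, 1)
```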

Forward pass and back-prop


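For a two-layer network, the forward pass computes the hidden output $X_1$ from the input $X$ and the weights $W_1$, then the network output $X_2$ from $X_1$ and the weights $W_2$; the loss $L$ is evaluated on $X_2$. Back-propagation applies the chain rule from the loss back to each weight matrix:

$\frac{\partial L}{\partial W_{2}}=\frac{\partial L}{\partial X_{2}} \frac{\partial X_2}{\partial W_{2}}$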

$\frac{\partial L}{\partial W_{1}}=\frac{\partial L}{\partial X_{2}} \frac{\partial X_2}{\partial X_{1}} \frac{\partial X_1}{\partial W_{1}}$

Parameter update: $W \leftarrow W-\alpha \nabla_{W} L$, where $\alpha$ is the learning rate.
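To illustrate the update on its own, here is a minimal sketch of gradient descent on a toy one-dimensional loss $L(w)=(w-3)^2$. The loss, the starting point, the step size $\alpha=0.1$, and the number of steps are all assumptions chosen for illustration.

```python
def grad(w):
    # Gradient of the toy loss L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

alpha = 0.1  # learning rate (assumed value)
w = 0.0      # initial parameter (assumed value)

for _ in range(50):
    # W <- W - alpha * grad_W L
    w = w - alpha * grad(w)

print(w)  # approaches the minimiser w = 3
```

Each step shrinks the distance to the minimiser by a factor of $1-2\alpha=0.8$, so the iterate converges geometrically.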

Example

Assume a linear two-layer network, $X_1 = W_1 X$ and $X_2 = W_2 X_1$, with squared-error loss $L(X_2, Y) = \|X_2 - Y\|^2$

$\frac{\partial L}{\partial X_{2}} = 2(X_2 - Y)$

$\frac{\partial X_2}{\partial W_{2}} = X_1$

$\frac{\partial X_2}{\partial X_{1}} = W_2$

$\frac{\partial X_1}{\partial W_{1}} = X$

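$\nabla_{W_2} L = 2(X_2 - Y)X_1$ and $\nabla_{W_1} L = 2(X_2 - Y)W_2X$ (written in scalar form; for matrices the factors must be ordered and transposed so that the shapes agree)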
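These gradients can be checked numerically. The sketch below assumes a linear two-layer network with samples stored as rows, so the forward pass is written `X1 = X @ W1` and `X2 = X1 @ W2`, and the products from the derivation appear with transposes. The shapes, the random data, and the helper `numerical_grad` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 samples, 3 features
Y = rng.normal(size=(5, 2))   # 2 targets per sample
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))

def loss(W1, W2):
    X1 = X @ W1
    X2 = X1 @ W2
    return np.sum((X2 - Y) ** 2)

# Analytic gradients from the chain rule
X1 = X @ W1
X2 = X1 @ W2
dL_dX2 = 2.0 * (X2 - Y)
grad_W2 = X1.T @ dL_dX2          # dL/dX2 combined with dX2/dW2 = X1
grad_W1 = X.T @ (dL_dX2 @ W2.T)  # dL/dX2, then dX2/dX1 = W2, then dX1/dW1 = X

def numerical_grad(fn, W, eps=1e-6):
    # Central finite differences, one entry at a time
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp = W.copy()
        Wm = W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        g[idx] = (fn(Wp) - fn(Wm)) / (2 * eps)
    return g

num_W1 = numerical_grad(lambda W: loss(W, W2), W1)
num_W2 = numerical_grad(lambda W: loss(W1, W), W2)

ok_W1 = np.allclose(grad_W1, num_W1, atol=1e-4)
ok_W2 = np.allclose(grad_W2, num_W2, atol=1e-4)
print(ok_W1, ok_W2)
```

Agreement between the analytic and finite-difference gradients is a common sanity check before trusting a hand-written backward pass.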

In [119]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('./Dataset_spine.csv')
In [120]:
df = df.drop(['Unnamed: 13'], axis=1)

The data for the classification task

In [121]:
df.head(4)
Out[121]:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 Col11 Col12 Class_att
0 63.027818 22.552586 39.609117 40.475232 98.672917 -0.254400 0.744503 12.5661 14.5386 15.30468 -28.658501 43.5123 Abnormal
1 39.056951 10.060991 25.015378 28.995960 114.405425 4.564259 0.415186 12.8874 17.5323 16.78486 -25.530607 16.1102 Abnormal
2 68.832021 22.218482 50.092194 46.613539 105.985135 -3.530317 0.474889 26.8343 17.4861 16.65897 -29.031888 19.2221 Abnormal
3 69.297008 24.652878 44.311238 44.644130 101.868495 11.211523 0.369345 23.5603 12.7074 11.42447 -30.470246 18.8329 Abnormal
In [122]:
df.describe()
Out[122]:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 Col11 Col12
count 310.000000 310.000000 310.000000 310.000000 310.000000 310.000000 310.000000 310.000000 310.000000 310.000000 310.000000 310.000000
mean 60.496653 17.542822 51.930930 42.953831 117.920655 26.296694 0.472979 21.321526 13.064511 11.933317 -14.053139 25.645981
std 17.236520 10.008330 18.554064 13.423102 13.317377 37.559027 0.285787 8.639423 3.399713 2.893265 12.225582 10.450558
min 26.147921 -6.554948 14.000000 13.366931 70.082575 -11.058179 0.003220 7.027000 7.037800 7.030600 -35.287375 7.007900
25% 46.430294 10.667069 37.000000 33.347122 110.709196 1.603727 0.224367 13.054400 10.417800 9.541140 -24.289522 17.189075
50% 58.691038 16.357689 49.562398 42.404912 118.268178 11.767934 0.475989 21.907150 12.938450 11.953835 -14.622856 24.931950
75% 72.877696 22.120395 63.000000 52.695888 125.467674 41.287352 0.704846 28.954075 15.889525 14.371810 -3.497094 33.979600
max 129.834041 49.431864 125.742385 121.429566 163.071041 418.543082 0.998827 36.743900 19.324000 16.821080 6.972071 44.341200
In [123]:
df = df.drop(['Col7','Col8','Col9','Col10','Col11','Col12'], axis=1)

Data after feature selection (columns Col7 to Col12 dropped)

In [124]:
df.head(4)
Out[124]:
Col1 Col2 Col3 Col4 Col5 Col6 Class_att
0 63.027818 22.552586 39.609117 40.475232 98.672917 -0.254400 Abnormal
1 39.056951 10.060991 25.015378 28.995960 114.405425 4.564259 Abnormal
2 68.832021 22.218482 50.092194 46.613539 105.985135 -3.530317 Abnormal
3 69.297008 24.652878 44.311238 44.644130 101.868495 11.211523 Abnormal

MLP Classifier

In [79]:
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
In [80]:
y = df['Class_att']
x = df.drop(['Class_att'], axis=1)
In [81]:
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size= 0.25, random_state=27)
In [130]:
clf = MLPClassifier(hidden_layer_sizes=(64, 128, 32), max_iter=300, alpha=0.0001, solver='sgd',
                    verbose=10, random_state=21, tol=1e-9)
In [132]:
clf.fit(x_train, y_train)
In [104]:
y_pred = clf.predict(x_test)
print(accuracy_score(y_test, y_pred))
0.7564102564102564

Model parameter shapes and target classes

In [107]:
print(clf.coefs_[0].shape)
print(clf.coefs_[1].shape)
print(clf.coefs_[2].shape)
print(clf.coefs_[3].shape)
print(clf.classes_)
(6, 64)
(64, 128)
(128, 32)
(32, 1)
['Abnormal' 'Normal']
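The weight shapes printed above follow directly from the layer sizes: each matrix maps one layer's width to the next. A small sketch that reproduces them, assuming 6 input features and a single output unit for the binary problem (the variable names are illustrative):

```python
n_features = 6                      # Col1 to Col6
hidden_layer_sizes = (64, 128, 32)  # as passed to MLPClassifier
n_outputs = 1                       # binary classification uses one output unit

layer_sizes = [n_features, *hidden_layer_sizes, n_outputs]

# Consecutive pairs of layer widths give the shape of each weight matrix.
expected_shapes = list(zip(layer_sizes[:-1], layer_sizes[1:]))
print(expected_shapes)  # [(6, 64), (64, 128), (128, 32), (32, 1)]

# Parameter counts: weights plus one bias per non-input unit.
n_weights = sum(n_in * n_out for n_in, n_out in expected_shapes)
n_biases = sum(layer_sizes[1:])
print(n_weights, n_biases)  # 12704 225
```

The bias vectors themselves are stored separately from `coefs_`, in the fitted model's `intercepts_` attribute.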

MLP regressor

In [152]:
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=2000,n_features=10, random_state=1)
In [157]:
data_ = np.hstack((X,y.reshape(-1,1)))
np.random.shuffle(data_)
cols = ['Feature_'+str(i) for i in range(1, 11)]+['Target']
data_ = pd.DataFrame(data_,columns=cols)
In [162]:
data_.head(4)
Out[162]:
Feature_1 Feature_2 Feature_3 Feature_4 Feature_5 Feature_6 Feature_7 Feature_8 Feature_9 Feature_10 Target
0 1.362807 1.994342 -1.073064 -0.463248 0.315221 -0.755542 -0.254552 -1.071328 -1.842908 1.336105 155.387351
1 -1.062929 -1.899463 -2.539955 -0.889962 -1.168961 -0.254633 -0.317816 0.731145 -1.112314 0.218078 -367.287844
2 0.103143 0.037233 2.005517 0.234137 -0.609971 0.619853 2.351216 0.620457 0.297582 1.563209 266.329810
3 -1.993137 0.477895 -0.079740 -1.520746 1.919169 -0.057682 0.004939 -1.217973 -0.788276 -0.667930 -48.940628
In [151]:
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
In [159]:
X_train, X_test, y_train, y_test = train_test_split(X, y,random_state=1)
In [161]:
regr = MLPRegressor(hidden_layer_sizes=(64,128,32), random_state=1,  max_iter=500).fit(X_train, y_train)
In [163]:
print(regr.score(X_test, y_test))
0.9998915019136736
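For a regressor, `score` returns the coefficient of determination $R^2$, not an accuracy. A minimal sketch of that quantity, computed by hand on made-up numbers (the helper `r2_score_manual` and the toy arrays are illustrative):

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r2_score_manual(y_true, y_true)                       # exact predictions
baseline = r2_score_manual(y_true, np.full(4, y_true.mean()))   # always predict the mean
print(perfect, baseline)  # 1.0 0.0
```

A value close to 1, as reported above, means the predictions explain almost all of the variance in the test targets.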

Model parameters


In [165]:
print(regr.coefs_[0].shape)
print(regr.coefs_[1].shape)
print(regr.coefs_[2].shape)
print(regr.coefs_[3].shape)
(10, 64)
(64, 128)
(128, 32)
(32, 1)

Questions
