Multilayer Perceptron and Neural Network¶

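A perceptron is an algorithm for supervised learning of binary classifiers: it learns a weight vector $\mathbf{w}$ and predicts the class of an input $\mathbf{x}$ from the sign of $\mathbf{w}^{\top}\mathbf{x}$.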

Update rule¶

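$\mathbf{w}_{t+1}=\mathbf{w}_{t}+\left(1-H\left(y_{i} \mathbf{w}_{t}^{\top} \mathbf{x}_{i}\right)\right) y_{i} \mathbf{x}_{i}$

Here $H$ is the Heaviside step function with $H(0)=0$ and $y_{i} \in \{-1,+1\}$, so the weights change only when sample $i$ is misclassified, that is, when $y_{i} \mathbf{w}_{t}^{\top} \mathbf{x}_{i} \le 0$.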
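
A single update can be sketched in NumPy as follows. This is a minimal illustration, assuming $H$ is the Heaviside step function with $H(0)=0$ (so a point lying exactly on the decision boundary counts as misclassified) and labels in $\{-1,+1\}$; the helper name `perceptron_step` is hypothetical.

```python
import numpy as np

def perceptron_step(w, x_i, y_i):
    """Apply one perceptron update to w for the sample (x_i, y_i), y_i in {-1, +1}."""
    margin = y_i * np.dot(w, x_i)
    # np.heaviside(margin, 0.0) is 1 when margin > 0 and 0 when margin <= 0,
    # so the factor below is 1 only for misclassified (or boundary) samples.
    misclassified = 1.0 - np.heaviside(margin, 0.0)
    return w + misclassified * y_i * x_i

w = np.zeros(2)
x_i = np.array([1.0, 2.0])

w_after_first = perceptron_step(w, x_i, 1)               # margin is 0, so w is updated
w_after_second = perceptron_step(w_after_first, x_i, 1)  # margin is 5, so w is unchanged
print(w_after_first, w_after_second)
```

The second call leaves the weights untouched because the sample is already on the correct side of the boundary.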

import numpy as np
import pandas as pd
from sklearn import datasets

class perceptron:
    def __init__(self, lr=0.1, n_iter=200):
        self.lr = lr          # stored but not used: the update below has a fixed step size of 1
        self.n_iter = n_iter  # number of passes over the training set
        self.theta = None     # weights, with the bias as the first entry

    def fit(self, X, y):
        # Prepend a column of ones so that theta[0] acts as the bias.
        X = X.reshape(X.shape[0], -1)
        X = np.hstack((np.ones((X.shape[0], 1)), X))
        # Map labels {0, 1} to {-1, +1}.
        y = np.where(y == 0, -1, 1)
        self.theta = np.random.rand(X.shape[1])
        for _ in range(self.n_iter):
            for ind in range(X.shape[0]):
                y_hat = self.theta.dot(X[ind])
                # Update only on misclassified samples.
                if np.sign(y_hat) != y[ind]:
                    self.theta = self.theta + y[ind] * X[ind]

    def predict(self, X):
        X = X.reshape(X.shape[0], -1)
        X = np.hstack((np.ones((X.shape[0], 1)), X))
        pred = np.sign(X.dot(self.theta))
        # Map predictions back to labels {0, 1}.
        return np.where(pred == 1, 1, 0)

    def accuracy(self, pred, label):
        return np.sum(pred == label) / len(label)

class perceptron:
    # Same perceptron as above, extended with a choice of weight initialization.
    def __init__(self, lr=0.1, n_iter=200, init_param='random'):
        self.lr = lr                  # stored but not used: the update below has a fixed step size of 1
        self.n_iter = n_iter          # number of passes over the training set
        self.init_param = init_param  # 'zero' or 'random'
        self.theta = None             # weights, with the bias as the first entry

    def fit(self, X, y):
        # Prepend a column of ones so that theta[0] acts as the bias.
        X = X.reshape(X.shape[0], -1)
        X = np.hstack((np.ones((X.shape[0], 1)), X))

        # Map labels {0, 1} to {-1, +1}.
        y = np.where(y == 0, -1, 1)

        if self.init_param == 'zero':
            self.theta = np.zeros(X.shape[1])
        elif self.init_param == 'random':
            self.theta = np.random.rand(X.shape[1])
        else:
            raise ValueError("Wrong parameter initialization: init_param must be 'zero' or 'random'")

        for _ in range(self.n_iter):
            for ind in range(X.shape[0]):
                y_hat = self.theta.dot(X[ind])
                # Update only on misclassified samples.
                if np.sign(y_hat) != y[ind]:
                    self.theta = self.theta + y[ind] * X[ind]

    def predict(self, X):
        X = X.reshape(X.shape[0], -1)
        X = np.hstack((np.ones((X.shape[0], 1)), X))
        pred = np.sign(X.dot(self.theta))
        # Map predictions back to labels {0, 1}.
        return np.where(pred == 1, 1, 0)

    def accuracy(self, pred, label):
        return np.sum(pred == label) / len(label)

iris = datasets.load_iris()
X = iris.data
# Binary target: 0 for setosa (class 0), 1 for the other two species.
y = (iris.target > 0).astype(int)

data = np.hstack((X,y.reshape(-1,1)))
np.random.shuffle(data)
data = pd.DataFrame(data,columns=['Feature1','Feature2','Feature3','Feature4','Target'])
data.head(6)

model = perceptron(n_iter=300,init_param='random')
print('Model Parameters: ',model.theta)

Model Parameters:  None

model.fit(X,y)
print('Model Parameters: ',model.theta)

Model Parameters:  [-0.68541919 -1.12122172 -3.4261289   5.38552346  2.85796252]

print('Training accuracy: ',model.accuracy(model.predict(X),y))

Training accuracy:  1.0

Multi-layer Perceptron¶

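An MLP is typically represented as a composition of several functions, one per layer. For a three-layer network: $f(\boldsymbol{x})=f^{(3)}\left(f^{(2)}\left(f^{(1)}(\boldsymbol{x})\right)\right)$, where $f^{(1)}$ is the first layer, $f^{(2)}$ the second, and $f^{(3)}$ the output layer.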
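
To make the composition concrete, here is a minimal NumPy sketch of a three-layer forward pass. The layer sizes, the tanh activation, the random weights, and the names `make_layer` and `mlp_forward` are assumptions chosen for illustration; they are not taken from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(n_in, n_out, activation=None):
    """Return a function x -> activation(W x + b) with randomly drawn W and b."""
    W = rng.normal(size=(n_out, n_in))
    b = rng.normal(size=n_out)

    def layer(x):
        z = W @ x + b
        return z if activation is None else activation(z)

    return layer

f1 = make_layer(4, 8, np.tanh)   # f^(1): input -> first hidden layer
f2 = make_layer(8, 5, np.tanh)   # f^(2): first -> second hidden layer
f3 = make_layer(5, 1)            # f^(3): linear output layer

def mlp_forward(x):
    """Evaluate f(x) = f3(f2(f1(x)))."""
    return f3(f2(f1(x)))

x = np.ones(4)
out = mlp_forward(x)
print(out.shape)
```

Each layer is an ordinary function, so the whole network is just nested function calls.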

Forward pass and back-prop¶

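Let $X_1$ denote the output of the first layer (weights $W_1$) and $X_2$ the output of the second layer (weights $W_2$). The forward pass computes $X_1$ and $X_2$ from the input, and back-propagation applies the chain rule from the loss $L$ back to each weight matrix:

$\frac{\partial L}{\partial W_{2}}=\frac{\partial L}{\partial X_{2}} \frac{\partial X_2}{\partial W_{2}}$

$\frac{\partial L}{\partial W_{1}}=\frac{\partial L}{\partial X_{2}} \frac{\partial X_2}{\partial X_{1}} \frac{\partial X_1}{\partial W_{1}}$

Parameter update: $W \leftarrow W-\alpha \nabla_{W} L$, where $\alpha$ is the learning rate.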

Example¶

Assume the squared-error loss $L(X_2, Y) = \|X_2 - Y\|^2$ and two linear layers without activation, $X_1 = W_1 X$ and $X_2 = W_2 X_1$. Then:

$\frac{\partial L}{\partial X_{2}} = 2(X_2 - Y)$

$\frac{\partial X_2}{\partial W_{2}} = X_1$

$\frac{\partial X_2}{\partial X_{1}} = W_2$

$\frac{\partial X_1}{\partial W_{1}} = X$

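$\nabla_{W_2} L = 2(X_2 - Y)X_1$ and $\nabla_{W_1} L = 2(X_2 - Y)W_2X$ (written in scalar form; for matrices the factors are ordered and transposed so that the shapes agree).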
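
These gradients can be checked numerically. The sketch below assumes the two linear layers $X_1 = W_1 X$ and $X_2 = W_2 X_1$ implied by the derivatives in this example, and writes the gradients in matrix form, where the factor order and transposes follow from the shapes: $\nabla_{W_2} L = 2(X_2 - Y)X_1^{\top}$ and $\nabla_{W_1} L = W_2^{\top}\, 2(X_2 - Y)X^{\top}$. The array shapes, the value of $\alpha$, and the helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 1))    # input column vector
Y = rng.normal(size=(2, 1))    # target column vector
W1 = rng.normal(size=(4, 3))   # first-layer weights
W2 = rng.normal(size=(2, 4))   # second-layer weights

def loss(W1, W2):
    """Squared-error loss ||X2 - Y||^2 of the two-layer linear network."""
    X2 = W2 @ (W1 @ X)
    return float(np.sum((X2 - Y) ** 2))

def gradients(W1, W2):
    """Analytic gradients obtained from the chain rule."""
    X1 = W1 @ X
    X2 = W2 @ X1
    dL_dX2 = 2.0 * (X2 - Y)
    grad_W2 = dL_dX2 @ X1.T
    grad_W1 = (W2.T @ dL_dX2) @ X.T
    return grad_W1, grad_W2

def numerical_grad(f, W, eps=1e-6):
    """Central finite-difference estimate of df/dW, one entry at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (f(W_plus) - f(W_minus)) / (2 * eps)
    return grad

grad_W1, grad_W2 = gradients(W1, W2)
num_W1 = numerical_grad(lambda W: loss(W, W2), W1)
num_W2 = numerical_grad(lambda W: loss(W1, W), W2)
print(np.allclose(grad_W1, num_W1, atol=1e-4), np.allclose(grad_W2, num_W2, atol=1e-4))

# One gradient-descent step W <- W - alpha * grad; alpha is an arbitrary small value.
alpha = 1e-3
W1_new, W2_new = W1 - alpha * grad_W1, W2 - alpha * grad_W2
```

Agreement between the analytic and finite-difference gradients is a quick sanity check before trusting a hand-derived backward pass.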

MLP in scikit-learn¶

scikit-learn neural network models

MLP Classifier

MLP Regressor

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('./Dataset_spine.csv')

df = df.drop(['Unnamed: 13'], axis=1)

Given data for classification task¶

df.head(4)

df.describe()

df = df.drop(['Col7','Col8','Col9','Col10','Col11','Col12'], axis=1)

Data after preprocessing and feature selection¶

df.head(4)

MLP Classifier¶

from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

y = df['Class_att']
x = df.drop(['Class_att'], axis=1)

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size= 0.25, random_state=27)

clf = MLPClassifier(hidden_layer_sizes=(64, 128, 32), max_iter=300, alpha=0.0001, solver='sgd',
                    verbose=10, random_state=21, tol=1e-9)

clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)
print(accuracy_score(y_test, y_pred))

0.7564102564102564

Model parameters and the target¶

print(clf.coefs_[0].shape)
print(clf.coefs_[1].shape)
print(clf.coefs_[2].shape)
print(clf.coefs_[3].shape)
print(clf.classes_)

(6, 64)
(64, 128)
(128, 32)
(32, 1)
['Abnormal' 'Normal']
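
The shapes printed above follow directly from the layer sizes: each weight matrix has shape (units in one layer, units in the next layer). A small sketch with the hypothetical helper `expected_coef_shapes` reproduces them without training a model. A single output unit is assumed here because the task is binary, with the two classes 'Abnormal' and 'Normal'.

```python
def expected_coef_shapes(n_features, hidden_layer_sizes, n_outputs):
    """List the weight-matrix shapes of a fully connected network."""
    sizes = [n_features, *hidden_layer_sizes, n_outputs]
    return [(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

shapes = expected_coef_shapes(6, (64, 128, 32), 1)
print(shapes)  # [(6, 64), (64, 128), (128, 32), (32, 1)]
```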

MLP Regressor¶

from sklearn.datasets import make_regression
X, y = make_regression(n_samples=2000,n_features=10, random_state=1)

data_ = np.hstack((X,y.reshape(-1,1)))
np.random.shuffle(data_)
cols = ['Feature_'+str(i) for i in range(1, 11)]+['Target']
data_ = pd.DataFrame(data_,columns=cols)

data_.head(4)

from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y,random_state=1)

regr = MLPRegressor(hidden_layer_sizes=(64,128,32), random_state=1,  max_iter=500).fit(X_train, y_train)

print(regr.score(X_test, y_test))

0.9998915019136736
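
For regressors such as `MLPRegressor`, `score` returns the coefficient of determination $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, so a value close to 1 means the predictions account for almost all of the variance in the test targets. A minimal NumPy sketch of that formula follows; the helper name `r2` and the toy arrays are illustrative assumptions.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2(y_true, y_true)                  # exact predictions
baseline = r2(y_true, [2.5, 2.5, 2.5, 2.5])   # always predicting the mean
print(perfect, baseline)
```

Exact predictions give a score of 1, while always predicting the mean of the targets gives 0.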

Model parameters¶

print(regr.coefs_[0].shape)
print(regr.coefs_[1].shape)
print(regr.coefs_[2].shape)
print(regr.coefs_[3].shape)

(10, 64)
(64, 128)
(128, 32)
(32, 1)

Output of data.head(6) (shuffled Iris data):
	Feature1	Feature2	Feature3	Feature4	Target
0	6.3	2.9	5.6	1.8	1.0
1	7.4	2.8	6.1	1.9	1.0
2	5.4	3.9	1.7	0.4	0.0
3	6.9	3.1	4.9	1.5	1.0
4	6.1	2.9	4.7	1.4	1.0
5	4.7	3.2	1.3	0.2	0.0

Output of df.head(4) (spine data before feature selection):
	Col1	Col2	Col3	Col4	Col5	Col6	Col7	Col8	Col9	Col10	Col11	Col12	Class_att
0	63.027818	22.552586	39.609117	40.475232	98.672917	-0.254400	0.744503	12.5661	14.5386	15.30468	-28.658501	43.5123	Abnormal
1	39.056951	10.060991	25.015378	28.995960	114.405425	4.564259	0.415186	12.8874	17.5323	16.78486	-25.530607	16.1102	Abnormal
2	68.832021	22.218482	50.092194	46.613539	105.985135	-3.530317	0.474889	26.8343	17.4861	16.65897	-29.031888	19.2221	Abnormal
3	69.297008	24.652878	44.311238	44.644130	101.868495	11.211523	0.369345	23.5603	12.7074	11.42447	-30.470246	18.8329	Abnormal

Output of df.describe():
	Col1	Col2	Col3	Col4	Col5	Col6	Col7	Col8	Col9	Col10	Col11	Col12
count	310.000000	310.000000	310.000000	310.000000	310.000000	310.000000	310.000000	310.000000	310.000000	310.000000	310.000000	310.000000
mean	60.496653	17.542822	51.930930	42.953831	117.920655	26.296694	0.472979	21.321526	13.064511	11.933317	-14.053139	25.645981
std	17.236520	10.008330	18.554064	13.423102	13.317377	37.559027	0.285787	8.639423	3.399713	2.893265	12.225582	10.450558
min	26.147921	-6.554948	14.000000	13.366931	70.082575	-11.058179	0.003220	7.027000	7.037800	7.030600	-35.287375	7.007900
25%	46.430294	10.667069	37.000000	33.347122	110.709196	1.603727	0.224367	13.054400	10.417800	9.541140	-24.289522	17.189075
50%	58.691038	16.357689	49.562398	42.404912	118.268178	11.767934	0.475989	21.907150	12.938450	11.953835	-14.622856	24.931950
75%	72.877696	22.120395	63.000000	52.695888	125.467674	41.287352	0.704846	28.954075	15.889525	14.371810	-3.497094	33.979600
max	129.834041	49.431864	125.742385	121.429566	163.071041	418.543082	0.998827	36.743900	19.324000	16.821080	6.972071	44.341200

Output of df.head(4) after dropping Col7 to Col12:
	Col1	Col2	Col3	Col4	Col5	Col6	Class_att
0	63.027818	22.552586	39.609117	40.475232	98.672917	-0.254400	Abnormal
1	39.056951	10.060991	25.015378	28.995960	114.405425	4.564259	Abnormal
2	68.832021	22.218482	50.092194	46.613539	105.985135	-3.530317	Abnormal
3	69.297008	24.652878	44.311238	44.644130	101.868495	11.211523	Abnormal

Output of data_.head(4) (synthetic regression data):
	Feature_1	Feature_2	Feature_3	Feature_4	Feature_5	Feature_6	Feature_7	Feature_8	Feature_9	Feature_10	Target
0	1.362807	1.994342	-1.073064	-0.463248	0.315221	-0.755542	-0.254552	-1.071328	-1.842908	1.336105	155.387351
1	-1.062929	-1.899463	-2.539955	-0.889962	-1.168961	-0.254633	-0.317816	0.731145	-1.112314	0.218078	-367.287844
2	0.103143	0.037233	2.005517	0.234137	-0.609971	0.619853	2.351216	0.620457	0.297582	1.563209	266.329810
3	-1.993137	0.477895	-0.079740	-1.520746	1.919169	-0.057682	0.004939	-1.217973	-0.788276	-0.667930	-48.940628
