
The Mathematical Formulation of Feed-Forward Neural Networks


This article describes the mathematics and statistics needed to understand neural networks. The term 'neural network' has its roots in the biological sciences: in the human brain, information is processed by neurons, and the network they form is vast and complicated. This article does not deal with biological neurons, but with their artificial counterparts, which help data scientists process large amounts of data efficiently. The image below gives an idea of the similarities and differences between biological and artificial neurons:

[Figure: comparison of a biological neuron and an artificial neuron]

First, it is important to discuss linear models. Linear models for regression and classification are based on linear combinations of fixed non-linear basis functions φ_j(x), and take the form given below:

y(x, w) = f\left( \sum_{j=1}^{M} w_j \, \phi_j(x) \right)

In the above equation, f(·) is a non-linear activation function: a sigmoid function for classification, and the identity function for regression. This idea can be extended to neural network models by making the basis functions φ_j(x) themselves depend on parameters that are adjusted during training, along with the coefficients w_j. In a neural network, each basis function is a non-linear function of a linear combination of the inputs, and the coefficients of that linear combination are adjustable parameters. This is the basis of the basic neural network model.
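As a minimal illustration of the linear-model form above (not from the original article), the following NumPy sketch evaluates y(x, w) = f(Σ_j w_j φ_j(x)); the polynomial basis functions and the coefficient values are assumed purely for demonstration:

```python
import numpy as np

def linear_model(x, w, f=lambda a: a):
    """Evaluate y(x, w) = f(sum_j w_j * phi_j(x)).

    phi_j(x) = x**j is an assumed choice of fixed basis function;
    f defaults to the identity (regression); pass a sigmoid for classification.
    """
    phi = np.array([x**j for j in range(len(w))])  # fixed basis functions phi_j(x)
    return f(np.dot(w, phi))                       # linear combination, then activation

# Example: a cubic model evaluated at x = 2.0 with hypothetical coefficients w_0..w_3
w = np.array([0.5, -1.0, 0.3, 0.1])
print(linear_model(2.0, w))
```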

To construct the network, first form M linear combinations of the input variables x_1, x_2, ..., x_D, as given below:

a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}

Here j = 1, ..., M, and the superscript (1) indicates that the parameters belong to the first layer of the network. The parameters w_{ji}^{(1)} and w_{j0}^{(1)} are the weights and biases respectively, and the quantities a_j are known as activations. Each activation is then transformed using a differentiable non-linear activation function h(·), which can be expressed as below:

z_j = h(a_j)

The quantities z_j are the outputs of the first hidden layer. The activation function h(·) is commonly chosen to be a sigmoidal function or the tanh function. This is how the first hidden layer of the neural network is formulated. Not every data scientist needs to know these equations to write a program, but it is always good to know them: Python libraries such as PyTorch, Keras, and TensorFlow already implement these models and make a data scientist's work more efficient and easier.
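As a hedged sketch of how such a model is typically expressed with one of those libraries, here is a minimal PyTorch version of a two-layer network; the layer sizes D, M, and K are hypothetical, and tanh is one of the activation choices mentioned above:

```python
import torch
import torch.nn as nn

D, M, K = 4, 8, 2  # hypothetical input, hidden, and output dimensions

# nn.Linear computes a_j = sum_i w_ji * x_i + w_j0, matching the equation above
model = nn.Sequential(
    nn.Linear(D, M),   # first layer: activations a_j
    nn.Tanh(),         # hidden activation h(.)
    nn.Linear(M, K),   # second layer: output activations a_k
    nn.Sigmoid(),      # output activation for classification (identity for regression)
)

x = torch.randn(1, D)  # a single random input vector
print(model(x))
```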

The values z_j are then used as inputs to the second layer of the network. The mathematical formulation of this second layer is similar to that of the first layer: activations are again computed as linear combinations, as described below:

a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}, \qquad k = 1, \dots, K

This has the same form as the first layer; the only difference is that the inputs to the second layer are the outputs of the first layer, and k = 1, ..., K indexes the K outputs. The output-unit activations a_k are then transformed using an appropriate output activation function, whose choice depends on the type of data and problem. For a standard regression problem, the output activation is the identity function, i.e. y_k = a_k; for a binary classification problem it is the logistic sigmoid, i.e. y_k = σ(a_k), where

\sigma(a) = \frac{1}{1 + \exp(-a)}
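The output activations just discussed, the identity for regression, the logistic sigmoid for binary classification, and the softmax mentioned next for multiclass problems, can be written down in a few lines of NumPy (a sketch, not part of the original article):

```python
import numpy as np

def identity(a):
    """Output activation for regression: y_k = a_k."""
    return a

def sigmoid(a):
    """Logistic sigmoid for binary classification: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    """Softmax for multiclass problems; subtracting max(a) keeps exp() numerically stable."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([2.0, -1.0, 0.5])  # hypothetical output-unit activations a_k
print(sigmoid(a), softmax(a))
```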

For multiclass problems, the softmax function is used as the output activation. Combining the above equations, the overall network function takes the following form:

y_k(x, w) = \sigma\left( \sum_{j=1}^{M} w_{kj}^{(2)} \, h\!\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_{k0}^{(2)} \right)

Hence, the neural network is simply a non-linear function from a set of input variables {x_i} to a set of output variables {y_k}, controlled by a vector w of adjustable parameters. Evaluating the final equation above can be regarded as the forward propagation of information through the network, which can be visualized as follows:

[Figure: diagram of a two-layer feed-forward neural network]
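To make the forward propagation concrete, here is a minimal NumPy sketch of the final equation above, with randomly initialized weights standing in for trained parameters; the dimensions D, M, and K are assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 4, 8, 2  # hypothetical input, hidden, and output dimensions

# Randomly initialized parameters; in practice these are learned during training.
W1, b1 = rng.standard_normal((M, D)), rng.standard_normal(M)  # w_ji^(1), w_j0^(1)
W2, b2 = rng.standard_normal((K, M)), rng.standard_normal(K)  # w_kj^(2), w_k0^(2)

def forward(x):
    """Forward propagation: y = sigma(W2 @ h(W1 @ x + b1) + b2) with h = tanh."""
    a1 = W1 @ x + b1                   # first-layer activations a_j
    z = np.tanh(a1)                    # hidden-unit outputs z_j = h(a_j)
    a2 = W2 @ z + b2                   # output-unit activations a_k
    return 1.0 / (1.0 + np.exp(-a2))   # sigmoid output activation y_k

x = rng.standard_normal(D)  # a single input vector
print(forward(x))
```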

The neural network consists of two stages of processing, each of which resembles the perceptron model; because of this, such networks are also known as multilayer perceptron (MLP) models, although the details of the perceptron are beyond the scope of this article. There is, however, a significant difference between the two models: the neural network uses continuous sigmoidal non-linearities in its hidden units, whereas the perceptron uses step-function non-linearities.
For instance, if the activation functions of all the hidden units are taken to be linear, then there always exists an equivalent network without hidden units. This follows from the fact that the composition of successive linear transformations is itself a linear transformation.
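This collapse can be checked numerically: with a linear (identity) hidden activation, a two-layer network reduces to a single linear map, as the following sketch with random, hypothetical weights illustrates:

```python
import numpy as np

rng = np.random.default_rng(1)
D, M, K = 4, 8, 2  # hypothetical dimensions

W1, b1 = rng.standard_normal((M, D)), rng.standard_normal(M)
W2, b2 = rng.standard_normal((K, M)), rng.standard_normal(K)

x = rng.standard_normal(D)

# Two-layer network with a *linear* hidden activation h(a) = a
y_two_layer = W2 @ (W1 @ x + b1) + b2

# Equivalent single linear transformation with composed weights and bias
W, b = W2 @ W1, W2 @ b1 + b2
y_single = W @ x + b

print(np.allclose(y_two_layer, y_single))  # True: the hidden layer adds nothing
```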

Feed-forward neural networks have been widely studied and found to be very general; in this sense, neural networks are said to be universal approximators. One property of feed-forward networks is that multiple distinct choices of the weight vector w can give rise to the same mapping from the input set {x} to the output set {y}. This can be explained with an example. Consider the two-layer architecture shown in the image above, with M hidden units using the tanh activation function and full connectivity between the layers. If the signs of all the weights and the bias feeding into a particular hidden unit are flipped, then, for a given input vector, the sign of that unit's activation is reversed, because tanh is an odd function: tanh(-a) = -tanh(a). This change can be exactly compensated by also flipping the signs of the weights leading out of that hidden unit, so the input-to-output mapping of the network remains unaltered, and the two different weight vectors produce the same output. There is one such sign-flip symmetry for each of the M hidden units, so any given weight vector is one of a set of 2^M equivalent weight vectors.
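The sign-flip symmetry described above can also be verified numerically. In the sketch below (random, hypothetical weights; tanh hidden units; linear outputs), the weights and bias feeding into one hidden unit are negated, the weights leading out of it are negated to compensate, and the network output is unchanged:

```python
import numpy as np

rng = np.random.default_rng(2)
D, M, K = 4, 8, 2  # hypothetical dimensions

W1, b1 = rng.standard_normal((M, D)), rng.standard_normal(M)
W2, b2 = rng.standard_normal((K, M)), rng.standard_normal(K)

def forward(W1, b1, W2, b2, x):
    """Two-layer network with tanh hidden units and linear outputs."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

x = rng.standard_normal(D)
y_original = forward(W1, b1, W2, b2, x)

# Flip the signs of the weights and bias feeding into hidden unit j,
# and compensate by flipping the weights leading out of that unit.
j = 3
W1f, b1f, W2f = W1.copy(), b1.copy(), W2.copy()
W1f[j, :] *= -1.0
b1f[j] *= -1.0
W2f[:, j] *= -1.0

y_flipped = forward(W1f, b1f, W2f, b2, x)
print(np.allclose(y_original, y_flipped))  # True: the mapping is unaltered
```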

This article covered the mathematical formulation of feed-forward neural networks. There are further steps involved once a network has been designed, which will be covered in future articles. Stay tuned for more articles on Machine Learning and Deep Learning!
