Deep Learning is a buzzword that is often abused in the data science industry, with people overestimating its capabilities more often than not. This guide will give you the most straightforward answer to what deep learning really is and when you should, and should not, be using it.
Let’s start with the most important question that you need to ask yourself as a data scientist: why deep learning?
There are numerous machine learning algorithms such as Linear Regression, K-Nearest Neighbors, Decision Trees and the like, so why do we need Deep Learning? The first reason is that Deep Learning accounts for the interactions between your input variables or factors. An interaction is simply how the effect of one input variable on the prediction depends on the values of the other input variables.
The second reason why Deep Learning is so useful is that you can use it for both classification and regression types of machine learning problems. It can also be used for unsupervised and reinforcement learning problems.
The Concept Behind Deep Learning
Deep learning uses powerful neural networks to work its magic. So how does a neural network work?
A neural network has three types of layers:
- The Input Layer: Consists of all your input variables or factors.
- The Output Layer: The layer in which the neural network makes the final prediction.
- The Hidden Layer: Captures the interactions between the input variables. A network can have one or many hidden layers.
To help you get a visual picture of the three layers in a neural network, we have illustrated them for you below:
How does this neural network work? To help you understand this as clearly as possible, let’s use the well-known human resource analytics dataset found on Kaggle: https://www.kaggle.com/ludobenistant/hr-analytics
Let’s say that we want to predict the number of years an employee will spend in the firm based on three input variables:
- Satisfaction Level: The satisfaction level of this employee is 0.8.
- Number of projects: The employee has completed 5 projects.
- Last Evaluation Score: The employee scored a 0.6 in his/her last evaluation.
Let’s feed this information into a neural network as shown below:
From the network shown above, we have fed the input details mentioned earlier in the guide into their respective boxes in the input layer. The numbers on top of the black lines are called the “weights”. The weights are assigned automatically by the neural network algorithm and are chosen so that the network produces the most accurate predictions.
You’re probably wondering: how did we get the 5.8 and 5.6 in the blue boxes within the hidden layer? To get the 5.8, we multiply the 0.8 in the input by the corresponding weight going into the upper blue box (which is 1). We then multiply the 5 in the input by its corresponding weight going into the upper blue box (also 1). Summing the two results gives us 5.8.
To get the 5.8 : (0.8 x 1) + (5 x 1) = 5.8
To get the 5.6 : (5 x 1) + (0.6 x 1) = 5.6
To get the output of 6 we follow the same procedure stated above: (5.8 x 2) + (5.6 x -1) = 6.
We predicted that this employee would stay with the firm for 6 years based on the 3 input variables while accounting for the interactions between these variables.
This method of prediction using a neural network is called Forward Propagation.
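The forward propagation walkthrough above can be sketched in a few lines of Python. The variable names here are ours; the weights are the ones from the worked example (all input-to-hidden weights are 1, and the hidden-to-output weights are 2 and -1):

```python
# Forward propagation through the tiny network from the example.
satisfaction = 0.8   # satisfaction level
projects = 5         # number of projects completed
evaluation = 0.6     # last evaluation score

# Upper hidden node: fed by satisfaction and projects (weights of 1).
hidden_upper = satisfaction * 1 + projects * 1    # 0.8 + 5 = 5.8
# Lower hidden node: fed by projects and evaluation (weights of 1).
hidden_lower = projects * 1 + evaluation * 1      # 5 + 0.6 = 5.6

# Output node: weighted sum of the hidden nodes (weights 2 and -1).
prediction = hidden_upper * 2 + hidden_lower * -1  # 11.6 - 5.6 = 6
print(round(prediction, 2))  # prints 6.0
```

In a real network the weights are learned from data rather than written by hand, but the arithmetic of a forward pass is exactly this: multiply, sum, repeat layer by layer.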
Activation functions capture the non-linearities in these interactions. They free the network from the assumption that the interactions between the input variables are always linear. Activation functions are usually applied in the hidden layers of the neural network.
There are many types of activation functions out there, such as the Tanh activation function and the ReLU (Rectified Linear Unit) activation function. In the industry, ReLU is the most commonly used.
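To make the difference concrete, here is a small illustrative sketch (the sample values are ours, not from the article) of how Tanh and ReLU transform the same numbers:

```python
import math

def tanh(x):
    # Tanh squashes any input into the range (-1, 1).
    return math.tanh(x)

def relu(x):
    # ReLU passes positive inputs through and clips negatives to zero.
    return max(0.0, x)

values = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([round(tanh(v), 3) for v in values])  # small magnitudes, bounded
print([relu(v) for v in values])            # negatives become 0.0
```

Both functions bend the straight lines of a weighted sum into curves, which is what lets the network model non-linear relationships.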
So where does this ReLU function come into the neural network and how does it work? Take a look at the image below:
The network shown above is the same one that we used to predict the number of years an employee would spend in the firm. Without the ReLU activation function we got a prediction of 6 years. With the ReLU activation function we got a prediction of 10.6 years.
The way the ReLU activation function works is simple:
ReLU(-5) = 0
ReLU(5) = 5
This means that when the weighted sum flowing from the input layer into a hidden node is negative, we assign a value of 0 to that node in the hidden layer. That is why, in the example above, we assigned a 0 instead of -5.6.
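Defining ReLU in Python takes a single line, and applying it to the -5.6 hidden value from the example shows why that node becomes 0:

```python
def relu(x):
    # Negative inputs are clipped to 0; positive inputs pass through.
    return max(0, x)

print(relu(-5))    # prints 0
print(relu(5))     # prints 5
print(relu(-5.6))  # prints 0 -- the hidden value from the example above
```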
When to NOT use Deep Learning
- When you don’t have a lot of money to spend on solving your problem
Deep learning is time-consuming and computationally very expensive. You need to try different loss functions and activation functions, and experiment with different numbers of hidden layers and different numbers of nodes per hidden layer. There are a lot of hyperparameters to tweak and a lot of architectures to build, which takes a lot of time. Deep learning is still a “black box” that is not fully interpretable as of now, and it should be attempted only when you have the budget and/or the time.
- When you have to explain your model’s parameters/feature selection to the “C-Suite”
Due to its lack of interpretability, you really have to think twice before explaining your model to the decision makers. Sometimes a direct linear relationship between the input and the output might solve your problem in a more effective way. At times, you may not need to account for the non-linear relationships between the factors. Take this into account when making predictions using a neural network. Deep learning is known for its high predictive power but low interpretability.
In conclusion, a neural network is not a one-stop solution for every problem in the world. Sometimes a better solution can be obtained through a simple linear regression or decision tree algorithm, which will save you both time and computational cost.
Stay tuned for the follow-up to this guide, in which we will help you learn how to implement a deep learning model using TensorFlow in Python.
Happy Deep Learning!