
An Introduction to Naive Bayes Algorithm for Beginners


Imagine this: you're working for a company, you’ve generated your hypothesis, cleaned your data, created relevant features, and discussed the importance of variables. Now your stakeholders want to see a baseline model within an hour. What will you do?

You have a huge number of data points and many variables in your training set. In a situation like this, the best course is to use Naive Bayes, a technique for constructing classifiers that is very fast to train compared to other classification algorithms such as logistic regression or support vector machines.
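To see how quickly such a baseline comes together, here is a minimal sketch using scikit-learn (not part of the original article); the synthetic dataset is a stand-in for your own features and labels:

    # A minimal Naive Bayes baseline with scikit-learn.
    # The synthetic dataset below is a placeholder for your own data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=10000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = GaussianNB()          # Gaussian variant, suited to continuous features
    model.fit(X_train, y_train)   # training is a single pass over the data
    print(accuracy_score(y_test, model.predict(X_test)))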

This article will cover the basics of the Naive Bayes algorithm, how it works, and everything you need to know about what goes on under the hood.

What is the Naive Bayes algorithm?

Naive Bayes is a supervised machine learning algorithm used mainly for classification. In this context, ‘supervised’ means that the algorithm is trained with both input features and categorical outputs. But why is it called naive? In basic terms, the Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature; in other words, the effect of an attribute value on a given class is independent of the values of the other attributes.

The model is easy to use and is especially useful for large datasets. Despite its simplicity, it is known to outperform even highly sophisticated classification methods.

Before getting into the nitty-gritty of the algorithm, it’s important to understand Bayes’ theorem and conditional probability, since the algorithm is built on both.

Conditional probability

Here’s an example involving a population of 12 people and their preferences for oranges and grapes.

[Figure: the 12 people in the population and their fruit preferences. Image source: Author]

4 people love only oranges, 3 love only grapes, 2 love both oranges and grapes, and the remaining 3 don’t like either fruit. Below is a contingency table for the data we have:

                             Loves grapes   Doesn't love grapes   Total
    Loves oranges                  2                  4              6
    Doesn't love oranges           3                  3              6
    Total                          5                  7             12

Now, suppose you learn that the next person you meet loves grapes. What is the probability that this person also loves oranges? In other words, what is the probability that someone loves oranges and grapes, given that you know they love grapes?

This can also be written in mathematical form:

P(loves oranges and grapes | loves grapes)

The vertical line means “given that”. You can read the expression as: what is the probability that someone likes oranges and grapes, given that they already love grapes? In statistical terms, this is called conditional probability.

Let’s calculate this probability. The contingency table already gives the probability that someone likes oranges and grapes, but without knowing for a fact that they like grapes. Since it wasn’t known that they like grapes, the denominator was the total number of people in the population, 12. Now that you know the person likes grapes, the population of interest shrinks to only those who like grapes, which is 5.

P(loves oranges and grapes | loves grapes) = (people who love both) / (people who love grapes)

Just like before, only 2 people like both oranges and grapes, so the numerator is 2. Since you already know they love grapes, the denominator is 5.

P(loves oranges and grapes | loves grapes) = 2/5 = 0.4

Before it was known that the person liked grapes, the probability was 2/12 ≈ 0.17. The probability increased from 0.17 to 0.4 once it was known that they liked grapes. Similarly, you can calculate the probability that someone doesn’t like oranges, given that you know they love grapes.
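To make the arithmetic concrete, here is a small illustrative Python check of these numbers (not from the original article):

    # Conditional probability from the fruit-preference counts above.
    total = 12          # whole population
    loves_grapes = 5    # 3 who love only grapes + 2 who love both
    loves_both = 2      # people who love both oranges and grapes

    p_both = loves_both / total                       # P(oranges and grapes) = 2/12
    p_both_given_grapes = loves_both / loves_grapes   # P(oranges and grapes | grapes) = 0.4
    print(round(p_both, 2), p_both_given_grapes)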

Note that the probability may change if additional information about the problem is provided. This is what is done with machine learning problems. You need to predict something, given that you already know something about it.

Conditional probability can also be written as:

P(A | B) = P(A and B) / P(B)

Bayes’ rule

Bayes’ theorem or Bayes’ rule is named after Thomas Bayes. It’s a mathematical rule based on statistics and probability that aims to calculate the probability of one scenario based on its relationship with another scenario.

Consider this scenario: a friend asks you to play and tells you he’s bringing someone along. There’s a 50% chance that this person is female. Your friend then texts to ask whether you remember Ariana. With this additional information, Bayes’ theorem says it is more likely that the person is female.

Historically, Bayes’ theorem has led to significant breakthroughs. It was even used to crack the Enigma code during World War II: Alan Turing, the famous British mathematician, and his team used Bayesian probability models to narrow the almost infinite number of possible decryptions down to the most likely ones, ultimately cracking the code.

Now that the basic concepts are clear, let’s understand this mathematically.

Consider that A and B are any two events. Using your understanding of conditional probability, you have:

P(A | B) = P(A and B) / P(B)
P(B | A) = P(A and B) / P(A)

Substituting P(A and B) = P(B | A) * P(A) into the first equation gives Bayes’ rule:

P(A | B) = P(B | A) * P(A) / P(B)
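As a quick illustrative check (not from the original article), Bayes’ rule can be verified with the fruit counts from the contingency table:

    # Verifying Bayes' rule with the contingency-table counts.
    p_grapes = 5 / 12
    p_oranges = 6 / 12
    p_grapes_given_oranges = 2 / 6    # people who love both / people who love oranges

    # Bayes' rule: P(oranges | grapes) = P(grapes | oranges) * P(oranges) / P(grapes)
    p_oranges_given_grapes = p_grapes_given_oranges * p_oranges / p_grapes
    print(round(p_oranges_given_grapes, 2))   # 0.4, matching the direct count 2/5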

Naive Bayes

The Naive Bayes algorithm is a classification technique based on Bayes’ theorem, with the assumption that the presence of a particular feature in a class is unrelated to the presence of any other feature. Its applications include face recognition, NLP problems, medical diagnosis, and much more.

Here’s an example: you have a dataset and want to classify whether each text is about sports or not. Suppose ‘a very good game’ is one of the texts in the dataset. Under the Naive Bayes assumption, you no longer look at the sentence as a whole but at the individual words.

For the purpose of this example, ‘a very good game’ is the same as ‘a good very game’ and ‘game a good very’. It is written as:

P(sports | a very good game) = P(a very good game | sports) * P(sports) / P(a very good game)

When there are several X variables, the expression is simplified by assuming that the Xs are mutually independent given the class, so:

P(X1, X2 | y) = P(X1 | y) * P(X2 | y)

For n features X1, ..., Xn, this generalizes to the Naive Bayes formula:

P(y | X1, ..., Xn) = P(X1 | y) * P(X2 | y) * ... * P(Xn | y) * P(y) / P(X1, ..., Xn)

which can be expressed as:

P(y | X1, ..., Xn) = P(y) * ∏ P(Xi | y) / P(X1, ..., Xn), where the product ∏ runs over i = 1, ..., n

Since the denominator is constant for any given input, it can be ignored when comparing classes. The formula finally becomes:

y = argmax over y of P(y) * ∏ P(Xi | y)
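The word-level independence assumption translates directly into code. Below is a minimal from-scratch sketch of a word-count Naive Bayes classifier (the tiny corpus and its labels are invented purely for illustration):

    import math
    from collections import Counter

    # Tiny invented corpus of (text, label) pairs.
    train = [
        ("a very good game", "sports"),
        ("a clean but forgettable game", "sports"),
        ("the election was over", "not sports"),
        ("it was a close election", "not sports"),
    ]

    counts = {"sports": Counter(), "not sports": Counter()}
    priors = Counter(label for _, label in train)
    for text, label in train:
        counts[label].update(text.split())
    vocab = {word for c in counts.values() for word in c}

    def log_score(text, label):
        # log P(label) + sum of log P(word | label), with Laplace smoothing
        total_words = sum(counts[label].values())
        score = math.log(priors[label] / len(train))
        for word in text.split():
            score += math.log((counts[label][word] + 1) / (total_words + len(vocab)))
        return score

    text = "a very close game"
    print(max(counts, key=lambda label: log_score(text, label)))   # -> "sports"

Because the model only counts words, ‘a very good game’ and ‘game a good very’ receive exactly the same score, as described above.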

Naive Bayes example

Below is the training data to which the Naive Bayes algorithm is applied:

[Table: 14 training observations with the feature Weather (Sunny, Overcast, Rainy) and the target Play (Yes/No). Image source: Author]

Step 1: Make a frequency table of the data.

    Weather     No   Yes
    Overcast     0    4
    Sunny        2    3
    Rainy        3    2
    Total        5    9

Step 2: Create a likelihood table by finding the probabilities, e.g., P(Overcast) = 4/14 ≈ 0.29.

    Weather     No   Yes   P(Weather)
    Overcast     0    4    4/14 ≈ 0.29
    Sunny        2    3    5/14 ≈ 0.36
    Rainy        3    2    5/14 ≈ 0.36
    Total        5    9

    P(No) = 5/14 ≈ 0.36, P(Yes) = 9/14 ≈ 0.64

Step 3: Use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.

Problem: Players will play if the weather is rainy. Is this statement correct?
You can solve it using the method of posterior probability discussed above.

P(Yes | Rainy) = P(Rainy | Yes) * P(Yes) / P(Rainy)
Here, P(Rainy | Yes) = 2/9 ≈ 0.22, P(Rainy) = 5/14 ≈ 0.36, and P(Yes) = 9/14 ≈ 0.64.

So P(Yes | Rainy) = 0.22 * 0.64 / 0.36 ≈ 0.39. Similarly, P(No | Rainy) = P(Rainy | No) * P(No) / P(Rainy) = 0.60 * 0.36 / 0.36 = 0.60. Since P(No | Rainy) is the higher posterior, the model predicts that players will not play when it is rainy, so the statement is incorrect.
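The same comparison in code (an illustrative sketch; the counts come from the frequency table above):

    # Posterior comparison for Weather = Rainy, using the frequency-table counts.
    yes, no = 9, 5
    rainy_yes, rainy_no = 2, 3
    total = yes + no                       # 14 observations
    p_rainy = (rainy_yes + rainy_no) / total

    # Exact fractions give 0.4 and 0.6; the rounded figures in the text give 0.39.
    p_yes_given_rainy = (rainy_yes / yes) * (yes / total) / p_rainy
    p_no_given_rainy = (rainy_no / no) * (no / total) / p_rainy
    print(round(p_yes_given_rainy, 2), round(p_no_given_rainy, 2))   # 0.4 0.6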

Naive Bayes uses a similar method to predict the probability of different classes based on various attributes. The algorithm is mostly used in NLP problems such as sentiment analysis and text classification.

Project to apply Naive Bayes

Naive Bayes is one of the most basic algorithms, yet it produces good results on textual data. If you’re a beginner, you can learn it by building an SMS spam classifier that labels each message as spam or not spam. You can also learn Streamlit, an open-source app framework for machine learning and data science teams that can be used to create attractive web apps in minutes.
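A hedged starting point for such a project (the two example messages below are invented; a real project would load a labeled dataset such as the UCI SMS Spam Collection):

    # Sketch of an SMS spam classifier: word counts + multinomial Naive Bayes.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    messages = ["WIN a FREE prize now!!!", "are we still meeting for lunch?"]
    labels = ["spam", "ham"]

    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(messages, labels)
    print(clf.predict(["free prize waiting for you"]))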


Frequently Asked Questions

The Naive Bayes classification algorithm is a probabilistic classifier built on probability models. It assumes that the features are independent of one another given the class. Because this independence assumption rarely holds exactly in practice, the model is called ‘naive’.

Bayes’ theorem provides a way of calculating conditional probability with the formula below:

P(A|B) = P(B|A) * P(A)/P(B)

In the above formula, P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the prior probability, and P(B) is the marginal probability of the evidence.

The Naive Bayes algorithm is used for solving classification problems. It is a supervised learning model often applied to text classification, which involves training on high-dimensional datasets.

In machine learning, the Naive Bayes algorithm belongs to the family of supervised learning algorithms. It is based on Bayes’ theorem and is used for classification problems such as text classification with high-dimensional training data.

The steps to follow for the (Gaussian) Naive Bayes algorithm are as below (a sketch of the density function used in step 4 follows the list):

  • Separate the data according to the class it belongs to.
  • Summarize the dataset.
  • Summarize the data per class.
  • Apply a Gaussian probability density function.
  • Predict class probabilities.
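As an illustrative sketch, assuming the per-class means and standard deviations have already been computed in the earlier steps:

    import math

    def gaussian_pdf(x, mean, std):
        # Density of a normal distribution N(mean, std^2) evaluated at x.
        exponent = math.exp(-((x - mean) ** 2) / (2 * std ** 2))
        return exponent / (math.sqrt(2 * math.pi) * std)

    print(gaussian_pdf(1.0, mean=0.0, std=1.0))   # ~0.242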

The benefits of the Naive Bayes algorithm are as below:

  • It doesn’t require a large amount of training data.
  • It can handle both discrete and continuous data.
  • It scales well with the number of predictors and data points.
  • It is fast and can be used for real-time predictions.