How to Use Python for Learning Vector Quantization From Scratch

Learning vector quantization (LVQ) is a prototype-based, supervised classification algorithm that can be used as an alternative to some other machine learning (ML) algorithms. While the basic algorithm isn't especially powerful, it is simple and intuitive. It also has several extensions that make it a useful tool in a range of ML tasks. In this article, we will explore learning vector quantization in detail and implement it in Python.

How does learning vector quantization work?

Learning vector quantization is similar to self-organizing maps. Each class in the dataset is represented by at least one prototype, and every prototype is a point in the feature space. A new (unseen) data point is assigned the class of the prototype closest to it. To find the closest prototype, a distance measure must be defined. The Euclidean distance is a good choice.
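As a minimal sketch of this classification rule, assuming NumPy and two made-up prototypes (the names prototypes, prototype_labels, and predict are illustrative and are not part of the implementation later in this article):

```python
import numpy as np

# Hypothetical prototypes: one point in a 2-D feature space per class.
prototypes = np.array([[1.0, 1.0],    # prototype for class "red"
                       [4.0, 4.0]])   # prototype for class "green"
prototype_labels = np.array(["red", "green"])

def predict(x, prototypes, prototype_labels):
    """Return the label of the prototype closest to x (Euclidean distance)."""
    distances = np.linalg.norm(prototypes - x, axis=1)
    return prototype_labels[np.argmin(distances)]

print(predict(np.array([1.5, 0.5]), prototypes, prototype_labels))  # red
print(predict(np.array([3.0, 3.5]), prototypes, prototype_labels))  # green
```

Prediction only needs the prototypes and their labels, not the training data.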

There is no constraint on the number of prototypes that can be used per class, as long as there is at least one prototype for each class. The picture below shows a simple learning vector quantization setup where each class (red and green) is represented by a prototype.

Learning vector quantization framework.webp

Image source: Medium.com

How do we fit the prototypes so that each one is a good representation of its class? We start by choosing a distance metric. In this example, we will use the Euclidean distance.

Note that the Euclidean distance between two vectors in N dimensions is given by:

d(x, y) = sqrt( (x1 - y1)^2 + (x2 - y2)^2 + ... + (xN - yN)^2 )

We can also use the squared Euclidean distance, which avoids computing the square root. Because the square root is monotonic, the closest prototype is the same under both measures.
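The short sketch below illustrates that point. It assumes NumPy, and the helper names euclidean and squared_euclidean, as well as the sample vectors, are chosen here purely for illustration:

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return np.sqrt(np.sum((a - b) ** 2))

def squared_euclidean(a, b):
    """Squared Euclidean distance: same ordering, no square root."""
    return np.sum((a - b) ** 2)

x = np.array([0.0, 0.0, 0.0, 1.0])
prototypes = np.array([[0.0, 0.0, 1.0, 1.0],
                       [1.0, 0.0, 0.0, 0.0]])

d = [euclidean(x, p) for p in prototypes]
d_sq = [squared_euclidean(x, p) for p in prototypes]

# Both measures pick the same closest prototype.
print(np.argmin(d), np.argmin(d_sq))
```

Both calls to argmin return the same index, so the cheaper squared distance is enough whenever only the closest prototype matters.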

The architecture of learning vector quantization

The basic architecture of learning vector quantization consists of two layers: the input layer and the output layer. The image below shows the structure of the algorithm.

Learning vector quantization algorithm.webp

Image source: GeeksforGeeks

Here, we have ‘n’ input units and ‘m’ output units. The two layers are fully interconnected, and each connection carries a weight.

learning vector quantization architecture.webp

Image source: CentOS

The math behind learning vector quantization

Let’s take a look at the mathematical concept behind LVQ.

Consider the following five input vectors and their target class.

Input vector      Target class
[ 0 0 1 1 ]       1
[ 1 0 0 0 ]       2
[ 0 0 0 1 ]       2
[ 1 1 0 0 ]       1
[ 0 1 1 0 ]       1

Each input vector has four components (x1, x2, x3, x4) and belongs to one of two target classes (1, 2).

Let’s initialize the weights based on the class. As there are two target classes, the first two vectors can be used as the weight vectors: w1 = [ 0 0 1 1 ] for class 1 and w2 = [ 1 0 0 0 ] for class 2.

Weight vectors.webp

The remaining three vectors can be used for training.

Consider the learning rate 𝛼 as 0.1.

Let’s take our first input vector (the third vector).

Input vector: [ 0 0 0 1 ]

Target class: 2

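The next step is to calculate the distance between the input vector and each weight vector. Using the squared Euclidean distance, the formula is:

D(j) = (x1 - w1j)^2 + (x2 - w2j)^2 + ... + (xn - wnj)^2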

where,

wij is the weight

xi is the input vector component.

Now, we can calculate D(1) and D(2), which are the squared distances of the input vector from the first and second weight vectors, respectively.

D(1) = (0 - 0)^2 + (0 - 0)^2 + (0 - 1)^2 + (1 - 1)^2 = 1
D(2) = (0 - 1)^2 + (0 - 0)^2 + (0 - 0)^2 + (1 - 0)^2 = 2

Here, D(1) is less than D(2), so the winner index is J = 1.

Since the target class 2 is not equal to J, the winning weight vector is moved away from the input vector:

wJ(new) = wJ(old) - 𝛼 [ x - wJ(old) ]

The updated weight vector will be:

w1 = [ 0 0 1 1 ] - 0.1 ( [ 0 0 0 1 ] - [ 0 0 1 1 ] ) = [ 0 0 1.1 1 ]

Let’s repeat the same process for the rest of the input vectors.

Input vector: [ 1 1 0 0 ]

Target class: 1

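With the updated w1 = [ 0 0 1.1 1 ] and w2 = [ 1 0 0 0 ], the squared distances are:

D(1) = (1 - 0)^2 + (1 - 0)^2 + (0 - 1.1)^2 + (0 - 1)^2 = 4.21
D(2) = (1 - 1)^2 + (1 - 0)^2 + (0 - 0)^2 + (0 - 0)^2 = 1

Here, D(2) is less than D(1), so the winner index is J = 2.

Since the target class 1 is not equal to J, the winning weight vector is again moved away from the input vector:

wJ(new) = wJ(old) - 𝛼 [ x - wJ(old) ]

The updated weight vector will be:

w2 = [ 1 0 0 0 ] - 0.1 ( [ 1 1 0 0 ] - [ 1 0 0 0 ] ) = [ 1 -0.1 0 0 ]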

Input vector: [ 0 1 1 0 ]

Target class: 1

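With w1 = [ 0 0 1.1 1 ] and w2 = [ 1 -0.1 0 0 ], the squared distances are:

D(1) = (0 - 0)^2 + (1 - 0)^2 + (1 - 1.1)^2 + (0 - 1)^2 = 2.01
D(2) = (0 - 1)^2 + (1 + 0.1)^2 + (1 - 0)^2 + (0 - 0)^2 = 3.21

Here, D(1) is less than D(2), so the winner index is J = 1.

Since the target class 1 is equal to J, the winning weight vector is moved toward the input vector:

wJ(new) = wJ(old) + 𝛼 [ x - wJ(old) ]

The updated weight vector will be:

w1 = [ 0 0 1.1 1 ] + 0.1 ( [ 0 1 1 0 ] - [ 0 0 1.1 1 ] ) = [ 0 0.1 1.09 0.9 ]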

This is the end of the first iteration, i.e., an epoch. We have updated the weights using all three training vectors. Similarly, we can run further epochs until the winning weight vector for every training vector matches its target class, or until a maximum number of epochs is reached.
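The epoch above can be reproduced with a few lines of NumPy. This is a sketch under two assumptions: w1 represents class 1 and w2 represents class 2, and the squared Euclidean distance is used to pick the winner. The variable names are illustrative.

```python
import numpy as np

alpha = 0.1
# Row 0 is w1 (assumed class 1), row 1 is w2 (assumed class 2).
W = np.array([[0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 0.0]])
weight_classes = np.array([1, 2])

# The three remaining vectors and their target classes.
training = [(np.array([0.0, 0.0, 0.0, 1.0]), 2),
            (np.array([1.0, 1.0, 0.0, 0.0]), 1),
            (np.array([0.0, 1.0, 1.0, 0.0]), 1)]

for x, target in training:
    d = np.sum((W - x) ** 2, axis=1)   # squared distance to each weight vector
    j = np.argmin(d)                   # zero-based index of the winner (J - 1)
    if weight_classes[j] == target:
        W[j] += alpha * (x - W[j])     # correct class: move toward x
    else:
        W[j] -= alpha * (x - W[j])     # wrong class: move away from x
    print("winner J =", j + 1, "weights:", W.round(2).tolist())
```

After the loop, w1 is approximately [ 0 0.1 1.09 0.9 ] and w2 is approximately [ 1 -0.1 0 0 ].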

Algorithm of learning vector quantization

Let’s take a look at a simplified view of the LVQ algorithm.

STEP 1

Initialize the weights.

STEP 2

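Loop over the training samples for the given number of epochs.

STEP 3

For each training sample, find the closest weight vector (the winner) and update it: move it toward the sample if their classes match, and away from it otherwise.

STEP 4

Repeat step 3 for all the training samples in every epoch.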

STEP 5

Predict the test examples.
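The five steps can be sketched as a compact NumPy implementation of the basic algorithm. This is an illustrative sketch, not the implementation used later in this article: the function names lvq1_fit and lvq1_predict, the default parameter values, and the toy dataset are assumptions made here, and every sample (including the ones used for initialization) is used for training.

```python
import numpy as np

def lvq1_fit(X, y, alpha=0.1, decay=0.5, epochs=10):
    """Basic LVQ with one prototype per class (first sample of each class)."""
    classes, first_idx = np.unique(y, return_index=True)
    W = X[first_idx].astype(float)                  # Step 1: initialize the weights
    for _ in range(epochs):                         # Step 2: loop over the epochs
        for x, target in zip(X, y):                 # Step 4: every training sample
            j = np.argmin(np.sum((W - x) ** 2, axis=1))   # Step 3: find the winner
            sign = 1.0 if classes[j] == target else -1.0
            W[j] += sign * alpha * (x - W[j])       # Step 3: update the winner
        alpha *= decay                              # shrink the learning rate
    return W, classes

def lvq1_predict(X, W, classes):
    """Step 5: label each row of X with the class of its nearest prototype."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

# Tiny made-up dataset: two well-separated clusters.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]])
y = np.array([0, 0, 0, 1, 1, 1])
W, classes = lvq1_fit(X, y)
print(lvq1_predict(np.array([[0.1, 0.1], [3.1, 3.0]]), W, classes))  # [0 1]
```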

How to implement learning vector quantization in Python

Implementing learning vector quantization in Python.webp

Below is the Python code to implement LVQ.

In this example, we will use the digits dataset available in sklearn. It contains 1797 images, each of which is 8x8 pixels.

Import the necessary libraries

The first step is to import the required libraries.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import minmax_scale
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score
import matplotlib.pyplot as plt
import numpy as np
import math

Load the dataset

We then load the digits dataset and print it.

digits = datasets.load_digits()
digits

Output:

Loading digits dataset.webp

Digits dataset.webp

We display one of the input images.

plt.imshow(digits.images[-2], cmap='gray_r')
plt.show()

Output:

Input image.webp

We can check which digit the image shows by printing its target.

digits.target[-2]

Output:

Printing the target.webp

Split the dataset

The next step is to split the dataset into training and testing data.

X = digits.data
Y = digits.target
X, Y

Output:

Train-test split.webp

x_train, x_test, y_train, y_test = train_test_split(X, Y, shuffle=False, test_size=0.3)

Train using LVQ

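Next, we define the training function. Its parameters are the learning rate a, the factor b by which the learning rate is multiplied after each epoch, the maximum number of epochs max_ep, the minimum learning rate min_a at which training stops, and the window parameter e. The first sample of each class serves as that class's initial prototype. Besides the basic LVQ update, the function also contains the window-based updates of the variants described later in this article.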

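def lvq_train(X, y, a, b, max_ep, min_a, e):
    # Use the first sample of each class as that class's initial prototype.
    c, train_idx = np.unique(y, return_index=True)
    r = c
    W = X[train_idx].astype(np.float64)
    # Train on the remaining samples only.
    mask = np.ones(len(X), dtype=bool)
    mask[train_idx] = False
    X = X[mask]
    y = y[mask]
    ep = 0

    while ep < max_ep and a > min_a:
        for i, x in enumerate(X):
            # Euclidean distance from x to every prototype.
            d = np.sqrt(((W - x) ** 2).sum(axis=1))
            # Indices of the closest and second-closest prototypes.
            min_1, min_2 = np.argsort(d)[:2]
            dc = float(d[min_1])
            dr = float(d[min_2])
            if c[min_1] == y[i] and c[min_1] != r[min_2]:
                # The winner has the correct class: pull it toward x.
                W[min_1] = W[min_1] + a * (x - W[min_1])
            elif c[min_1] != r[min_2] and y[i] == r[min_2]:
                # The winner is wrong and the runner-up is correct:
                # update both, but only if x lies inside the window.
                if dc != 0 and dr != 0:
                    if min((dc/dr), (dr/dc)) > (1-e) / (1+e):
                        W[min_1] = W[min_1] - a * (x - W[min_1])
                        W[min_2] = W[min_2] + a * (x - W[min_2])
            elif c[min_1] == r[min_2] and y[i] == r[min_2]:
                # Both prototypes share the class of x. This can only
                # happen with more than one prototype per class.
                W[min_1] = W[min_1] + e * a * (x - W[min_1])
                W[min_2] = W[min_2] + e * a * (x - W[min_2])
        # Decay the learning rate after every epoch.
        a = a * b
        ep += 1
    return W, c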

Test the LVQ

We then define the testing function for the data.

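def lvq_test(x, W):
    # W is the (prototypes, classes) pair returned by lvq_train.
    W, c = W
    # Euclidean distance from x to every prototype.
    d = [math.sqrt(sum((w - x) ** 2)) for w in W]
    # Predict the class of the closest prototype.
    return c[np.argmin(d)]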

Evaluate the algorithm

We train the model on the training data.

W = lvq_train(x_train, y_train, 0.2, 0.5, 100, 0.001, 0.3)
W

Output:

Training the data for learning vector quantization.webp

Training data for LVQ.webp

We then predict the class of every test sample.

predicted = []
for i in x_test:
    predicted.append(lvq_test(i, W))

We have now completed the training and testing of the data.

Let’s evaluate our model and check the accuracy.

def print_metrics(labels, preds):
    print("Precision Score: {}".format(precision_score(labels,
           preds, average = 'weighted')))
    print("Recall Score: {}".format(recall_score(labels, preds,
           average = 'weighted')))
    print("Accuracy Score: {}".format(accuracy_score(labels,
           preds)))
    print("F1 Score: {}".format(f1_score(labels, preds, average =
           'weighted')))
print_metrics(y_test, predicted)

Output:

Evaluating the model.webp

Code source

The accuracy is 87%, which is a decent score for such a simple model.

Go ahead and implement LVQ with a dataset of your choosing. You can also improve the algorithm, for example, by using several prototypes per class.
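One possible starting point for that improvement is to draw several initial prototypes per class. The helper below is a hypothetical sketch: the name init_prototypes and its parameters per_class and seed are assumptions made here. The lvq_train function above builds its own prototypes, so it would need to be adapted to accept the arrays returned by this helper.

```python
import numpy as np

def init_prototypes(X, y, per_class=3, seed=0):
    """Pick `per_class` random training samples of each class as initial prototypes."""
    rng = np.random.default_rng(seed)
    W, labels = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)              # rows that belong to this class
        chosen = rng.choice(idx, size=min(per_class, idx.size), replace=False)
        W.append(X[chosen].astype(np.float64))
        labels.extend([cls] * chosen.size)
    return np.vstack(W), np.array(labels)

# Made-up data: 20 samples, 4 features, 2 classes.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
y = np.array([0] * 10 + [1] * 10)
W, c = init_prototypes(X, y, per_class=3)
print(W.shape, c)  # (6, 4) [0 0 0 1 1 1]
```

With several prototypes per class, the two closest prototypes can share a class, which is the case handled by the last branch of lvq_train.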

Variants of learning vector quantization

The LVQ algorithm has other variants, namely LVQ2, LVQ2.1, and LVQ3. They were developed by Teuvo Kohonen.

LVQ2 algorithm

LVQ2, the second version of the algorithm, is designed to follow Bayesian decision theory more closely.

The steps for LVQ2 are the same as for LVQ, with a few differences. In the LVQ2 algorithm, the weights are updated only under the following conditions:

  • When the input vector is incorrectly classified by the winning vector.
  • When the next closest vector classifies it correctly.
  • When the input vector is close enough to the decision boundary.

Here, learning takes place only when the input vector falls within a window, which is defined as follows.

Input vector in LVQ2 algorithm.webp

Updating the weights can be done by:

Updating weights in LVQ2 algorithm.webp

Weights updating in LVQ2 algorithm.webp

where

LVQ2 algorithm Updating the weights.webp
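As a rough sketch of what a window check looks like in code, the helper below mirrors the test used in the lvq_train function earlier in this article. That form is an assumption here and may differ from the window shown in the figures above. The name in_window is hypothetical.

```python
def in_window(dc, dr, e):
    """Return True if the distances dc (closest) and dr (runner-up) are similar enough.

    Mirrors the check in lvq_train: the smaller of the two distance
    ratios must exceed (1 - e) / (1 + e).
    """
    if dc == 0 or dr == 0:
        return False          # avoid division by zero, as lvq_train does
    return min(dc / dr, dr / dc) > (1 - e) / (1 + e)

print(in_window(1.0, 1.2, 0.3))   # True: x is near the decision boundary
print(in_window(1.0, 3.0, 0.3))   # False: x is much closer to one prototype
```

A larger e widens the window, so more input vectors trigger an update.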

LVQ2.1 algorithm

LVQ2.1 is a popular variant of learning vector quantization. In LVQ2, the weights are updated only when the winning vector has a different class label from the input vector and the next closest vector has the same class label. In LVQ2.1, it does not matter which of the two closest vectors has the correct class label, as long as one of them has it and the other does not.

Here, the condition for the window within which the input vector must fall is:

Window condition for LVQ2.1 algorithm.webp

Condition for window for LVQ2.1 algorithm.webp

The weights can be updated by:

Updating weights in LVQ2.1 algorithm.webp

Weights updating in LVQ2.1 algorithm.webp
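A single LVQ2.1-style update could be sketched as follows. The function name lvq21_step and the sample values are illustrative, and the window test min(d1/d2, d2/d1) > (1 - e) / (1 + e) is assumed to match the one used in lvq_train earlier in this article.

```python
import numpy as np

def lvq21_step(W, labels, x, target, alpha, e):
    """One LVQ2.1-style update on the two prototypes closest to x (in place)."""
    d = np.sqrt(((W - x) ** 2).sum(axis=1))
    i, j = np.argsort(d)[:2]                       # closest and second closest
    # Exactly one of the two prototypes must have the class of x.
    exactly_one_correct = (labels[i] == target) != (labels[j] == target)
    if not exactly_one_correct or d[i] == 0 or d[j] == 0:
        return W
    if min(d[i] / d[j], d[j] / d[i]) > (1 - e) / (1 + e):   # x lies in the window
        good, bad = (i, j) if labels[i] == target else (j, i)
        W[good] += alpha * (x - W[good])           # pull the correct prototype closer
        W[bad] -= alpha * (x - W[bad])             # push the wrong prototype away
    return W

W = np.array([[0.0, 0.0], [2.0, 0.0]])
labels = np.array([1, 2])
lvq21_step(W, labels, x=np.array([0.9, 0.0]), target=2, alpha=0.1, e=0.3)
print(W)
```

In this example the closest prototype has the wrong class and the second closest has the right one, so the first is pushed to about [-0.09, 0] and the second is pulled to about [1.89, 0].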

LVQ3 algorithm

In LVQ3, learning is extended further to cases where the input vector, the winning vector, and the next closest vector all belong to the same class.

Here, the window condition is:

Window condition for LVQ3 algorithm.webp

The weights can be updated by:

Updating weights for LVQ3 algorithm.webp

Weights updating for LVQ3 algorithm.webp

where m, with 0.1 < m < 0.5, is a stabilizing constant.
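The same-class case can be sketched as below. This mirrors the last branch of lvq_train earlier in this article, which scales the learning rate by a constant and applies no window check in this case, so the window condition is left out of this sketch. The function name lvq3_same_class_step and the sample values are illustrative.

```python
import numpy as np

def lvq3_same_class_step(W, labels, x, target, alpha, m):
    """If the two closest prototypes both share the class of x, nudge both toward x."""
    d = ((W - x) ** 2).sum(axis=1)
    i, j = np.argsort(d)[:2]                  # closest and second closest
    if labels[i] == target and labels[j] == target:
        beta = m * alpha                      # reduced learning rate
        W[i] += beta * (x - W[i])
        W[j] += beta * (x - W[j])
    return W

# Two prototypes for class 1 and one for class 2 (made-up values).
W = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
labels = np.array([1, 1, 2])
lvq3_same_class_step(W, labels, x=np.array([0.4, 1.0]), target=1, alpha=0.1, m=0.2)
print(W)
```

Both class 1 prototypes move slightly toward the input, to about [0.008, 0.02] and [0.988, 0.02], while the class 2 prototype stays where it is.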

Pros and cons of learning vector quantization

LVQ offers several benefits. It is straightforward, intuitive, and simple to implement while yielding respectable performance. It can be considered one of the most powerful algorithms for prototype-based classification. Although popular ML approaches such as support vector machines and deep learning architectures can achieve excellent results, LVQ is a sensible alternative when low model complexity and low computational cost matter.

Note, however, that the Euclidean distance can cause problems if the data has many dimensions or is noisy. Appropriate standardization and preprocessing of the features are necessary, and dimensionality reduction may also be needed if the dataset has many dimensions.
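To illustrate why scaling matters for a distance-based method, here is a minimal NumPy sketch of min-max scaling. The helper name min_max_scale and the sample data are made up for this example. The minmax_scale function from sklearn.preprocessing, which the implementation above imports, serves the same purpose.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature (column) of X to the range [0, 1]."""
    X = np.asarray(X, dtype=np.float64)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # constant features map to 0
    return (X - lo) / span

# Made-up data: the second feature has a far larger scale than the first.
X = np.array([[0.1, 1000.0],
              [0.9, 1100.0],
              [0.2, 5000.0]])
print(min_max_scale(X))
```

Without scaling, the second feature would dominate every Euclidean distance. After scaling, both features contribute on the same [0, 1] range.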

Real-world applications of learning vector quantization

LVQ is mostly used in:

  • Fraud detection
  • Intelligent sensor systems
  • Real-time adaptive traffic signal control
  • Fault diagnosis
  • Text categorization
  • Advanced driver assistance systems
  • Multi-class classification.

In this article, we learnt the basics of the learning vector quantization algorithm, its architecture, and its workflow. We also implemented it using Python. Practice it using various datasets to get a better understanding of how it works.

Author

    Turing

Frequently Asked Questions

Learning vector quantization is a supervised learning algorithm that is mainly used for classification problems.

Learning vector quantization is a supervised learning algorithm for classifying input data using simple vector and distance calculations. Vector quantization is an unsupervised density estimator.

LVQ is related to the KNN algorithm. One of the disadvantages of KNN is that the entire training dataset must be kept and searched for every prediction. LVQ, on the other hand, lets you choose how many prototypes to keep, which reduces memory use.

One of the important advantages of learning vector quantization is the predefined model complexity that is determined by the prototypes. It uses limited memory and offers faster calculations.
