A graph is a descriptive way of representing the relationship between different entities. Meanwhile, an entity - also known as a node - represents the depicted individuals or factors that correlate on the graph. The entire representation of relationships between various nodes connected by links in a graph is studied and used to extract information by performing various analyses. It’s what is referred to as graph analytics. Venturing deep into the same includes fundamental concepts that focus on understanding the graph from different perspectives. Centrality is one such concept and this is what the article will focus on.
Centrality is a crucial concept in graph analytics that deals with distinguishing important nodes in a graph. Simply put, it recognizes nodes that are important or central among the whole list of other nodes in a graph.
With so many angles that can define the importance of every node, various metrics are taken into consideration to study each node from a different perspective. Centrality offers all the relevant analytical information regarding that node and the graph from which the conclusion can be derived.
The different perspectives of a particular node are studied under different indices, which are collectively known as centrality measures. They help in gaining a better grasp of the network and cutting through the chaos while extracting information from a network. Since importance is defined differently by each measure, it's vital to understand them all to decide on the best one for graph visualization applications.
Before diving into degree centrality, here’s a little refresher on the degree of a node in a graph.
There are two kinds of graphs – directed and non-directed. The degree of a node in a non-directed graph is defined as the number of links a node has with other nodes. In a directed graph, the degree is subdivided into in and out degrees. In-degree refers to the links incident on the node while out-degree is the number of nodes directed at other nodes from a particular node.
Degree centrality defines the importance of a node based on the degree of that node. The higher the degree, the more crucial it becomes in the graph. It’s used to find popular individuals, the most connected individuals, individuals who connect quickly in a wider network, or the ones that hold the most information.
It’s one of the simplest of all the centrality measures of node connectivity, and is used in transactional data, account activity, etc.
Closeness centrality identifies a node's importance based on how close it is to all the other nodes in the graph. The closeness is also known as geodesic distance (GD), which is the number of links included in the shortest path between two nodes.
To calculate that closeness or GD for a node, sum up all the GD amidst that and all the other nodes in the network on the graph.
Closeness centrality finds application in identifying individuals that are in a position to influence the entire network in the fastest way possible. It’s also used to identify influences outside a highly connected network. Individuals with a high score of this centrality can acquire and control crucial information within the organization. A third application is to predict the importance of words in a particular document on the basis of a graph-based keyphrase extraction process.
This centrality is a type of closeness centrality. As mentioned, GD is measured between nodes. The harmonic centrality measures give a more accurate measure of closeness in a case when some of the nodes are outside the perimeter of reach.
Betweenness centrality defines the importance of any node based on the number of times it occurs in the shortest paths between other nodes. It measures the percent of the shortest path in a network and where a particular node lies in it.
A node with high betweenness centrality is considered the most influential one over other nodes in the network. This is because this measure can provide insights into the most critical path as disrupting them will disrupt the network.
Betweenness centrality is used to analyze global terrorism networks. Anti-terrorism agencies utilize the information from these measures to detect and eliminate possible threats. It’s also used to measure the network flow in telecommunication networks or e-commerce package delivery processes. In addition, microbloggers use this centrality to enhance their reach on Twitter with the assistance of a recommendation engine. It shows who they should connect with to form a bigger communication network.
Eigenvector centrality defines a node's importance based on the function of its neighboring nodes. For instance, consider a node in a network. Check all the nodes it’s connected to. If a node is linked to or surrounded by highly important nodes in a network, it ought to have a high eigenvector centrality score. It’s what makes it an important part of the whole network.
Relationships with high scoring nodes have more contribution to the score of a node than connections to nodes with low eigenvector centrality scores.
This centrality identifies nodes that influence the whole network, not only those that are linked. A high score of a node will automatically mean that its neighboring connected nodes have high scores too. It has a wide range of applications, one of which is the calculation of PageRank used by Google. It also finds use in understanding human social networks, malware propagation, etc.
A subvariant in the eigenvector centrality is PageRank. It’s defined as the measure of directional influence of nodes and, thus, is most suited for directed graphs.
Since eigenvector centrality is suitable for undirected graphs, there wasn’t one for directed graphs before PageRank entered the picture. It has applications in online platforms like Twitter, which uses this centrality to offer recommendations of other accounts that a user can follow.
PageRank is also used to detect flaws in the fraud detection system utilized in the insurance and healthcare industries. The algorithm even predicts traffic workflows in public spaces and streets by running over a graph that includes road intersections.
ArticleRank is a subvariant of PageRank and gives a measure of the transitive influence of nodes. However, it contradicts the assumption of the nodes’ relationship which narrates that relationships with low out degree nodes are important compared to relationships from higher out degree nodes. Now that all the centrality measures have been explained, you can compare nodes in your network based on various perspectives. With massive nodes in millions of networks worldwide, the centrality measures are tools that offer strong conclusions.
Tell us the skills you need and we'll find the best developer for you in days, not weeks.