Connectivity in Adjacency Matrix Representations
I won't go into the exact definition of adjacency matrices – it is a way to represent directed and undirected graphs (the edges \(E\) of some graph \(G\)).
It is a straightforward representation, but while I was messing around with it, I found some interesting properties.
Take the following adjacency matrix representation of a 6-node graph for example (note that \(a_{i,j}\) represents an edge \(j \longrightarrow i\). There is a reason why this is backwards; we will soon see.) :
$$A= \left[
\begin{array}{c|cccccc}
&a&b&c&d&e&f\\\hline
a & 0 & 1 & 0 & 0 & 0 & 0 \\
b & 1 & 0 & 0 & 0 & 0 & 0 \\
c & 0 & 1 & 0 & 0 & 0 & 0 \\
d & 0 & 0 & 1 & 0 & 0 & 0 \\
e & 0 & 0 & 0 & 1 & 0 & 0 \\
f & 0 & 0 & 1 & 0 & 1 & 0 \\
\end{array}
\right]$$
How would we determine the connectivity of this matrix (the reachability between pairs of nodes)?
Interesting enough, with the way we set up this matrix, we can view it a linear operator that can evaluate whether certain nodes are reachable.
Let's create some mask \(m\) defined as the following
$$m = \begin{bmatrix}0\\1\\0\\0\\0\\0\end{bmatrix}$$
What this mask is trying to express is that we are only concerned with the connectivity from the second node, namely \(b\). Now, if we operate on this mask, we would obtain
$$A m = \begin{bmatrix}1\\0\\1\\0\\0\\0\end{bmatrix} \implies \left[\begin{array}{c|c}1&a\\0&b\\1&c\\0&d\\0&e\\0&f\end{array}\right]$$
This result then represents our mask's connectivity (node \(b\)) to all other nodes. We can see the augmented vector on the right, where it shows that node \(a\) and \(c\) have an inwards edge from \(b\).
This is an interesting lens to look at the matrix representation, but this is nothing new. We already know this information by looking at the second column of the matrix. What makes everything interesting is when we apply the operator a second time.
Notice when we applied the operator the first time, we are looking at the edges themselves as found in the original matrix – they are essentially paths of length 1. If we apply the operator another time, we are looking at paths of length 2, since we are using connectivity information for paths of length 1. Let us still use \(b\) as the starting point for our example
$$A^2 m = \begin{bmatrix}0\\1\\0\\1\\0\\1\end{bmatrix} \implies \left[\begin{array}{c|c}0&a\\1&b\\0&c\\1&d\\0&e\\1&f\end{array}\right]$$
Sure enough, these numbers represent the reachability from \(b\) with paths of length 2. This might be hard to visualize with just a matrix, but if we draw out the graph and look at it, it becomes a lot clearer.
We can see three reachable nodes with path of length 2:
- \(b \to a \to b\)
- \(b \to c \to d\)
- \(b \to c \to f\)
And on the vector \(A^2m\) we can certainly see them reflected with 1s at \(b\), \(d\), and \(f\).
Now, since we can view the connectivity from a single mask (which is essentially extracting one column from the matrix, can't we do it will all the columns? Indeed, if we calculate the operator itself raised to some power \(n\), \(A^n\) it will show us the reachability information between nodes with a path of length \(n\). As an example, \(A^2\) looks like this
$$A^2= \left[\begin{array}{c|cccccc}
&a&b&c&d&e&f\\\hline
a & 1 & 0 & 0 & 0 & 0 & 0 \\
b & 0 & 1 & 0 & 0 & 0 & 0 \\
c & 1 & 0 & 0 & 0 & 0 & 0 \\
d & 0 & 1 & 0 & 0 & 0 & 0 \\
e & 0 & 0 & 1 & 0 & 0 & 0 \\
f & 0 & 1 & 0 & 1 & 0 & 0 \\
\end{array}\right]$$
Similarly, if we want to look at the reachability with paths of length 6, we raise it to the 6th power. We see that there are only viable paths from \(a\) and \(b\). This is because these two nodes are cyclic while all other nodes are not. Since we only have 6 nodes, length 6 paths can only be achieved through this cyclic loop.
$$A^6= \left[\begin{array}{c|cccccc}
&a&b&c&d&e&f\\\hline
a & 1 & 0 & 0 & 0 & 0 & 0 \\
b & 0 & 1 & 0 & 0 & 0 & 0 \\
c & 1 & 0 & 0 & 0 & 0 & 0 \\
d & 0 & 1 & 0 & 0 & 0 & 0 \\
e & 1 & 0 & 0 & 0 & 0 & 0 \\
f & 0 & 2 & 0 & 0 & 0 & 0 \\
\end{array}\right]$$
Now, what if we want to check the connectivity of the entire graph (weakly connected, connected, or strongly connected)? We can still apply this method. This time, however, since connectivity could be with paths of varying lengths, we want to consider all possibilities. Since we have 6 nodes, we know that the maximum number of edges that we can possibly need to traverse to another node (assuming it is possible at all) is at most 6 (the case when we traverse through all nodes and return to our original node). If we then add up the matrices for each of these cases, we would be able to find out which nodes have a path between them (note that the path we are referring to doesn't require all distinct edges. It is also known as a walk, simple path, or trial.). Let's call this connectivity matrix for paths with length \(\leq n\) the matrix \(C_n\).
This matrix is thus defined as
$$C_n = \sum_{i = 1}^n A^i$$
For our case \(n = 6\), we find
$$\sum_{i = 1}^6 A^i = \left[\begin{array}{c|cccccc}
&a&b&c&d&e&f\\\hline
a & 3 & 3 & 0 & 0 & 0 & 0 \\
b & 3 & 3 & 0 & 0 & 0 & 0 \\
c & 3 & 3 & 0 & 0 & 0 & 0 \\
d & 2 & 3 & 1 & 0 & 0 & 0 \\
e & 2 & 2 & 1 & 1 & 0 & 0 \\
f & 3 & 5 & 2 & 1 & 1 & 0 \\
\end{array}\right]$$
We see that most of our non-zero entries are below the diagonal. This means we can only reach nodes with labels that appear later in the alphabetical order than the current node. This is with the exception of \(a \to a\), \(b \to a\), and \(b \to b\) of course. This also means that the graph is connected!
$$C_6 + {C_6}^T = \left[\begin{array}{c|cccccc}
&a&b&c&d&e&f\\\hline
a & 6 & 6 & 3 & 2 & 2 & 3 \\
b & 6 & 6 & 3 & 3 & 2 & 5 \\
c & 3 & 3 & 0 & 1 & 1 & 2 \\
d & 2 & 3 & 1 & 0 & 1 & 1 \\
e & 2 & 2 & 1 & 1 & 0 & 1 \\
f & 3 & 5 & 2 & 1 & 1 & 0 \\
\end{array}\right]$$
To clarify, the requirement for the graph to be connected is not that all entries under the diagonal are non-zero, but instead \(C_n + {C_n}^T\) should have non-zero entries everywhere except for the diagonal (which is true in our case of \(n = 6\)). This is the case because of the definition of being "connected"
Now how about strongly connected? This is easy to derive, we only have to change up our previous condition slightly.
Let's test this out by simply adding an edge \(f \to a\) to our original adjacency matrix – this should make the graph strongly connected.
The new matrix \(A'\) should look like
$$A'= \left[
\begin{array}{c|cccccc}
&a&b&c&d&e&f\\\hline
a & 0 & 1 & 0 & 0 & 0 & 1 \\
b & 1 & 0 & 0 & 0 & 0 & 0 \\
c & 0 & 1 & 0 & 0 & 0 & 0 \\
d & 0 & 0 & 1 & 0 & 0 & 0 \\
e & 0 & 0 & 0 & 1 & 0 & 0 \\
f & 0 & 0 & 1 & 0 & 1 & 0 \\
\end{array}
\right]$$
Now let's find \(C_6\) of \(A'\)
$$C_6 = \left[\begin{array}{c|cccccc}
&a&b&c&d&e&f\\\hline
a & 7 & 7 & 6 & 2 & 4 & 4 \\
b & 4 & 7 & 3 & 2 & 2 & 4 \\
c & 4 & 4 & 3 & 1 & 2 & 2 \\
d & 2 & 4 & 2 & 1 & 1 & 2 \\
e & 2 & 2 & 2 & 1 & 1 & 1 \\
f & 3 & 6 & 3 & 2 & 2 & 3 \\
\end{array}\right]$$
Notice how there are no zero entries outside the diagonal (well there are no zero entries on the diagonal either), this means our definition is correct!
Now, if we want to actually implement our method here as an algorithm, it becomes rather costly. We would be looking at a complexity of \(O(n^3)\) which is enormous compared to breadth-first search, which only takes \(O(m + n)\) where \(m\) is the number of edges and \(n\) is the number of nodes.
So, I guess I'll just leave it as something interesting that reflects some qualities of BFS while being rather inefficient.