# A full implementation of breadth-first search of a graph

In this post I’ll share a full, standalone implementation of breadth-first search of a graph. Breadth-first search (hereafter BFS) is used in many graph algorithms. In particular it is useful for finding the shortest path (in terms of fewest edges, or “hops”) between two vertices A and B.

There are many pseudocode examples showing breadth-first search of a graph on the interwebs, and lots of textual descriptions. There are also many videos on YouTube talking about this algorithm.

Unfortunately, despite all of the above, I had a really hard time finding the complete source code of a graph traversal that I could read and study — at least, one that was at the right level of complexity for me to actually learn how it works. A lot of the code I found online or in books was too abstract and hid implementation details from me by relying on “helper” libraries which were not really explained. For example, this seemed to be the case with the Java code from Sedgewick. The underlying data structures of most algorithms’ implementations were hidden away in library code which I would have needed to study to understand what was really happening.

On the other hand, the couple of production-quality graph implementations I looked at were very hard to read for someone not very familiar with the code and the algorithm already.

I knew that in order to really learn how this algorithm works, I needed to aim somewhere in the middle. In the end, I decided to implement it myself to make sure I understood it. I’m sure that’s pretty common!

So what do I mean by a full, standalone implementation?

I mean that you can download just the code on this page, and run it on any machine where Perl is installed, and it will work. Other than the language it’s written in, there are no external libraries or other dependencies.

Without further ado, here is my take on a full implementation of breadth-first search of a graph. The batteries are included, as they say.

(Incidentally, the statement above is also true of the Scheme code from Simple Network Search in Scheme, if you’d rather look at an implementation in a non-ALGOL language. However, that code is not really “my own creation”, it’s merely ported to Scheme from a Lisp book.)

## Algorithm prose description

At a high level, the way BFS works is:

1. Push the starting vertex A onto a queue and mark it as visited.
2. While the queue is not empty, take the next vertex V from it (the first time through, V is A, since we just added it).
3. Look at each neighboring vertex N of V and take the following steps:
    1. If N has already been marked as visited, skip it and go to the next neighbor of V.
    2. Add the pair `(N, V)` to a spanning tree (or a reverse linked list). If we find the end vertex B, we will walk this spanning tree backward to find the path from B back to the start vertex A.
    3. If N is the end vertex B we’re looking for, we walk the spanning tree backward to build the path taken from B back to A, and we are done.
    4. If N is not the end vertex, we push N onto the queue to be processed.
    5. Finally, we mark N as visited.

## Pseudocode

To supplement the prose description above, here is some pseudocode for BFS. It maps pretty directly onto the concrete implementation we will get into in the next section. I did complain a little above about pseudocode, but of course I’ll subject you to mine just the same. :-} I do think this is a reasonably “complete” pseudocode that you could actually use to implement BFS, unlike many examples that I found.

```
function bfs(start, end, graph) {

    push start onto queue
    mark start as seen
    make a spanning tree for storing the path from start to end

    while queue is not empty {
        vertex = get next item from queue
        for each neighbor of vertex in the graph {
            if neighbor has been seen {
                next
            }
            add (neighbor, vertex) to the spanning tree
            if this neighbor is the end node we want {
                walk the spanning tree, printing path from start to end
                exit
            }
            else {
                push neighbor onto the queue
            }
            mark this neighbor as visited
        }
    }
    print "No path found"
    exit
}
```

## Implementation

The code below shows a direct implementation of BFS in Perl. The graph it will search is the one shown in the diagram at the top of this post. In code, the data structure for the graph is as shown below. It’s an array of arrays: the first element of each array is the vertex A, followed by an array of its neighbors. This is the classic “adjacency list” representation of a graph.

```
my $graph = [['s', ['a', 'd']],
             ['a', ['s', 'b', 'd']],
             ['b', ['a', 'c', 'e']],
             ['c', ['b']],
             ['d', ['s', 'a', 'e']],
             ['e', ['b', 'd', 'f']]];
```
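To make the shape of this structure concrete, here is a small sketch that walks the array of arrays and prints each vertex with its neighbors. The variable names `$entry`, `$vertex`, `$neighbors`, and `@lines` are my own, chosen only for illustration.

```perl
use strict;
use warnings;

my $graph = [
    [ 's', [ 'a', 'd' ] ],
    [ 'a', [ 's', 'b', 'd' ] ],
    [ 'b', [ 'a', 'c', 'e' ] ],
    [ 'c', ['b'] ],
    [ 'd', [ 's', 'a', 'e' ] ],
    [ 'e', [ 'b', 'd', 'f' ] ],
];

# Each entry is a two-element array: [ vertex, [ neighbors... ] ].
my @lines;
for my $entry (@$graph) {
    my ( $vertex, $neighbors ) = @$entry;
    push @lines, sprintf( '%s: %s', $vertex, join( ' ', @$neighbors ) );
}
print "$_\n" for @lines;
```

Note that 'f' appears only as a neighbor of 'e' and has no entry of its own, so a lookup of its neighbors finds nothing.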

Next, here is the main loop of BFS. If you know Perl, you can see that it maps pretty much 1:1 to the description in the above pseudocode. The shape of this code would be pretty much the same in another language such as Python or Ruby, with a few small syntactic differences, but hardly any semantic differences.

```
sub find_path_between {
    my ( $start, $end, $graph ) = @_;

    return () unless defined $start && defined $end;

    my @path;     # Path so far
    my @queue;    # Vertices still to visit.
    my %seen;     # Vertices already seen.
    my $found;    # Whether we have found the wanted vertex.
    my $st = {};  # Spanning tree, used to find paths.

    if ( $start eq $end ) {
        push @path, $start;
        return @path;
    }

    push @queue, $start;
    $seen{$start}++;

    while (@queue) {
        my $v         = shift @queue;
        my $neighbors = get_neighbors( $v, $graph );

        for my $neighbor (@$neighbors) {
            next if $seen{$neighbor};

            # Record how we reached this neighbor, so that
            # _st_walk can rebuild the path later.
            _st_add( $v, $neighbor, $st );

            if ( $neighbor eq $end ) {
                $found++;
                @path = _st_walk( $start, $end, $st );
                return @path;
            }
            else {
                push @queue, $neighbor;
            }
            $seen{$neighbor}++;
        }
    }
    return $found ? @path : ();
}
```

Once the main loop structure is written so that you are walking the vertices of the graph in the correct (breadth-first) order, one part I found slightly tricky was keeping a “trail of bread crumbs” from the end node back to the start. You may have noticed that the code above uses two helper functions — `_st_add` and `_st_walk` — that hide that complexity away a little bit. We will now look at them in more detail.

`_st_add` is trivial, and could have been written directly as a hash table access. The idea is to keep a logical “reverse linked list” structure in a hash table, where each vertex added to the list has a link pointing back to the previous vertex. I hid it from myself in this function so I could read the logic in the main search loop more easily.

```
sub _st_add {
    my ( $vertex, $neighbor, $st ) = @_;
    $st->{$neighbor}->{prev} = $vertex;
}
```
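As a small illustration of that “reverse linked list”, the sketch below records the links for the path s, a, b, c by writing to the hash directly, which is all `_st_add` does. The hash contents here are hand-picked for the example rather than produced by a search.

```perl
use strict;
use warnings;

my $st = {};

# Same effect as _st_add( 's', 'a', $st ), and so on:
# each key is a vertex, and {prev} names the vertex we reached it from.
$st->{a}->{prev} = 's';
$st->{b}->{prev} = 'a';
$st->{c}->{prev} = 'b';

# Following the links from 'c' leads back toward the start.
print "c came from $st->{c}->{prev}\n";    # prints: c came from b
print "b came from $st->{b}->{prev}\n";    # prints: b came from a
```

The start vertex 's' has no entry of its own, since nothing leads to it.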

`_st_walk` is a little more interesting. As noted above, we kept a trail of bread crumbs from our end vertex (if such exists) back to the start vertex. `_st_walk` walks that trail of crumbs backward and builds an array holding all of the vertices visited along the way, which it returns to the caller.

```
sub _st_walk {
    my ( $start, $end, $st ) = @_;

    my @path;
    push @path, $end;

    my $prev = $st->{$end}->{prev};
    while (1) {
        if ( $prev eq $start ) {
            push @path, $start;
            last;
        }
        push @path, $prev;
        $prev = $st->{$prev}->{prev};
        next;
    }
    return reverse @path;
}
```
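One thing to be aware of: as written, `_st_walk` assumes the chain of `prev` links really does lead back to the start. If a link were missing, the loop would never reach its exit condition. Below is a sketch of a more defensive variant. The name `st_walk_checked` and the choice to return an empty list on a broken chain are my own assumptions, not part of the program in this post.

```perl
use strict;
use warnings;

# Hypothetical variant of _st_walk that gives up on a broken chain.
sub st_walk_checked {
    my ( $start, $end, $st ) = @_;

    my @path = ($end);
    my $cur  = $end;
    while ( $cur ne $start ) {
        # Check the entry exists first, to avoid creating it by accident.
        my $prev = $st->{$cur} ? $st->{$cur}->{prev} : undef;
        return () unless defined $prev;    # No link back: no path.
        push @path, $prev;
        $cur = $prev;
    }
    return reverse @path;
}

# Hand-built spanning tree for the path s, a, b, c.
my $st = { a => { prev => 's' }, b => { prev => 'a' }, c => { prev => 'b' } };

my @ok     = st_walk_checked( 's', 'c', $st );
my @broken = st_walk_checked( 's', 'z', $st );

print "ok: @ok\n";                                       # prints: ok: s a b c
print "broken has ", scalar(@broken), " vertices\n";     # prints: broken has 0 vertices
```

In the search itself this situation should not arise, because a link is recorded for every vertex before it is queued, so the extra check is mostly useful when calling the walker on its own.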

Finally, you may have noticed that in order to visit all of the neighbors of some node A we have to be able to list them. The function `get_neighbors` handles that task, along with its helper function `_find_index`. Let’s look at them in turn.

`get_neighbors` looks up some node N in the graph, and returns its list of neighbors, if there are any.

```
sub get_neighbors {
    my ( $k, $graph ) = @_;

    my $index = _find_index( $k, $graph );

    if ( defined $index ) {
        return $graph->[$index]->[1];
    }
    else {
        return;
    }
}
```

`_find_index` looks up the node N’s index in our array-based graph representation. This is actually not a very performant way to do this, since it’s doing a linear search into an array. There are ways to speed this up, such as using a hash of arrays for faster vertex lookup, or using a cache. However I felt it would be better to keep the graph’s data representation as simple as possible for this example. (Incidentally, the below is exactly the sort of somewhat uninteresting but very necessary bookkeeping code I was having a hard time finding in much of the example code in books or online.)

```
sub _find_index {
    my ( $wanted, $graph ) = @_;

    # Naive linear search, for now.
    my $i = 0;
    for my $elem (@$graph) {

        # Definedness check here is necessary because we delete
        # elements from the graph by setting the element's index to
        # undef.  In other words, some graph indices can be undef.
        if ( defined $elem->[0] && $elem->[0] eq $wanted ) {
            return $i;
        }
        $i++;
    }
    return;
}
```
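For comparison, here is a sketch of the hash-of-arrays alternative mentioned above. It converts the array-based graph into a hash keyed by vertex name, so looking up neighbors becomes a single hash access instead of a linear scan. The names `graph_to_hash` and `get_neighbors_fast` are hypothetical and do not appear in the program in this post.

```perl
use strict;
use warnings;

# Build { vertex => [ neighbors... ] } from the array-of-arrays graph.
sub graph_to_hash {
    my ($graph) = @_;
    my %adj;
    for my $elem (@$graph) {
        # Skip deleted (undef) entries, as _find_index does.
        next unless defined $elem && defined $elem->[0];
        $adj{ $elem->[0] } = $elem->[1];
    }
    return \%adj;
}

# Hash lookup; returns an empty array ref for unknown vertices.
sub get_neighbors_fast {
    my ( $k, $adj ) = @_;
    return $adj->{$k} || [];
}

my $graph = [
    [ 's', [ 'a', 'd' ] ],
    [ 'a', [ 's', 'b', 'd' ] ],
    [ 'b', [ 'a', 'c', 'e' ] ],
    [ 'c', ['b'] ],
    [ 'd', [ 's', 'a', 'e' ] ],
    [ 'e', [ 'b', 'd', 'f' ] ],
];

my $adj         = graph_to_hash($graph);
my $b_neighbors = get_neighbors_fast( 'b', $adj );
my $f_neighbors = get_neighbors_fast( 'f', $adj );

print "b: @$b_neighbors\n";                          # prints: b: a c e
print "f has ", scalar(@$f_neighbors), " neighbors\n";   # prints: f has 0 neighbors
```

The trade-off is that a hash does not preserve the order in which the vertices were listed. The order of neighbors within each vertex is kept, though, and that is what determines the order in which BFS explores them.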

Finally, we have a `main` function that drives everything. First, we find the shortest path (by number of nodes) from ‘s’ to ‘c’. Then, we find the shortest path from ‘s’ to ‘f’:

```
sub main {

    my $graph = [
        [ 's', [ 'a', 'd' ] ],
        [ 'a', [ 's', 'b', 'd' ] ],
        [ 'b', [ 'a', 'c', 'e' ] ],
        [ 'c', ['b'] ],
        [ 'd', [ 's', 'a', 'e' ] ],
        [ 'e', [ 'b', 'd', 'f' ] ]
    ];

    my $start = 's';
    my $end   = 'c';

    my @path = find_path_between( $start, $end, $graph );

    print qq[Path from '$start' to '$end' is: @path\n];

    # Find a second path.
    $end  = 'f';
    @path = find_path_between( $start, $end, $graph );
    print qq[Path from '$start' to '$end' is: @path\n];
}
```

Putting everything above together and running it will print the following output:

```
Path from 's' to 'c' is: s a b c
Path from 's' to 'f' is: s d e f
```
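As a cross-check on the output above: BFS reaches vertices in order of their hop distance from the start, which is why the first path it finds to the end vertex is a shortest one. The sketch below records that distance for every reachable vertex. It stores the graph as a plain hash for brevity, and the name `hop_counts` is my own, not something from the program in this post.

```perl
use strict;
use warnings;

# Same graph as in the post, stored as a hash for brevity.
my %adj = (
    s => [ 'a', 'd' ],
    a => [ 's', 'b', 'd' ],
    b => [ 'a', 'c', 'e' ],
    c => ['b'],
    d => [ 's', 'a', 'e' ],
    e => [ 'b', 'd', 'f' ],
);

# Return { vertex => number of hops from $start } for reachable vertices.
sub hop_counts {
    my ( $start, $adj ) = @_;
    my %dist  = ( $start => 0 );
    my @queue = ($start);
    while (@queue) {
        my $v = shift @queue;
        for my $n ( @{ $adj->{$v} || [] } ) {
            next if exists $dist{$n};    # Already reached by a shorter or equal route.
            $dist{$n} = $dist{$v} + 1;
            push @queue, $n;
        }
    }
    return \%dist;
}

my $dist = hop_counts( 's', \%adj );
printf "%s: %d\n", $_, $dist->{$_} for sort keys %$dist;
```

Both 'c' and 'f' come out at three hops from 's', which matches the four-vertex paths printed above.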

## References

1. Aho, Ullman, Hopcroft – Data Structures and Algorithms.

2. Sedgewick – Algorithms in Java, Part 5: Graph Algorithms.

3. Orwant, Hietaniemi, Macdonald – Mastering Algorithms with Perl.

## Appendix: The complete program listing

```
#!perl

use strict;
use warnings;

sub find_path_between {
    my ( $start, $end, $graph ) = @_;

    return () unless defined $start && defined $end;

    my @path;     # Path so far
    my @queue;    # Vertices still to visit.
    my %seen;     # Vertices already seen.
    my $found;    # Whether we have found the wanted vertex.
    my $st = {};  # Spanning tree, used to find paths.

    if ( $start eq $end ) {
        push @path, $start;
        return @path;
    }

    push @queue, $start;
    $seen{$start}++;

    while (@queue) {
        my $v         = shift @queue;
        my $neighbors = get_neighbors( $v, $graph );

        for my $neighbor (@$neighbors) {
            next if $seen{$neighbor};

            # Record how we reached this neighbor, so that
            # _st_walk can rebuild the path later.
            _st_add( $v, $neighbor, $st );

            if ( $neighbor eq $end ) {
                $found++;
                @path = _st_walk( $start, $end, $st );
                return @path;
            }
            else {
                push @queue, $neighbor;
            }
            $seen{$neighbor}++;
        }
    }
    return $found ? @path : ();
}

sub _st_walk {
    my ( $start, $end, $st ) = @_;

    my @path;

    push @path, $end;
    my $prev = $st->{$end}->{prev};
    while (1) {
        if ( $prev eq $start ) {
            push @path, $start;
            last;
        }
        push @path, $prev;
        $prev = $st->{$prev}->{prev};
        next;
    }
    return reverse @path;
}

sub _st_add {
    my ( $vertex, $neighbor, $st ) = @_;
    $st->{$neighbor}->{prev} = $vertex;
}

sub get_neighbors {
    my ( $k, $graph ) = @_;

    my $index = _find_index( $k, $graph );

    if ( defined $index ) {
        return $graph->[$index]->[1];
    }
    else {
        return;
    }
}

sub _find_index {
    my ( $wanted, $graph ) = @_;

    # Naive linear search, for now.
    my $i = 0;
    for my $elem (@$graph) {

        # Definedness check here is necessary because we delete
        # elements from the graph by setting the element's index to
        # undef.  In other words, some graph indices can be undef.
        if ( defined $elem->[0] && $elem->[0] eq $wanted ) {
            return $i;
        }
        $i++;
    }
    return;
}

sub main {
    my $graph = [
        [ 's', [ 'a', 'd' ] ],
        [ 'a', [ 's', 'b', 'd' ] ],
        [ 'b', [ 'a', 'c', 'e' ] ],
        [ 'c', ['b'] ],
        [ 'd', [ 's', 'a', 'e' ] ],
        [ 'e', [ 'b', 'd', 'f' ] ]
    ];

    my $start = 's';
    my $end   = 'c';

    my @path = find_path_between( $start, $end, $graph );

    print qq[Path from '$start' to '$end' is: @path\n];

    # Find a second path.
    $end  = 'f';
    @path = find_path_between( $start, $end, $graph );
    print qq[Path from '$start' to '$end' is: @path\n];
}

main();
```