Search Contact information
University of Cambridge Home Department of Engineering
University of Cambridge >  Engineering Department >  computing help >  Languages >  C++

Trees, Graphs and C++

This document aims to provide enough background information to encourage you to write graph-related C++ code. First some Standard Containers are shown in action, and their use extended to deal with user-defined classes. Then some Tree and Graph concepts are introduced. Finally implementation ideas are given to help tackle some exercises, building on C++'s supplied containers.

Introduction to Standard Containers

Many books exist dealing with sorting and organising data. With the emergence of languages that support data abstraction, people tried to generalise data structures. The Smalltalk language introduced some useful ideas. These have been developed and brought into the mainstream with C++'s Standard Library. The Standard Library includes containers that hold data. Containers are grouped into 3 types These containers have many member functions in common. For example, s.size() gives the number of elements in the container s if s is a set, a list or even a string of characters. The Standard Library also includes algorithms that operate on the containers. To be of maximum use these algorithms need to be able to operate on as many types of containers as possible, so each container has member functions that give algorithms access to the data. For example, s.begin() will always return an Iterator (like a pointer) to the first element of a container s. By hiding the implementation details of containers in this way, high-level generic algorithms can be written which will not only work with the standard containers but also with any new containers that a programmer might write.

The Standard Library doesn't contain a tree container, but if you wrote one so that it had the standard hooks that algorithms require, then algorithms like copy etc. should work with it. Clearly a tree container would require some special routines that other containers wouldn't support, but C++ makes it easy to add functionality in, building on what you already have.

Developing Containers

Ways to extend the basic use of containers include

Queues

Before we look at trees, let's see how to use some simpler containers.
#include <deque>                                                       
#include <algorithm>                                            
#include <iostream>                                                        
using namespace std;                                                    
                                                                       
int main()                                                                 
{                                                             
deque<int> Q;       //  An empty, elastic, container ready to hold ints.
// A deque is a double-ended queue. Queues (unlike lists) are
// data structures where you wouldn't expect to insert new items
// anywhere else but at the ends

Q.push_back(0);     //  Many containers have a push_back
                    //  routine. It's a way to add elements
                    //  to a container in a safe way - the
                    //  container will grow if it's not big enough.
Q.push_back(2);                                                          
Q.push_back(1);     

for(int i=0;i<Q.size(); i++){
   cout << Q[i].value; cout << " ";
}
cout <<endl;

// Now let's sort!          
sort(Q.begin(), Q.end());
// the arguments we've given to sort mean that the whole queue
// will be sorted, and that the result will be copied back to
// the original queue. sort uses the default sorting criterium
// for integers, and can use the swap function of Q to switch
// elements around. 

for(int i=0;i<Q.size(); i++){
   cout << Q[i].value; cout << " ";
}
cout <<endl;
// Output: 0 1 2
                    
}
To create a queue of strings rather than a queue of integers, we don't need to change much. This ease of coping with different data types will be useful when we come to create trees.
#include <deque>                                                       
#include <string> // added so that we can use strings
#include <algorithm>                                            
#include <iostream>                                                        
using namespace std;                                                    
                                                                       
int main()                                                                 
{                                                             
deque<string> Q;       // changed

Q.push_back("standard");           
Q.push_back("template");                                                        
Q.push_back("library");     

for(int i=0;i<Q.size(); i++){
   cout << Q[i].value; cout << " ";
}
cout <<endl;

// Now let's sort!          
sort(Q.begin(), Q.end());
// sort uses the default sorting criterium for strings - alphabetical

for(int i=0;i<Q.size(); i++){
   cout << Q[i].value; cout << " ";
}
cout <<endl;
}
As well as being able to have queues of standard items like strings and integers, we can also invent our own types and have queues of those. The main problem is that when we sort we have to specify the sort criteria for our own types. It's sufficient to specify what the '<' operator should do for your own type. Here's an example
#include <deque>                                                       
#include <algorithm>                                            
#include <iostream>                                                        
using namespace std;                                                    

class simple {
public:
  int value;
  // redefine the '<' operator, so when 2 simple objects are
  // compared, the 'value' fields of the objects are compared.
  const bool operator<(const simple &rhs) const
        { return this->value < rhs.value; }
};
                                                                       
int main()                                                                 
{                                                             
 deque<simple> Q;       //  An empty, elastic, container.

 simple s;
 s.value=0;
 Q.push_back(s); 
 s.value=2;
 Q.push_back(s);   
 s.value=1;                                          
 Q.push_back(s);     

 for(int i=0;i<Q.size(); i++){
   cout << Q[i].value; cout << " ";
 }
 cout <<endl;
 // Output: 0 2 1

 // Now let's sort! sort uses the '<' operator that we created in the 'simple' class   
 sort(Q.begin(), Q.end());

 for(int i=0;i<Q.size(); i++){
   cout << Q[i].value; cout << " ";
 }
 cout <<endl; 
 // Output: 0 1 2
}

Trees

There are many types of tree structures (threaded trees, Splay trees, 2-3 trees, etc). The choice depends on many factors - how dynamic the data is, whether best-case or worst-case behaviour is the most important, whether you want to avoid recursion, etc. Trees are often a natural choice (e.g. when writing a chess program) but they are also effective in many situations where data that's received sequentially can be beneficially stored in a more organised fashion - e.g. parsing mathematical expressions, parsing sentences, reading words that later need to be searched, or retrieved in alphabetical order.

Binaries trees (where each node has at most 2 children) are commonly used - not because they most closely model reality, but because they're computationally and mathematically more easy to deal with.

Suppose that a program were to be given 10,000 words (non-unique) and had to list the unique words used. Suppose also that memory was at a premium, so that storing every non-unique word is to be avoided. Each time a word is read the program needs to check if the word's new and store it only if it is. If the words are stored in an array alphabetically, searching will be faster, but adding a word to the start of the array will be slow because subsequent words will have to be shifted.

Using a binary tree will speed things up. Each time a word is read, the program starts at the top of the tree. If the node is empty, the word is stored there, otherwise the left or right branch is taken according to whether the word comes before or after the word stored in the node. For example, suppose the first 5 words were "standard", "template", "library", "data" and "structures". "standard" would be stored in the top node. "template" would be stored in the right-child node of the top node. "library" would be stored in the left-child node of the top node.

              standard
               /   \
          library  template
"data" comes before "standard" so the left link will be taken. "data" comes before "library", so "data" will be stored in the left-child node of "library". The final tree looks like this
              standard
               /   \
          library  template
             /      /
           data   structures
If this tree remains balanced, then it could fit 10,000 words within 15 layers, so a word-search would require only 15 comparisons at most. Insertions (new nodes) are cheap too. Contrast this with the worst case array scenario, where 10,000 comparisons might need to be made before a search is complete!
However, if the words are supplied in alphabetical order, this tree-based method suffers - each new word will add a right-node, deepening the tree each time, making searching as slow as with arrays. Various techniques exist to combat this. The benefits of these techniques are so enormous that simple binary trees aren't used in big programs - see the appendix.

Tree Traversal

There are 2 basic approaches to brute-force tree searching - breadth-first and depth-first. With breadth-first you start at the top node then visit all its childen before visiting its children's childen - i.e. you explore the tree layer by layer. With depth-first searching you keep going down, backtracking when you can go no further. The purpose of the search and the shape of the tree affect the choice of approach. For example, if you're playing chess and you don't have much time to think, it's better to do a breadth-first search rather than looking at one move (which might be bad anyway) at great depth.

With either method there's a choice of ways to order operations on the node.

As an example consider the following tree, which represents a mathematical expression.
             -
           /   \
          *     *
         / \   / \
        2   + 3   -
           / \   / \
          x   5 y   4
If you do a depth-first search, processing each left-branch and right-branch of a node before printing the node's contents you'll get
      2 x 5 + * 3 y 4 - * -
(check to see if you agree with this!) which is the expression in post-fix notation but if you deal with nodes differently (still depth-first search, but this time processing the left-branch, printing the contents of the node, then processing the right-branch) you'll get something much closer to the normal representation.

The depth-first algorithm can be described more formally as follows

  1. Form one element queue Q consisting of the root node
  2. Until the Q is empty or the goal has been reached, determine if the first element in Q is the goal.
    1. If it is, do nothing
    2. If is isn't, remove the first element from Q and add the first element's children, if any, to the front of the queue (so that children are searched before siblings)
  3. If the goal is reached, success; else failure
Try this algorithm by hand on the following tree
             A
           /   \
          B     C
         / \   / \
        D   E F   G
           / \   / \
          H   I J   K
The Standard Library makes it relatively easy to translate this algorithm into code nearly line-by-line. If each vertex has a list of sub-vertices called child then
    
    list<Vertex > q;
    q.push_back( start ); 
creates and initialises a list, and
     while( !q.empty( ) ){
         Vertex v = q.front( ); // get the first item
         if (!v.goal_reached()) { 
            q.pop_front( );      // remove it from the list
            for( int i = 0; i < v.child.size( ); i++ )
               q.push_front(v.child[i]); // push each child to the front of the queue
         }
     }
modifies the list.

It's sometimes possible to sort this list of nodes to be visited so that the most promising nodes are visited first.

[a landscape]
Suppose you started at point A of this 2D landscape. The code for these methods follows the stages of the algorithm closely. We've seen earlier how sort can be used to sort a list of data structures like Vertex
     deque<Vertex> q;
     deque<Vertex> new_wave;
     while( !q.empty( ) ){
         Vertex v = q.front( ); // get the first item
         if (!v.goal_reached()) { 
            q.pop_front( );      // remove it from the list
            for( int i = 0; i < v.child.size( ); i++ )
               new_wave.push_front(v.child[i]); // push each child onto a queue
            // Now decide whether to sort this new queue before adding
            // it to the main queue, or add it to the main queue before
            // sorting the whole queue
            if (mode==hill_climbing) {
              sort(new_wave.begin(),new_wave.end());
              while(!new_wave.empty()){
                 q.push_front(new_wave.back( ));
                 new_wave.pop_back( );
              }
            }
            else {
              while(!new_wave.empty()){
                 q.push_front(new_wave.back( ));
                 new_wave.pop_back( );
              }
              sort(q.begin(),q.end());
            }
         }
     }
Branch and Bound methods break the set of feasible solutions into smaller subsets, and find for each subset a lower bound (as high as possible) on the function one's trying to maximise in the hope of isolating the optimal solution. These subsets often take the form of subtrees. If the subsets are well chosen, and if the bounds are easy to calculate, these methods can greatly reduce the number of nodes that need to be explicitly searched.

Many tree traversal strategies require the program to remember previous nodes in the path so that backtracking is possible. This (and recursion) can use up a lot of memory. With threaded trees each node knows its sucessor - with a depth-first search parents lead to their first child which in turn leads to siblings. The last sibling leads to the next node to be visited. Maintaining such a tree requires more work when nodes are added or removed, and makes traversal less flexible but with each node having a successor, it becomes more possible to treat the Graph as a container on which iterators can operate.

Graphs

A tree is a special case of the more general graph (or net). Graphs have many uses from network analysis to spreadsheet operation. Graphs have vertices (or nodes) and edges (which can be one-way or undirected). Edges can have a cost associated with them, and the cost may depend on the direction taken along the edge. Popular graph-related puzzles include More formally, a graph is a pair (V,E), where V is a finite set and E is a binary relation on V. V is called a vertex set whose elements are called vertices. E is a collection of edges, where an edge is a pair (u,v) with u,v in V. In a directed graph, edges are ordered pairs, connecting a source vertex to a target vertex. In an undirected graph edges are unordered pairs and connect the two vertices in both directions, hence in an undirected graph (u,v) and (v,u) are two ways of writing the same edge.

If some edge (u,v) is in graph , then vertex v is adjacent to vertex u. In a directed graph, edge (u,v) is an out-edge of vertex u and an in-edge of vertex v. In an undirected graph edge (u,v) is incident on vertices u and v. In a directed graph, the number of out-edges of a vertex is its out-degree and the number of in-edges is its in-degree. For an undirected graph, the number of edges incident to a vertex is its degree.

A path is a sequence of edges in a graph such that the target vertex of each edge is the source vertex of the next edge in the sequence. If there is a path starting at vertex u and ending at vertex v we say that v is reachable from u. A path is simple if none of the vertices in the sequence are repeated. A path is a cycle if the first and last vertex in the path are the same. A graph with no cycles is acyclic.

Graph Traversal

Graphs can be traversed much as trees can (depth-first, breadth-first, etc), but care must be taken not to get stuck in a loop - trees by definition don't have cycles, and in a tree there's always only one path from the root to a node whereas in a graph there may be many paths between any pair of nodes.

The following directed graph has 6 nodes. The Dijkstra-Moore algorithm can be used to find the shortest path from a given node to each of the others as long as costs aren't negative. A breadth-first search from the first node radiates out across the graph. If it reaches an unvisited node, it sets its distance from the first node to be the distance of the previous node from the first node, plus the length of the edge from the previous node. If a node's already been visited, it will already have been given a distance from the first node, but this might not be the shortest distance - the current path might be shorter. So a comparison is done, and the node's distance is set to the minimum.

For example, in the graph below, suppose that A was the source node. In the first stage, node B's distance would be set to 7 and node C's distance to 1. During the second stage, routes from nodes B and C would be explored. When node B is reached from node C, the distance is 1+5, so node B's distance from A is reset to 6 from its original 7. As the algorithm progresses, node B's distance will later be reset to 5, which is the final value for node B's distance. The algorithm terminates after 5 stages because by then all of the routes will have been traversed - routes can't have more than number_of_vertices - 1 edges.

[a weighted digraph]

Implementing Graph Algorithms

Graphs provide useful source material for practising C++ - inheritance provides a useful mechanism for integrating node, edge and graph structures, and the Standard Library takes away much of the chore. Some issues to bear in mind are See the References below for a list of books and online courses. Below is one way that graph algorithms could be implemented, showing sample input and output. When this file is fed into the program the output is
B is 5 from the source.
Shortest path is A to C to E to B
C is 1 from the source.
Shortest path is A to C
D is 8 from the source.
Shortest path is A to C to E to D
F is 6 from the source.
Shortest path is A to C to E to B to F
E is 3 from the source.
Shortest path is A to C to E
The program works with trees too. The hill climbing problem above can be set up with the following file
A B 1 anon
B C 1 anon
C D 1 anon
A E 1 anon
E F 1 anon
NODEVALUE A 0
NODEVALUE B 2
NODEVALUE C 0
NODEVALUE D 1
NODEVALUE E 1
NODEVALUE F 3
// For best-first change the next line to  SEARCHMODE bestfirst
SEARCHMODE hill
STARTNODE A
GOALNODE F
Graph member functions include Graphs can also be created programatically rather than being read in from a file. Here's an example which creates a 6-node graph then works out the shortest paths from node "B" -
#include "graphs.cc"

int main( int argc, char *argv[ ] )
{
    Graph g;
    Search_Description s;
    vector<string> nodenames;

    nodenames.push_back("A");
    nodenames.push_back("B");
    nodenames.push_back("C");
    nodenames.push_back("D");
    nodenames.push_back("E");
    nodenames.push_back("F");

    for (int i=0;i<6;i++)
       for (int j=i+1;j<6;j++)
          g.addEdge(nodenames[i],nodenames[j],i+j,"anon");

    g.shortestpaths("B");
    return 0;
}

Exercises

References

Note that the STL (Standard Template Library) has been subsumed into C++'s Standard Library.

Books

Code and documentation

See Also

Appendix

© Cambridge University Engineering Dept
Information provided by Tim Love (tpl)
Last updated: February 2008