Algorithm Library Design

3. STL and Generic Programming


Introduction

The Standard Template Library (STL) falls into the class of foundation libraries. It provides basic data types, such as list, vector, set, and map, and basic algorithms, such as find and sort. The STL is part of the C++ Standard Library, but not all templates in the standard library belong to the STL; strings, for example, are not part of the STL. Historically, the STL started as an Ada library [Musser89]. It became widely recognized when the report [Stepanov95] started circulating in the C++ standardization committee. The STL became part of the C++ standard in a slightly modified form. These modifications make the standard version incompatible with the first STL, but it is still called the STL. A good up-to-date introduction and reference for the STL can be found in [Austern98]. The reference part is also available online from SGI, see [SGI-STL].

The programming paradigm underlying STL is called generic programming. Here is one definition [Jazaeri98]:

Generic programming is a sub-discipline of computer science that deals with finding abstract representations of efficient algorithms, data structures, and other software concepts, and with their systematic organization. The goal of generic programming is to express algorithms and data structures in a broadly adaptable, interoperable form that allows their direct use in software construction. Key ideas include:

Concept and Model

Consider our first example of a function template, swap:
template <class T>
void swap( T& a, T& b) {
    T tmp = a; 
    a = b;
    b = tmp;
}
When the template is instantiated (by calling the function) the placeholder T becomes an actual type. However, compilation can only succeed if this actual type has an assignment operator and a copy constructor. The function could have been implemented using a default constructor and assignment, but the copy constructor is more likely to exist than the default constructor (given that the assignment operator is required anyway).

We can distinguish between syntactic requirements and semantic requirements. The syntactic requirements are the assignment operator and the copy constructor in our example. If an actual type fails to comply with these requirements a compilation error points that out. The semantic requirements are that the copy constructor and the assignment operator should actually copy the values, should be side effect free, and in general should behave according to the C object model, e.g., tmp = y; x = tmp; should give you the same as x = y;. Remember that these are user defined functions. Semantic requirements are not checkable at compile time.

Instead of always documenting requirements in full detail, it is convenient to group them into frequently used combinations. We call these collections of requirements concepts. The concept for the swap function parameter is called Assignable.

If an actual type fulfills the requirements of a concept, it is called a model of this concept. In our example, int is a model of the concept Assignable.

Common basic concepts

Concept                Syntactic requirements
Assignable             copy constructor, assignment operator
Default Constructible  default constructor
Equality Comparable    equality and inequality operator
LessThan Comparable    order comparison with operators <, <=, >=, and >

A regular type is one that is a model of Assignable, Default Constructible, and Equality Comparable, and one in which these operations interact in the expected way; for example, after x = y; we may assume that x == y is true.

In general, concepts factor out common signature and behavior for template arguments. One can think of a concept as the `greatest common denominator' of all types for which a function template is supposed to work. Of course, the function has then to be implemented using only the operations specified in the concept.

In analogy to the object-oriented paradigm, concepts correspond to abstract base classes (interfaces), and models correspond to derived classes. However, there is the important difference that concepts are nowhere explicitly coded in the language. They are only communicated in documentation. This is a maintenance disadvantage, but also an advantage, because it avoids the coupling introduced by a common base class. A common base class needs a header file, and all derived classes have to agree on this single header file, on linking, etc.

In general, the flexibility is resolved at compile time, which gives us the advantages of strong type checking and inline efficiency where needed. If runtime flexibility is needed, the generic data structures and algorithms can be parameterized with a base class, as used in object-oriented programming, to get the runtime flexibility.


Generic Algorithms Based on Iterators

Algorithmic abstraction is a key goal in generic programming [Musser89]. One aspect is to reduce the interface to the data types used in the algorithm to a set of simple and general concepts. One of them is the Iterator concept in STL which is an abstraction of pointers. Iterators serve two purposes: They refer to an item and they traverse over the sequence of items that are stored in a data structure, also known as container class in STL. Five different categories are defined for iterators: input, output, forward, bidirectional and random-access iterators, according to the different possibilities of accessing items in a container class. The usual C-pointer referring to a C-array is a model for a random-access iterator.

The following table shows the different iterator concepts and the refinement relation between them and the basic concepts (see above). The syntactic requirements are only sketched here, see [ISO-C++-98, SGI-STL] for the full requirements.

Concept                 Refinement of                                          Syntactic requirements
Trivial Iterator        Assignable, Equality Comparable                        operator*(), operator->()
Input Iterator          Trivial Iterator                                       operator++(), ...
Output Iterator         Assignable                                             operator*(), operator++() ...
Forward Iterator        Input Iterator, Output Iterator, Default Constructible ...
Bidirectional Iterator  Forward Iterator                                       operator--(), ...
Random Access Iterator  Bidirectional Iterator, LessThan Comparable            operator+(), operator+=(), operator-(), operator[](), ...

Sequences of items are specified by a range [first,beyond) of two iterators. This notion of a half-open interval denotes the sequence of all iterators obtained by starting with the iterator first and advancing first until the iterator beyond is reached, but it does not include beyond. The iterator beyond is also referred to as the past-the-end position.

A container class is supposed to provide a member type called iterator, which is a model of the Iterator concept, and two member functions: begin() returns the start iterator of the sequence and end() returns the iterator referring to the past-the-end position of the sequence. The list class template example from the previous section can be extended as follows, though we leave the actual implementation of the iterator open.

template <class T>  class list {
public:
    void push_back( const T& t); // append t to list.
    typedef ... iterator;
    iterator begin();
    iterator end();
};
Generic algorithms in the STL are not written for a particular container class; they use iterators instead. For example, a generic contains function can be written to work for any model of an input iterator. It returns true iff the value is contained in the values of the range [first,beyond).
template <class InputIterator, class T>
bool contains( InputIterator first, InputIterator beyond, const T& value){
    while ((first != beyond) && (*first != value)) 
        ++first;
    return (first != beyond);
}
This generic contains function can be used with C-pointers referring to a C-array. Recall that C-pointers are a model of a random access iterator, which is a refinement of an input iterator and therefore provides everything the function needs. The following example declares an array of a hundred integers and searches for a 42.
int a[100];
// ... initialize elements of a.
bool found = contains( a, a+100, 42);
We can also search only a part of an array.
bool in_first_half = contains( a, a+50, 42);
bool in_third_quarter = contains( a+50, a+75, 42);
This generic contains function can also be used with our list class template as illustrated in the following example:
list<int> ls;
// ... insert some elements into ls.
bool found = contains( ls.begin(), ls.end(), 42);
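
The snippets above can be combined into a small test program. This is a sketch that assumes std::list from the standard library in place of the list class template from the previous section; the array contents are made up for illustration.

```cpp
#include <cassert>
#include <list>

// The generic contains function from the text.
template <class InputIterator, class T>
bool contains(InputIterator first, InputIterator beyond, const T& value) {
    while ((first != beyond) && (*first != value))
        ++first;
    return (first != beyond);
}

int main() {
    int a[5] = {3, 1, 42, 7, 9};
    assert(contains(a, a + 5, 42));    // whole array
    assert(!contains(a, a + 2, 42));   // only the range [a, a+2)
    assert(!contains(a, a, 42));       // empty range: first == beyond

    std::list<int> ls(a, a + 5);       // same values in a list
    assert(contains(ls.begin(), ls.end(), 7));
    assert(!contains(ls.begin(), ls.end(), 8));
    return 0;
}
```
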

A generic copy function copies the values of an iterator range to a sequence starting where another iterator points to. The copy function returns an iterator pointing to the past-the-end position of the target sequence after copying.
template <class InputIterator, class OutputIterator>
OutputIterator copy( InputIterator first, InputIterator beyond, 
                     OutputIterator result){
    while (first != beyond)
        *result++ = *first++;
    return result;
}
Let's copy 100 elements from an array of integers to another array of integers.
int a1[100];
int a2[100];
// ... initialize elements of a1.
copy( a1, a1+100, a2);
The copy function is writing over the already existing elements in a2. If we want to copy the 100 elements into a list that is empty at the beginning, we cannot use the begin() iterator of the list. For an empty list the begin() iterator is actually equal to the end() iterator, which is not dereferenceable.

For these cases the STL provides small adaptors that interface between the concepts. Here, the adaptor is a model of an output iterator, and it uses a model of a container class, here the list, to append a new element to the end of this container class whenever an element is written to the iterator. We will see later on how this back_inserter adaptor is actually implemented. The following example shows how it is used with the copy function and the list class, assuming we still have the array a1 at hand.

list<int> ls;
copy( a1, a1+100, back_inserter(ls));
There are also adaptors to interface between C++ I/O streams and iterators. The following example reads integers from the standard input stream and writes them to the standard output stream, each integer followed by a newline "\n". The istream_iterator with the empty parentheses denotes the past-the-end position for this range, which is the end-of-file condition for the stream.
copy( istream_iterator<int>(cin), istream_iterator<int>(), 
      ostream_iterator<int>( cout, "\n"));
The concepts in the STL and the adaptors form an extremely flexible toolkit. Most adaptors are small classes and functions. New adaptors for other concepts are easy to add. The whole is more than the sum of its parts.


A First Partial Implementation of an Iterator

The stream iterator adaptor example makes a point: Streams, and ranges, can be infinite. For technical reasons, this idea works best with input iterators that generate the sequence on the fly, i.e., they compute the sequence from a small internal state. A first example would be an iterator to a constant value.
template <class T>
class Const_value {
    T t;
public:
    // Default Constructible !
    Const_value() {}  
    explicit Const_value( const T& s) : t(s) {}

    // Assignable by default.

    // Equality Comparable (not so easy what that should mean here)
    bool operator==( const Const_value<T>& cv) const { return ( this == &cv); }
    bool operator!=( const Const_value<T>& cv) const { return !(*this == cv); }

    // Trivial Iterator:
    const T& operator* () const { return  t; }
    const T* operator->() const { return & operator*(); }

    // Input Iterator
    Const_value<T>& operator++() { return *this; }
    Const_value<T>  operator++(int) {
        Const_value<T> tmp = *this;
        ++*this;
        return tmp;
    }
};
Note that operator!= and operator++(int) are implemented in terms of other member functions of the iterator. In this example, they are unnecessarily complicated. But in general, only a small subset of the member functions needs to be implemented for a new iterator; all other member functions are generic.

Other examples for such simple input iterators are a counting iterator and a random number generator.

Using the concept of lazy evaluation from functional programming languages we can also imagine iterators representing more complex and potentially infinite sequences, for example, the sequence of prime numbers.

However, there is no point in copying an infinite sequence. Instead, we might be interested in a finite subsequence. Another generic function, copy_n, solves this. Note that copy_n is not part of the C++98 standard (it was added to the standard library in C++11), but it is available in most implementations of the STL (or easy to write). (see also Const_value.C)

int a[100];
Const_value<int> cv( 42);
copy_n( cv, 100, a);  // fills a with 100 times 42.

Function Objects

A function object basically is an instance of a class with the operator() member function implemented, such that a call to this member function of the object looks like a function call.

Concept           Refinement of    Syntactic requirements
Generator         Assignable       function call, no arguments: Result operator()()
Unary Function    Assignable       function call, one argument: Result operator()(Arg1)
Binary Function   Assignable       function call, two arguments: Result operator()(Arg1, Arg2)
Predicate         Unary Function   result type is bool
Binary Predicate  Binary Function  result type is bool

Function objects are well suited as parameters for generic functions. A typical example would be the exchange of the equality comparison with a function object, which is currently hard coded as the operator== in the generic contains function from above. First, we define a function object equals that performs the same comparison.

template <class T> 
struct equals {
    bool operator()( const T& a, const T& b) const { return a == b; }
};
We modify the iterator-based generic contains function from above. It needs an additional template parameter Eq and takes an additional function parameter eq for a binary function object which is used for the comparison.
template <class InputIterator, class T, class Eq>
bool contains( InputIterator first, InputIterator beyond, const T& value,
               Eq eq ) {
    while ((first != beyond) && ( ! eq( *first, value))) 
        ++first;
    return (first != beyond);
}
The example using C-arrays with the contains function now needs an additional argument, the function object. The expression equals<int>() calls the default constructor of the class equals<int> from above and creates a function object that compares two integers for equality.
int a[100];
// ... initialize elements of a.
bool found = contains( a, a+100, 42, equals<int>());
The next section illustrates how the additional parameter of the contains function can be selected automatically if the value type of the iterator is known. C++ also allows simple function pointers to be used as function objects. The advantage of objects is that they can have an internal state. We continue our example of the contains function and define a comparison object that is true when the absolute value of the difference of its two arguments is not larger than eps. The eps value is stored in the function object itself. At construction time of the function object the actual value for eps is initialized, in our example to one, so that the contains function will also return true if the values 41 or 43 occur in the range.
template <class T> 
struct eps_equals {
    T epsilon;
    eps_equals( const T& eps) : epsilon(eps) {}
    bool operator()( const T& a, const T& b) { 
        return (a-b <= epsilon) && (b-a <= epsilon); 
    }
};
bool found = contains( a, a+100, 42, eps_equals<int>(1));
How about a function object that counts the number of comparisons needed as a side-effect? Here it is:
template <class T> 
struct count_equals {
    size_t& count;
    count_equals( size_t& c) : count(c) {}
    bool operator()( const T& a, const T& b) {
        ++count;
        return a == b;
    }
};
size_t counter = 0;
bool found = contains( a, a+100, 42, count_equals<int>(counter));
// counter contains number of comparisons needed.
Note that since function objects are usually passed by value in the STL we store a reference to an external counter and not the counter value itself in the function objects.


Iterator Traits

Iterators refer to items of a particular value type. Algorithms parameterized with iterators might need the value type directly. Assuming that iterators are implemented as classes the value type can be defined as a local type of the iterator, as in the following example of an iterator referring to integer values. The value type can be referred to with the expression iterator_over_ints::value_type.
struct iterator_over_ints {
    typedef  int  value_type;
    // ...
};
Since a C-pointer is a valid iterator but cannot have local types, this approach is not sufficient. The solution chosen for the STL is the iterator traits class, which is a class template parameterized with an iterator:
template <class Iterator> 
struct iterator_traits {
    typedef  typename Iterator::value_type  value_type;
    // ...
};
The value type of the iterator example class above can now be expressed as iterator_traits< iterator_over_ints >::value_type. For C-pointers a specialized version of the iterator traits class exists.
template <class T> 
struct iterator_traits<T*> {
    typedef  T  value_type;
    // ...
};
Now the value type of a C-pointer, e.g., to int, can be expressed as iterator_traits< int* >::value_type. Here, partial specialization is required. The iterator traits class also contains definitions of the difference_type, the iterator_category, the pointer type, and the reference type of the iterator.

The example of the generic contains function with the function object from above can be made more convenient for the default use with a default initializer as follows: (see also contains.C)

template <class InputIterator, class T>
bool contains( InputIterator first, InputIterator beyond, const T& value) {
    typedef typename iterator_traits<InputIterator>::value_type value_type;
    typedef equals<value_type> Equal;
    return contains( first, beyond, value, Equal());
}
The STL makes use of traits classes in other places as well, for example, char_traits to define the equality test and other operations for a character type. In addition, this character traits class is used as a template parameter for the basic_string class template, which allows the adaptation of the string class to different character sets.


Implementing Adaptable Function Objects

Adaptable function objects require in addition to regular function objects some local types that describe the result type and the argument types. A function pointer can be a valid model for a function object, but it cannot be a valid model of an adaptable function object.

Concept                     Refinement of                               Syntactic requirements, model T
Adaptable Generator         Generator                                   T::result_type
Adaptable Unary Function    Unary Function                              T::result_type, T::argument_type
Adaptable Binary Function   Binary Function                             T::result_type, T::first_argument_type, T::second_argument_type
Adaptable Predicate         Predicate, Adaptable Unary Function
Adaptable Binary Predicate  Binary Predicate, Adaptable Binary Function

Small helper classes help to define adaptable function objects easily. For example, our function object equals from above could be derived from std::binary_function to declare the appropriate types.

#include <functional>

template <class T> 
struct equals : public std::binary_function<T,T,bool> {
    bool operator()( const T& a, const T& b) const { return a == b; }
};
The definition of binary_function in the STL is as follows:
template <class Arg1, class Arg2, class Result>
struct binary_function {
    typedef Arg1   first_argument_type;
    typedef Arg2   second_argument_type;
    typedef Result result_type;
};
Adaptable function objects can be used with adaptors to compose function objects. The adaptors need the annotated type information to declare proper function signatures etc. An example is the negater unary_negate that takes a unary predicate and is itself a model of a unary predicate, but with negated boolean values.

template <class Predicate>
class unary_negate
    : public unary_function< typename Predicate::argument_type, bool> {
protected:
    Predicate pred;
public:
    explicit unary_negate( const Predicate& x) : pred(x) {}
    bool operator()(const typename Predicate::argument_type& x) const {
        return ! pred(x);
    }
};
The function adaptors are paired with function templates for easy creation. The idea is that the function template deduces the type for the template argument automatically (from the type of the function argument).
template <class Predicate>
inline unary_negate< Predicate>
not1( const Predicate& pred) {
  return unary_negate< Predicate>( pred);
}
A short program in [Stepanov95] makes use of this negater. The program copies all integers from cin to cout that cannot be divided by the integer parameter given to the program. (see also remove_if_divides.C)
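#include <algorithm>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <iterator>
using namespace std;
int main( int argc, char** argv) {
    if ( argc != 2) {
        cerr << "usage: remove_if_divides integer\n";
        return 1;
    }
    remove_copy_if( istream_iterator<int>(cin), istream_iterator<int>(),
                    ostream_iterator<int>(cout, "\n"),
                    not1( bind2nd( modulus<int>(), atoi( argv[1]))));
    return 0;
}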
The other function object adaptor in this example, bind2nd, is again a small helper function to create an object of type binder2nd.
template < class Operation, class Tp>
inline binder2nd< Operation> 
bind2nd( const Operation& fn, const Tp& x) {
    typedef typename Operation::second_argument_type Arg2_type;
    return binder2nd< Operation>( fn, Arg2_type(x));
}
An object of type binder2nd stores an adaptable binary function object and a value compatible with the type of the second argument of the adaptable binary function object. The object itself then behaves like a unary function object. Whenever its operator() is called, it returns the value of the binary function object called with its argument as first argument and its internally stored value as second argument. This adaptor binds a value to the free variable of the second argument of a binary function object. There is a similar adaptor called binder1st that binds a value to the first argument. This is similar to currying known in functional programming languages (it needs much more writing in C++ to make it work, but then it works). So, these are higher order function objects.
template <class Operation> 
class binder2nd
  : public unary_function< typename Operation::first_argument_type,
                           typename Operation::result_type> {
protected:
    Operation op;
    typename Operation::second_argument_type value;
public:
    binder2nd( const Operation& x,
               const typename Operation::second_argument_type& y)
        : op(x), value(y) {}
    typename Operation::result_type
    operator()(const typename Operation::first_argument_type& x) const {
        return op(x, value); 
    }
};
Other function object adaptors exist that can compose function objects, or encapsulate function pointers and member function pointers in adaptable function objects.


Implementation of the Iterator Adaptor back_inserter

The back_insert_iterator is a class template that is a model of an output iterator. It keeps a pointer to a container class as internal state. Each time an expression of the form *i = value; is evaluated for a back insert iterator i, the value is appended to the container class using the push_back() member function.

template <class Container>
class back_insert_iterator {
protected:
    Container* container;
public:
    typedef Container           container_type;
    typedef output_iterator_tag iterator_category;
    typedef void                value_type;
    typedef void                difference_type;
    typedef void                pointer;
    typedef void                reference;

    explicit back_insert_iterator(Container& x) : container(&x) {}
    back_insert_iterator<Container>&
    operator=(const typename Container::value_type& value) { 
        container->push_back(value);
        return *this;
    }
    back_insert_iterator<Container>& operator*()     { return *this; }
    back_insert_iterator<Container>& operator++()    { return *this; }
    back_insert_iterator<Container>& operator++(int) { return *this; }
};
Again, a small helper function template saves us from typing the template arguments explicitly.

template <class Container>
inline back_insert_iterator<Container> back_inserter(Container& x) {
  return back_insert_iterator<Container>(x);
}
Here is a short example of its use with a list class.

list<int> ls;
copy( a1, a1+100, back_inserter(ls));

More Iterators

See Iterator_identity.h and Iterator_identity.C for an adaptor class that takes an iterator and behaves exactly like this iterator. The example in Iterator_base.h and Iterator_base.C implements the same adaptor, but based on the Barton-Nackman trick.


Function Dispatch using Iterator Category at Compile Time

An iterator belongs to a specific iterator category. This category can be used to select different algorithms. For example, the difference between two iterators can be computed in constant time for random access iterators, but can only be computed in linear time (by counting) for all other categories.

The C++ standard defines five empty classes to denote the different iterator categories. These types will be used as symbolic tags at compile time.

struct input_iterator_tag {};
struct output_iterator_tag {};
struct forward_iterator_tag : public input_iterator_tag {};
struct bidirectional_iterator_tag : public forward_iterator_tag {};
struct random_access_iterator_tag : public bidirectional_iterator_tag {};
An iterator is assumed to have a local type iterator_category that is defined to be one of these tags.

struct Some_iterator {
    typedef forward_iterator_tag iterator_category;
    // ...
};
This iterator category is accessed using iterator traits. Now we can implement a generic distance function (original implementation as it is in the STL):
template <class InputIterator>
inline typename iterator_traits<InputIterator>::difference_type
__distance( InputIterator first, InputIterator last, input_iterator_tag) {
    typename iterator_traits<InputIterator>::difference_type n = 0;
    while (first != last) {
        ++first; ++n;
    }
    return n;
}

template <class RandomAccessIterator>
inline typename iterator_traits<RandomAccessIterator>::difference_type
__distance( RandomAccessIterator first, RandomAccessIterator last,
           random_access_iterator_tag) {
    return last - first;
}

template <class InputIterator>
inline typename iterator_traits<InputIterator>::difference_type
distance( InputIterator first, InputIterator last) {
    typedef typename iterator_traits<InputIterator>::iterator_category 
      Category;
    return __distance(first, last, Category());
}
Note how the class hierarchy among the iterator tags is used to reduce the number of overloaded functions __distance that need to be implemented here. Following the refinement relation of the iterator concepts, the forward_iterator_tag should be derived also from the output_iterator_tag. Obscure reasons about multiple derivation kept this derivation out of the standard. On the other hand, this derivation isn't likely to simplify real implementations anyway.

These tags are quite convenient to annotate symbolic information at compile time. However, there is a catch. An object always has non-zero size, even if its class is empty. This is reasonable (the address identifies an object) and helps in defining invariants about size, allocation, arrays, etc. However, if we derive from an empty class, like we do with function objects and binary_function<Arg1,Arg2,Result>, we would like to avoid any size penalty. The compiler is allowed to perform this optimization (known as the empty base optimization), but not every compiler does; older versions of g++, for example, do not. The following program shows the effect.

#include <iostream>

using namespace std;

class A {};
class B : public A {
    int i;
};
class C {
    int i;
};

int main() {
    cout << "size of A = " << sizeof(A) << endl;
    cout << "size of B = " << sizeof(B) << endl;
    cout << "size of C = " << sizeof(C) << endl;
    return 0;
}


Lutz Kettner (<surname>@mpi-sb.mpg.de). Last modified on Tuesday, 29-Jul-2003 12:26:26 MEST.