- Introduction
- Compile-time programming
- Constraining templates using the SFINAE principle
- Type classification
- Self Reflection of Types in C++: The flatten.C Example
- Expression Templates

```
// prime.C
// Program by Erwin Unruh
// (pre-standard syntax: today the specializations would need "template <>")

template <int i> struct D { D(void*); operator int(); };

template <int p, int i> struct is_prime {
    enum { prim = ((p%i) && is_prime< (i>2 ? p : 0), i-1>::prim) };
};

template <int i> struct Prime_print {
    Prime_print<i-1> a;
    enum { prim = is_prime<i,i-1>::prim };
    void f() { D<i> d = prim; }
};

struct is_prime<0,0> { enum { prim = 1 }; };
struct is_prime<0,1> { enum { prim = 1 }; };

struct Prime_print<2> {
    enum { prim = 1 };
    void f() { D<2> d = prim; }
};

void foo() { Prime_print<20> a; }
```

Todd Veldhuizen uses these advanced template techniques with template metaprograms [Veldhuizen95a] and with expression templates [Veldhuizen95b], which we are going to detail later.

In [Veldhuizen95a] a template metaprogram for bubblesort is described. Instantiated for a constant number of elements, the metaprogram unrolls the loops of bubblesort and creates the decision tree that sorts the elements. The second example is a compile-time function for sine and cosine that can be used to implement a metaprogram for the FFT, resulting in a single unrolled function for a 256-point FFT with all roots evaluated as constants.

In [Veldhuizen95b] templates are used to write code such as:

```
// Integrate a function from 0 to 10
DoublePlaceholder x;
double result = integrate( x / (1.0 + x), 0.0, 10.0);
```

The term `x / (1.0 + x)` is not evaluated immediately; it builds an expression object that `integrate` can then evaluate wherever it needs to.

Both techniques are used in Blitz++, a library for numerical algorithms with vectors and matrices, see `http://oonumerics.org/blitz/`.

Compile-time programming manipulates two kinds of entities:

- types
- constant integral values

Consider an ordinary function on values:

```
int f(T t) { return t.category(); }
int a = f(t); // function (f): associates a value (a) to an object (t)
```

Now observe the similarity with the following function on types:

```
template < typename T >
struct F {
    typedef typename T::Category type;
};
typedef F<T>::type A; // function (F): associates a type (A) to the type (T)
```

Constant integral values can be associated to types and vice versa, at compile time.

```
template < typename T > struct G { enum { value = 3 }; };
template < int i >      struct H { typedef H<i+1> type; };
template < int i >      struct K { enum { value = i+1 }; };

int a = G<T>::value;  // (G): associates an integral value (a) to the type (T)
typedef H<2>::type A; // (H): associates a type (A) to an integral constant value (2)
int b = K<2>::value;  // (K): associates an integral value (b) to an integral constant value (2)
```

Of course, such "meta" functions are not restricted to one argument, nor even to one returned entity.

```
template < int i, typename T, int j, typename U >
struct Z {
    enum { value1 = i+1 };
    enum { value2 = j+i };
    typedef Y<T, U> type;  // Y is some class template defined elsewhere
};
```

Built-in operators (

```
+, -, *, /, %, ?:, ^, !, &, |, &&, ||,
<<, >>, <, >, <=, >=, ==, !=
```

) can also be used to generate constant integral values directly:

```
int a = K<2+3>::value; // (+): associates an integral value (5) to two integral constant values (2 and 3)
```

We can apply composition of meta functions:

```
int a = G< F<T>::type >::value; // Composition of F and G.
```

To do real programming, we need to be able to perform branches. This can be achieved using (possibly partial) specialization.

```
template < int i > struct D    { enum { value = D<i-1>::value }; };
template <>        struct D<0> { enum { value = 0 }; };

// Which is equivalent to the run-time program:
int d(int i) {
    if (i == 0) return 0;
    return d(i-1);
}

// Note that the following works as well in this particular case:
template < int i > struct E { enum { value = i==0 ? 0 : D<i-1>::value }; };
```

The `is_prime` example in the introduction illustrates some more possibilities.
Another tool, which connects compile-time programming to the powerful function overload resolution mechanism of C++, is the `sizeof()` operator:

```
struct LargeStruct { char c[2]; };

char        f(...);
LargeStruct f(double);

template < int s >
struct Is_double { enum { value = (s == sizeof(LargeStruct)) }; };

// x is some expression, e.g. "std::sqrt(2.0)"
int a = Is_double< sizeof(f(x)) >::value;
```

Note that `sizeof()` returns an integral constant value. It also has the property that the expression to which it is applied need not be defined (being declared is enough). Overload resolution is a complex mechanism, and we will see later how it allows us to extract some properties of types automatically.
Consider the following overloaded declarations of a function `f`:

```
void f();
void f(int);
void f(double);
void f(...);
template < typename T > void f(T const&);

int main () {
    float x;
    f(x); // which function f is going to be called ?
}
```

The compiler proceeds in two stages: it first selects all functions that could match the argument (here, every declaration except the nullary one), then it discards those requiring conversions of lower "priority". If this does not leave one unique best match, the ambiguity leads to an error. So conversion rules come into play, and this is a very complicated area of C++. In this case, the template version is chosen: it provides an exact match, whereas the other candidates require conversions of lower priority, the ellipsis version having the lowest priority of all.

There is a particular feature of the first processing stage performed by the compiler, when it tries to substitute the types of the actual arguments of the function call into each function declaration: this process does not produce an error if it fails, it just discards the considered function from the set of possible matching functions. This is called the "Substitution Failure Is Not An Error" (SFINAE) principle. Now let's add the following additional overloaded function to the previous program:

```
template < typename T > struct G {};

template < typename T >
void f(T const&, typename G<T>::type = 0);
```

Given that `G` does not define a nested type `type`, trying to substitute `float` into this overload is going to fail. The SFINAE principle implies that the program is still perfectly valid, and that the first template is chosen as before. Now what happens if we define a nested type `type` in `G`?

```
template < typename T > struct G { typedef int type; };
```

Then the second template matches as well (there is no failure during the substitution process), and we get an ambiguity error with the first template, because both match with equal priority.

Now this opens new horizons, because the class template `G` can be specialized for particular types, and may or may not define a nested type `type`. This allows us to perform a selection among the templates, to constrain them depending on properties of the types. However, the mechanism is not yet entirely satisfactory, because it requires an additional function argument (even though we can give it a default value), and this is impossible for overloaded operators.

The remark that is going to save this idea for operators is that the SFINAE principle applies to the whole function signature, including the return type.

```
template < typename T >
typename G<T>::type f(T const&);
```

Let us now have a look at how we could make use of this in a practical case: the CGAL library. There, we would like to define generic functions for adding geometric vectors:

```
template < typename Vector_2 >
Vector_2 operator+(Vector_2 const& v, Vector_2 const& w)
{
    return Vector_2(v.x() + w.x(), v.y() + w.y());
}
```

Obviously, this doesn't work, because this template is far too general (it matches almost any type), and it would clash with, for example:

```
template < typename Vector_3 >
Vector_3 operator+(Vector_3 const& v, Vector_3 const& w)
{
    return Vector_3(v.x() + w.x(), v.y() + w.y(), v.z() + w.z());
}
```

So, let's consider concrete types provided by the user, to which we would like to apply our generic operator:

```
class My_vector { // 2D vector type
    int _x, _y;
public:
    My_vector(int x, int y) : _x(x), _y(y) {}
    int x() const { return _x; }
    int y() const { return _y; }
};

class My_point { // 2D point type
    int _x, _y;
public:
    My_point(int x, int y) : _x(x), _y(y) {}
    int x() const { return _x; }
    int y() const { return _y; }
};
```

The types `My_vector` and `My_point` have identical interfaces, and we would like our generic operator to apply only to `My_vector`. So the idea is to constrain the function template in such a way that it is considered only for types that are 2D geometric vector types, and that provide the needed requirements (`.x()` and `.y()` member functions, a suitable constructor, adequate semantics...).
This kind of thing is typically a job for a traits class.

```
template < typename T >
struct IsVector_2 {
    enum { value = false }; // By default, a type is not a 2D vector.
};

template <>
struct IsVector_2 <My_vector> {
    enum { value = true };  // But My_vector is a 2D vector.
};
```

So now we can easily constrain the template using another accessory tool:

```
template < typename T, typename, bool = T::value >
struct Enable_if;

template < typename T, typename R >
struct Enable_if <T, R, true> {
    typedef R type;
};

template < typename Vector_2 >
typename Enable_if< IsVector_2<Vector_2>, Vector_2 >::type
operator+(Vector_2 const& v, Vector_2 const& w)
{
    return Vector_2(v.x() + w.x(), v.y() + w.y());
}
```

The following main function illustrates what we finally get:

```
int main() {
    My_vector v(1,2), w(3,4);
    My_vector z = v + w; // OK
    My_point p(1,2), q(3,4);
    My_point r = p + q;  // error: no match for
                         // `My_point& + My_point&' operator
}
```

The whole program can be found here: vector_2.C.

We have just seen an example of a traits class used to express a property of a type. We are now going to see how to write traits classes that automatically extract fundamental properties of types.

Let's start by writing a traits class that determines whether a type is a fundamental type or not. Given that there is a small finite number of such types, it is easy to enumerate them:

```
template < typename > struct IsFundamental   { enum { value = false }; };
template <> struct IsFundamental<int>        { enum { value = true }; };
template <> struct IsFundamental<short>      { enum { value = true }; };
template <> struct IsFundamental<double>     { enum { value = true }; };
// similarly for bool, char, long, long double and the unsigned versions.
```

Now let's try to classify compound types, that is, types constructed from other types: plain pointers, references, arrays... We can use partial specialization to identify some of these categories. In this case, it is also useful to extract which type(s) the compound is made of (e.g. the element type of an array).

```
// Primary template
template < typename T >
struct CompoundType {
    enum { IsPointer = false, IsReference = false, IsArray = false,
           IsFunction = false, IsPointerToMember = false };
    typedef T base_type;
};

// Partial specialization for references
template < typename T >
struct CompoundType<T&> {
    enum { IsPointer = false, IsReference = true, IsArray = false,
           IsFunction = false, IsPointerToMember = false };
    typedef T base_type;
};

// Partial specialization for pointers
template < typename T >
struct CompoundType<T*> {
    enum { IsPointer = true, IsReference = false, IsArray = false,
           IsFunction = false, IsPointerToMember = false };
    typedef T base_type;
};

// Partial specialization for arrays
template < typename T, size_t N >
struct CompoundType<T[N]> {
    enum { IsPointer = false, IsReference = false, IsArray = true,
           IsFunction = false, IsPointerToMember = false };
    typedef T base_type;
};

// Partial specialization for arrays of unknown bound
template < typename T >
struct CompoundType<T[]> {
    enum { IsPointer = false, IsReference = false, IsArray = true,
           IsFunction = false, IsPointerToMember = false };
    typedef T base_type;
};

// Partial specialization for pointers to members
template < typename T, typename C >
struct CompoundType<T C::*> {
    enum { IsPointer = false, IsReference = false, IsArray = false,
           IsFunction = false, IsPointerToMember = true };
    typedef T base_type;
};
```

Unfortunately, function types cannot be recognized that easily. We are therefore going to use the SFINAE principle, after making the following observation: the only types that cannot be gathered in an array are function types, void and references. We exploit this fact by trying to form an array of the given type; if that fails, the type must be one of these special ones. "Trying", here, is a synonym for using the SFINAE principle.

```
template < typename T >
class IsFunction {
    struct LargeStruct { char c[2]; };
    template < typename U > static char        test(...);
    template < typename U > static LargeStruct test(U (*)[1]);
public:
    enum { value = sizeof( test<T>(0) ) == 1 };
};

// References, void and const void cannot form arrays either,
// so they must be excluded explicitly:
template < typename T > struct IsFunction<T&> { enum { value = false }; };
template <> struct IsFunction<void>           { enum { value = false }; };
template <> struct IsFunction<void const>     { enum { value = false }; };
```

How can we determine enumerated types? What features of these types could we exploit to detect them? We know that they are convertible to integral types. But we need something more, because a user-defined class could also define a conversion to an integral type.

We are not going to describe the full code here, but the curious reader can find the solution in [Vandevoorde03]. Hint: two user-defined conversions cannot be applied automatically in sequence, whereas the promotion from enums to integral types does not have this restriction.

To conclude: it is possible to write traits classes that determine almost all basic properties of types. Libraries such as Boost provide such mechanisms in an extensive way, and the next revision of the standard library may contain such a `type_traits` mechanism. However, there are properties that cannot be determined within the current language: it is possible to determine whether a class defines a static data member of a given name using SFINAE (static_member_detection.C), but it is not possible to extract the list of all data members of a class. Such a possibility would require changes in the core language. The following section illustrates what could be done with such a feature.

The following is a correct program in the functional programming language PolyP (see `http://www.cs.chalmers.se/~patrikj/poly/polyp/`):

```
module Main where
import Flatten(flatten)

data Foo a = Bar | Cee a | Doo (Foo a) a (Foo a)

x = Doo (Doo (Cee 'P') 'a' (Doo Bar 't' Bar)) 'r'
        (Doo Bar 'i' (Doo Bar 'k' Bar))

main = print (flatten "Patrik", flatten x)
-- expected output: ("Patrik", "Patrik")
```

The trick is the implementation of `flatten` without knowing the structure of `Foo`. Instead, PolyP allows us to write functions over the structure of how types can be defined in the language in general. There is only a small set of possibilities for how types can be defined in PolyP, for example, alternatives of types or Cartesian products of types.

Can we write such a `flatten` function in C++? No, we cannot, but almost. The missing information is what could be called self-inspection of types. We need meta information about how a type is constructed. For example, for a struct we need to know its member types.

Let us assume we had such information, and for the running example we provide it by hand (as a clever annotation to the type definition). Then we could write the following program in C++ (see flatten.C for the full program):

```
// ----------------------------------- USER DATA TYPES
struct B;
struct A { int i; A* a; B* b; };
struct B { A* a; };

// ----------------------------------- EXAMPLE MAIN PROGRAM
int main() {
    A a1; a1.i = 1;
    A a2; a2.i = 2;
    A a3; a3.i = 3;
    B b;
    a1.a = &a2;  a2.a = &a3;  a3.a = 0;
    a1.b = &b;   a2.b = 0;    a3.b = 0;
    b.a = &a2;
    list<int> ls;
    flatten(a1, ls);
    copy( ls.begin(), ls.end(), ostream_iterator<int>(cout, " "));
    cout << endl;
}
```

The program creates an acyclic graph rooted at a1. Nodes of type A can point to nodes of type A and to nodes of type B. They also contain an integer value, the value we want to collect during the flattening traversal.

The result of the `flatten` call is stored in the list `ls`. The output of the program is the sequence [1, 2, 3, 2, 3].

We have to add some annotations to our type definitions of A and B to make this example work. For the `flatten` function, we can get away with a rather dense notation that gives us, for each structure, the types of the members it contains (including how many), and a way to access the different members. We define a class template with two arguments. The first argument is the type we want to annotate. The second argument is the index (1, 2, ...) of the member we are describing. The general class template has an operator() that just returns an object of type UNKNOWN. For each member in each type we write a specialization of the class template. The specialization has an operator() that, given an object of that type, returns the member with the given index.

```
// ----------------------------------- ANNOTATE TYPES FOR SELF-INSPECTION
struct UNKNOWN {};

template <class X, int i>
struct T { UNKNOWN operator()( X&) { return UNKNOWN();} };

template <> struct T<A,1> { int operator()(A& a) { return a.i; } };
template <> struct T<A,2> { A*  operator()(A& a) { return a.a; } };
template <> struct T<A,3> { B*  operator()(A& a) { return a.b; } };
template <> struct T<B,1> { A*  operator()(B& b) { return b.a; } };
```

The `flatten` function itself works in two steps. First, we distinguish between values and pointers; a non-null pointer is dereferenced and its pointee flattened:

```
// ----------------------------------- VALUE / POINTER
template <class X, class I>
struct flatten_class {
    void operator()( X x, list<I>& ls) {
        flatten_items( T<X,1>()(x), T<X,1>(), x, ls);
    }
};

template <class P, class I>
struct flatten_class<P*,I> {
    void operator()( P* x, list<I>& ls) {
        if ( x != NULL)
            flatten_class<P,I>()(*x, ls);
    }
};

template <class X, class I>
void flatten( X x, list<I>& ls) {
    flatten_class<X,I>()( x, ls);
}
```

Second, we distinguish between structs and simple types. For structs, we iterate through all members of the struct and call `flatten` recursively on each of them:

```
// ----------------------------------- LEAF / STRUCT WITH SUBITEMS
template <class Item, class X, int n, class I>
struct flatten_items_class {    // flatten all subitems
    void operator()( Item i, X x, list<I>& ls) {
        flatten( i, ls);
        flatten_items( T<X,n+1>()(x), T<X,n+1>(), x, ls);
    }
};

template <class Item, int n, class I>
struct flatten_items_class<Item,I,n,I> {    // item of type I found
    void operator()( Item, I i, list<I>& ls) { ls.push_back(i); }
};

template <int n, class I>                   // item of type I found
struct flatten_items_class<UNKNOWN,I,n,I> {
    void operator()( UNKNOWN, I i, list<I>& ls) { ls.push_back(i); }
};

template <class X, int n, class I>          // no (further) subitems
struct flatten_items_class<UNKNOWN,X,n,I> {
    void operator()( UNKNOWN, X x, list<I>& ls) {}
};

template <class Item, class X, int n, class I>
void flatten_items( Item i, T<X,n>, X x, list<I>& ls) {
    flatten_items_class<Item,X,n,I>()(i,x,ls);
}
```

This concludes our presentation of flatten.C. The purpose of this example was to stretch the imagination about what would be possible in C++ (and is possible in other languages), and to show how ugly and unmaintainable it looks in C++.

Expression templates have several uses, but basically the advantages of this technique are:

- they allow us to write complex types in a natural manner. These types represent expressions.
- they can be applied to implement problem-specific optimizations automatically, rephrasing some expressions at compile time from a high-level description, and thus reaching a performance that otherwise only hand-coded low-level code could reach.

Consider a simple array class for doubles, together with an addition operator:

```
class SArray {
    double* _a;
    size_t  _s;
public:
    SArray (size_t n) : _a(new double[n]), _s(n) {}
    SArray (SArray const& orig)           // deep copy, since we own _a
      : _a(new double[orig._s]), _s(orig._s) {
        for (size_t i = 0; i < _s; ++i) _a[i] = orig._a[i];
    }
    double const& operator[](size_t i) const { return _a[i]; }
    double &      operator[](size_t i)       { return _a[i]; }
    size_t size() const { return _s; }
    ~SArray() { delete[] _a; }
};

SArray operator+(SArray const& a, SArray const& b)
{
    assert(a.size() == b.size());
    SArray tmp(a.size());
    for (size_t i = 0; i < a.size(); ++i)
        tmp[i] = a[i] + b[i];
    return tmp;
}
```

Now, if we consider the typical use below, we can see that there is an efficiency problem with the last line: a temporary array is created in order to store the result of `x + y`. This does not increase the number of additions to be performed, but it increases the amount of memory required by the program, which makes it slower due to cache effects.
```
SArray x, y, z, t;
... // do something useful with x, y, z
t = x + y + z;
```

We can solve the problem by using a dedicated function that adds three arrays in one shot:

```
SArray add_3(SArray const& a, SArray const& b, SArray const& c)
{
    assert(a.size() == b.size() && a.size() == c.size());
    SArray tmp(a.size());
    for (size_t i = 0; i < a.size(); ++i)
        tmp[i] = a[i] + b[i] + c[i];
    return tmp;
}
...
t = add_3(x, y, z);
```

This is inconvenient because the notation is cumbersome, and for any expression based on arrays we would like to benefit from the optimization without having to write a new function for, e.g., `x+y*z-z*x`, or whatever the user of our array class wants to compute. This is where expression templates come into play. The idea is to postpone the actual evaluation of the expression until the assignment operator is seen. This is done by creating an object that encodes the expression in the form of a tree:

Each node in this tree has a particular type representing the kind of operation (e.g. `+`) that it performs, together with references to its operands.

```
template < typename OP1, typename OP2 >
class A_Add {
    OP1 const& op1; // first operand
    OP2 const& op2; // second operand
public:
    A_Add (OP1 const& a, OP2 const& b) : op1(a), op2(b) {}

    // What it contributes to the final computation
    double operator[] (size_t i) const { return op1[i] + op2[i]; }

    // size of the represented result
    size_t size() const { return op1.size(); }
};
```

Notice now how `operator+` is going to create a node of the expression tree, performing no computation on the arrays at all. We first need to write a wrapper around the concrete array `SArray`; its assignment operator is what triggers the recursive evaluation of the tree.

```
template < typename Rep = SArray >
class Array {
    Rep expr_rep;
public:
    // initialization with a size
    explicit Array(size_t s) : expr_rep(s) {}

    // create an array from a possible representation
    Array (Rep const& rb) : expr_rep(rb) {}

    // Assignment of arrays of the same type
    Array& operator=(Array const& b) {
        assert(size() == b.size());
        for (size_t i = 0; i < b.size(); ++i)
            expr_rep[i] = b[i];
        return *this;
    }

    // Assignment of arrays of a different type
    template < typename Rep2 >
    Array& operator=(Array<Rep2> const& b) {
        assert(size() == b.size());
        for (size_t i = 0; i < b.size(); ++i)
            expr_rep[i] = b[i];
        return *this;
    }

    size_t size() const { return expr_rep.size(); }

    double operator[] (size_t i) const {
        assert(i < size());
        return expr_rep[i];
    }

    Rep const& rep() const { return expr_rep; }
    Rep &      rep()       { return expr_rep; }
};

template < typename R1, typename R2 >
Array<A_Add<R1, R2> >
operator+(Array<R1> const& a, Array<R2> const& b)
{
    return Array<A_Add<R1, R2> > (A_Add<R1, R2>(a.rep(), b.rep()));
}
```

Similarly for the other operations:

```
template < typename OP1, typename OP2 >
class A_Mul {
    OP1 const& op1; // first operand
    OP2 const& op2; // second operand
public:
    A_Mul (OP1 const& a, OP2 const& b) : op1(a), op2(b) {}
    double operator[] (size_t i) const { return op1[i] * op2[i]; }
    size_t size() const { return op1.size(); }
};

template < typename R1, typename R2 >
Array<A_Mul<R1, R2> >
operator*(Array<R1> const& a, Array<R2> const& b)
{
    return Array<A_Mul<R1, R2> > (A_Mul<R1, R2>(a.rep(), b.rep()));
}
```

With this implementation, we have achieved what we promised. Note that a complete implementation would also have to take care of some corner cases, like the proper handling of local variables (by copying them into the tree nodes instead of referencing them), and paying attention to cases where the variable being assigned to also appears among the arguments (e.g. `x = y + x`)...

Caveats: for the optimizations to apply, the compiler itself needs to perform two particular kinds of optimization: inlining all the small functions that locally create a tree on the stack, and splitting the tree node structures into their small components. The latter, for example, is not performed by `g++` at the moment (version 3.3). As with some other template techniques, this one also has the potential to increase compilation times considerably, depending on the program.

Another caveat of expression templates is their bad interaction with generic libraries. Consider:

```
template < typename T >
T square(T const& t) { return t*t; }

Array<> x, y, z;
...
z = square(x);    // OK
z = square(x+y);  // error
```

The problem here is that `x+y` does not have the "base" type `Array<>`; it has the type of a tree node. Inside the `square` function, the return value is therefore going to be that particular type, but there is no possible conversion between two different "internal" tree node types. The only possible conversion is from an internal tree node to the "base" type `Array<>`. Therefore the argument has to be explicitly converted before the call to the `square` function:

```
...
z = square(Array<>(x+y));    // OK
z = square<Array<> >(x+y);   // OK
```

The other possibility is to overload the `square` function as shown below. The advantage is that the call sites do not require any change, but the disadvantage is that the overload has to know about the `R` parameter of `Array`, which otherwise never shows up in the usage of this class.
```
template < typename R >
Array<> square(Array<R> const& t) { return t*t; }

// or, taking further advantage of the "optimization":
template < typename R >
Array<A_Mul<R, R> > square(Array<R> const& t) { return t*t; }
```

There are other useful applications of expression templates, for example in the Boost Lambda library, which allows us to create functors "on the fly", starting from placeholder variables `_1, _2...` provided by the library. This way you can write functors using a natural syntax like:

```
std::vector<double*> V;
...
std::sort(V.begin(), V.end(), *_1 > *_2);
```

Another application can be found in the GMP library (GNU Multi Precision), which provides C++ types for computing with multi-precision numbers (integers, rationals...). There, expression templates are used to avoid creating temporaries. We can also cite PETE (Portable Expression Template Engine), a tool for creating expression templates easily.

Lutz Kettner
(*<surname>*@mpi-sb.mpg.de).
Last modified on Tuesday, 29-Jul-2003 12:26:25 MEST.