Reliable Software Logo
 Home  >  C++ Resources  > Resource Management   >  Auto Vector

auto_vector – Container That Owns Heap Objects

Bartosz Milewski

Bartosz is the author of C++ In Action, Industrial Strength Programming Techniques. During his years at Microsoft he worked on several projects involving various aspects of operating systems. He ported the Mach OS virtual memory system to the x86 platform, took part in the development of the HPFS and OFS file systems, was the architect and development lead of the Windows 2000 content index. He is now the president of Reliable Software, www.ReliSoft.com, a distributed software company.

auto_ptr

Some programmers think auto_ptrs are at best marginally useful, at worst totally evil. Others think auto_ptrs are the best thing since sliced bread. Personally I subscribe to the latter school--auto_ptrs are the cornerstones of resource management. Any naked (not encapsulated) pointer that stores the address of a heap-allocated object is a potential memory leak. It is extremely difficult to guarantee the deallocation of such an object for every possible path of execution, especially those created by exceptions.

The creators of the C++ Standard Library went to great lengths to prove that their code was exception safe. Application programmers can't afford to go into this kind of analysis for every memory allocation and every pointer assignment. The only reliable option is to use auto_ptrs and let the built-in language mechanisms, in particular stack unwinding, take care of deallocations. The main idea of resource management is that each call to operator new must immediately pass the result to an auto_ptr, be it a stack-based one or a data member of some object. Transfers of such resources are done by passing or assigning auto_ptrs.

Unfortunately the creators of the Standard Library took care of only half of the job. They provided us with an auto_ptr that works like a container that can store (and own--be responsible for the deallocation) only one heap object. Not only are there no containers that could own multiple heap objects, but it is impossible to instantiate a container of auto_ptrs using a fully standard-compliant compiler. That is because all standard containers (and the algorithms that work on them) assume copy semantics for stored objects. When you access an element of a standard container, you get a copy of the object. But copying an auto_ptr causes the resource to be transferred. The caller who wanted to access a container of auto_ptrs would end up getting the resource and removing it from the container. It's even worse with standard algorithms, which think nothing of making temporary duplicates of objects, wreaking havoc with auto_ptrs.

You could try designing and implementing your own container of auto_ptrs, making sure that by accessing it you don't destroy its contents . Overloading such a container's operator[] to return a pointer rather than an auto_ptr will do the trick. The question is: how to make such a container work with standard algorithms? Since this problem hadn't been solved at the time, the creators of the Standard Library were reluctant to implement "auto containers." In this article I will propose an implementation of a container that owns heap objects and whose iterators work with standard algorithms. The addition of such a container to the future C++ Standard Library would allow it to fully support resource management of heap objects.

Vector of Pointers

Let's first look at the behavior of a legitimate standard container, since that's what we will try to imitate. We'll focus our attention on the most popular container--the standard vector. A vector can store objects by value or by reference. It's the second option that we're interested in. When I say "by reference" I don't mean C++ references--you can't instantiate a vector of references, because they cannot be copied or assigned. You can however instantiate a standard vector of pointers:

std::vector<Item *> itemPtrs;

Pointers provide reference semantics. They are also assignable, which is the requirement for elements of any standard container.

How does a vector of pointers differ from a typical vector of values? Its modifying operations, like insert or push_back, accept pointers instead of values. The dereferencing, for instance by operator[], returns a pointer (in fact an lvalue that can be modified):

itemPtrs [i] = itemPtr;

Its iterators, when dereferenced, yield pointers:
std::vector<Item *>::iterator it = itemPtrs.begin ();
Item * itemPtr = *it;
(*it)->ItemMethod ();

Most importantly, the iterators of a vector of pointers can work with standard algorithms. For instant, you could in principle do this:

std::sort (itemPtrs.begin (), itemPtrs.end ());

However, the result of such sorting wouldn't probably be what you'd expect. The compiler knows how to compare pointers--it compares the addresses. Sorting pointers by addresses rarely makes sense. What you need is the comparison of objects to which the pointers refer. That can be accomplished by providing the sort algorithm with the appropriate predicate.

For simplicity, let's assume that class Item has the operator< (less than) defined. Listing 1 shows the example of a pointer predicate. This predicate could be used as follows:

std::sort (itemPtrs.begin (), itemPtrs.end (), ItemPtrLess ());

Listing 1

struct ItemPtrLess
{
    bool operator () (Item const * a, Item const * b)
    {
        return *a < *b;
    }
};

In fact, for objects that define operator less-than, we can create a template predicate (see Listing 2), to be used like this:

std::sort (itemPtrs.begin (), itemPtrs.end (), ptr_less<Item> ());

Listing 2

template<class T>
struct ptr_less
{
    bool operator () (T * a, T * b)
    {
        return *a < *b;
    }
};

Although a vector of pointers is perfectly acceptable in the framework of the Standard Library, its use is seriously limited by one shortcoming--it doesn't take care of managing the lifetimes of the objects it stores. In particular, its destructor doesn't destroy them. To correctly store and manage heap-allocated objects we need to generalize the standard vector of pointers. I will call this generalization auto_vector.

auto_vector

The auto_vector template uses the vector of pointers as its workhorse. Several auto_vector methods, such as size, capacity, reserve, begin, end, rbegin, rend, and so on, call the corresponding methods of the vector of pointers. The iterators are typedef'd to the corresponding vector-of-pointers iterators. Listing 3 shows the first sketch of the implementation:


Listing 3

template <class T> 
class auto_vector
{
public:
    typedef typename std::vector<T*>::iterator iterator;
    ~auto_vector ()
    {
        clear ();
    }
    void clear ()
    size_t size () const { return _ptrs.size (); }
    iterator begin () { return _ptrs.begin (); }
    iterator end () { return _ptrs.end (); }
private:
    std::vector<T*>  _ptrs;
};

The first thing worth noticing is that, from the point of view of standard algorithms, auto_vector behaves just like a vector of pointers. For instance, if you want to sort it, you can apply the std::sort algorithm to its iterators, with a predicate that accepts pointers.

We already know that auto_vector's destructor must do some additional cleanup. It does it by calling clear, which takes care of deleting all the pointers.

template <class T>
void auto_vector<T>::clear ()
{
    std::for_each (_ptrs.begin (), _ptrs.end (), delete_ptr<T> ());
    _ptrs.clear ();
}

The functor delete_ptr is another useful template:

template <class T>
struct delete_ptr
{
    void operator () (T * p)
    {
        delete p;
    }
};

Next on our agenda is the problem of passing pointers to and from auto_vector. As I have explained in the beginning of this article, one should not use naked pointers when ownership is at issue. We will assume therefore that heap objects to be stored in auto_vector are always encapsulated using auto_ptrs, before being transferred to auto_vector. Conversely, auto_vector will always release the ownership of such objects by passing them to the caller in auto_ptrs.

For example, the push_back method takes an auto_ptr, pushes its internal pointer onto the vector, and relieves the auto_ptr from its ownership.

template <class T>
void auto_vector<T>::push_back (std::auto_ptr<T> ptr)
{
    _ptrs.push_back (ptr.get ());
    ptr. release ();
}

Notice that the argument to push_back is not a reference. Passing auto_ptrs by reference is confusing to the caller--it's never obvious whether the callee will take over the ownership or not.

There is also a side benefit to passing auto_ptr by value--polymorphism of pointers can be translated naturally into polymorphism of auto_ptrs. For instance, if you have a container of pointers to objects of class Base, (auto_vector<Base>) you'll be able to pass it an auto_ptr containing a pointer to a derived object, (auto_ptr<Derived>). You wouldn't be able to do it with (non-const) references.

The pop_back method requires some more consideration. A standard vector's pop_back doesn't return any value, instead the client is supposed to first call top to get the object, and then discard the top element by calling pop_back. This scheme would not work very well with auto_vector. If the client were to use the top method of auto_vector to transfer the ownership of an object, and then forgot to immediately call pop_back to remove it form the container, auto_vector would end up with a null pointer at its top. It is much better to implement auto_vector::pop_back to return auto_ptr (see Listing 4).


Listing 4

template <class T>
std::auto_ptr<T> auto_vector<T>::pop_back () 
{
    T * p = _ptrs.back ();
    _ptrs.pop_back ();
    return std::auto_ptr<T> (p);
}

Notice how the use of auto_ptrs and auto_vectors allows a program to maintain total control over the ownership of heap objects from conception to destruction. An object is created and immediately passed to an auto_ptr in a single statement:

auto_ptr<Item> item (new Item);

In this form, it can be passed to and from functions, and it can be stored in an auto_vector:

auto_vector<Item> items;
items.push_back (item);

The object can be safely retrieved, again using auto_ptr, at a later time:

auto_ptr<Item> item = items.pop_back ();

If not, the object is automatically destroyed during the destruction of auto_vector. When this kind of programming methodology is strictly followed (and it is very easy to follow), you will never have to deal with memory leaks or double deletions. What's more, you will get all this without the overhead of garbage collection.

Random Access

So far we've been treating auto_vector like a stack. One of the major features of any vector, however, is the ability to provide constant-time access to its elements through indexing. Since auto_vector is a generalization of a vector of pointers, we expect the application of operator[] to return a pointer:
template<class T>
T * auto_vector<T>::operator [] (size_t i) const 
{
    return _ptrs [i];
}

This definition will only work correctly for the const version of operator[]. But the result of indexing a non-const vector can be used on the left side of the assignment operator, allowing direct modification of an element of the vector. In the case of auto_vector, such use would result in the overwriting of a pointer stored in it. The object to which this pointer was pointing would leak out. Moreover, the ownership of the pointer that replaced it would be transferred to auto_vector (it would be deleted in the destructor of auto_vector). This might be all right in a fragment like this:

auto_vector<Item> items;
items.resize (1)
items [0] = new Item;

but it could also lead to trouble (double deletion) in a slightly different context:
auto_ptr<Item> itemPtr (new Item);
items [0] = itemPtr.get (); // bad!

What we would really like to have is operator[] that, when used as an l-value, accepts only auto_ptr, takes the ownership of the pointer away from it, and deletes the previous pointer stored under the same index. You would then use it, for instance, like this:

items [i] = auto_ptr<Item> (new Item);

The trick to implementing this kind of behavior is for operator[] to return a special object of the class auto_lvalue.

    auto_lvalue operator [] (size_t i) 
    { 
        return auto_lvalue (_ptrs [i]); 
    }

Class auto_lvalue (listing 5) defines its own assignment operator that accepts auto_ptr and deletes the original pointer. It overloads the standard dereference operators, so it can also be used as an r-value and behave like a regular pointer.

There is no way to mistakenly put a naked pointer directly into auto_vector.


Listing 5

class auto_lvalue
{
public:
    auto_lvalue (T * & p) : _p (p) {}
    operator T * () const { return _p; }
    T * operator-> () const { return _p; }
    auto_lvalue & operator= (std::auto_ptr<T> ap)
    {
        if (ap.get () != _p)
        {
            delete _p;
            _p = ap.release ();
        }
        return *this;
    }
private:
    T * & _p;
};

Exception Safety

The question of exception safety of auto_vector has to be approached from a slightly different perspective then that of regular containers. Normally the requirement is that any container operation either succeeds or leaves the container and its arguments unchanged. In the case of auto_vector we cannot guarantee that the arguments will not change (in fact we can guarantee that they will). I will argue that this is indeed the desired behavior.

Let's look again at the auto_vector's push_back method.

void push_back (std::auto_ptr<T> ptr)
{
    _ptrs.push_back (ptr.get ());
    ptr. release ();
}

Suppose that the internal push_back throws an exception--for instance when reallocating the vector's internal buffer. The standard library guarantees that the state of _ptrs will not change (we are talking about a standard vector of pointers). The stack unwinding mechanism will take care of the deleting of the object stored in the auto_ptr argument. This is necessary, since nobody will be able to access this object any more--its ownership has been passed to push_back during argument processing. And here's where auto_vector breaks the traditional exception safety contract. The object passed to push_back in the auto_ptr will, after the call, no longer be accessible through that auto_ptr. It will either be added to auto_vector or deleted.

But what else can we expect to happen when push_back fails? The whole idea of using auto_ptr for ownership transfers was to make sure that the object in it would be destroyed if an exception happens. You don't stick an object inside an auto_ptr if you are planning on rescuing it from the aftermath of an exception. Therefore this kind of behavior is preferable.

In general, the only sensible contract between the caller and the callee, when passing auto_ptr, is the following:

After passing auto_ptr to a function, the caller should assume that the auto_ptr no longer contains an object

Conclusion

The idea of auto_vector can be easily generalized to other types of containers. Special care has to be taken when adding, removing, or substituting elements of such a container. The interface should be designed in such a way that clients would have to go out of their way to mess up resource management (create a leak or incur double deletion). Having the appropriate methods accept or return auto_ptrs will do the trick.

Consistent use of auto_ptrs, auto_vectors, and other auto-containers results in garbage-collected code without garbage collection. We've been using this methodology at Reliable Software for many years with great success.

Bibliography

  1. Bartosz Milewski, Resource Management in C++, Journal of Object Oriented Programming, March/April 1997, Vol. 10, No 1. p. 14-22
  2. Bartosz Milewski, Strong Pointers and Resource Management in C++, C++ Report, September 1998 and February 1999 (two parts)
  3. Bjarne Stroustrup, C++ Programming Language, Third Edition, Addison-Wesley (1997). Examples of resource management could be found in the Bjarne Stroustrup's famous book since its first edition. The latest one he has a section on resource management.
  4. Bartosz Milewski, C++ In Action, Industrial-Strength Programming Techniques, Addison-Wesley, 2001