Inside the Qt 4 Containers

Материал из Wiki.crossplatform.ru

Версия от 08:14, 10 декабря 2008; Root (Обсуждение | вклад)
(разн.) ← Предыдущая | Текущая версия (разн.) | Следующая → (разн.)
Перейти к: навигация, поиск

__NOTOC__

by Jasmin Blanchette

Qt 4 provides a new set of container classesthat are available to Qt applications in addition to theSTL containers. In this article, we will see how they are representedin memory. Knowing how containers work internally makes it easier tochoose the right container for the right circumstance. Just be awarethat some of the classes mentioned here are internal and should notbe used directly.

Содержание

Sequential Containers

QVector<T>

QVector<T> stores its items consecutively in memory. With thisapproach, only one block of memory is allocated, reducing the timeand memory overhead associated with dynamic memory allocation.

center

Like nearly all of Qt's containers and most other value classes(including QString), QVector<T> is implicitly shared,meaning that when we take a copy of a vector, only a pointer (d) iscopied and a reference count (ref) is incremented. A deep copyoccurs only when we try to modify the data. For this reason, implicitsharing is sometimes referred to as "copy on write". Implicitsharing is the main difference between QVector<T> and STL'sstd::vector<T> class.

The QBasicAtomic class, used to store thereference count, is an integer type that supports atomic (thread-safe)increment and decrement operations, ensuring that implicit sharing works acrossthreads. QBasicAtomic is internal andimplemented in assembly language on the different architectures supported by Qt.

The diagram below depicts the situation in memory when we take twoextra copies of an existing QVector<T> (using the QVector<T> copy constructor or assignmentoperator):

center

To avoid reallocating the data every time the vector grows, QVector<T> allocates more memorythan necessary. While size is the number of elements in the container,alloc is the number of elements that were actually allocated. QVector's growth strategy depends on thetype T:

  • If T is a movable type (a data type that can safely bemoved around in memory using memcpy() or memmove(), such asthe C++ primitive types and Qt's implicitly shared classes), QVector<T> uses realloc()to grow by increments of 4096 bytes. This makes sense because modern operatingsystems don't copy the entire data when reallocating a buffer; the physicalmemory pages are simply reordered, and only the data on the first and lastpages need to be copied.
  • For non-movable data types, QVector<T> grows by 50 per centincrements, ensuring that repeatedly calling append() results inamortized O(n) behavior rather than O(nІ).

The distinction between movable and non-movable types is also used tospeed up insertions in the middle of the vector. When such aninsertion takes place, the elements that come after the point ofinsertion must be moved one position further. If T is a movabletype, this is achieved using memmove(); otherwise, QVector<T>needs to move the items one by one using operator=(). The sameapplies to removals in the middle.

If you use QVector<T> with a customtype T, Qt will assume that the type is non-movable. To declare itmovable, use theQ_DECLARE_TYPEINFO()macro after your custom class's definition:

class MyClass
{
    ...
private:
    MyClassPrivateData *data;
};
 
Q_DECLARE_TYPEINFO(MyClass, Q_MOVABLE_TYPE);

The type information provided by Q_DECLARE_TYPEINFO() is used byother containers as well, notably QList<T>. If your data type isa POD type (a "plain old data" type) that has no constructor ordestructor and that can be copied and moved in memory usingmemcpy() or memmove(), you can make a stronger statementabout it and specify "primitive" rather than "movable":

class MyClass
{
    ...
private:
    int x;
    int y;
};
 
Q_DECLARE_TYPEINFO(MyClass, Q_PRIMITIVE_TYPE);

If T is a primitive type, QVector<T> doesn't invoke theconstructor and the destructor. In addition, all the optimizationsthat apply to movable types also apply to primitive types.

QList<T>

Unlike its predecessors QPtrList<T> and QValueList<T>, QList<T> is implemented as an "arraylist" and not as a linked list. This is consistent with what modern languagessuch as Java and C# offer.

Depending on the type T, QList<T> has two representations. Inthe general case, QList<T> has anarray of pointers to its items, which are allocated on the heap usingnew:

center

This organization makes insertions and removals in the middle fasterthan what QVector<T> can offer fornon-movable types (because pointers can be moved around usingmemmove()), at the cost of some overhead.

Like QVector<T>, QList<T> preallocates memory at theend of its buffer (up to 4096 bytes), but in addition it also reserves somespace at the beginning; the occupied entries arearray[begin], array[begin + 1], ...,array[end- 1]. Reserving space at the beginning has two main benefits:

  • A common list operation such as prepending an item or removingthe first item usually takes constant time, because the buffer can growbackwards. This is the reason why QQueue<T>is based on QList<T>, whereas QStack<T> is based on QVector<T>.
  • In the general case of an insertion at position k, QList<T> can choose between moving thefirst k items by one position to the left and moving the lastn - k items by one position to the right. If k is 10 andn is 2000, it makes more sense to move 10 items to the left than 1990to the right.

Exceptionally, in the special case where type T has the size of apointer and can be moved in memory using memcpy() ormemmove(), QList<T> storesthe elements directly in its buffer:

center

This optimization covers several very common types, includingvirtually all pointers types, most of Qt's implicitly shared classes(including QString), as well as int on32-bit platforms.

QLinkedList<T>

QLinkedList<T> provides a doubly linked list. If you have a verylarge dataset and need consistent O(1) insertions and removals, QLinkedList<T> is the right container for you.

center

Be aware that although insertions are guaranteed to occur in constanttime, QLinkedList<T> starts paying off only on lists ofapproximately 200 items or more. For lists smaller than that, QList<T> offers better performance and requires less memory than QLinkedList<T>.

Now that we understand how Qt's main sequential containers arerepresented in memory, we can reread the guidelines that appear inQt's reference documentation and check that they square with reality:

  • For most purposes, QList is the right class to use. Itsindex-based API is more convenient than QLinkedList'siterator-based API, and it is usually faster thanQVector because of the way it stores its items inmemory. It also expands to less code in your executable.
  • If you need a real linked list, with guarantees of constanttime insertions in the middle of the list and iterators toitems rather than indexes, use QLinkedList.
  • If you want the items to occupy adjacent memory positions,use QVector.

It turns out the guidelines are sensible, although they don't mentionthat for pointer types and implicitly shared types like QString, the main difference between QList and QVector is that QList provides faster insertions and removals in thefirst half of the container.

QVarLengthArray<T, Prealloc>

Unlike the other sequential containers in Qt, QVarLengthArray<T, Prealloc>isn't implicitly shared. It has memory for a certain number of elements,specified by the Prealloc template parameter. If Preallocisn't specified, it defaults to 256.

center

If size is larger than Prealloc, QVarLengthArray<T, Prealloc>allocates an extra buffer using malloc() and ignores the preallocatedbuffer:

center

QVarLengthArray<T, Prealloc>is ideal for tight loops, because in the common case wheresize <= Prealloc, it doesn't performany dynamic memory allocation. Plain C++ arrays cannot be used in that case,because C++ requires that the array size is known at compile-time.

Associative Containers

QMap<Key, T>

QMap<Key, T> stores a list of(key, value) pairs ordered by key and provides lookups inO(log n).Internally, QMap<Key, T> uses askip-list, a data structure that offers similar performance and the samealgorithmic complexity as the more common red-black tree. Earlier versions of QMap used a red-black tree, but skip-lists result inless code in the executable and require less memory per node.

center

Skip-lists are a generalization of linked lists, where each node hasone, two, three, or more forward pointers; forward[0] points tothe next node, forward[1] (if present) points to the next nodethat has a forward[1] field, and so on. The backward pointerin each node is used to iterate backwards.

To lookup a key, QMap starts by advancing as far aspossible using the chain of forward[11] pointers, then it continueswith forward[10], forward[9], ..., forward[0].Since items are sorted by key, by comparing the current node's key with the keyto find, QMap can tell whether it has gone too far ornot.

For example, to find item G in the skip-list depicted below, wefollow QMapData's forward[2] link toD. Continuing with forward[2] would lead us to H, whichis too far; therefore we follow forward[1], which brings us toF. From there, forward[1] leads to H, which is too far,so we follow forward[0] and reach G.

center

In an ideal world, skip-lists would always be perfectly balanced asshown above, in a nice 1-2-1-3-1-2-1-4-1-2-1-3... pattern that isreminiscent of the notches on a ruler. In practice, thisrarely happens, because items can be inserted in any order. Butthere are circumstances where this utopian scenario occurs: when weread a QMap from a QDataStream, and when a deep copy of a QMap is taken.

QHash<Key, T>

Hash tables usually provide faster lookups than map. For that reason,Qt offers QHash as an (unsorted) alternative to QMap. The two classes have almost identical APIs, butthe requirements on the Key type are different:

  • For QMap, the Key type must supportoperator<(), which specifies a total order (i.e., if neitherx < y nory < x, then x == y).
  • For QHash, the Key type must supportoperator==() as well as qHash(), a global hash function.

Qt provides implementations of qHash() for C++ primitive numerictypes, pointer types, QChar, QByteArray, and QString. For example, here is the qHash()overload that handles QStrings:

uint qHash(const QString &amp;key)
{
    const QChar *ptr = key.unicode();
    int n = key.size();
    uint h = 0;
    uint g;
 
    while (n--) {
        h = (h << 4) + (ptr++)->unicode();
        if ((g = (h &amp; 0xf0000000)) != 0)
            h ^= g >> 23;
        h &amp;= ~g;
    }
    return h;
}

You can use custom types as Keys in a QHash,as long as you provide a custom qHash() implementation. Yourimplementation must meet the requirement that if x == y,then qHash(x) == qHash(y).

In addition, to ensure maximum performance, the hash function shouldbe designed in such a way that if x != y, then it's highlylikely that qHash(x) != qHash(y). Thus, a trivialimplementation such as
uint qHash(const MyClass &amp; /* key */)
{
    return 0;
}
is a very poor hash function (although it does qualify as one).

Inside QHash, each item is stored in a separate nodeand the nodes are linked together inside a bucket. The hash function is usedto determine in which bucket to put an item. More precisely, an itemwith key k is stored inbuckets[qHash(k) % numBuckets].

With the trivial qHash() implementation above, all the itemswould end up in bucket 0, creating a long chain of items andresulting in very slow (linear) lookups. With a suitable qHash()implementation, the items are more likely to be distributed evenlyamong the buckets, and lookups occur in constant time on average(which beats QMap's O(log n) behavior).

center

To minimize the number of collisions, the bucket table always has aprime number of entries. As the QHash grows, so does the numberof buckets. If you know in advance approximately how many items youwill store in the QHash, you can call reserve() to set thenumber of buckets, avoiding the continual resizing of the buckettable.

If the Key type is int or uint, the hash function used isthe identity function (i.e., qHash(x) == x). In addition, QHash is smart enough not to store the key in that case, since itis identical to the hash value, saving four bytes per node.

When traversing a QHash using an iterator, the iterator pointsdirectly at a node. Calling ++ on the iterator advances theiterator to the next node in the same bucket, using the currentnode's next pointer. If the current node is the last node in thebucket (as is typically the case when we use a suitable hashfunction), next points to the QHashData object. This objectis recognizable by its first field (fakeNext), which is always anull pointer. From there, the iterator continues iterating in thenext bucket.

QMultiMap<Key, T> and QMultiHash<Key, T> have thesame representation as their base class. QCache<Key, T> uses a QHash<Key, T *>, and QSet<T> uses a QHash<T, QHashDummyValue>, whereQHashDummyValue is an internal type that doesn't use any memory.

Conclusion

Whereas STL's containers are optimized for raw speed, Qt's containerclasses have been carefully designed to provide convenience, minimalmemory usage, and minimal code expansion. For example, Qt'scontainers are implicitly shared, meaning that they can be passedaround as values without forcing a deep copy to occur. Also,sizeof(QXxx) == sizeof(void *) for any Qt container,and no memory allocation takes place for empty containers.

Qt's containers fully take advantage of partial templateinstantiation to provide different implementations depending on thedata type that is stored. Qt solves the code expansion issue thatarises in conjunction with C++ templates by putting most of thecontainer code in non-template internal classes such asQListData and QMapData, making Qt's containers particularlywell suited for embedded programming.

Now that you know how the Qt 4 containers work under the hood, youcan take advantage of this knowledge to write code that does theproper memory/space tradeoffs based on your needs.


Copyright © 2006 Trolltech Trademarks