Choice of containers
Information technology, even in embedded devices, is about information gathering, processing or calculation, and control. Input and data need to be juggled around and perhaps sorted. In this article, we want to point out some fundamental implications of the choice of a data container.
Vector
The first container is the array or “vector”. If the data does not need to be sorted, adding data at the end is simple, except at the moment when the reserved size of the array is exceeded: then a new, bigger memory region needs to be made available and the data from the old array needs to be copied (depending on the memory configuration). Indexed access, or picking a random element from the vector, is very fast. If the data is to be sorted, insertion becomes much slower as, depending on the size of the array, potentially lots of data elements need to be copied to make room for the new element at the correct index. But searching for a specific element via the binary search method is more scalable and time efficient for larger values of n. Moreover, for real-time systems, we are interested in worst-case performance: algorithms that are time deterministic and timing predictable, and more specifically that always take the same time to come up with the requested answer (depending on the number of elements in the container), are preferred. This is in contrast with some other applications and algorithms that are optimized for providing the solution in the shortest average time.
List
A second possible container is the linked list, which comes in two variations: the singly and the doubly linked list. For the first one, one always needs the head to start from, since there is only forward navigation: each element only has a link or pointer to the next element. In the doubly linked list, we can navigate forward (to the next element) and backward (to the previous element). Adding an element is simply a matter of inserting it and linking it to the previous and the next element. Removing an element is just as simple: unlink it and relink its neighbours. On the other hand, sorting and searching in a linked list is less efficient compared with the binary search method on a sorted vector (depending on the number of elements in the container).
Tree
A (sorted) binary tree is a further evolution of the data container concept. While the doubly linked list has links to its previous and next element, a binary tree element has three links: to a parent element, a left element, and a right element. A binary tree is sorted when, starting from the root (the root element is the element which has no parent, or whose parent link is null), a smaller element is linked to by the “left” link and a bigger element is linked to by the “right” link.
When the binary tree is “balanced”, the search operations can be made time deterministic and timing predictable (i.e. again always taking the same time, depending on the number of elements in the container):
A balanced binary tree is commonly defined as a binary tree in which the depth of the left and right subtrees of every node differ by 1 or less.
Advantages and disadvantages
To recapitulate:
The unsorted vector
+ indexed access
+ fast iteration
+ fast adding except when exceeding reserved memory
– slower removing
Use when indexed access is important, when insert and remove is less important, and sorting is not needed.
The sorted vector
+ indexed access
+ fast iteration
+ time deterministic and timing predictable element searching
– slower adding (binary search for insert position, make room for element by copying elements)
– slower removing
Use when sorting, indexed access and searching is important, when insert and remove is less important.
The linked list
+ fast forward (and also backward for double linked list) iteration
+ fast adding
+ fast removing
– slow indexed access
– slower searching
Use when insert/remove must be fast, when sorting and searching is less important.
The binary tree
– iteration is slower compared with a vector
+ time deterministic element searching (for a balanced tree)
+ faster removing
– slower adding (first one needs to determine the insert position)
– (offline) re-balancing could be required
Use when insert/remove may take longer, but where sorting and searching (on bigger sets) is more important, and the implementation must remain simple.
The even more advanced containers we look at later are the red-black tree and the hash table…
Important
To take a look at and compare the different algorithms and how they provide their solution in a time-deterministic way, refer to e.g. http://bigocheatsheet.com/
In real-time systems we must adhere to strict timelines, and thus the worst-case performance of an algorithm is most important. Again, to evaluate worst-case timings, the cache must always be assumed ‘cold’. For applications that do not have this limitation, the concept of data locality begins to play a big role: linearly searching an array of objects may become much faster in practice because the cache is used more optimally, compared with e.g. a binary tree that has to jump from object address to object address. All depends on platform architecture, number of elements and the specific application, of course.
Conclusion
Vectors and lists are simple, do not have much overhead, and are used extensively in small embedded programs. In that case, the maximum number of elements (and thus the allocated memory) is usually pre-determined, resulting in fast adding, removing and searching on limited data sets.
If one is considering a binary tree, we will discuss better solutions in a next article.
In general I agree with the analysis of the containers presented here. But on the moving of data, things have changed considerably since C++11 introduced move semantics. Adding a move constructor to your data structure will make containers like vector behave differently (as in ‘faster’)… See https://mbevin.wordpress.com/2012/11/20/move-semantics/ for examples.