In programming, the need for dynamically resizable arrays is ubiquitous. Growable arrays have long been used to address this requirement, but with the advent of vector
, a powerful container class provided by the Standard Template Library (STL), and ArrayList
(in the Java Collections Framework), developers have gained access to a highly optimized and efficient solution.
In this blog, we will explore the motivation behind growable arrays through a GrowableArray
class (written for this blog only), delve into its implementation (specifically insertion at the end), and analyze its time complexity. Then, our focus will shift to our own implementation of a Vector
class, similar to vector
; we'll discuss its insertion at the end and analyze its time complexity. We will also highlight the distinguishing factors of ArrayList
. Finally, we will compare GrowableArray
and Vector
to understand why vectors are preferred over traditional growable arrays, considering their strengths, weaknesses, performance, and memory management strategies in modern programming scenarios.
Static arrays, while efficient and powerful, come with a limitation—they can't be resized. This constraint arises from the need to define variables and arrays at compile time, where their sizes are predetermined on the stack. Although this approach enables precise memory allocation and efficient runtime execution, it poses challenges when dealing with data that dynamically expands.
Enter dynamic arrays—a solution that empowers us to overcome the limitations of static arrays. By allowing us to allocate memory at runtime (on heap), dynamic arrays provide the flexibility to handle growing data seamlessly. With this powerful concept, we can tackle programming tasks involving massive data manipulations where the size of the data is dynamic.
In the above animation, you can see how we first created a dynamic array of size three and then expanded it to size four by inserting a new value at the end of the array. The expansion involves creating a new memory block of size+1
, copying the previous data into the new memory, writing the new value at the end, and then relocating the old pointer to the base of the new memory; together, these steps give the illusion that a new value is simply added at the end of the original data.
Let's make a customized generic template class by extending this idea.
Now, let's proceed with the implementation of a growable array. In this example, we are creating a user-defined type of a growable array named GrowableArray
. We are also overloading the subscript []
operator to provide similar behavior to a primitive array. Additionally, we will implement the operator<<
to facilitate the ease of printing our GrowableArray
instances.
```cpp
#include <iostream>
using namespace std;
template <typename T>
class GrowableArray
{
    T * A;
    int size;
public:
    GrowableArray()
    {
        size = 0;
        A = nullptr;
    }
    void push_back(T v)
    {
        T * HA = new T [size + 1];
        for(int i = 0; i < size; i++)
        {
            HA[i] = A[i];
        }
        HA[size] = v;
        if(size != 0)
            delete [] A; // guard, though delete[] on a null pointer is well-defined
        A = HA;
        HA = nullptr;
        size++;
    }
    T & operator [](int i)
    {
        if(i >= size) throw "Out of Boundary access";
        return A[i];
    }
    friend ostream & operator <<(ostream & out, const GrowableArray<T> & V)
    { // This requires that the stored type has operator<< overloaded
        out << "{ ";
        for(int i = 0; i < V.size; i++)
        {
            out << V.A[i] << " ";
        }
        out << "}";
        return out;
    }
    ~GrowableArray()
    {
        delete [] A;
    }
};
int main()
{
    GrowableArray<int> G;
    G.push_back(1);
    G.push_back(2);
    G.push_back(3);
    cout << "G: " << G << endl;
    return 0;
}
```
Let's discuss what we have done in the above code.
C++ implementation
We have created a user-defined type GrowableArray
by using the properties of a dynamic array. In the GrowableArray
class, we define two data members:
Line 6: A pointer A
of template type T
to enable the user to make a growable array of any type they like.
Line 7: A variable size
of type int
to keep track of how much memory is allocated on the heap.
Lines 9–13: We define a constructor GrowableArray()
to make sure that all the data members start in a valid state.
Lines 14–27: We define the method push_back(T v)
to handle the insertion of elements into the array. This method increases the size of the array by one. It does this by creating a new heap array of size size+1
, copying all the elements from the previous memory into the new array, and saving the new value in the additional available space. Subsequently, the method deletes the previously allocated memory and relocates the pointer to the new memory, exactly as demonstrated in the example of expanding a growable array from size three to size four.
Java implementation
Run the code of the GrowableArray.java
file.
It has a similar implementation. The push_back()
function follows the same grow-by-one expansion strategy. In place of C++'s operator<<
, this implementation uses Java's toString()
method to print the GrowableArray
.
Let’s have a look at the time complexity of these growable arrays (implementations of both C++ and Java).
Let’s assume we have to load n records into the growable array, one at a time:
If we insert the 1st record, it will take 1 copy operation.
If we insert the 2nd record, it will take 2 copy operations (1 for the previous record relocated and 1 for the new value).
Inserting the 3rd record will take 3 copy operations (2 for the previous records relocated and 1 for the new value).
If we insert the nth record, it will take n copy operations (n-1 for the previous records relocated and 1 for the new value).
So, that means the cost of inserting n records one by one is 1 + 2 + 3 + ... + n.
Let's understand the solution of this arithmetic series using the given illustration:
Therefore, the total cost, as illustrated in the animation above, is approximately n(n+1)/2, which is O(n²).
On average, if we say that in a growable array we have inserted n values, then the average cost of a single insertion is O(n²)/n = O(n).
Let’s imagine we're loading 10 million integers (from a binary file) using the GrowableArray
implementation. On a real machine, one copy iteration (one pass through the loop in GA.h
) will take several clock ticks, but let's assume, for the sake of an example, that one iteration executes in 1 clock tick and the processor runs at, say, 10^9 ticks per second. The total number of copy operations is then roughly n²/2 = (10^7)²/2 = 5×10^13, which works out to about 5×10^4 seconds.
Take a pause and read this: “Only around 14 hours, just to load 10 million integers!”
This is an unbearable wait. Now imagine working on Facebook, Amazon, or Google records, where you have to deal with petabytes of data; that would be impossible to manage if just loading (not processing) took such a huge amount of time.
The implementation of vectors closely resembles that of a growable array, but with a slight difference. The main concern arises during insertion in the growable array, where the need to relocate previously inserted records leads to recopying. This prompts us to question how we can minimize this recopying process.
What if, instead of increasing the size by one every time and recopying everything previously inserted into the new space, we increase the size by a larger amount, let's say we double it?
The concept behind this implementation involves two crucial variables: capacity
and size
. The capacity
represents the actual memory space occupied on the heap, while the size
indicates the number of records inserted by the user in the array. The initial capacity can be set to either 1 or any positive size specified by the user.
During insertion, if there is remaining capacity (i.e., size < capacity
), the new value can be seamlessly added to the index of size
(and size
gets incremented accordingly). However, when size
equals capacity
, the vector expands not by just one but by twice the previous capacity
. While copying still occurs during this expansion, the subsequent half of the capacity
values (approximately) will have a constant insertion cost of O(1).
Before delving into the efficiency analysis of this approach, let’s proceed with its implementation. The only change required is in lines 18–27.
```cpp
#include <iostream>
using namespace std;
template <typename T>
class Vector
{
    T * A;
    int size;
    int capacity;
public:
    Vector(int _size = 1)
    {
        size = 0;
        capacity = _size;
        A = new T[_size];
    }
    void push_back(T v)
    {
        if(capacity == size)
        {
            capacity *= 2;
            T * HA = new T [capacity];
            for(int i = 0; i < size; i++)
            {
                HA[i] = A[i];
            }
            delete [] A; A = HA; HA = nullptr;
        }
        A[size] = v;
        size++;
    }
    T & operator [](int i)
    {
        return A[i]; // Note: no bounds checking here
    }
    friend ostream & operator <<(ostream & out, const Vector<T> & V)
    {
        out << "{ ";
        for(int i = 0; i < V.size; i++)
        {
            out << V.A[i] << " ";
        }
        out << "}";
        return out;
    }
    ~Vector()
    {
        delete [] A;
    }
};
```
Instruction: Run the above code in the main.cpp
file (you may change the loop bounds to test several different values). In our sample test run, it loads the data in a fraction of a second.
Here in the animation, you can see how the insertion happens and capacity
and size
manage the insertion operation. The actual cost is associated with the copying that happens on line 24 and line 28 (in Vector.h
).
As we see in the above slides, due to the exponential increase in capacity (doubling), most of the time the insertion cost shrinks to a constant copying cost of O(1); only at a few special positions does an insertion trigger a full recopy.
Think for a moment about what those special positions are.
The total cost of inserting n elements is the sum of the individual insertion costs: every insertion writes the new value, and the insertions that trigger a doubling additionally recopy all existing elements.
We can do a small trick: every single insertion costs at least 1, so by separating that 1 cost from each operation, the new summation becomes n + (1 + 2 + 4 + ... + 2^k), where 2^k is the largest capacity that had to be recopied (and 2^k ≤ n).
The final expression contains a geometric series (1 + 2 + 4 + ... + 2^k). That summation is bounded by 2·2^k ≤ 2n; therefore, the total cost will be approximately n + 2n = 3n, which is O(n).
Why is 1 + 2 + 4 + ... + 2^k bounded by 2·2^k? The proof we are giving is considered a
“proof without words.” In mathematics, a proof without words is an illustration of an identity or mathematical statement that can be demonstrated as self-evident by a diagram without any accompanying explanatory text. Such proofs can be considered more elegant than formal or mathematically rigorous proofs due to their self-evident nature. As the summation is commutative, we can rewrite the summation as: 2^k + 2^(k-1) + ... + 2 + 1.
We can take 2^k common: 2^k (1 + 1/2 + 1/4 + ... + 1/2^k).
As shown in the below diagram, 1 + 1/2 + 1/4 + ... + 1/2^k is less than 2, so the whole sum is less than 2·2^k.
Therefore, since 2^k ≤ n, the copying cost is at most 2n, and the total complexity will just be O(n).
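Putting the pieces together, the bound can be written as one short derivation (with 2^k the largest capacity that gets recopied, so 2^k ≤ n):

```latex
\underbrace{1 + 2 + 4 + \cdots + 2^{k}}_{\text{recopy costs}}
  = 2^{k}\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots + \tfrac{1}{2^{k}}\right)
  < 2 \cdot 2^{k} \le 2n,
\qquad
\text{total cost} \le n + 2n = 3n \in O(n).
```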
On average, if we say that in a vector we have inserted n values, then the cost of a single insertion is O(n)/n = O(1), i.e., constant amortized time.
Now, let’s imagine that we would like to load the same 10 million integers, assuming that one clock tick of the processor executes one copy operation (the same assumption that we took in the above scenario of the growable array). The total number of copy operations will now be only about 3n = 3×10^7, far less than the processor's speed of 10^9 ticks per second; therefore, the loading process will take approximately 0.03 seconds, a small fraction of a second. This is a really big improvement as compared to the previous approach, which needed hours of loading time.
This is a clear difference between an algorithm taking O(n²) time and one taking O(n) time; this difference gets extremely important when handling large data.
Vectors are a popular data structure and have many advantages, but they also come with some limitations:
Contiguous memory: Vectors store elements in contiguous memory locations. While this provides faster access to elements, it also means that inserting or deleting elements from the middle of the vector requires shifting all subsequent elements, which can be inefficient for large vectors.
Capacity overhead: Despite being dynamic, vectors reserve a capacity that grows in steps as elements are added. When the capacity exceeds the required size, the unused slots are wasted memory.
Reallocation overhead: As vectors grow and exceed their capacity, they must reallocate memory. Although this happens rarely, the specific insertion that triggers it is expensive.
Limited performance for front insertions: Adding elements to the front of a vector requires shifting all existing elements, resulting in a time complexity of O(n), where n is the number of elements in the vector.
Cost of complex objects: For objects with complex copy constructors or large sizes, the resizing and copying operations can be resource-intensive.
While vectors are suitable for many use cases, it is essential to consider these limitations when deciding on the appropriate data structure for specific scenarios. In situations where real-time performance and frequent resizing are critical concerns, alternative data structures like linked lists, double-ended queues, or trees may be more appropriate choices.
The Java ArrayList
class, part of the Java Collections Framework, offers an implementation similar to C++ STL’s vector
. It distinguishes itself with a growing factor of about 1.5 instead of 2. ArrayList
remains a flexible and efficient dynamic array implementation, favored for managing lists of elements with dynamic sizes.
```java
import java.util.Arrays;

public class CustomArrayList<T> {
    private Object[] A;
    private int size;
    private int capacity;

    public CustomArrayList(int initialCapacity) {
        size = 0;
        capacity = initialCapacity;
        A = new Object[initialCapacity];
    }

    public CustomArrayList() {
        this(1);
    }

    public void push_back(T v) {
        if (capacity == size) {
            capacity = (int) Math.ceil(capacity * 1.5);
            Object[] HA = new Object[capacity];
            for (int i = 0; i < size; i++) {
                HA[i] = A[i];
            }
            A = HA;
        }
        A[size] = v;
        size++;
    }

    @SuppressWarnings("unchecked")
    public T get(int i) {
        if (i >= size)
            throw new IndexOutOfBoundsException("Out of Boundary access");
        return (T) A[i];
    }

    @Override
    public String toString() {
        // Print only the inserted elements, not the unused capacity
        return Arrays.toString(Arrays.copyOfRange(A, 0, size));
    }

    public static void main(String[] args) {
        CustomArrayList<Integer> G = new CustomArrayList<>(10);
        // Works with 8 million insertions here.
        // Using i <= 40 * (1 << 20) would increase the allocated memory, which
        // is not feasible on the Educative platform, but you can test it on
        // your own machine; it will take more time.
        for (int i = 1; i <= 8 * (1 << 20); i++) {
            G.push_back(i);
        }
        System.out.println("G is created... ");
    }
}
```
Vectors find their utility in various scenarios, especially when data insertion at the end and random access are crucial requirements. They excel in situations where dynamic stacks and queues need to be implemented. By utilizing vector
in the background, we ensure that the memory allocation can adapt to dynamic scenarios, providing flexibility to grow or shrink as needed.
Notably, we have focused on the insertion at the end operation and the time complexity analysis in this blog. However, vector in C++ offers several other functions like pop_back()
and resize()
that can further enhance its versatility. Additionally, there are numerous other applications of vectors beyond what we have covered here. If you are interested in exploring these functions and applications in depth, we recommend checking out our comprehensive Data Structures Course and Grokking Coding Interviews Patterns in C++. There, you can delve into more real-world use cases of vector
/ ArrayList
and their applications.
Grokking Coding Interview Patterns in C++
With thousands of potential questions to account for, preparing for the coding interview can feel like an impossible challenge. Yet with a strategic approach, coding interview prep doesn’t have to take more than a few weeks. Stop drilling endless sets of practice problems, and prepare more efficiently by learning coding interview patterns. This course teaches you the underlying patterns behind common coding interview questions. By learning these essential patterns, you will be able to unpack and answer any problem the right way — just by assessing the problem statement. This approach was created by FAANG hiring managers to help you prepare for the typical rounds of interviews at major tech companies like Apple, Google, Meta, Microsoft, and Amazon. Before long, you will have the skills you need to unlock even the most challenging questions, grok the coding interview, and level up your career with confidence. This course is also available in JavaScript, Python, Go, and C++ — with more coming soon!
Data Structures for Coding Interviews in Java
Data structures are amongst the fundamentals of Computer Science and an important decision in every program. Consequently, they are also largely categorized as a vital benchmark of computer science knowledge when it comes to industry interviews. This course contains a detailed review of all the common data structures and provides implementation level details in Java to allow readers to become well equipped. Now with more code solutions, lessons, and illustrations than ever, this is the course for you!
Data Structures for Coding Interviews in C++
Data structures are amongst the very fundamentals of Computer Science and are often a core decision in developing efficient programs. Consequently, they are also largely categorized as a vital benchmark of computer science knowledge when it comes to industry interviews. This course contains a detailed review of all the common data structures and provides implementation level details in C++ to allow readers to become well equipped with all the different data structures they can leverage to write better code!