
Why Vector (In C++) And Arraylist (In Java) Are So Fast

Sarfraz Raza
Oct 17, 2023
14 min read


In programming, the need for dynamically resizable arrays is ubiquitous. Growable arrays have long been used to address this requirement, but with the advent of vector, a powerful container class provided by the Standard Template Library (STL), and ArrayList (in the Java Collections Framework), developers have gained access to a highly optimized and efficient solution.

In this blog, we will explore the motivation behind using growable arrays named the GrowAbleArray class (for this blog only), delve into its implementation (specifically the insertion at the end part), and analyze its time complexity. Then, our focus will shift to our own implementation of a Vector class, similar to vector; we'll discuss its insertion at the end part, and analyze its time complexity. We will also highlight the distinguishing factors in ArrayList. Finally, we will compare GrowAbleArray and Vector to understand why vectors are preferred over traditional growable arrays, considering their strengths, weaknesses, performance, and memory management strategies in modern programming scenarios.

The need for growable arrays#

Static arrays, while efficient and powerful, come with a limitation—they can't be resized. This constraint arises from the need to define variables and arrays at compile time, where their sizes are predetermined on the stack. Although this approach enables precise memory allocation and efficient runtime execution, it poses challenges when dealing with data that dynamically expands.

Enter dynamic arrays—a solution that empowers us to overcome the limitations of static arrays. By allowing us to allocate memory at runtime (on heap), dynamic arrays provide the flexibility to handle growing data seamlessly. With this powerful concept, we can tackle programming tasks involving massive data manipulations where the size of the data is dynamic.

[Animation: expanding a dynamic array from size three to size four]

In the above animation, you can see how we first created a dynamic array of size three and then expanded it to size four by inserting a new value at the end of the array. This expansion involves creating a new memory block of size + 1, copying the previous data into it, writing the new value at the end, and then pointing the old pointer at the base of the new memory; together, these steps give the illusion that a new value was simply added at the end of the original data.

Let's make a customized generic template class by extending this idea.

1. Implementing a generalized growable array#

Now, let's proceed with the implementation of a growable array. In this example, we are creating a user-defined type of a growable array named GrowableArray. We are also overloading the subscript [] operator to provide similar behavior to a primitive array. Additionally, we will implement the operator<< to facilitate the ease of printing our GrowableArray instances.

#include <iostream>
using namespace std;
template <typename T>
class GrowableArray
{
    T * A;
    int size;
public:
    GrowableArray()
    {
        size = 0;
        A = nullptr;
    }
    void push_back(T v)
    {
        T * HA = new T [size + 1];
        for(int i = 0; i < size; i++)
        {
            HA[i] = A[i];
        }
        HA[size] = v;
        if(size != 0)
            delete [] A; // Deleting a nullptr is safe in standard C++, but we guard anyway
        A = HA;
        HA = nullptr;
        size++;
    }
    T & operator [](int i)
    {
        if(i < 0 || i >= size) throw "Out of boundary access";
        return A[i];
    }
    friend ostream & operator <<(ostream & out, const GrowableArray<T> & V)
    { // Requires that values of type T have operator<< overloaded
        out << "{ ";
        for(int i = 0; i < V.size; i++)
        {
            out << V.A[i] << " ";
        }
        out << "}";
        return out;
    }
    ~GrowableArray()
    {
        delete [] A;
    }
};
int main()
{
    GrowableArray<int> G;
    G.push_back(1);
    G.push_back(2);
    G.push_back(3);
    cout << "G: " << G << endl;
    return 0;
}

Let's discuss what we have done in the above code.

Explanation#

C++ implementation

We have created a user-defined type GrowableArray by using the properties of a dynamic array. In the GrowableArray class, we define two data members:

  • Line 6: A pointer A of template type T to enable the user to make a growable array of any type they like.

  • Line 7: A variable size of type int to keep track of how much memory is allocated on the heap.

  • Lines 9–13: We define a constructor GrowableArray() to ensure that all the data members start in a valid state.

  • Lines 14–27: We define the method push_back(T v) to handle the insertion of elements into the array. This method increases the size of the array by one. It does this by creating a new heap array of size size + 1, copying all the elements from the previous memory into the new array, and saving the new value in the additional available space. Subsequently, the method deletes the previously allocated memory, and the pointer is relocated to the new memory, exactly as demonstrated in the example of expanding a growable array from size three to size four.

Java implementation

Run the code of the GrowableArray.java file.

  • It has a similar implementation: the push_back() method performs the same grow-by-one expansion. In place of operator<<, the Java implementation overrides the toString() method, which is used to print the Java GrowableArray.

Let’s have a look at the time complexity of these growable arrays (implementations of both C++ and Java).

Time complexity analysis#

Let’s assume we have to load N records into the growable array. How much time will it take?

  • If we insert the 1st record, it will take 1 copy.

  • If we insert the 2nd record, it will take 2 copies (1 for the previous record relocated and 1 for the new value).

  • Inserting the 3rd record will take 3 copies (2 for the previous records relocated and 1 for the new value).

  • If we insert the Nth record, it will take N copies (N − 1 for the previous records relocated and 1 for the new value).

So, that means the cost of inserting N records is: 1 + 2 + 3 + … + N.

Let's understand the solution of this arithmetic series using the given illustration:

[Illustration: summing the arithmetic series 1 to 100]

Therefore, the total cost, as illustrated in the animation above, is approximately N^2/2 + N/2 ≈ N^2. In computer science, we denote this as O(N^2), pronounced “Big O of N squared.” In Big-O notation, we consider only the highest degree term of the polynomial, disregarding the constants associated with it and all smaller degree terms.

Amortized (average) cost#

On average, if we say that in a growable array we have inserted N values, then the total average cost will be approximately O(N^2/N) = O(N), meaning on almost every new entry, we are spending approximately O(N) copies. That is too slow.

Why is the insertion cost of O(N) too slow and impractical?#

Let’s imagine we're loading 10 million integers from a binary file (4 bytes per integer, i.e., a file size of about 10 × 10^6 × 4 = 40 MB). Say we run the above GrowableArray implementation on a machine with a 10 GHz processor. One iteration of the copying loop (lines 17–20 in GA.h) takes several clock ticks, but let’s assume for the sake of the example that one iteration executes in a single tick. Loading the 40 MB file then costs around (10 × 10^6)^2 = 10^14 copy operations, which takes roughly 10^14 / 10^10 = 10^4 seconds (around 2.77 hours). Similarly, if the data is around 1 GB, the time becomes roughly 10^18 / 10^10 = 10^8 seconds, which is around 27,778 hours, or about 3.17 years.

Take a pause and read this: “Only 1 GB of data, and the total loading time is 3.17 years.”

This is an unbearable wait. Now imagine if you are working with Facebook, Amazon, or Google records, where you have to work on petabytes of data; that will be impossible to manage if just loading (not processing) it takes such a huge amount of time.

2. Vector implementation of a growable array#

The implementation of vectors closely resembles that of a growable array, but with a slight difference. The main concern arises during insertion in the growable array, where the need to relocate previously inserted records leads to recopying. This prompts us to question how we can minimize this recopying process.

What if, instead of increasing every time by one and recopying everything previously inserted to new space, we increase the size by a larger number, say T? Then, for the next T − 1 insertions, the cost of insertion will be a constant O(1), until the memory gets filled completely. This is the main idea behind a vector in C++. Let's look into more details of the vector implementation, how the expansion (regrowing) happens, and what advantage it gives.

The doubling technique#

The concept behind this implementation involves two crucial variables: capacity and size. The capacity represents the actual memory space occupied on the heap, while the size indicates the number of records inserted by the user in the array. The initial capacity can be set to either 1 or any positive size specified by the user.

During insertion, if there is remaining capacity (i.e., size < capacity), the new value can be seamlessly added to the index of size (and size gets incremented accordingly). However, when size equals capacity, the vector expands not by just one but by twice the previous capacity. While copying still occurs during this expansion, the subsequent half of the capacity values (approximately) will have a constant insertion cost of O(1).

Before delving into the efficiency analysis of this approach, let’s proceed with its implementation. The only change required is in lines 18–27.

main.cpp
Vector.h
#include <iostream>
using namespace std;
template <typename T>
class Vector
{
    T * A;
    int size;
    int capacity;
public:
    Vector(int _size = 1)
    {
        size = 0;
        capacity = _size;
        A = new T[_size];
    }
    void push_back(T v)
    {
        if(capacity == size)
        {
            capacity *= 2;
            T * HA = new T [capacity];
            for(int i = 0; i < size; i++)
            {
                HA[i] = A[i];
            }
            delete [] A; A = HA; HA = nullptr;
        }
        A[size] = v;
        size++;
    }
    T & operator [](int i)
    {
        return A[i]; // Caller is responsible for staying in bounds
    }
    friend ostream & operator <<(ostream & out, const Vector<T> & V)
    {
        out << "{ ";
        for(int i = 0; i < V.size; i++)
        {
            out << V.A[i] << " ";
        }
        out << "}";
        return out;
    }
    ~Vector() // Free the heap block so Vector instances don't leak
    {
        delete [] A;
    }
};

Instruction: Run the above code in main.cpp (you may change the values of the loop to test several different values). In our sample test run, it loads 40 million integers in only 6.38 seconds.

Here in the animation, you can see how the insertion happens and how capacity and size manage the insertion operation. The actual cost is associated with the copying that happens on lines 24 and 28 (in Vector.h).

[Animation: inserting 4 — how capacity and size change during a vector insertion]

How good is vector insertion (time complexity analysis)?#

As we see in the above slides, due to the exponential increase in capacity (doubling), most of the time the insertion cost shrinks to a copying cost of O(1). There are only rare events where the resizing (regrowing) of the vector happens.

Think for a moment about what those special positions are.

The total cost of N operations looks like this seemingly bizarre pattern: 1 + 2 + 3 + 1 + 5 + 1 + 1 + 1 + 9 + 1 + … + N

Cost of N insertions#

We can do a small trick: every single insertion costs at least 1. By separating that 1 cost out of each operation, the new summation becomes:

    N + (1−1) + (2−1) + (3−1) + (1−1) + (5−1) + (1−1) + (1−1) + (1−1) + (9−1) + (1−1) + … + (N−1)

    ⟹ N + 0 + 1 + 2 + 0 + 4 + 0 + 0 + 0 + 8 + 0 + … + (N−1)

    ⟹ N + 1 + 2 + 4 + 8 + … + (N−1)

The final expression contains a geometric series (1 + 2 + 4 + 8 + … + (N−1)). That summation is bounded by 2N; therefore, the total cost will be approximately N + 2N = 3N.

Why is 1 + 2 + 4 + 8 + … + N ≤ 2N? The proof we are giving is considered a “proof without words.” In mathematics, a proof without words is an illustration of an identity or mathematical statement that can be demonstrated as self-evident by a diagram without any accompanying explanatory text. Such proofs can be considered more elegant than formal or mathematically rigorous proofs due to their self-evident nature.

As the summation is commutative, we can rewrite the summation as:

N + N/2 + N/4 + N/8 + … + 8 + 4 + 2 + 1

We can take N common: N(1 + 1/2 + 1/4 + 1/8 + 1/16 + …)

As shown in the below diagram: 1 + 1/2 + 1/4 + 1/8 + 1/16 + … ≤ 2.

The sum of an infinite geometric series starting from 1 with common ratio of 1/2


Therefore, the total complexity will just be 3N.

Amortized (average) cost of vector insertions#

On average, if we say that in a vector we have inserted N values, then the total average cost will be approximately O(3N/N) = O(1), meaning on almost every new entry we are spending approximately constant O(1) copying. That is an amazing improvement.

How fast is a vector as compared to a growable array?#

Now, let’s imagine that we would like to load 1 GB of data, assuming again that one clock tick of a 10 GHz processor executes one copy operation (the same assumption we made in the growable-array scenario above). The total number of copies will just be 10^9 × 3, which is less than the processor’s speed of 10^10 ticks per second; therefore, the loading process will take approximately (10^9 × 3) / 10^10 = 0.3 seconds, a fraction of a second. This is a really big improvement compared to the previous 3.17 years of loading time.

This is the clear difference between an algorithm taking O(N) time and one taking O(N^2); the difference becomes extremely important when handling large data.

Limitations of vectors#

Vectors are a popular data structure and have many advantages, but they also come with some limitations:

  1. Contiguous memory: Vectors store elements in contiguous memory locations. While this provides faster access to elements, it also means that inserting or deleting elements from the middle of the vector requires shifting all subsequent elements, which can be inefficient for large vectors.

  2. Capacity overhead: Despite being dynamic, vectors allocate capacity ahead of demand as elements are added. When the capacity exceeds the required size, the surplus is wasted memory.

  3. Reallocation overhead: As vectors grow and exceed their capacity, they need to reallocate memory. Although this happens rarely, that specific insertion is expensive.

  4. Limited performance for front insertions: Adding elements to the front of a vector requires shifting all existing elements, resulting in a time complexity of O(N), where N is the number of elements in the vector.

  5. Cost of complex objects: For objects with complex copy constructors or large sizes, the resizing and copying operations can be resource-intensive.

While vectors are suitable for many use cases, it is essential to consider these limitations when deciding on the appropriate data structure for specific scenarios. In situations where real-time performance and frequent resizing are critical concerns, alternative data structures like linked lists, doubly-ended queues, or trees may be more appropriate choices.

How ArrayList in Java is different from C++ STL::vector#

The Java ArrayList class, part of the Java Collections Framework, offers an implementation similar to C++ STL’s vector. It distinguishes itself with a growth factor of 1.5 when resizing its internal array to accommodate more elements. This 1.5 factor balances memory usage and performance optimization, resulting in less memory wastage. However, this choice leads to more frequent reallocation and copying, causing a slight overhead during insertions compared to a factor of 2. Despite the trade-off, ArrayList remains a flexible and efficient dynamic array implementation, favored for managing lists of elements with dynamic sizes.

import java.util.Arrays;

public class CustomArrayList<T> {
    private Object[] A;
    private int size;
    private int capacity;

    public CustomArrayList(int initialCapacity) {
        size = 0;
        capacity = initialCapacity;
        A = new Object[initialCapacity];
    }

    public CustomArrayList() {
        this(1);
    }

    public void push_back(T v) {
        if (capacity == size) {
            capacity = (int) Math.ceil(capacity * 1.5); // grow by a factor of 1.5
            Object[] HA = new Object[capacity];
            for (int i = 0; i < size; i++) {
                HA[i] = A[i];
            }
            A = HA;
        }
        A[size] = v;
        size++;
    }

    @SuppressWarnings("unchecked")
    public T get(int i) {
        if (i >= size)
            throw new IndexOutOfBoundsException("Out of boundary access");
        return (T) A[i];
    }

    @Override
    public String toString() {
        // Print only the filled portion, not the spare capacity
        return Arrays.toString(Arrays.copyOf(A, size));
    }

    public static void main(String[] args) {
        CustomArrayList<Integer> G = new CustomArrayList<>(10);
        for (int i = 1;
             i <= 8 * (1 << 20); // 8 million insertions here
             // i <= 40 * (1 << 20); // Needs more memory than the Educative
             // platform allows, but you can test it on your own machine;
             // it will take more time.
             i++) {
            G.push_back(i);
        }
        System.out.println("G is created... ");
    }
}

Applications of vectors#

Vectors find their utility in various scenarios, especially when data insertion at the end and random access are crucial requirements. They excel in situations where dynamic stacks and queues need to be implemented. By utilizing a vector in the background, we ensure that the memory allocation can adapt to dynamic scenarios, providing the flexibility to grow or shrink as needed.

Notably, we have focused on the insertion at the end operation and the time complexity analysis in this blog. However, vector in C++ offers several other functions like pop_back() and resize() that can further enhance its versatility. Additionally, there are numerous other applications of vectors beyond what we have covered here. If you are interested in exploring these functions and applications in depth, we recommend checking out our comprehensive Data Structures Course and Grokking Coding Interviews Patterns in C++. There, you can delve into more real-world use cases of vector / ArrayList and their applications.


  
