
Why Vector (In C++) And Arraylist (In Java) Are So Fast

Sarfraz Raza
Oct 17, 2023
14 min read


In programming, the need for dynamically resizable arrays is ubiquitous. Growable arrays have long been used to address this requirement, but with the advent of vector, a powerful container class provided by the Standard Template Library (STL), and ArrayList (in the Java Collections Framework), developers have gained access to a highly optimized and efficient solution.

In this blog, we will explore the motivation behind using growable arrays named the GrowAbleArray class (for this blog only), delve into its implementation (specifically the insertion at the end part), and analyze its time complexity. Then, our focus will shift to our own implementation of a Vector class, similar to vector; we'll discuss its insertion at the end part, and analyze its time complexity. We will also highlight the distinguishing factors in ArrayList. Finally, we will compare GrowAbleArray and Vector to understand why vectors are preferred over traditional growable arrays, considering their strengths, weaknesses, performance, and memory management strategies in modern programming scenarios.

The need for growable arrays#

Static arrays, while efficient and powerful, come with a limitation—they can't be resized. This constraint arises from the need to define variables and arrays at compile time, where their sizes are predetermined on the stack. Although this approach enables precise memory allocation and efficient runtime execution, it poses challenges when dealing with data that dynamically expands.

Enter dynamic arrays—a solution that empowers us to overcome the limitations of static arrays. By allowing us to allocate memory at runtime (on heap), dynamic arrays provide the flexibility to handle growing data seamlessly. With this powerful concept, we can tackle programming tasks involving massive data manipulations where the size of the data is dynamic.

[Animation: expanding a dynamic array from size three to size four]

In the above animation, you can see how we first created a dynamic array of size three and then expanded it to size four by inserting a new value at the end of the array. This expansion involves creating a new memory block of size + 1, copying the previous data into it, writing the new value at the end, and then pointing the old pointer at the base of the new memory; together, these steps give the illusion that a new value was simply added at the end of the original data.

Let's make a customized generic template class by extending this idea.

1. Implementing a generalized growable array#

Now, let's proceed with the implementation of a growable array. In this example, we are creating a user-defined type of a growable array named GrowableArray. We are also overloading the subscript [] operator to provide similar behavior to a primitive array. Additionally, we will implement the operator<< to facilitate the ease of printing our GrowableArray instances.

#include <iostream>
using namespace std;
template <typename T>
class GrowableArray
{
    T * A;
    int size;
public:
    GrowableArray()
    {
        size = 0;
        A = nullptr;
    }
    void push_back(T v)
    {
        T * HA = new T [size + 1];
        for(int i = 0; i < size; i++)
        {
            HA[i] = A[i];
        }
        HA[size] = v;
        if(size != 0)
            delete [] A; // Deleting a nullptr is safe in standard C++, but we guard anyway
        A = HA;
        HA = nullptr;
        size++;
    }
    T & operator [](int i)
    {
        if(i < 0 || i >= size) throw "Out of boundary access";
        return A[i];
    }
    friend ostream & operator <<(ostream & out, const GrowableArray<T> & V)
    { // Requires that values of type T have operator<< overloaded
        out << "{ ";
        for(int i = 0; i < V.size; i++)
        {
            out << V.A[i] << " ";
        }
        out << "}";
        return out;
    }
    ~GrowableArray()
    {
        delete [] A;
    }
};
int main()
{
    GrowableArray<int> G;
    G.push_back(1);
    G.push_back(2);
    G.push_back(3);
    cout << "G: " << G << endl;
    return 0;
}

Let's discuss what we have done in the above code.

Explanation#

C++ implementation

We have created a user-defined type GrowableArray by using the properties of a dynamic array. In the GrowableArray class, we define two data members:

  • Line 6: A pointer A of template type T to enable the user to make a growable array of any type they like.

  • Line 7: A variable size of type int to keep track of how much memory is allocated on the heap.

  • Lines 9–13: We define a constructor GrowableArray() to ensure that all the data members start in a valid state.

  • Lines 14–27: We define the method push_back(T v) to handle the insertion of elements into the array. This method increases the size of the array by one. It does this by creating a new heap array of size size + 1, copying all the elements from the previous memory into the new array, and saving the new value in the additional available space. Subsequently, the method deletes the previously allocated memory, and the pointer is relocated to the new memory, exactly as demonstrated in the example of expanding a growable array from size three to size four.

Java implementation

Run the code of the GrowableArray.java file.

  • It has a similar implementation: the push_back() method performs the same grow-by-one expansion. In place of operator<<, the Java implementation overrides the toString() method, which is used to print the Java GrowableArray.

Let’s have a look at the time complexity of these growable arrays (implementations of both C++ and Java).

Time complexity analysis#

Let’s assume we have to load N records into the growable array. How much time will it take?

  • If we insert the 1st record, it will take 1 copy.

  • If we insert the 2nd record, it will take 2 copies (1 for the previous record relocated and 1 for the new value).

  • Inserting the 3rd record will take 3 copies (2 for the previous records relocated and 1 for the new value).

  • If we insert the Nth record, it will take N copies (N − 1 for the previous records relocated and 1 for the new value).

So, that means the cost of inserting N records is: 1 + 2 + 3 + … + N.

Let's understand the solution of this arithmetic series using the given illustration:

[Illustration: summing the arithmetic series 1 to 100]

Therefore, the total cost, as illustrated in the animation above, is approximately N^2/2 + N/2 ≈ N^2. In computer science, we denote this as O(N^2), pronounced “Big O of N squared.” In Big-O notation, we consider only the highest degree term of the polynomial, disregarding the constants associated with it and all smaller degree terms.

Amortized (average) cost#

On average, if we say that in a growable array we have inserted N values, then the total average cost will be approximately O(N^2/N) = O(N), meaning on almost every new entry, we are spending approximately O(N) copies. That is too slow.

Why is the insertion cost of O(N) too slow and impractical?#

Let’s imagine we're loading 10 million integers from a binary file (4 bytes per integer, i.e., a file size of about 10 × 10^6 × 4 = 40 MB). Say we run the above GrowableArray implementation on a machine with a 10 GHz processor. One iteration of the copying loop (lines 17–20 in GA.h) takes several clock ticks, but let’s assume for the sake of the example that one iteration executes in a single tick. Loading the 40 MB file then costs around (10 × 10^6)^2 = 10^14 copy operations, which takes roughly 10^14 / 10^10 = 10^4 seconds (around 2.77 hours). Similarly, if the data is around 1 GB, the time becomes roughly 10^18 / 10^10 = 10^8 seconds, which is around 27,778 hours, or about 3.17 years.

Take a pause and read this: “Only 1 GB of data, and the total loading time is 3.17 years.”

This is an unbearable wait. Now imagine if you are working with Facebook, Amazon, or Google records, where you have to work on petabytes of data; that will be impossible to manage if just loading (not processing) it takes such a huge amount of time.

2. Vector implementation of a growable array#

The implementation of vectors closely resembles that of a growable array, but with a slight difference. The main concern arises during insertion in the growable array, where the need to relocate previously inserted records leads to recopying. This prompts us to question how we can minimize this recopying process.

What if, instead of increasing every time by one and recopying everything previously inserted to new space, we increase the size by a larger number, say T? Then, for the next T − 1 insertions, the cost of insertion will be a constant O(1), until the memory gets filled completely. This is the main idea behind a vector in C++. Let's look into more details of the vector implementation, how the expansion (regrowing) happens, and what advantage it gives.

The doubling technique#

The concept behind this implementation involves two crucial variables: capacity and size. The capacity represents the actual memory space occupied on the heap, while the size indicates the number of records inserted by the user in the array. The initial capacity can be set to either 1 or any positive size specified by the user.

During insertion, if there is remaining capacity (i.e., size < capacity), the new value can be seamlessly added to the index of size (and size gets incremented accordingly). However, when size equals capacity, the vector expands not by just one but by twice the previous capacity. While copying still occurs during this expansion, the subsequent half of the capacity values (approximately) will have a constant insertion cost of O(1).

Before delving into the efficiency analysis of this approach, let’s proceed with its implementation. The only change required is in lines 18–27.

main.cpp
Vector.h
#include <iostream>
using namespace std;
template <typename T>
class Vector
{
    T * A;
    int size;
    int capacity;
public:
    Vector(int _size = 1)
    {
        size = 0;
        capacity = _size;
        A = new T[_size];
    }
    void push_back(T v)
    {
        if(capacity == size)
        {
            capacity *= 2;
            T * HA = new T [capacity];
            for(int i = 0; i < size; i++)
            {
                HA[i] = A[i];
            }
            delete [] A; A = HA; HA = nullptr;
        }
        A[size] = v;
        size++;
    }
    T & operator [](int i)
    {
        return A[i]; // Caller is responsible for staying in bounds
    }
    friend ostream & operator <<(ostream & out, const Vector<T> & V)
    {
        out << "{ ";
        for(int i = 0; i < V.size; i++)
        {
            out << V.A[i] << " ";
        }
        out << "}";
        return out;
    }
    ~Vector() // Free the heap block so Vector instances don't leak
    {
        delete [] A;
    }
};

Instruction: Run the above code in main.cpp (you may change the values of the loop to test several different values). In our sample test run, it loads 40 million integers in only 6.38 seconds.

Here in the animation, you can see how the insertion happens and how capacity and size manage the insertion operation. The actual cost is associated with the copying that happens on lines 24 and 28 (in Vector.h).

[Animation: inserting 4 — how capacity and size change during a vector insertion]

How good is vector insertion (time complexity analysis)?#

As we see in the above slides, due to the exponential increase in capacity (doubling), most of the time the insertion cost shrinks to a copying cost of O(1). There are only rare events where the resizing (regrowing) of the vector happens.

Think for a moment about what those special positions are.

The total cost of N operations looks like this seemingly bizarre pattern: 1 + 2 + 3 + 1 + 5 + 1 + 1 + 1 + 9 + 1 + … + N

Cost of N insertions#

We can do a small trick: every single insertion costs at least 1. By separating that 1 cost out of each operation, the new summation becomes:

    N + (1−1) + (2−1) + (3−1) + (1−1) + (5−1) + (1−1) + (1−1) + (1−1) + (9−1) + (1−1) + … + (N−1)

    ⟹ N + 0 + 1 + 2 + 0 + 4 + 0 + 0 + 0 + 8 + 0 + … + (N−1)

    ⟹ N + 1 + 2 + 4 + 8 + … + (N−1)

The final expression contains a geometric series (1 + 2 + 4 + 8 + … + (N−1)). That summation is bounded by 2N; therefore, the total cost will be approximately N + 2N = 3N.

Why is 1 + 2 + 4 + 8 + … + N ≤ 2N? The proof we are giving is considered a “proof without words.” In mathematics, a proof without words is an illustration of an identity or mathematical statement that can be demonstrated as self-evident by a diagram without any accompanying explanatory text. Such proofs can be considered more elegant than formal or mathematically rigorous proofs due to their self-evident nature.

As the summation is commutative, we can rewrite the summation as:

N + N/2 + N/4 + N/8 + … + 8 + 4 + 2 + 1

We can take N common: N(1 + 1/2 + 1/4 + 1/8 + 1/16 + …)

As shown in the below diagram: 1 + 1/2 + 1/4 + 1/8 + 1/16 + … ≤ 2.

The sum of an infinite geometric series starting from 1 with common ratio of 1/2


Therefore, the total complexity will just be 3N.

Amortized (average) cost of vector insertions#

On average, if we say that in a vector we have inserted N values, then the total average cost will be approximately O(3N/N) = O(1), meaning on almost every new entry we are spending approximately constant O(1) copying. That is an amazing improvement.

How fast is a vector as compared to a growable array?#

Now, let’s imagine that we would like to load 1 GB of data, assuming again that one clock tick of a 10 GHz processor executes one copy operation (the same assumption we made in the growable-array scenario above). The total number of copies will just be 10^9 × 3, which is less than the processor’s speed of 10^10 ticks per second; therefore, the loading process will take approximately (10^9 × 3) / 10^10 = 0.3 seconds, a fraction of a second. This is a really big improvement compared to the previous 3.17 years of loading time.

This is the clear difference between an algorithm taking O(N) time and one taking O(N^2); the difference becomes extremely important when handling large data.

Limitations of vectors#

Vectors are a popular data structure and have many advantages, but they also come with some limitations:

  1. Contiguous memory: Vectors store elements in contiguous memory locations. While this provides faster access to elements, it also means that inserting or deleting elements from the middle of the vector requires shifting all subsequent elements, which can be inefficient for large vectors.

  2. Capacity overhead: Despite being dynamic, vectors allocate capacity ahead of demand as elements are added. When the capacity exceeds the required size, the surplus is wasted memory.

  3. Reallocation overhead: As vectors grow and exceed their capacity, they need to reallocate memory. Although this happens rarely, that specific insertion is expensive.

  4. Limited performance for front insertions: Adding elements to the front of a vector requires shifting all existing elements, resulting in a time complexity of O(N), where N is the number of elements in the vector.

  5. Cost of complex objects: For objects with complex copy constructors or large sizes, the resizing and copying operations can be resource-intensive.

While vectors are suitable for many use cases, it is essential to consider these limitations when deciding on the appropriate data structure for specific scenarios. In situations where real-time performance and frequent resizing are critical concerns, alternative data structures like linked lists, doubly-ended queues, or trees may be more appropriate choices.

How ArrayList in Java is different from C++ STL::vector#

The Java ArrayList class, part of the Java Collections Framework, offers an implementation similar to C++ STL’s vector. It distinguishes itself with a growth factor of 1.5 when resizing its internal array to accommodate more elements. This 1.5 factor balances memory usage and performance optimization, resulting in less memory wastage. However, this choice leads to more frequent reallocation and copying, causing a slight overhead during insertions compared to a factor of 2. Despite the trade-off, ArrayList remains a flexible and efficient dynamic array implementation, favored for managing lists of elements with dynamic sizes.

import java.util.Arrays;

public class CustomArrayList<T> {
    private Object[] A;
    private int size;
    private int capacity;

    public CustomArrayList(int initialCapacity) {
        size = 0;
        capacity = initialCapacity;
        A = new Object[initialCapacity];
    }

    public CustomArrayList() {
        this(1);
    }

    public void push_back(T v) {
        if (capacity == size) {
            capacity = (int) Math.ceil(capacity * 1.5); // grow by a factor of 1.5
            Object[] HA = new Object[capacity];
            for (int i = 0; i < size; i++) {
                HA[i] = A[i];
            }
            A = HA;
        }
        A[size] = v;
        size++;
    }

    @SuppressWarnings("unchecked")
    public T get(int i) {
        if (i >= size)
            throw new IndexOutOfBoundsException("Out of boundary access");
        return (T) A[i];
    }

    @Override
    public String toString() {
        // Print only the filled portion, not the spare capacity
        return Arrays.toString(Arrays.copyOf(A, size));
    }

    public static void main(String[] args) {
        CustomArrayList<Integer> G = new CustomArrayList<>(10);
        for (int i = 1;
             i <= 8 * (1 << 20); // 8 million insertions here
             // i <= 40 * (1 << 20); // Needs more memory than the Educative
             // platform allows, but you can test it on your own machine;
             // it will take more time.
             i++) {
            G.push_back(i);
        }
        System.out.println("G is created... ");
    }
}

Applications of vectors#

Vectors find their utility in various scenarios, especially when data insertion at the end and random access are crucial requirements. They excel in situations where dynamic stacks and queues need to be implemented. By utilizing a vector in the background, we ensure that the memory allocation can adapt to dynamic scenarios, providing the flexibility to grow or shrink as needed.

Notably, we have focused on the insertion at the end operation and the time complexity analysis in this blog. However, vector in C++ offers several other functions like pop_back() and resize() that can further enhance its versatility. Additionally, there are numerous other applications of vectors beyond what we have covered here. If you are interested in exploring these functions and applications in depth, we recommend checking out our comprehensive Data Structures Course and Grokking Coding Interviews Patterns in C++. There, you can delve into more real-world use cases of vector / ArrayList and their applications.


  
