Array Boundaries, Capacity and Size, CString, and Filing

Learn about what arrays are, why we need them, and how we can and cannot use arrays.

Accessing arrays

In this lesson, we will learn what happens if we access arrays out of the boundary. Also, we will learn about character arrays and how they are handled differently. Lastly, we will discuss reading data from files instead of cumbersomely initializing it on our own.

Accessing arrays out of bounds

Let’s start by asking ourselves a few questions.

What if we try to access the array index that is out of bounds for our array? What would happen if we mistakenly modify the memory indexed outside the array size?

Note that a valid array index can never be negative, so negative indexes are out of bounds too.

See the program below as an example.

#include <iostream>
using namespace std;

int main()  
{
    int A[5]={};    // array of size 5

    for(int i=0; i<10000; i++)   
    // accessing invalid array indexes (after index 4) 
    {
        /* logical error since we are trying to access 
            indexes beyond the array boundaries */
        cout<<A+i<<": "<<"A["<<i<<"]: "<< A[i]<<"\n";

    }
    return 0;
}
Printing out of bounds array elements and their addresses

After running the code above, we saw that, though we were accessing indexes outside of the array (after index 4), the code did not give us any error.

However, this is a logical error.

The code above printed the five addresses of the array and the 0s stored there, but then it kept going and printed the addresses that follow the array and whatever values happened to be stored there, even though these memory locations are past the end of our array!

We are not supposed to access or modify the memory that is not part of our array.

C++ doesn’t necessarily warn us against such errors, which is why these errors can be very dangerous.

Some programming languages like Python and Java have protection against accessing an out-of-bounds element and throw an error if any such thing happens. C++ performs no such check on plain arrays, so we must always be careful about array boundaries ourselves.

See the animation below to understand this undefined behavior. The red memory addresses represent the addresses that are out of bounds.

In the code widget above, we attempted to access memory outside of the array’s permitted range. Let us try the same code on a different compiler. The code widget below uses a different compiler (one that has a strict policy of not allowing illegal memory access beyond the allocated space). Run the code and see the result.

#include <iostream>
using namespace std;

int main()
{
    int A[5]={}; // array of size 5

    // accessing invalid array indexes (after index 4)
    for(int i=0; i<10000; i++)
    {
        /* logical error since we are trying to access
           indexes beyond the array boundaries */
        cout<<A+i<<": "<<"A["<<i<<"]: "<< A[i]<<"\n";
    }
    return 0;
}

This indicates that printing elements from out-of-bounds memory locations in an array can result in unpredictable behavior. It may cause the program to crash immediately with a segmentation fault, or it may permit additional memory accesses before eventually leading to a crash.

Let us see how the GDB debugger behaves if we access the invalid indexes and modify the values.

Run and observe the program below as an example.

#include <iostream>
using namespace std;

int main() // logical error
{
    int A[5]={};

    // error because we are trying to write beyond the boundaries
    for(int i=0; i<100; i++)
        A[i] = 0; // "*** stack smashing detected ***: terminated"
    return 0;
}

When we access out-of-bounds memory and modify the content stored there, whatever was already stored in those locations is overwritten.

As long as the out-of-bounds writes land in memory that belongs to our program’s stack, the contents there are silently overwritten. On the compiler used here, the program eventually terminates with the message *** stack smashing detected ***: terminated, which is printed when the runtime detects that protected stack memory next to the array has been overwritten. In short, modifying a large number of out-of-bounds indexes will most likely crash the program, whereas modifying only a few may silently change the values of our other variables. Either way, the behavior is undefined.

Now, this undefined behavior where we might accidentally overwrite the memory contents is more dangerous than where we might only read the out-of-bounds memory locations.

See the animation below to understand this undefined behavior.

We have a question for you. Solve the quiz below.

Q

Given the code below, what will be its output?

#include <iostream>
using namespace std;

int main() 
{
    int A[5]={1,2,3,4,5,6}; 

    for(int i=0; i<5; i++)
        cout <<A+i<<": "<< A[i]<<"\n";
    return 0;
}
A)

The program will print the five addresses and values 1, 2, 3, 4, 5.

B)

The program will not compile because there is a syntax error.

Setting array size and capacity

Suppose we ask the user for the size of the array and store it inside the size variable and allocate the array of variable size:

int size;
cout<< "The size of your array: "<< endl;
cin>>size;
int A[size]; // not valid in standard C++: size is not a compile-time constant

This is an error, because standard C++ does not let us create a variable-sized array: the size of an array must be a constant known at compile time.

A compiler that strictly follows the C++ standard rejects the code above. Some compilers accept it as a non-standard extension, sometimes without even a warning, but that does not make it valid standard C++.

Click “Run” to see what we mean.

#include <iostream>
using namespace std;

int main()
{
    int size;
    cout<< "The size of your array: "<< endl;
    cin>>size;
    int A[size]; // not valid in standard C++: size is not a compile-time constant
    
    return 0;
}



Setting the capacity of an array

While it would be convenient to create an array of exactly the size we need at runtime, standard C++ does not allow variable-size arrays. The full reason is beyond the scope of this course. In short, before execution, the compiler must know the exact memory requirement of each block of code, including functions and all the control structures like if, if-else, for, while, and so on; hence, an array whose size is known only at runtime is not possible.

Still, if we need an array whose size is known only at runtime, one precaution is to give the array a capacity somewhat larger than the size we expect to need. Say that, for some program, the user needs an array of size 50 or 75, but never more than 100; then, we can set the capacity of the array to a larger fixed value (say 100).

For example, look at the code below. In line 7, we have created a variable capacity and set it to 100. Note that we used the keyword const with the variable capacity. The const keyword tells the compiler that the variable will not be modified after initialization, which prevents accidental changes. Because its value is fixed and known at compile time, capacity can legally be used as the size of an array.

In line 8, we then declare an array of size capacity.

Run and observe the program below.

#include <iostream>
using namespace std;

int main()
{
    // larger capacity of our array
    const int capacity=100;
    int A[capacity]={};

    // expected size of the array
    int size;
    cin>>size; // Use only size many entries...
               // the rest will be wasted

    // guard: a size outside [0, capacity] would take us out of bounds
    if(size<0 || size>capacity)
    {
        cout << "size must be between 0 and " << capacity << "\n";
        return 1;
    }

    for(int i=0; i<size; i++)
    {
        A[i] = i*10; // do whatever
    }
    for(int i=0; i<size; i++)
    {
        cout << A+i << ": A["<<i<<"]: "<<" "<<A[i] << "\n";
    }
    return 0;
}

Creating an array with a larger capacity

Instruction: Add size in the input stream (below 100). Execute the code and see what the array is after initialization.

Dynamic allocation is a technique that enables the use of a variable within square brackets. It is used to specify the size of an array at runtime by allocating memory during execution through the use of particular functions or keywords, which vary based on the programming language.

Character arrays

C++ handles character arrays in a special way. Before we talk about this specialty, let us see how to create and initialize character arrays.

Declaration and initialization

Look at the following code examples and go through them one by one.

#include <iostream>
using namespace std;

int main()
{
    char A[100]; // A contains garbage values

    for(int i=0; i<5; i++)      // will print 5 arbitrary characters
        cout << A[i] << endl;   // garbage values
    cout << endl;
    return 0;
}
The cStringBasics tab

There is nothing new inside the code of the “cStringBasics” tab. A character array is declared, and it initially holds garbage characters. Just like an integer array, a character array can be displayed element by element with a loop.

The second example, inside the “cStringBasicsSpecialPrint” tab, is a bit different. An array of five characters is initialized with only three characters. The remaining two characters are automatically set to zero (which is the ASCII value of '\0'). Whenever the address of a character is inserted into an output stream, cout does not print the address; instead, it prints the characters one by one, starting at that address, until it reaches the null character '\0'. Hence, line 14 will print “ant”, line 18 will print “nt”, line 21 will print “t”, and line 24 will print “ant”.

In the third example, inside the “sizeOfCString” tab, a character array of size 4 is declared, and A[3] = '\0' is added automatically. If we apply sizeof(A), the null character is counted in the allocated size of the array too. The double quotes in line 6 of the code indicate that the value being assigned to the character array A is a string literal: a sequence of characters enclosed in double quotes and stored in memory as an array of characters with a null terminator at the end. Here, the string "ant" is being assigned to the character array A.

These null-terminated character arrays are called cstrings.

In C++, a cstring is a null-terminated character array, while a string is a general term that refers to a sequence of characters. We will further look into its details and solve several applications in the next chapter.

Reading data from the file

To read data from a file, we need to add #include <fstream>. After that, we create an ifstream variable (usually called an object) and initialize it with the file name (as a cstring), using a syntax that looks like a function call.

ifstream Reader("fileName.ext");

Reader (you can use any name) acts like cin, but instead of reading input from the console, it reads from a file. It functions as a file stream, which takes data from the file and reads it character by character into the code’s variables, like cin. An example of file reading is shown here.

main.cpp
Data.txt
#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    const int capacity=100;
    int size;
    int A[capacity]={};
    char B[capacity], asymbol, C[capacity];

    ifstream Rdr("Data.txt");
    Rdr>>size;                   // this will read 10 into size
    for(int i=0; i<size; i++)    // Loop will run 10 times
        Rdr>>A[i];  // it will read 10 20 30 40 50 60 70 80 90 100
                    // note that it ignores new line

    Rdr>>B; /* this will read "Hello": Again character arrays are read directly without
               any loop, the reading stops when the next character in the stream is
               ' ' or '\n' or '\t' */

    Rdr>>asymbol;  // it will ignore space ' ' and read 'T'
    Rdr>>C;        // it will read "uring!"

    cout << "size: "<<size << endl;
    for(int i=0; i<size; i++)
        cout << "A["<<i<<"]="<< A[i] << " ";
    cout << endl;

    cout << "B[] = \""<<B<<"\""<<endl;
    cout << "asymbol: '"<<asymbol<<"'"<<endl;
    cout << "C[] = \""<<C<<"\""<<endl;
    return 0;
}
Reading data from the file
  • In line 7, the const int capacity=100 sets a constant integer variable capacity to a value of 100.

  • In line 9, int A[capacity]={}; declares an integer array A with a capacity of 100 and initializes all elements to 0.

  • In line 12, an input file stream object Rdr is created that is associated with the file “Data.txt”.

    • Inside the “Data.txt” file, the first value 10 represents the number of numeric values that follow, which are 10 20 30 40 50 60 70 80 90 100. This is then followed by the text Hello Turing! Thank you for everything!!!. The variable size controls the loop that reads these values, so the program must know how many values to read into the array A before the loop starts. That is why the first line in the data file is the size of the array A.
  • In line 13, the first value 10 in the file is read and stored inside the variable size.

    The operator >> in the code Rdr>>size; extracts the next value from the input file stream Rdr and stores it in the variable size. The operator >> skips whitespace characters and reads until it encounters a non-numeric character. In this case, the first value in the file is 10, which is a number, so it stops reading after it encounters the first non-numeric character which, in this case, is a newline. That’s why only one value is read.

  • In line 15, in each iteration of the for loop that runs 10 times, the next value in the file is read and stored in the current element of the array A. The newline character after the first value is whitespace, so the operator >> skips it, just like spaces and tabs, before reading the next number.

  • In line 18, the next string in the file is read and stored in the character array B.

  • In line 22, the next character in the file is read and stored in the character variable asymbol.

  • In line 23, the next string in the file is read and stored in the character array C.

  • In lines 26–28, the values in the array A are printed to the console.

Instruction: Open and look at the “Data.txt” file and then at the code and how the data is read instruction by instruction. There is no difference between the cin and ifstream readers, except for the file stream initialization process (because it needs to be initialized with the file name).

Summary

We hope you’ve learned how arrays can be declared, initialized, and easily traversed to find any of their elements. We also looked at character arrays and how null-terminated cstrings are handled specially in C++.

In the upcoming lessons, we will solve several challenging problems related to arrays.