String Processing

String processing refers to the manipulation and analysis of strings. It encompasses operations such as searching, sorting, concatenation, tokenization, parsing, and pattern matching, and it is essential for extracting insights, interpreting human language, validating and cleaning data, compressing data, encrypting and decrypting data, integrating data, and generating text. In this lesson, we implement several basic operations on C-style strings: character arrays terminated by the null character '\0'.

Finding string length

Let’s make a function, str_length():

int str_length(char str[ ]);

In this function, we pass the string as an argument, count its characters with a loop, and return the count. The loop terminates when it reaches the null character '\0', whose ASCII value is 0. (Do not confuse '\0' with NULL, which is a null pointer constant, not a character.)

Here is the complete program:

#include <iostream>
using namespace std;

// Measures the length of the string, excluding the terminating '\0'
int str_length(char str[ ])
{
    int length = 0;
    for (int si = 0; str[si] != '\0'; si++)
        length++;
    /* The loop above can also be written as:
    int length = 0, si = 0;
    while (str[si])   // terminates when str[si] is '\0', whose ASCII value is 0
    {
        si++;
        length++;
    }
    */
    return length;
}

int main()
{
    char A[ ] = {"Hello"};
    char B[ ] = {"Educative"};
    // sizeof() counts the whole array, including the terminating '\0'
    cout << "The length of '" << A << "' : \t" << str_length(A)
         << "\t sizeof(A): " << sizeof(A) << endl;
    cout << "The length of '" << B << "' : \t" << str_length(B)
         << "\t sizeof(B): " << sizeof(B) << endl;
    return 0;
}

String copying

Let’s make a str_copy() function in which we have two character arrays as parameters:

void str_copy(char Destination[ ], char Src[ ]);

This function assumes that Destination[ ] has enough space to hold all of Src[], including its terminating '\0'. We iterate over Src[] until we reach the null character '\0', copying each character of Src[] to the same position in Destination[ ]. Finally, we write '\0' right after the last copied character of Destination[ ].

Here is the complete program:

#include <iostream>
using namespace std;

// Copies Src[] into Destination[], including the terminating '\0'.
// Assumes Destination[] has enough space to hold all of Src[].
void str_copy(char Destination[ ], char Src[ ])
{
    int di = 0;
    // The condition can also be written as Src[si] != 0, since '\0' has ASCII value 0
    for (int si = 0; Src[si] != '\0'; si++, di++)
    {
        Destination[di] = Src[si];
    }
    Destination[di] = '\0';   // terminate the copied string
}

int main()
{
    char A[ ] = {"Hello "};   // 7 bytes: just enough for "Turing" plus '\0'
    char B[ ] = {"Turing"};
    // A = B;   // error: arrays cannot be assigned to each other
    str_copy(A, B);
    cout << "A: " << A << endl;
    cout << "B: " << B << endl;
    return 0;
}

String concatenation

Let’s make a function, str_cat(), in which we have two character arrays as parameters:

void str_cat(char Destination[ ], char Src[ ]);

Here, we want to append Src[] to the end of Destination[ ]. First, we store the length of the string in Destination[ ] in the variable di; this is the index of its terminating '\0', where the appended text begins. Then we iterate over Src[] until we reach '\0', copying each character to Destination[di] and advancing both indices. Finally, we write '\0' right after the last appended character.

The str_cat() function assumes that Destination[ ] has enough space for its current text, all of Src[], and the terminating '\0'.

#include <iostream>
using namespace std;

int str_length(char str[ ])
{
    int length = 0;
    for (int si = 0; str[si] != '\0'; si++)
        length++;
    return length;
}

// Appends Src[] to the end of Destination[].
// Assumes Destination[] has room for both strings plus the final '\0'.
void str_cat(char Destination[ ], char Src[ ])
{
    int di = str_length(Destination);   // di is the index of Destination's '\0'
    for (int si = 0; Src[si] != '\0'; si++, di++)
    {
        // Writing starts over the old '\0', right after the existing text in Destination[]
        Destination[di] = Src[si];
    }
    Destination[di] = '\0';   // terminate the combined string
}

int main()
{
    char B[ ] = {"Turing"};
    char C[ 100 ] = {"Hello "};   // extra room for the appended text
    str_cat(C, B);
    cout << "C: " << C << endl;
    return 0;
}

String comparison

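Let’s make a str_compare() function that compares two strings in dictionary order, character by character using their ASCII values (so, for example, every uppercase letter comes before every lowercase letter). It returns:

  • return 0: if both strings are the same.
  • return -1: if the first string comes before the second string.
  • return 1: if the second string comes before the first string.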
#include <iostream>
#include <algorithm>   // for min()
using namespace std;

int str_length(char str[ ])
{
    int length = 0;
    for (int si = 0; str[si] != '\0'; si++)
        length++;
    return length;
}

int str_compare(char w1[ ], char w2[ ])
{
    int smaller = min(str_length(w1), str_length(w2));
    // Note the <= : the loop also compares index 'smaller', which holds the
    // '\0' of the shorter string. Since '\0' has ASCII value 0, it is less than
    // any printable character, so a prefix (e.g., "cat" vs "cattle") compares as smaller.
    for (int i = 0; i <= smaller; i++)
    {
        if (w1[i] < w2[i])
            return -1;
        if (w1[i] > w2[i])
            return 1;
    }
    return 0;   // both words are equal
}

int main()
{
    char A[ ] = "cat";
    char B[ ] = "cat";
    char C[ ] = "cattle";
    char D[ ] = "dog";
    cout << A << " vs " << B << " : " << str_compare(A, B) << endl;
    cout << A << " vs " << C << " : " << str_compare(A, C) << endl;
    cout << C << " vs " << A << " : " << str_compare(C, A) << endl;
    cout << C << " vs " << D << " : " << str_compare(C, D) << endl;
    cout << D << " vs " << C << " : " << str_compare(D, C) << endl;
    return 0;
}

String reversal

Let’s make a str_reverse() function that swaps the first character with the last, the second with the second-to-last, and so on, moving both indices toward the middle until they meet. The terminating '\0' is not moved.

#include <iostream>
#include <utility>   // for swap()
using namespace std;

int str_length(char str[ ])
{
    int length = 0;
    for (int si = 0; str[si] != '\0'; si++)
        length++;
    return length;
}

// Reverses S[] in place by swapping mirrored pairs until the indices meet
void str_reverse(char S[ ])
{
    int si = 0,
        li = str_length(S) - 1;   // index of the last character before '\0'
    while (si < li)
    {
        swap(S[si], S[li]);
        si++;
        li--;
    }
}

int main()
{
    char R[ ] = "This is a cat.";
    str_reverse(R);
    cout << "Reverse: " << R << endl;
    return 0;
}

The isPalindrome() function

We have already discussed palindromic numbers in previous lessons. Now let’s reuse the functions above to write an isPalindrome() function for strings: a string is a palindrome if it reads the same when reversed. Look at the code below:

#include <iostream>
#include <algorithm>   // for min()
#include <utility>     // for swap()
using namespace std;

int str_length(char str[ ])
{
    int length = 0;
    for (int si = 0; str[si] != '\0'; si++)
        length++;
    return length;
}

void str_copy(char Destination[ ], char Src[ ])
{
    int di = 0;
    for (int si = 0; Src[si] != '\0'; si++, di++)
    {
        Destination[di] = Src[si];
    }
    Destination[di] = '\0';
}

int str_compare(char w1[ ], char w2[ ])
{
    int smaller = min(str_length(w1), str_length(w2));
    for (int i = 0; i <= smaller; i++)
    {
        if (w1[i] < w2[i])
            return -1;
        if (w1[i] > w2[i])
            return 1;
    }
    return 0;
}

void str_reverse(char S[ ])
{
    int si = 0,
        li = str_length(S) - 1;
    while (si < li)
    {
        swap(S[si], S[li]);
        si++;
        li--;
    }
}

// Assumes S[] holds at most 99 characters, so its copy fits in C[100]
bool isPalindrome(char S[ ])
{
    char C[100];
    str_copy(C, S);                  // C is a copy of S
    str_reverse(C);                  // reverse the copy
    return str_compare(C, S) == 0;   // palindrome if the reversed copy equals S
}

int main()
{
    char A[ ] = "refer";
    if (isPalindrome(A))
    {
        cout << A << " is a palindrome!!!" << endl;
    }
    else
    {
        cout << A << " is not a palindrome!!!" << endl;
    }
    return 0;
}

Inside isPalindrome(), we call the three functions we discussed above.

We call the str_copy() function to copy the contents of S into C. Then, we call the str_reverse() function to reverse the characters in the C array. Finally, the str_compare() function is called to compare the reversed C array with the original S array.

If the str_compare() function returns 0, it means that the two arrays are identical and the original string is a palindrome. In this case, the function returns true. If the str_compare() function returns a non-zero value, it means that the two arrays are not identical, and the original string is not a palindrome. In this case, the function returns false.