String Processing
Learn to process strings using different string operations.
String processing
String processing refers to the manipulation and analysis of strings. It encompasses operations such as searching, sorting, concatenation, tokenization, parsing, and pattern matching. These operations are essential for extracting insights, interpreting human languages, validating and cleaning data, compressing data, encrypting and decrypting data, integrating data, and generating text.
Finding string length
Let’s make a function, str_length():
int str_length(char str[ ]);
In this function, we’ll pass the string as an argument, count its characters using a loop, and return the count from the function. The loop terminates when the character at the current index is the null character '\0', whose integer value is 0.
The complete code is given below:
#include <iostream>
using namespace std;

int str_length(char str[ ])
{   // measures the length of the string excluding '\0'
    int length=0;
    for(int si=0; str[si]!='\0'; si++)
        length++;
    /* the above code can also be written as:
    int length=0, si=0;
    while(str[si]) // This loop terminates when str[si] is 0, i.e., '\0'
    {              // note: '\0' has an ASCII value of 0
        si++;
        length++;
    }
    */
    return length;
}

int main()
{
    char A[ ]={"Hello"};
    char B[ ]={"Educative"};
    // sizeof() also counts the terminating '\0', so it is one more than the length
    cout << "The length of '"<<A<<"' : \t"<<str_length(A)<< "\t sizeof(A): "<<sizeof(A)<<endl;
    cout << "The length of '"<<B<<"' : \t"<<str_length(B)<< "\t sizeof(B): "<<sizeof(B)<<endl;
    return 0;
}
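As a cross-check, the hand-written loop can be compared against the standard library's std::strlen. The sketch below assumes a const-qualified parameter (so that string literals can be passed directly) and a std::size_t return type. Neither is part of the lesson's version, and matches_strlen() is a hypothetical helper added only for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical const-correct variant of str_length():
// counts characters up to, but not including, the terminating '\0'.
std::size_t str_length(const char str[]) {
    std::size_t length = 0;
    while (str[length] != '\0') {
        length++;
    }
    return length;
}

// Hypothetical helper: true when the loop above agrees with std::strlen.
bool matches_strlen(const char str[]) {
    return str_length(str) == std::strlen(str);
}
```

For an array declared as char A[ ] = "Hello", str_length(A) is 5 while sizeof(A) is 6, because sizeof also counts the terminating null character.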
String copying
Let’s make a str_copy() function that takes two character arrays as parameters:
void str_copy(char Destination[ ], char Src[ ])
This function assumes that Destination[ ] has enough space to hold the contents of Src[ ], including the terminating null character. We iterate over Src[ ] until we reach the null character \0, and in each iteration we assign the current character of Src[ ] to the same index of Destination[ ]. Lastly, we write the null character to Destination[ ] right after the last copied character.
The complete code is given below:
#include <iostream>
#include <string.h>
using namespace std;

// This function assumes that Destination has ample space to hold a copy of Src
void str_copy(char Destination[ ], char Src[ ])
{
    int di=0;
    // The condition can also be written as Src[si]!=0, as the ASCII value of '\0' is 0
    for(int si=0; Src[si]!='\0'; si++, di++)
    {
        Destination[di] = Src[si];
    }
    Destination[di] = '\0';
}

int main()
{
    char A[ ]={"Hello "};   // 7 bytes: 6 characters plus '\0'
    char B[ ]={"Turing"};   // 7 bytes, so it fits in A exactly
    // A = B; // is an error: arrays cannot be assigned
    str_copy(A, B);
    cout << "A: "<<A<<endl;
    cout << "B: "<<B<<endl;
    return 0;
}
String concatenation
Let’s make a function, str_cat(), that takes two character arrays as parameters:
void str_cat(char Destination[ ], char Src[ ])
Here, we want to append Src[ ] to Destination[ ]. First, we calculate the length of the string in Destination[ ] and store it in the di variable, which is the index of its null character. We then iterate over Src[ ] until we reach the null character \0, and in each iteration we write the current character of Src[ ] to Destination[di] and advance di. In the end, we write the null character to Destination[ ] right after the last appended character.
The str_cat() function assumes that Destination[ ] has enough space for Src[ ] to be copied after the existing contents of Destination[ ].
#include <iostream>
#include <string.h>
using namespace std;

int str_length(char str[ ])
{
    int length=0;
    for(int si=0; str[si]!='\0'; si++)
        length++;
    return length;
}

void str_cat(char Destination[ ], char Src[ ])
{
    int di=str_length(Destination); // di holds the index where the null character is placed.
    for(int si=0; Src[si]!=0; si++, di++)
    {
        Destination[di] = Src[si]; /* Note: Destination[di] starts writing right after
                                      the text already present in Destination[ ] */
    }
    Destination[di] = '\0';
}

int main()
{
    char B[ ]={"Turing"};
    char C[ 100 ] = {"Hello "}; // large enough to hold both strings
    str_cat(C, B);
    cout << "C: "<<C<<endl;
    return 0;
}
String comparison
Let’s make a str_compare() function that compares two strings in dictionary order:
return 0: If both words are the same.
return -1: If the first word comes before the second word.
return 1: If the second word comes before the first word.
#include <iostream>
#include <string.h>
#include <algorithm> // for min()
using namespace std;

int str_length(char str[ ])
{
    int length=0;
    for(int si=0; str[si]!='\0'; si++)
        length++;
    return length;
}

int str_compare(char w1[ ], char w2[])
{
    int smaller = min(str_length(w1), str_length(w2));
    for(int i=0;i<=smaller; i++) // look carefully: the loop condition is <=, so when one
    {                            // word is a prefix of the other, the '\0' (ASCII 0) of
                                 // the shorter word is compared with a character of the
                                 // longer one, and the shorter word is ordered first
        if(w1[i] < w2[i])
            return -1;
        if(w1[i] > w2[i])
            return 1;
    }
    return 0; // both the words are equal.
}

int main()
{
    char A[ ] = "cat";
    char B[ ] = "cat";
    char C[ ] = "cattle";
    char D[ ] = "dog";
    cout << A << " vs "<<B<<" : "<<str_compare(A, B)<<endl;
    cout << A << " vs "<<C<<" : "<<str_compare(A, C)<<endl;
    cout << C << " vs "<<A<<" : "<<str_compare(C, A)<<endl;
    cout << C << " vs "<<D<<" : "<<str_compare(C, D)<<endl;
    cout << D << " vs "<<C<<" : "<<str_compare(D, C)<<endl;
    return 0;
}
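The comparison can also be done in a single pass, without computing either length first. The sketch below is one possible variant, assuming const-qualified parameters and comparing characters as unsigned char, which is how std::strcmp orders them. The helpers sign_of() and agrees_with_strcmp() are hypothetical names used only to cross-check the result.

```cpp
#include <cstring>

// Single-pass variant of str_compare(): walks both strings together and
// stops at the first difference or at the terminator of w1.
int str_compare(const char w1[], const char w2[]) {
    int i = 0;
    while (w1[i] != '\0' && w1[i] == w2[i]) {
        i++;
    }
    // Compare as unsigned char so that bytes above 127 order consistently.
    unsigned char c1 = static_cast<unsigned char>(w1[i]);
    unsigned char c2 = static_cast<unsigned char>(w2[i]);
    if (c1 < c2) return -1;
    if (c1 > c2) return 1;
    return 0;
}

// Hypothetical helper: reduces any integer to -1, 0, or 1.
int sign_of(int value) {
    return (value > 0) - (value < 0);
}

// Hypothetical helper: true when str_compare() agrees in sign with std::strcmp.
bool agrees_with_strcmp(const char w1[], const char w2[]) {
    return str_compare(w1, w2) == sign_of(std::strcmp(w1, w2));
}
```

Note that std::strcmp only guarantees a negative, zero, or positive result, not exactly -1, 0, or 1, which is why the cross-check normalizes its return value.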
String reversal
Let’s make a str_reverse() function in which we’ll simply swap the character at the last index with the character at the first index, the second last with the second, and so on, until the two indices meet in the middle.
#include <iostream>
#include <string.h>
#include <algorithm> // for swap()
using namespace std;

int str_length(char str[ ])
{
    int length=0;
    for(int si=0; str[si]!='\0'; si++)
        length++;
    return length;
}

void str_reverse(char S[ ])
{
    int si = 0,li = str_length(S)-1; // li is the index of the last legal character
    while(si<li)
    {
        swap(S[si],S[li]);
        si++;
        li--;
    }
}

int main()
{
    char R[ ] = "This is a cat.";
    str_reverse(R);
    cout<<"Reverse: "<<R;
    return 0;
}
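For comparison, the standard library can perform the same in-place reversal with std::reverse from <algorithm>. This is a minimal sketch rather than a replacement for the exercise, and the name str_reverse_std() is a hypothetical one chosen here.

```cpp
#include <algorithm>
#include <cstring>

// Hypothetical wrapper: reverses a null-terminated string in place.
// The range [S, S + length) excludes the terminating '\0', so it stays at the end.
void str_reverse_std(char S[]) {
    std::reverse(S, S + std::strlen(S));
}
```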
The isPalindrome() function
We have already discussed palindromic numbers in previous lessons. A string is a palindrome if it reads the same forward and backward. Let’s reuse the functions above to write isPalindrome() concisely. Look at the code below:
#include <iostream>
#include <string.h>
#include <algorithm> // for min() and swap()
using namespace std;

int str_length(char str[ ])
{
    int length=0;
    for(int si=0; str[si]!='\0'; si++)
        length++;
    return length;
}
void str_copy(char Destination[ ], char Src[ ])
{
    int di=0;
    for(int si=0; Src[si]!='\0'; si++, di++)
    {
        Destination[di] = Src[si];
    }
    Destination[di] = '\0';
}
int str_compare(char w1[ ], char w2[])
{
    int smaller = min(str_length(w1), str_length(w2));
    for(int i=0;i<=smaller; i++)
    {
        if(w1[i] < w2[i])
            return -1;
        if(w1[i] > w2[i])
            return 1;
    }
    return 0;
}
void str_reverse(char S[ ])
{
    int si = 0,li = str_length(S)-1;
    while(si<li)
        swap(S[si], S[li]), si++, li--;
}

bool isPalindrome(char S[ ])
{
    char C[100]; // assumes S has fewer than 100 characters
    str_copy(C, S);
    str_reverse(C);
    return str_compare(C, S)==0;
}

int main()
{
    char A[ ] = "refer";
    if(isPalindrome(A))
    {
        cout << A<<" is a palindrome!!!"<<endl;
    }
    else
    {
        cout << A<<" is not a palindrome!!!"<<endl;
    }
    return 0;
}
Inside isPalindrome(), we call three of the functions we have discussed above.
We call the str_copy() function to copy the contents of S into C. Then, we call the str_reverse() function to reverse the characters in the C array. Finally, the str_compare() function is called to compare the reversed C array with the original S array.
If the str_compare() function returns 0, it means that the two arrays are identical and the original string is a palindrome. In this case, the function returns true. If the str_compare() function returns a non-zero value, it means that the two arrays are not identical, and the original string is not a palindrome. In this case, the function returns false.
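The copy-and-reverse approach needs a scratch buffer. As an alternative sketch, the same check can be done with two indices moving inward from both ends, which avoids the copy entirely. The name is_palindrome_in_place() and the const-qualified parameter are assumptions for illustration, not part of the lesson.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical copy-free variant: compares characters from both ends and
// moves inward, so no scratch buffer (and no length limit) is needed.
bool is_palindrome_in_place(const char S[]) {
    std::size_t length = std::strlen(S);
    if (length == 0) {
        return true; // an empty string reads the same in both directions
    }
    std::size_t si = 0;
    std::size_t li = length - 1;
    while (si < li) {
        if (S[si] != S[li]) {
            return false; // first mismatch settles the answer
        }
        si++;
        li--;
    }
    return true;
}
```

Like isPalindrome(), this comparison is case-sensitive and treats spaces and punctuation as ordinary characters.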