Solution Review: Find the Highest Achiever

In this review, we give a detailed analysis of the solution to the problem of finding the highest achiever.

Solution: Merging Data Frames

main.r
findTopper <- function()
{
  # helper function that returns the index of the element that has the highest value
  findIndexWithMaxNum <- function(myVector)
  {
    maxNumber = -Inf # We want this to be the lowest possible value for comparison
    maxIndex = 0
    index = 1
    for(i in myVector)
    {
      if(maxNumber < i)
      {
        maxNumber = i # set the max element
        maxIndex = index # set the max element's index
      }
      index = index + 1
    }
    return(maxIndex) # return the index of the max element
  }

  # MAIN FILE HANDLING CODE
  mathData = read.csv("math.csv") # fetch data from math.csv
  englishData = read.csv("english.csv") # fetch data from english.csv
  scienceData = read.csv("science.csv") # fetch data from science.csv

  tempData <- merge(mathData, englishData) # merge the first two data frames
  finalData <- merge(tempData, scienceData) # merge the result with the remaining data frame
  print(finalData)

  result <- vector("numeric", 0) # vector to store the total marks of each student

  # loop over all the rows/students
  # NOTE: length() of a data frame is its number of columns, so this bound is correct
  # only when the number of columns equals the number of students; nrow(finalData)
  # is the safer bound in general
  for(student in 1:length(finalData))
  {
    temp <- 0.0 # temporarily stores the total marks of the current student
    # We iterate from 2 to ncol(finalData) because the 1st column is just names of students
    for(i in 2:ncol(finalData)) # loop over all the subject columns (math, english, science)
    {
      temp <- temp + as.double(finalData[student, i]) # add the current subject's marks
    }
    result <- c(result, temp) # store the total marks of the current student
  }
  return(findIndexWithMaxNum(result)) # return the index of the highest scoring student
}

# Driver Code
findTopper()

Explanation

Execution begins at the driver code, where the function findTopper() is called.

Inside findTopper(), the helper function findIndexWithMaxNum() is defined first. The actual work starts at the section marked # MAIN FILE HANDLING CODE.

Steps Performed:

  • Read all the subject files with read.csv() into the variables mathData, englishData and scienceData respectively. These variables are the data frames we work with.

Remember, read.csv() returns its data as a data frame, so no conversion is needed.

  • Merge the three data frames into one data frame. The code does this in two steps: first it merges mathData and englishData into tempData, then it merges tempData and scienceData into finalData. With no by argument, merge() joins on the columns the two data frames have in common.

  • Now that we have all the data compiled in one data frame finalData, we can begin performing our analysis on it.
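
The read-and-merge steps can be tried without the lesson's files. The following is a minimal sketch under stated assumptions: the column names (name, math, english, science) and all marks are made up, and read.csv(text = ...) stands in for reading the .csv files from disk.

```r
# Hypothetical data: the contents of the real math.csv, english.csv and
# science.csv are not shown in the lesson.
mathData    <- read.csv(text = "name,math\nAli,80\nBea,95\nCal,70")
englishData <- read.csv(text = "name,english\nBea,85\nCal,90\nAli,75")
scienceData <- read.csv(text = "name,science\nCal,60\nAli,88\nBea,92")

# merge() joins on the shared column (name), so the row order of the inputs does not matter
tempData  <- merge(mathData, englishData)
finalData <- merge(tempData, scienceData)

print(class(finalData))  # read.csv() and merge() both return data frames
print(finalData)
```

Each student ends up on a single row with one column per subject, even though the three inputs list the students in different orders.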

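  • We use a nested for loop to iterate over the whole data frame. The outer loop:

for(student in 1:length(finalData))

keeps track of the rows/students. Since the value of length(finalData) is 4, we are executing the loop from student 1 to student 4. Note that length() of a data frame returns its number of columns, so this bound covers every student only when the number of columns equals the number of students; nrow(finalData) gives the row count directly.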
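
Because a data frame is a list of columns, length() counts columns while nrow() counts rows. The sketch below uses a made-up data frame (the names and marks are hypothetical, not the lesson's data) with five students and four columns to show where the two differ.

```r
# Hypothetical marks for five students; not the lesson's data
marks <- data.frame(
  name    = c("Ali", "Bea", "Cal", "Dee", "Eli"),
  math    = c(80, 95, 70, 65, 90),
  english = c(75, 85, 90, 70, 60),
  science = c(88, 92, 60, 75, 85)
)

print(length(marks))  # number of columns
print(ncol(marks))    # same value as length()
print(nrow(marks))    # number of rows, i.e. students

# seq_len(nrow(marks)) visits every student, however many columns there are
for (student in seq_len(nrow(marks))) {
  print(marks[student, "name"])
}
```

With this data, a loop over 1:length(marks) would stop after the fourth student and never reach the fifth.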

The inner loop:

for(i in 2:ncol(finalData))

iterates over all the subject columns (math, english, science). Notice that we iterate from column 2 to ncol(finalData) because the 1st column holds just the names of the students. Inside this loop, we add up the marks of all subjects for the current student in temp; once the inner loop ends, that total is appended to result.
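
For comparison, the same per-student totals can be computed without explicit loops. This is an alternative sketch, not the lesson's solution: it uses base R's rowSums() on the mark columns, again on made-up data with hypothetical column names.

```r
# Hypothetical merged data frame: the first column holds names, the rest hold marks
finalData <- data.frame(
  name    = c("Ali", "Bea", "Cal"),
  math    = c(80, 95, 70),
  english = c(75, 85, 90),
  science = c(88, 92, 60)
)

# Skip the name column (column 1) and add up the remaining columns row by row
result <- rowSums(finalData[, 2:ncol(finalData)])

print(unname(result))  # one total per student, in row order
```

The nested loop in the solution spells out each step, which suits a lesson on loops; rowSums() is simply a shorter way to get the same vector of totals.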

The helper function findIndexWithMaxNum() returns the index of the largest number in a vector. We use it to find the index of the maximum element in the result vector, which is the row index of the highest-scoring student in finalData.
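
Base R also provides which.max(), which returns the index of the first maximum of a numeric vector. The sketch below, run on a made-up vector of totals, suggests that it gives the same index as a loop-based search written to mirror the helper. Using it is a possible shortcut, not what the lesson's solution does.

```r
# Loop-based search, written to mirror the lesson's helper
findIndexWithMaxNum <- function(myVector) {
  maxNumber <- -Inf  # lowest possible starting value for comparison
  maxIndex <- 0
  index <- 1
  for (i in myVector) {
    if (maxNumber < i) {
      maxNumber <- i     # remember the largest value seen so far
      maxIndex <- index  # and where it was found
    }
    index <- index + 1
  }
  maxIndex
}

totals <- c(243, 272, 220)  # hypothetical total marks, one per student

print(findIndexWithMaxNum(totals))
print(which.max(totals))
```

Both approaches report the first position when the maximum occurs more than once, because the helper only updates on a strictly larger value.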


In the next chapter, we discuss installing and loading packages in the R language.