Solution Review: Find the Highest Achiever
In this review, we give a detailed analysis of the solution to the problem of finding the highest achiever.
Solution: Merging Data Frames
findTopper <- function()
{
  findIndexWithMaxNum <- function(myVector) # helper function that returns the index
                                            # of the element that has the highest value
  {
    maxNumber = -Inf # We want this to be the lowest possible value for comparison
    maxIndex = 0
    index = 1
    for(i in myVector)
    {
      if(maxNumber < i)
      {
        maxNumber = i # set the max element
        maxIndex = index # set the max element's index
      }
      index = index + 1
    }
    return(maxIndex) # return the index of the max element
  }


  # MAIN FILE HANDLING CODE
  mathData = read.csv("math.csv") # fetch data from math.csv
  englishData = read.csv("english.csv") # fetch data from english.csv
  scienceData = read.csv("science.csv") # fetch data from science.csv

  tempData <- merge(mathData, englishData) # we use the merge function on data frames
  finalData <- merge(tempData, scienceData) # another merge function to merge the remaining data frame
  print(finalData)

  result <- vector("numeric", 0) # vector to store the total marks of each student
  for(student in 1:nrow(finalData)) # loop over all the rows/students
  {
    # nrow() is used above because length() of a data frame counts columns, not rows
    temp <- 0.0 # temporarily stores the total marks of the current student
    for(i in 2:ncol(finalData)){ # loop over all the columns (math, english, science)
      # We iterate from 2 to ncol(finalData) because the 1st column is just names of students
      temp <- temp + as.double(finalData[student, i]) # fetch respective student's marks
    }
    result <- c(result, temp) # store the total marks of the current student
  }

  return(findIndexWithMaxNum(result)) # return the index of the highest scoring student
}

# Driver Code
findTopper()
Explanation
Execution starts at line number 47, where the function findTopper() is called. Inside the function, the main file handling code starts at line number 22.
Steps Performed:
- Line number 23-25: Read the three subject files into the variables mathData, englishData, and scienceData, respectively. Remember, data fetched from a .csv file with read.csv() is already in the form of a data frame, so these variables are the data frames we work with.
- Line number 27-28: Merge the three data frames into one data frame. Because merge() combines two data frames at a time, the merge is done in two steps: first merge mathData and englishData and save the result in tempData, then merge tempData and scienceData into finalData.
- Now that we have all the data compiled in one data frame, finalData, we can begin performing our analysis on it.
- Line number 32-43: We use a nested for loop to iterate over the whole data frame. The outer loop keeps track of the rows/students, so it must run from student 1 to student nrow(finalData). Note that length(finalData) returns the number of columns of a data frame, not the number of rows, so a bound of 1:length(finalData) visits every student only when the two counts happen to be equal.
The inner loop, for(i in 2:ncol(finalData)), iterates over all the subject columns (math, english, science). Notice that we iterate from column 2 to ncol(finalData) because column 1 holds only the names of the students.
Then we add the marks of all subjects of each student. The loop can be illustrated as follows:
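As a minimal sketch, the merge and the row-by-row totalling can be tried on three small made-up data frames that stand in for math.csv, english.csv, and science.csv. The student names, the marks, and the shared column name (name) are assumptions for illustration only; the lesson's actual files may look different.

```r
# Hypothetical stand-ins for the three CSV files (not the lesson's real data)
mathData    <- data.frame(name = c("Ann", "Bob", "Cid"), math    = c(90, 70, 80))
englishData <- data.frame(name = c("Ann", "Bob", "Cid"), english = c(60, 95, 85))
scienceData <- data.frame(name = c("Ann", "Bob", "Cid"), science = c(75, 80, 90))

# merge() joins two data frames on their common column(s), here "name"
tempData  <- merge(mathData, englishData)
finalData <- merge(tempData, scienceData)

# Total the marks of each student, one row at a time
result <- vector("numeric", 0)
for (student in 1:nrow(finalData)) {
  temp <- 0.0
  for (i in 2:ncol(finalData)) {   # column 1 holds the names
    temp <- temp + as.double(finalData[student, i])
  }
  result <- c(result, temp)
}

print(finalData)
print(result)
```

In this sketch finalData has 3 rows and 4 columns, so nrow(finalData) and length(finalData) differ, which is why the row count is the bound used for the outer loop.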
The helper function findIndexWithMaxNum() returns the index of the largest number in a vector. We use it to find the index of the maximum element in the result vector, which corresponds to the row of the highest-scoring student in finalData.
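Base R also provides which.max(), which returns the index of the first maximum of a numeric vector, so it can serve as a cross-check for the helper. The sketch below reproduces the helper so that it runs on its own; the totals vector is made up for illustration.

```r
# The helper from the solution, reproduced so this sketch is self-contained
findIndexWithMaxNum <- function(myVector) {
  maxNumber <- -Inf   # lowest possible starting value for the comparison
  maxIndex <- 0
  index <- 1
  for (i in myVector) {
    if (maxNumber < i) {
      maxNumber <- i
      maxIndex <- index
    }
    index <- index + 1
  }
  return(maxIndex)
}

totals <- c(225, 245, 255, 240)   # hypothetical total marks

helperIndex  <- findIndexWithMaxNum(totals)
builtinIndex <- which.max(totals)

print(helperIndex)
print(builtinIndex)
```

Because the helper uses a strict < comparison, both approaches report the first position of the maximum when several students share the top total.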
In the next chapter, we discuss installing and loading packages in R.