Benchmarking Our Tool
Learn how to determine the performance of our tool.
Before we start thinking about improving the performance of our tools or programs, we first need to determine what the current status is and define a baseline for comparison.
For this exercise, we’ll state that performance means how long the tool takes to process its workload. Perhaps it’s currently good enough, but we don’t know. To determine the current state, we need to measure it.
Time command
In the Linux/Unix world, the quickest way to determine how fast our application works is by using the time command. The time command executes the application and prints out how long it took to run. For example, to measure how long our tool takes to process data from the two test files in the testdata directory, we run this command:
time ./colStats -op avg -col 3 testdata/example.csv testdata/example2.csv
The files have been updated for testing purposes.
Run the command above in a terminal to see how long it takes.
In this example, it took 0.002 seconds to process those two files. The output line starting with real shows the total elapsed time.
This value doesn’t look bad. In fact, if all we’re planning to do with this tool is process a few small files, then this is good enough, and we don’t need to do anything more. But let’s assume this tool will be used to process performance data coming from hundreds or thousands of files.
Benchmarking
When we’re benchmarking our tools or programs, it’s important to know our workload. Programs behave differently depending on the type of load they’re subjected to. Let’s change our example to process a thousand files at once. The code included with this course has a tarball file containing one thousand CSV files. We copy the file colStatsBenchmarkData.tar.gz to ...