Benchmarking Our Tool

Learn how to determine the performance of our tool.

Before we start thinking about improving the performance of our tools or programs, we first need to determine what the current status is and define a baseline for comparison.

For this exercise, we’ll state that performance means how long the tool takes to process its workload. Perhaps it’s currently good enough, but we don’t know. To determine the current state, we need to measure it.

Time command

In the Linux/Unix world, the quickest way to determine how fast our application runs is to use the time command. The time command executes the application and prints out how long it took to run. For example, to measure how long our tool takes to process data from the two test files in the testdata directory, we run this command:

time ./colStats -op avg -col 3 testdata/example.csv testdata/example2.csv

The files have been updated for testing purposes.


module usercode/performance/colStats

go 1.16

In this example, it took 0.002 seconds to process those two files. The output line starting with real shows the total elapsed time.
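To make the idea of elapsed time concrete, here is a minimal Go sketch of what the real value represents: it starts a clock, runs a command, and reports how long the command took. The helper name timeCommand is hypothetical and not part of the course code, and the sketch only approximates the real line; the actual time command also reports user and sys times, which are omitted here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// timeCommand is a hypothetical helper (not part of colStats). It runs the
// named program with the given arguments and returns the wall-clock time
// that elapsed, which is comparable to the "real" line printed by time.
func timeCommand(name string, args ...string) (time.Duration, error) {
	cmd := exec.Command(name, args...)
	// Pass the program's output through so we still see its results.
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	start := time.Now()
	err := cmd.Run()
	return time.Since(start), err
}

func main() {
	// Expect the command to measure as arguments, for example:
	//   go run . ./colStats -op avg -col 3 testdata/example.csv
	if len(os.Args) < 2 {
		fmt.Println("usage: timecmd <program> [args...]")
		return
	}

	elapsed, err := timeCommand(os.Args[1], os.Args[2:]...)
	if err != nil {
		fmt.Fprintln(os.Stderr, "command failed:", err)
	}
	fmt.Printf("real %.3fs\n", elapsed.Seconds())
}
```

A single measurement like this includes process startup and can vary from run to run, so it works as a rough baseline rather than a precise benchmark.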

This value doesn’t look bad. In fact, if all we’re planning to do with this tool is process a few small files, then this is good enough, and we don’t need to do anything more. But let’s assume this tool will be used to process performance data coming from hundreds or thousands of files.

Benchmarking

When we’re benchmarking our tools or programs, it’s important to know our workload. Programs behave differently depending on the type of load they’re subjected to. Let’s change our example to process a thousand files at once. The code included with this course has a tarball file containing one thousand CSV files. We copy the file colStatsBenchmarkData.tar.gz to ...