Advanced Processing with awk

Learn how to process text with awk.

The awk command

The awk command lets us process text in several ways. We can use it to extract fields from a file, calculate totals, and even reorder a file’s output. It’s both a command-line utility and a scripting language with loops, variables, and conditionals.

Much like sed, awk behaves slightly differently depending on the version installed on our operating system. On Ubuntu, the awk that ships out of the box (typically mawk) is missing a few features of GNU awk, so we need to install gawk, the more full-featured version, before moving on:

$ sudo apt install gawk
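
Because behavior depends on which implementation the name awk points to, it can help to check before and after installing. The following is a minimal sketch, not an official procedure: the variable names awk_path and awk_real are made up for illustration, and it assumes a readlink that supports the -f option, as on Ubuntu.

```shell
# Find the awk on PATH, then follow any symlinks to the real binary.
awk_path=$(command -v awk)
awk_real=$(readlink -f "$awk_path")
printf 'awk resolves to: %s\n' "$awk_real"

# If gawk is installed, ask it to identify itself.
if command -v gawk >/dev/null 2>&1; then
    gawk --version | head -n 1
else
    echo "gawk is not installed yet"
fi
```

If the first line of output names something other than gawk, we can still call gawk explicitly by that name once it's installed.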

Let’s explore awk using some data from the 2010 U.S. Census that shows the population by state. First, we create the file population.txt with cat by running the following command in the terminal:

cat << 'EOF' > population.txt
State,Population,Reps
Alabama,4802982,7
California,37341989,53
Florida,18900773,27
Hawaii,1366862,2
Illinois,12864380,18
New York,19421055,27
South Dakota,819761,1
Wisconsin,5698230,8
Wyoming,568300,1
EOF

The legislative branch of the United States government has two houses: the Senate and the House of Representatives. The Senate consists of 100 seats, with two members from each of the 50 states. The House of Representatives has 435 members, but the number from each state is based on the state’s population, as recorded by the most recent census. As we can see from the data we have here, California has 53 members of the House of Representatives and is the most populous state. By contrast, Wyoming has only one member.

We’ll use awk to view, transform, and manipulate this data.

First, let’s look at how to print a specific line of the file. We use this command to print out the line associated with Wisconsin:

$ awk '/Wisconsin/' population.txt

Try running this command in a terminal.
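
When a pattern has no action block, awk falls back to its default action, which is to print the whole matching line. As a minimal sketch, the commands below should all produce the same line. The variable names and the two sample records piped in with printf are only for illustration and stand in for population.txt.

```shell
# Two sample records standing in for population.txt (illustrative only).
sample='South Dakota,819761,1
Wisconsin,5698230,8'

# Pattern only: awk applies its default action and prints the matching line.
implicit=$(printf '%s\n' "$sample" | awk '/Wisconsin/')

# The same thing with the action written out; $0 is the whole current line.
explicit=$(printf '%s\n' "$sample" | awk '/Wisconsin/ { print $0 }')

# For a plain match like this one, grep gives the same result.
grepped=$(printf '%s\n' "$sample" | grep 'Wisconsin')

printf '%s\n' "$implicit" "$explicit" "$grepped"
```

Writing the action out explicitly is more typing, but it gives us a place to put other instructions besides printing the entire line.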

Useful switches and operations

That isn’t very useful on its own, since grep would have given us the same result. However, we can use awk to manipulate the data further. Let’s try this command, which prints just the name of the state from that line:

$ awk -F ',' '/Wisconsin/ {print
...