Advanced Processing with awk
Learn how to process text with awk.
The awk command
The awk command lets us process text in several ways. We can use it to extract fields in a file, calculate totals, and even change the order of the output of a file. The awk command is a command-line utility. It’s also a scripting language with loops, variables, and conditionals.
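To make the "scripting language" part concrete, here is a minimal sketch that uses a variable, a loop, and a conditional in one awk program. The input line and the variable names (total, result) are made up for illustration and aren't part of the census data used later.

```shell
# Sum only the fields greater than 5 on a single input line.
result=$(echo '4 8 15' | awk '{
  total = 0                      # a variable
  for (i = 1; i <= NF; i++) {    # a loop over every field on the line
    if ($i > 5) total += $i      # a conditional
  }
  print total
}')
echo "$result"
```

Here NF is awk's built-in count of fields on the current line, and $i refers to the field at position i.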
Much like sed, awk works slightly differently depending on the version installed on our operating system. On Ubuntu, the awk command that ships out of the box is missing a few features, so we need to install a more full-featured version before moving on:
$ sudo apt install gawk
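To see which implementation the awk command currently resolves to, one option is to follow its symlink chain. This is a sketch that assumes a Debian-style system such as Ubuntu, where awk is typically a symlink managed by the alternatives system, and a readlink that supports -f (as the GNU version does). The variable names awk_path and awk_impl are only illustrative.

```shell
# Locate awk on the PATH, then resolve any symlinks to the real binary.
awk_path=$(command -v awk)
awk_impl=$(readlink -f "$awk_path")

# On Ubuntu this typically ends in mawk before the install and gawk after it
# (an assumption about the default setup; other systems may differ).
echo "awk resolves to: $awk_impl"
```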
Let’s explore awk by using some data from the 2010 U.S. Census that shows the population by state. We first create the file population.txt with cat by running the following command:
cat << 'EOF' > population.txt
State,Population,Reps
Alabama,4802982,7
California,37341989,53
Florida,18900773,27
Hawaii,1366862,2
Illinois,12864380,18
New York,19421055,27
South Dakota,819761,1
Wisconsin,5698230,8
Wyoming,568300,1
EOF
The legislative branch of the United States government has two houses: the Senate and the House of Representatives. The Senate consists of 100 seats, with two members from each of the 50 states. The House of Representatives has 435 members, but the number from each state is based on the state’s population, as recorded by the most recent census. As we can see from the data we have here, California has 53 members of the House of Representatives and is the most populous state. By contrast, Wyoming has only one member.
We’ll use awk to view, transform, and manipulate this data.
First, let’s look at how to print a specific line of the file. We use this command to print out the line associated with Wisconsin:
$ awk '/Wisconsin/' population.txt
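When a pattern is given without an action, awk falls back to its default action, which is to print the whole matching line. The sketch below shows the short form next to the explicit form. So that it runs on its own, it uses a three-line stand-in file; the file name sample.txt and the variable names short and long are hypothetical.

```shell
# Create a small stand-in file (sample.txt is a hypothetical name used only here).
printf 'State,Population,Reps\nWisconsin,5698230,8\nWyoming,568300,1\n' > sample.txt

# A pattern with no action: awk prints every line that matches /Wisconsin/.
short=$(awk '/Wisconsin/' sample.txt)

# The same command with the default action written out explicitly.
long=$(awk '/Wisconsin/ { print $0 }' sample.txt)

echo "$short"
echo "$long"
```

Both commands print the same line, because $0 stands for the entire current record.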
Useful switches and operations
Printing a matching line isn’t very useful on its own, since grep would give us the same result. However, we can use awk to manipulate the data further. Let’s try this command, which prints just the name of the state from that line:
$ awk -F ',' '/Wisconsin/ {print
...