Case Study: Street Addresses
This series of examples was inspired by a real-life problem I had in my day job several years ago, when I needed to scrub and standardize street addresses exported from a legacy system before importing them into a newer system. (See, I don’t just make this stuff up; it’s actually useful.) This example shows how I approached the problem.
s = '100 NORTH MAIN ROAD'
print(s.replace('ROAD', 'RD.'))                #①
#'100 NORTH MAIN RD.'
s = '100 NORTH BROAD ROAD'
print(s.replace('ROAD', 'RD.'))                #②
#'100 NORTH BRD. RD.'
print(s[:-4] + s[-4:].replace('ROAD', 'RD.'))  #③
#'100 NORTH BROAD RD.'
import re                                      #④
print(re.sub('ROAD$', 'RD.', s))               #⑤
#'100 NORTH BROAD RD.'
① My goal is to standardize a street address so that 'ROAD' is always abbreviated as 'RD.'. At first glance, I thought this was simple enough that I could just use the string method replace(). After all, all the data was already uppercase, so case mismatches would not be a problem. And the search string, 'ROAD', was a constant. And in this deceptively simple example, s.replace() does indeed work.
② Life, unfortunately, is full of counterexamples, and I quickly discovered this one. The problem here is that 'ROAD' appears twice in the address, once as part of the street name 'BROAD' and once as its own word. The replace() method ...
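To make the failure in ② concrete, here is a minimal sketch that counts how often 'ROAD' occurs in the second address and compares the plain replace() call with the end-anchored pattern from the listing. The variable names (address, occurrences, naive, anchored) are illustrative and not part of the original example.

```python
import re

# Illustrative variable names; the address is the one from the listing.
address = '100 NORTH BROAD ROAD'

# str.replace() substitutes every non-overlapping occurrence of the substring,
# so the 'ROAD' inside 'BROAD' is rewritten along with the final word.
occurrences = address.count('ROAD')
naive = address.replace('ROAD', 'RD.')

# Anchoring the pattern with $ restricts the match to the end of the string.
anchored = re.sub('ROAD$', 'RD.', address)

print(occurrences)  # 2
print(naive)        # 100 NORTH BRD. RD.
print(anchored)     # 100 NORTH BROAD RD.
```

The count of 2 is the whole story: replace() has no notion of word position, whereas the $ anchor ties the match to one specific place in the string.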