Case study: Parsing Phone Numbers
So far you’ve concentrated on matching whole patterns. Either the pattern matches, or it doesn’t. But regular expressions are much more powerful than that. When a regular expression does match, you can pick out specific pieces of it. You can find out what matched where.
This example came from another real-world problem I encountered, again from a previous day job. The problem: parsing an American phone number. The client wanted to be able to enter the number free-form (in a single field), but then wanted to store the area code, trunk, number, and optionally an extension separately in the company’s database. I scoured the Web and found many examples of regular expressions that purported to do this, but none of them were permissive enough.
\d matches any numeric digit (0–9). \D matches anything but digits.
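As a quick illustration of these two character classes, here is a minimal sketch using `re.findall` from Python's standard `re` module. The sample strings are borrowed from the list of phone numbers in this section, and the variable names are only for illustration.

```python
import re

# \d picks out every digit in the string, one character at a time.
digits = re.findall(r"\d", "800-555-1212")
print(digits)      # ['8', '0', '0', '5', '5', '5', '1', '2', '1', '2']

# \D picks out everything that is NOT a digit: punctuation, spaces, letters.
non_digits = re.findall(r"\D", "(800) 555-1212")
print(non_digits)  # ['(', ')', ' ', '-']
```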
Here are the phone numbers I needed to be able to accept:
- 800-555-1212
- 800 555 1212
- 800.555.1212
- (800) 555-1212
- 1-800-555-1212
- 800-555-1212-1234
- 800-555-1212x1234
- 800-555-1212 ext. 1234
- work 1-(800) 555.1212 #1234
Quite a variety! In each of these cases, I need to know that the area code was 800, the trunk was 555, and the rest of the phone number was 1212. For those with an extension, I need to know that the extension was 1234.
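One way to see why this is tractable is to strip every non-digit character from each sample. The sketch below is only a sanity check, not the parsing solution itself: the helper name `digits_only` and the `SAMPLES` list are hypothetical, introduced here for illustration. It suggests that, once the separators are ignored, every input reduces to the same ten digits, optionally preceded by a leading 1 and optionally followed by the extension.

```python
import re

# The phone numbers listed above, exactly as a user might type them.
SAMPLES = [
    "800-555-1212",
    "800 555 1212",
    "800.555.1212",
    "(800) 555-1212",
    "1-800-555-1212",
    "800-555-1212-1234",
    "800-555-1212x1234",
    "800-555-1212 ext. 1234",
    "work 1-(800) 555.1212 #1234",
]

def digits_only(text):
    """Remove every character that \\D matches, leaving only the digits."""
    return re.sub(r"\D", "", text)

for sample in SAMPLES:
    print(f"{sample!r:32} -> {digits_only(sample)}")
```

Stripping the separators this way throws away the boundaries between area code, trunk, number, and extension, so it cannot replace a pattern that captures each piece separately. It only shows that the variation between the samples lives entirely in the non-digit characters.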
Let’s work through developing a solution for phone number parsing. This example shows the first step.