More Bad Input
Now that the from_roman() function works properly with good input, it’s time to fit in the last piece of the puzzle: making it work properly with bad input. That means finding a way to look at a string and determine if it’s a valid Roman numeral. This is inherently more difficult than validating numeric input in the to_roman() function, but you have a powerful tool at your disposal: regular expressions. (If you’re not familiar with regular expressions, now would be a good time to read the regular expressions chapter.)
As you saw in Case Study: Roman Numerals, there are several simple rules for constructing a Roman numeral, using the letters M, D, C, L, X, V, and I. Let’s review the rules:
- Sometimes characters are additive. I is 1, II is 2, and III is 3. VI is 6 (literally, “5 and 1”), VII is 7, and VIII is 8.
- The tens characters (I, X, C, and M) can be repeated up to three times. At 4, you need to subtract from the next highest fives character. You can’t represent 4 as IIII; instead, it is represented as IV (“1 less than 5”). 40 is written as XL (“10 less than 50”), 41 as XLI, 42 as XLII, 43 as XLIII, and then 44 as XLIV (“10 less than 50, then 1 less than 5”).
- Sometimes characters are… the opposite of additive. By putting certain characters before others, you subtract from the final value. For example, at 9, you need to subtract from the next highest tens character: 8 is VIII, but 9 is IX (“1 less than 10”), not VIIII (since the I character cannot be repeated four times). 90 is XC, 900 is CM.
- The fives characters cannot be repeated. 10 is always represented as X, never as VV. 100 is always C, never LL.
- Roman numerals are read left to right, so the order of characters matters very much. DC is 600; CD is a completely different number (400, “100 less than 500”). CI is 101; IC is not even a valid Roman numeral (because you can’t subtract 1 directly from 100; you would need to write 99 as XCIX, “10 less than 100, then 1 less than 10”).
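The rules above can be sketched as a single regular expression, one group per decimal place. This is only a minimal sketch, not necessarily the pattern that from_roman() ends up using: the names ROMAN_NUMERAL_PATTERN and is_valid_roman are hypothetical, and the cap of three M characters (values up to 3999) is an assumption.

```python
import re

# Hypothetical pattern: one group per decimal place, following the rules above.
ROMAN_NUMERAL_PATTERN = re.compile(r"""
    M{0,3}              # thousands: 0 to 3 Ms (assumes a 3999 upper limit)
    (CM|CD|D?C{0,3})    # hundreds: 900, 400, or optional D plus 0 to 3 Cs
    (XC|XL|L?X{0,3})    # tens: 90, 40, or optional L plus 0 to 3 Xs
    (IX|IV|V?I{0,3})    # ones: 9, 4, or optional V plus 0 to 3 Is
    """, re.VERBOSE)


def is_valid_roman(s):
    """Return True if s is a well-formed Roman numeral under the rules above."""
    # Every part of the pattern is optional, so the empty string would match;
    # reject it explicitly.
    return bool(s) and ROMAN_NUMERAL_PATTERN.fullmatch(s) is not None
```

With this sketch, is_valid_roman('XLIV') is True, while 'IIII', 'VV', and 'IC' are all rejected. Using fullmatch() means the whole string has to fit the pattern, so trailing characters cannot slip through.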
Given these rules, one useful test would be to ensure that the from_roman() function fails when you pass it a string with too many repeated numerals. How many is “too many” depends on the numeral.
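A test along those lines might look like the following sketch. The exception name InvalidRomanNumeralError, the MAX_RUN table, and the test class name are assumptions made for illustration, and the from_roman() shown here is only a stand-in that enforces the repetition rule so that the example runs on its own; it is not the real implementation.

```python
import itertools
import unittest


class InvalidRomanNumeralError(ValueError):
    """Hypothetical exception for malformed Roman numerals."""


# Longest run allowed per character: tens characters may repeat up to
# three times, fives characters may not repeat at all.
MAX_RUN = {'M': 3, 'C': 3, 'X': 3, 'I': 3, 'D': 1, 'L': 1, 'V': 1}
VALUES = {'M': 1000, 'D': 500, 'C': 100, 'L': 50, 'X': 10, 'V': 5, 'I': 1}


def from_roman(s):
    """Stand-in converter: checks only the repetition rule, then sums values."""
    for char, run in itertools.groupby(s):
        if char not in MAX_RUN or len(list(run)) > MAX_RUN[char]:
            raise InvalidRomanNumeralError('Invalid Roman numeral: {}'.format(s))
    total = 0
    for i, char in enumerate(s):
        value = VALUES[char]
        # A smaller value placed before a larger one is subtracted.
        if i + 1 < len(s) and value < VALUES[s[i + 1]]:
            total -= value
        else:
            total += value
    return total


class FromRomanBadInput(unittest.TestCase):
    def test_too_many_repeated_numerals(self):
        """from_roman should fail with too many repeated numerals"""
        for s in ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
            with self.subTest(numeral=s):
                self.assertRaises(InvalidRomanNumeralError, from_roman, s)
```

Saved in a file, this can be run with python -m unittest. The test strings cover each numeral exactly once: four of a tens character, or two of a fives character.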