Handling Diacritics
Let’s look at how we’ll solve the problem of handling diacritics.
The problem at hand
If we play a few games in Spanish or French, we’ll notice that the letters “e” and “é” are different. The general problem is how to handle diacritics, including accents, umlauts, and so on. In the current version of the game, when the secret word includes a diacritic, guessing the word spelled with “e” instead of “é” counts as a bad guess. Is that the desired behavior?
Imagine that we’ve designed this web application and are now halfway through developing it, and realize that there are issues appearing across languages. This is a good thought exercise because surprises will inevitably arise in the development of any new and moderately complex application.
Possible solutions
First, we should consider what is affected by this new information. The database already stores accents without issue. The API needs to know how to match letters, and whether or not it should require that the accents match as well. If the letters with and without accents are different, then the client needs to know what all of the possibilities are so that it can provide a button for each one. The problem seems to have at least three possible solutions:
- Only allow secret words that have no accents.
- Treat each letter with a diacritic as a distinct letter.
- Ignore diacritics when determining if a guess is correct.
In a real-world situation, the question would come down to what the users prefer. In a hobby project such as this, it is up to the hobbyist.
- Option 1: This would complicate the example generation but might be a good solution if diacritics are uncommon, or if we’re most interested in the simplicity of the solution.
- Option 2: This is the simplest solution for the database and API since it is what is already coded. However, the client will be more complicated since the number and set of buttons will vary from language to language. Experience suggests that implementing the client will generally already be the most complex task, so further complicating it may not be desirable.
- Option 3: This follows what many people would expect when playing hangman. On the other hand, it raises the question of how to handle it. A quick web search suggests that the Python module unidecode will do the trick.
There’s no right or wrong answer, only answers that are more appropriate to different goals. In this case, we’ll follow option 3.
Our implementation
To ignore diacritics, we install the unidecode module using the following command:
pipenv install unidecode
Then we import the function unidecode from the module of the same name, that is, from unidecode import unidecode. When the letters are compared, we compare them after the function is applied, but when they are displayed, we display their original form.
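As a rough illustration of comparing letters after unidecode is applied while displaying them in their original form, here is a minimal sketch. The helper names fold, is_good_guess, and masked_word are hypothetical and not taken from the course code, and the standard-library fallback is an assumption added only so the sketch still runs where unidecode is not installed.

```python
import unicodedata

try:
    # The function the lesson imports after `pipenv install unidecode`.
    from unidecode import unidecode
except ImportError:
    # Assumption for this sketch only: a standard-library stand-in that
    # decomposes characters and drops the combining marks (accents).
    def unidecode(text: str) -> str:
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(text: str) -> str:
    """Return text without diacritics and in lowercase, for comparison only."""
    return unidecode(text).lower()


def is_good_guess(secret_word: str, guess: str) -> bool:
    """A guess is good if it matches any secret letter once both are folded."""
    folded_guess = fold(guess)
    return any(fold(letter) == folded_guess for letter in secret_word)


def masked_word(secret_word: str, guesses, placeholder: str = "_") -> str:
    """Reveal guessed letters in their original, accented form."""
    folded_guesses = {fold(guess) for guess in guesses}
    return "".join(
        letter if fold(letter) in folded_guesses else placeholder
        for letter in secret_word
    )


if __name__ == "__main__":
    print(is_good_guess("café", "e"))   # True
    print(masked_word("café", ["e"]))   # ___é
```

The folding happens only at comparison time, so the stored secret word is never modified and the player still sees “é” once it has been guessed with “e”.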
This is accomplished by removing: