Imagine you’re developing a word game application where users can input words, and the application needs to determine if a given string exists within the collection of entered words. Additionally, users may employ wildcard characters (‘.’) to represent any letter, allowing for flexible pattern matching.
Example:
Suppose a player adds words like “apple,” “banana,” and “cherry” to their word dictionary. During gameplay, the player enters the pattern “b..ana”, indicating a desire to find words similar to “banana” but with unspecified letters at certain positions. The word dictionary efficiently identifies “banana” as a match, allowing the player to score points and progress.
You are tasked with creating a specialized data structure, WordDictionary
, to manage a collection of words efficiently. This structure should enable the addition of new words and the subsequent matching of input strings against the stored words. Each word may contain letters and dots (.
), where dots can represent any single letter.
Your task is to implement the WordDictionary
class, which should provide the following functionalities:
WordDictionary()
: This initializes a new instance of the data structure.
void addWord(word)
: Adds the specified word to the collection within the data structure, enabling future matching against it.
bool search(word)
: Determines whether any word within the data structure matches the input string word
. Matching involves comparing each word
character against the stored words, where dots in a word
can match any letter.
Example usage
word_dict = WordDictionary()word_dict.addWord("bad")word_dict.addWord("dad")word_dict.addWord("mad")print(word_dict.search("pad")) # Output: Falseprint(word_dict.search("bad")) # Output: Trueprint(word_dict.search(".ad")) # Output: Trueprint(word_dict.search("b..")) # Output: True
Test your understanding of the concepts above by taking the quiz below!
Choose the correct answer.
What type of search algorithm is typically used to find matches in the word dictionary?
Linear search
Breadth-first search
Depth-first search
To solve this problem we will use the concept of a trie. A trie, also known as a prefix tree, is a tree-like data structure that stores a dynamic set of strings to allow for efficient prefix-based searching. Each node in the trie represents a single character of the string.
In the trie given above:
Each level represents a character in the word.
The path from the root to a node represents a word.
Nodes with isEndOfWord
set to True
indicate the end of a valid word.
Note: To learn about tries, refer to the What is a prefix tree? Answer.
The WordDictionary
class utilizes a trie structure. Each node of the trie consists of:
children
: This is an array to store references to child nodes, representing all possible characters (‘a’ to ‘z’).
isEndOfWord
: This is a boolean flag indicating whether the node represents the end of a valid word.
Initialization:
Initialize the trie structure in the constructor of the WordDictionary
class.
Adding a word:
To add a word to the dictionary:
Iterate through each character of the word.
If the current character’s corresponding child node does not exist, create a new node and link it.
Set the isEndOfWord
flag for the last character to mark the end of a word.
Searching for a word:
To search for a word in the dictionary:
Perform a Breadth-first search (BFS) traversal through the trie.
Use a deque to store tuples of (current node, index)
pairs.
For each character in the word:
If it’s a wildcard ‘.
’, explore all possible child nodes.
If it’s a specific character, follow the corresponding child node.
If the search reaches the end of the word and the isEndOfWord
flag is set, return True
.
If no valid word is found, return False
.
Let’s implement our algorithm in Python!
from collections import dequeclass WordDictionary(object):def __init__(self):self.children = [None] * 26self.isEndOfWord = Falsedef addWord(self, word):curr = selffor char in word:c = ord(char) - ord('a')if curr.children[c] is None:curr.children[c] = WordDictionary()curr = curr.children[c]curr.isEndOfWord = Truedef search(self, word):curr = selfq = deque([(curr, 0)])wsize = len(word)while q:curr, idx = q.popleft()if idx == wsize:if curr.isEndOfWord:return Truecontinuec = word[idx]if c == '.':for child in curr.children:if child is not None:q.append((child, idx + 1))else:curr = curr.children[ord(c) - ord('a')]if curr is not None:q.append((curr, idx + 1))return False# Test caseobj = WordDictionary()obj.addWord("bad")obj.addWord("dad")obj.addWord("mad")print(obj.search("pad")) # Falseprint(obj.search("bad")) # Trueprint(obj.search(".ad")) # Trueprint(obj.search("b..")) # True
The step-by-step explanation is as follows:
Lines 4–6: We are defining the initializing function for the WordDictionary
object, where our class contains two attributes; self.children
which is an array of size 26, initialized with None
, to represent child nodes for each character from ‘a’ to ‘z’ and self.isEndOfWord
which is a boolean flag initialized as False
, indicating whether the current node represents the end of a valid word.
Line 8: We are defining a method addWord
that takes a parameter word
to add that word to the dictionary.
Lines 10–11: We iterate through each character (char
) in the input word and calculate the index c
in the children
array based on the ASCII value of the character.
Lines 12–14: We check if the child node for the current character doesn’t exist, and if it doesn’t, then we create a new WordDictionary
node, link it, and move to the next child node.
Line 15: We set the isEndOfWord
flag for the last character of the word to mark the end of a valid word.
Lines 17: We define a method search
with a parameter word
to search for that word in the dictionary.
Lines 18: We initialize a variable curr
to reference the current instance of the WordDictionary
class. This will be used to traverse the trie.
Line 19: We initialize a deque q
with a single tuple (curr, 0)
. This tuple contains the current node (curr
) and the index 0
, indicating the starting point of the word.
Line 20: We calculate the length of the input word and assign it to the variable wsize
. This will determine when the entire word has been processed during the search.
Lines 21: We initiate a while loop that continues as long as the deque q
is not empty, i.e., as long as there are nodes to explore in the trie.
Line 22: We dequeue a tuple (curr, idx)
from the left side of the deque q
. curr
represents the current node being processed and idx
represents the index of the character in the word being examined.
Lines 23–26: We check if the index idx
is equal to the length of the word wsize
. If it is, then we check if the current node curr
represents the end of a valid word (curr.isEndOfWord
). If it does, it returns True
. Otherwise, we continue to the next iteration of the while loop to process the next node.
Lines 27–31: We retrieve the character at the index idx
from the input word word
and assign it to the variable c
. Next, we check if the character c
is a wildcard (‘.’). If it is, then we iterate through all children of the current node curr
. For each non-null child, we enqueue a tuple (child, idx + 1)
into the deque q
, indicating the child node and the following index in the word.
Lines 32–36: After checking that c
is not a wildcard, we update the current node curr
to be the child node corresponding to the character c
and enqueue a tuple (curr, idx + 1)
into the deque q
to explore the next character in the word. Lastly, if no valid word is found, we return False
.
The time complexity for adding a word to the WordDictionary is
The time complexity for searching a word in the WordDictionary is
The space complexity for the WordDictionary class is
Free Resources