
Ace the Google Interview: Must-Know Python Coding Problems

33 min read
Jun 14, 2024
Contents
Pattern 1: Hash Maps
Logger Rate Limiter
Solution
Pattern 2: Merge Intervals
Employee Free Time
Solution
Meeting Rooms II
Solution
Pattern 3: Knowing What to Track
Two Sum
Solution
Pattern 4: Custom Data Structures
Implement LRU Cache
Solution
Pattern 5: Sliding Window
Longest Repeating Character Replacement
Solution
Pattern 6: Modified Binary Search
Random Pick with Weight
Solution
Pattern 7: Two Pointers
Trapping Rain Water
Solution
Pattern 8: Graphs
Pacific Atlantic Water Flow
Solution
Pattern 9: Dynamic Programming
Longest Palindromic Substring
Solution

Google looks for software engineers who want to grow and evolve with their fast-paced business. If you're eager to embrace new problems, you can find many opportunities to switch teams, lead projects, and pursue your career goals at Google.

Google coding interviews are crucial for screening candidates who will thrive in the long term. This allows interviewers to evaluate your technical skills and ability to think through complex problems.

The exact setup of your interviews can vary across teams. However, software engineering candidates can expect to write code several times during the interview. You'll need to be ready to solve brand-new coding problems in real time.

So, how can you prepare to tackle coding problems in your Google interview?

Fortunately, you can solve most of Google's coding questions by identifying the underlying patterns. These will help you break down problems into more manageable parts. From there, you can apply strategies and algorithms that help you reach a solution quickly.

To streamline your interview prep, we’ve identified 9 patterns that you're most likely to see in Google coding problems.

Most frequently asked coding interview patterns by Google

We'll review each pattern in depth and share examples of common coding problems that use it. Let's get started!

Pattern 1: Hash Maps#

The Hash Maps pattern is a tool for storing and retrieving key-value pairs. On average, it offers constant-time complexity for insertion, deletion, and lookup operations. This quick lookup time makes hash maps ideal for tasks like caching, indexing, and frequency counting.

The core operations of hash maps are the following:

1. Insert: A key-value pair is added, with the hash function determining the index for storage. While typically quick (O(1) on average), this can degrade to O(n) if many keys collide at the same index.

2. Search: Values are retrieved by applying the hash function to compute the key’s index, typically a quick operation (O(1)), though it can degrade to O(n) when handling collisions or during resizing.

3. Remove: Values are removed by deleting the entry at the key’s computed index. Usually quick (O(1)), the process can slow down to O(n) in certain scenarios like collisions or resizing.

Using hash maps significantly reduces lookup times to an average of O(1) compared to managing separate arrays for related data. This efficiency makes hash maps a useful tool for optimizing performance in algorithmic problems. Common interview questions where the Hash Maps pattern demonstrates effectiveness include Logger Rate Limiter, Next Greater Element, Isomorphic Strings, and Longest Palindrome.
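As a quick illustration, Python's built-in dict is a hash map, so the three core operations map directly onto familiar syntax (a minimal sketch):

from collections import defaultdict

# Python's dict is a hash map: insert, search, and remove
# all run in O(1) time on average.
freq = {}

# Insert: add or update a key-value pair
freq["apple"] = 3

# Search: look up a key (use `in` or .get() to avoid a KeyError)
if "apple" in freq:
    print(freq["apple"])  # 3

# Remove: delete the entry for a key
del freq["apple"]
print(freq.get("apple", 0))  # 0, since the key is gone

# defaultdict is handy for frequency counting without key checks
counts = defaultdict(int)
for ch in "banana":
    counts[ch] += 1
print(dict(counts))  # {'b': 1, 'a': 3, 'n': 2}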

Let’s see how the following example illustrates the application of the Hash Maps pattern to efficiently solve the given coding problem:

Logger Rate Limiter#

For the given stream of message requests and their timestamps as input, you must implement a logger rate limiter system that decides whether the current message request is displayed. The decision depends on whether the same message has already been displayed in the last S seconds. If yes, then the decision is FALSE, as this message is considered a duplicate. Otherwise, the decision is TRUE.

Note: Though received at different timestamps, several message requests may carry identical messages.

Solution#

We need to know if a message already exists and keep track of its time limit. For problems where two associated values need to be checked, we can use a hash map.

We can use all incoming messages as keys and their respective time limits as values. This will help us eliminate duplicates while respecting the time limit of S seconds.

Here is how we’ll implement our algorithm using hash maps:

  1. Initialize a hash map.

  2. When a request arrives, check if it’s a new request (the message is not among the keys stored in the hash map) or a repeated request (an entry for this message already exists in the hash map). If it’s a new request, accept it and add it to the hash map.

  3. If it’s a repeated request, compare the difference between the incoming request's timestamp and the previous request’s timestamp with the same message. If this difference is greater than or equal to the time limit, S, accept it and update the timestamp for that specific message in the hash map. Otherwise, reject it.

Let’s look at the code for this solution below:

class RequestLogger:
    # Initialization of the requests hash map
    def __init__(self, time_limit):
        self.requests = {}
        self.limit = time_limit

    # Function to accept or reject message requests
    def message_request_decision(self, timestamp, request):
        # Check whether the request exists in the hash map; if it does,
        # check whether it falls outside the defined time limit
        if request not in self.requests or timestamp - self.requests[request] >= self.limit:
            # Store this new request in the hash map and return True
            self.requests[request] = timestamp
            return True
        else:
            # An identical request already exists within the time limit,
            # so the request should be rejected; return False
            return False

# Driver code
def main():
    # Here we will set the time limit to 7
    new_requests = RequestLogger(7)
    times = [1, 5, 6, 7, 15]
    messages = ["good morning",
                "hello world",
                "good morning",
                "good morning",
                "hello world"]

    # Loop over the input message requests
    for i in range(len(messages)):
        print(i + 1, ".\t Time, Message: {",
              times[i], ", '", messages[i], "'}", sep="")
        print("\t Message request decision: ",
              new_requests.message_request_decision(
                  times[i], messages[i]), sep="")
        print("-" * 100)

if __name__ == '__main__':
    main()

With our understanding of Hash Maps established, let's discuss the next coding pattern.

Pattern 2: Merge Intervals#

The Merge Intervals pattern is a powerful coding technique for problems involving meeting times or intervals of some nature. This technique is particularly useful when we need to deal with a set of intervals and perform operations such as merging overlapping intervals or determining their intersections.  

In this technique, we typically start by sorting the given intervals based on their start or end times, which helps identify the overlapping intervals efficiently. Once we have this interval information, we can swiftly perform the tasks based on the problem's requirements. The Merge Intervals pattern has many applications in multiple scenarios, including scheduling algorithms, resource allocation problems, and calendar management systems. From analyzing time-based data to consolidating meeting schedules, this coding technique offers an elegant solution for handling interval-related operations effectively.
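To make the core idea concrete, here is a minimal sketch of the classic merge step: sort intervals by start time, then fold overlapping neighbors together. Intervals are represented as [start, end] lists here purely for illustration:

def merge_intervals(intervals):
    # Sort by start time so overlapping intervals become adjacent
    intervals.sort(key=lambda x: x[0])
    merged = [intervals[0]]
    for start, end in intervals[1:]:
        if start <= merged[-1][1]:
            # Overlap: extend the last merged interval
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap: start a new interval
            merged.append([start, end])
    return merged

print(merge_intervals([[1, 4], [2, 5], [7, 9]]))  # [[1, 5], [7, 9]]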

Let’s see how the following examples illustrate the application of the Merge Intervals pattern to efficiently solve these problems:

Employee Free Time#

We're given a list containing the schedules of multiple employees. Each person's schedule is a list of non-overlapping intervals in sorted order. An interval is specified with the start and end times, both positive integers. Find the list of finite intervals representing the free time for all the employees.

Solution#

The intuition behind this solution involves merging the individual schedules of all employees into a unified timeline. By doing this, we can identify the common free time intervals where none of the employees are occupied. The key idea is to find the gaps or intervals between the time slots of these merged schedules.

We use the following variables in our solution:

  • previous: Stores the end time of the previously processed interval.

  • i: Stores the employee’s index value.

  • j: Stores the index of the current interval in employee i’s schedule.

  • result: Stores the free time intervals.

The steps of the algorithm are given below:

  • We store the start time of each employee’s first interval, along with its index value and a value 0, into a min-heap.

  • We set previous to the start time of the first interval present in the heap.

  • Then we iterate a loop until the heap is empty, and in each iteration, we do the following:

    • Pop an element from the min-heap and set i and j to the second and third values, respectively, from the popped value.

    • Select the interval from input located at i,j.

    • If the selected interval’s start time is greater than previous, it means that the time from previous to the selected interval’s start time is free. So, add this interval to the result array.

    • Now, update previous as max(previous, end time of the selected interval).

    • If the current employee has any other interval, push it into the heap.

  • After all the iterations, when the heap becomes empty, return the result array.

Let’s look at the code for this solution below:

main.py
interval.py
from interval import Interval
import heapq

def employee_free_time(schedule):
    heap = []

    # Iterate over all employees' schedules and add the start of each
    # schedule's first interval, along with its index value and a value 0
    for i in range(len(schedule)):
        heap.append((schedule[i][0].start, i, 0))

    # Create a heap from the array elements
    heapq.heapify(heap)

    # Take an empty array to store the results
    result = []

    # Set 'previous' to the start time of the first interval in the heap
    previous = schedule[heap[0][1]][heap[0][2]].start

    # Iterate until the heap is empty
    while heap:
        # Pop an element from the heap and set the values of i and j
        _, i, j = heapq.heappop(heap)

        # Select an interval
        interval = schedule[i][j]

        # If the selected interval's start value is greater than 'previous',
        # the time from 'previous' to the interval's start is free, so add
        # this interval (previous, interval's start value) to the result
        if interval.start > previous:
            result.append(Interval(previous, interval.start))

        # Update 'previous' as the maximum of 'previous' and the interval's end value
        previous = max(previous, interval.end)

        # If there is another interval in the current employee's schedule,
        # push it onto the heap
        if j + 1 < len(schedule[i]):
            heapq.heappush(heap, (schedule[i][j + 1].start, i, j + 1))

    # When the heap is empty, return the result
    return result

# Function for displaying an interval list
def display(vec):
    string = "["
    if vec:
        for i in range(len(vec)):
            string += str(vec[i])
            if i + 1 < len(vec):
                string += ", "
    string += "]"
    return string

# Driver code
def main():
    inputs = [
        [[Interval(1, 2), Interval(5, 6)], [Interval(1, 3)], [Interval(4, 10)]],
        [[Interval(1, 3), Interval(6, 7)], [Interval(2, 4)], [Interval(2, 5), Interval(9, 12)]],
        [[Interval(2, 3), Interval(7, 9)], [Interval(1, 4), Interval(6, 7)]],
        [[Interval(3, 5), Interval(8, 10)], [Interval(4, 6), Interval(9, 12)], [Interval(5, 6), Interval(8, 10)]],
        [[Interval(1, 3), Interval(6, 9), Interval(10, 11)], [Interval(3, 4), Interval(7, 12)], [Interval(1, 3), Interval(7, 10)], [Interval(1, 4)], [Interval(7, 10), Interval(11, 12)]],
        [[Interval(1, 2), Interval(3, 4), Interval(5, 6), Interval(7, 8)], [Interval(2, 3), Interval(4, 5), Interval(6, 8)]],
        [[Interval(1, 2), Interval(3, 4), Interval(5, 6), Interval(7, 8), Interval(9, 10), Interval(11, 12)], [Interval(1, 2), Interval(3, 4), Interval(5, 6), Interval(7, 8), Interval(9, 10), Interval(11, 12)], [Interval(1, 2), Interval(3, 4), Interval(5, 6), Interval(7, 8), Interval(9, 10), Interval(11, 12)], [Interval(1, 2), Interval(3, 4), Interval(5, 6), Interval(7, 8), Interval(9, 10), Interval(11, 12)]]
    ]

    i = 1
    for schedule in inputs:
        print(i, '.\tEmployee Schedules:', sep="")
        for s in schedule:
            print("\t\t", display(s), sep="")
        print("\tEmployees' free time", display(employee_free_time(schedule)))
        print('-' * 100)
        i += 1

if __name__ == "__main__":
    main()
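The contents of interval.py aren't shown above. Here's a minimal sketch of an Interval class consistent with how main.py uses it (it needs start and end attributes, plus a string form for display()); the actual file may differ:

# interval.py — a minimal sketch of the helper class assumed by main.py
class Interval:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    # Used by display() when printing interval lists
    def __str__(self):
        return "[" + str(self.start) + ", " + str(self.end) + "]"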

Now, let's look at another problem that can be solved using the Merge Intervals pattern.

Meeting Rooms II#

Given an input array of meeting time intervals, intervals, where each interval has a start time and an end time, find the minimum number of meeting rooms required to hold these meetings.

An important thing to note here is that the specified end time for each meeting is exclusive.

Solution#

The optimized approach to solve this problem is to use the Merge Intervals technique. In this approach, we sort the given meetings by their start time and keep track of the end times of each meeting. We do this by initializing a min-heap and adding the end time of the first meeting to the heap. The heap data structure enables us to efficiently keep track of the meeting with the earliest end time, thereby providing insight into the availability of meeting rooms.

Then, for each subsequent meeting, we check if the room occupied by the earliest-ending meeting, the minimum element of the heap, is free at the time of the current meeting. If the meeting with the earliest end time has already ended before the start of the current meeting, then we can use the same meeting room again: we remove the meeting with the earliest end time from the heap and add the end time of the current meeting to the heap. If the earliest-ending meeting has not ended by the start of the current meeting, then we know that we have to allocate a new room for the current meeting; therefore, we add its end time to the heap.

After processing all the meeting intervals, the size of the heap will be equal to the number of meeting rooms allocated. This will be the minimum number of rooms needed to accommodate all the meetings.

Let’s look at the code for this solution:

import heapq

def find_sets(intervals):
    if not intervals:
        return 0

    # Meetings are sorted according to their start time
    intervals.sort(key=lambda x: x[0])

    # Initialize a new heap and add the end time of the first meeting to it
    end_times_heap = []
    heapq.heappush(end_times_heap, intervals[0][1])

    for i in range(1, len(intervals)):
        # Check if the minimum element of the heap (i.e., the earliest ending meeting) is free
        if intervals[i][0] >= end_times_heap[0]:
            # If the room is free, extract the earliest ending meeting
            heapq.heappop(end_times_heap)
        # Add the end time of the current meeting to the heap
        heapq.heappush(end_times_heap, intervals[i][1])

    # The size of the heap tells us the number of rooms allocated
    return len(end_times_heap)

# Driver code
def main():
    schedule_meetings = [[[0, 10], [2, 10], [11, 30]],
                         [[3, 7], [2, 12], [10, 20], [8, 24]],
                         [[1, 9], [5, 8], [4, 14], [3, 10], [11, 25]],
                         [[1, 4], [3, 8], [8, 11], [3, 17], [9, 15], [16, 18]],
                         [[4, 12], [5, 11], [4, 9], [2, 12], [9, 22]]]

    for i in range(len(schedule_meetings)):
        print(str(i + 1), ".\tScheduled meetings:", schedule_meetings[i])
        print("\tRooms required:", find_sets(schedule_meetings[i]))
        print("-" * 100)

if __name__ == '__main__':
    main()

Now that we've covered the Merge Intervals pattern, let's move on to another frequently asked coding pattern.

Pattern 3: Knowing What to Track#

The Knowing What to Track pattern is a strategy for efficiently solving problems by tracking certain properties of the input elements, such as the frequency with which elements occur in an array or a string. Tracking such a property can often help derive efficient solutions.

This pattern operates in two main phases: the tracking phase and the utilization phase. During the tracking phase, we iterate through the dataset and tally the frequency of each element using appropriate data structures like hash maps or arrays. Once frequencies are calculated, we transition to the utilization phase, where we apply this frequency information to solve specific problems. These problems often involve finding the most frequent element, identifying unique occurrences, or detecting patterns within the dataset.

To determine if a problem can be solved using this pattern, look for scenarios where frequency tracking or pattern recognition is essential. Famous interview problems that can be solved with this pattern include Palindrome Permutation, Valid Anagram, Design Tic-Tac-Toe, and Group Anagrams.
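The tracking phase often amounts to just a few lines with collections.Counter. As a minimal sketch, here's the Valid Anagram problem mentioned above:

from collections import Counter

def is_anagram(s, t):
    # Tracking phase: tally character frequencies in each string.
    # Utilization phase: two strings are anagrams iff the tallies match.
    return Counter(s) == Counter(t)

print(is_anagram("listen", "silent"))  # True
print(is_anagram("rat", "car"))        # False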

Let’s take a closer look at how the following coding problem can be efficiently solved with the Knowing What to Track pattern:

Two Sum#

Given an array of integers, arr, and a target, t, return the indexes of the two elements that add up to the target t. The same index can’t be used twice, and there will be only one solution.

Note: We will assume that the array is zero-indexed and the output order doesn’t matter.

Solution#

We will use a hash map to solve the two-sum problem because it allows us to perform lookups in constant time, enabling us to quickly check whether the difference between the target value and each value of the array already exists in the hash map. If the difference exists in the hash map, we have found the two numbers that add up to the target value, and we can return their indexes. If not, we add the current number and its index to the hash map and continue iterating through the input array.

To implement this algorithm, we will first create an empty hash map to store the numbers and their indexes. Then, we will iterate over the input array, and for each number, we will calculate its difference (the target value minus the number). Next, we will check if the difference exists in the hash map as a key. If it does, we will retrieve its value, which is the index of the difference value in the array, and return it along with the current index as the solution to the problem. If the difference is not in the hash map, we will add the current number as a key and its index, i, as a value. We will continue iterating through the input array until we find a pair of numbers adding up to the target value, and then return their indexes.

Let’s look at the code for this solution below:

def two_sum(arr, t):
    # Create an empty hash map to store numbers and their indices
    hashmap = {}

    # Iterate over the array of numbers
    for i in range(len(arr)):
        # Calculate the difference between the target and the current number
        difference = t - arr[i]

        # Check if the difference already exists in the hash map
        if difference in hashmap:
            # Return the indices of the two numbers that add up to the target
            return [i, hashmap[difference]]

        # Add the current number and its index to the hash map
        hashmap[arr[i]] = i

# Driver code
def main():
    inputs = [
        [1, 10, 8, 4, 9],
        [5, 12, 15, 21, 6, 17],
        [2, 4, 6, 8, 10, 19],
        [-4, -8, 0, -7, -3, -10],
        [49, 17, 15, 22, -45, 29, 18, -15, 11, 37, 12, -52]
    ]
    targets = [17, 33, 21, -15, 0]

    for i in range(len(targets)):
        print(i + 1, ". Input array = ", inputs[i], sep="")
        print(" Target = ", targets[i], sep="")
        print(" Indices of two numbers = ", two_sum(inputs[i], targets[i]), sep="")
        print("-" * 100)

if __name__ == "__main__":
    main()

Now that we've discussed Knowing What to Track, let's focus on another important coding pattern.

Pattern 4: Custom Data Structures#

Custom data structures are essentially modified versions of existing data structures, tailored to address specific needs. We often must go beyond standard data structures like arrays and hash tables to tackle unique challenges more effectively. For instance, a web crawler that processes numerous pages and URLs might use a specialized "URL queue" to manage these URLs efficiently, ensuring they are unique and prioritized based on relevance.

Building a custom data structure means creating a custom class that encapsulates the necessary functionality and properties to efficiently manage and manipulate the data. By designing data structures optimized for the problem domain, we can improve the performance and readability of our code while simplifying complex operations.

To determine if a problem can benefit from the Custom Data Structures pattern, consider scenarios where standard data structures like arrays, lists, or maps are not sufficient, or where specialized operations need to be performed frequently. Common problems suitable for this pattern include implementing priority queues, disjoint-set data structures, or specialized graph representations.

Let’s see how the following example illustrates the application of the Custom Data Structures pattern to efficiently solve the given coding problem:

Implement LRU Cache#

Implement an LRU cache class with the following functions:

  • Init(capacity): Initializes an LRU cache with the capacity size.

  • Set(key, value): Adds a new key-value pair or updates an existing key with a new value.

  • Get(key): Returns the value of the key, or −1 if the key does not exist.

If the number of keys has reached the cache capacity, evict the least recently used key and add the new key.

As caches use relatively expensive, faster memory, they are not designed to store large data sets. Whenever the cache becomes full, we must evict some data from it. There are several caching algorithms to implement a cache eviction policy. LRU is a very simple and commonly used algorithm. The core concept of the LRU algorithm is to evict the oldest data from the cache to accommodate more data.

Solution#

This problem can be solved efficiently if we combine two data structures and use their respective functionalities, as well as the way they interact with each other, to our advantage. A doubly linked list allows us to arrange nodes by the time they were last accessed. However, accessing a value in a linked list takes O(n) time. On the other hand, a hash table has O(1) lookup time but has no concept of recency. We can combine these two data structures to get the best properties of both.

Here is the algorithm for the LRU cache:

Set:

  • If the element exists in the hash map, then update its value and move the corresponding linked list node to the head of the linked list.

  • Otherwise, if the cache is already full, remove the tail element from the doubly linked list. Then delete its hash map entry, add the new element at the head of the linked list, and add the new key-value pair to the hash map.

Get:

  • If the element exists in the hash map, move the corresponding linked list node to the head of the linked list and return the element value.

  • Otherwise, return -1.

Note that the doubly linked list keeps track of the most recently accessed elements. The element at the head of the doubly linked list is the most recently accessed element. All newly inserted elements (in Set) go to the head of the list. Similarly, any element accessed (in the Get operation) goes to the head of the list.

Let’s look at the code for this solution below:

main.py
linked_list_node.py
linked_list.py
from linked_list import LinkedList

# We will use a linked list of pairs of integers,
# where the first integer is the key
# and the second integer is the value
class LRUCache:
    # Initializes an LRU cache with the given capacity
    def __init__(self, capacity):
        self.cache_capacity = capacity
        self.cache_map = {}
        self.cache_list = LinkedList()

    # Returns the value of the key, or -1 if the key does not exist
    def get(self, key):
        # If the key doesn't exist, we return -1
        found_itr = None
        if key in self.cache_map:
            found_itr = self.cache_map[key]
        else:
            return -1

        list_iterator = found_itr

        # If the key exists, we need to move it to the front of the list
        self.cache_list.move_to_head(found_itr)

        return list_iterator.pair[1]

    # Adds a new key-value pair or updates an existing key with a new value
    def set(self, key, value):
        # Check if the key exists in the cache hash map
        # If the key exists
        if key in self.cache_map:
            found_iter = self.cache_map[key]
            list_iterator = found_iter

            # Move the node corresponding to the key to the front of the list
            self.cache_list.move_to_head(found_iter)

            # We then update the value of the node
            list_iterator.pair[1] = value
            return

        # If the key does not exist and the cache is full
        if len(self.cache_map) == self.cache_capacity:
            # We will need to evict the LRU entry.
            # Get the key of the LRU node; the first element
            # of each cache entry is the key.
            key_tmp = self.cache_list.get_tail().pair[0]

            # This is why we needed to store a <key, value> pair
            # in the cache_list. We would not have been able to get
            # the key if we had just stored the values.

            # Remove the last node in the list
            self.cache_list.remove_tail()

            # Remove the entry from the cache
            del self.cache_map[key_tmp]

        # The insert_at_head function inserts a new element at the front
        # of the list in constant time
        self.cache_list.insert_at_head([key, value])

        # We point the key at the list head,
        # since we added the new element at the head of the list
        self.cache_map[key] = self.cache_list.get_head()

    def print(self):
        print("Cache current size: ", self.cache_list.size, ", ", end="")
        print("Cache contents: {", end="")

        node = self.cache_list.get_head()
        while node:
            print("{", str(node.pair[0]), ",", str(node.pair[1]), "}", end="")
            node = node.next
            if node:
                print(", ", end="")
        print("}")
        print("-" * 100, "\n")

def main():
    # Creating a cache of size 2
    cache_capacity = 2
    cache = LRUCache(cache_capacity)
    print("Initial state of cache")
    print("Cache capacity: " + str(cache_capacity))
    cache.print()

    keys = [10, 10, 15, 20, 15, 25, 5]
    values = ["20", "get", "25", "40", "get", "85", "5"]

    for i in range(len(keys)):
        if values[i] == "get":
            print("Getting by Key: ", keys[i])
            print("Cached value returned: ", cache.get(keys[i]))
        else:
            print("Setting cache: Key: ", keys[i], ", Value: ", values[i])
            cache.set(keys[i], int(values[i]))
        cache.print()

if __name__ == '__main__':
    main()
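The contents of linked_list_node.py and linked_list.py aren't shown above. Here's a minimal sketch of a doubly linked list consistent with the operations main.py relies on (insert_at_head, move_to_head, get_head, get_tail, remove_tail, and a size attribute); the actual implementation may differ:

# linked_list_node.py — a minimal doubly linked list node
class LinkedListNode:
    def __init__(self, pair):
        self.pair = pair
        self.prev = None
        self.next = None

# linked_list.py — a minimal doubly linked list supporting the operations used above
class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    def insert_at_head(self, pair):
        # Create a node and make it the new head
        node = LinkedListNode(pair)
        node.next = self.head
        if self.head:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node
        self.size += 1

    def move_to_head(self, node):
        if node is self.head:
            return
        # Detach the node from its current position
        if node.prev:
            node.prev.next = node.next
        if node.next:
            node.next.prev = node.prev
        if node is self.tail:
            self.tail = node.prev
        # Reattach it at the head
        node.prev = None
        node.next = self.head
        if self.head:
            self.head.prev = node
        self.head = node

    def remove_tail(self):
        if self.tail is None:
            return
        if self.tail.prev:
            self.tail.prev.next = None
        else:
            self.head = None
        self.tail = self.tail.prev
        self.size -= 1

    def get_head(self):
        return self.head

    def get_tail(self):
        return self.tail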

Now that we've explored the design and implementation of Custom Data Structures, let's explore the next coding pattern in the list.

Pattern 5: Sliding Window#

The Sliding Window pattern is a useful tool for efficiently solving problems involving sequential data, such as arrays or strings, where computations on subsets of the data must be repeated. In this technique, a window is defined as a contiguous subset of elements whose boundaries adjust as it slides through the data. Processing is efficient because the window only focuses on the relevant subset of the data at any given time, avoiding unnecessary computations on the entire dataset. Computations are typically updated in constant time as elements enter or exit the window: by subtracting leaving elements and adding new ones, the cost of each window movement stays constant. Problems like Find Maximum in Sliding Window, Repeated DNA Sequences, and Best Time to Buy and Sell Stock are commonly solved using the Sliding Window pattern.
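As a minimal sketch of that constant-time update, here's a fixed-size window that computes every window sum in O(n) total by adding the entering element and subtracting the leaving one:

def max_window_sum(nums, k):
    # Sum of the first window of size k
    window_sum = sum(nums[:k])
    best = window_sum
    for i in range(k, len(nums)):
        # Slide the window: add the entering element, drop the leaving one
        window_sum += nums[i] - nums[i - k]
        best = max(best, window_sum)
    return best

print(max_window_sum([2, 1, 5, 1, 3, 2], 3))  # 9 (the window [5, 1, 3])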

Let’s see how the following example illustrates the application of the Sliding Window pattern to efficiently solve the given coding problem:

Longest Repeating Character Replacement#

Given a string s and an integer k, find the length of the longest substring in s, where all characters are identical, after replacing, at most, k characters with any other lowercase English character.

Solution#

We can use the Sliding Window pattern, which utilizes two pointers to slide a window over the input string. We initialize the start and end pointers to 0. To move the window over the input string, we change the positions of the pointers as follows:

  • Increment the end pointer until the window becomes invalid.

  • Increment the start pointer only if the window is invalid to make it valid again.

We keep track of the frequency of characters in the current window using a hash map. We also maintain a variable, lengthOfMaxSubstring, to keep track of the longest substring with the same characters after replacements and mostFreqChar to keep track of the frequency of the most occurring character.

We check whether the new character is in the hash map in each iteration. If it is present in the hash map, we increment its frequency by 1; otherwise, we add the character to the hash map and set its frequency to 1. Next, we compare the frequency of the new character with mostFreqChar to update the frequency of the most occurring character so far using the following expression:

mostFreqChar = max(mostFreqChar, frequency of the new character)

Then, we use the following expression to check if the number of characters in the window other than the most occurring character is greater than k:

end - start + 1 - mostFreqChar > k

If the expression above returns TRUE, the number of replacements required in the current window has exceeded our limit, that is, k. In this case, we decrement the frequency of the character to be dropped out of the window and adjust the window by moving the start pointer by 1 to make the window valid again. We continue this process until we have traversed the entire string.

Then, we update lengthOfMaxSubstring with the current window size if the window size is greater than lengthOfMaxSubstring.

Finally, when the entire input string has been traversed, we return the length of the longest substring such that all the characters in the substring are the same.

Let’s have a look at the code for the algorithm we just discussed.

def longest_repeating_character_replacement(s, k):
    # Initialize variables
    string_length = len(s)
    length_of_max_substring = 0
    start = 0
    char_freq = {}
    most_freq_char = 0

    # Iterate over the input string
    for end in range(string_length):
        # If the new character is not in the hash map, add it; else, increment its frequency
        if s[end] not in char_freq:
            char_freq[s[end]] = 1
        else:
            char_freq[s[end]] += 1

        # Update the frequency of the most frequent character
        most_freq_char = max(most_freq_char, char_freq[s[end]])

        # If the number of replacements in the current window has exceeded the limit, slide the window
        if end - start + 1 - most_freq_char > k:
            char_freq[s[start]] -= 1
            start += 1

        # If this window is the longest so far, update the length of the max substring
        length_of_max_substring = max(end - start + 1, length_of_max_substring)

    # Return the length of the max substring with the same characters after replacement(s)
    return length_of_max_substring

# Driver code
def main():
    input_strings = ["aabccbb", "abbcb", "abccde", "abbcab", "bbbbbbbbb"]
    values_of_k = [2, 1, 1, 2, 4]

    for i in range(len(input_strings)):
        print(i + 1, ".\tInput String: ", input_strings[i], sep="")
        print("\tk: ", values_of_k[i], sep="")
        print("\tLength of longest substring with repeating characters:",
              longest_repeating_character_replacement(input_strings[i], values_of_k[i]))
        print("-" * 100)

if __name__ == '__main__':
    main()

With our understanding of Sliding Window established, let's move on to discussing the next coding pattern.

Pattern 6: Modified Binary Search#

In many coding interviews, candidates often encounter problems where binary search comes in handy. It's known for its logarithmic time complexity, which makes it super efficient. However, it only works when the input data is already sorted. That's where the Modified Binary Search pattern steps in. It is an advanced adaptation of the traditional binary search algorithm, modified to handle more complex scenarios where elements may not strictly meet the standard sorted criteria. This pattern excels at efficiently locating elements or conditions that are not straightforward to find through linear searching, particularly when dealing with rotated arrays, finding boundaries, or solving problems like random pick with weight.

By dividing the search space in half, this method significantly reduces the time complexity to O(log n), making it highly effective for large datasets. Like the traditional binary search, it operates by comparing midpoints and systematically narrowing down the search range, but it incorporates additional logic to handle irregularities and special cases.

The adaptability of the Modified Binary Search pattern makes it a powerful tool in software development, enhancing the ability to manage and retrieve data efficiently in scenarios where direct comparisons and typical ordering do not apply. This pattern not only streamlines data retrieval processes but also aids in optimizing performance across various programming tasks.
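Most modified binary searches reduce to a boundary search. Here's a minimal lower-bound sketch that finds the first element greater than or equal to a target; it's the same template the solution below applies to a prefix-sum array:

def lower_bound(arr, target):
    # Find the index of the first element >= target in a sorted array
    low, high = 0, len(arr)
    while low < high:
        mid = low + (high - low) // 2
        if arr[mid] < target:
            low = mid + 1   # The boundary lies to the right of mid
        else:
            high = mid      # arr[mid] could be the answer; keep it in range
    return low

print(lower_bound([10, 22, 57, 91], 23))  # 2 (57 is the first value >= 23)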

Let’s see how the following example illustrates the application of the Modified Binary Search pattern to efficiently solve the given coding problem:

Random Pick with Weight#

We’re given an array of positive integers, weights, where weights[i] is the weight of the i-th index. The task here is to write a function, Pick Index(), which performs weighted random selection to return an index from the weights array. The larger the value of weights[i], the heavier the weight is, and the higher the chances of its index being picked.

Suppose that the array consists of the weights [12, 84, 35]. In this case, the probabilities of picking the indexes will be as follows:

  • Index 0: 12 / (12 + 84 + 35) = 9.2%

  • Index 1: 84 / (12 + 84 + 35) = 64.1%

  • Index 2: 35 / (12 + 84 + 35) = 26.7%

Note: Since we’re randomly choosing from the options, there is no guarantee that in any specific run of the program, any of the elements will be selected with the exact expected frequency.

Solution#

We can use the Modified Binary Search pattern to speed up the random index-picking process. It reduces the index-searching time to O(log n), which would otherwise be O(n) with linear scanning. We start by preprocessing the given weights by summing them up to obtain the total weight. Then, we construct a prefix sum array where each index i stores the cumulative sum of weights up to index i. Next, we generate a random number between 1 and the total weight. Finally, we use binary search to find the index corresponding to the randomly generated number in the prefix sum array. This approach ensures that elements with higher weights have a proportionally higher chance of being selected while maintaining randomness.

Here’s how the algorithm works:

  • The Init() method generates a list of cumulative sums using the given list of weights.

  • The Pick Index() method returns a randomly selected index while considering the provided weights. It works as follows:

    • Generates a random number, target, between 1 and k, where k is the largest value in the list of cumulative sums of weights.

    • Uses binary search to find the index of the first cumulative sum greater than or equal to the random value. Initialize the low index to 0 and the high index to the length of the list of cumulative sums of weights. While the low index is less than the high index, the algorithm:

      • Calculates the mid index as low + (high - low) // 2.

      • If the cumulative sum at the mid index is less than target, the low index is updated to mid + 1.

      • Otherwise, the high index is updated to mid.

    • At the end of the binary search, the low pointer will point to the index of the first cumulative sum greater than or equal to target. Return this index as the chosen index.

Let’s look at the code for this solution below:

import random

class RandomPickWithWeight:
    # Constructor
    def __init__(self, weights):
        # List to store the running sums of the weights
        self.running_sums = []
        # Variable to calculate the running sum
        running_sum = 0

        # Iterate through the given weights
        for w in weights:
            # Add the current weight to the running sum
            running_sum += w
            # Append the running sum to the running_sums list
            self.running_sums.append(running_sum)

        # Store the total sum of the weights
        self.total_sum = running_sum

    # Method to pick an index based on the weights
    def pick_index(self):
        # Generate a random number between 1 and the total sum of the array
        target = random.randint(1, self.total_sum)

        # Initialize low and high variables for binary search
        low = 0
        high = len(self.running_sums)

        # Perform binary search to find the first running sum >= target
        while low < high:
            mid = low + (high - low) // 2
            if target > self.running_sums[mid]:
                low = mid + 1
            else:
                high = mid

        # Return the index (low) found
        return low

# Driver code
def main():
    counter = 900
    weights = [[1, 2, 3, 4, 5],
               [1, 12, 23, 34, 45, 56, 67, 78, 89, 90],
               [10, 20, 30, 40, 50],
               [1, 10, 23, 32, 41, 56, 62, 75, 87, 90],
               [12, 20, 35, 42, 55],
               [10, 10, 10, 10, 10],
               [10, 10, 20, 20, 20, 30],
               [1, 2, 3],
               [10, 20, 30, 40],
               [5, 10, 15, 20, 25, 30]]
    dict = {}

    for i in range(len(weights)):
        print(i + 1, ".\tList of weights: ", weights[i], ", pick_index() called ", counter, " times", "\n", sep="")
        [dict.setdefault(l, 0) for l in range(len(weights[i]))]
        sol = RandomPickWithWeight(weights[i])
        for j in range(counter):
            index = sol.pick_index()
            dict[index] += 1
        print("-" * 105)
        print("\t{:<10}{:<5}{:<10}{:<5}{:<15}{:<5}{:<20}{:<5}{:<15}".format(
            "Indexes", "|", "Weights", "|", "Occurrences", "|", "Actual Frequency", "|", "Expected Frequency"))
        print("-" * 105)
        for key, value in dict.items():
            print("\t{:<10}{:<5}{:<10}{:<5}{:<15}{:<5}{:<20}{:<5}{:<15}".format(key, "|", weights[i][key], "|", value, "|",
                str(round((value / counter) * 100, 2)) + "%", "|", str(round(weights[i][key] / sum(weights[i]) * 100, 2)) + "%"))
        dict = {}
        print("\n", "-" * 105, "\n", sep="")

if __name__ == '__main__':
    main()
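As a side note, the manual loop in pick_index is exactly a lower-bound search, so it could also be written with Python's standard library. A minimal sketch of the equivalence (pick_index_with_bisect is a hypothetical name for illustration):

import bisect

def pick_index_with_bisect(running_sums, target):
    # bisect_left returns the index of the first running sum >= target,
    # which is what the manual while loop above computes
    return bisect.bisect_left(running_sums, target)

print(pick_index_with_bisect([12, 96, 131], 97))  # 2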

Now that we've discussed Modified Binary Search, let's focus on another important coding pattern.

Pattern 7: Two Pointers#

The Two Pointers technique is one of the must-know coding techniques for effective problem-solving. It involves traversing linear data structures like arrays or linked lists using two pointers moving in a coordinated way. These pointers move in either one direction or opposite directions, based on the given problem’s requirements, until a condition is met or the input is exhausted. This technique is the go-to solution for problems with sequentially arranged data like arrays, strings, or linked lists, or when we need to find paired elements that satisfy certain constraints or conditions. From verifying a palindrome to detecting cycles in the given data, the Two Pointers technique showcases its efficiency by providing solutions with linear time complexity.

The dynamic movement of these pointers makes the technique both efficient and versatile. Pointers can move independently of each other based on specific criteria, advancing through the same or different data structures. Whether they move in tandem or diverge, their synchronized traversal enables swift problem resolution.
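Since palindrome verification is the canonical example, here's a minimal two-pointer sketch where the pointers start at opposite ends and move toward each other:

def is_palindrome(s):
    # One pointer starts at each end; they move toward the middle
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True

print(is_palindrome("racecar"))  # True
print(is_palindrome("google"))   # False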

Let’s see how the following example illustrates the application of the Two Pointers pattern to efficiently solve the given coding problem:

Trapping Rain Water#

Given a sequence of non-negative integers representing the heights of bars in an elevation map, the goal is to determine the amount of rainwater that can be trapped between the bars after rain.

Solution#

An optimized approach to solving this problem utilizes the Two Pointers technique. Instead of separately processing each element's left and right sides, we simplify it into a single iteration using two pointers, left and right, initially positioned at the elevation map's extremes. The key idea is to maintain two variables, left_max and right_max, which track the maximum heights encountered on the left and right. As the pointers move inwards, they calculate the trapped water for each bar based on the lower of the two maximum heights.

Here's the step-by-step algorithm to find the solution:

  1. Start iterating the heights array using two pointers, left and right. To keep track of maximum heights on the leftmost side and the rightmost side, use two variables left_max and right_max.

  2. If left_max is greater than right_max, it means the maximum height on the left side is greater than the maximum height on the right side.

  3. Hence, we proceed to the right side and calculate the trapped water at the current right position based on right_max. Otherwise, we move on to the left side.

  4. Store the amount of water that can be accumulated by taking a difference between the maximum of the respective sides (left_max or right_max) and the current bar’s height.

  5. Keep iterating and updating the pointers at each step until left becomes greater than right.

Let’s look at the code for this solution below:

def rain_water(heights):
    # Initialize two pointers at the leftmost and rightmost positions of the elevation map
    left = 0
    right = len(heights) - 1

    # Variables to store the accumulated rainwater and maximum heights
    stored_water = 0
    left_max, right_max = 0, 0

    while left <= right:
        # If the maximum height on the left is greater than the maximum
        # height on the right, process the right side
        if left_max > right_max:
            stored_water += max(0, right_max - heights[right])
            # Update the right maximum height if necessary
            right_max = max(right_max, heights[right])
            right -= 1
        # Otherwise, the maximum height on the left is less than or equal to
        # the maximum height on the right, so process the left side
        else:
            stored_water += max(0, left_max - heights[left])
            # Update the left maximum height if necessary
            left_max = max(left_max, heights[left])
            left += 1

    return stored_water

# Driver code
def main():
    input_list = [
        [1, 0, 1, 2, 1, 4, 0, 3, 5],
        [2, 0, 9, 6],
        [3, 1, 2, 0, 2],
        [4, 2, 5, 3],
        [3, 0]
    ]

    index = 1
    for i in input_list:
        print(str(index) + ".\tHeights: " + str(i))
        print("\tMaximum rainwater: " + str(rain_water(i)))
        index += 1
        print("-" * 100)

if __name__ == "__main__":
    main()

Now that we've covered the Two Pointers pattern, let's move on to another frequently asked coding pattern.

Pattern 8: Graphs#

The Graphs pattern offers a structured approach to solving problems involving graphs, where entities are represented as nodes and their relationships are represented as edges. This pattern involves traversing the graph using algorithms like depth-first search (DFS) or breadth-first search (BFS) to explore its vertices and edges systematically.

  • Breadth-First Search (BFS) starts at a selected node and explores all its neighboring nodes at the current depth level before moving to the nodes at the next depth level. It traverses the graph level by level, often using a queue data structure to keep track of the nodes to be visited. BFS is well-suited for finding the shortest path between two nodes in an unweighted graph.

  • Depth-First Search (DFS) starts at a selected node and explores as far as possible along each branch before backtracking. It explores one branch fully before moving on to the next branch. DFS is often implemented using recursion or a stack data structure to keep track of the visited nodes and explore the unvisited ones. This algorithm is useful for tasks like finding cycles in a graph, topological sorting, or exploring all possible paths from a starting node.

Common problems suitable for this pattern include finding the shortest path between two nodes, determining the connectivity of a network, or identifying cycles within a graph.
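As a minimal sketch of BFS on an adjacency list (assuming an unweighted graph), using a queue to visit nodes level by level:

from collections import deque

def bfs(graph, start):
    # Visit nodes level by level, tracking visited nodes to avoid revisits
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D']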

Let’s check out the following interview problem to see how the Graphs pattern works:

Pacific Atlantic Water Flow#

Imagine an island with a rectangular shape that touches both the Pacific and Atlantic oceans. The northern and western sides meet the Pacific, while the southern and eastern sides touch the Atlantic. This island is divided into square cells.

To depict the height above sea level of each cell, we use an integer matrix, heights, of size m × n. Here, heights[r][c] represents the elevation of the cell at coordinate (r, c).

When heavy rain pours down on the island every few months, water flows from the island to both the Pacific and Atlantic oceans. The path of flow depends on the heights of the cells.

Consider a cell with a height of 9 that’s higher than or equal to its neighboring cells to the north, east, west, and south. This indicates that water can flow from this cell to the adjacent ones. This process repeats until the flow of water either reaches the Pacific or Atlantic border or reaches a cell that cannot pass water to any of its neighbors. If the flow of water starting from a cell can direct water to both the Pacific and Atlantic ocean borders, we include it in the output.

Note: Any cell adjacent to an ocean can channel water into the ocean.

With this information, our task is to return a 2-D array of coordinates. Each entry (r_i, c_i) in this array indicates that rainwater can flow from the cell at coordinate (r_i, c_i) to both the Pacific and Atlantic oceans.

Solution#

The idea stems from the observation that water flows downhill, seeking the lowest possible path. By starting the exploration from the ocean edges and traversing cells equal to or higher in elevation, we can identify regions where water can reach both the Pacific and Atlantic Oceans. The depth-first search (DFS) algorithm is chosen since it naturally emulates the flow of water, recursively exploring cells and marking those that are reachable. The approach focuses on identifying cells that can be reached from both ocean edges, implying that water can flow in both directions, and by finding the intersection of the sets of reachable cells from each ocean, the algorithm effectively pinpoints locations where water can flow to both oceans.

Here’s how the algorithm works:

  • We initialize the following variables to assist us in performing DFS:

    • num_rows: This is the total number of rows in the matrix.

    • num_cols: This is the total number of columns in the matrix.

    • pacific_reach: This is a hash set that will contain the coordinates of the cells that can be reached from the Pacific Ocean.

    • atlantic_reach: This is a hash set that will contain the coordinates of the cells that can be reached from the Atlantic Ocean.

  • The dfs function has the following parameters:

    • row, col: The coordinates of the initial cell where DFS will begin.

    • reach: The hash set of the respective ocean for which DFS is called.

  • Here’s how the dfs function works:

    • The very first cell passed to the function will always be reachable, since it lies on the border of the matrix adjacent to the respective ocean. Therefore, the coordinates of this cell are added to reach.

    • For the cell that has just been added to reach, we perform DFS by exploring all four directions (down, right, up, left) from the cell. This is achieved through a loop that adds the following offsets to the current (row, col):

      • (1, 0): The cell immediately below the current cell is explored.

      • (0, 1): The cell immediately to the right of the current cell is explored.

      • (-1, 0): The cell immediately above the current cell is explored.

      • (0, -1): The cell immediately to the left of the current cell is explored.

    • The following conditions are checked for each cell that is explored:

      • If the cell is out of bounds, it is skipped since it doesn’t exist in the matrix.

      • If the cell is already in reach, it is skipped since it has already been visited.

      • If the cell's height is less than the height of the current cell, it is skipped, since water cannot flow from this lower cell to the current (higher) cell.

    • If the three conditions above are not satisfied, the current cell is a valid cell on which we can continue DFS. So the dfs function is called again for this cell, where it is added to reach, and then its neighbors are explored again.

  • The dfs function is then called for each cell of the Pacific and Atlantic border in the matrix, which populates pacific_reach and atlantic_reach with the coordinates of the cells that can flow water to the respective oceans.

  • Finally, we determine the common coordinates in pacific_reach and atlantic_reach and store them in the output array. Because these cells can flow water to both oceans, we return the output array containing them.

Let’s look at the code for this solution below:

main.py
print.py
from print import *

def estimate_water_flow(heights):
    # Get the number of rows and columns in the heights matrix
    num_rows, num_cols = len(heights), len(heights[0])

    # Initialize sets to track cells reachable from the Pacific and Atlantic oceans
    pacific_reach = set()
    atlantic_reach = set()

    # Define a depth-first search (DFS) function
    def dfs(row, col, reach):
        reach.add((row, col))
        # Explore all 4 adjacent directions: down, right, up, left
        for (x, y) in [(1, 0), (0, 1), (-1, 0), (0, -1)]:
            new_row, new_col = row + x, col + y
            # Skip if the new coordinates are outside the matrix bounds
            if new_row < 0 or new_row >= num_rows or new_col < 0 or new_col >= num_cols:
                continue
            # Skip if the cell has already been visited
            if (new_row, new_col) in reach:
                continue
            # Skip if the cell's height is lower than the current cell's height
            if heights[new_row][new_col] < heights[row][col]:
                continue
            # Recursively explore the reachable cells
            dfs(new_row, new_col, reach)

    # Start DFS from the borders for the Pacific and Atlantic oceans
    for i in range(num_rows):
        dfs(i, 0, pacific_reach)                  # Start from the left
        dfs(i, num_cols - 1, atlantic_reach)      # Start from the right
    for i in range(1, num_cols, 1):
        dfs(0, i, pacific_reach)                  # Start from the top
        dfs(num_rows - 1, i - 1, atlantic_reach)  # Start from the bottom

    # Find the cells that are reachable from both oceans
    return list(pacific_reach.intersection(atlantic_reach))

# Driver code
def main():
    matrices = [
        [[1, 2, 2, 3, 5], [3, 2, 3, 4, 4], [2, 4, 5, 3, 1], [6, 7, 1, 4, 5], [5, 1, 1, 2, 4]],
        [[4, 4, 4, 3, 1], [1, 5, 3, 7, 7], [3, 1, 3, 7, 5], [1, 2, 4, 4, 7], [4, 3, 1, 7, 1]],
        [[7, 3, 5, 2, 8], [2, 3, 4, 5, 6], [3, 9, 6, 8, 4]],
        [[1, 0, 1], [1, 1, 0], [1, 1, 1]],
        [[2, 3, 4, 5, 6, 7, 8], [2, 9, 3, 8, 4, 5, 6], [3, 4, 6, 7, 8, 5, 4], [9, 1, 5, 6, 7, 5, 5]]
    ]

    for i in range(len(matrices)):
        print(i + 1, ".\t Input heights: ", sep="")
        print_matrix(matrices[i])
        print("\n\t Common coordinates: ", estimate_water_flow(matrices[i]))
        print("-" * 100)

if __name__ == "__main__":
    main()
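The print.py helper isn't shown above; a minimal stand-in for print_matrix, consistent with how it's called, could be:

# print.py — a minimal sketch of the display helper assumed by main.py
def print_matrix(matrix):
    for row in matrix:
        print("\t\t", row)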

With our understanding of Graphs established, let’s explore the last, but certainly not least, coding pattern from Google’s list of frequently asked patterns.

Pattern 9: Dynamic Programming#

The Dynamic Programming pattern is a technique that helps solve complex problems by breaking them down into simpler subproblems and storing their solutions to avoid redundant computations. This technique is useful when a problem exhibits overlapping subproblems and optimal substructure. By storing and reusing intermediate results, dynamic programming enables us to solve problems with improved time and space complexity. For instance, a naive recursive approach to check if a string like "rotator" is a palindrome or to calculate Fibonacci numbers can be inefficient due to repeated calculations of the same subproblems. Dynamic programming addresses these inefficiencies through two main strategies:

  • Memoization (top-down approach): This technique optimizes recursion by storing the results of subproblems the first time they are computed, preventing redundant calculations.

  • Tabulation (bottom-up approach): This strategy constructs a table to store the results of smaller subproblems, gradually building up to solve the larger problem. Both strategies are sketched below.
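Here's a minimal sketch of both strategies applied to the Fibonacci example mentioned above:

from functools import lru_cache

# Memoization (top-down): cache results the first time they're computed
@lru_cache(maxsize=None)
def fib_memo(n):
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Tabulation (bottom-up): build a table from the smallest subproblems up
def fib_tab(n):
    if n < 2:
        return n
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]

print(fib_memo(10), fib_tab(10))  # 55 55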

Let’s see how the following example illustrates the application of the Dynamic Programming pattern to efficiently solve the given coding problem:

Longest Palindromic Substring#

Given a string s, return the longest palindromic substring in s.

Solution#

In a naive approach, we would check every substring separately, even though any substring of length 3 contains a substring of length 1 at its center. So although we had already checked all substrings of length 1, the algorithm would recheck them for substrings of longer lengths. This rechecking consumes a lot of time, which can be avoided by storing and reusing the results of earlier computations. To do this, we can create a lookup table, dp, of size n × n, where n is the length of the input string. Each cell dp[i][j] stores whether the string s[i..j] is a palindromic substring. If the cell dp[i][j] holds the result of an earlier computation, we can utilize it in O(1) lookup time instead of making a recursive call again.

  1. Create a resultant array, res, to store the starting and ending indexes of the longest palindromic substring. Initialize it with [0, 0], depicting that the longest palindromic substring seen so far is the first character of the input string. By definition, any string of length 1 is always a palindrome.

  2. Initialize a lookup table, dp, with FALSE.

  3. Base case 1: The diagonal in the lookup table is populated with TRUE because any cell in the diagonal corresponds to a substring of length 1, and any string of length 1 is always a palindrome.

  4. Base case 2: We check whether all two-letter substrings are palindromes and update the res and dp accordingly. We do this by iterating over the string, comparing s[i] and s[i+1], and storing the result at dp[i][i+1]. After that, we also update res if the value of dp[i][i+1] is TRUE, which tells us that the two-letter substring was a palindrome.

  5. After these base cases, we check all substrings of lengths greater than 2. However, we only compare the first and the last characters. The rest of the string is checked using the lookup table. For example, for a given string “zing”, we want to check whether “zin” is a palindrome. We’ll only compare ‘z’ and ‘n’ and check the value of dp[1][1], which tells us whether the remaining string “i”, represented by s[1..1], is a palindrome. We’ll take the logical AND of these two results and store it at dp[0][2] because “zin” is represented by the substring s[0..2]. This way, we’ll avoid redundant computations and check all possible substrings using the lookup table.

Let's implement the algorithm as discussed above:

def longest_palindromic_substring(s):
    # To store the starting and ending indexes of the LPS
    res = [0, 0]
    n = len(s)

    # Initialize a lookup table of dimensions len(s) * len(s)
    dp = [[False for _ in range(n)] for _ in range(n)]

    # Base case: a string with one letter is always a palindrome
    for i in range(n):
        dp[i][i] = True

    # Base case: substrings of length 2
    for i in range(n - 1):
        dp[i][i + 1] = (s[i] == s[i + 1])  # Check if the two characters are equal
        if dp[i][i + 1]:
            res = [i, i + 1]  # Update the resultant array

    # Substrings of lengths greater than 2
    for length in range(3, n + 1):
        i = 0
        # Check every possible substring of this specific length
        for j in range(length - 1, n):  # Iterate over possible ending indexes
            dp[i][j] = dp[i + 1][j - 1] and (s[i] == s[j])
            if dp[i][j]:
                res = [i, j]
            i += 1

    return s[res[0]:res[1] + 1]  # Return the longest palindromic substring

# Driver code
def main():
    strings = ['cat', 'lever', 'xyxxyz', 'wwwwwwwwww', 'tattarrattat']

    for i in range(len(strings)):
        print(i + 1, ".\t Input string: '", strings[i], "'", sep="")
        result = longest_palindromic_substring(strings[i])
        print("\t Longest palindromic substring: ", result, sep="")
        print("-" * 100)

if __name__ == '__main__':
    main()

That wraps up our exploration of the coding patterns behind the coding questions most frequently asked by Google.

Mastering the patterns we've just discussed is important for acing your Google interview. Understanding the underlying patterns behind the solutions you devise will not only help you tackle similar problems in the future but also demonstrate your depth of understanding to interviewers. We have explored some of the most common coding patterns through interview questions frequently asked by Google, but it's just a start. Remember, practice makes perfect, so dedicate time to solving problems regularly and seek feedback to improve further. You should also prepare for the Google behavioral interview. For even better preparation, Educative's interview prep courses cover a wide range of coding patterns, including Dynamic Programming patterns, in various programming languages.

Moreover, if you are looking for a customized learning plan, take a look at Educative's interview preparation paths.

With determination, preparation, and a solid grasp of coding patterns, you’ll be well-equipped to tackle any coding challenge that comes your way during the Google interview process. Best of luck!


Written By:
Minahil Yaser