Delete Duplicate Folders

Solve a hard-level problem of deleting duplicate folders from a file system using tries.

Problem statement

A new file system doesn't allow duplicate directory structures. Two directories are said to have similar structures if they contain the same set of identical directories.

The file system has a duplicate checker that regularly identifies the directories with similar structures and marks them for deletion in the next cycle. The file system only runs the deletion once, so any folder that becomes identical after the initial deletion is not deleted.

Return the 2D array containing the paths of the remaining folders after deleting all the marked folders.

Example

Sample input

Explanation

Directories "/Ben" and "/Emilia" in the file structure below are identical. They, as well as their sub-directories, should all be marked.

/Ben/ Ben/Movies/ Ben/College/ Ben/College/Physics

/Emilia /Emilia/Movies /Emilia/College/ /Emilia/College/Physics

However, if the file structure also included the path "/Emilia/Office", then the folders "/Ben" and "/Emilia" would not be identical. Note that "/Emilia/College" and "/Ben/College" would still be considered identical even with the added folder.

Try it yourself

Try to solve the problem yourself before reading the solution.

After inserting all the folders in the trie, we can traverse the trie to identify the subdirectory structure. In addition, a comparison of the post-order traversal of the trie nodes can help us determine if the same subdirectory structure has been encountered before.

Algorithm

Below is the summary of the algorithm.

Step 1: Insert the directories in a trie

Construct a trie with the directories given in the input list.

Step 2: Find directories with similar structure

Create a deduplication mechanism. Traverse the trie in post-order fashion. Convert the directory structures to strings and store them in a hashmap. If a string already exists in the hashmap as the key, it implies that a similar subdirectory structure already exists in the file system. The hashmap stores the directory structure string as the key and the pointer to the trie node as the value. This information is used for marking the directories that have been deleted in the trie.

Step 3: Fetch the remaining directories

In the end, traverse the entire trie again to gather the directories which are not marked as deleted, add them to the output, and return the output.

Visualization

The picture below represents the duplicate folder deletion procedure.

Access this course and 1400+ top-rated courses and projects.

Preview Free Lessons→

Preview Free Lessons

Operation	Time Complexity
Inserting a directory path in the trie	$O(F)$
Inserting all input directory paths in the trie	$O(NF)$
Traversing the trie to perform deduplication	$O(NF)$
Traversing the trie to collect remaining directories	$O(NF)$

Introduction to Tries

Prefix Search

Suffix Search

Bitwise Tries

Pattern Matching

File Systems

Trie Traversal

Search Engine

Miscellaneous

Conclusion

Delete Duplicate Folders

Problem statement

Example

Try it yourself

Intuition

Algorithm

Visualization

Complexity analysis

Variables

Time complexity