Delete Duplicate Folders
Solve a hard-level problem of deleting duplicate folders from a file system using tries.
Problem statement
A new file system doesn't allow duplicate directory structures. Two directories are said to have similar structures if they contain the same set of identical directories.
The file system has a duplicate checker that regularly identifies the directories with similar structures and marks them for deletion in the next cycle. The file system only runs the deletion once, so any folder that becomes identical after the initial deletion is not deleted.
Return the 2D array containing the paths of the remaining folders after deleting all the marked folders.
Example
Sample input
folders: [["Ben"], ["Ben", "Movies"], ["Ben", "College"], ["Ben", "College", "Physics"],["Emilia"], ["Emilia", "Movies"], ["Emilia", "College"], ["Emilia", "College", "Physics"],["Adam"], ["Adam", "Work"]]
Sample output
[["Adam"], ["Adam", "Work"]]
Explanation
Directories "/Ben"
and "/Emilia"
in the file structure below are identical. They, as well as their sub-directories, should all be marked./Ben/
Ben/Movies/
Ben/College/
Ben/College/Physics/Emilia
/Emilia/Movies
/Emilia/College/
/Emilia/College/Physics
However, if the file structure also included the path "/Emilia/Office"
, then the folders "/Ben"
and "/Emilia"
would not be identical. Note that "/Emilia/College"
and "/Ben/College"
would still be considered identical even with the added folder.
Try it yourself
Try to solve the problem yourself before reading the solution.
#include<iostream>#include<vector>using namespace std;vector<vector<string>> deleteDuplicateFolders(vector<vector<string>> data){// your code goes herevector<vector<string>> answer;return answer;}
Intuition
Directory structure can be very well represented as a trie. The main task here is to identify the directories with similar subdirectory structures and mark them as deleted. For this purpose, we need to store some extra information in the trie node.
// Definition of a trie nodeclass TrieNode{public:// string to hold the name of the folderstring name;// mapping from folder name to the corresponding child nodeunordered_map<string, TrieNode*> children;// whether this folder is deleted.bool isDeleted = false;TrieNode(string n = ""): name(n) {}};
After inserting all the folders in the trie, we can traverse the trie to identify the subdirectory structure. In addition, a comparison of the post-order traversal of the trie nodes can help us determine if the same subdirectory structure has been encountered before.
Algorithm
Below is the summary of the algorithm.
Step 1: Insert the directories in a trie
Construct a trie with the directories given in the input list.
Step 2: Find directories with similar structure
Create a deduplication mechanism. Traverse the trie in post-order fashion. Convert the directory structures to strings and store them in a hashmap. If a string already exists in the hashmap as the key, it implies that a similar subdirectory structure already exists in the file system. The hashmap stores the directory structure string as the key and the pointer to the trie node as the value. This information is used for marking the directories that have been deleted in the trie.
Step 3: Fetch the remaining directories
In the end, traverse the entire trie again to gather the directories ...