Delete Duplicate Folders

Solve a hard-level problem of deleting duplicate folders from a file system using tries.

Problem statement

A new file system doesn't allow duplicate directory structures. Two directories are said to have similar structures if they contain the same set of identical directories.

The file system has a duplicate checker that regularly identifies the directories with similar structures and marks them for deletion in the next cycle. The file system only runs the deletion once, so any folder that becomes identical after the initial deletion is not deleted.

Return the 2D array containing the paths of the remaining folders after deleting all the marked folders.

Example

Sample input

folders: [["Ben"], ["Ben", "Movies"], ["Ben", "College"], ["Ben", "College", "Physics"],
["Emilia"], ["Emilia", "Movies"], ["Emilia", "College"], ["Emilia", "College", "Physics"],
["Adam"], ["Adam", "Work"]]

Sample output

[["Adam"], ["Adam", "Work"]]

Explanation

Directories "/Ben" and "/Emilia" in the file structure below are identical. They, as well as their sub-directories, should all be marked.

/Ben/
Ben/Movies/
Ben/College/
Ben/College/Physics


/Emilia
/Emilia/Movies
/Emilia/College/
/Emilia/College/Physics


However, if the file structure also included the path "/Emilia/Office", then the folders "/Ben" and "/Emilia" would not be identical. Note that "/Emilia/College" and "/Ben/College" would still be considered identical even with the added folder.

Try it yourself

Try to solve the problem yourself before reading the solution.

#include<iostream>
#include<vector>
using namespace std;
vector<vector<string>> deleteDuplicateFolders(vector<vector<string>> data){
// your code goes here
vector<vector<string>> answer;
return answer;
}
Solve delete duplicate folders in C++

Intuition

Directory structure can be very well represented as a trie. The main task here is to identify the directories with similar subdirectory structures and mark them as deleted. For this purpose, we need to store some extra information in the trie node.

// Definition of a trie node
class TrieNode
{
public:
// string to hold the name of the folder
string name;
// mapping from folder name to the corresponding child node
unordered_map<string, TrieNode*> children;
// whether this folder is deleted.
bool isDeleted = false;
TrieNode(string n = ""): name(n) {}
};
Definition of a trie node in C++

After inserting all the folders in the trie, we can traverse the trie to identify the subdirectory structure. In addition, a comparison of the post-order traversal of the trie nodes can help us determine if the same subdirectory structure has been encountered before.

Algorithm

Below is the summary of the algorithm.

Step 1: Insert the directories in a trie

Construct a trie with the directories given in the input list.

Step 2: Find directories with similar structure

Create a deduplication mechanism. Traverse the trie in post-order fashion. Convert the directory structures to strings and store them in a hashmap. If a string already exists in the hashmap as the key, it implies that a similar subdirectory structure already exists in the file system. The hashmap stores the directory structure string as the key and the pointer to the trie node as the value. This information is used for marking the directories that have been deleted in the trie.

Step 3: Fetch the remaining directories

In the end, traverse the entire trie again to gather the directories ...

Access this course and 1400+ top-rated courses and projects.