DIY: Find Duplicate Files in a System
Solve the interview question "Find Duplicate Files in a System" in this lesson.
Problem statement
Suppose you are given a list, paths, of directory information strings. Each string contains a directory path and all the files, with their contents, in that directory. Your task is to return a list in which each element is a group containing the paths of all the files in the file system that have the same content.
Note: You can return the final answer in any order. A group of duplicate files consists of at least two files that have the same content.
A single directory information string in paths has the following format:
"root/dir1/dir2/.../dirm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"
The above entry represents n files (f1.txt, f2.txt ... fn.txt) with content (f1_content, f2_content ... fn_content), respectively, in the directory root/dir1/dir2/.../dirm. Note that n >= 1 and m >= 0. If m = 0, the directory is just the root directory.
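To make the format concrete, here is a minimal Python sketch that splits one directory information string into (file path, content) pairs. The function name parse_directory_info is hypothetical, and the sketch assumes that file names never contain '(' and that a single blank space separates the directory path and each file entry, as the constraints state.

```python
from typing import List, Tuple


def parse_directory_info(info: str) -> List[Tuple[str, str]]:
    """Split one directory information string into (file_path, content) pairs.

    Hypothetical helper. Assumes the format "dir f1.txt(c1) f2.txt(c2) ...",
    with single spaces between tokens and no '(' inside file names.
    """
    # The first token is the directory path; the rest are name(content) tokens.
    directory, *files = info.split(" ")
    pairs = []
    for token in files:
        # The content starts after the first "(" and ends before the final ")".
        open_idx = token.index("(")
        name = token[:open_idx]
        content = token[open_idx + 1 : -1]
        # A full path is directory_path/file_name.txt.
        pairs.append((directory + "/" + name, content))
    return pairs


print(parse_directory_info("root/a 4.txt(xyz) 1.txt(algorithms)"))
# [('root/a/4.txt', 'xyz'), ('root/a/1.txt', 'algorithms')]
```

Because the root directory is written simply as root, a file in it such as 4.txt becomes root/4.txt, which matches the sample outputs.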
The output is a list of groups of duplicate file paths. Each group contains all the file paths of the files that have the same content. A file path is a string that has the following format:
directory_path/file_name.txt
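Once every file has a full path and a content string, the grouping rule from the note above can be expressed with a hash map keyed by content. The following is a minimal Python sketch of one common approach, not necessarily the lesson's intended solution; the function name group_duplicates is hypothetical, and the sketch takes already-parsed (path, content) pairs as input.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_duplicates(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group file paths by content; keep only groups of two or more files.

    Hypothetical helper operating on (file_path, content) pairs.
    """
    by_content: Dict[str, List[str]] = defaultdict(list)
    for path, content in pairs:
        by_content[content].append(path)
    # A file whose content appears only once does not form a duplicate group.
    return [group for group in by_content.values() if len(group) >= 2]


# The files from Sample Input 1, written out as (path, content) pairs.
pairs = [
    ("root/a/4.txt", "xyz"),
    ("root/a/1.txt", "algorithms"),
    ("root/c/3.txt", "educative"),
    ("root/c/d/2.txt", "algorithms"),
    ("root/4.txt", "educative"),
    ("root/5.txt", "abcd"),
]
print(group_duplicates(pairs))
# [['root/a/1.txt', 'root/c/d/2.txt'], ['root/c/3.txt', 'root/4.txt']]
```

Keying the map by the content string means each file is visited once; comparing every pair of files directly would instead take time quadratic in the number of files.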
Constraints
- paths[i] consists of English letters, digits, '/', '.', '(', ')', and ' '.
- You can assume that no files or directories share the same name in the same directory.
- You can assume that each given directory information string represents a unique directory. A single blank space separates the directory path and the file information.
Input
The input will be a list of strings, paths. The following are examples of input to the function:
// Sample Input 1
paths = ["root/a 4.txt(xyz) 1.txt(algorithms)","root/c 3.txt(educative)","root/c/d 2.txt(algorithms)","root 4.txt(educative) 5.txt(abcd)"]
// Sample Input 2
paths = ["root 1.txt(abcd) 2.txt(algo)","root/a 2.txt(abcd)","root/c/d 4.txt(algo)"]
// Sample Input 3
paths = ["root 1.txt(abcd) 2.txt(algo)","root/a 2.txt(xyzc)","root/c/d 4.txt(educative)"]
Output
The output will be a list of lists of strings, where each inner list contains the paths of all the files that share the same content. The following are examples of the outputs:
// Sample Output 1
[["root/a/1.txt","root/c/d/2.txt"],["root/c/3.txt","root/4.txt"]]
// Sample Output 2
[["root/2.txt","root/c/d/4.txt"],["root/1.txt","root/a/2.txt"]]
// Sample Output 3
[]