Go Find Duplicates
Introduction
A blazingly fast, simple-to-use tool to find duplicate files (photos, videos, music, documents, etc.) on your computer, portable hard drives, etc.
Note:
- This tool just reads your files and creates a 'duplicates report' file
- It does not delete or otherwise modify your files in any way
- So, it's very safe to use
How to install?
- Install Go version 1.19 or later
- Run the command:
go install github.com/m-manu/go-find-duplicates@latest
- Add the following line to your .bashrc/.zshrc file:
export PATH="$PATH:$HOME/go/bin"
How to use?
go-find-duplicates {dir-1} {dir-2} ... {dir-n}
Command line options
Running go-find-duplicates --help displays the following:
go-find-duplicates is a tool to find duplicate files and directories
Usage:
go-find-duplicates [flags] <dir-1> <dir-2> ... <dir-n>
where,
arguments are readable directories that need to be scanned for duplicates
Flags (all optional):
-x, --exclusions string path to file containing newline-separated list of file/directory names to be excluded
(if this is not set, by default these will be ignored:
.DS_Store, System Volume Information, $RECYCLE.BIN etc.)
-h, --help display help
-m, --minsize uint minimum size of file in KiB to consider (default 4)
-o, --output string following modes are accepted:
text = creates a text file in current directory with basic information
csv = creates a csv file in current directory with detailed information
print = just prints the report without creating any file
json = creates a JSON file in the current directory with basic information
(default "text")
-p, --parallelism uint8 extent of parallelism (defaults to number of cores minus 1)
-t, --thorough apply thorough check of uniqueness of files
(caution: this makes the scan very slow!)
--version Display version (1.6.0) and exit (useful for incorporating this in scripts)
For more details: https://github.com/m-manu/go-find-duplicates
Running this through a Docker container
docker run --rm -v /Volumes/PortableHD:/mnt/PortableHD manumk/go-find-duplicates:latest go-find-duplicates -o print /mnt/PortableHD
In the above command:
- option --rm removes the container when it exits
- option -v mounts the host directory /Volumes/PortableHD as /mnt/PortableHD inside the container
How does this identify duplicates?
By default, this tool identifies duplicates if all of the following conditions match:
- the file extension is the same
- the file size is the same
- the CRC32 hash of "crucial bytes" is the same
If the above default isn't enough for your requirements, you can use the command line option --thorough to switch to a SHA-256 hash of the entire file contents. But remember: with this, the scan becomes much slower!
When tested on my portable hard drive containing more than 172k files (videos, audio files, images and documents), the results were the same with and without the --thorough option!