Finding and deleting duplicate files
Duplicate files are copies of the same file. In some circumstances, we may need to remove duplicate files and keep only a single copy. Identifying duplicate files by examining their content is an interesting task, and it can be done with a combination of shell utilities. This recipe deals with finding duplicate files and performing operations based on the result.
Getting ready
We can identify duplicate files by comparing file content. Checksums are ideal for this task, since files with exactly the same content produce the same checksum value. We can use this fact to remove duplicate files.
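To see this in action, here is a quick demonstration (the `sample_*` filenames are placeholders, not part of the recipe): two files with identical content produce the same `md5sum` hash, while a file with different content does not.

```shell
# Create two identical files and one different file.
echo "hello" > sample_a
cp sample_a sample_b
echo "world" > sample_c

# sample_a and sample_b print the same hash; sample_c prints a different one.
md5sum sample_a sample_b sample_c
```

This is why a checksum can stand in for a full byte-by-byte comparison when grouping candidate duplicates.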
How to do it...
Generate some test files as follows:
$ echo "hello" > test ; cp test test_copy1 ; cp test test_copy2
$ echo "next" > other
# test_copy1 and test_copy2 are copies of test
The code for the script to remove the duplicate files is as follows:
#!/bin/bash
#Filename: remove_duplicates.sh
#Description: Find and remove duplicate files...
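The script body is truncated above. As a minimal sketch of the same idea (not the book's exact script), the core logic is: checksum every file, sort the list so identical hashes are adjacent, keep the first file in each hash group, and delete the rest. The `dedup_demo` directory and the keep-first policy are assumptions for illustration, and the sketch assumes filenames without spaces.

```shell
#!/bin/bash
# Sketch only: group files by md5 checksum, keep one copy per group,
# delete the remaining duplicates. Assumes filenames without spaces.

# Recreate the test files from the recipe inside a demo directory.
mkdir -p dedup_demo
echo "hello" > dedup_demo/test
cp dedup_demo/test dedup_demo/test_copy1
cp dedup_demo/test dedup_demo/test_copy2
echo "next" > dedup_demo/other

# md5sum prints "hash  filename"; after sorting, duplicate hashes are
# adjacent, so awk emits every file whose hash matches the previous line.
md5sum dedup_demo/* | sort | awk '
{
  if ($1 == prev) print $2   # same hash as previous line => duplicate
  prev = $1
}' | while read -r dup; do
  rm -v "$dup"               # delete each duplicate, keeping one copy
done
```

After running the sketch, `dedup_demo` contains only `test` and `other`; the two copies are removed.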