Duplicate File Management
Find, analyze, and safely remove duplicate files to reclaim storage space
Understanding Duplicate Files
Duplicate files are identical copies of the same file stored in multiple locations. They waste storage space and can make file management confusing. FileFortress helps you identify and manage these duplicates across all your cloud storage providers.
This guide covers finding duplicates, understanding detection methods, exporting deletion lists, and safely cleaning up your storage.
Detection Methods
How FileFortress identifies duplicates
1. Name & Size Matching (Fast)
Files with the same name and size are considered potential duplicates. This method is fast but may include false positives (different files that happen to have the same name and size).
Use case: Quick overview of potential duplicates
Caution: Review carefully before deletion - not 100% accurate
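Conceptually, this first pass just groups files by their (name, size) pair and flags any group with more than one member. FileFortress's internal implementation is not shown in this guide; the following is a minimal Python sketch of the idea:

```python
from collections import defaultdict
from pathlib import Path

def find_name_size_candidates(paths):
    """Group file paths by (name, size). Groups with more than one
    entry are *potential* duplicates - false positives are possible,
    since different files can share a name and a byte count."""
    groups = defaultdict(list)
    for p in paths:
        p = Path(p)
        groups[(p.name, p.stat().st_size)].append(p)
    return {key: files for key, files in groups.items() if len(files) > 1}
```

Because only metadata is read, this scales to very large trees, which is why it is the fast method.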
2. Hash Verification (Guaranteed)
Files with identical content hashes (MD5, SHA-1, etc.) are exact duplicates for all practical purposes - the chance of two different files producing the same hash by accident is vanishingly small. This method is slower than name and size matching, but it is the accurate one.
Use case: Safe deletion with confidence
Recommended: Always use --hash-verified-only for exports
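Hash verification works by reading every byte of each file and comparing digests, which is why it is slower but trustworthy. The sketch below is plain Python with hashlib, not FileFortress code, and simply illustrates the principle:

```python
import hashlib

def file_hash(path, algo="sha1", chunk_size=1 << 20):
    """Stream the file through a hash in 1 MiB chunks so large
    files never have to fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_exact_duplicate(path_a, path_b):
    """True only when the two files' contents hash identically."""
    return file_hash(path_a) == file_hash(path_b)
```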
Finding Duplicates
Basic duplicate detection
Interactive Mode
filefortress find duplicates
This opens an interactive menu where you can explore duplicate groups, view details, and navigate through your duplicates.
Script Mode (Non-Interactive)
filefortress find duplicates --non-interactive --view summary
Displays a summary of duplicates without interactive prompts - perfect for automation.
Keep Strategies
Deciding which file to keep
When exporting duplicates for deletion, you must choose which file to keep from each group. FileFortress offers several strategies:
oldest - Keep Earliest Modified Date
Preserves the original file based on modification date. Best for maintaining file history.
newest - Keep Latest Modified Date
Keeps the most recently modified version. Useful if files have been updated.
first - Keep First Found
Simple strategy that keeps the first file in the list. Predictable behavior.
smallest - Keep Smallest File
Useful when files have different compression levels or quality settings.
largest - Keep Largest File
Preserves highest quality version when files differ in compression.
by-remote - Keep from Specific Remote
Prioritizes files from your preferred storage provider. Requires --keep-remote option.
--keep-strategy by-remote --keep-remote "Primary Backup"
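To make the strategies concrete, here is a hypothetical Python sketch of how a keeper might be chosen from one duplicate group. The dictionary field names (path, size, mtime, remote) are illustrative assumptions, not FileFortress's actual data model:

```python
def choose_keeper(group, strategy, keep_remote=None):
    """Pick the file to keep from a duplicate group.
    Each entry is a dict with 'path', 'size', 'mtime', and 'remote'
    keys (hypothetical field names used for illustration only)."""
    if strategy == "first":
        return group[0]
    if strategy == "oldest":
        return min(group, key=lambda f: f["mtime"])
    if strategy == "newest":
        return max(group, key=lambda f: f["mtime"])
    if strategy == "smallest":
        return min(group, key=lambda f: f["size"])
    if strategy == "largest":
        return max(group, key=lambda f: f["size"])
    if strategy == "by-remote":
        for f in group:
            if f["remote"] == keep_remote:
                return f
        return group[0]  # fall back when the preferred remote has no copy
    raise ValueError(f"unknown strategy: {strategy}")
```

Every file in the group that is not the keeper becomes a deletion candidate.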
Exporting Deletion Lists
Generate files for safe cleanup
FileFortress can export lists of duplicate files to delete, allowing you to review before taking action and use external tools for deletion.
Paths Format (Simple Text List)
filefortress find duplicates --non-interactive \
  --export-format paths \
  --keep-strategy oldest \
  --hash-verified-only \
  --output-file duplicates-to-delete.txt
Generates a simple text file with one file path per line, plus comments showing which file is being kept.
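As a rough illustration (the exact comment wording FileFortress emits may differ), a paths export might look like this:

```text
# Keep: Primary/Photos/vacation.jpg
Backup/Photos/vacation.jpg
Old Drive/Photos/vacation.jpg
```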
JSON Format (Structured Data)
filefortress find duplicates --non-interactive \
  --export-format json \
  --keep-strategy newest \
  --hash-verified-only \
  --output-file duplicates.json
Creates a structured JSON file with complete information about each duplicate group, including which file to keep and why.
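The shape below is illustrative. The field names (groups, keepFile, deleteFiles, path) match those used by the JSON integration example later in this guide, but the generated file may contain additional fields:

```json
{
  "groups": [
    {
      "keepFile": { "path": "Primary/Photos/vacation.jpg" },
      "deleteFiles": [
        { "path": "Backup/Photos/vacation.jpg" },
        { "path": "Old Drive/Photos/vacation.jpg" }
      ]
    }
  ]
}
```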
Rclone Script (Cloud Storage)
filefortress find duplicates --non-interactive \
  --export-format rclone \
  --keep-strategy oldest \
  --hash-verified-only \
  --include-keep-file \
  --output-file cleanup.sh
Generates a ready-to-run bash script with rclone delete commands for cloud storage cleanup.
PowerShell Script (Local Files)
filefortress find duplicates --non-interactive \
  --export-format powershell \
  --keep-strategy newest \
  --hash-verified-only \
  --output-file cleanup.ps1
Creates a PowerShell script with multiple deletion options (dry-run, with confirmation, or force).
Bash Script (Local Files)
filefortress find duplicates --non-interactive \
  --export-format bash \
  --keep-strategy oldest \
  --hash-verified-only \
  --output-file cleanup.sh
Generates a bash script with rm -i commands for interactive deletion on Linux/macOS systems.
Safety Best Practices
Protect your data
Always use --hash-verified-only
Only export hash-verified duplicates, so every listed file is a confirmed byte-for-byte copy
Review before deleting
Always manually review the exported list before executing deletions
Test with small batches
Start with a small subset to verify your process works correctly
Use --include-keep-file
Add comments showing which file is kept for easier verification
Backup before bulk deletion
Ensure you have backups before deleting large numbers of files
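One simple way to follow the small-batch advice is to carve the first few paths out of an exported list before running any deletions. This helper is an illustrative sketch, not part of FileFortress:

```python
def write_batch(export_path, batch_path, batch_size=10):
    """Copy the first batch_size non-comment paths from an exported
    deletion list into a smaller file for a trial run. Comment and
    blank lines are kept so the '# Keep:' context survives."""
    written = 0
    with open(export_path) as src, open(batch_path, "w") as dst:
        for line in src:
            if line.startswith("#") or not line.strip():
                dst.write(line)
                continue
            if written < batch_size:
                dst.write(line)
                written += 1
    return written
```

Run your deletion tool against the batch file first, verify the results, then repeat with the full list.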
Example Workflows
Complete examples for common scenarios
Workflow 1: Safe Cleanup for Beginners
# Step 1: Find duplicates interactively
filefortress find duplicates
# Step 2: Export safe deletion list
filefortress find duplicates --non-interactive \
  --export-format paths \
  --keep-strategy oldest \
  --hash-verified-only \
  --include-keep-file \
  --output-file safe-to-delete.txt
# Step 3: Review the file manually
notepad safe-to-delete.txt   # Windows; use any text editor on Linux/macOS
# Step 4: Delete files (use your preferred method)
Workflow 2: Prioritize Primary Storage
# Keep files from your primary storage, delete from backups
filefortress find duplicates --non-interactive \
  --export-format json \
  --keep-strategy by-remote \
  --keep-remote "Google Drive" \
  --hash-verified-only \
  --output-file cleanup-backups.json
Workflow 3: Preview Without Saving
# See what would be deleted without creating a file
filefortress find duplicates \
  --export-format paths \
  --keep-strategy newest \
  --hash-verified-only \
  --include-keep-file
Integration with External Tools
Using exported lists with other tools
FileFortress exports are designed to work with external deletion tools. Here are some common integrations:
Using with rclone (Cloud Storage)
# Read paths from file and delete with rclone
while IFS= read -r file; do
  [[ "$file" =~ ^# || -z "$file" ]] && continue  # Skip comments and blank lines
  rclone delete "remote:$file"
done < safe-to-delete.txt
Using with PowerShell (Local Files)
# Read paths and delete with confirmation
Get-Content safe-to-delete.txt | Where-Object { $_ -notmatch '^#' } | ForEach-Object {
    if (Test-Path $_) {
        Remove-Item $_ -Confirm
    }
}
Using JSON with Custom Scripts
The JSON export format provides structured data perfect for custom automation scripts:
# Python example
import json
with open('duplicates.json') as f:
    data = json.load(f)

for group in data['groups']:
    print(f"Keeping: {group['keepFile']['path']}")
    for file in group['deleteFiles']:
        print(f"  Delete: {file['path']}")