From Detection to Deletion: Safely Removing Duplicate Files

What You'll Learn

How to choose the right keep-strategy for each set of duplicates
Why FileFortress exports a plan instead of deleting files for you
Exporting removal plans as paths, JSON, rclone, PowerShell, or Bash
Executing a cleanup safely with a review-first workflow
Verifying results and re-scanning to confirm the space is reclaimed

You have already scanned your remotes and confirmed that duplicates exist. Now comes the part that actually frees up space: deciding which copy of each file to keep, producing a removal plan, and executing it without putting a single original file at risk. This guide focuses on exactly that — the keep-strategy decision, the export, and safe execution.

If you are still unsure how duplicates are found and grouped, start with the conceptual Duplicate Management workflow guide, then return here to act on what you found. That guide covers the detection mechanics in depth; this one picks up at the decision point and carries you all the way through to verified, reclaimed storage.

FileFortress Never Deletes Your Files

This is the most important thing to understand before you begin. FileFortress detects duplicates and exports a removal plan: a list of paths or a ready-to-run script. The actual deletion is performed by you, or by a tool you trust such as rclone. Nothing is removed until you run that step yourself, so the process stays review-first, and you can walk away from a plan at any point before you execute it.

Prerequisites

Before exporting a cleanup plan, make sure your local index is current and trustworthy. A removal plan is only as accurate as the scan it was built from.

A completed scan of every remote you want to clean. Duplicate detection works across all configured remotes, so an out-of-date scan of even one remote can hide real matches or report matches that no longer exist.
Hashes available, if you want guaranteed matches. Hash verification needs file hashes, which come either from provider metadata or from local hashing via the tools run command (the built-in FileHasher tool).
rclone or a shell, for execution. The plan is data; you need a way to act on it. rclone is the cleanest cross-cloud option, but a PowerShell or Bash script works too.

FileFortress supports Google Drive, OneDrive, AWS S3, Backblaze B2, and Local storage, and duplicate detection spans all of them at once — so a single plan can de-duplicate files that are scattered across multiple clouds.

1. Detect: Name & Size vs. Hash

Detection produces two kinds of groups, and the distinction matters once you start deleting. Name & Size matching is a fast heuristic: two files with the same name and the same byte count are probably identical. Hash verification compares an MD5 or SHA256 checksum of each file, and matching checksums confirm that the contents are the same.

# Find duplicates across all configured remotes
filefortress find duplicates

# Restrict to hash-confirmed groups only (safest)
filefortress find duplicates --hash-verified-only

The summary separates the two: a count of Name & Size groups, a count of hash-verified groups, a guaranteed space-savings figure for the hash groups, and a larger potential-savings figure that includes the heuristic matches. When you are about to delete, prefer the guaranteed number — it is the one you can stand behind.

Hashes do not appear by magic. They come from one of two sources: provider metadata, when a cloud storage API exposes a checksum for the file, or local hashing, when FileFortress downloads a file and computes the hash itself through the FileHasher tool. If --hash-verified-only returns fewer groups than you expected, it usually means many of your files simply do not have hashes yet. Running local hashing first widens the set of guaranteed matches and, in turn, the amount of space you can safely reclaim. The enrichment guide explains exactly where each kind of hash originates.

Confidence: Hash vs. Name-Only

Two files can share a name and size and still differ in content — think of report.pdf at exactly 2 MB in two different folders. Deleting based on Name & Size alone is convenient but carries real risk. When the files matter, generate hashes first and add --hash-verified-only so the plan contains only matches that are guaranteed identical.
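To see why a name-and-size match is weaker than a hash match, the sketch below classifies two local files as different, same-size-only, or identical. It illustrates the idea rather than FileFortress's own comparison logic: the function names are made up for this example, and it assumes GNU sha256sum is installed (on macOS, shasum -a 256 is the usual substitute).

```shell
# Illustrative helpers (hypothetical names, not part of FileFortress).
# Assumes GNU coreutils: wc, cut, sha256sum.

# True when both files have the same byte count.
same_size() {
  [ "$(wc -c < "$1")" -eq "$(wc -c < "$2")" ]
}

# True when both files have the same SHA-256 digest.
same_hash() {
  [ "$(sha256sum < "$1" | cut -d' ' -f1)" = "$(sha256sum < "$2" | cut -d' ' -f1)" ]
}

# Prints one of: different, same-size-only, identical.
compare_copies() {
  if ! same_size "$1" "$2"; then
    echo "different"
  elif same_hash "$1" "$2"; then
    echo "identical"
  else
    echo "same-size-only"
  fi
}
```

Two copies of report.pdf that both weigh exactly 2 MB but differ inside would come back as same-size-only, which is the case --hash-verified-only is designed to exclude.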

2. Review the Groups

Before you commit to a strategy, look at the actual groups. There are two convenient ways to do this. In the desktop GUI, open the Duplicates page to browse groups visually, expand each one, and see exactly which remote every copy lives on. On the command line, the interactive explorer walks you through the same data.

# Default summary view
filefortress find duplicates --view summary

# Launch the interactive explorer to browse groups and files
filefortress find duplicates

The interactive explorer lets you page through Name & Size groups, hash-verified groups, and a "top duplicate groups" list sorted by how much space each group would reclaim. Drilling into a group shows every copy with its remote, full path, size, and last-modified date — the exact attributes the keep-strategies act on. Spending a few minutes here pays off, because it is where you confirm that the groups represent files you genuinely want to de-duplicate.

The "top duplicate groups" list deserves special attention. Because it is sorted by reclaimable space, it surfaces the handful of groups that account for most of your wasted storage. A single 4 GB video duplicated three times reclaims far more than a thousand tiny text files, so reviewing the top of the list first lets you make the biggest gains with the least risk. It also tends to expose the groups where a careful keep-strategy choice matters most — large media files are exactly where keeping the wrong copy is most costly.
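The ranking follows from simple arithmetic: because one copy in each group is kept, a group's reclaimable space is its file size multiplied by the number of copies minus one. Here is a small sketch of that calculation, assuming a hypothetical tab-separated listing of size, copy count, and label. This layout is invented for the example and is not a FileFortress export format.

```shell
# rank_groups < groups.tsv   (hypothetical helper and input layout)
# Input line:  <size-bytes><TAB><copies><TAB><label>
# Output line: <reclaimable-bytes><TAB><label>, largest first.
rank_groups() {
  while IFS="$(printf '\t')" read -r size copies label; do
    # One copy survives, so only (copies - 1) files are removed.
    printf '%s\t%s\n' "$(( size * (copies - 1) ))" "$label"
  done | sort -t "$(printf '\t')" -k1,1nr
}
```

Three copies of a 4 GiB video reclaim 8 GiB, while a pair of 4 KiB notes reclaims only 4 KiB, so the video sorts to the top of the list.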

3. Choose a Keep-Strategy

For every duplicate group, FileFortress keeps exactly one file and marks the rest for removal. The --keep-strategy option decides which copy survives. The default is oldest, which is a sensible, conservative choice for most libraries.

Strategy	Keeps	When to Use It
`oldest`	The earliest-modified copy	The safe default. Preserves the original you first saved and treats later copies as redundant.
`newest`	The most recently modified copy	When the latest copy reflects the canonical version and older ones are stale leftovers.
`first`	The first copy encountered in the group	When you want a deterministic, predictable pick and modification dates are unreliable.
`smallest`	The copy with the fewest bytes	For Name & Size groups where you suspect a compressed or trimmed version is the keeper.
`largest`	The copy with the most bytes	When the bigger file is the higher-fidelity original (full-resolution photos, uncompressed exports).
`by-remote`	The copy on a remote you name	Consolidating onto one provider — keep everything on your primary cloud and prune the rest.

The by-remote strategy is special: it requires a companion --keep-remote value naming the remote whose copy should always survive. If you ask for by-remote without --keep-remote, or you name a remote that does not exist, the export stops with a clear error instead of guessing.

# Keep the newest copy in each group
filefortress find duplicates --export-format paths --keep-strategy newest

# Consolidate onto one provider: keep whatever lives on "My Drive"
filefortress find duplicates --export-format paths \
  --keep-strategy by-remote --keep-remote "My Drive"
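To make the selection rule concrete, the following sketch picks a survivor from a single group under the four attribute-based strategies. It is a model of the behavior described in the table, not FileFortress's implementation: the pick_keeper name and the tab-separated input layout are assumptions made for illustration, and first and by-remote are left out because they depend on ordering and remote names rather than file attributes.

```shell
# pick_keeper STRATEGY < group.tsv   (hypothetical helper and input layout)
# Input line: <mtime-epoch><TAB><size-bytes><TAB><path>
# Prints the path of the copy that the strategy would keep.
pick_keeper() {
  tab=$(printf '\t')
  case "$1" in
    oldest)   key='-k1,1n'  ;;  # smallest modification time
    newest)   key='-k1,1nr' ;;  # largest modification time
    smallest) key='-k2,2n'  ;;  # fewest bytes
    largest)  key='-k2,2nr' ;;  # most bytes
    *) echo "unknown strategy: $1" >&2; return 1 ;;
  esac
  # -s keeps input order on ties, so the result is deterministic.
  sort -s -t "$tab" "$key" | head -n 1 | cut -f3
}
```

Every other line in the group would be the removal candidate, which is why it pays to confirm the strategy against a few real groups before exporting.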

4. Preview Before You Act

Never jump straight to a destructive script. Run the export and read the on-screen summary first — it reports how many groups were processed, how many files are marked for deletion, and how much space the plan will reclaim. Treat that summary as your dry run: if the file count or reclaimed-space figure looks wildly off, your scan, strategy, or detection mode needs another look before you execute anything.

# Preview the plan as a plain list of paths, printed to the screen
filefortress find duplicates --export-format paths --hash-verified-only

# Read the summary line: groups processed, files to delete, space to reclaim

When you export without an output file, the plan is written to the console so you can eyeball it. Scan the list for anything that looks like a unique original rather than a redundant copy. Only once the preview matches your intent should you save it to a file and move toward execution.
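Once a preview has been saved, a quick count gives you a number to compare against the on-screen summary. The helper below is a sketch that assumes the paths format holds one target per line and that any annotation lines start with #; both are assumptions about the file layout, and count_targets is a name invented for this example.

```shell
# count_targets PLAN_FILE   (hypothetical helper)
# Counts lines that are neither blank nor comments.
count_targets() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | wc -l | tr -d ' '
}
```

If the count disagrees with the "files to delete" figure in the summary, treat that as a reason to stop and look again rather than a detail to ignore.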

5. Export a Removal Plan

The --export-format option turns the duplicates into an artifact you can act on. There are five formats, each suited to a different downstream workflow.

Format	What You Get
`paths`	A plain list of the files marked for removal — ideal for review or feeding into your own tooling.
`json`	Structured data with group and summary detail — best for auditing or programmatic processing.
`rclone`	rclone delete commands — the cleanest way to remove files directly from cloud remotes.
`powershell`	A PowerShell script — natural on Windows or for Local storage cleanup.
`bash`	A Bash script — natural on macOS and Linux.

Use --output-file (or its short form -o) to write the plan to disk instead of the console. When you save to a file, FileFortress also prints a summary of groups processed, files to delete, and space to reclaim, so you get a confirmation receipt for the artifact.

# Export an rclone plan for hash-verified duplicates, keeping the oldest
filefortress find duplicates \
  --export-format rclone \
  --hash-verified-only \
  --keep-strategy oldest \
  -o cleanup.sh

# Export a JSON audit record of the same decision
filefortress find duplicates \
  --export-format json \
  --hash-verified-only \
  --output-file duplicates-plan.json

If you want the kept file recorded alongside the ones being removed, add --include-keep-file. The kept file is written as a comment, so it never becomes a delete target — it is there purely so you can see, for each group, which copy survived. That makes a script far easier to audit before you run it.

6. Execute Safely

You now hold a plan, not a fait accompli. The execution step is where the deletion actually happens, and it is entirely in your hands. The golden rule is review first, run second. Because FileFortress hands off a static artifact rather than deleting anything itself, there is a clean boundary between detection and destruction: nothing on any remote has changed so far, and nothing will change unless you run the plan.

Using rclone

For cloud remotes, the rclone export gives you delete commands you can read line by line. Open the file, confirm the targets are the redundant copies you expected, and then run it. rclone performs the deletion against the provider — FileFortress is no longer involved at this point.

# 1. Read the plan end-to-end before doing anything
cat cleanup.sh

# 2. Run it only after you are satisfied with every target
bash cleanup.sh
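rclone itself offers a rehearsal mode through its global --dry-run flag, which reports what a command would do without doing it. One possible way to use it is to derive a dry-run copy of the plan, as sketched below. The make_dry_run name is invented for this example, and the sketch assumes each delete in the exported script is a line beginning with rclone; check that against your own cleanup.sh before relying on it.

```shell
# make_dry_run PLAN_FILE > PLAN_FILE.dryrun   (hypothetical helper)
# Inserts rclone's --dry-run flag into every line that starts with "rclone".
# Comment lines (including commented-out rclone commands) are left untouched.
make_dry_run() {
  sed 's/^\([[:space:]]*\)rclone /\1rclone --dry-run /' "$1"
}
```

Running the derived script first shows which objects rclone would remove, and the original cleanup.sh stays unmodified for the real run.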

Using a Shell Script

The PowerShell and Bash exports are ordinary scripts. Review them the same way, then execute with your shell. Because they are plain text, you can also delete or comment out any individual line you are unsure about, turning the generated plan into a curated one.

Mind the --include-keep-file Output

When you export with --include-keep-file, the kept copy appears in the output as a comment. Do not uncomment those lines or convert them into delete commands while editing — doing so would target the very file you intended to preserve. The comment is a reference, not a to-do.

Whichever path you take, consider running against one group or one remote first as a small-scale trial. A successful trial deletion builds confidence before you turn the full plan loose on thousands of files.
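One way to carve out such a trial is to filter the plan down to the lines that mention a single remote. The sketch below assumes the exported commands reference files in rclone's remote:path form; plan_for_remote is a hypothetical helper, and because it matches on plain text it will also pick up any other line that happens to contain the same "name:" string.

```shell
# plan_for_remote PLAN_FILE REMOTE_NAME > trial.sh   (hypothetical helper)
# Prints only the plan lines that mention "<REMOTE_NAME>:".
# Plain substring match: a remote named "Drive" also matches "My Drive:".
plan_for_remote() {
  grep -F -e "$2:" -- "$1"
}
```

Read the filtered file just as carefully as the full plan, then run it with your shell as the small-scale trial.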

7. Verify and Re-scan

Cleanup is not finished when the script exits — it is finished when a fresh scan confirms the duplicates are gone. Re-scan the affected remotes so your local index reflects the deletions, then run detection again to check the numbers moved in the direction you expected.

# Re-scan your remotes so the index reflects the deletions
filefortress remotes scan

# Confirm the duplicate count and reclaimable space have dropped
filefortress find duplicates --hash-verified-only

The guaranteed space-savings figure should fall, and the groups you cleaned should no longer appear. If a group lingers, the deletion may not have completed on that provider, or the remote may not have been re-scanned yet — re-run the scan for that remote and check again.
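For Local storage, you can also cross-check the plan against the filesystem directly. The sketch below assumes a paths export with one local path per line and # for any annotation lines; remaining_paths is a name made up for this example, and it is a complement to the re-scan, not a replacement for it.

```shell
# remaining_paths PLAN_FILE   (hypothetical helper)
# Prints every path from the plan that still exists on disk.
remaining_paths() {
  while IFS= read -r path; do
    # Skip blank lines and comment lines.
    case "$path" in ''|'#'*) continue ;; esac
    if [ -e "$path" ]; then
      printf '%s\n' "$path"
    fi
  done < "$1"
  return 0
}
```

An empty result means every targeted file is gone; any path that is printed points at a deletion that did not complete.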

Keep a Record

Export a json plan alongside your executable script and keep it. It is a precise record of which files the plan targeted, which copy was kept in each group, and how much space the operation was expected to reclaim, which is invaluable if anyone later asks what happened to a file.
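If the record may be consulted much later, it can help to store checksums of the plan files next to them so you can show they were not edited after the fact. This is a sketch of one approach, assuming GNU sha256sum; seal_record and verify_record are hypothetical names, and nothing here is a FileFortress feature.

```shell
# seal_record MANIFEST FILE...   (hypothetical helper)
# Writes SHA-256 checksums of the given plan files to MANIFEST.
seal_record() {
  manifest=$1
  shift
  sha256sum "$@" > "$manifest"
}

# verify_record MANIFEST   (hypothetical helper)
# Succeeds only if every file listed in MANIFEST still matches its checksum.
verify_record() {
  sha256sum -c "$1" > /dev/null 2>&1
}
```

For example, sealing cleanup.sh and duplicates-plan.json into a single manifest right after export lets you confirm months later that both files are the ones you actually reviewed.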

Putting It All Together

Here is the full sequence for a cautious, repeatable cleanup. It assumes you have already scanned your remotes and want guaranteed results. Each step maps directly to a section above, so you can dip back into the detail wherever you need it.

# 1. Make sure hashes exist for guaranteed matching
filefortress tools run --remote "My Drive"

# 2. Look at what detection found
filefortress find duplicates --hash-verified-only

# 3. Preview the plan on screen before saving anything
filefortress find duplicates --export-format paths \
  --hash-verified-only --keep-strategy oldest

# 4. Save an executable rclone plan and a JSON audit record
filefortress find duplicates --export-format rclone \
  --hash-verified-only --keep-strategy oldest \
  --include-keep-file -o cleanup.sh
filefortress find duplicates --export-format json \
  --hash-verified-only --keep-strategy oldest \
  -o duplicates-plan.json

# 5. Read the script, then execute it
cat cleanup.sh
bash cleanup.sh

# 6. Re-scan and confirm the duplicates are gone
filefortress remotes scan
filefortress find duplicates --hash-verified-only

This sequence is deliberately conservative: it hashes first, restricts to guaranteed matches, previews before saving, keeps an audit record, reads the script before running it, and re-scans to verify. Once you trust the workflow, you can streamline it — but starting strict is how you learn what each strategy and format does to your data without risking anything.

Best Practices

Lead with hash verification. For anything you would be upset to lose, generate hashes and use --hash-verified-only so the plan contains only guaranteed-identical copies.
Match the strategy to the group. Photos and exports often favor largest; consolidation favors by-remote; archival libraries favor the default oldest.
Always read the plan before running it. Exporting to a file and inspecting it takes little time and can prevent irreversible mistakes.
Trial on a subset first. Run the cleanup against one remote or one group before unleashing the full plan.
Re-scan to close the loop. A cleanup is only verified once a fresh scan shows the duplicates gone.

Common Pitfalls

Deleting From a Stale Index

If files changed on a remote after your last scan, the plan can reference paths that have moved or matches that no longer hold. Always export from a current scan, and if a cleanup spans days, re-scan before generating the final plan.
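A simple guard against running an old plan is to check the age of the plan file before executing it. The sketch below uses the file's modification time as a rough proxy for how current the plan is; it cannot tell when the underlying scan ran. The plan_is_fresh name and the threshold are illustrative assumptions, and it relies on find supporting -mmin, as the GNU and BSD versions do.

```shell
# plan_is_fresh PLAN_FILE MAX_AGE_MINUTES   (hypothetical helper)
# Succeeds when the plan file was modified less than MAX_AGE_MINUTES ago.
plan_is_fresh() {
  # find prints the file name only if it passes the -mmin test.
  [ -n "$(find "$1" -mmin "-$2" 2> /dev/null)" ]
}
```

A call such as plan_is_fresh cleanup.sh 1440 && bash cleanup.sh would refuse to run a plan older than a day, which is a cue to re-scan and export again.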

A few other traps are worth naming. Forgetting --keep-remote with the by-remote strategy stops the export — supply the remote name. Choosing smallest when the larger file is the true original will quietly discard quality. And running a plan you have not read is how unique files get lost; the review-first habit is the single most reliable safeguard in this entire workflow.

Related Guides

Duplicate Management Workflow — how detection finds and groups duplicates across your remotes
find Command Reference — full syntax and options for find duplicates
Local Tools Guide — generate file hashes locally for guaranteed matches
Enrichment Guide — where provider and local hashes come from
remotes Command Reference — scan your remotes before and after cleanup