The 8GB You're Paying to Store Twice: Finding Duplicates Across Clouds

That 2GB video of last summer's trip is in your Google Drive. It's also in Backblaze, because you backed up your laptop. And it's sitting in an old S3 bucket you spun up two years ago and never cleaned out. Same file. Three bills.

Multiply that across thousands of photos, exports, and archives, and the math gets ugly fast. You're not just paying to store your data twice - you're paying to store the same bytes three, four, even five times, and every provider sends you a separate invoice.

The Problem Nobody Bills You For Solving

No single cloud provider can see your other clouds. Google Drive has no idea the same file exists in Backblaze. S3 can't check OneDrive. So the duplicates pile up silently, and you keep paying for them month after month.

Why Cross-Cloud Dedup Is So Hard

Plenty of tools can find duplicate files within a single account. Google has its own storage management view. Backblaze shows you what's in your bucket. But every one of them is blind to the others.

Each provider only sees itself. That's the entire problem in one sentence. A file in S3 and the identical file in OneDrive are, as far as either provider is concerned, completely unrelated. There's no shared catalog, no common identifier, no cross-provider "find my duplicates" button - because the providers compete with each other and have zero incentive to build one.

  • No shared metadata: each cloud reports names, sizes, and dates in its own way, so even comparing them by hand is tedious.
  • No common hash: some providers expose an MD5 or ETag, others don't, and the values aren't always comparable. An S3 ETag for a multipart upload, for example, is not a plain MD5 of the file.
  • No single view: to compare clouds, something has to gather everything into one place first.

That last point is the key. To find duplicate files across cloud storage, you first need one place that knows about all of them at once.
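To make the "no common hash" point concrete, here is a minimal Python sketch of why an ETag and an MD5 of the same file can disagree. It assumes the commonly observed S3 multipart scheme (an MD5 over the per-part MD5 digests, plus a part-count suffix); that scheme, the function names, and the part size are illustrative assumptions, not a contract any provider guarantees.

```python
import hashlib


def plain_md5(data: bytes) -> str:
    """MD5 of the whole payload, as a single-part upload would typically report."""
    return hashlib.md5(data).hexdigest()


def multipart_style_etag(data: bytes, part_size: int) -> str:
    """Assumed multipart scheme: MD5 of the concatenated part digests, plus '-<parts>'."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    joined_digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(joined_digests).hexdigest()}-{len(parts)}"


payload = b"x" * 10_000  # stand-in for file contents
whole_file_md5 = plain_md5(payload)
multipart_etag = multipart_style_etag(payload, part_size=4_096)

# Same bytes, two different identifiers.
print(whole_file_md5)
print(multipart_etag)
```

The same bytes produce two different strings, so comparing raw provider values across clouds is not enough on its own.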

One Local Index Across Every Remote

FileFortress solves this by building a single, local index of every file across every remote you've connected - Google Drive, OneDrive, AWS S3, Backblaze B2, and local storage. When you scan your remotes, the file metadata lands in one encrypted database on your own machine.

Because that index spans all your clouds at once, duplicate detection becomes a query rather than a manual cross-referencing nightmare. The same trip video in three providers shows up as a single duplicate group, regardless of which clouds it's scattered across.
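Here is a rough idea of what "duplicate detection becomes a query" can look like once everything sits in one table. The schema, column names, and remote names below are hypothetical and are not FileFortress's actual database layout; the sketch uses an in-memory SQLite database from Python's standard library.

```python
import sqlite3

# Hypothetical index layout: one row per file, whatever cloud it lives on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (remote TEXT, path TEXT, size INTEGER, sha256 TEXT)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        ("gdrive", "Trips/summer.mp4", 2_000_000_000, "aa11"),
        ("b2-backup", "laptop/Videos/summer.mp4", 2_000_000_000, "aa11"),
        ("s3-archive", "old/summer.mp4", 2_000_000_000, "aa11"),
        ("onedrive", "Docs/taxes.pdf", 180_000, "bb22"),
    ],
)

# One query finds every checksum that appears more than once, across all remotes.
duplicate_groups = conn.execute(
    """
    SELECT sha256, COUNT(*) AS copies, GROUP_CONCAT(remote, ', ') AS remotes
    FROM files
    WHERE sha256 IS NOT NULL
    GROUP BY sha256
    HAVING COUNT(*) > 1
    """
).fetchall()

for checksum, copies, remotes in duplicate_groups:
    print(f"{checksum}: {copies} copies on {remotes}")
```

The point of the design is that the cross-referencing happens once, at scan time; after that, finding duplicates is a lookup rather than a comparison between provider dashboards.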

Truly Cross-Cloud

The duplicate finder works across all your configured remotes at the same time. A copy in Google Drive and a copy in an old S3 bucket land in the same group - exactly the duplicates no single provider could ever show you.

And because the index is built and stored locally, the comparison happens on your device. Your file metadata isn't shipped off to a third-party service to be deduplicated - it stays with you.

Two Ways to Match: Fast vs. Guaranteed

Not all "duplicates" are created equal. Two files with the same name and size are probably the same - but "probably" isn't good enough when you're about to free up space. FileFortress gives you two detection modes so you can pick your confidence level.

Name + Size (Fast)

  • Groups files by matching name and size
  • Instant - no hashing required
  • Heuristic: a strong hint, not proof
  • Great for a first-pass survey
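As an illustration of the fast mode, the sketch below groups entries by file name and size. The `Entry` type and the sample data are made up for the example; how FileFortress normalizes names internally is not described here, so the basename split is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    remote: str
    path: str
    size: int


def group_by_name_and_size(entries):
    """Return only the (name, size) groups that contain more than one entry."""
    groups = defaultdict(list)
    for entry in entries:
        name = entry.path.rsplit("/", 1)[-1]  # assumption: compare basenames only
        groups[(name, entry.size)].append(entry)
    return {key: members for key, members in groups.items() if len(members) > 1}


entries = [
    Entry("gdrive", "Trips/summer.mp4", 2_000_000_000),
    Entry("s3-archive", "old/summer.mp4", 2_000_000_000),
    Entry("onedrive", "Videos/summer.mp4", 1_500_000_000),  # same name, different size
    Entry("b2-backup", "laptop/notes.txt", 512),
]

candidates = group_by_name_and_size(entries)
for (name, size), members in candidates.items():
    print(name, size, [m.remote for m in members])
```

No file contents are read, which is why this pass is quick and also why its results are only candidates.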

Hash Verification (Guaranteed)

  • Compares MD5 / SHA256 checksums
  • Confirms identical content by checksum, not by name
  • Hashes from provider metadata or local hashing
  • The safe choice before deleting

Where do the hashes come from? Some providers expose them in their metadata. When a provider doesn't, you can generate hashes yourself by running the FileHasher tool with filefortress tools run, which fills in the missing checksums in your local index.
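For a sense of what local hashing involves, here is a minimal sketch that computes a SHA256 checksum in fixed-size chunks so a large file never has to fit in memory. It is a generic standard-library example, not the FileHasher tool's implementation; the function name and chunk size are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file through SHA256 one chunk at a time."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"holiday video bytes" * 1000)
    checksum = sha256_of_file(sample, chunk_size=4096)

print(checksum)
```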

Restrict to Certainty

Add --hash-verified-only to limit results to groups confirmed by matching checksums. You only act on duplicates whose contents have been verified by checksum - no guessing from a name collision.
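The sketch below shows the idea behind restricting to verified matches: take a name + size candidate group and keep only the copies whose checksums agree. How FileFortress treats files with no hash yet is not stated above, so excluding them here is an assumption, as are the tuple layout and sample values.

```python
from collections import defaultdict

# Hypothetical candidate group: same name and size, (remote, path, checksum or None).
candidates = [
    ("gdrive", "Photos/report.pdf", "aa11"),
    ("s3-archive", "docs/report.pdf", "aa11"),
    ("onedrive", "Work/report.pdf", "bb22"),   # name collision, different content
    ("b2-backup", "laptop/report.pdf", None),  # not hashed yet (assumed: excluded)
]


def hash_verified_groups(group):
    """Split a candidate group by checksum and keep checksums seen more than once."""
    by_hash = defaultdict(list)
    for remote, path, checksum in group:
        if checksum is not None:
            by_hash[checksum].append((remote, path))
    return {h: copies for h, copies in by_hash.items() if len(copies) > 1}


verified = hash_verified_groups(candidates)
for checksum, copies in verified.items():
    print(checksum, copies)
```

Four files looked alike by name and size; only two survive verification, which is the gap between a hint and a confirmed duplicate.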

Choosing Which Copy to Keep

Finding duplicates is only half the job. Once a group of identical files is identified, you have to decide which copy survives. FileFortress lets you set that policy with --keep-strategy instead of picking by hand, group by group.

# Keep the oldest copy of each duplicate (the default)
filefortress find duplicates --hash-verified-only --keep-strategy oldest

# Keep the newest copy instead
filefortress find duplicates --keep-strategy newest

# Always keep whatever lives on a specific remote
filefortress find duplicates --keep-strategy by-remote --keep-remote "My Google Drive"

The available strategies cover the decisions people actually make: oldest (the default), newest, first, smallest, largest, and by-remote. The by-remote strategy is the multi-cloud favorite - "keep everything on my primary provider, mark the redundant copies elsewhere" - and it requires --keep-remote so FileFortress knows which cloud wins.
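To show how a keep policy turns into a decision, here is a small sketch of the six strategies. It assumes "oldest" and "newest" compare modification times and that by-remote falls back to the oldest copy when the chosen remote has none; both are guesses made for illustration, and the `Copy` type and `choose_keeper` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Copy:
    remote: str
    path: str
    size: int
    modified: datetime  # assumption: age is judged by modification time


def choose_keeper(group, strategy="oldest", keep_remote=None):
    """Pick the one copy to keep from a group of identical files."""
    if strategy == "oldest":
        return min(group, key=lambda c: c.modified)
    if strategy == "newest":
        return max(group, key=lambda c: c.modified)
    if strategy == "first":
        return group[0]
    if strategy == "smallest":
        return min(group, key=lambda c: c.size)
    if strategy == "largest":
        return max(group, key=lambda c: c.size)
    if strategy == "by-remote":
        if keep_remote is None:
            raise ValueError("by-remote needs a remote name")
        for copy in group:
            if copy.remote == keep_remote:
                return copy
        # Assumed fallback when the preferred remote holds no copy.
        return min(group, key=lambda c: c.modified)
    raise ValueError(f"unknown strategy: {strategy}")


group = [
    Copy("gdrive", "Trips/summer.mp4", 2_000_000_000, datetime(2024, 8, 1)),
    Copy("s3-archive", "old/summer.mp4", 2_000_000_000, datetime(2023, 7, 15)),
    Copy("b2-backup", "laptop/summer.mp4", 2_000_000_000, datetime(2024, 9, 3)),
]

keeper = choose_keeper(group, "by-remote", keep_remote="gdrive")
to_remove = [c for c in group if c != keeper]
print(keeper.remote, [c.remote for c in to_remove])
```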

Pick the Strategy on Purpose

A keep-strategy decides what stays and what gets proposed for removal. Think about retention before you choose: keeping the oldest copy preserves your earliest version, while by-remote consolidates everything onto the cloud you trust most.

Acting Safely: A Plan, Not a Delete Button

Here's the part that matters most: FileFortress never deletes your files. It detects duplicates and exports a removal plan you can review, edit, and run yourself. Nothing is touched in your clouds until you decide to act.

You choose the output with --export-format. Want a plain list to eyeball? Use paths or json. Want a ready-to-run cleanup? Export an rclone, powershell, or bash script and inspect every line before you run it.

# Export a hash-verified rclone removal plan to a file for review
filefortress find duplicates \
  --hash-verified-only \
  --keep-strategy oldest \
  --export-format rclone \
  --output-file remove-duplicates.txt

# Want the kept files annotated too? Add them as comments
filefortress find duplicates \
  --export-format bash \
  --include-keep-file \
  -o cleanup.sh

The script lists exactly which copies are marked for removal and, with --include-keep-file, which copy is being kept. You read it, you trust it, and then you - or rclone - perform the deletion. The tool that finds the problem is never the tool that pulls the trigger.
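A plan like that is just text, which is what makes it reviewable. The sketch below builds one from already-decided groups. The exact layout FileFortress writes is not shown in this article, so the comment lines and the use of `rclone deletefile` (a real rclone command that removes a single file) are assumptions about what such an export could look like.

```python
import shlex


def build_removal_plan(groups):
    """Turn (kept copy, redundant copies) pairs into reviewable script lines."""
    lines = ["# Review before running: each command deletes one redundant copy."]
    for keep, removals in groups:
        lines.append(f"# keep: {keep}")  # kept copy is annotated, never touched
        for target in removals:
            # shlex.quote guards against spaces or shell metacharacters in paths.
            lines.append(f"rclone deletefile {shlex.quote(target)}")
    return lines


# Hypothetical decided group: keep the Google Drive copy, remove the other two.
groups = [
    (
        "gdrive:Trips/summer.mp4",
        ["s3-archive:videos/summer.mp4", "b2-backup:laptop/summer.mp4"],
    )
]

plan = build_removal_plan(groups)
print("\n".join(plan))
```

Nothing in the sketch talks to a cloud. It only produces lines for a human to read, which mirrors the separation between finding duplicates and deleting them.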

Review-First by Design

Because the output is a plan rather than an action, you get a checkpoint. Open the file, confirm the right copies are being removed, then run it on your own terms. No surprise deletions, no irreversible button.

Two Ways to Do It: GUI or CLI

You don't have to live in the terminal to clean up your clouds. FileFortress gives you the same cross-cloud duplicate detection in two places, depending on how you like to work.

  • The Duplicates page in the desktop GUI lets you browse duplicate groups visually, see where each copy lives, and choose your keep-strategy and export format with a few clicks.
  • The find duplicates command gives you the full power of the multi-cloud duplicate finder for scripting, scheduling, and automation - the same flags shown above, ready to drop into your workflow.

Either way, the engine underneath is identical: one local index, two detection modes, your choice of which copy to keep, and a removal plan you control. The savings show up the moment you stop paying for the same bytes in three places.

Stop Paying to Store the Same File Twice

Build one index across every cloud, find the duplicates no single provider can see, and export a removal plan you control.