Understanding File Enrichment in FileFortress
Unlock advanced search with comprehensive metadata

What You'll Learn
  • The difference between scanning and enriching files
  • What metadata is collected during enrichment
  • How enrichment enables advanced search capabilities
  • When and how to enrich your files
  • Filtering options for targeted enrichment
  • Best practices and performance tips

Overview

FileFortress provides two distinct operations for gathering file information from cloud storage: scanning and enriching. Understanding the difference between these operations is crucial for leveraging the full power of FileFortress's advanced search capabilities.

Key Insight

While remotes scan gives you a quick inventory of your files, remotes enrich unlocks the full potential of metadata-based searching, including EXIF data, file hashes, image dimensions, and provider-specific attributes.

Scan vs Enrich: What's the Difference?

Scanning (Basic Metadata)

The remotes scan command performs a quick listing of files from your cloud storage provider.

What You Get

  • File name and path
  • File size
  • Modified date
  • File type/extension
  • Basic folder structure

Characteristics

  • Fast operation
  • Minimal API calls
  • Low resource usage

filefortress remotes scan "My Drive"

Enriching (Comprehensive Metadata)

The remotes enrich command fetches detailed metadata from cloud providers.

What You Get (in addition to scan data)

  • EXIF data (camera, lens, GPS, settings)
  • File hashes (MD5, SHA1, SHA256)
  • Image dimensions (width, height, color space)
  • Video metadata (duration, codec, resolution)
  • Owner information (creator, last modifier)
  • Sharing details (permissions, shared status)
  • Provider-specific attributes

Characteristics

  • Slower operation
  • More API calls
  • Richer data for search

filefortress remotes enrich

Why Enrich Your Files?

Enrichment is essential for leveraging FileFortress's advanced search capabilities. Without enrichment, you're limited to searching by basic file attributes like name, size, and date.

What Enrichment Enables

1. Advanced Photo Searches

Find photos by camera model, lens, aperture, ISO, location, and more:

# Find all photos taken with a Canon camera at high ISO
filefortress search --meta "exif.cameraMake=Canon" --meta "exif.iso>3200"

# Find landscape-oriented photos modified after a given date
filefortress search --meta "image.width>image.height" --modified-after 2024-01-01

2. Duplicate Detection

Identify duplicate files across different remotes using file hashes:

# Search for files with a specific hash
filefortress search --meta "hash.sha256=abc123..."

# Find potential duplicates by filtering enriched files
filefortress find duplicates --use-hash
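
To illustrate why content hashes make duplicate detection reliable, the sketch below applies the same idea to a local directory: files with identical bytes produce identical SHA-256 digests, whatever their names or locations. This is not how FileFortress implements its duplicate search; find_duplicates is a hypothetical helper, and the sketch assumes the GNU coreutils sha256sum tool is installed.

```shell
# Print "hash  path" for every file under $1 whose SHA-256 digest
# matches at least one other file in the same tree.
find_duplicates() {
  find "$1" -type f -exec sha256sum {} + | sort | awk '
    { hash = $1 }
    # Same digest as the previous line: emit the group (first member once).
    hash == prev { if (!printed) print prevline; print; printed = 1 }
    # New digest: start a fresh group.
    hash != prev { printed = 0 }
    { prev = hash; prevline = $0 }'
}

# Usage: find_duplicates ~/Pictures
```

Sorting brings equal digests next to each other, so a single pass is enough to group them.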

3. Image Dimension Filtering

Find images by resolution, aspect ratio, or size:

# Find 4K images
filefortress search --meta "image.width>=3840" --meta "image.height>=2160"

# Find portrait-oriented photos
filefortress search --meta "image.height>image.width" --media-type Image

4. Ownership and Sharing Queries

Filter by file ownership and sharing status:

# Find files shared with specific email
filefortress search --meta "[email protected]"

# Find files you don't own
filefortress search --meta "[email protected]"

How to Enrich Files

Basic Enrichment

Enrich all files across all remotes:

filefortress remotes enrich

Targeted Enrichment

For large collections, you may want to enrich only specific files to save time and API quota. FileFortress provides extensive filtering options:

Filter Option             Description                Example
--remote                  Enrich specific remote     --remote "My Google Drive"
--remote-type             Filter by provider type    --remote-type GoogleDrive
--extension               Filter by file extension   --extension .jpg --extension .png
--media-type              Filter by media category   --media-type Image
--size-min, --size-max    Filter by file size range  --size-min 10MB --size-max 100MB
--modified-after          Only recent files          --modified-after 2024-01-01
--min-depth, --max-depth  Filter by folder depth     --min-depth 2 --max-depth 5
--exclude                 Exclude patterns           --exclude "*/temp/*"
--meta                    Filter by metadata         --meta "exif.cameraMake=Canon"
--query-name              Use saved query            --query-name "my-photos"
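
Several of these filters can appear in one command. The sketch below assembles such a command from a remote name and a cutoff date and prints it for review instead of running it. It assumes the filters narrow the selection together, as the combined --size-min and --modified-after example later in this guide suggests; build_enrich_command is a hypothetical helper, and every flag comes from the list above.

```shell
# Print a combined enrich command for review (nothing is executed).
# $1 = remote name as configured in FileFortress
# $2 = YYYY-MM-DD lower bound for --modified-after
build_enrich_command() {
  remote="$1"
  since="$2"
  printf 'filefortress remotes enrich --remote "%s" --media-type Image --size-min 10MB --modified-after %s --exclude "*/temp/*"\n' \
    "$remote" "$since"
}

build_enrich_command "My Google Drive" 2024-01-01
```

Printing the command first makes it easy to check the scope before spending API quota on it.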

Practical Examples

Example 1: Enrich All Photos

Enrich all image files to enable EXIF-based searches:

filefortress remotes enrich --media-type Image

This fetches EXIF data, dimensions, and color space for all photos.

Example 2: Enrich Recent Large Files

Enrich recently modified large files to get hash information for duplicate detection:

filefortress remotes enrich --size-min 50MB --modified-after 2024-01-01

Useful for identifying duplicate large files like videos or archives.

Example 3: Enrich Specific File Types

Enrich only JPEG and RAW photo files:

filefortress remotes enrich --extension .jpg --extension .jpeg --extension .cr2 --extension .nef

Perfect for photographers who want to search their RAW and JPEG collections by camera settings.
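
When the extension list grows, repeating --extension by hand becomes error-prone. A small helper can generate the repeated flags from a plain list; extension_flags below is a hypothetical name, and the sketch only prints the resulting command.

```shell
# Print one "--extension .ext" pair per argument, on a single line.
extension_flags() {
  flags=""
  for ext in "$@"; do
    ext="${ext#.}"                              # accept "jpg" or ".jpg"
    flags="${flags:+$flags }--extension .$ext"  # space-separate the pairs
  done
  printf '%s\n' "$flags"
}

# Dry run: print the command instead of executing it.
echo "filefortress remotes enrich $(extension_flags jpg jpeg cr2 nef)"
```

To execute rather than print, pass the helper's output unquoted so the shell splits it into separate arguments: filefortress remotes enrich $(extension_flags jpg jpeg cr2 nef).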

Example 4: Enrich Using a Saved Query

First, save a query for your workflow:

filefortress search "vacation photos" --extension .jpg --size-min 1MB --save-query "vacation-photos"

Then enrich files matching that query:

filefortress remotes enrich --query-name "vacation-photos"

Queries allow you to define complex filters once and reuse them.

Example 5: Enrich Specific Remote Only

Enrich files from a single cloud provider:

filefortress remotes enrich --remote "My Google Drive"

Useful when you've just added a new remote or want to focus on one provider.

Example 6: Incremental Enrichment

Enrich files modified in the last 7 days (the date expression below is PowerShell syntax):

filefortress remotes enrich --modified-after (Get-Date).AddDays(-7).ToString("yyyy-MM-dd")

Great for regular enrichment runs to keep metadata up-to-date.
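
The (Get-Date) expression in the command above only works in PowerShell. On Linux or macOS shells, the cutoff date can be computed with the date utility instead. The sketch below assumes either GNU date (the -d option) or BSD date (the -v option) is available; days_ago is a hypothetical helper name.

```shell
# Print the date N days ago as YYYY-MM-DD.
# Tries GNU date first (Linux), then falls back to BSD date (macOS).
days_ago() {
  date -d "$1 days ago" +%Y-%m-%d 2>/dev/null || date -v-"$1"d +%Y-%m-%d
}

since="$(days_ago 7)"
# Dry run: print the command; remove "echo" to execute it.
echo filefortress remotes enrich --modified-after "$since"
```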

Recommended Workflow

Initial Setup: Scan First

When you first add a remote, perform a scan to get a quick inventory:

filefortress remotes scan "My Google Drive"

Targeted Enrichment

Enrich only the files you need for your use case. For example, if you primarily search photos:

filefortress remotes enrich --media-type Image

Regular Updates

Periodically re-scan to detect new files, then enrich recent additions:

filefortress remotes scan "My Google Drive"
filefortress remotes enrich --modified-after 2024-11-01

Use Saved Queries

Define common filter patterns as saved queries for easier enrichment:

filefortress search --extension .jpg --size-min 1MB --save-query "photos"
filefortress remotes enrich --query-name "photos"
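
The scan-then-enrich routine can be wrapped in a small function for scheduled runs. This is a sketch under stated assumptions: refresh_remote and the FF variable are hypothetical names, and the sketch assumes filefortress exits with a non-zero status when a scan fails.

```shell
# FF is the command to invoke; set FF=echo for a dry run that only prints arguments.
FF="${FF:-filefortress}"

# Re-scan one remote, then enrich only files modified since the given date.
# $1 = remote name, $2 = YYYY-MM-DD cutoff for --modified-after.
refresh_remote() {
  remote="$1"
  since="$2"
  # "&&" skips enrichment if the scan fails, so enrich never runs on a stale listing.
  "$FF" remotes scan "$remote" &&
    "$FF" remotes enrich --remote "$remote" --modified-after "$since"
}

# Example (dry run): FF=echo refresh_remote "My Google Drive" 2024-11-01
```

Scoping the enrich step with --remote keeps the run limited to the remote that was just re-scanned.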

Performance and Best Practices

Tips for Efficient Enrichment

  • Use Filters to Limit Scope: Don't enrich everything at once. Use --media-type, --extension, or --size-min to target specific files.
  • Enrich During Off-Peak Hours: For large collections, run enrichment overnight or during times when you don't need immediate access.
  • Incremental Enrichment: Use --modified-after to enrich only recently changed files rather than re-enriching everything.
  • Be Mindful of API Quotas: Cloud providers have API rate limits. FileFortress respects these, but enriching millions of files may take time.
  • Already Enriched Files Are Skipped: FileFortress automatically skips files that already have metadata, so running enrich multiple times is safe.
  • Save Common Filters as Queries: Create reusable queries with --save-query, then pass them to enrich with --query-name.

Provider-Specific Metadata

Different cloud storage providers offer different types of metadata. FileFortress normalizes this data where possible but also preserves provider-specific attributes.

Google Drive

  • Owner information
  • Sharing permissions
  • File hashes (MD5)
  • EXIF data for photos
  • Video metadata
  • Document properties

OneDrive

  • Owner details
  • Sharing information
  • File hashes (SHA1, QuickXor)
  • Photo metadata
  • Location information
  • Image dimensions

Amazon S3

  • ETag (an MD5 digest for single-part uploads; not a plain MD5 for multipart uploads)
  • Storage class
  • Object metadata tags
  • Custom user metadata
  • Encryption details
  • Versioning info

Backblaze B2

  • Content SHA1
  • File info metadata
  • Custom headers
  • Upload timestamps
  • File actions history

Understanding Enrichment Progress

When you run remotes enrich, FileFortress provides detailed progress information:

Progress Display

  • Total Files Detected: Shows how many files match your filters and will be processed
  • Successfully Enriched: Files that received new metadata from the cloud provider
  • Skipped Files: Files already enriched or with no additional metadata available

It's normal to see files skipped during enrichment. This typically means they were already enriched in a previous run or the provider doesn't have additional metadata for those particular files.

Troubleshooting

No additional metadata found for my photos

Not all cloud providers expose EXIF data through their APIs. Google Drive and OneDrive typically extract and report EXIF information for photos, while object stores such as S3 and Backblaze B2 return only the metadata that was attached to the object at upload time. If the photos themselves were uploaded with EXIF data stripped, enrichment won't find any.

Solution: Ensure photos were uploaded with their original EXIF data intact. Check if your upload tool preserves metadata.

Enrichment is very slow

Enrichment requires individual API calls for each file, which can be slow for large collections. Cloud providers also impose rate limits that FileFortress respects.

Solution: Use filters to enrich only necessary files. Consider running enrichment in the background or during off-peak hours.
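
A rough estimate helps decide whether a run fits in an evening or needs a tighter filter. The sketch below assumes one API request per file and a steady request rate; both the rate and the estimate_enrich_minutes helper are hypothetical, and real provider quotas vary.

```shell
# Estimate enrichment time in whole minutes, rounded up.
# $1 = number of files to enrich
# $2 = sustained API requests per second (an assumed figure, not a provider guarantee)
estimate_enrich_minutes() {
  files="$1"
  per_second="$2"
  per_minute=$((per_second * 60))
  # Integer ceiling division: add (divisor - 1) before dividing.
  echo $(( (files + per_minute - 1) / per_minute ))
}

estimate_enrich_minutes 100000 10   # prints 167 (about 2.8 hours)
```

Halving the file count with a filter such as --media-type Image halves the estimate, which is usually the cheapest speed-up available.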

All files are being skipped

If all files show as "skipped (already enriched)", this means they've been enriched in a previous run.

Solution: This is expected behavior. FileFortress remembers enriched files to avoid redundant API calls. Only new or modified files need re-enrichment.

Can't find files by metadata even after enrichment

Verify that the metadata exists and you're using the correct field names in your search.

Solution: Use filefortress ls --detailed to inspect a specific file and see what metadata was actually enriched.

Related Resources

  • Metadata Filtering: Complete guide to filtering files by metadata
  • Search Syntax: Master search operators and patterns
  • Remotes Command: Complete remotes command reference