Understanding File Enrichment in FileFortress
Unlock advanced search with comprehensive metadata
Overview
FileFortress provides two distinct operations for gathering file information from cloud storage: scanning and enriching. Scanning builds a quick inventory of your files, while enriching fetches the detailed metadata that powers advanced search. Knowing when to use each one saves time and API quota.
Scan vs Enrich: What's the Difference?
Scanning (Basic Metadata)
The remotes scan command performs a quick listing of files from your cloud storage provider.
What You Get
Characteristics
filefortress remotes scan "My Drive"
Enriching (Comprehensive Metadata)
The remotes enrich command fetches detailed metadata from cloud providers.
What You Get (in addition to scan data)
Characteristics
filefortress remotes enrich
Why Enrich Your Files?
Enrichment is essential for leveraging FileFortress's advanced search capabilities. Without enrichment, you're limited to searching by basic file attributes like name, size, and date.
What Enrichment Enables
1. Advanced Photo Searches
Find photos by camera model, lens, aperture, ISO, location, and more:
# Find all photos taken with a Canon camera at high ISO
filefortress search --meta "exif.cameraMake=Canon" --meta "exif.iso>3200"
# Find landscape-oriented photos (wider than tall) modified after a given date
filefortress search --meta "image.width>image.height" --modified-after 2024-01-01
2. Duplicate Detection
Identify duplicate files across different remotes using file hashes:
# Search for files with a specific hash
filefortress search --meta "hash.sha256=abc123..."
# Find potential duplicates by filtering enriched files
filefortress find duplicates --use-hash
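To illustrate the idea behind hash-based duplicate detection, here is a minimal sketch that groups local files by SHA-256 hash. It does not use FileFortress at all: the function name find_local_duplicates is hypothetical, and the sketch assumes GNU coreutils (sha256sum and uniq with the -w option) are available.

```sh
# List files under a directory that share the same SHA-256 hash.
# Assumes GNU coreutils; groups of duplicates are separated by a blank line.
find_local_duplicates() {
  find "$1" -type f -exec sha256sum {} + |
    sort |
    uniq -w 64 --all-repeated=separate   # compare only the 64-character hash
}
```

Calling find_local_duplicates ~/Pictures prints each group of identical files together. Files with identical content produce identical hashes regardless of their names or locations, which is why hash metadata makes cross-remote duplicate detection possible.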
3. Image Dimension Filtering
Find images by resolution, aspect ratio, or size:
# Find images that are at least 4K (3840x2160)
filefortress search --meta "image.width>=3840" --meta "image.height>=2160"
# Find portrait-oriented photos
filefortress search --meta "image.height>image.width" --media-type Image
4. Ownership and Sharing Queries
Filter by file ownership and sharing status:
# Find files shared with specific email
filefortress search --meta "[email protected]"
# Find files you don't own
filefortress search --meta "[email protected]"
How to Enrich Files
Basic Enrichment
Enrich all files across all remotes:
filefortress remotes enrich
Targeted Enrichment
For large collections, you may want to enrich only specific files to save time and API quota. FileFortress provides extensive filtering options:
| Filter Option | Description | Example |
|---|---|---|
| --remote | Enrich specific remote | --remote "My Google Drive" |
| --remote-type | Filter by provider type | --remote-type GoogleDrive |
| --extension | Filter by file extension | --extension .jpg --extension .png |
| --media-type | Filter by media category | --media-type Image |
| --size-min, --size-max | Filter by file size range | --size-min 10MB --size-max 100MB |
| --modified-after | Only recent files | --modified-after 2024-01-01 |
| --min-depth, --max-depth | Filter by folder depth | --min-depth 2 --max-depth 5 |
| --exclude | Exclude patterns | --exclude "*/temp/*" |
| --meta | Filter by metadata | --meta "exif.cameraMake=Canon" |
| --query-name | Use saved query | --query-name "my-photos" |
Practical Examples
Example 1: Enrich All Photos
Enrich all image files to enable EXIF-based searches:
filefortress remotes enrich --media-type Image
This fetches EXIF data, dimensions, and color space for all photos.
Example 2: Enrich Recent Large Files
Enrich recently modified large files to get hash information for duplicate detection:
filefortress remotes enrich --size-min 50MB --modified-after 2024-01-01
Useful for identifying duplicate large files like videos or archives.
Example 3: Enrich Specific File Types
Enrich only JPEG and RAW photo files:
filefortress remotes enrich --extension .jpg --extension .jpeg --extension .cr2 --extension .nef
Perfect for photographers who want to search their RAW and JPEG collections by camera settings.
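When the list of extensions grows, repeating --extension by hand gets tedious. The sketch below builds the repeated flags from a list of arguments. The function name enrich_extensions is hypothetical; the only FileFortress features it relies on are the remotes enrich command and the repeatable --extension filter shown above.

```sh
# Enrich files matching any of the given extensions.
# Usage: enrich_extensions .jpg .jpeg .cr2 .nef
enrich_extensions() {
  if [ "$#" -eq 0 ]; then
    echo "usage: enrich_extensions EXT [EXT...]" >&2
    return 1
  fi
  count=$#
  # Append "--extension EXT" for each original argument.
  # The loop list is expanded once, so appending does not affect iteration.
  for ext in "$@"; do
    set -- "$@" --extension "$ext"
  done
  shift "$count"   # drop the original bare extensions, keep the flag pairs
  filefortress remotes enrich "$@"
}
```

Refusing to run with no arguments is a deliberate choice here: an empty filter list would otherwise enrich every file on every remote.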
Example 4: Enrich Using a Saved Query
First, save a query for your workflow:
filefortress search "vacation photos" --extension .jpg --size-min 1MB --save-query "vacation-photos"
Then enrich files matching that query:
filefortress remotes enrich --query-name "vacation-photos"
Queries allow you to define complex filters once and reuse them.
Example 5: Enrich Specific Remote Only
Enrich files from a single cloud provider:
filefortress remotes enrich --remote "My Google Drive"
Useful when you've just added a new remote or want to focus on one provider.
Example 6: Incremental Enrichment
Enrich files modified in the last 7 days (the date expression below is PowerShell syntax):
filefortress remotes enrich --modified-after (Get-Date).AddDays(-7).ToString("yyyy-MM-dd")
Great for regular enrichment runs to keep metadata up-to-date.
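The date expression in the command above is PowerShell. On Linux or macOS, a rough equivalent could look like the sketch below. The helper names days_ago and enrich_recent are hypothetical; the sketch tries the GNU date syntax first and falls back to the BSD/macOS syntax.

```sh
# Print the date N days ago as YYYY-MM-DD.
# Tries GNU date first, then the BSD/macOS form.
days_ago() {
  date -d "$1 days ago" +%Y-%m-%d 2>/dev/null || date -v "-${1}d" +%Y-%m-%d
}

# Enrich files modified in the last N days (default: 7).
enrich_recent() {
  filefortress remotes enrich --modified-after "$(days_ago "${1:-7}")"
}
```

Running enrich_recent with no argument covers the last week; enrich_recent 30 covers the last month.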
Recommended Workflow
1. When you first add a remote, perform a scan to get a quick inventory:
filefortress remotes scan "My Google Drive"
2. Enrich only the files you need for your use case. For example, if you primarily search photos:
filefortress remotes enrich --media-type Image
3. Periodically re-scan to detect new files, then enrich recent additions:
filefortress remotes scan "My Google Drive"
filefortress remotes enrich --modified-after 2024-11-01
4. Define common filter patterns as saved queries for easier enrichment:
filefortress search --extension .jpg --size-min 1MB --save-query "photos"
filefortress remotes enrich --query-name "photos"
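The periodic re-scan and enrich step can be wrapped in a small shell function so that both commands always run together. This is a sketch under assumptions: the function name refresh_remote is hypothetical, and it assumes the --remote and --modified-after filters can be combined in a single enrich call, as other filters are in the examples above.

```sh
# Re-scan one remote, then enrich only files modified since a cutoff date.
# $1 = remote name, $2 = cutoff date (YYYY-MM-DD)
# The enrich step runs only if the scan succeeds.
refresh_remote() {
  filefortress remotes scan "$1" &&
    filefortress remotes enrich --remote "$1" --modified-after "$2"
}
```

For example, refresh_remote "My Google Drive" 2024-11-01 scans that remote and then enriches only its files modified since the start of November.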
Performance and Best Practices
Provider-Specific Metadata
Different cloud storage providers offer different types of metadata. FileFortress normalizes this data where possible but also preserves provider-specific attributes.
Google Drive
- Owner information
- Sharing permissions
- File hashes (MD5)
- EXIF data for photos
- Video metadata
- Document properties
OneDrive
- Owner details
- Sharing information
- File hashes (SHA1, QuickXor)
- Photo metadata
- Location information
- Image dimensions
Amazon S3
- ETag (MD5-based)
- Storage class
- Object metadata tags
- Custom user metadata
- Encryption details
- Versioning info
Backblaze B2
- Content SHA1
- File info metadata
- Custom headers
- Upload timestamps
- File actions history
Understanding Enrichment Progress
When you run remotes enrich, FileFortress provides detailed progress information:
Progress Display
Troubleshooting
Not all cloud providers store EXIF data. Google Drive and OneDrive typically preserve EXIF information for photos, while S3 and Backblaze B2 only store what you explicitly upload. If you uploaded photos without EXIF data, enrichment won't find any.
Solution: Ensure photos were uploaded with their original EXIF data intact. Check if your upload tool preserves metadata.
Enrichment requires individual API calls for each file, which can be slow for large collections. Cloud providers also impose rate limits that FileFortress respects.
Solution: Use filters to enrich only necessary files. Consider running enrichment in the background or during off-peak hours.
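One way to run enrichment in the background from a POSIX shell is sketched below. The function name enrich_in_background and the log file name are hypothetical; the sketch uses only standard shell job control and the remotes enrich command with filters described earlier.

```sh
# Start an enrichment run in the background and print its process ID.
# $1 = log file; remaining arguments are passed to "remotes enrich".
enrich_in_background() {
  log_file="$1"
  shift
  filefortress remotes enrich "$@" >"$log_file" 2>&1 &
  echo "$!"
}
```

For example, enrich_in_background enrich.log --media-type Image starts the run and returns immediately, and tail -f enrich.log follows its output. A job started this way may still stop when the terminal closes; tools such as nohup or a terminal multiplexer can keep it running.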
If all files show as "skipped (already enriched)", this means they've been enriched in a previous run.
Solution: This is expected behavior. FileFortress remembers enriched files to avoid redundant API calls. Only new or modified files need re-enrichment.
If a metadata search returns no results for files you have enriched, verify that the metadata exists and that you're using the correct field names in your search.
Solution: Use filefortress ls --detailed to inspect a specific file and see what metadata was actually enriched.