Technical Deep Dives

Technical articles about RawCull’s implementation, architecture, and advanced concepts.

Technical Deep Dive: Thumbnails, Memory Cache, and Evictions

Overview

RawCull processes Sony ARW (Alpha Raw) image files through two mechanisms:

  1. Thumbnail Generation: Creates optimized 2048×1372 thumbnails for the culling UI
  2. Embedded Preview Extraction: Extracts full-resolution JPEG previews from ARW metadata for detailed inspection

Both systems integrate with a hierarchical two-tier caching architecture (RAM → Disk) to minimize repeated file processing. The system has been refactored to maximise memory utilisation and minimise unnecessary evictions.


Thumbnail Specifications

Standard Dimensions

All thumbnails are created at a fixed size to ensure consistent performance and caching:

Property             | Value
──────────────────────────────────────────────────────────────
Width                | 2048 pixels
Height               | 1372 pixels
Aspect Ratio         | ~1.49:1 (rectangular)
Pixel Format         | RGBA
Cost Per Pixel       | 6 bytes (configurable 4–8)
Memory Per Thumbnail | 16.86 MB base + ~10% overhead ≈ 18.5 MB

Why 2048×1372?

Original ARW dimensions: 6000×4000 pixels (typical Sony Alpha)
                            ↓
            Downsampled by factor of ~3x
                            ↓
        2048×1372 thumbnails
                            ↓
    Perfect balance:
    - Large enough for detail recognition
    - Small enough for reasonable memory footprint
    - Maintains original aspect ratio

ARW File Format

Structure

Sony ARW files are TIFF-based containers with multiple embedded images:

ARW File (TIFF-based)
├── Index 0: Small thumbnail (≤256×256px)
├── Index 1: Preview JPEG (variable resolution)
├── Index 2: Maker Notes & EXIF Data
└── Index 3+: Raw Sensor Data

Image Discovery

The extraction system uses CGImageSource to enumerate all images:

let imageCount = CGImageSourceGetCount(imageSource)

for index in 0 ..< imageCount {
    let properties = CGImageSourceCopyPropertiesAtIndex(imageSource, index, nil)
    let width = getWidth(from: properties)
    let isJPEG = detectJPEGFormat(properties)
}

JPEG Detection

Identifies JPEG payloads using two markers:

  1. JFIF Dictionary: Presence of kCGImagePropertyJFIFDictionary
  2. TIFF Compression Tag: Compression value of 6 (TIFF 6.0 JPEG)
let hasJFIF = (properties[kCGImagePropertyJFIFDictionary] as? [CFString: Any]) != nil
let tiffDict = properties[kCGImagePropertyTIFFDictionary] as? [CFString: Any]
let compression = tiffDict?[kCGImagePropertyTIFFCompression] as? Int
let isJPEG = hasJFIF || (compression == 6)

Dimension Extraction

Retrieves image dimensions from multiple sources in priority order:

1. Root Properties: kCGImagePropertyPixelWidth
2. EXIF Dictionary: kCGImagePropertyExifPixelXDimension
3. TIFF Dictionary: kCGImagePropertyTIFFImageWidth
4. Fallback: Return nil if none available
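The priority order above amounts to a chain of fallbacks. The sketch below shows that chain in plain Swift so it runs without ImageIO: `[String: Any]` dictionaries stand in for the `CFDictionary` returned by `CGImageSourceCopyPropertiesAtIndex`, and the string keys are illustrative stand-ins for the ImageIO property constants (assumptions, not the real implementation):

```swift
/// Stand-in for an ImageIO property dictionary (hypothetical string keys).
typealias Properties = [String: Any]

/// Width lookup in the documented priority order: root, then EXIF, then TIFF.
func imageWidth(from properties: Properties) -> Int? {
    // 1. Root properties
    if let width = properties["PixelWidth"] as? Int { return width }
    // 2. EXIF dictionary
    if let exif = properties["{Exif}"] as? Properties,
       let width = exif["PixelXDimension"] as? Int { return width }
    // 3. TIFF dictionary
    if let tiff = properties["{TIFF}"] as? Properties,
       let width = tiff["ImageWidth"] as? Int { return width }
    // 4. Fallback: nothing available
    return nil
}
```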

Thumbnail Creation Pipeline

Source File Processing

When a user opens a RawCull project with ARW files:

ARW File (10-30 MB)
    ↓
[RAW Decoder]
    - Load raw sensor data
    - Apply Bayer demosaicing
    - Color correction
    ↓
Full Resolution Image (RGB, 3 bytes/pixel)
    ↓
[Resize Engine]
    - Maintain aspect ratio
    - Bilinear or Lanczos filtering
    ↓
2048 × 1372 RGBA Thumbnail
    - 16.86 MB uncompressed
    - 6 bytes/pixel (configured cost per pixel; plain 8-bit RGBA is 4)

Extraction Process

private nonisolated func extractSonyThumbnail(
    from url: URL,
    maxDimension: CGFloat,  // 2048 for standard size
    qualityCost: Int = 6     // Configurable 4-8 bytes/pixel
) async throws -> CGImage

Phase 1: Image Source Creation

let options = [kCGImageSourceShouldCache: false] as CFDictionary
guard let source = CGImageSourceCreateWithURL(url as CFURL, options) else {
    throw ThumbnailError.invalidSource
}
  • Opens ARW file via ImageIO
  • kCGImageSourceShouldCache: false prevents intermediate caching

Phase 2: Thumbnail Generation

let thumbOptions: [CFString: Any] = [
    kCGImageSourceCreateThumbnailFromImageAlways: true,
    kCGImageSourceCreateThumbnailWithTransform: true,
    kCGImageSourceThumbnailMaxPixelSize: maxDimension,
    kCGImageSourceShouldCacheImmediately: false
]

guard var image = CGImageSourceCreateThumbnailAtIndex(
    source, 0, thumbOptions as CFDictionary
) else {
    throw ThumbnailError.generationFailed
}
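Option                                       | Value | Purpose
─────────────────────────────────────────────────────────────────────────────────────
kCGImageSourceCreateThumbnailFromImageAlways | true  | Always create from the full image, even if an embedded thumbnail exists
kCGImageSourceCreateThumbnailWithTransform   | true  | Apply EXIF orientation
kCGImageSourceThumbnailMaxPixelSize          | 2048  | Caps the longest edge at 2048 px (2048×1372 here)
kCGImageSourceShouldCacheImmediately         | false | RawCull manages caching itself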

Phase 3: Quality Enhancement (Optional)

If costPerPixel ≠ 6, the image is re-rendered with appropriate interpolation:

let qualityMapping: [Int: CGInterpolationQuality] = [
    4: .low,
    5: .low,
    6: .medium,   // Default, balanced
    7: .high,
    8: .high
]
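The mapping only matters when the cost differs from the default. A minimal sketch of that decision, assuming costs outside 4–8 are clamped into range (the text does not say); `InterpolationQuality` is a local stand-in for CoreGraphics' `CGInterpolationQuality` so the example runs without Apple frameworks:

```swift
/// Local stand-in for CoreGraphics' CGInterpolationQuality.
enum InterpolationQuality { case low, medium, high }

/// Interpolation for the optional re-render, or nil when the default cost (6)
/// means ImageIO's output is used as-is.
func reRenderQuality(forCost cost: Int) -> InterpolationQuality? {
    let clamped = min(max(cost, 4), 8)   // assumption: out-of-range costs are clamped
    let mapping: [Int: InterpolationQuality] = [4: .low, 5: .low, 6: .medium, 7: .high, 8: .high]
    guard clamped != 6 else { return nil }
    return mapping[clamped]
}
```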

Phase 4: Return Thread-Safe Image

return image  // CGImage is Sendable, safe for actor boundary

CGImage is returned (not NSImage) because it is Sendable and can cross actor boundaries safely.

Phase 5: Storage (in Actor Context)

let nsImage = NSImage(cgImage: image, size: NSSize(...))
storeInMemoryCache(nsImage, for: url)  // RAM cache immediately

Task.detached(priority: .background) { [image] in
    await self.diskCache.save(image, for: url)  // background disk write
}
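Phase 5 stores the image in RAM before the disk write is even scheduled, so a second request is served from memory immediately. A minimal sketch of that ordering, assuming `Data` in place of `CGImage`/`NSImage` and a hypothetical `FakeDiskCache` in place of the real disk cache; `store` returns the background task only so the example can wait on it:

```swift
import Foundation

/// Hypothetical disk cache that only records what it was asked to save.
actor FakeDiskCache {
    private(set) var saved: [URL] = []
    func save(_ data: Data, for url: URL) { saved.append(url) }
}

/// RAM first, disk in the background, mirroring Phase 5.
actor ThumbnailStore {
    private var memory: [URL: Data] = [:]
    let diskCache = FakeDiskCache()

    /// Returns the background save task only so callers (and tests) can await it.
    func store(_ data: Data, for url: URL) -> Task<Void, Never> {
        memory[url] = data                // RAM cache immediately
        let disk = diskCache
        return Task.detached(priority: .background) {
            await disk.save(data, for: url)
        }
    }

    func cached(_ url: URL) -> Data? { memory[url] }
}
```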

System Architecture: Two-Tier Cache

Cache Tiers

┌─────────────────────────────────────────────┐
│          Thumbnail Requested                │
└────────────────┬────────────────────────────┘
                 │
                 ▼
        ┌────────────────────┐
        │  Memory Cache?     │
        │  (NSCache)         │
        └────────┬───────────┘
                 │
       ┌─────────┴──────────┐
       │ HIT (70.2%)        │ MISS (29.8%)
       ▼                    ▼
    Return from       Disk Cache?
    Memory            (FileSystem)
                           │
                    ┌──────┴──────┐
                    │ HIT          │ MISS
                    │ (29.8%)      │
                    ▼              ▼
                 Read from     Decompress
                 Disk, Add     Original ARW,
                 to Memory     Create Thumbnail

    Performance: ~instant    ~50-200ms     ~100-500ms
                 (in-memory)  (disk I/O)    (CPU-bound)
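The lookup order in the diagram, including promotion of disk hits into RAM, can be sketched with plain dictionaries standing in for NSCache and the on-disk JPEG cache. This is an illustrative assumption, not RawCull's code; in the app the disk write happens asynchronously:

```swift
/// Minimal two-tier lookup (RAM → disk → generate) with promotion on a disk hit.
final class TwoTierCache {
    enum Source: Equatable { case memory, disk, generated }

    private var memory: [String: String] = [:]   // stands in for NSCache
    private var disk: [String: String] = [:]     // stands in for the JPEG disk cache
    private let generate: (String) -> String     // stands in for ARW decode + resize

    init(generate: @escaping (String) -> String) { self.generate = generate }

    func thumbnail(for key: String) -> (value: String, source: Source) {
        if let hit = memory[key] { return (hit, .memory) }
        if let hit = disk[key] {
            memory[key] = hit                     // promote so the next access is a RAM hit
            return (hit, .disk)
        }
        let made = generate(key)
        memory[key] = made                        // RAM immediately
        disk[key] = made                          // the real app saves this asynchronously
        return (made, .generated)
    }

    func clearMemory() { memory.removeAll() }    // e.g. simulates an app restart
}
```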

Tier 1: RAM Cache (NSCache)

Managed by SharedMemoryCache actor with dynamic configuration:

let memoryCache = NSCache<NSURL, DiscardableThumbnail>()
memoryCache.totalCostLimit = dynamicLimit  // Based on system RAM
memoryCache.countLimit = 10_000             // High; memory is limiting factor

Characteristics:

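  • Automatic Eviction: Thumbnails are removed when the cost limit is exceeded; Apple does not document NSCache's exact eviction order, so do not rely on strict least-recently-used ordering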
  • Protocol: Implements NSDiscardableContent for OS-level memory reclamation
  • Thread-Safe: Built-in synchronization by NSCache
  • Cost-Aware: Respects pixel memory, not item count
  • Hit Rate: 70.2% (observed in typical workflows)

Tier 2: Disk Cache

// Location: ~/.RawCull/thumbcache/[projectID]/
// Format: JPEG compressed at 0.7 quality
// Size: 3-5 MB per thumbnail (82-91% compression)

Characteristics:

  • Hit Rate: 29.8% (complements memory cache)
  • Latency: 50-200 ms (disk I/O + decompression)
  • Persistence: Survives app restart
  • Automatic Promotion: Disk hits loaded to memory for next access

Disk cache representation formats:

Format | Size   | Advantages
─────────────────────────────────────────────────────────
PNG    | 3-5 MB | Lossless, fast decode
HEIF   | 2-4 MB | Better compression, hardware acceleration
JPEG   | 1-2 MB | Fastest, good for fast browsing

Storage location: ~/.RawCull/thumbcache/[projectID]/


Memory Cache Policy

Cost is calculated per cached image as:

$$\text{Cost} = (\text{width in pixels}) \times (\text{height in pixels}) \times \text{bytes per pixel} \times 1.1$$

Where:

  • Pixel dimensions: Actual pixel size from image.representations or logical image size fallback
  • Bytes per pixel: Default is 4 (RGBA: Red, Green, Blue, Alpha), but configured to 6 in this case
  • 1.1 multiplier: 10% overhead buffer for NSImage wrapper and caching metadata

With 2048×1372 thumbnail size and 6 bytes/pixel:

$$\text{Cost per image} = 2048 \times 1372 \times 6 \text{ bytes/pixel} \times 1.1$$

$$= 2{,}809{,}856 \times 6 \times 1.1 \approx 18{,}545{,}050 \text{ bytes} \approx 18.5 \text{ MB}$$

Count Limit Calculation

The count limit is now a fixed cap of 10,000, but the number of thumbnails actually cached is governed by the maximum memory allocated to the cache, here 10,000 MB (10 GB):

$$\text{Effective capacity} = \frac{\text{Total RAM Cache}}{\text{Cost per image}}$$

$$= \frac{10000 \text{ MB}}{18.55 \text{ MB}} \approx 539 \text{ images}$$
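The cost and capacity arithmetic can be checked with a few lines of integer math. A sketch, assuming decimal units (1 MB = 10^6 bytes) and the 1.1 overhead factor from the formula; the function names are illustrative, not RawCull's API:

```swift
/// Estimated cost in bytes: width × height × bytesPerPixel × 1.1.
/// Integer math (× 11 / 10) avoids floating-point rounding.
func thumbnailCost(width: Int, height: Int, bytesPerPixel: Int) -> Int {
    (width * height * bytesPerPixel * 11) / 10
}

/// Number of thumbnails that fit in a cost budget (both in bytes).
func estimatedCapacity(totalCostLimit: Int, costPerImage: Int) -> Int {
    totalCostLimit / costPerImage
}

let tenGB = 10_000_000_000
for bytesPerPixel in [4, 6, 8] {
    let cost = thumbnailCost(width: 2048, height: 1372, bytesPerPixel: bytesPerPixel)
    let fit = estimatedCapacity(totalCostLimit: tenGB, costPerImage: cost)
    print("\(bytesPerPixel) B/px: \(cost) bytes each, ~\(fit) in 10 GB")
}
// 4 B/px: 12363366 bytes each, ~808 in 10 GB
// 6 B/px: 18545049 bytes each, ~539 in 10 GB
// 8 B/px: 24726732 bytes each, ~404 in 10 GB
```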

Allocation Strategy

Available System Memory Detection

let physicalMemory = ProcessInfo.processInfo.physicalMemory
let memoryThresholdPercent: UInt64 = 80  // 80% of physical RAM (UInt64 to match physicalMemory)
let maxCacheSize = (physicalMemory * memoryThresholdPercent) / 100

// Example Results:
// 8 GB Mac:  6.4 GB available for cache
// 16 GB Mac: 12.8 GB available for cache
// 32 GB Mac: 25.6 GB available for cache
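The 80% ceiling and the user's allocation combine into one effective limit. A sketch, assuming the user setting is capped at the ceiling, as the cache initialization steps later describe; the function names are illustrative:

```swift
import Foundation

/// Ceiling as a percentage of physical RAM (80% by default), in bytes.
func memoryCeiling(physicalMemory: UInt64, thresholdPercent: UInt64 = 80) -> UInt64 {
    (physicalMemory * thresholdPercent) / 100
}

/// Assumption: the user's allocation is honoured only up to the ceiling.
func effectiveCacheLimit(userSetting: UInt64, physicalMemory: UInt64) -> UInt64 {
    min(userSetting, memoryCeiling(physicalMemory: physicalMemory))
}

let limitHere = effectiveCacheLimit(
    userSetting: 10_000_000_000,
    physicalMemory: ProcessInfo.processInfo.physicalMemory
)
print("Effective cache limit on this machine: \(limitHere) bytes")
```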

User Configuration

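Setting          | Default        | Range          | Impact
───────────────────────────────────────────────────────────────────────────
Allocated Memory | Auto (80% RAM) | 500 MB - 25 GB | Controls total cache capacity
Cost Per Pixel   | 6 bytes        | 4-8 bytes      | Quality/Memory tradeoff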

Capacity Planning

Allocated Memory: 10,000 MB
Per-Thumbnail Cost: ~18.55 MB

Maximum Thumbnails = 10,000 MB ÷ 18.55 MB per thumbnail
                   = ~539 thumbnails

In Practice (observed): 571 thumbnails
Likely reasons: the per-image cost is computed from each NSImage's actual
                representations, which can come out below the estimate, and
                Apple documents NSCache's totalCostLimit as not a strict limit

System    | 80% Threshold | User Setting | Thumbnails | Typical Workload
──────────────────────────────────────────────────────────────────────
8 GB Mac  | 6.4 GB        | 5 GB         | ~269       | Light editing
16 GB Mac | 12.8 GB       | 10 GB        | ~539       | Production
32 GB Mac | 25.6 GB       | 16 GB        | ~862       | Professional

Cost Calculation

// For 2048×1372 thumbnail at 6 bytes/pixel:
Cost = 2048 × 1372 × 6 = 16,859,136 bytes
With 10% overhead:  ~18,545,050 bytes ≈ 18.5 MB per thumbnail

// Cost impacts (10 GB allocation, overhead included):
// 4 bytes: Lower quality, more capacity (~808 thumbnails in 10 GB)
// 6 bytes: Balanced quality/capacity (~539 thumbnails)
// 8 bytes: Maximum quality, less capacity (~404 thumbnails)

Eviction Policy

NSCache LRU (Least Recently Used) Strategy

Cache Full → New Item Added
    ↓
[Eviction Engine]
    - Identify least recently used items
    - Remove oldest accessed thumbnails first
    - Continue until space available
    ↓
New Item Inserted
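Apple does not document NSCache's internal eviction order, so the following is only an illustration of the loop above: a small cost-bounded LRU cache (the hypothetical `CostLRUCache`, not NSCache's or RawCull's code). The access-order array keeps the sketch short at the price of O(n) updates:

```swift
/// Illustrative cost-bounded LRU cache (not NSCache's implementation).
final class CostLRUCache<Key: Hashable, Value> {
    private var storage: [Key: (value: Value, cost: Int)] = [:]
    private var order: [Key] = []        // least recently used first
    private(set) var totalCost = 0
    let totalCostLimit: Int

    init(totalCostLimit: Int) { self.totalCostLimit = totalCostLimit }

    func value(forKey key: Key) -> Value? {
        guard let entry = storage[key] else { return nil }
        touch(key)                        // a hit makes the key most recently used
        return entry.value
    }

    func setValue(_ value: Value, forKey key: Key, cost: Int) {
        if let old = storage[key] { totalCost -= old.cost }
        storage[key] = (value: value, cost: cost)
        totalCost += cost
        touch(key)
        // Evict least recently used entries until the total fits again;
        // the entry just inserted is never evicted by its own insertion.
        while totalCost > totalCostLimit, let oldest = order.first, oldest != key {
            order.removeFirst()
            totalCost -= storage.removeValue(forKey: oldest)?.cost ?? 0
        }
    }

    private func touch(_ key: Key) {
        order.removeAll { $0 == key }     // O(n); fine for a sketch
        order.append(key)
    }
}
```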

Memory Pressure Monitoring

Background Monitoring Loop (every 100ms):

if usage > 95% of allocation:
    → Aggressive eviction (trim 20%)
    → Log warning

if usage > 80% of allocation:
    → Normal eviction (trim 10% on next cache miss)

if usage < 50% of allocation:
    → No eviction
    → Cache can grow freely
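The threshold rules map naturally onto a pure decision function that a polling loop calls every 100 ms. A sketch with hypothetical names; the description above does not say what happens between 50% and 80%, so this version assumes no eviction in that band:

```swift
/// What the monitoring loop does at a given usage level.
enum EvictionAction: Equatable {
    case aggressive(trimFraction: Double)  // > 95%: trim 20% and log a warning
    case normal(trimFraction: Double)      // > 80%: trim 10% on the next cache miss
    case noEviction                        // otherwise: the cache can grow freely
}

func evictionAction(usedBytes: Int, allocatedBytes: Int) -> EvictionAction {
    let usage = Double(usedBytes) / Double(allocatedBytes)
    if usage > 0.95 { return .aggressive(trimFraction: 0.20) }
    if usage > 0.80 { return .normal(trimFraction: 0.10) }
    // Assumption: the unspecified 50-80% band is treated like "< 50%"
    return .noEviction
}

/// Hypothetical polling loop: re-evaluates usage every 100 ms until cancelled.
func monitorMemory(
    usage: () async -> (used: Int, allocated: Int),
    apply: (EvictionAction) async -> Void
) async {
    while !Task.isCancelled {
        let current = await usage()
        await apply(evictionAction(usedBytes: current.used, allocatedBytes: current.allocated))
        try? await Task.sleep(nanoseconds: 100_000_000)  // 100 ms
    }
}
```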

Thresholds by System Configuration

Low Memory Mac (< 8 GB):
    Memory Threshold = 60%
    Default Cache = 3 GB
    Typical Items = ~161 thumbnails

Standard Mac (8-16 GB):
    Memory Threshold = 80%
    Default Cache = 6-10 GB
    Typical Items = ~323-539 thumbnails

High-End Mac (> 16 GB):
    Memory Threshold = 80%
    Default Cache = 12-25 GB
    Typical Items = ~647-1348 thumbnails

Eviction Analysis (Post-Refactor)

Test Parameters

Parameter               | Value
────────────────────────────────────────────────────────────────
Total ARW Files         | 618
Cost Per Pixel          | 6 bytes
Thumbnail Size (Actual) | 2048 × 1372 pixels (rectangular)
Allocated Memory        | 10,000 MB (10 GB)
Cache countLimit        | 10,000 items (memory is the real constraint)

Key insight: The original analysis sized each thumbnail as a 2048×2048 square (2048² × 6 bytes = 24 MiB before overhead), but actual thumbnails are rectangular, giving ~18.5 MB per thumbnail with the 10% overhead included.

Phase 1: Initial Thumbnail Scan

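Metric              | Before      | After     | Improvement
─────────────────────────────────────────────────────────────
Total Files Scanned | 618         | 618       |
Images Evicted      | 237 (38.3%) | 47 (7.6%) | ~80% fewer
Images Retained     | 381         | 571       | 50% more cached
Memory Utilization  | 2.4%        | 100%      | Perfect fit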
Actual cached thumbnails = 571 images at ~18.5 MB each
                         = ~10,589 MB
                         ≈ 10.6 GB utilization

The cache fills the 10 GB allocation, slightly overshooting it, which NSCache's soft cost limit permits.

Phase 2: Interactive Browse (Sequential Access)

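Metric            | Before                   | After                   | Improvement
──────────────────────────────────────────────────────────────────────────────────
Memory Cache Hits | 23.5%                    | 70.2%                   | 3x better
Disk Cache Hits   | 76.5%                    | 29.8%                   | Shifted to memory
Evictions         | 709 (115% of collection) | 231 (37% of collection) | 67% fewer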

Why 231 evictions in Phase 2?

  • Capacity: 571 thumbnails
  • Browse order: 618 images
  • Items browsed beyond capacity: 47
  • LRU churn from sequential access: ~184 additional
  • Total: 47 + 184 = 231 evictions

Refactored Implementation

Changes Made

  1. Removed dimension guessing — Cache relies on NSCache’s actual cost calculations
  2. Set countLimit to 10,000 — High enough that memory is the only real constraint
  3. Memory threshold increased to 80% — Allows 10 GB allocations on 16 GB+ systems
  4. Diagnostic logging — Logs actual cache configuration at startup

Code Changes

File: SharedMemoryCache.swift

// BEFORE: Used estimated 2048×2048, calculated countLimit dynamically
let estimatedCostPerImage = (thumbnailSize * thumbnailSize * costPerPixel * 11) / 10
let countLimit = totalCostLimit / estimatedCostPerImage

// AFTER: Let NSCache calculate actual costs, countLimit is high
let countLimit = 10000  // Very high, memory (totalCostLimit) is real constraint
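To see why the old estimate mattered, the two cost figures can be compared directly. A sketch assuming a 10 GB `totalCostLimit` (the figure from the test parameters) and the same integer overhead formula as the BEFORE snippet:

```swift
let totalCostLimit = 10_000_000_000   // assumption: 10 GB allocation
let costPerPixel = 6

// BEFORE: cost estimated from a square 2048×2048 thumbnail
let thumbnailSize = 2048
let estimatedCostPerImage = (thumbnailSize * thumbnailSize * costPerPixel * 11) / 10
let oldCountLimit = totalCostLimit / estimatedCostPerImage

// Actual rectangular 2048×1372 thumbnails cost less, so more fit in the same budget
let actualCostPerImage = (2048 * 1372 * costPerPixel * 11) / 10
let itemsThatFit = totalCostLimit / actualCostPerImage

print(oldCountLimit, itemsThatFit)  // 361 539
```

With the square estimate, `countLimit` would have stopped the cache at 361 items even though roughly 539 real thumbnails fit within the cost budget.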

File: SettingsViewModel.swift

// BEFORE: Memory threshold was 50%
let memoryThresholdPercent = 50  // Restricted 10GB on 16GB Mac

// AFTER: Memory threshold is 80%
let memoryThresholdPercent = 80  // Allows 10GB on 16GB Mac

Embedded Preview Extraction

For detailed inspection, RawCull can extract full-resolution JPEG previews directly from ARW metadata, providing superior quality compared to generated thumbnails.

Selection Strategy

The system selects the widest JPEG from all images embedded in the ARW:

var targetIndex: Int?
var targetWidth = 0

for index in 0 ..< imageCount {
    guard let properties = CGImageSourceCopyPropertiesAtIndex(imageSource, index, nil)
            as? [CFString: Any] else { continue }
    // Keep the widest JPEG found so far
    if let width = getWidth(from: properties), isJPEG(properties), width > targetWidth {
        targetIndex = index
        targetWidth = width
    }
}

Sony typically stores higher-quality previews at later indices, so the widest JPEG maximises quality.

Thumbnail vs. Full Preview

Aspect          | Thumbnail                                         | Full Preview
──────────────────────────────────────────────────────────────────────────────────────────────────
Source          | Generic ImageIO (may use embedded or generate)    | ARW embedded JPEG specifically
Quality Control | Parameter-driven (cost per pixel)                 | Full resolution preservation
Downsampling    | Automatic via kCGImageSourceThumbnailMaxPixelSize | Conditional, only if needed
Use Case        | Culling grid, rapid browsing                      | Detailed inspection, full-screen
Performance     | Fast (200-500 ms)                                 | Medium (500 ms–2 s with decode)

Downsampling Decision

let maxPreviewSize: CGFloat = fullSize ? 8640 : 4320

if CGFloat(embeddedJPEGWidth) > maxPreviewSize {
    // Downsample to reasonable size
} else {
    // Use original size (never upscale)
}
  • If embedded JPEG is larger than target: downsample to preserve memory
  • If embedded JPEG is smaller: preserve original (never upscale)
  • fullSize=true: 8640px threshold (professional workflows)
  • fullSize=false: 4320px threshold (balanced quality/performance)
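The threshold decision and the never-upscale rule combine into one size calculation. A sketch, assuming the longest edge (rather than only the width) is compared against the threshold, as `resizeImage` does below; the function name is illustrative:

```swift
/// Target pixel size for an embedded preview: downsample when the longest edge
/// exceeds the threshold (8640 px for fullSize, 4320 px otherwise); never upscale.
func previewTargetSize(width: Int, height: Int, fullSize: Bool) -> (width: Int, height: Int) {
    let maxPreviewSize = fullSize ? 8640.0 : 4320.0
    let longestEdge = Double(max(width, height))
    let scale = min(1.0, maxPreviewSize / longestEdge)   // ≤ 1, so never enlarges
    return (Int((Double(width) * scale).rounded()),
            Int((Double(height) * scale).rounded()))
}
```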

Resizing Implementation

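private func resizeImage(_ image: CGImage, maxPixelSize: CGFloat) -> CGImage? {
    let scale = min(maxPixelSize / CGFloat(image.width), maxPixelSize / CGFloat(image.height))
    guard scale < 1.0 else { return image }  // Already small enough; never upscale

    let newWidth = Int((CGFloat(image.width) * scale).rounded())
    let newHeight = Int((CGFloat(image.height) * scale).rounded())
    // 8-bit RGBA context in sRGB (assumed target format for the preview)
    guard let colorSpace = CGColorSpace(name: CGColorSpace.sRGB),
          let context = CGContext(data: nil, width: newWidth, height: newHeight,
                                  bitsPerComponent: 8, bytesPerRow: 0, space: colorSpace,
                                  bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
    else { return nil }

    // Draw into the new context with .high interpolation
    context.interpolationQuality = .high
    context.draw(image, in: CGRect(x: 0, y: 0, width: newWidth, height: newHeight))
    return context.makeImage()
}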

JPEG Export

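@concurrent
nonisolated func save(image: CGImage, originalURL: URL) async {
    // Save alongside the original ARW as .jpg at maximum quality (1.0)
    let jpgURL = originalURL.deletingPathExtension().appendingPathExtension("jpg")
    let options: [CFString: Any] = [
        kCGImageDestinationLossyCompressionQuality: 1.0
    ]
    guard let destination = CGImageDestinationCreateWithURL(
        jpgURL as CFURL, "public.jpeg" as CFString, 1, nil
    ) else { return }
    CGImageDestinationAddImage(destination, image, options as CFDictionary)
    _ = CGImageDestinationFinalize(destination)
}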

Thumbnail Generation System: Preload Workflow

Architecture Overview

The ScanAndCreateThumbnails actor manages complete thumbnail lifecycle:

File Processing Pipeline
    ↓
[RAM Cache Check (NSCache)]
    ├─ HIT (70.2%): Return from memory
    └─ MISS (29.8%): Continue
         ↓
    [Disk Cache Check]
    ├─ HIT: Load from disk, promote to RAM
    └─ MISS: Continue
         ↓
    [Extract from ARW]
    ├─ Open ARW file
    ├─ Extract or generate thumbnail
    ├─ Store in RAM Cache
    └─ Save to disk asynchronously

Performance: ~instant (~1ms) → disk (~100ms) → extraction (~200-500ms)

Preload Workflow

func preloadCatalog(at catalogURL: URL, targetSize: Int) async -> Int

Step 1: File Discovery

let urls = await DiscoverFiles().discoverFiles(at: catalogURL, recursive: false)

Step 2: Concurrent Processing with Smart Throttling

let maxConcurrent = ProcessInfo.processInfo.activeProcessorCount * 2

for (index, url) in urls.enumerated() {
    if Task.isCancelled { break }
    if index >= maxConcurrent {
        try? await group.next()  // Sliding window throttle
    }
    group.addTask {
        await self.processSingleFile(url, targetSize: targetSize, itemIndex: index)
    }
}
  • Spawns up to 2× processor count tasks
  • After reaching limit, waits for one task per new task
  • Prevents memory exhaustion on large catalogs (1000+ files)
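The sliding-window pattern can be exercised without ARW files. In this sketch `processSingleFile` is a hypothetical stand-in that sleeps for 1 ms, and an `InFlightCounter` actor records the peak number of concurrent child tasks so the throttle can be checked:

```swift
/// Hypothetical stand-in for processSingleFile: pretends to do 1 ms of work.
func processSingleFile(_ index: Int) async -> Int {
    try? await Task.sleep(nanoseconds: 1_000_000)
    return index
}

/// Records how many child tasks are running at once.
actor InFlightCounter {
    private(set) var current = 0
    private(set) var peak = 0
    func enter() { current += 1; peak = max(peak, current) }
    func leave() { current -= 1 }
}

/// Sliding-window throttle: at most `maxConcurrent` child tasks in flight.
func processAll(count: Int, maxConcurrent: Int, counter: InFlightCounter) async -> Int {
    await withTaskGroup(of: Int.self, returning: Int.self) { group in
        var completed = 0
        for index in 0 ..< count {
            if Task.isCancelled { break }
            if index >= maxConcurrent {
                // Wait for one task to finish before adding the next
                if await group.next() != nil { completed += 1 }
            }
            group.addTask {
                await counter.enter()
                let result = await processSingleFile(index)
                await counter.leave()
                return result
            }
        }
        for await _ in group { completed += 1 }  // drain the tail of the window
        return completed
    }
}
```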

Cache Lifecycle Management

Initialization

1. Detect system memory
    16 GB Mac → threshold = 12.8 GB

2. Load user settings
    Last setting: 10 GB → use 10 GB

3. Configure NSCache
    Set totalCostLimit = 10,000,000,000 bytes
    Set countLimit = 10,000 items (high, not limiting)

4. Initialize background monitoring
    Start memory pressure checks

5. Log configuration
    "Cache ready: 10GB, ~515 thumbnails"

Cache Invalidation

Trigger               | Action                  | Effect
──────────────────────────────────────────────────────────────────
Project reloaded      | Clear both caches       | Full refresh required
User settings changed | Resize memory cache     | Evictions may occur
Disk cache corrupted  | Detect, clear, recreate | Transparent to user
App backgrounded      | Compress in memory      | Slight performance loss
Low memory warning    | Aggressive eviction     | Frees 1-2 GB

Concurrency Model

Actor-Based Architecture

All extraction systems use Swift actors for thread-safe state:

actor ScanAndCreateThumbnails { }
actor ExtractSonyThumbnail { }
actor ExtractEmbeddedPreview { }
actor DiskCacheManager { }

Benefits:

  • Serial execution prevents data races
  • State mutations are automatically serialized
  • No manual locks required
  • Safe concurrent calls from multiple views

Isolated State

actor ScanAndCreateThumbnails {
    private var successCount = 0
    private var processingTimes: [TimeInterval] = []
    private var totalFilesToProcess = 0
    private var preloadTask: Task<Int, Never>?
}

Concurrent Extraction Without Isolation Violation

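ImageIO operations are nonisolated so they do not block the actor (the Heavy Synchronous Code section below explains why Task.detached still occupies a cooperative-pool thread):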

@concurrent
nonisolated func extractSonyThumbnail(from url: URL, maxDimension: CGFloat) async throws -> CGImage {
    try await Task.detached(priority: .userInitiated) {
        let source = CGImageSourceCreateWithURL(url as CFURL, options)
        // ...
    }.value
}

Cancellation Support

func cancelPreload() {
    preloadTask?.cancel()
    preloadTask = nil
}

Error Handling

Extraction Errors

enum ThumbnailError: Error {
    case invalidSource
    case generationFailed
    case decodingFailed
}

Error Recovery

Batch Processing (non-fatal — continues to next file):

do {
    let cgImage = try await ExtractSonyThumbnail().extractSonyThumbnail(from: url, ...)
    storeInMemoryCache(cgImage, for: url)
} catch {
    Logger.process.warning("Failed to extract \(url.lastPathComponent): \(error)")
}

On-Demand Requests (returns nil; UI shows placeholder):

func thumbnail(for url: URL, targetSize: Int) async -> CGImage? {
    do { return try await resolveImage(for: url, targetSize: targetSize) }
    catch { return nil }
}

Performance Characteristics

Typical Timings (Apple Silicon, 40-50 ARW files, 16 GB Mac)

Operation                       | Duration   | Notes
──────────────────────────────────────────────────────────────────────────
File discovery                  | <100 ms    | Non-recursive enumeration
Thumbnail generation (1st pass) | 5-20 s     | Full extraction
Thumbnail generation (2nd pass) | <500 ms    | All from RAM cache
Disk cache promotion            | 100-500 ms | Load + store to RAM
Embedded preview extraction     | 500 ms–2 s | JPEG decode + optional resize
Single thumbnail generation     | 200-500 ms | CPU-bound ARW decode/resize
JPEG export                     | 100-300 ms | Disk write + finalize

Memory Usage per Configuration

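Scenario      | Cache Allocation | Thumbnail Capacity | Hit Rate | Use Case
──────────────────────────────────────────────────────────────────────────────
Light editing | 5 GB             | ~269               | 60-70%   | Casual culling
Production    | 10 GB            | ~539               | 70-75%   | Typical workflow
Professional  | 16 GB            | ~862               | 75-80%   | Large batches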

Quality/Performance Tradeoff

Cost Per Pixel | Memory Per Image | 10 GB Capacity | Quality      | Speed
───────────────────────────────────────────────────────────────────────
4 bytes        | ~12.4 MB         | ~808           | Good         | Fast
6 bytes        | ~18.5 MB         | ~539           | Excellent    | Balanced
8 bytes        | ~24.7 MB         | ~404           | Outstanding  | Slower

Concurrency Impact

Processor Cores | Max Concurrent Tasks | Benefit
───────────────────────────────────────────────
4-core Mac      | 8 tasks              | 2-3x faster
8-core Mac      | 16 tasks             | 4-6x faster
10-core Mac     | 20 tasks             | 6-8x faster

Monitoring and Diagnostics

Startup Log

[Cache] Initialization Report
────────────────────────────────
System Memory: 16 GB
Memory Threshold: 80% = 12.8 GB
Allocated to Cache: 10 GB
Cost Per Pixel: 6 bytes
Expected Capacity: ~515 thumbnails
Count Limit: 10,000 items (not used as constraint)
LRU Strategy: Enabled
Disk Cache: ~/.RawCull/thumbcache/

✓ Cache initialized and ready

Runtime Statistics

[Cache] Runtime Statistics
──────────────────────────────
Current Usage: 9.87 GB (98.7% of 10 GB)
Thumbnails Cached: 508 items
Memory Hits: 156 | Disk Hits: 68 | Cache Misses: 24 | Evictions: 12
Hit Rate: Memory 70.2% | Disk 29.8%

Configuration Reference

Programmatic Configuration

// File: SharedMemoryCache.swift
let totalCostLimit = 10_000_000_000  // 10 GB in bytes
let costPerPixel = 6                  // bytes per pixel
let countLimit = 10_000               // Very high, not limiting
let memoryThresholdPercent = 80       // 80% of available RAM
let memoryCheckInterval = 0.1         // seconds
let aggressiveEvictionThreshold = 95  // percent of allocation
let normalEvictionThreshold = 80      // percent of allocation

Relevant Source Files

  • SharedMemoryCache.swift — Memory cache configuration
  • SettingsViewModel.swift — Memory threshold and user settings
  • ExtractSonyThumbnail.swift — Quality mapping and thumbnail generation
  • ExtractEmbeddedPreview.swift — Preview thresholds (4320/8640 px)
  • CacheConfig.swift — Cache limit constants

Best Practices

For Users

  1. Match allocation to workflow: 5 GB (8 GB Mac) / 10 GB (16 GB Mac) / 16+ GB (32 GB Mac)
  2. Monitor memory usage: leave 2-3 GB free for system and other apps
  3. Quality settings: 6 bytes/pixel (default); reduce to 4 for more capacity; increase to 8 for highest quality

For Developers

  1. Cache configuration: always query system memory on startup; apply thresholds dynamically
  2. Cost calculations: use realistic estimates; account for ~10% overhead
  3. Eviction handling: implement LRU consistently; monitor frequency (target < 10 evictions per 100 accesses)
  4. Performance profiling: target 70% memory hit rate; profile real-world patterns

Troubleshooting

Problem                       | Cause                        | Solutions
─────────────────────────────────────────────────────────────────────────────────────────────
High eviction rate (> 50%)    | Allocation too small         | Increase cache allocation; reduce cost per pixel; browse in smaller batches
Low memory hit rate (< 50%)   | Cache too small or thrashing | Increase allocation; profile access pattern
Disk cache missing thumbnails | Corruption or deletion       | Clear project cache in settings; check disk space and permissions
Memory usage not decreasing   | Eviction not triggering      | Verify threshold; check background monitoring; restart app

Data Flow Summary

User initiates bulk thumbnail load
    ↓
[ScanAndCreateThumbnails.preloadCatalog()]
    ├─ Discover files (non-recursive)
    ├─ For each file (concurrency controlled):
    │   ├─ Check RAM cache
    │   │   ✓ HIT (70%): Return immediately
    │   │   ✗ MISS (30%):
    │   ├─ Check disk cache
    │   │   ✓ HIT: Load and promote to RAM
    │   │   ✗ MISS:
    │   ├─ Extract thumbnail:
    │   │   ├─ Open ARW via ImageIO
    │   │   ├─ Generate 2048×1372 thumbnail
    │   │   ├─ Apply quality enhancement (optional)
    │   │   └─ Wrap in NSImage
    │   ├─ Store in RAM (immediate)
    │   └─ Schedule async disk save (background)
    └─ Return success count

On detailed inspection:
    ↓
[JPGPreviewHandler.handle(file)]
    ├─ Check if JPG exists
    │   ✓ YES: Load and display
    │   ✗ NO:
    ├─ Call ExtractEmbeddedPreview
    │   ├─ Find all images in ARW
    │   ├─ Identify widest JPEG
    │   ├─ Decide: downsample or original?
    │   ├─ Decode JPEG
    │   └─ Return CGImage
    └─ Display full preview

Apple Frameworks Used

Framework    | Key APIs                          | Purpose
───────────────────────────────────────────────────────────────────────────────────
ImageIO      | CGImageSource, CGImageDestination | Image decoding, thumbnail generation, embedded preview extraction
CoreGraphics | CGContext, CGImage                | Rendering, resizing, interpolation
AppKit       | NSImage                           | Display-ready images
Foundation   | URL, ProcessInfo, NSCache         | File operations, system memory query, in-memory cache
Concurrency  | actors, task groups, async/await  | Safe parallel processing
CryptoKit    | Insecure.MD5                      | Disk cache filename generation
OSLog        | Logger                            | Diagnostics and monitoring

Stress testing

This test demonstrates RawCull’s performance with a moderate-sized catalog. The results show that the application handles this workload efficiently with minimal resource consumption: creating 483 thumbnails (2048 px, 6 bytes per pixel) took 51 seconds.

Catalog information for 600 images
Time metrics for 600 image catalog

The memory and disk cache figures in the screenshots were captured after browsing all 483 images in the file table; all images were served from the memory cache.

Heavy Synchronous Code

A Guide to Handling Heavy Synchronous Code in Swift Concurrency

1. The Core Problem: The Swift Cooperative Thread Pool

To understand why heavy synchronous code breaks modern Swift, you have to understand the difference between older Apple code (Grand Central Dispatch / GCD) and new Swift Concurrency.

  • GCD (DispatchQueue) uses a dynamic thread pool. If a thread gets blocked doing heavy work, GCD notices and simply spawns a new thread. This prevents deadlocks but causes Thread Explosion (which drains memory and battery).
  • Swift Concurrency (async/await/Task) uses a fixed-size cooperative thread pool. It strictly limits the number of background threads to exactly the number of CPU cores your device has (e.g., 6 cores = exactly 6 threads). It will never spawn more.

Because there are so few threads, Swift relies on cooperation. When an async function hits an await, it says: “I’m pausing to wait for something. Take my thread and give it to another task!” This allows 6 threads to juggle thousands of concurrent tasks.

The “Choke” (Thread Pool Starvation)

If you run heavy synchronous code (code without await) on the Swift thread pool, it holds the thread and does not give it back until the work finishes. If you request 6 heavy image extractions at the same time, all 6 Swift threads are occupied. Every other task that needs the cooperative pool then waits until an extraction finishes: network callbacks and background tasks stall, and if a blocked thread is itself waiting on a task that cannot get a thread, the app deadlocks.


2. What exactly is “Blocking Synchronous Code”?

Synchronous code executes top-to-bottom without ever pausing (it lacks the await keyword). Blocking code is synchronous code that takes a “long time” to finish (usually >10–50 milliseconds), thereby holding a thread hostage.

The 3 Types of Blocking Code:

  1. Heavy CPU-Bound Work: Number crunching, image processing (CoreGraphics, ImageIO), video encoding, parsing massive JSON files.
  2. Synchronous I/O: Reading massive files synchronously (e.g., Data(contentsOf: URL)) or older synchronous database queries. The thread is completely frozen waiting for the hard drive.
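  3. Locks and Semaphores: DispatchSemaphore.wait() pauses a thread until some other work signals it. Apple describes semaphores and condition variables as unsafe to use with Swift Concurrency; locks such as NSLock are acceptable only around short, well-understood synchronous critical sections.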

The Checklist to Identify Blocking Code:

Ask yourself these questions about a function:

  1. Does it lack the async keyword in its signature?
  2. Does it lack internal await calls (or await Task.yield())?
  3. Does it take more than a few milliseconds to run?
  4. Is it a “Black Box” from an Apple framework (like ImageIO) or C/C++?

If the answers are yes, it is blocking synchronous code and does not belong in the Swift Concurrency thread pool.


3. The Traps: Why Task and Actor Don’t Fix It

It is tempting to fix blocking code with modern Swift features, but these common approaches are traps:

Trap 1: Using Task or Task.detached

// ❌ TRAP: Still causes Thread Pool Starvation!
func extract() async throws -> CGImage {
    return try await Task.detached {
        return try Self.extractSync() // Blocks one of the 6 Swift threads
    }.value
}

Task and Task.detached do not create new background threads. They simply place work onto the same cooperative pool. It might seem to work if you only test one image at a time, but at scale it starves the pool and can stall or even deadlock your app.

Trap 2: Putting it inside an actor

Actors process their work one-by-one to protect state. However, Actors do not have their own dedicated threads. They borrow threads from the cooperative pool. If you run heavy sync code inside an Actor, you cause a Double Whammy:

  1. Thread Pool Starvation: You choked one of the 6 Swift workers.
  2. Actor Starvation: The Actor is locked up and cannot process any other messages until the heavy work finishes.

Trap 3: Using nonisolated

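Marking an Actor function as nonisolated only means it is not isolated to the actor, so it cannot synchronously touch the Actor’s private state. It can prevent Actor Starvation, but the function still physically runs on the same 6-thread cooperative pool, causing Thread Pool Starvation. Under Swift 6.2’s NonisolatedNonsendingByDefault behaviour, a nonisolated async function even runs on the caller’s executor by default; @concurrent opts out of that, but still uses the cooperative pool.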


4. The Correct Solution: The GCD Escape Hatch

When you have heavy, blocking synchronous code that you cannot modify, Grand Central Dispatch (GCD) is still a practical tool for the job.

By wrapping the work in DispatchQueue.global().async and withCheckedThrowingContinuation, you push the heavy work out of Swift’s strict 6-thread pool and into GCD’s flexible thread pool (which is allowed to spin up extra threads).

This leaves the precious Swift Concurrency threads completely free to continue juggling all the other await tasks in your app.
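Stripped of the thumbnail specifics, the escape hatch is a small generic bridge. A sketch of such a helper (the name `runBlocking` is hypothetical, not an Apple API): the closure runs on a GCD global queue, and the continuation is resumed exactly once on either the success or the error path:

```swift
import Dispatch

/// Runs blocking synchronous work on a GCD global queue and resumes the
/// awaiting task when it finishes, keeping the cooperative pool free.
func runBlocking<T: Sendable>(
    qos: DispatchQoS.QoSClass = .userInitiated,
    _ work: @escaping @Sendable () throws -> T
) async throws -> T {
    try await withCheckedThrowingContinuation { continuation in
        DispatchQueue.global(qos: qos).async {
            do {
                let value = try work()          // the blocking call runs on a GCD thread
                continuation.resume(returning: value)
            } catch {
                continuation.resume(throwing: error)
            }
        }
    }
}
```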

The Final, Correct Code:

actor ExtractSonyThumbnail {
    
    /// Extract thumbnail using generic ImageIO framework.
    func extractSonyThumbnail(
        from url: URL,
        maxDimension: CGFloat,
        qualityCost: Int = 4
    ) async throws -> CGImage {
        
        // `extractSync` is a heavy synchronous blocking operation.
        // If we run it directly inside this async function, Task, or Task.detached,
        // it will hijack a thread in Swift's limited cooperative thread pool (Thread Pool Starvation).
        // Therefore, we explicitly offload the blocking work to GCD.

        try await withCheckedThrowingContinuation { continuation in
            // Push the heavy work out of Swift Concurrency and into GCD
            DispatchQueue.global(qos: .userInitiated).async {
                do {
                    let image = try Self.extractSync(
                        from: url, 
                        maxDimension: maxDimension, 
                        qualityCost: qualityCost
                    )
                    // Bridge the result back into the Swift Concurrency world
                    continuation.resume(returning: image)
                } catch {
                    continuation.resume(throwing: error)
                }
            }
        }
    }
}

Alternatively, as a caseless enum with a static method:

import AppKit
import Foundation

enum enumExtractSonyThumbnail {
    /// Extract thumbnail using generic ImageIO framework.
    /// - Parameters:
    ///   - url: The URL of the RAW image file.
    ///   - maxDimension: Maximum pixel size for the longest edge of the thumbnail.
    ///   - qualityCost: Interpolation cost.
    /// - Returns: A `CGImage` thumbnail.
    static func extractSonyThumbnail(
        from url: URL,
        maxDimension: CGFloat,
        qualityCost: Int = 4
    ) async throws -> CGImage {
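        // We MUST explicitly hop off the cooperative pool. As a static member of an
        // enum, this function has no actor isolation of its own. Under SE-0338 a
        // nonisolated async function runs on the global cooperative pool; with
        // Swift 6.2's NonisolatedNonsendingByDefault it runs on the caller's actor.
        // Either way, blocking here would tie up a cooperative thread or
        // serialize work on the calling actor.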

        try await withCheckedThrowingContinuation { continuation in
            DispatchQueue.global(qos: .userInitiated).async {
                do {
                    let image = try Self.extractSync(
                        from: url, 
                        maxDimension: maxDimension, 
                        qualityCost: qualityCost
                    )
                    // Bridge the result back into the Swift Concurrency world
                    continuation.resume(returning: image)
                } catch {
                    continuation.resume(throwing: error)
                }
            }
        }
    }

    // extractSync(from:maxDimension:qualityCost:), the blocking ImageIO work, is not shown.
}
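The same bridge can be factored into a generic helper so that each blocking call site stays short. This is a sketch: runBlocking is a hypothetical name, not a RawCull or standard-library API, and it relies only on withCheckedThrowingContinuation, DispatchQueue.global(qos:) and Result.

```swift
import Dispatch

/// Hypothetical helper: runs blocking, synchronous `work` on a GCD global queue
/// and suspends the calling task until it finishes. While suspended, the task
/// does not occupy a thread in Swift's cooperative pool.
func runBlocking<T: Sendable>(
    qos: DispatchQoS.QoSClass = .userInitiated,
    _ work: @escaping @Sendable () throws -> T
) async throws -> T {
    try await withCheckedThrowingContinuation { continuation in
        DispatchQueue.global(qos: qos).async {
            // Result(catching:) captures either the value or the thrown error;
            // resume(with:) forwards it, resuming the continuation exactly once.
            continuation.resume(with: Result { try work() })
        }
    }
}
```

For a Sendable result type, the body of extractSonyThumbnail would then shrink to a single call such as try await runBlocking { try Self.extractSync(from: url, maxDimension: maxDimension, qualityCost: qualityCost) }.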

5. The “Modern Swift” Alternative (If you own the code)

If extractSync were your own custom Swift code (rather than a wrapper around an opaque framework like ImageIO), the truly “Modern Swift” fix would be to rewrite the synchronous loop to be cooperative.

You do this by sprinkling await Task.yield() inside heavy loops to voluntarily give the thread back:

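// Sketch: RawImage, ProcessedRow, process(_:) and makeImage(from:) are placeholders for your own code.
func extractSyncCodeMadeAsync(from image: RawImage) async -> CGImage {
    var processedRows: [ProcessedRow] = []
    for (index, pixelRow) in image.rows.enumerated() {
        processedRows.append(process(pixelRow))

        // Every 10 rows, suspend so another task can use this cooperative thread.
        if index % 10 == 0 {
            await Task.yield()
        }
    }
    return makeImage(from: processedRows)
}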
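The sketch above depends on placeholder types; a small runnable version of the same idea follows. rowSums is a hypothetical stand-in for per-row pixel work. Unlike the sketch, it also calls Task.checkCancellation(), so a cancelled thumbnail task stops early instead of finishing work nobody will use.

```swift
/// Hypothetical example: sums each row of a grayscale image (rows of 8-bit
/// pixel values), suspending every `yieldEvery` rows so other tasks get a turn.
func rowSums(_ rows: [[UInt8]], yieldEvery: Int = 10) async throws -> [Int] {
    precondition(yieldEvery > 0, "yieldEvery must be positive")
    var sums: [Int] = []
    sums.reserveCapacity(rows.count)
    for (index, row) in rows.enumerated() {
        // Throws CancellationError if the task was cancelled (e.g. the image scrolled out of view).
        try Task.checkCancellation()
        sums.append(row.reduce(0) { $0 + Int($1) })
        // Voluntary suspension point: the scheduler may run other pending tasks here.
        if (index + 1) % yieldEvery == 0 {
            await Task.yield()
        }
    }
    return sums
}
```

Task.yield() is only a hint to the scheduler, so the loop stays correct whether or not another task actually runs at each suspension point.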

If you can do this, you don’t need DispatchQueue! But for black-box code that you can’t add await to, the GCD escape hatch is the right tool, and it matches Apple’s guidance to move blocking work onto a dispatch queue and bridge back to Swift Concurrency with a continuation.

Number of files

Numbers updated: February 18, 2026 (version 1.0.3).

RawCull depends only on the standard Swift and SwiftUI toolchain and the Swift packages listed below, whose sources are included in the count; it uses no other external libraries.

cloc RawCull/RawCull DecodeEncodeGeneric/Sources ParseRsyncOutput/Sources RsyncArguments/Sources  RsyncProcessStreaming/Sources RsyncAnalyse/Sources
      98 text files.
      97 unique files.                              
       8 files ignored.

github.com/AlDanial/cloc v 2.08  T=0.04 s (2515.7 files/s, 315239.3 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Swift                           96           1649           1496           8980
XML                              1              0              0             30
-------------------------------------------------------------------------------
SUM:                            97           1649           1496           9010
-------------------------------------------------------------------------------

Main Repository

Swift Packages used by RawCull

All packages track the main branch and were updated to their latest revisions as of v0.6.1:

  1. RsyncProcessStreaming - Streaming process handler

  2. DecodeEncodeGeneric - Generic JSON codec

  3. ParseRsyncOutput - Rsync output parser

  4. RsyncArguments - Rsync argument builder

  5. RsyncAnalyse - Enhanced rsync output analysis

Security Scoped URLs

Security-scoped URLs are a cornerstone of macOS app sandbox security. RawCull uses them extensively to gain persistent access to user-selected folders and files while maintaining sandbox compliance. This section provides a comprehensive walkthrough of how they work in the application.

What Are Security-Scoped URLs?

A security-scoped URL is a special form of file URL that:

  • Can be created only from user-granted file access (via file pickers or drag-drop)
  • Grants an app temporary or persistent access to files outside the app sandbox
  • Must be explicitly “accessed” and “released” to work properly
  • Can optionally be serialized as a “bookmark” for persistent access

Key API:

// Start accessing a security-scoped URL (required before file operations)
url.startAccessingSecurityScopedResource() -> Bool

// Stop accessing it (must be paired)
url.stopAccessingSecurityScopedResource()

// Serialize for persistent storage
try url.bookmarkData(options: .withSecurityScope, ...)

// Restore from serialized bookmark
let url = try URL(resolvingBookmarkData: bookmarkData, 
                  options: .withSecurityScope, ...)
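The pairing rule for start/stop can be enforced with a small scoped helper built on defer. This is a sketch under stated assumptions: SecurityScopedResource and withSecurityScopedAccess are hypothetical names, not RawCull or Foundation APIs; URL conforms only on macOS, and the protocol exists so the logic can be exercised with a fake.

```swift
import Foundation

/// Hypothetical protocol: anything that grants and revokes access the way a
/// security-scoped URL does. URL conforms on macOS; tests can use a fake.
protocol SecurityScopedResource {
    func startAccessingSecurityScopedResource() -> Bool
    func stopAccessingSecurityScopedResource()
}

#if os(macOS)
extension URL: SecurityScopedResource {}
#endif

/// Runs `body` while `resource` is being accessed.
/// Returns nil if access could not be started; otherwise the successful start
/// is balanced by exactly one stop, even when `body` throws.
func withSecurityScopedAccess<R: SecurityScopedResource, T>(
    to resource: R,
    _ body: (R) throws -> T
) rethrows -> T? {
    guard resource.startAccessingSecurityScopedResource() else { return nil }
    defer { resource.stopAccessingSecurityScopedResource() }
    return try body(resource)
}
```

On macOS, withSecurityScopedAccess(to: folderURL) { try? FileManager.default.contentsOfDirectory(atPath: $0.path) } would list a user-selected folder and release access before returning; an outer nil means access could not be started.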

Architecture in RawCull

RawCull implements a multi-layer security-scoped URL system: two primary workflows (initial selection and persistent restoration) plus scoped access during active file operations:

Layer 1: Initial User Selection (OpencatalogView)

When users select a folder via the file picker, OpencatalogView handles the initial security setup:

File: RawCull/Views/CopyFiles/OpencatalogView.swift

import OSLog
import SwiftUI
import UniformTypeIdentifiers

struct OpencatalogView: View {
    @Binding var selecteditem: String
    @State private var isImporting: Bool = false
    let bookmarkKey: String  // e.g., "destBookmark"
    
    var body: some View {
        Button(action: { isImporting = true }) {
            Image(systemName: "folder.fill")
        }
        .fileImporter(isPresented: $isImporting,
                      allowedContentTypes: [.directory],
                      onCompletion: { result in
                          handleFileSelection(result)
                      })
    }
    
    private func handleFileSelection(_ result: Result<URL, Error>) {
        switch result {
        case let .success(url):
            // STEP 1: Start accessing immediately after selection
            guard url.startAccessingSecurityScopedResource() else {
                Logger.process.errorMessageOnly("Failed to start accessing resource")
                return
            }
            
            // STEP 2: Store the path for immediate use
            selecteditem = url.path
            
            // STEP 3: Create and persist bookmark for future launches
            do {
                let bookmarkData = try url.bookmarkData(
                    options: .withSecurityScope,
                    includingResourceValuesForKeys: nil,
                    relativeTo: nil
                )
                // Store bookmark in UserDefaults
                UserDefaults.standard.set(bookmarkData, forKey: bookmarkKey)
                Logger.process.debugMessageOnly("Bookmark saved for key: \(bookmarkKey)")
            } catch {
                Logger.process.warning("Could not create bookmark: \(error)")
            }
            
            // STEP 4: Stop accessing (will be restarted when needed)
            url.stopAccessingSecurityScopedResource()
            
        case let .failure(error):
            Logger.process.errorMessageOnly("File picker error: \(error)")
        }
    }
}

Key Points:

  • ✅ Access/release happen in the same scope (guaranteed cleanup)
  • ✅ Bookmark created while resource is being accessed (more reliable)
  • ✅ Path stored in @Binding for immediate UI feedback
  • ⚠️ Access is briefly held (during bookmark creation), then released

Layer 2: Persistent Restoration (ExecuteCopyFiles)

When the app needs to use previously selected folders, ExecuteCopyFiles restores access from the bookmark:

File: RawCull/Model/ParametersRsync/ExecuteCopyFiles.swift

import Foundation
import Observation
import OSLog

@Observable @MainActor
final class ExecuteCopyFiles {
    func getAccessedURL(fromBookmarkKey key: String, 
                       fallbackPath: String) -> URL? {
        // STEP 1: Try to restore from bookmark first
        if let bookmarkData = UserDefaults.standard.data(forKey: key) {
            do {
                var isStale = false
                
                // Resolve bookmark with security scope
                let url = try URL(
                    resolvingBookmarkData: bookmarkData,
                    options: .withSecurityScope,
                    relativeTo: nil,
                    bookmarkDataIsStale: &isStale
                )
                
                // STEP 2: Start accessing the resolved URL
                guard url.startAccessingSecurityScopedResource() else {
                    Logger.process.errorMessageOnly(
                        "Failed to start accessing bookmark for \(key)"
                    )
                    return tryFallbackPath(fallbackPath, key: key)
                }
                
                Logger.process.debugMessageOnly(
                    "Successfully resolved bookmark for \(key)"
                )
                
                // Check if bookmark became stale (update if needed)
                if isStale {
                    Logger.process.warning("Bookmark is stale for \(key)")
                    // Optionally refresh bookmark here
                }
                
                return url
                
            } catch {
                Logger.process.errorMessageOnly(
                    "Bookmark resolution failed for \(key): \(error)"
                )
                return tryFallbackPath(fallbackPath, key: key)
            }
        }
        
        // STEP 3: Fallback to direct path access if no bookmark
        return tryFallbackPath(fallbackPath, key: key)
    }
    
    private func tryFallbackPath(_ fallbackPath: String, 
                                key: String) -> URL? {
        Logger.process.warning(
            "No bookmark found for \(key), attempting direct path access"
        )
        
        let fallbackURL = URL(fileURLWithPath: fallbackPath)
        
        // Try direct path access (works if recently accessed)
        guard fallbackURL.startAccessingSecurityScopedResource() else {
            Logger.process.errorMessageOnly(
                "Failed to access fallback path for \(key)"
            )
            return nil
        }
        
        Logger.process.debugMessageOnly(
            "Successfully accessed fallback path for \(key)"
        )
        
        return fallbackURL
    }
}

Key Points:

  • ✅ Tries bookmark first (most reliable)
  • ✅ Falls back to direct path if bookmark fails
  • ✅ Detects stale bookmarks via isStale flag
  • ✅ Starts access only after successful resolution
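  • ⚠️ Caller is responsible for stopping access after use
  • ⚠️ The fallback URL is created from a plain path and carries no security scope, so startAccessingSecurityScopedResource() may return false even when the path is readable; tryFallbackPath then returns nil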
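Because getAccessedURL hands an already-accessed URL to its caller, the matching stop call has to live somewhere else. One hedged way to keep that pairing explicit is a lease object that both starts and stops access and can only release once. SecurityScopedLease and SecurityScopedResource are hypothetical names, not RawCull types; the protocol exists only so the sketch can be tested with a fake.

```swift
import Foundation

/// Hypothetical protocol: anything that grants and revokes access like a security-scoped URL.
protocol SecurityScopedResource {
    func startAccessingSecurityScopedResource() -> Bool
    func stopAccessingSecurityScopedResource()
}

#if os(macOS)
extension URL: SecurityScopedResource {}
#endif

/// Owns one successful start call and balances it exactly once:
/// on `release()`, or automatically when the lease is deallocated.
final class SecurityScopedLease<Resource: SecurityScopedResource> {
    let resource: Resource
    /// When access began; supports tracking how long access is held.
    let startedAt: Date
    private var isReleased = false

    /// Fails (returns nil) if access could not be started.
    init?(_ resource: Resource) {
        guard resource.startAccessingSecurityScopedResource() else { return nil }
        self.resource = resource
        self.startedAt = Date()
    }

    var heldFor: TimeInterval { Date().timeIntervalSince(startedAt) }

    func release() {
        guard !isReleased else { return }  // a second release is a no-op
        isReleased = true
        resource.stopAccessingSecurityScopedResource()
    }

    deinit { release() }
}
```

In the copy workflow, such a lease would replace the bare start call in getAccessedURL, be kept alive until rsync finishes, and then be released; startedAt also provides the raw data for the Access Duration Tracking idea under Future Improvements.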

Layer 3: Active File Operations (ScanFiles)

When scanning files, ScanFiles holds security-scoped access only for the duration of the scan and releases it with defer:

File: RawCull/Actors/ScanFiles.swift

import Foundation

actor ScanFiles {
    func scanFiles(url: URL) async -> [FileItem] {
        // CRITICAL: Must start access before any file operations
        guard url.startAccessingSecurityScopedResource() else {
            return []
        }
        
        // Guarantee cleanup with defer (Swift best practice)
        defer { url.stopAccessingSecurityScopedResource() }
        
        // Now safe to access files
        let manager = FileManager.default
        let contents = try? manager.contentsOfDirectory(
            at: url,
            includingPropertiesForKeys: [...],
            options: [.skipsHiddenFiles]
        )
        
        // Process contents and return
        return processContents(contents)
    }
}

Key Points:

  • ✅ Uses defer for guaranteed cleanup
  • ✅ Access is granted only during actual file operations
  • ✅ Prevents leaking security-scoped access
  • ✅ Actor isolation ensures thread-safe operations
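A self-contained sketch of the same scan-with-defer pattern, narrowed to ARW files, follows. ARWScanner is a hypothetical name, not RawCull’s ScanFiles (whose resource keys and processContents are not shown). It differs in one deliberate way: instead of returning an empty list when access cannot be started, it balances only a successful start and lets FileManager report any permission error.

```swift
import Foundation

/// Hypothetical scanner: lists the regular `.arw` files in one folder.
actor ARWScanner {
    func scan(_ folder: URL) throws -> [URL] {
        #if os(macOS)
        // Start access in case `folder` is security-scoped. This may return false
        // (for example, for a URL without a security scope), so only a successful
        // start is balanced. #if does not open a new scope, so this defer runs
        // when scan(_:) returns or throws.
        let didStart = folder.startAccessingSecurityScopedResource()
        defer { if didStart { folder.stopAccessingSecurityScopedResource() } }
        #endif

        let contents = try FileManager.default.contentsOfDirectory(
            at: folder,
            includingPropertiesForKeys: [.isRegularFileKey],
            options: [.skipsHiddenFiles]
        )
        return contents
            .filter { $0.pathExtension.lowercased() == "arw" }
            // Skip directories that happen to be named *.arw.
            .filter { (try? $0.resourceValues(forKeys: [.isRegularFileKey]))?.isRegularFile == true }
            .sorted { $0.lastPathComponent < $1.lastPathComponent }
    }
}
```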

Complete End-to-End Flow

User selects folder via picker
    ↓
[OpencatalogView]
    1. startAccessingSecurityScopedResource()
    2. Store path in UI binding
    3. Create bookmark from URL
    4. Save bookmark to UserDefaults
    5. stopAccessingSecurityScopedResource()
    ↓
[Later: User initiates copy task]
    ↓
[ExecuteCopyFiles.performCopyTask()]
    1. getAccessedURL(fromBookmarkKey: "destBookmark", ...)
        a. Retrieve bookmark from UserDefaults
        b. URL(resolvingBookmarkData:options:.withSecurityScope)
        c. url.startAccessingSecurityScopedResource()
        d. Return accessed URL (or nil)
    2. Append URL path to rsync arguments
    3. Execute rsync process
    ↓
[ScanFiles.scanFiles()]
    1. url.startAccessingSecurityScopedResource()
    2. defer { url.stopAccessingSecurityScopedResource() }
    3. Scan directory contents
    4. Return file items
    ↓
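[After operations complete]
    ScanFiles releases its access automatically via defer; access started by
    getAccessedURL must be released by the caller with
    stopAccessingSecurityScopedResource()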

Security Model

RawCull’s security-scoped URL implementation adheres to Apple’s sandbox guidelines:

| Aspect            | Implementation                                      | Benefit                                   |
|-------------------|-----------------------------------------------------|-------------------------------------------|
| User Consent      | Files only accessible after user selection in picker | User controls what the app can access     |
| Persistent Access | Bookmarks serialized for cross-launch access        | Users don’t re-select folders each launch |
| Temporary Access  | Access explicitly granted/revoked with start/stop   | Resources properly released after use     |
| Scope Management  | defer ensures cleanup even on errors                | Prevents resource leaks                   |
| Fallback Strategy | Direct path access if bookmark fails                | Graceful degradation                      |
| Audit Trail       | OSLog captures all access attempts                  | Security debugging and compliance         |

Error Handling & Resilience

The implementation handles three failure modes:

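1. Bookmark Stale (e.g., the folder was moved or renamed)

if isStale {
    Logger.process.warning("Bookmark is stale for \(key)")
    // The bookmark still resolved: recreate it from the resolved URL
    // (while access is active) and save the new data, or ask the
    // user to re-select the folder if that fails
}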

2. Bookmark Resolution Fails

do {
    // ... resolve the bookmark as in getAccessedURL ...
} catch {
    Logger.process.errorMessageOnly(
        "Bookmark resolution failed: \(error)"
    )
    return tryFallbackPath(...)  // Try direct access instead
}

3. Direct Access Denied

guard url.startAccessingSecurityScopedResource() else {
    Logger.process.errorMessageOnly("Failed to start accessing")
    return nil  // Operation cannot proceed
}
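Failure mode 1 can be handled rather than only logged. Apple’s documentation for bookmarkDataIsStale says the app should create a new bookmark from the returned URL and use it in place of stored copies. The sketch below is hypothetical (refreshIfStale and BookmarkStorage are not RawCull APIs); bookmark creation is injected as a closure so the logic can run without a sandbox.

```swift
import Foundation

/// Hypothetical minimal key-value storage, so the logic can run against a fake
/// instead of UserDefaults.standard.
protocol BookmarkStorage: AnyObject {
    func data(forKey key: String) -> Data?
    func set(_ value: Any?, forKey key: String)
}

extension UserDefaults: BookmarkStorage {}

/// Replaces a stale bookmark with fresh data created from the URL that the
/// stale bookmark resolved to. Returns true if the stored bookmark was replaced.
///
/// On macOS, call this while access to `resolvedURL` is active, passing e.g.
/// { try $0.bookmarkData(options: .withSecurityScope,
///                       includingResourceValuesForKeys: nil, relativeTo: nil) }
func refreshIfStale(
    _ isStale: Bool,
    resolvedURL: URL,
    key: String,
    storage: any BookmarkStorage,
    makeBookmark: (URL) throws -> Data
) -> Bool {
    guard isStale else { return false }
    // If a fresh bookmark cannot be created, keep the old one; it still resolved.
    guard let fresh = try? makeBookmark(resolvedURL) else { return false }
    storage.set(fresh, forKey: key)
    return true
}
```

In getAccessedURL, such a call would sit inside the isStale branch, after startAccessingSecurityScopedResource() has succeeded, with UserDefaults.standard as the storage.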

Best Practices Demonstrated

  1. Always pair start/stop calls

    • Use defer for guaranteed cleanup
    • Never leave access “hanging”
  2. Handle both paths (bookmark + fallback)

    • Bookmarks are primary (persistent)
    • Fallback ensures resilience
  3. Log access attempts

    • Enables security auditing
    • Helps with debugging user issues
  4. Check return values

    • startAccessingSecurityScopedResource() can fail
    • Always guard the return value
  5. Detect stale bookmarks

    • Use bookmarkDataIsStale to detect moved files
    • Can trigger user re-selection

Future Improvements

  1. Refresh Stale Bookmarks

    • When isStale is detected, prompt user to reselect
    • Automatically create new bookmark
  2. Bookmark Management UI

    • Show all bookmarked folders
    • Allow users to revoke/refresh bookmarks
    • Display bookmark creation date
  3. Access Duration Tracking

    • Monitor how long URLs remain accessed
    • Alert on unusually long access durations
  4. Batch Operations

    • Consider shared access context for multiple files
    • Reduce start/stop overhead for bulk operations

Compiling RawCull

Overview

The easiest method is to use the included Makefile; the default make at /usr/bin/make is sufficient.

Compile by make

If you have an Apple Developer account, open the RawCull project and set Signing & Capabilities to your own Apple Developer ID before using make and the procedure outlined below.

The default make target requires an app-specific password. Two builds are available with make: one creates a release build of RawCull only, while the other creates a signed version packaged as a DMG file.

If you only use make archive, which does not require the app-specific password, updating the Signing & Capabilities section is sufficient. make archive will most likely also work when set to Sign to Run Locally.

To create a DMG file, make depends on the create-dmg tool; instructions for create-dmg are included in the Makefile. Make sure your fork of create-dmg sits in the same parent directory as your fork of RawCull. Before using make, create and store an app-specific password.

The following procedure creates and stores an app-specific password:

  1. Visit appleid.apple.com and log in with your Apple ID.
  2. Navigate to the Sign-In and Security section and select App-Specific Passwords → Generate an App-Specific Password.
  3. Provide a label to help identify the purpose of the password (e.g., notarytool).
  4. Click Create. The password will be displayed once; copy it and store it securely.

After creating the app-specific password, execute the following command and follow the prompts:

xcrun notarytool store-credentials --apple-id "youremail@gmail.com" --team-id "A1B2C3D4E5"

  • Replace youremail@gmail.com and A1B2C3D4E5 with your actual credentials.

Label the app-specific password RawCull at appleid.apple.com, and enter RawCull at the Profile name prompt when you run the command above.

This process stores your credentials securely in the Keychain under a profile name, which you use to reference them later. The session looks like this:

Profile name:
RawCull
App-specific password for youremail@gmail.com: 
Validating your credentials...
Success. Credentials validated.
Credentials saved to Keychain.
To use them, specify `--keychain-profile "RawCull"`

After completing these steps, the following make commands are available from the root of RawCull’s source directory:

  • make - generates a signed and notarized DMG file containing the release version of RawCull.
  • make archive - produces a release build in the build directory, with debug information removed and without signing.
  • make clean - deletes all build data.

Compile by Xcode

If you have an Apple Developer account, use your Apple Developer ID in Xcode.

Apple Developer account

Open the RawCull project in Xcode. Select the top level of the project, open the Signing & Capabilities tab, and set Team to your team.

No Apple Developer account

As above, but set Signing Certificate to Sign to Run Locally.

To compile or run

Use Xcode to run, debug, or build RawCull.