Skip to content

Incremental Materialized View Refresh

Version: 3.4.0 Status: Implemented Author: HeliosDB Team Date: 2025-12-25

Overview

Incremental computation for materialized views enables efficient delta-based updates instead of full recomputation. This feature dramatically improves refresh performance by processing only the changes (deltas) to base tables rather than rescanning the entire dataset.

New in v3.4.0

REFRESH MATERIALIZED VIEW INCREMENTALLY

Explicit SQL syntax to request incremental refresh:

-- Request incremental refresh explicitly
REFRESH MATERIALIZED VIEW sales_summary INCREMENTALLY;

-- Falls back to full refresh if incremental not possible

ALTER MATERIALIZED VIEW SET

Configure MV options after creation:

-- Enable incremental refresh strategy
ALTER MATERIALIZED VIEW sales_summary SET (refresh_strategy = 'incremental');

-- Configure staleness threshold (seconds) and CPU limits
ALTER MATERIALIZED VIEW sales_summary SET (
    staleness_threshold = 1800,
    max_cpu_percent = 25,
    incremental_enabled = true
);

-- Set refresh priority (0=low, 1=medium, 2=high, 3=critical)
ALTER MATERIALIZED VIEW sales_summary SET (priority = 2);

AutoRefreshWorker Integration

The AutoRefreshWorker now integrates with the incremental refresh system: - CPU-aware scheduling prevents system overload - Priority-based refresh queue - Automatic strategy selection (incremental vs full) - Delta tracking cleanup after successful refresh

Key Features

1. Delta Tracking

The system tracks all changes (inserts, updates, deletes) to base tables:

use heliosdb_lite::storage::{DeltaTracker, DeltaOperation};

// Track changes to base tables
tracker.record_insert("users", tuple, timestamp);
tracker.record_update("users", old_tuple, new_tuple, timestamp);
tracker.record_delete("users", tuple, timestamp);

// Retrieve changes since last refresh
let deltas = tracker.get_deltas_since("users", last_refresh_ts);

2. Incremental Refresh Strategies

Three refresh strategies are supported:

  • Full: Complete recomputation from base tables
  • Incremental: Delta-based updates only
  • Hybrid: Automatic selection based on cost estimation

3. Cost-Based Optimization

The system automatically estimates costs and recommends the optimal strategy:

use heliosdb_lite::storage::IncrementalRefresher;

let refresher = IncrementalRefresher::new(storage, delta_tracker);
let cost = refresher.estimate_refresh_cost(&mv_metadata)?;

println!("Incremental cost: {:.2}s", cost.incremental_cost);
println!("Full refresh cost: {:.2}s", cost.full_cost);
println!("Recommendation: {:?}", cost.recommendation);

4. Supported Operations

Aggregate Views

Efficiently update aggregates without full recomputation:

  • COUNT: Increment/decrement counters
  • SUM: Add/subtract values
  • AVG: Update running averages
  • MIN/MAX: Track extrema (with validation)
CREATE MATERIALIZED VIEW sales_summary AS
SELECT product_id,
       COUNT(*) as order_count,
       SUM(amount) as total_sales,
       AVG(amount) as avg_sale
FROM orders
GROUP BY product_id;

-- Incremental refresh applies deltas efficiently
REFRESH MATERIALIZED VIEW sales_summary;

Filter Views

Apply filter predicates to deltas:

CREATE MATERIALIZED VIEW active_users AS
SELECT * FROM users
WHERE status = 'active' AND last_login > NOW() - INTERVAL '30 days';

-- Only processes changed rows
REFRESH MATERIALIZED VIEW active_users;

Join Views

Match deltas against join partners:

CREATE MATERIALIZED VIEW user_orders AS
SELECT u.name, o.order_id, o.amount
FROM users u
JOIN orders o ON u.user_id = o.user_id;

-- Processes deltas from both sides efficiently
REFRESH MATERIALIZED VIEW user_orders;

Architecture

Delta Tracking System

┌──────────────┐
│ Base Table   │
│  (users)     │
└──────┬───────┘
       │ Changes (Insert/Update/Delete)
┌──────────────────────┐
│   Delta Tracker      │
│                      │
│  ┌────────────────┐  │
│  │ Delta Queue    │  │
│  │ - Insert ops   │  │
│  │ - Update ops   │  │
│  │ - Delete ops   │  │
│  └────────────────┘  │
└──────┬───────────────┘
       │ Get deltas since last refresh
┌──────────────────────┐
│ Incremental          │
│ Refresher            │
│                      │
│  ┌────────────────┐  │
│  │ Apply Deltas   │  │
│  │ - Aggregates   │  │
│  │ - Filters      │  │
│  │ - Joins        │  │
│  └────────────────┘  │
└──────┬───────────────┘
       │ Updated results
┌──────────────────────┐
│ Materialized View    │
│   (refreshed)        │
└──────────────────────┘

Cost Estimation Algorithm

// Heuristic-based cost model
fn estimate_cost(&self, mv: &MV) -> Cost {
    let delta_count = tracker.count_deltas(mv.base_tables);
    let mv_size = count_tuples(mv.name);
    let base_size = count_tuples(mv.base_tables[0]);

    // Cost factors (configurable)
    let incremental = delta_count * DELTA_COST;  // 1ms per delta
    let full = base_size * BASE_SCAN_COST;       // 10ms per row

    if incremental < full * 0.5 {
        Recommend::Incremental
    } else {
        Recommend::Full
    }
}

Performance Characteristics

Incremental vs Full Refresh

Dataset Size Delta Ratio Incremental Time Full Refresh Time Speedup
10K rows 1% 10ms 100ms 10x
100K rows 1% 100ms 1,000ms 10x
1M rows 1% 1s 10s 10x
10K rows 50% 500ms 100ms 0.2x

Key Insights:

  • Incremental refresh excels when delta ratio is low (<10%)
  • Full refresh is better when most of the dataset changes
  • Cost estimation automatically selects the optimal strategy

Memory Usage

  • Delta Tracking: O(deltas) - Only stores changes since last refresh
  • Incremental Refresh: O(deltas + groups) - Proportional to changes
  • Full Refresh: O(base_table_size) - Must process entire dataset

API Reference

DeltaTracker

pub struct DeltaTracker {
    // Track changes to base tables
}

impl DeltaTracker {
    pub fn new(storage: Arc<StorageEngine>) -> Self;

    pub fn record_insert(&self, table: &str, tuple: Tuple, ts: u64);
    pub fn record_update(&self, table: &str, old: Tuple, new: Tuple, ts: u64);
    pub fn record_delete(&self, table: &str, tuple: Tuple, ts: u64);

    pub fn get_deltas_since(&self, table: &str, since: u64) -> Vec<Delta>;
    pub fn count_deltas_since(&self, tables: &[String], since: u64) -> Result<usize>;

    pub fn clear_deltas_until(&self, table: &str, until: u64);
    pub fn clear_all_deltas(&self, table: &str);
}

IncrementalRefresher

pub struct IncrementalRefresher {
    // Performs incremental refresh operations
}

impl IncrementalRefresher {
    pub fn new(
        storage: Arc<StorageEngine>,
        delta_tracker: Arc<DeltaTracker>
    ) -> Self;

    pub fn refresh_incremental(&self, mv_name: &str) -> Result<RefreshResult>;

    pub fn can_refresh_incrementally(&self, mv: &MVMetadata) -> Result<bool>;

    pub fn estimate_refresh_cost(&self, mv: &MVMetadata) -> Result<RefreshCost>;
}

RefreshResult

pub struct RefreshResult {
    pub strategy_used: RefreshStrategy,
    pub rows_inserted: usize,
    pub rows_updated: usize,
    pub rows_deleted: usize,
    pub duration: Duration,
}

Usage Examples

Basic Incremental Refresh

use heliosdb_lite::storage::{IncrementalRefresher, DeltaTracker};

// Setup
let storage = Arc::new(StorageEngine::open("data", &config)?);
let tracker = Arc::new(DeltaTracker::new(Arc::clone(&storage)));
let refresher = IncrementalRefresher::new(storage, tracker);

// Track changes
tracker.record_insert("orders", new_order, timestamp);
tracker.record_update("orders", old, new, timestamp);

// Refresh incrementally
let result = refresher.refresh_incremental("sales_summary")?;

println!("Refresh completed in {:?}", result.duration);
println!("  Inserted: {} rows", result.rows_inserted);
println!("  Updated: {} rows", result.rows_updated);
println!("  Deleted: {} rows", result.rows_deleted);

Cost-Based Refresh

// Estimate cost before refresh
let cost = refresher.estimate_refresh_cost(&mv_metadata)?;

match cost.recommendation {
    RefreshStrategy::Incremental => {
        println!("Using incremental refresh ({}s vs {}s)",
                 cost.incremental_cost, cost.full_cost);
        refresher.refresh_incremental(mv_name)?;
    }
    RefreshStrategy::Full => {
        println!("Using full refresh (cheaper than incremental)");
        // Use standard REFRESH MATERIALIZED VIEW
    }
    RefreshStrategy::Hybrid => {
        // Try incremental, fallback to full if needed
    }
}

Automatic Scheduling

use heliosdb_lite::storage::MVScheduler;

// Configure automatic refresh with incremental computation
let mut scheduler = MVScheduler::new(storage, SchedulerConfig {
    enabled: true,
    check_interval_seconds: 60,
    use_incremental: true,  // Enable incremental refresh
    ..Default::default()
});

scheduler.start()?;

// Scheduler automatically:
// 1. Checks staleness of materialized views
// 2. Estimates cost (incremental vs full)
// 3. Selects optimal strategy
// 4. Refreshes views in priority order

Limitations

Current Limitations (v2.3.0)

  1. MIN/MAX Aggregates: Require validation when deleted value is extremum
  2. Complex Joins: Multi-way joins not fully optimized
  3. Subqueries: Not yet supported for incremental refresh
  4. Window Functions: Not supported incrementally

Workarounds

For unsupported operations, the system automatically falls back to full refresh:

if !refresher.can_refresh_incrementally(&mv)? {
    // Automatically use full refresh
    executor.execute_refresh_full(mv_name)?;
}

Configuration

Delta Tracker Settings

// Configure delta retention policy
tracker.set_retention_policy(RetentionPolicy {
    max_age_seconds: 3600,        // Keep deltas for 1 hour
    max_deltas_per_table: 100000, // Max 100K deltas per table
    auto_compact: true,            // Compact old deltas
});

Cost Estimation Tuning

// Adjust cost model parameters
refresher.set_cost_parameters(CostParameters {
    delta_cost_ms: 1.0,      // 1ms per delta operation
    base_scan_cost_ms: 10.0, // 10ms per base table row
    threshold_ratio: 0.5,    // Recommend incremental if <50% of full cost
});

Best Practices

1. Regular Cleanup

Clear old deltas to prevent unbounded memory growth:

// Clear deltas after successful refresh
let last_refresh_ts = mv_metadata.last_refresh.timestamp();
tracker.clear_deltas_until(&table_name, last_refresh_ts);

2. Monitor Delta Growth

Track delta accumulation to prevent performance degradation:

let delta_count = tracker.count_deltas_since(&tables, last_refresh)?;
if delta_count > THRESHOLD {
    warn!("High delta count: {}, consider full refresh", delta_count);
}

3. Validate Correctness

Periodically validate incremental results against full refresh:

#[cfg(test)]
fn validate_incremental_correctness() {
    let incremental_result = refresher.refresh_incremental(mv_name)?;
    let full_result = executor.refresh_full(mv_name)?;

    assert_eq!(incremental_result.row_count, full_result.row_count);
    // Compare actual data...
}

4. Choose Appropriate Refresh Interval

Balance freshness vs overhead:

// High-frequency updates -> longer intervals
scheduler.set_interval(mv_name, Duration::from_secs(300)); // 5 min

// Low-frequency updates -> shorter intervals
scheduler.set_interval(mv_name, Duration::from_secs(60)); // 1 min

Testing

Unit Tests

cargo test mv_incremental --lib

Integration Tests

cargo test mv_incremental_test --test mv_incremental_test

Benchmarks

cargo bench mv_incremental_bench

Expected results: - Incremental refresh: 10-100x faster for small delta ratios - Cost estimation: <1ms overhead - Delta tracking: O(1) per operation

Future Enhancements

Planned for v2.4.0

  1. Materialized View Dependencies: Cascade incremental refresh
  2. Parallel Delta Processing: Multi-threaded delta application
  3. Delta Compression: Reduce memory footprint
  4. Smart Batching: Group deltas for better cache locality

Under Research

  1. Streaming Incremental Refresh: Real-time view updates
  2. Approximate Incremental Refresh: Trade accuracy for speed
  3. ML-Based Cost Models: Learned cost estimation

References

Changelog

v3.4.0 (2025-12-25)

  • Added REFRESH MATERIALIZED VIEW ... INCREMENTALLY SQL syntax
  • Added ALTER MATERIALIZED VIEW ... SET for post-creation configuration
  • Integrated AutoRefreshWorker with incremental refresh system
  • Added CPU-aware scheduling and priority queue
  • Improved delta tracking cleanup after refresh
  • Comprehensive integration tests for new features

v2.3.0 (2025-01-24)

  • Initial implementation of incremental computation
  • Delta tracking system
  • Cost-based strategy selection
  • Support for aggregates, filters, and joins
  • Comprehensive test suite and benchmarks