Incremental Materialized View Refresh¶
Version: 3.4.0 Status: Implemented Author: HeliosDB Team Date: 2025-12-25
Overview¶
Incremental computation for materialized views enables efficient delta-based updates instead of full recomputation. This feature dramatically improves refresh performance by processing only the changes (deltas) to base tables rather than rescanning the entire dataset.
New in v3.4.0¶
REFRESH MATERIALIZED VIEW INCREMENTALLY¶
Explicit SQL syntax to request incremental refresh:
-- Request incremental refresh explicitly
REFRESH MATERIALIZED VIEW sales_summary INCREMENTALLY;
-- Falls back to full refresh if incremental not possible
ALTER MATERIALIZED VIEW SET¶
Configure MV options after creation:
-- Enable incremental refresh strategy
ALTER MATERIALIZED VIEW sales_summary SET (refresh_strategy = 'incremental');
-- Configure staleness threshold (seconds) and CPU limits
ALTER MATERIALIZED VIEW sales_summary SET (
staleness_threshold = 1800,
max_cpu_percent = 25,
incremental_enabled = true
);
-- Set refresh priority (0=low, 1=medium, 2=high, 3=critical)
ALTER MATERIALIZED VIEW sales_summary SET (priority = 2);
AutoRefreshWorker Integration¶
The AutoRefreshWorker now integrates with the incremental refresh system: - CPU-aware scheduling prevents system overload - Priority-based refresh queue - Automatic strategy selection (incremental vs full) - Delta tracking cleanup after successful refresh
Key Features¶
1. Delta Tracking¶
The system tracks all changes (inserts, updates, deletes) to base tables:
use heliosdb_lite::storage::{DeltaTracker, DeltaOperation};
// Track changes to base tables
tracker.record_insert("users", tuple, timestamp);
tracker.record_update("users", old_tuple, new_tuple, timestamp);
tracker.record_delete("users", tuple, timestamp);
// Retrieve changes since last refresh
let deltas = tracker.get_deltas_since("users", last_refresh_ts);
2. Incremental Refresh Strategies¶
Three refresh strategies are supported:
- Full: Complete recomputation from base tables
- Incremental: Delta-based updates only
- Hybrid: Automatic selection based on cost estimation
3. Cost-Based Optimization¶
The system automatically estimates costs and recommends the optimal strategy:
use heliosdb_lite::storage::IncrementalRefresher;
let refresher = IncrementalRefresher::new(storage, delta_tracker);
let cost = refresher.estimate_refresh_cost(&mv_metadata)?;
println!("Incremental cost: {:.2}s", cost.incremental_cost);
println!("Full refresh cost: {:.2}s", cost.full_cost);
println!("Recommendation: {:?}", cost.recommendation);
4. Supported Operations¶
Aggregate Views¶
Efficiently update aggregates without full recomputation:
- COUNT: Increment/decrement counters
- SUM: Add/subtract values
- AVG: Update running averages
- MIN/MAX: Track extrema (with validation)
CREATE MATERIALIZED VIEW sales_summary AS
SELECT product_id,
COUNT(*) as order_count,
SUM(amount) as total_sales,
AVG(amount) as avg_sale
FROM orders
GROUP BY product_id;
-- Incremental refresh applies deltas efficiently
REFRESH MATERIALIZED VIEW sales_summary;
Filter Views¶
Apply filter predicates to deltas:
CREATE MATERIALIZED VIEW active_users AS
SELECT * FROM users
WHERE status = 'active' AND last_login > NOW() - INTERVAL '30 days';
-- Only processes changed rows
REFRESH MATERIALIZED VIEW active_users;
Join Views¶
Match deltas against join partners:
CREATE MATERIALIZED VIEW user_orders AS
SELECT u.name, o.order_id, o.amount
FROM users u
JOIN orders o ON u.user_id = o.user_id;
-- Processes deltas from both sides efficiently
REFRESH MATERIALIZED VIEW user_orders;
Architecture¶
Delta Tracking System¶
┌──────────────┐
│ Base Table │
│ (users) │
└──────┬───────┘
│ Changes (Insert/Update/Delete)
▼
┌──────────────────────┐
│ Delta Tracker │
│ │
│ ┌────────────────┐ │
│ │ Delta Queue │ │
│ │ - Insert ops │ │
│ │ - Update ops │ │
│ │ - Delete ops │ │
│ └────────────────┘ │
└──────┬───────────────┘
│ Get deltas since last refresh
▼
┌──────────────────────┐
│ Incremental │
│ Refresher │
│ │
│ ┌────────────────┐ │
│ │ Apply Deltas │ │
│ │ - Aggregates │ │
│ │ - Filters │ │
│ │ - Joins │ │
│ └────────────────┘ │
└──────┬───────────────┘
│ Updated results
▼
┌──────────────────────┐
│ Materialized View │
│ (refreshed) │
└──────────────────────┘
Cost Estimation Algorithm¶
// Heuristic-based cost model
fn estimate_cost(&self, mv: &MV) -> Cost {
let delta_count = tracker.count_deltas(mv.base_tables);
let mv_size = count_tuples(mv.name);
let base_size = count_tuples(mv.base_tables[0]);
// Cost factors (configurable)
let incremental = delta_count * DELTA_COST; // 1ms per delta
let full = base_size * BASE_SCAN_COST; // 10ms per row
if incremental < full * 0.5 {
Recommend::Incremental
} else {
Recommend::Full
}
}
Performance Characteristics¶
Incremental vs Full Refresh¶
| Dataset Size | Delta Ratio | Incremental Time | Full Refresh Time | Speedup |
|---|---|---|---|---|
| 10K rows | 1% | 10ms | 100ms | 10x |
| 100K rows | 1% | 100ms | 1,000ms | 10x |
| 1M rows | 1% | 1s | 10s | 10x |
| 10K rows | 50% | 500ms | 100ms | 0.2x |
Key Insights:
- Incremental refresh excels when delta ratio is low (<10%)
- Full refresh is better when most of the dataset changes
- Cost estimation automatically selects the optimal strategy
Memory Usage¶
- Delta Tracking: O(deltas) - Only stores changes since last refresh
- Incremental Refresh: O(deltas + groups) - Proportional to changes
- Full Refresh: O(base_table_size) - Must process entire dataset
API Reference¶
DeltaTracker¶
pub struct DeltaTracker {
// Track changes to base tables
}
impl DeltaTracker {
pub fn new(storage: Arc<StorageEngine>) -> Self;
pub fn record_insert(&self, table: &str, tuple: Tuple, ts: u64);
pub fn record_update(&self, table: &str, old: Tuple, new: Tuple, ts: u64);
pub fn record_delete(&self, table: &str, tuple: Tuple, ts: u64);
pub fn get_deltas_since(&self, table: &str, since: u64) -> Vec<Delta>;
pub fn count_deltas_since(&self, tables: &[String], since: u64) -> Result<usize>;
pub fn clear_deltas_until(&self, table: &str, until: u64);
pub fn clear_all_deltas(&self, table: &str);
}
IncrementalRefresher¶
pub struct IncrementalRefresher {
// Performs incremental refresh operations
}
impl IncrementalRefresher {
pub fn new(
storage: Arc<StorageEngine>,
delta_tracker: Arc<DeltaTracker>
) -> Self;
pub fn refresh_incremental(&self, mv_name: &str) -> Result<RefreshResult>;
pub fn can_refresh_incrementally(&self, mv: &MVMetadata) -> Result<bool>;
pub fn estimate_refresh_cost(&self, mv: &MVMetadata) -> Result<RefreshCost>;
}
RefreshResult¶
pub struct RefreshResult {
pub strategy_used: RefreshStrategy,
pub rows_inserted: usize,
pub rows_updated: usize,
pub rows_deleted: usize,
pub duration: Duration,
}
Usage Examples¶
Basic Incremental Refresh¶
use heliosdb_lite::storage::{IncrementalRefresher, DeltaTracker};
// Setup
let storage = Arc::new(StorageEngine::open("data", &config)?);
let tracker = Arc::new(DeltaTracker::new(Arc::clone(&storage)));
let refresher = IncrementalRefresher::new(storage, tracker);
// Track changes
tracker.record_insert("orders", new_order, timestamp);
tracker.record_update("orders", old, new, timestamp);
// Refresh incrementally
let result = refresher.refresh_incremental("sales_summary")?;
println!("Refresh completed in {:?}", result.duration);
println!(" Inserted: {} rows", result.rows_inserted);
println!(" Updated: {} rows", result.rows_updated);
println!(" Deleted: {} rows", result.rows_deleted);
Cost-Based Refresh¶
// Estimate cost before refresh
let cost = refresher.estimate_refresh_cost(&mv_metadata)?;
match cost.recommendation {
RefreshStrategy::Incremental => {
println!("Using incremental refresh ({}s vs {}s)",
cost.incremental_cost, cost.full_cost);
refresher.refresh_incremental(mv_name)?;
}
RefreshStrategy::Full => {
println!("Using full refresh (cheaper than incremental)");
// Use standard REFRESH MATERIALIZED VIEW
}
RefreshStrategy::Hybrid => {
// Try incremental, fallback to full if needed
}
}
Automatic Scheduling¶
use heliosdb_lite::storage::MVScheduler;
// Configure automatic refresh with incremental computation
let mut scheduler = MVScheduler::new(storage, SchedulerConfig {
enabled: true,
check_interval_seconds: 60,
use_incremental: true, // Enable incremental refresh
..Default::default()
});
scheduler.start()?;
// Scheduler automatically:
// 1. Checks staleness of materialized views
// 2. Estimates cost (incremental vs full)
// 3. Selects optimal strategy
// 4. Refreshes views in priority order
Limitations¶
Current Limitations (v2.3.0)¶
- MIN/MAX Aggregates: Require validation when deleted value is extremum
- Complex Joins: Multi-way joins not fully optimized
- Subqueries: Not yet supported for incremental refresh
- Window Functions: Not supported incrementally
Workarounds¶
For unsupported operations, the system automatically falls back to full refresh:
if !refresher.can_refresh_incrementally(&mv)? {
// Automatically use full refresh
executor.execute_refresh_full(mv_name)?;
}
Configuration¶
Delta Tracker Settings¶
// Configure delta retention policy
tracker.set_retention_policy(RetentionPolicy {
max_age_seconds: 3600, // Keep deltas for 1 hour
max_deltas_per_table: 100000, // Max 100K deltas per table
auto_compact: true, // Compact old deltas
});
Cost Estimation Tuning¶
// Adjust cost model parameters
refresher.set_cost_parameters(CostParameters {
delta_cost_ms: 1.0, // 1ms per delta operation
base_scan_cost_ms: 10.0, // 10ms per base table row
threshold_ratio: 0.5, // Recommend incremental if <50% of full cost
});
Best Practices¶
1. Regular Cleanup¶
Clear old deltas to prevent unbounded memory growth:
// Clear deltas after successful refresh
let last_refresh_ts = mv_metadata.last_refresh.timestamp();
tracker.clear_deltas_until(&table_name, last_refresh_ts);
2. Monitor Delta Growth¶
Track delta accumulation to prevent performance degradation:
let delta_count = tracker.count_deltas_since(&tables, last_refresh)?;
if delta_count > THRESHOLD {
warn!("High delta count: {}, consider full refresh", delta_count);
}
3. Validate Correctness¶
Periodically validate incremental results against full refresh:
#[cfg(test)]
fn validate_incremental_correctness() {
let incremental_result = refresher.refresh_incremental(mv_name)?;
let full_result = executor.refresh_full(mv_name)?;
assert_eq!(incremental_result.row_count, full_result.row_count);
// Compare actual data...
}
4. Choose Appropriate Refresh Interval¶
Balance freshness vs overhead:
// High-frequency updates -> longer intervals
scheduler.set_interval(mv_name, Duration::from_secs(300)); // 5 min
// Low-frequency updates -> shorter intervals
scheduler.set_interval(mv_name, Duration::from_secs(60)); // 1 min
Testing¶
Unit Tests¶
Integration Tests¶
Benchmarks¶
Expected results: - Incremental refresh: 10-100x faster for small delta ratios - Cost estimation: <1ms overhead - Delta tracking: O(1) per operation
Future Enhancements¶
Planned for v2.4.0¶
- Materialized View Dependencies: Cascade incremental refresh
- Parallel Delta Processing: Multi-threaded delta application
- Delta Compression: Reduce memory footprint
- Smart Batching: Group deltas for better cache locality
Under Research¶
- Streaming Incremental Refresh: Real-time view updates
- Approximate Incremental Refresh: Trade accuracy for speed
- ML-Based Cost Models: Learned cost estimation
References¶
Changelog¶
v3.4.0 (2025-12-25)¶
- Added
REFRESH MATERIALIZED VIEW ... INCREMENTALLYSQL syntax - Added
ALTER MATERIALIZED VIEW ... SETfor post-creation configuration - Integrated AutoRefreshWorker with incremental refresh system
- Added CPU-aware scheduling and priority queue
- Improved delta tracking cleanup after refresh
- Comprehensive integration tests for new features
v2.3.0 (2025-01-24)¶
- Initial implementation of incremental computation
- Delta tracking system
- Cost-based strategy selection
- Support for aggregates, filters, and joins
- Comprehensive test suite and benchmarks